proto.Size takes up to 30% memory usage in excessively large range requests #12835

Closed
chaochn47 opened this issue Apr 6, 2021 · 7 comments · Fixed by #12871

Comments

@chaochn47
Member

chaochn47 commented Apr 6, 2021

It can be reproduced by listing 3000 Kubernetes pods across all namespaces with 50-way concurrency, which takes up to 70% of 8 GiB RAM.

The following is the heap_alloc profiling data:

[heap profile: leader_heap_alloc]
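
For context, this is roughly how such a heap profile can be captured from a live etcd server. A minimal sketch, assuming etcd was started with --enable-pprof and serves client traffic over plain HTTP at 127.0.0.1:2379 (adjust for TLS and your endpoints); the output file name is arbitrary.

package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// etcd registers the standard net/http/pprof handlers on its client
	// listener when --enable-pprof is set.
	resp, err := http.Get("http://127.0.0.1:2379/debug/pprof/heap")
	if err != nil {
		log.Fatalf("fetch heap profile: %v", err)
	}
	defer resp.Body.Close()

	f, err := os.Create("heap.pb.gz")
	if err != nil {
		log.Fatalf("create output file: %v", err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatalf("write profile: %v", err)
	}
	// Inspect with: go tool pprof heap.pb.gz
}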

It looks like proto.Size takes up to 30% of memory usage in excessively large range requests.
Personally, I think removing it may reduce some RAM cost and lower the possibility of etcd running out of memory.

After applying a custom patch that removes the proto.Size call from the warn logging:

func warnOfExpensiveReadOnlyRangeRequest(lg *zap.Logger, now time.Time, reqStringer fmt.Stringer, rangeResponse *pb.RangeResponse, err error) {
	var resp string
	if !isNil(rangeResponse) {
		// resp = fmt.Sprintf("range_response_count:%d size:%d", len(rangeResponse.Kvs), proto.Size(rangeResponse))
		resp = fmt.Sprintf("range_response_count:%d size:%s", len(rangeResponse.Kvs), "TBD")
	}
	warnOfExpensiveGenericRequest(lg, now, reqStringer, "read-only range ", resp, err)
}

We do see an overall 7% memory usage drop. The reason it is not the theoretical 30% is that the Unmarshal in the MVCC layer claims more memory than in the previous run.

[screenshot: Screen Shot 2021-04-06 at 10 48 22 AM]

[heap profile: 2021-04-04-i-09b638bcd740e26da-heap-alloc-leader1]

Can we get some insights from gRPC/etcd experts to explain this behavior, and what should the next step be? @gyuho

@ptabor
Contributor

ptabor commented Apr 7, 2021

  1. It's interesting that proto.Size() performs the full marshalling process to compute the size. Somewhat similar to the discussion we had in raft: disable XXX_NoUnkeyedLiteral, XXX_unrecognized, and XXX_sizecache fields in protos #12790 about XXX_sizecache in RAFT.

  2. In general the data should get paginated or streamed (see Yxjetcd rangestream #12343) to reduce the memory footprint on etcd; see the pagination sketch after this list.

  3. heap_alloc tracks where the memory is allocated. If the memory is deallocated quickly, it does not contribute to the overall heap usage. So turning off response.Size() reduced some temporary 'buffer' memory but not the overall Range request payload that dominates here. I rather wonder what the impact of the Size() call is on overall server throughput.
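
To illustrate point 2, here is a minimal pagination sketch against etcd. Import path, endpoint, key prefix, and page size are all assumptions for illustration (3.5 client module layout, plain HTTP at 127.0.0.1:2379, the Kubernetes /registry/pods/ prefix, 500 keys per page).

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	const pageSize = 500
	key := "/registry/pods/"
	end := clientv3.GetPrefixRangeEnd(key)

	for {
		// Fetch one fixed-size page instead of the entire prefix at once,
		// so the server never materializes the full result set.
		resp, err := cli.Get(context.Background(), key,
			clientv3.WithRange(end),
			clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend),
			clientv3.WithLimit(pageSize),
		)
		if err != nil {
			log.Fatal(err)
		}
		for _, kv := range resp.Kvs {
			fmt.Printf("%s\n", kv.Key)
		}
		if !resp.More {
			break
		}
		// Continue from just after the last key of this page.
		key = string(append(resp.Kvs[len(resp.Kvs)-1].Key, 0))
	}
}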

@ptabor
Contributor

ptabor commented Apr 7, 2021

BTW: Please verify what happens if you call rangeResponse.Size() instead.

The implementation does not seem to require marshalling to compute the size:

func (m *Request) Size() (n int) {
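
A small benchmark sketch to compare the two calls. The import paths are assumptions (shown for the 3.5 module layout; they differ for 3.4), and the payload shape only roughly approximates pod objects; it would live in a _test.go file.

package sizebench

import (
	"testing"

	"github.com/golang/protobuf/proto"
	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
	"go.etcd.io/etcd/api/v3/mvccpb"
)

// buildResponse fabricates a RangeResponse of n key-values with a
// roughly pod-sized value payload.
func buildResponse(n int) *pb.RangeResponse {
	resp := &pb.RangeResponse{Kvs: make([]*mvccpb.KeyValue, 0, n)}
	for i := 0; i < n; i++ {
		resp.Kvs = append(resp.Kvs, &mvccpb.KeyValue{
			Key:   []byte("/registry/pods/default/pod-0000000000"),
			Value: make([]byte, 4096),
		})
	}
	return resp
}

func BenchmarkProtoSize(b *testing.B) {
	resp := buildResponse(3000)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = proto.Size(resp) // may walk/marshal the whole message via reflection
	}
}

func BenchmarkGeneratedSize(b *testing.B) {
	resp := buildResponse(3000)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = resp.Size() // gogo-generated accessor, no marshalling
	}
}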

@chaochn47
Member Author

Will come back to the thread after getting the results.

@gyuho
Contributor

gyuho commented Apr 8, 2021

Could be related to #12842, regarding etcd memory usage in general.

@gyuho gyuho added this to the etcd-v3.5 milestone Apr 13, 2021
@chaochn47
Member Author

chaochn47 commented Apr 16, 2021

Here is the before-and-after comparison of proto.Size() versus rangeResponse.Size().

We set up only a single-node etcd cluster with 8 GiB RAM and 2 vCPUs, fronted by 2 kube-apiservers. This makes us 100% sure that all client requests land on the same etcd instance.

  1. mem_used_percent reduced to half while CPU usage is roughly the same

[screenshot: Screen Shot 2021-04-16 at 1 55 52 AM]

  2. The kube-apiserver list-pods call pattern is similar: it is unpaginated and spans all Kubernetes namespaces

[screenshot: Screen Shot 2021-04-16 at 1 38 31 AM]

@chaochn47
Member Author

chaochn47 commented Apr 16, 2021

However, I did not turn off madvise on Linux, so the RSS mem_percent_used may not be accurate. I will report back once I have results with GODEBUG=madvdontneed=1.

ref:
[1] golang/go#42330
[2] golang/go#33376
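
As a side note on interpreting RSS here: the Go runtime's own counters can be read directly, which separates live heap from pages the runtime has marked reusable but the kernel may still count as resident under MADV_FREE (the Linux default in Go 1.12–1.15). A minimal sketch using only the standard library:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// HeapIdle - HeapReleased is memory Go keeps around for reuse; under
	// MADV_FREE the OS may still report it as resident, which is why
	// GODEBUG=madvdontneed=1 makes RSS easier to compare across runs.
	fmt.Printf("HeapAlloc    = %d MiB\n", m.HeapAlloc>>20)
	fmt.Printf("HeapInuse    = %d MiB\n", m.HeapInuse>>20)
	fmt.Printf("HeapIdle     = %d MiB\n", m.HeapIdle>>20)
	fmt.Printf("HeapReleased = %d MiB\n", m.HeapReleased>>20)
	fmt.Printf("Sys          = %d MiB\n", m.Sys>>20)
}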

@ptabor
Contributor

ptabor commented Apr 16, 2021

But I think we are sufficiently confident that proto.Size() -> rangeResponse.Size() [and similar] is a good enough move to justify a PR.
Thank you for running these experiments and proposing this improvement.
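
As a quick sanity check that swapping the call does not change the logged value, a standalone sketch comparing the two size computations on the same message (import paths again assume the 3.5 module layout; the sample key-values are arbitrary):

package main

import (
	"fmt"

	"github.com/golang/protobuf/proto"
	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
	"go.etcd.io/etcd/api/v3/mvccpb"
)

func main() {
	resp := &pb.RangeResponse{
		Kvs: []*mvccpb.KeyValue{
			{Key: []byte("/registry/pods/default/a"), Value: make([]byte, 1024)},
			{Key: []byte("/registry/pods/default/b"), Value: make([]byte, 2048)},
		},
	}
	// Both should report the same encoded size; only the cost of
	// computing it differs.
	fmt.Println("proto.Size:          ", proto.Size(resp))
	fmt.Println("rangeResponse.Size():", resp.Size())
}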

chaochn47 added a commit to chaochn47/etcd that referenced this issue Apr 16, 2021
chaochn47 added a commit to chaochn47/etcd that referenced this issue Apr 16, 2021
xiang90 pushed a commit that referenced this issue Apr 16, 2021
* etcdserver/util.go: reduce memory when logging range requests

Fixes #12835

* Update CHANGELOG-3.5.md