proto.Size takes up to 30% memory usage in excessively large range requests #12835

Closed
chaochn47 opened this issue Apr 6, 2021 · 7 comments · Fixed by #12871

Comments

@chaochn47
Member

chaochn47 commented Apr 6, 2021

It can be reproduced by listing 3000 Kubernetes pods across all namespaces with 50-way concurrency, which takes up to 70% of 8 GiB RAM.

The following is the heap_alloc profiling data:

[heap profile: leader_heap_alloc]
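
For context, this is roughly how such a heap profile can be captured from a live etcd server. A minimal sketch, assuming etcd was started with --enable-pprof and serves client traffic over plain HTTP at 127.0.0.1:2379 (adjust for TLS and your endpoints); the output file name is arbitrary.

package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// etcd registers the standard net/http/pprof handlers on its client
	// listener when --enable-pprof is set.
	resp, err := http.Get("http://127.0.0.1:2379/debug/pprof/heap")
	if err != nil {
		log.Fatalf("fetch heap profile: %v", err)
	}
	defer resp.Body.Close()

	f, err := os.Create("heap.pb.gz")
	if err != nil {
		log.Fatalf("create output file: %v", err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatalf("write profile: %v", err)
	}
	// Inspect with: go tool pprof heap.pb.gz
}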

It looks like proto.Size takes up to 30% of memory usage in excessively large range requests.
Personally, I think removing it may reduce some RAM cost and lower the possibility of etcd running out of memory.

After applying a custom patch that removes the proto.Size call from the warn logging:

func warnOfExpensiveReadOnlyRangeRequest(lg *zap.Logger, now time.Time, reqStringer fmt.Stringer, rangeResponse *pb.RangeResponse, err error) {
	var resp string
	if !isNil(rangeResponse) {
		// resp = fmt.Sprintf("range_response_count:%d size:%d", len(rangeResponse.Kvs), proto.Size(rangeResponse))
		resp = fmt.Sprintf("range_response_count:%d size:%s", len(rangeResponse.Kvs), "TBD")
	}
	warnOfExpensiveGenericRequest(lg, now, reqStringer, "read-only range ", resp, err)
}

We do see an overall 7% memory usage drop. The reason it is not the theoretical 30% is that the Unmarshal in the MVCC layer claims more memory than in the previous run.

[screenshot: Screen Shot 2021-04-06 at 10 48 22 AM]

[heap profile: 2021-04-04-i-09b638bcd740e26da-heap-alloc-leader1]

Can we get some insights from gRPC/etcd experts to explain this behavior, and what should the next step be? @gyuho

@ptabor
Contributor

ptabor commented Apr 7, 2021

  1. It's interesting that proto.Size() performs the full marshalling process to compute the size. Somewhat similar to the discussion we had in raft: disable XXX_NoUnkeyedLiteral, XXX_unrecognized, and XXX_sizecache fields in protos #12790 about XXX_sizecache in RAFT.

  2. In general the data should get paginated or streamed (see Yxjetcd rangestream #12343) to reduce the memory footprint on etcd; see the pagination sketch after this list.

  3. heap_alloc tracks where the memory is allocated. If the memory is deallocated quickly, it does not contribute to the overall heap usage. So turning off response.Size() reduced some temporary 'buffer' memory but not the overall Range request payload that dominates here. I rather wonder what the impact of the Size() call is on overall server throughput.
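
To illustrate point 2, here is a minimal pagination sketch against etcd. Import path, endpoint, key prefix, and page size are all assumptions for illustration (3.5 client module layout, plain HTTP at 127.0.0.1:2379, the Kubernetes /registry/pods/ prefix, 500 keys per page).

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	const pageSize = 500
	key := "/registry/pods/"
	end := clientv3.GetPrefixRangeEnd(key)

	for {
		// Fetch one fixed-size page instead of the entire prefix at once,
		// so the server never materializes the full result set.
		resp, err := cli.Get(context.Background(), key,
			clientv3.WithRange(end),
			clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend),
			clientv3.WithLimit(pageSize),
		)
		if err != nil {
			log.Fatal(err)
		}
		for _, kv := range resp.Kvs {
			fmt.Printf("%s\n", kv.Key)
		}
		if !resp.More {
			break
		}
		// Continue from just after the last key of this page.
		key = string(append(resp.Kvs[len(resp.Kvs)-1].Key, 0))
	}
}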

@ptabor
Contributor

ptabor commented Apr 7, 2021

BTW: Please verify what happens if you call rangeResponse.Size() instead.

The implementation does not seem to require marshalling to compute the size:

func (m *Request) Size() (n int) {
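
A small benchmark sketch to compare the two calls. The import paths are assumptions (shown for the 3.5 module layout; they differ for 3.4), and the payload shape only roughly approximates pod objects; it would live in a _test.go file.

package sizebench

import (
	"testing"

	"github.com/golang/protobuf/proto"
	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
	"go.etcd.io/etcd/api/v3/mvccpb"
)

// buildResponse fabricates a RangeResponse of n key-values with a
// roughly pod-sized value payload.
func buildResponse(n int) *pb.RangeResponse {
	resp := &pb.RangeResponse{Kvs: make([]*mvccpb.KeyValue, 0, n)}
	for i := 0; i < n; i++ {
		resp.Kvs = append(resp.Kvs, &mvccpb.KeyValue{
			Key:   []byte("/registry/pods/default/pod-0000000000"),
			Value: make([]byte, 4096),
		})
	}
	return resp
}

func BenchmarkProtoSize(b *testing.B) {
	resp := buildResponse(3000)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = proto.Size(resp) // may walk/marshal the whole message via reflection
	}
}

func BenchmarkGeneratedSize(b *testing.B) {
	resp := buildResponse(3000)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = resp.Size() // gogo-generated accessor, no marshalling
	}
}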

@chaochn47
Member Author

Will come back to the thread after getting the results.

@gyuho
Contributor

gyuho commented Apr 8, 2021

Could be related to #12842, regarding etcd memory usage in general.

@gyuho gyuho added this to the etcd-v3.5 milestone Apr 13, 2021
@chaochn47
Member Author

chaochn47 commented Apr 16, 2021

Here is the before-and-after comparison of proto.Size() versus rangeResponse.Size().

We set up only a single-node etcd cluster with 8 GiB RAM and 2 vCPUs, fronted by 2 kube-apiservers. This makes us 100% sure that all client requests land on the same etcd instance.

  1. mem_used_percent reduced to half while CPU usage is roughly the same

[screenshot: Screen Shot 2021-04-16 at 1 55 52 AM]

  2. The kube-apiserver list-pods call pattern is similar: it is unpaginated and spans all Kubernetes namespaces

[screenshot: Screen Shot 2021-04-16 at 1 38 31 AM]

@chaochn47
Member Author

chaochn47 commented Apr 16, 2021

However, I did not turn off madvise on Linux, so the RSS mem_percent_used may not be accurate. I will report back once I have results with GODEBUG=madvdontneed=1.

ref:
[1] golang/go#42330
[2] golang/go#33376
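
As a side note on interpreting RSS here: the Go runtime's own counters can be read directly, which separates live heap from pages the runtime has marked reusable but the kernel may still count as resident under MADV_FREE (the Linux default in Go 1.12–1.15). A minimal sketch using only the standard library:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// HeapIdle - HeapReleased is memory Go keeps around for reuse; under
	// MADV_FREE the OS may still report it as resident, which is why
	// GODEBUG=madvdontneed=1 makes RSS easier to compare across runs.
	fmt.Printf("HeapAlloc    = %d MiB\n", m.HeapAlloc>>20)
	fmt.Printf("HeapInuse    = %d MiB\n", m.HeapInuse>>20)
	fmt.Printf("HeapIdle     = %d MiB\n", m.HeapIdle>>20)
	fmt.Printf("HeapReleased = %d MiB\n", m.HeapReleased>>20)
	fmt.Printf("Sys          = %d MiB\n", m.Sys>>20)
}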

@ptabor
Contributor

ptabor commented Apr 16, 2021

But I think we are sufficiently confident that proto.Size() -> rangeResponse.Size() [and similar] is a good enough move to justify a PR.
Thank you for running these experiments and proposing this improvement.
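
As a quick sanity check that swapping the call does not change the logged value, a standalone sketch comparing the two size computations on the same message (import paths again assume the 3.5 module layout; the sample key-values are arbitrary):

package main

import (
	"fmt"

	"github.com/golang/protobuf/proto"
	pb "go.etcd.io/etcd/api/v3/etcdserverpb"
	"go.etcd.io/etcd/api/v3/mvccpb"
)

func main() {
	resp := &pb.RangeResponse{
		Kvs: []*mvccpb.KeyValue{
			{Key: []byte("/registry/pods/default/a"), Value: make([]byte, 1024)},
			{Key: []byte("/registry/pods/default/b"), Value: make([]byte, 2048)},
		},
	}
	// Both should report the same encoded size; only the cost of
	// computing it differs.
	fmt.Println("proto.Size:          ", proto.Size(resp))
	fmt.Println("rangeResponse.Size():", resp.Size())
}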

chaochn47 added a commit to chaochn47/etcd that referenced this issue Apr 16, 2021
chaochn47 added a commit to chaochn47/etcd that referenced this issue Apr 16, 2021
xiang90 pushed a commit that referenced this issue Apr 16, 2021
* etcdserver/util.go: reduce memory when logging range requests

Fixes #12835

* Update CHANGELOG-3.5.md