giant memory usage overhead on leader when recovering progress of dead follower (40x) #2662

It brings more memory pressure than #2657 (20x).
The overhead is around 40x.

Comments
I think this should be made clearer... The reason is that the leader keeps a pointer to the old snapshot for the follower, since our snapshot is not MVCC. The overhead is not caused by the death of the follower, but by the snapshot sending.
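To make the retention mechanism concrete, here is a minimal Go sketch; `Snapshot` and `Message` are illustrative stand-ins, not etcd's actual raft structs:

```go
package main

import "fmt"

// Snapshot and Message are illustrative stand-ins for the real types.
type Snapshot struct {
	Data []byte // full serialized v2 store; one copy per snapshot send
}

type Message struct {
	To       uint64
	Snapshot *Snapshot // retained by the leader until the send completes
}

func main() {
	snap := &Snapshot{Data: make([]byte, 64<<20)} // e.g. a 64 MB snapshot copy
	inflight := Message{To: 2, Snapshot: snap}

	// While `inflight` is reachable (follower slow, dead, or the message is
	// being resent), the GC cannot reclaim snap.Data: the whole copy stays
	// pinned on the leader until the reference is dropped.
	fmt.Println(len(inflight.Snapshot.Data))
}
```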
@yichengq Also, I cannot reproduce this. If I put 200k 128-byte keys into a 3-member etcd cluster, the memory usage is more than 500MB (not 300MB). Moreover, it is unclear why I need to
(I understand you want to trigger a snapshot, but it is unclear to people who are not familiar with the internals.)
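For reference, a rough sketch of the write load described above (200k keys with 128-byte values), assuming the v2 keys API on 127.0.0.1:2379; the key layout is made up for illustration:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	value := strings.Repeat("x", 128) // 128-byte payload per key
	client := &http.Client{}
	for i := 0; i < 200000; i++ {
		// PUT /v2/keys/bench/<i> with a form-encoded value, as the v2 API expects.
		form := url.Values{"value": {value}}
		req, err := http.NewRequest("PUT",
			fmt.Sprintf("http://127.0.0.1:2379/v2/keys/bench/%d", i),
			strings.NewReader(form.Encode()))
		if err != nil {
			panic(err)
		}
		req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
		resp, err := client.Do(req)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
	}
}
```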
@xiang90 That one is restarted from a 3-member cluster with 200k keys in the store. Here is another instruction that bootstraps a brand-new cluster:
@yichengq Can you update the issue to
This is clearer.
We might be affected by this, as one of our primary etcd instances takes around 2 GiB of RES memory while the actual amount of stored data is only about 100 KiB.
@garo Could you share anything special that happened, the log of the etcd instance, and the size of the snapshot under $data_dir/member/snap with us?
@yichengq The snap dir size is 24 MiB. I was using 2.0.5 previously and have just upgraded to 2.0.9 to see if the problem persists, so I don't have the logs anymore. Will update later.
@garo Our testing was done on the master branch, and the overhead there is 40x. We are wondering how it performs on 2.0.9 too; I will measure/confirm the number tomorrow.
@heyitsanthony Perhaps we can test this against v3 storage again? I assume the memory pressure on the leader would be much lower, as we do not make any copy while sending a snapshot.
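A minimal sketch of that idea, assuming the snapshot can be streamed from the backend file on disk (the path and function shape are illustrative, not etcd's real code):

```go
package main

import (
	"io"
	"os"
)

// streamSnapshot copies the backend file to w in small reused buffers, so
// the leader's heap footprint stays constant regardless of snapshot size.
func streamSnapshot(w io.Writer, path string) error {
	f, err := os.Open(path) // e.g. the boltdb file under member/snap/db
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(w, f)
	return err
}

func main() {
	// Illustrative only: stream the (assumed) snapshot file to stdout.
	if err := streamSnapshot(os.Stdout, "member/snap/db"); err != nil {
		os.Exit(1)
	}
}
```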
Ran the benchmark on 3 nodes with v3.
Initialization: 18 MB RSS
Overall, looks good.
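For anyone repeating the measurement, a small sketch of how RSS can be sampled on Linux by reading VmRSS from /proc/<pid>/status; which pid to pass (the etcd server's) is up to the reader:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// rssLine returns the VmRSS line from /proc/<pid>/status.
func rssLine(pid int) (string, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return "", err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "VmRSS:") {
			return s.Text(), nil
		}
	}
	return "", s.Err()
}

func main() {
	line, err := rssLine(os.Getpid()) // replace with the etcd server's pid
	if err != nil {
		panic(err)
	}
	fmt.Println(line) // e.g. "VmRSS:     18432 kB"
}
```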
@yichengq @heyitsanthony Closing this since v3 looks good and we are not going to improve the v2 store.