Proposal: Increase quota-backend-bytes default to 8GB #9771
Comments
Interesting. How did we measure restore latencies?
What is your memory size? When the DB size grows beyond the memory size, performance will decrease significantly.
A more interesting test is how etcd performs when the free list contains a lot of pages and the write size is small.
For restore latency, we measured how long […]
@xiang90 Good idea. I'll try a test where we do a bunch of puts/deletes with small objects and see what happens at the bbolt layer when we're allocating against a large freelist with high fragmentation.
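A minimal sketch of that kind of churn test against bbolt directly, assuming bbolt's public API (the file name, bucket name, value size, and delete ratio here are illustrative placeholders, not the actual benchmark):

```go
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Hypothetical test DB; the path is a placeholder.
	db, err := bolt.Open("frag-test.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	bucket := []byte("kv")
	if err := db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucketIfNotExists(bucket)
		return err
	}); err != nil {
		log.Fatal(err)
	}

	val := make([]byte, 256) // small objects, per the discussion above

	// Interleave puts with deletes of earlier keys so freed pages
	// accumulate and later allocations run against a fragmented freelist.
	for i := 0; i < 100000; i++ {
		if err := db.Update(func(tx *bolt.Tx) error {
			b := tx.Bucket(bucket)
			if err := b.Put([]byte(fmt.Sprintf("key-%08d", i)), val); err != nil {
				return err
			}
			if i%2 == 1 {
				return b.Delete([]byte(fmt.Sprintf("key-%08d", i/2)))
			}
			return nil
		}); err != nil {
			log.Fatal(err)
		}
	}

	// Inspect freelist pressure after the churn.
	s := db.Stats()
	fmt.Printf("free pages: %d, pending pages: %d, freelist bytes in use: %d\n",
		s.FreePageN, s.PendingPageN, s.FreelistInuse)
}
```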
@jpbetz Also, for the restore test, we are now mostly testing how the I/O layer performs. I would expect index rebuilding to dominate the time as the number of keys grows.
I think the big jump around 52GB is because of this issue.
The main motivation to soft-limit database size was to limit mean time to recovery. So, I would also measure how long it takes to rebuild MVCC states on restart (using […]).
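One hedged way to measure that recovery cost from the outside is simply to time the restore step; the sketch below shells out to the standard `etcdctl snapshot restore` command (the snapshot file and data dir names are placeholders). Note this captures only the restore itself, not the MVCC rebuild on the subsequent server start, which later comments suggest can dominate:

```go
package main

import (
	"log"
	"os"
	"os/exec"
	"time"
)

func main() {
	cmd := exec.Command("etcdctl", "snapshot", "restore", "snap.db",
		"--data-dir", "restored.etcd")
	// Older etcdctl releases require this to enable the v3 API.
	cmd.Env = append(os.Environ(), "ETCDCTL_API=3")

	start := time.Now()
	if out, err := cmd.CombinedOutput(); err != nil {
		log.Fatalf("restore failed: %v\n%s", err, out)
	}
	log.Printf("snapshot restore took %v", time.Since(start))
}
```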
Sounds good. I've added the machine stats to the description. I'll try with a range of object sizes, including very small ones, to get more data on worst-case recovery times.
I've updated the testing based on the feedback here. The new flow creates a larger number of small objects, many of which get deleted over time and compacted, producing free list entries in bolt as well as putting pressure on the snapshot and restore operations.
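For reference, a sketch of that sort of flow against a live cluster using the clientv3 Put/Delete/Compact calls (the endpoint, keyspace size, value size, and delete/compact ratios are illustrative assumptions, not the exact benchmark; the import path shown is for newer etcd releases):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()
	val := string(make([]byte, 100)) // many small objects

	var rev int64
	for i := 0; i < 1_000_000; i++ {
		// Write within a bounded keyspace so keys get overwritten.
		key := fmt.Sprintf("/bench/%07d", i%100000)
		resp, err := cli.Put(ctx, key, val)
		if err != nil {
			log.Fatal(err)
		}
		rev = resp.Header.Revision

		// Delete a fraction of keys so tombstones accumulate.
		if i%10 == 0 {
			cli.Delete(ctx, key)
		}
		// Periodically compact away old revisions, freeing bolt pages.
		if i%100000 == 99999 {
			cli.Compact(ctx, rev)
		}
	}
}
```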
If we randomly write to a 100M keyspace at a rate of around 15k/s, the probability of key overwriting is pretty low. So I am not sure the compaction actually does anything.
Regardless, I think 16GB is a reasonable goal that we can achieve with some effort in the short term.
Hm. Running time was about 1 day, so about 12 writes per key. Not a lot of compaction or history. I'll drop that down to 1M keys and see what happens the next time we run this.
@jpbetz I also want to see what happens when the majority of the 16GB DB's pages are free pages. That is the extreme case. If boltdb can still perform well, then great :P. Otherwise we might need to do some optimization there.
I'm seeing a similar pattern with small values (100 KB). I also logged the growing freelist size: if we keep writing data, the freelist grows to 2 GB for a 10 GB DB. It doesn't seem to have much effect (writing large values slows down more quickly, with a much smaller freelist). I tested a restore of a 10 GB DB file and saw most of the time spent on rebuilding the MVCC storage. Will keep experimenting.
Now that the bbolt freelist is no longer persisted, we should consider increasing the etcd default storage limit to 8GB. We could potentially go higher, but this keeps the snapshot/restore operations to roughly 1 minute. We can always increase the limit further in future releases based on feedback.
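The quota is already configurable per cluster; a minimal sketch of setting the proposed 8GB value with the embed package (equivalent to the --quota-backend-bytes flag; the import path varies by etcd release, and the data dir is a placeholder):

```go
package main

import (
	"log"

	"go.etcd.io/etcd/server/v3/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "default.etcd"
	// 8GB backend quota, the value this proposal would make the default.
	cfg.QuotaBackendBytes = 8 * 1024 * 1024 * 1024

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()
	<-e.Server.ReadyNotify()
	log.Println("etcd started with an 8GB backend quota")
}
```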
Based on the data below:
The benchmark was constructed using the following flow:
Debian/linux 4.9.0-amd64
6 core - Intel(R) Xeon(R) @ 3.60GHz
64 GB memory (4x 16GiB DIMM DDR4 2400 MHz)
HDD
cc @gyuho @wenjiaswe