use segregated hmap to boost the freelist allocate and release performance #138
Conversation
@xiang90 ptal
@WIZARD-CXY Interesting! Could you share how you measure the performance improvement?
```go
// allocate returns the starting page id of a contiguous list of pages of a given size.
// If a contiguous block cannot be found then 0 is returned.
func (f *freelist) allocate(txid txid, n int) pgid {
	if len(f.ids) == 0 {
		if n == 0 {
```
do we really need to add this special case handling?
yeah, in the common use case we don't request a 0-size page; this is only needed by the unit test, which calls:
```go
f.allocate(1, 0)
if x := f.free_count(); x != 2 {
	t.Fatalf("exp=2; got=%v", x)
}
```
@mitake we use bbolt in etcd. In the original approach, when we have a very large db (~50GB) and do puts (~5000 op/s), we found the db spill time was very large (~8s), and the put latency was incredibly high too. We looked into the code, added some debugging info, and found that freelist allocation is the bottleneck: when the freelist is large, internal fragmentation is very common. The original algorithm in …
@mitake boltdb/bolt#640 this is the same case
force-pushed from 63549ff to b685ae3
@WIZARD-CXY Let us make this an option instead of entirely replacing the array-based freelist implementation. Probably add an option called FreelistType: (array/seglist)
/cc @jpbetz The original freelist is managed as an array. Both the allocate and release operations can be O(N). After a bulk deletion (in etcd's case, a compaction), a key put can take up to 8 seconds when the freelist size reaches O(1,000,000). This PR uses a seglist approach to solve the problem, the way a traditional memory allocator does.
working on it |
force-pushed from 0601199 to 14455c5
force-pushed from 342fae7 to fbbf463
@WIZARD-CXY Can you create a new PR? It can be cleaner.
@WIZARD-CXY thanks for sharing the details, and sorry for my delayed reply. It's really interesting. Do you mean you already have an etcd cluster with 50GB of data? Then are you seeing availability problems caused by the large snapshot?
@mitake yeah, we store a lot of data in etcd since we have large clusters
In this PR, I use a segregated hmap to replace the original freeids allocating and releasing approach.
It is much faster than the original version; especially when the db is large or heavily fragmented, we can gain up to 1000x faster performance.