
use segregated hmap to boost the freelist allocate and release performance #138

Closed
wants to merge 1 commit

Conversation

WIZARD-CXY (Contributor)

In this PR, I use a segregated bitmap to replace the original approach for allocating and releasing free page ids.
It is much faster than the original version; especially when the db size is large or fragmentation in the db is heavy, we can see up to 1000x better performance.

WIZARD-CXY (Contributor, Author)

@xiang90 ptal

WIZARD-CXY changed the title from "use segregated bitmap to boost the freelist allocate and release performance" to "use segregated hmap to boost the freelist allocate and release performance" on Jan 15, 2019
WIZARD-CXY (Contributor, Author)

@benbjohnson

mitake commented Jan 15, 2019

@WIZARD-CXY Interesting! Could you share how you measure the performance improvement?

}

// allocate returns the starting page id of a contiguous list of pages of a given size.
// If a contiguous block cannot be found then 0 is returned.
func (f *freelist) allocate(txid txid, n int) pgid {
	if len(f.ids) == 0 {
		if n == 0 {
Contributor:

do we really need to add this special case handling?

WIZARD-CXY (Contributor, Author):

Yeah, in the common use case we never request a 0-size page; this is only needed by the allocation unit test:

f.allocate(1, 0)
if x := f.free_count(); x != 2 {
	t.Fatalf("exp=2; got=%v", x)
}

WIZARD-CXY (Contributor, Author) commented Jan 16, 2019

@mitake We use bbolt in etcd. With the original approach, when we have a very large db (~50GB) and do puts (~5000 op/s), we find the db spill time is very long (~8s) and the put latency is incredibly large too. We looked into the code, added some debugging info, and found that freelist allocation is the bottleneck. When the freelist is large, internal fragmentation is very common, and the original allocate algorithm tries very hard to find a starting page id. In my approach we instead use a hash map that groups starting pgids by span size, so allocation is super fast (nearly O(1)), and release is boosted to O(1) as well because the original page-id sort operation is removed.
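To make that concrete, here is a minimal sketch of a size-segregated freelist along the lines described above. Everything here is illustrative: the names (`pidSet`, `freemap`, the method signatures) are assumptions rather than the code in this PR, and span splitting and coalescing of adjacent spans are left out.

```go
type pgid uint64

// pidSet holds the starting page ids of free spans that all have the same size.
type pidSet map[pgid]struct{}

// freemap groups free spans by their size in pages, so a request for n pages
// becomes a map lookup instead of a scan over every free page id.
type freemap struct {
	sizes map[uint64]pidSet // span size -> starting pgids of spans of that size
}

// allocate returns the starting pgid of a free span of exactly n pages,
// or 0 if none is available. An exact-size hit costs one map lookup.
func (f *freemap) allocate(n uint64) pgid {
	if ids, ok := f.sizes[n]; ok {
		for start := range ids {
			delete(ids, start)
			if len(ids) == 0 {
				delete(f.sizes, n)
			}
			return start
		}
	}
	// A full implementation would fall back to splitting a larger span here.
	return 0
}

// release returns a span of n pages starting at start to the freelist.
// There is no global sort: the span is simply added to its size bucket.
func (f *freemap) release(start pgid, n uint64) {
	ids, ok := f.sizes[n]
	if !ok {
		ids = make(pidSet)
		f.sizes[n] = ids
	}
	ids[start] = struct{}{}
	// Merging with adjacent free spans is omitted in this sketch.
}
```

Release becomes O(1) because the sorted-slice insertion and the sort over merged ids disappear; the remaining cost is the bookkeeping needed if adjacent spans should still be coalesced.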

WIZARD-CXY (Contributor, Author)

@mitake boltdb/bolt#640 is the same case.

xiang90 (Contributor) commented Jan 16, 2019

@WIZARD-CXY Let us make this an option instead of entirely replacing the array based freelist implementation.

Probably add an option called FreelistType: (array/seglist)
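
One possible shape for such an option, sketched under the assumption that it would live on the DB options struct; the constant values and field name below are illustrative, not a final API:

```go
// FreelistType selects which freelist implementation the DB uses.
type FreelistType string

const (
	// FreelistArrayType keeps the original sorted-array freelist.
	FreelistArrayType FreelistType = "array"
	// FreelistSeglistType opts in to the size-segregated freelist from this PR.
	FreelistSeglistType FreelistType = "seglist"
)

// Options shows only the field relevant to this sketch; callers would pick
// the freelist implementation when opening the DB.
type Options struct {
	FreelistType FreelistType
}
```

A caller would then opt in with something like `bolt.Open(path, 0600, &bolt.Options{FreelistType: FreelistSeglistType})`, keeping the array type as the default for backward compatibility.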

xiang90 (Contributor) commented Jan 16, 2019

/cc @jpbetz

The original freelist is managed as an array. Both the allocate and release operations can be O(N). After a bulk deletion (in the etcd case, a compaction), a key put can take up to 8 seconds when the freelist size reaches O(1,000,000).

This PR uses a seglist approach to solve the problem, like a traditional memory allocator does.
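
For contrast, a condensed sketch of the array-based allocation being replaced (simplified; the panics and fast-path slice handling of the real code are omitted). Finding n contiguous pages means walking the sorted id slice, which is why a put right after a large compaction can stall:

```go
type pgid uint64

// arrayFreelist is the original representation: one sorted slice of all free page ids.
type arrayFreelist struct {
	ids []pgid
}

// allocate scans the sorted ids for n consecutive pages and returns the
// starting id, or 0 if no contiguous run exists. In the worst case it
// visits every free page id, i.e. O(N) per allocation.
func (f *arrayFreelist) allocate(n int) pgid {
	if len(f.ids) == 0 {
		return 0
	}
	var initial, previd pgid
	for i, id := range f.ids {
		// Reset the run if this id is not contiguous with the previous one.
		if previd == 0 || id-previd != 1 {
			initial = id
		}
		// Found a run of n consecutive ids: remove it and return its start.
		if (id-initial)+1 == pgid(n) {
			f.ids = append(f.ids[:i-n+1], f.ids[i+1:]...)
			return initial
		}
		previd = id
	}
	return 0
}
```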

WIZARD-CXY (Contributor, Author) commented Jan 16, 2019

> @WIZARD-CXY Let us make this an option instead of entirely replacing the array based freelist implementation. Probably add an option called FreelistType: (array/seglist)

working on it

WIZARD-CXY (Contributor, Author)

@mitake @xiang90 @hormes PTAL, I added a commit to add an option for using the new approach.

xiang90 (Contributor) commented Jan 22, 2019

@WIZARD-CXY Can you create a new PR? It can be cleaner.

WIZARD-CXY closed this Jan 22, 2019
mitake commented Jan 22, 2019

@WIZARD-CXY Thanks for sharing the details, and sorry for my delayed reply. It's really interesting. Do you mean you already have an etcd cluster with 50GB of data? And are you seeing availability problems caused by the large snapshot?
I'm asking to understand your use case, and out of curiosity: I heard Alibaba has very large clusters. Is this related to that?

WIZARD-CXY (Contributor, Author)

@mitake Yeah, we store a lot of data in etcd since we have large clusters.
