Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small patch to allow resetting the btree. #20

Closed
wants to merge 1 commit into from

Conversation

cstockton
Copy link

I am serializing snapshots of the btree to be restored later. For saving I only need a read lock, for restoring I obtain a write lock on the tree. I was able to amortize all the costs of serialization and cheapen the restore process with a generous free list capacity. For larger restores the process of removing all the items from an existing tree is taxing. You have to visit N nodes to store in a temporary list (I cached this and reset[0:0] each restore), then iterate the temporary list to make N calls to delete.

I tried some other possibility such as creating an entirely new tree and replace the existing one, but without first deleting all the items in the tree my FreeList is empty so the allocations kill me instead. I could probably work around this with rotating between two trees and deleting in the background, but this would be much simpler if accepted. It provides around 2.5-3x speedup for removing all items from a btree and lowers my latency below the downstream tick rate.

Small note: I included "Inserts" in the benchmarks because I was running into that issue that pops up sometimes with the benchmark timers where it runs forever on the Reset() test otherwise and I nothing I tried fixed it.

BenchmarkDeleteAll-24 100 11680090 ns/op 523832 B/op 709 allocs/op
BenchmarkReset-24 300 6648212 ns/op 367997 B/op 723 allocs/op

This patch provides around a 3 time speed up for my restore process

@gconnell
Copy link
Collaborator

Thanks for the code!

I'm somewhat worried that this is overly complex (and slow), especially in cases where the tree is large. I guess I'm wondering why someone would do tree.Reset() instead of tree = new BTree()?

I'm not opposed to adding a reset function at all, but I think if we did, we might just want do it without repopulating the freelist (or fill the freelist separately, without traversing the original btree).

Also, in the current code, I'm somewhat confused as to why root.reset() is called, then freeNode is called on it and its children individually (in the BTree.Reset function). Wouldn't root.reset() call freeNode on the root and its children already?

How about this for Reset code?

func (t *BTree) Reset() {
  t.root, t.lenght = nil, 0
}

@cstockton
Copy link
Author

I'm somewhat worried that this is overly complex (and slow), especially in cases where the tree is large. I guess I'm wondering why someone would do tree.Reset() instead of tree = new BTree()?

Do you have suggestions on how to reset the tree in a less complex and faster way? I don't know how I could achieve either of those two while still truly being a Reset() and not just a dump to the GC. I gave a reason why I am doing this in detail in my first post, so you can refer to that on the use case. I'm open to a better way, but I haven't found one.

Creating a new tree means the old tree has to live in memory while the new one is created. This means that the FreeList is not leveraged not only for the btree internal state, but also the user space metadata associated with whatever their implementation of Item is. Below is an example of the performance difference in using a Reset vs New().

 BenchmarkRestore/New-24         	     100	  12087504 ns/op	  587689 B/op	    1147 allocs/op
BenchmarkRestore/NewWithFreeList-24         	     100	  12587314 ns/op	  590343 B/op	    1147 allocs/op
BenchmarkRestore/Reset-24                   	     200	   5448843 ns/op	    1387 B/op	       3 allocs/op
func BenchmarkRestore(b *testing.B) {
	items := perm(16392)
	b.ResetTimer()
	b.Run(`New`, func(b *testing.B) {
		tr := New(*btreeDegree)
		for _, v := range items {
			tr.ReplaceOrInsert(v)
		}
		b.ReportAllocs()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			cpy := New(*btreeDegree)
			for _, v := range items {
				cpy.ReplaceOrInsert(v)
			}
		}
	})
	b.Run(`NewWithFreeList`, func(b *testing.B) {
		fl := NewFreeList(16392)
		tr := NewWithFreeList(*btreeDegree, fl)
		for _, v := range items {
			tr.ReplaceOrInsert(v)
		}
		b.ReportAllocs()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			dels := make([]Item, 0, tr.Len())
			tr.Ascend(ItemIterator(func(b Item) bool {
				dels = append(dels, b)
				return true
			}))
			for _, del := range dels {
				tr.Delete(del)
			}

			cpy := New(*btreeDegree)
			for _, v := range items {
				cpy.ReplaceOrInsert(v)
			}
		}
	})
	b.Run(`Reset`, func(b *testing.B) {
		fl := NewFreeList(16392)
		tr := NewWithFreeList(*btreeDegree, fl)
		for _, v := range items {
			tr.ReplaceOrInsert(v)
		}
		b.ReportAllocs()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			tr.Reset()
			for _, v := range items {
				tr.ReplaceOrInsert(v)
			}
		}
	})
}

Also, in the current code, I'm somewhat confused as to why root.reset() is called, then freeNode is called on it and its children individually (in the BTree.Reset function). Wouldn't root.reset() call freeNode on the root and its children already?

How about this for Reset code?

I don't know of an algorithm that would allow each node to remove it's own children without adding transient state or traversal complexity. This method keeps complexity as low as possible with the caveat that each parent must free it's children and itself. BTree is the parent to root, so it has to free those children or they will leak.

@gconnell
Copy link
Collaborator

Note that there's a few bugs in your benchmark:

  1. in NewWithFreeList, you create a cpy tree every time, but never delete from it. The first loop iteration empties your freelist, and all subsequent iterations are forced to create new nodes from scratch.
  2. also in NewWithFreeList, your first iteration deletes all elements from tr, and they're never replaced, so subsequent iterations get an empty list to "reset"

With those bugs fixed, though, your original point stands: deleting elements individually is slower than doing it with reset(). There aren't additional allocs in NewWithFreeList (it does what it's supposed to), but there is the cost of finding each element, deleting it individually, and updating other tree nodes based on the update.

I've created an alternative PR at #21 which simplifies the reset methods, breaks out early if the freelist is full, and adds many more comments on actual runtime. I'd really be interested in any thoughts you might have on it as a different approach. Even with that simplification, though, I do worry that this is overly complex for the simple act of throwing away a tree... looking at the 4 different runtimes based on 4 different criteria, the potential of saving a bit of time due to freelist vs. allocs seems overwhelmed by coder complexity. Considering your use case in particular (serializing/deserializing a tree), would the actual in-process time creating the new tree be drowned out by the cost of reading the serialization from persistent storage anyway, making this optimization unnecessary? I worry that we're optimizing prematurely based on microbenchmarks instead of real-world use-cases.

In short, I'm somewhat tempted to keep Reset/Clear out of btree's functionality, and just stick with:

tr := btree.New(...)
// add stuff
// now to clear the tree:
tr = btree.New(...)
// and we're good

Yeah, there's a few more allocs in that case, but only N/degree of them.

@cstockton
Copy link
Author

The benchmarks were just an attempt to better illustrate, I think 1 is fair but 2 isn't realistic for many real world applications that wrap the tree in a rwmutex. For me I am restoring a snapshot of []byte encoded btree state, so I have a few options:

  1. You obtain write exclusion each delete through your entry wrapped Delete method causing heavy reader contention and requiring N locks.
  2. Hold write lock blocking all readers for the duration of the copy. This is going to vary across app (the cost of creating/copying/moving the tree items).
  3. Hold reader lock to block writers only (much less frequent in my case) and restore the new tree while the old one stays in tact. This is what the second test emulates and what provided me the best performance.

Your pull request looks great and I think it would be a good improvement. To be clear though this did stem from a real world bottleneck and not benchmarks, I tried a lot of methods using Clone(), different variations of the 3 things I listed above. Short of adding a lot more complexity to an already complex area of my code (snapshots) I'm not sure I can get the right trade offs to keep the pauses low enough. I understand if you are hesitant to add this in though, I can always run a small fork of the tree I've been using it for a long while (first #1 !) and am fine with maintaining a fork while you think about this for a while or omit it all together. Thanks!

@cstockton
Copy link
Author

Just noticed a patch covering this was merged, thanks a lot, closing it.

@cstockton cstockton closed this Jun 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants