New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix stack overflow during btree iteration #90
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a branch which implements prev iterator differently (but forgot to do a PR for it), I wonder if tail recursion work on it master...cheme:parity-db:last_state_iterator (the branch avoid seek when changing direction and remove some code redundancy).
But regarding this PR, looks good to me, except the missing line in comment (I did not check code logic, just ensure the code path is the same).
More generally, I prefer not having the rec call, but I'll be interested to know under which condition the stack overflow happen, certainly not the depth of the btree, there may be something else here.
@i1i1 can you run it under gdb to catch backtrace when stack overflow happens? |
loop { | ||
// Lock log over function call (no btree struct change). | ||
let commit_overlay = self.commit_overlay.read(); | ||
self.from_seek |= self.direction == IterDirection::Forward; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I think may have happen is that we touch a removed value in commit overlay, and we did change direction, so even if on recursive call (former line 156) there is a self.from_seek = false
it is set again here due to direction switch and then we loop more than 3 thousand time on next function, probably exit the loop when commit value get flushed.
So before recursive call direction would need to be updated.
With current PR, we will be having some very long call to next in this case.
So this needs fixing.
An alternative to fixing these direction would be to try to use my branch I did mention, but that is more review work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't review code, but feel free to open pr with it and adding me as a reviewer!
Tested that branch, and it also exits with stack overflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested that branch, and it also exits with stack overflow
By any chance, do you have the stack trace? (I did not read the same issue).
Realize I am wrong in my earlier statement since direction should have been updated in call to next_backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok got it, your process did probably remove something like more than 3000 values, and then the iterator goes over the commit map in a recursive way.
So fixing the way you did is correct.
@i1i1 thanks for the stacktrace and the PR. Small question what is the size (in number for keys) of your column ? (One of my first implementation for the btree index did use a dedicated file for nodes which we replace by just using table values entry to simplify a bit as perf was not the first concern, but maybe I should try to resurrect the branch IIRC it is an easy perf gain and not that much code in fact). |
In that case, it was around 3000 keys. Scenario was as follows:
So everything blows at iterating over db as we hit some case when we need to recur after removing key in commit overlay |
@cheme can you merge it please? Or are we waiting for something? |
I think it's good. @arkpar I am merging let me know if anything off. |
Migrating from rocksdb to paritydb (here) I discovered that stack randomly overflows during
BTreeIterator::{next, prev}
calls.Seems like rust compiler failed to optimize some recursive functions with no state with TCO (even in
release
mode). Looks like as all optimizations, TCO is also eventual and is not guaranteed in any way. More on the history of it in the compiler here.So this pr basically rewrites recursive functions using loops so that we would guarantee that we will reuse memory and won't allocate stack frame.
Better to review pr commit by commit, and for the first one it would be helpful to ignore indentations.