
Fix stack overflow during btree iteration #90

Merged
merged 4 commits into paritytech:master on Aug 31, 2022

Conversation

i1i1 (Contributor) commented Aug 29, 2022

While migrating from RocksDB to parity-db (here), I discovered that the stack randomly overflows during BTreeIterator::{next, prev} calls.

It seems the Rust compiler failed to apply tail-call optimization (TCO) to some stateless recursive functions, even in release mode. Like all optimizations, TCO is best-effort and is not guaranteed in any way. More on its history in the compiler here.

So this PR rewrites the recursive functions as loops, guaranteeing that stack memory is reused and no new stack frames are allocated.
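The rewrite can be illustrated with a minimal standalone sketch (hypothetical functions, not the actual parity-db code): a tail-recursive `next` that skips empty slots may or may not be turned into a loop by rustc, while the explicit loop uses constant stack space by construction.

```rust
// Minimal sketch, NOT the actual parity-db code. `entries` stands in for a
// sequence where `None` marks a deleted slot that the iterator must skip.

// Recursive form: one stack frame per skipped slot unless rustc happens to
// apply tail-call optimization, which is never guaranteed.
fn next_recursive(entries: &[Option<u32>], pos: usize) -> Option<u32> {
    match entries.get(pos)? {
        Some(v) => Some(*v),
        None => next_recursive(entries, pos + 1),
    }
}

// Loop form: reuses the same stack frame no matter how many slots are skipped.
fn next_looping(entries: &[Option<u32>], mut pos: usize) -> Option<u32> {
    loop {
        match entries.get(pos)? {
            Some(v) => return Some(*v),
            None => pos += 1,
        }
    }
}

fn main() {
    // A long run of deleted slots followed by one live value.
    let mut entries = vec![None; 5000];
    entries.push(Some(42u32));
    assert_eq!(next_looping(&entries, 0), Some(42));
    assert_eq!(next_recursive(&[None, Some(7)], 0), Some(7));
    println!("ok");
}
```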

It's better to review the PR commit by commit; for the first one it helps to ignore the indentation changes.

@arkpar arkpar requested a review from cheme August 30, 2022 10:53
cheme (Collaborator) left a comment

I had a branch which implements the prev iterator differently (but forgot to open a PR for it); I wonder whether tail recursion works on it: master...cheme:parity-db:last_state_iterator (the branch avoids seeking when changing direction and removes some code redundancy).

But regarding this PR, it looks good to me, except for the missing line mentioned in my comment (I did not check the code logic, just ensured the code path is the same).

More generally, I prefer not having the recursive call, but I'd be interested to know under which conditions the stack overflow happens; it's certainly not the depth of the btree, so there may be something else going on here.

src/btree/node.rs (review thread, resolved)
nazar-pc (Contributor) commented:

@i1i1, can you run it under gdb to catch a backtrace when the stack overflow happens?

i1i1 (Contributor, Author) commented Aug 30, 2022

@cheme Here is the backtrace.
You can reproduce it on this commit with following command:

```
$ cargo run --release -p subspace-farmer -- \
    --base-path some-path-with-empty-3G-of-space \
    bench \
      --plot-size 3G \
      --write-pieces-size 6G
```

CC: @nazar-pc

@i1i1 i1i1 requested a review from cheme August 30, 2022 13:25
```rust
loop {
    // Lock log over function call (no btree struct change).
    let commit_overlay = self.commit_overlay.read();
    self.from_seek |= self.direction == IterDirection::Forward;
```
cheme (Collaborator) commented:

What I think may have happened is that we touch a removed value in the commit overlay after changing direction, so even though the recursive call (former line 156) sets self.from_seek = false, it is set again here due to the direction switch, and then we loop more than three thousand times in next, probably only exiting the loop once the commit value gets flushed.
So the direction would need to be updated before the recursive call.
With the current PR, we will still get some very long calls to next in this case.
So this needs fixing.

An alternative to fixing the direction handling would be to use the branch I mentioned, but that is more review work.

i1i1 (Contributor, Author) replied:

I didn't review the code, but feel free to open a PR with it and add me as a reviewer!

I tested that branch, and it also exits with a stack overflow.

cheme (Collaborator) replied:

> Tested that branch, and it also exits with stack overflow

By any chance, do you have the stack trace? (I did not see the same issue.)

I realize I was wrong in my earlier statement, since the direction should have been updated in the call to next_backend.

cheme (Collaborator) replied:

Ok, got it: your process probably removed something like more than 3000 values, and then the iterator walks the commit map recursively.
So the way you fixed it is correct.

cheme (Collaborator) commented Aug 30, 2022

@i1i1, thanks for the stacktrace and the PR. Small question: what is the size (in number of keys) of your column? (One of my first implementations of the btree index used a dedicated file for nodes, which we replaced by just using table value entries to simplify things a bit, since performance was not the first concern; but maybe I should try to resurrect that branch. IIRC it is an easy perf gain and not that much code, in fact.)

i1i1 (Contributor, Author) commented Aug 30, 2022

> Small question: what is the size (in number of keys) of your column?

In that case, it was around 3000 keys. The scenario was as follows:

  1. At first, we insert 3000 key-value pairs, in batches of 256 I believe.
  2. Then we do the following in a loop:
    1. try to find N keys starting from the beginning of the db and N keys from the end
    2. remove those keys and add some new ones

So everything blows up when iterating over the db, as we hit a case where we need to recurse after removing a key in the commit overlay.
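This failure mode can be sketched with a toy commit overlay (illustrative names only; parity-db's real overlay and iterator are more involved): pending deletions map keys to `None`, and a recursive `next` burns one stack frame per deleted key it skips, while the loop form does not.

```rust
use std::collections::BTreeMap;

// Toy overlay: key -> Some(value) for a pending insert, None for a pending delete.
type Overlay = BTreeMap<u32, Option<u32>>;

// Recursive skip, as in the pre-PR code shape: ~3000 pending deletions
// mean ~3000 nested calls, which can overflow the stack.
fn next_from(overlay: &Overlay, key: u32) -> Option<(u32, u32)> {
    let (k, v) = overlay.range(key..).next()?;
    match v {
        Some(v) => Some((*k, *v)),
        None => next_from(overlay, k + 1),
    }
}

// Loop form, as in this PR: constant stack depth regardless of deletions.
fn next_from_loop(overlay: &Overlay, mut key: u32) -> Option<(u32, u32)> {
    loop {
        let (k, v) = overlay.range(key..).next()?;
        match v {
            Some(v) => return Some((*k, *v)),
            None => key = k + 1,
        }
    }
}

fn main() {
    let mut overlay = Overlay::new();
    for k in 0..3000u32 {
        overlay.insert(k, None); // 3000 keys pending deletion
    }
    overlay.insert(3000, Some(99));
    // The loop skips all 3000 deletions without growing the stack.
    assert_eq!(next_from_loop(&overlay, 0), Some((3000, 99)));
    println!("ok");
}
```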

i1i1 (Contributor, Author) commented Aug 31, 2022

@cheme can you merge it please? Or are we waiting for something?

cheme (Collaborator) commented Aug 31, 2022

I think it's good. @arkpar, I am merging; let me know if anything is off.

@cheme cheme merged commit d324a26 into paritytech:master Aug 31, 2022
@i1i1 i1i1 deleted the fix-stack-overflow branch August 31, 2022 16:06