Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider removing shadows from BucketList #2175

Closed
marta-lokhova opened this issue Jun 29, 2019 · 1 comment
Closed

Consider removing shadows from BucketList #2175

marta-lokhova opened this issue Jun 29, 2019 · 1 comment
Assignees
Labels

Comments

@marta-lokhova
Copy link
Contributor

BucketList is a complex beast. Shadows are a part of them, and are used for merges. The basic idea is that if a ledger entry is constantly updated, shadows elide those updates into one, preventing redundant entries being spilled to lower levels. While this makes sense, I think the bucketlist will be much simpler (and merges will be faster) if we remove shadows. I'll base my argument around the fact that shadows come at a high cost, but are most useful in a narrow range of scenarios. Additionally, i think its usefulness degrades at lower levels.

The cost

For each level i, we store curr and snap for every level = i - 2 and higher. A glimpse of production data shows that shadows can be ~100-200MBs in size, and will keep growing at the ledger grows. Additionally, during a merge, we read all shadows for that level, which affects the total time to perform a merge (see graphs below)

Best use case

For level i, best case scenario is when during a merge nothing contained in shadows is actually written to that level. So in order to take maximum advantage of shadows at a particular level, we assume that nothing in levels above is shadowed (otherwise, we'd never even get to a merge at that level, as everything above is already shadowed). This also means that if we want to take advantage of shadows at level i, shadows at all above levels are not useful. Essentially, as we try to take advantage of lower-level shadows, we're carrying the burden of all (useless) shadows from the above levels, so the output size savings shrink as we go to lower levels.

Note that this best case is also for patterns of unique ledger entry updates of various lengths, depending on the level, which is somewhat unlikely to happen in the real world. If any other workload is introduced in the middle of these sequences, shadows might only cover parts of the ledger entry updates.

On the other hand, removing shadows means that we would preserve all the ledger entry updates. The next section shows comparison of size and merge speeds.

Experimental Comparison

Re-doing merges for 300,000-ledger replay showed that the amount of data to write is slightly bigger, while the merge time costs reduced dramatically.

Size costs

Current state with shadows:
Screen Shot 2019-06-28 at 5 10 03 PM

Without shadows:
Screen Shot 2019-06-28 at 5 10 10 PM

Merge time cost

Current state with shadows:
Screen Shot 2019-06-28 at 5 14 57 PM

Without shadows:
Screen Shot 2019-06-28 at 5 15 05 PM

@marta-lokhova
Copy link
Contributor Author

Resolved by #2205

@marta-lokhova marta-lokhova self-assigned this Sep 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants