You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
BucketList is a complex beast. Shadows are a part of them, and are used for merges. The basic idea is that if a ledger entry is constantly updated, shadows elide those updates into one, preventing redundant entries being spilled to lower levels. While this makes sense, I think the bucketlist will be much simpler (and merges will be faster) if we remove shadows. I'll base my argument around the fact that shadows come at a high cost, but are most useful in a narrow range of scenarios. Additionally, i think its usefulness degrades at lower levels.
The cost
For each level i, we store curr and snap for every level = i - 2 and higher. A glimpse of production data shows that shadows can be ~100-200MBs in size, and will keep growing at the ledger grows. Additionally, during a merge, we read all shadows for that level, which affects the total time to perform a merge (see graphs below)
Best use case
For level i, best case scenario is when during a merge nothing contained in shadows is actually written to that level. So in order to take maximum advantage of shadows at a particular level, we assume that nothing in levels above is shadowed (otherwise, we'd never even get to a merge at that level, as everything above is already shadowed). This also means that if we want to take advantage of shadows at level i, shadows at all above levels are not useful. Essentially, as we try to take advantage of lower-level shadows, we're carrying the burden of all (useless) shadows from the above levels, so the output size savings shrink as we go to lower levels.
Note that this best case is also for patterns of unique ledger entry updates of various lengths, depending on the level, which is somewhat unlikely to happen in the real world. If any other workload is introduced in the middle of these sequences, shadows might only cover parts of the ledger entry updates.
On the other hand, removing shadows means that we would preserve all the ledger entry updates. The next section shows comparison of size and merge speeds.
Experimental Comparison
Re-doing merges for 300,000-ledger replay showed that the amount of data to write is slightly bigger, while the merge time costs reduced dramatically.
Size costs
Current state with shadows:
Without shadows:
Merge time cost
Current state with shadows:
Without shadows:
The text was updated successfully, but these errors were encountered:
BucketList is a complex beast. Shadows are a part of them, and are used for merges. The basic idea is that if a ledger entry is constantly updated, shadows elide those updates into one, preventing redundant entries being spilled to lower levels. While this makes sense, I think the bucketlist will be much simpler (and merges will be faster) if we remove shadows. I'll base my argument around the fact that shadows come at a high cost, but are most useful in a narrow range of scenarios. Additionally, i think its usefulness degrades at lower levels.
The cost
For each level i, we store curr and snap for every level = i - 2 and higher. A glimpse of production data shows that shadows can be ~100-200MBs in size, and will keep growing at the ledger grows. Additionally, during a merge, we read all shadows for that level, which affects the total time to perform a merge (see graphs below)
Best use case
For level i, best case scenario is when during a merge nothing contained in shadows is actually written to that level. So in order to take maximum advantage of shadows at a particular level, we assume that nothing in levels above is shadowed (otherwise, we'd never even get to a merge at that level, as everything above is already shadowed). This also means that if we want to take advantage of shadows at level i, shadows at all above levels are not useful. Essentially, as we try to take advantage of lower-level shadows, we're carrying the burden of all (useless) shadows from the above levels, so the output size savings shrink as we go to lower levels.
Note that this best case is also for patterns of unique ledger entry updates of various lengths, depending on the level, which is somewhat unlikely to happen in the real world. If any other workload is introduced in the middle of these sequences, shadows might only cover parts of the ledger entry updates.
On the other hand, removing shadows means that we would preserve all the ledger entry updates. The next section shows comparison of size and merge speeds.
Experimental Comparison
Re-doing merges for 300,000-ledger replay showed that the amount of data to write is slightly bigger, while the merge time costs reduced dramatically.
Size costs
Current state with shadows:
Without shadows:
Merge time cost
Current state with shadows:
Without shadows:
The text was updated successfully, but these errors were encountered: