This repository has been archived by the owner on Aug 23, 2023. It is now read-only.
tszlong: make encoding more compact by conservatively resetting leading/trailing bitcount #2005
While profiling memory usage for Metrictank we found that ~15% of memory usage came from the raw bstream. That share isn't surprisingly high, but we decided to look for possible optimizations.
We found that 67% of those allocations came from this one line. This line writes the significant bits when the value isn't exactly the same as the previous one, but its XOR falls within the leading/trailing zero window that was previously written. However, since the leading/trailing counts are never reset, a single errant value can cause a lot of significant bits to be written on every subsequent write. This is especially true for floating point values. For example (playground):

math.Float64bits(126.45) ^ math.Float64bits(127.45) differs by only 1 bit (17 leading zeroes / 46 trailing), while math.Float64bits(127.45) ^ math.Float64bits(128.45) has only 10 leading zeroes and 0 trailing.

So in a lot of cases we could end up writing many bits of extra sig figs (lots of leading/trailing zeroes inside them) when it could be better to just re-encode the leading/trailing zero counts and write fewer sig figs for some amount of time.
I added some more scenarios to the benchmarks to test various data patterns, 120 datapoints each (chosen as that is the recommended chunk size):
As seen, this approach performs no worse than the existing one in terms of chunk size, and in some cases it is significantly better. I have yet to plug this into a running MT instance and confirm that the result holds across a large range of real-world data.
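For illustration, the reset decision could be sketched as a simple cost comparison. This is a hypothetical helper, not the PR's actual code: the function name shouldReset and the 11-bit re-encode overhead (5 bits for the leading-zero count plus 6 bits for the significant-bit count, as in the Gorilla layout) are assumptions.

```go
package main

import "fmt"

// shouldReset is a hypothetical sketch of a "conservative reset" heuristic:
// reuse the stored leading/trailing window only while doing so costs no more
// than re-encoding a tight window for the current XOR value.
func shouldReset(storedLeading, storedTrailing, newLeading, newTrailing int) bool {
	storedSigBits := 64 - storedLeading - storedTrailing // cost of reusing the old window
	newSigBits := 64 - newLeading - newTrailing
	// Assumed re-encode overhead: 5 bits leading count + 6 bits sig-bit count.
	const resetOverhead = 11
	return storedSigBits > newSigBits+resetOverhead
}

func main() {
	// After an errant value widened the window to (10 leading, 0 trailing),
	// a delta that would fit in (17 leading, 46 trailing) justifies a reset.
	fmt.Println(shouldReset(10, 0, 17, 46)) // true
	// If the windows match, reusing is always cheaper than re-encoding.
	fmt.Println(shouldReset(17, 46, 17, 46)) // false
}
```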