Show number of blocks compressed and not compressed when using "sst_dump --show_compression_sizes". #1474
Comments
Are you interested in sending a pull request for this?
I am interested in a statistic for how many compressed blocks were not used because they failed to meet the compression threshold. At the moment I am testing using
can currently be logged. While I can now see that this metric can be calculated from the existing metrics available, the way the names I can send a PR for the change that is isolated to
Closing this via automation due to lack of activity. If discussion is still needed here, please re-open or create a new/updated issue.
For: facebook#1474 Helps show when the 12.5% threshold for GoodCompressionRatio (originally from ldb) is hit.
…ook#5791)

Summary:
Closes facebook#1474

Helps show when the 12.5% threshold for GoodCompressionRatio (originally from ldb) is hit.

Example output:

```
> ./sst_dump --file=/tmp/test.sst --command=recompress
from [] to []
Process /tmp/test.sst
Sst file format: block-based
Block Size: 16384
Compression: kNoCompression Size: 122579836 Blocks: 2300 Compressed: 0 ( 0.0%) Not compressed (ratio): 2300 (100.0%) Not compressed (abort): 0 ( 0.0%)
Compression: kSnappyCompression Size: 46289962 Blocks: 2300 Compressed: 2119 ( 92.1%) Not compressed (ratio): 181 ( 7.9%) Not compressed (abort): 0 ( 0.0%)
Compression: kZlibCompression Size: 29689825 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%)
Unsupported compression type: kBZip2Compression.
Compression: kLZ4Compression Size: 44785490 Blocks: 2300 Compressed: 1950 ( 84.8%) Not compressed (ratio): 350 ( 15.2%) Not compressed (abort): 0 ( 0.0%)
Compression: kLZ4HCCompression Size: 37498895 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%)
Unsupported compression type: kXpressCompression.
Compression: kZSTD Size: 32208707 Blocks: 2300 Compressed: 2301 (100.0%) Not compressed (ratio): 0 ( 0.0%) Not compressed (abort): 0 ( 0.0%)
```

Pull Request resolved: facebook#5791

Differential Revision: D17347870

fbshipit-source-id: af10849c010b46b20e54162b70123c2805ffe526
Hello and thank you for RocksDB,

Rather than `sst_dump --show_compression_sizes` just showing the compression sizes:

I found also showing the number of blocks compressed and not compressed to be valuable:

I am testing compression options on values that are already compressed with LZ4, and have bumped up against the 12.5% threshold for `GoodCompressionRatio` (originally from ldb), as I have datasets where many blocks compress by 10%.

Currently, `NUMBER_BLOCK_NOT_COMPRESSED` is only incremented when compression is aborted because the block is too big or did not pass verification (added in f43c826). It does not require `ShouldReportDetailedTime(r->ioptions.env, r->ioptions.statistics)` to be true, and it is not incremented when a block is left uncompressed because it does not compress by 12.5%.

`NUMBER_BLOCK_COMPRESSED` is incremented when a compressed block is used/written. It requires `ShouldReportDetailedTime(r->ioptions.env, r->ioptions.statistics)` to be true, and it was added in 9430333.

When `NUMBER_BLOCK_COMPRESSED` (compression meets threshold) was added to code that already had `NUMBER_BLOCK_NOT_COMPRESSED` (aborted compression), I suspect counting blocks that fail to meet the threshold was overlooked, or maybe just not a priority.

The output above with the number of blocks not compressed uses a change that makes `NUMBER_BLOCK_NOT_COMPRESSED` also increment when a block does not compress by 12.5%. This is a change in behaviour that is likely not acceptable.

Alternatively, `NUMBER_BLOCK_NOT_COMPRESSED` could be left unchanged and a new metric, `NUMBER_BLOCK_COMPRESSION_RATIO_FAIL`, added. This avoids any change in behaviour of existing counters for consumers.

Or `NUMBER_BLOCK_ABORT_COMPRESSION` could be added and used as `NUMBER_BLOCK_NOT_COMPRESSED` is used now (aborts), with `NUMBER_BLOCK_NOT_COMPRESSED` then used for ratio fails (when `ShouldReportDetailedTime`) and for compression aborted due to size or failed verify.
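To make the distinction concrete, here is a minimal standalone C++ sketch of the three outcomes this issue separates: compressed, not compressed because the ratio threshold was missed, and not compressed because compression was aborted. It is not the RocksDB implementation. `GoodCompressionRatio` here is written from the 12.5% rule described above, and `BlockOutcome`, `BlockCounts`, `Classify` and `Record` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the 12.5% rule: keep the compressed block only if it is
// more than one eighth smaller than the raw block.
inline bool GoodCompressionRatio(std::size_t compressed_size,
                                 std::size_t raw_size) {
  return compressed_size < raw_size - (raw_size / 8u);
}

// Hypothetical outcome categories, matching the three cases in this issue.
enum class BlockOutcome {
  kCompressed,          // compressed block is written
  kNotCompressedRatio,  // compression ran but saved too little
  kNotCompressedAbort   // compression aborted (too big / failed verify)
};

// Hypothetical per-file tallies, one per outcome.
struct BlockCounts {
  std::uint64_t compressed = 0;
  std::uint64_t not_compressed_ratio = 0;
  std::uint64_t not_compressed_abort = 0;
};

// Decide which outcome applies to one block. `aborted` stands in for the
// "too big or did not pass verification" cases described above.
inline BlockOutcome Classify(std::size_t raw_size,
                             std::size_t compressed_size, bool aborted) {
  if (aborted) {
    return BlockOutcome::kNotCompressedAbort;
  }
  if (!GoodCompressionRatio(compressed_size, raw_size)) {
    return BlockOutcome::kNotCompressedRatio;
  }
  return BlockOutcome::kCompressed;
}

// Increment exactly one tally for the given outcome.
inline void Record(BlockCounts& counts, BlockOutcome outcome) {
  switch (outcome) {
    case BlockOutcome::kCompressed:
      ++counts.compressed;
      break;
    case BlockOutcome::kNotCompressedRatio:
      ++counts.not_compressed_ratio;
      break;
    case BlockOutcome::kNotCompressedAbort:
      ++counts.not_compressed_abort;
      break;
  }
}
```

With a 16384-byte block, a result of 14746 bytes (about 10% smaller) is classified as a ratio failure, which is the case that no existing counter records. Keeping aborts and ratio failures in separate tallies corresponds to the alternatives proposed above that leave existing counter semantics unchanged.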