Allow sst_dump to check size of different compression levels and report time #6634
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
1e7d33c to 6f5eae3
@akankshamahajan15 has updated the pull request.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
tools/sst_dump_tool.cc (outdated)
size_t block_size = 0;
std::vector<std::pair<CompressionType, const char*>> compression_types;
uint64_t total_num_files = 0;
uint64_t total_num_data_blocks = 0;
uint64_t total_data_block_size = 0;
uint64_t total_index_block_size = 0;
uint64_t total_filter_block_size = 0;
int32_t compression_level_from = 0;
int32_t compression_level_to = 0;
These can be initialized to `kDefaultCompressionLevel`.
int32_t compress_level_to) {
fprintf(stdout, "Block Size: %" ROCKSDB_PRIszt "\n", block_size);
for (auto& i : compression_types) {
I think this logic can be simplified by defaulting `compress_level_from` and `compress_level_to` to `kDefaultCompressionLevel` and not requiring `show_compression_levels` to be passed.
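A minimal sketch of this suggestion, assuming a hypothetical `LevelRange` holder and `LevelsToTry` helper (neither exists in sst_dump). Both bounds default to a local stand-in for RocksDB's `CompressionOptions::kDefaultCompressionLevel`, so when the new flags are omitted the loop still runs exactly once and no separate `show_compression_levels` switch is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-in for RocksDB's CompressionOptions::kDefaultCompressionLevel
// (a sentinel meaning "use the library's default level").
constexpr int32_t kDefaultCompressionLevel = 32767;

// Hypothetical holder for the parsed --compression_level_from/_to values.
// Both bounds default to the sentinel when the flags are not given.
struct LevelRange {
  int32_t from = kDefaultCompressionLevel;
  int32_t to = kDefaultCompressionLevel;
};

// Returns every level to evaluate, in ascending order. With the defaults
// this is a single entry, so the caller needs no extra "show levels" flag.
std::vector<int32_t> LevelsToTry(const LevelRange& range) {
  // Precondition assumed for this sketch: the range is not inverted.
  assert(range.from <= range.to);
  std::vector<int32_t> levels;
  for (int32_t level = range.from; level <= range.to; ++level) {
    levels.push_back(level);
  }
  return levels;
}
```

The caller would then run one size/time measurement per returned level; with the defaults it behaves as if the new flags did not exist.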
tools/sst_dump_tool.cc (outdated)
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
fprintf(stdout, " Size: %10" PRIu64, file_size);
fprintf(stdout, " Blocks: %6" PRIu64, num_data_blocks);
fprintf(stdout, " Time Taken: %10s microsecs", std::to_string(duration.count()).c_str());
Run `make format` to keep lines within 80 columns.
6f5eae3 to d5f87b8
@akankshamahajan15 has updated the pull request.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Left a couple more comments. HISTORY.md needs to be updated to mention this new feature.
tools/sst_dump_tool.cc (outdated)
0 /* sample_for_compression */, compress_opt,
false /* skip_filters */, column_family_name, unknown_level);
uint64_t num_data_blocks = 0;
auto start = std::chrono::steady_clock::now();
We try to avoid using `auto` (https://google.github.io/styleguide/cppguide.html#Type_deduction). In this context, it's better to be explicit so the unit of time is clear.
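A sketch of the explicit-type style requested here, applied to the timing code: spelling out `std::chrono::steady_clock::time_point` and `std::chrono::microseconds` makes the clock and the unit visible at the declaration. `DoWork` and `TimeWorkMicros` are hypothetical placeholders for the `CalculateCompressedTableSize` call being timed, and printing with `PRId64` instead of `std::to_string` is an assumption of this sketch, not the PR's code.

```cpp
#include <cassert>
#include <chrono>
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// Hypothetical placeholder for the timed call; it just sums 0..n-1.
uint64_t DoWork(uint64_t n) {
  uint64_t acc = 0;
  for (uint64_t i = 0; i < n; ++i) {
    acc += i;
  }
  return acc;
}

// Times DoWork() with spelled-out std::chrono types instead of auto, so the
// clock and the microsecond unit are visible where the values are declared.
int64_t TimeWorkMicros(uint64_t n, uint64_t* result) {
  assert(result != nullptr);
  const std::chrono::steady_clock::time_point start =
      std::chrono::steady_clock::now();
  *result = DoWork(n);
  const std::chrono::steady_clock::time_point end =
      std::chrono::steady_clock::now();
  const std::chrono::microseconds duration =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  const int64_t micros = static_cast<int64_t>(duration.count());
  std::fprintf(stdout, " Time Taken: %10" PRId64 " microsecs\n", micros);
  return micros;
}
```

`steady_clock` is the appropriate choice for measuring elapsed time because, unlike `system_clock`, it is guaranteed never to go backwards.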
tools/sst_dump_tool.cc (outdated)
auto start = std::chrono::steady_clock::now();
uint64_t file_size =
CalculateCompressedTableSize(tb_opts, block_size, &num_data_blocks);
auto end = std::chrono::steady_clock::now();
Same as previous comment.
}
const uint64_t ratio_not_compressed_blocks =
(num_data_blocks - compressed_blocks) - not_compressed_blocks;
This formula looks wrong (it'll always evaluate to 0?). What exactly are we trying to calculate here?
From the code, it seems `num_data_blocks` is set to `compressed_blocks + not_compressed_blocks` only when `compressed_blocks + not_compressed_blocks > num_data_blocks`, so that `ratio_not_compressed_blocks` is 0 instead of a negative value.
The code also notes:
// When the option enable_index_compression is true,
// NUMBER_BLOCK_COMPRESSED is incremented for index block(s).
So I guess that in that case `compressed_blocks` increases, which makes the sum greater than `num_data_blocks`.
When I run the tool, for instance, I get num_data_blocks = 3320, compressed_blocks = 3321, and not_compressed_blocks = 0 (before num_data_blocks is adjusted).
Interesting. Thanks for digging into this. It's still not clear when `compressed_blocks + not_compressed_blocks < num_data_blocks` happens, which is the only case where `ratio_not_compressed_blocks` would be > 0. Anyway, since this was there before, it's a separate issue and we can probably follow up on it outside this PR.
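The clamping described in this thread can be sketched as follows. `UnaccountedBlocks` is a hypothetical helper that mirrors the arithmetic discussed above rather than quoting sst_dump. Raising `num_data_blocks` to the sum first keeps the unsigned subtraction from wrapping around, so the result is 0 in the reported case (3320 vs. 3321) and positive only when some data blocks were counted as neither compressed nor not compressed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the thread's arithmetic. If index blocks are
// counted as compressed (enable_index_compression), compressed +
// not_compressed can exceed num_data_blocks; raising num_data_blocks to the
// sum first makes the unsigned subtraction yield 0 instead of wrapping.
uint64_t UnaccountedBlocks(uint64_t num_data_blocks, uint64_t compressed,
                           uint64_t not_compressed) {
  if (compressed + not_compressed > num_data_blocks) {
    num_data_blocks = compressed + not_compressed;
  }
  return (num_data_blocks - compressed) - not_compressed;
}
```

Without the clamp, `uint64_t{3320} - 3321` wraps to `UINT64_MAX` rather than producing -1, which is why the adjustment is needed for unsigned counters.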
d5f87b8 to 3b33754
@akankshamahajan15 has updated the pull request.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
3b33754 to 1a2c8b8
@akankshamahajan15 has updated the pull request.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
1a2c8b8 to d2e7efb
@akankshamahajan15 has updated the pull request.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@akankshamahajan15 merged this pull request in 75b13ea.
Summary:
1. Add two arguments, `--compression_level_from` and `--compression_level_to`, to check the compressed size at each compression level in the given range. Users must specify a single compression type; otherwise the tool errors out. The from and to levels must also be specified together.
2. By default, display the time taken to compress each file with each compression type.

Test Plan: make -j64 check
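The argument rules in point 1 can be sketched as a validation function. `LevelRangeArgs` and `ValidateLevelRange` are hypothetical names, "one compression type" is read here as "exactly one", and the error strings are illustrative rather than sst_dump's actual messages.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parsed command-line state; field names are illustrative.
struct LevelRangeArgs {
  std::vector<std::string> compression_types;
  std::optional<int32_t> level_from;  // --compression_level_from
  std::optional<int32_t> level_to;    // --compression_level_to
};

// Returns an empty string if the arguments are acceptable, otherwise an
// error message. Encodes the two rules from the summary: the bounds come
// as a pair, and a level range needs exactly one compression type.
std::string ValidateLevelRange(const LevelRangeArgs& args) {
  const bool has_from = args.level_from.has_value();
  const bool has_to = args.level_to.has_value();
  if (!has_from && !has_to) {
    return "";  // No level range requested; nothing to check.
  }
  if (has_from != has_to) {
    return "--compression_level_from and --compression_level_to must be "
           "specified together";
  }
  if (args.compression_types.size() != 1) {
    return "exactly one compression type is required with a level range";
  }
  return "";
}
```

Checking the pairing before the compression type means a user who forgets one bound gets the more specific error first; that ordering is a choice of this sketch.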