Allow sst_dump to check size of different compression levels and report time #6634

Closed

Conversation

akankshamahajan15 (Contributor):

Summary:
1. Add two arguments, --compression_level_from and --compression_level_to, to check the compressed size at each compression level in the given range. Users must specify exactly one compression type, else the tool errors out. The from and to levels must also be specified together.
2. By default, display the time taken to compress the file with each compression type.

Test Plan: make -j64 check
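For example, a sweep over a range of levels might look like the following (the two level flags are the ones added by this PR; the file path and the other flag spellings are assumptions based on sst_dump's recompress command):

```sh
sst_dump --file=/path/to/table.sst --command=recompress \
  --compression_types=kZSTD \
  --compression_level_from=1 --compression_level_to=5
```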

facebook-github-bot (Contributor) left a comment:

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor):

@akankshamahajan15 has updated the pull request. Re-import the pull request

facebook-github-bot (Contributor) left a comment:

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

size_t block_size = 0;
std::vector<std::pair<CompressionType, const char*>> compression_types;
uint64_t total_num_files = 0;
uint64_t total_num_data_blocks = 0;
uint64_t total_data_block_size = 0;
uint64_t total_index_block_size = 0;
uint64_t total_filter_block_size = 0;
int32_t compression_level_from = 0;
int32_t compression_level_to = 0;
Contributor:

These can be initialized to kDefaultCompressionLevel.
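A minimal sketch of that suggestion (assuming kDefaultCompressionLevel is visible in this scope):

```cpp
// Default both level bounds to kDefaultCompressionLevel instead of 0.
int32_t compression_level_from = kDefaultCompressionLevel;
int32_t compression_level_to = kDefaultCompressionLevel;
```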

int32_t compress_level_to) {

fprintf(stdout, "Block Size: %" ROCKSDB_PRIszt "\n", block_size);
for (auto& i : compression_types) {
Contributor:

I think this logic can be simplified by defaulting compress_level_from and compress_level_to to kDefaultCompressionLevel and not requiring show_compression_levels to be passed.
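One way to realize that is with default arguments (a hypothetical signature; the actual function name and parameter list in the tool may differ):

```cpp
// Hypothetical: defaulting both bounds removes the need for a separate
// show_compression_levels flag to decide whether to sweep over levels.
int ShowAllCompressionSizes(
    size_t block_size,
    const std::vector<std::pair<CompressionType, const char*>>&
        compression_types,
    int32_t compress_level_from = kDefaultCompressionLevel,
    int32_t compress_level_to = kDefaultCompressionLevel);
```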

auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
fprintf(stdout, " Size: %10" PRIu64, file_size);
fprintf(stdout, " Blocks: %6" PRIu64, num_data_blocks);
fprintf(stdout, " Time Taken: %10s microsecs", std::to_string(duration.count()).c_str());
Contributor:

make format to ensure <= 80 columns
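For reference, a version of the quoted lines wrapped to stay within 80 columns might look like this sketch (the explicit duration type also anticipates the auto feedback below):

```cpp
std::chrono::microseconds duration =
    std::chrono::duration_cast<std::chrono::microseconds>(end - start);
fprintf(stdout, " Size: %10" PRIu64, file_size);
fprintf(stdout, " Blocks: %6" PRIu64, num_data_blocks);
fprintf(stdout, " Time Taken: %10s microsecs",
        std::to_string(duration.count()).c_str());
```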

facebook-github-bot (Contributor):

@akankshamahajan15 has updated the pull request. Re-import the pull request

facebook-github-bot (Contributor) left a comment:

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

anand1976 (Contributor) left a comment:

Left a couple more comments. HISTORY.md needs to be updated to mention this new feature.

0 /* sample_for_compression */, compress_opt,
false /* skip_filters */, column_family_name, unknown_level);
uint64_t num_data_blocks = 0;
auto start = std::chrono::steady_clock::now();
Contributor:

We try to avoid using auto - https://google.github.io/styleguide/cppguide.html#Type_deduction. In this context, it's better to be explicit so the unit of time is clear.

auto start = std::chrono::steady_clock::now();
uint64_t file_size =
CalculateCompressedTableSize(tb_opts, block_size, &num_data_blocks);
auto end = std::chrono::steady_clock::now();
Contributor:

Same as previous comment
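Concretely, the explicit spellings being asked for could look like this sketch:

```cpp
// Explicit types make the clock and the time unit visible at a glance.
std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
uint64_t file_size =
    CalculateCompressedTableSize(tb_opts, block_size, &num_data_blocks);
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
std::chrono::microseconds duration =
    std::chrono::duration_cast<std::chrono::microseconds>(end - start);
```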

}
const uint64_t ratio_not_compressed_blocks =
(num_data_blocks - compressed_blocks) - not_compressed_blocks;
Contributor:

This formula looks wrong (it'll always evaluate to 0?). What exactly are we trying to calculate here?

akankshamahajan15 (Contributor, Author):

From the code, it seems num_data_blocks is set to compressed_blocks + not_compressed_blocks only when compressed_blocks + not_compressed_blocks > num_data_blocks, so that ratio_not_compressed_blocks evaluates to 0 instead of a negative value.

Also, it's mentioned that:
// When the option enable_index_compression is true,
// NUMBER_BLOCK_COMPRESSED is incremented for index block(s).
so I guess in that case compressed_blocks increases, which makes the total sum exceed num_data_blocks.

When I ran the tool, for instance, num_data_blocks = 3320, compressed_blocks = 3321, not_compressed_blocks = 0 (before adjusting the num_data_blocks value).
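A sketch of the guard being described, paraphrased from this explanation (the exact code in the tool may differ):

```cpp
// When index blocks are also counted as compressed
// (enable_index_compression == true), the sum can exceed num_data_blocks,
// so num_data_blocks is bumped up to keep the unsigned subtraction below
// from wrapping around.
if (compressed_blocks + not_compressed_blocks > num_data_blocks) {
  num_data_blocks = compressed_blocks + not_compressed_blocks;
}
const uint64_t ratio_not_compressed_blocks =
    (num_data_blocks - compressed_blocks) - not_compressed_blocks;
```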

Contributor:

Interesting. Thanks for digging into this. It's still not clear when compressed_blocks + not_compressed_blocks < num_data_blocks, which is when ratio_not_compressed_blocks will be > 0. Anyway, since this behavior predates the PR, it's a separate issue and we can probably follow it up outside this PR.

facebook-github-bot (Contributor):

@akankshamahajan15 has updated the pull request. Re-import the pull request

facebook-github-bot (Contributor) left a comment:

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor):

@akankshamahajan15 has updated the pull request. Re-import the pull request

facebook-github-bot (Contributor) left a comment:

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Commit: Allow sst_dump to check size of different compression levels and report time

Summary: 1. Add two arguments --compression_level_from and --compression_level_to to check the compression size with different compression levels in the given range. Users must specify one compression type, else it will error out. Both from and to levels must also be specified together.
2. Display the time taken to compress each file with different compressions by default.

Test Plan: make -j64 check
facebook-github-bot (Contributor):

@akankshamahajan15 has updated the pull request. Re-import the pull request

facebook-github-bot (Contributor) left a comment:

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor):

@akankshamahajan15 merged this pull request in 75b13ea.
