SSTFileWriter doesn't report file size when zstd dictionary training is enabled #11146
Comments
With zstd dictionary compression, there is a "buffered" stage where the supposedly written data is kept in memory so that a compression dictionary can be generated from it. As a result, sstFileWriter.FileSize() currently returns 0 during the buffered stage, and you should see a non-zero file size once the buffered data is written to the file (e.g. after Finish()). EDIT: this is a reasonable feature, but there is no plan to add support yet. A temporary workaround is to limit buffer size ( |
I haven't dug into it, but I guess there's more to it. I use something like this to write the SST files:
for key, value in input:
    if sstWriter.FileSize() > 128m:
        sstWriter.Finish()
        sstWriter.Open(next file)
    sstWriter.Put(key, value)
When using zstd dictionary compression, it keeps generating 2 GB SST files and never rotates. Should I just set |
Or maybe we should support setting the target file size in sst file writer? (https://github.com/facebook/rocksdb/blob/main/table/sst_file_writer.cc#L323) |
What do you think is the best practice to rotate sst files based on file size? |
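One possible approach, sketched below under stated assumptions, is to stop relying on FileSize() during the buffered stage and have the caller count the bytes it has handed to the writer. The BufferingWriter class is a hypothetical stand-in that only mimics the behavior described in this thread (file size stays 0 until finish); it is not the RocksDB API, and write_rotating and path_for are illustrative names. The counter tracks uncompressed key and value bytes, so it overestimates the final compressed file size.

```python
class BufferingWriter:
    """Hypothetical stand-in for an SST writer with dictionary training.

    file_size() stays 0 while entries are buffered, mimicking the behavior
    described in this thread. This is NOT the RocksDB API.
    """

    def __init__(self):
        self.path = None
        self.buffered = []
        self.flushed_bytes = 0
        self.finished = {}  # path -> bytes written at finish()

    def open(self, path):
        self.path = path
        self.buffered = []
        self.flushed_bytes = 0

    def put(self, key, value):
        # Entries are only held in memory at this point.
        self.buffered.append((key, value))

    def file_size(self):
        # Nothing has reached the "file" until finish() is called.
        return self.flushed_bytes

    def finish(self):
        self.flushed_bytes = sum(len(k) + len(v) for k, v in self.buffered)
        self.finished[self.path] = self.flushed_bytes
        self.buffered = []


def write_rotating(writer, items, target_bytes, path_for):
    """Rotate files on caller-tracked bytes instead of writer.file_size().

    `items` must already be sorted by key. Returns the number of files written.
    """
    files = 0
    written = 0
    is_open = False
    for key, value in items:
        entry = len(key) + len(value)
        # Close the current file if this entry would push it past the target.
        if is_open and written + entry > target_bytes:
            writer.finish()
            is_open = False
        # Open lazily so an empty input never produces an empty file.
        if not is_open:
            writer.open(path_for(files))
            files += 1
            written = 0
            is_open = True
        writer.put(key, value)
        written += entry
    if is_open:
        writer.finish()
    return files
```

With a real writer, the same loop would call the writer's own open, put, and finish methods; only the size check changes, from asking the writer to asking the local counter.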
I think target file size works the same as setting max_dict_buffer_bytes for now (see rocksdb/table/block_based/block_based_table_builder.cc, lines 452 to 459 in 54d7208). |
It doesn't seem to work: the file still doesn't rotate even when its size is more than 1g, with target file size 128m. |
Can you share your compression options, including bottommost compression options? I thought setting max_dict_buffer_bytes=128m would cause |
I was trying to limit the file size when bulk loading with SSTFileWriter, and I found that FileSize() always returns 0 when zstd dictionary training is enabled.
Expected behavior
sstFileWriter.FileSize() should report current progress.
Actual behavior
sstFileWriter.FileSize() always returns 0 when zstd dictionary training is enabled.
Steps to reproduce the behavior