ZSTD_TrainDictionary runs even when the compression is set to kNoCompression for a given level #12409

kwadhwa18 · 2024-03-06T02:00:48Z

ZSTD_TrainDictionary [link] runs during SstFileWriter::Finish even when the bottommost_compression option is set to kNoCompression. This reduces the throughput of SstFileWriter::Finish.

We construct RocksDB options that use ZSTD compression for levels 2 and above and kNoCompression for levels 0 and 1. We also set zstd_max_train_bytes to a positive value (which applies to the levels with ZSTD compression enabled). These options are used for the database and are also passed to SstFileWriter to create SST files that are later added to that database. Since BlockBasedTableBuilder::Finish [link] only checks that zstd_max_train_bytes is positive, it runs ZSTD_TrainDictionary even when it shouldn't, since SstFileWriter is operating at the bottommost level.

Expected behavior

If bottommost_compression or compression_per_level for a level is set to kNoCompression, ZSTD_TrainDictionary should not run for that level.

Actual behavior

ZSTD_TrainDictionary also runs for a level that has kNoCompression set.


ajkr · 2024-03-06T22:37:25Z

Another case: max_dict_bytes > 0 will build a dictionary even when the compression type is kNoCompression. These sound like good sanitizations to add. Would you be interested in adding some of them?

kwadhwa18 · 2024-03-08T00:12:37Z

are you referring to

rocksdb/table/block_based/block_based_table_builder.cc

Lines 1874 to 1887 in 210c8df

    
    for (size_t i = 0;
         i < kNumBlocksBuffered && compression_dict_samples.size() < kSampleBytes;
         ++i) {
      size_t copy_len = std::min(kSampleBytes - compression_dict_samples.size(),
                                 r->data_block_buffers[buffer_idx].size());
      compression_dict_samples.append(r->data_block_buffers[buffer_idx], 0,
                                      copy_len);
      compression_dict_sample_lens.emplace_back(copy_len);
      buffer_idx += kPrimeGeneratorRemainder;
      if (buffer_idx >= kNumBlocksBuffered) {
        buffer_idx -= kNumBlocksBuffered;
      }
    }

?

I can help with the sanitizations - is the level information available inside BlockBasedTableBuilder?
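For readers who want to experiment with the sampling loop quoted above outside of RocksDB, here is a minimal, self-contained sketch. It is not the RocksDB code: `DictSamples`, `CollectSamples`, and the parameters `max_sample_bytes` and `stride` are hypothetical stand-ins for `kSampleBytes` and `kPrimeGeneratorRemainder`, and the buffers are plain strings. It assumes the stride is nonzero and coprime to the number of buffers, so that each buffer is visited at most once.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Concatenated sample bytes plus the length of each individual sample.
struct DictSamples {
  std::string data;
  std::vector<size_t> lens;
};

// Hypothetical stand-in for the quoted loop. Walks the buffers with a fixed
// stride (wrapping around) and copies bytes until `max_sample_bytes` is hit.
// Assumption: `stride` is nonzero and coprime to buffers.size().
DictSamples CollectSamples(const std::vector<std::string>& buffers,
                           size_t max_sample_bytes, size_t stride) {
  DictSamples out;
  const size_t n = buffers.size();
  if (n == 0) {
    return out;
  }
  stride %= n;  // keep the single-subtraction wraparound below valid
  size_t idx = 0;
  for (size_t i = 0; i < n && out.data.size() < max_sample_bytes; ++i) {
    // Never copy more than the remaining sample budget.
    const size_t copy_len =
        std::min(max_sample_bytes - out.data.size(), buffers[idx].size());
    out.data.append(buffers[idx], 0, copy_len);
    out.lens.emplace_back(copy_len);
    idx += stride;
    if (idx >= n) {
      idx -= n;
    }
  }
  return out;
}
```

With five four-byte buffers, a budget of 10 bytes, and a stride of 3, the sketch visits buffers 0, 3, and 1 and truncates the last sample to 2 bytes. Note that this work is done before any dictionary training, so skipping it is part of what a kNoCompression guard would save.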

ajkr · 2024-03-08T02:08:10Z

> are you referring to rocksdb/table/block_based/block_based_table_builder.cc, lines 1874 to 1887 in 210c8df?

Yes.

> I can help with the sanitizations - is the level information available inside BlockBasedTableBuilder?

Yes, BlockBasedTableBuilder::Rep has compression_type and compression_opts, which are the settings specific to the level for which the table is being built.

edit: Technically the answer to your question is no, but my point is BlockBasedTableBuilder::Rep has everything you need without it
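To illustrate the point, a guard based only on those two fields could look roughly like the sketch below. This is a simplified model, not the change that was eventually merged: `TableRep`, `UsesDictionary`, and `ShouldTrainDictionary` are hypothetical names, the enum models only two compression types, and the structs carry only the fields discussed in this thread.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the RocksDB types named in the thread; only the
// values and fields discussed here are modeled.
enum class CompressionType { kNoCompression, kZSTD };

struct CompressionOptions {
  uint32_t max_dict_bytes = 0;
  uint32_t zstd_max_train_bytes = 0;
};

// Hypothetical model of the per-table state (BlockBasedTableBuilder::Rep).
struct TableRep {
  CompressionType compression_type = CompressionType::kNoCompression;
  CompressionOptions compression_opts;
};

// A dictionary is only useful when the table is actually compressed.
bool UsesDictionary(const TableRep& rep) {
  return rep.compression_type != CompressionType::kNoCompression &&
         rep.compression_opts.max_dict_bytes > 0;
}

// Training is only worthwhile when a dictionary is in use, the type is ZSTD,
// and a training budget was configured.
bool ShouldTrainDictionary(const TableRep& rep) {
  return UsesDictionary(rep) &&
         rep.compression_type == CompressionType::kZSTD &&
         rep.compression_opts.zstd_max_train_bytes > 0;
}
```

Under this model, a table built with kNoCompression skips both sample buffering and training regardless of max_dict_bytes and zstd_max_train_bytes, and no level number is needed to make that decision.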

kwadhwa18 · 2024-03-13T18:46:53Z

I have attempted a fix in #12420. PTAL!

kwadhwa18 mentioned this issue Mar 12, 2024

don't run ZSTD_TrainDictionary in BlockBasedTableBuilder if there isn't compression needed #12420

Closed

kwadhwa18 mentioned this issue Mar 18, 2024

don't run ZSTD_TrainDictionary in BlockBasedTableBuilder if there isn't compression needed #12453

Closed

facebook-github-bot closed this as completed in 4ce1dc9 Mar 20, 2024
