
Avoid trivial move if SST file is far smaller than the target size #8306

Open
iFA88 opened this issue May 17, 2021 · 16 comments

@iFA88

iFA88 commented May 17, 2021

Greetings everyone!

I'm using RocksDB version 6.8.1 with many column families. I would like to single out one CF for examination, because I cannot figure out why compaction doesn't behave the way I expect. This CF only receives INSERTs, no DELETE operations, and the DB has only been restarted a few times during its life.

LOG stats about this CF:

** Compaction Stats [table:blocks:data] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0    2.81 MB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0    106.7      0.03              0.01         2    0.016       0      0
  L4     27/0   63.62 MB   1.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L5     33/0   90.66 MB   1.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L6    363/0   910.27 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Sum    424/0    1.04 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0    106.7      0.03              0.01         2    0.016       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [table:blocks:data] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
High      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0    105.7      0.02              0.01         1    0.016       0      0
User      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0    107.7      0.02              0.00         1    0.017       0      0
Uptime(secs): 2417.6 total, 600.0 interval
Flush(GB): cumulative 0.003, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

Options for this CF:

[CFOptions "table\:blocks\:data"]
  sample_for_compression=0
  compaction_pri=kMinOverlappingRatio
  merge_operator=nullptr
  compaction_filter_factory=nullptr
  memtable_factory=SkipListFactory
  memtable_insert_with_hint_prefix_extractor=nullptr
  comparator=leveldb.BytewiseComparator
  target_file_size_base=134217728
  max_sequential_skip_in_iterations=8
  compaction_style=kCompactionStyleLevel
  max_bytes_for_level_base=67108864
  bloom_locality=0
  write_buffer_size=16777216
  compression_per_level=
  memtable_huge_page_size=0
  max_successive_merges=0
  arena_block_size=2097152
  memtable_whole_key_filtering=false
  target_file_size_multiplier=1
  max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
  num_levels=7
  min_write_buffer_number_to_merge=1
  max_write_buffer_number_to_maintain=0
  max_write_buffer_number=2
  compression=kSnappyCompression
  level0_stop_writes_trigger=36
  level0_slowdown_writes_trigger=20
  compaction_filter=nullptr
  level0_file_num_compaction_trigger=2
  max_compaction_bytes=3355443200
  compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;min_merge_width=2;size_ratio=1;}
  memtable_prefix_bloom_size_ratio=0.000000
  max_write_buffer_size_to_maintain=0
  hard_pending_compaction_bytes_limit=274877906944
  ttl=0
  table_factory=BlockBasedTable
  soft_pending_compaction_bytes_limit=68719476736
  prefix_extractor=nullptr
  bottommost_compression=kDisableCompressionOption
  force_consistency_checks=false
  paranoid_file_checks=true
  compaction_options_fifo={allow_compaction=false;max_table_files_size=1073741824;}
  max_bytes_for_level_multiplier=10.000000
  optimize_filters_for_hits=false
  level_compaction_dynamic_level_bytes=true
  inplace_update_num_locks=10000
  inplace_update_support=false
  periodic_compaction_seconds=0
  disable_auto_compactions=false
  report_bg_io_stats=false

[TableOptions/BlockBasedTable "table\:blocks\:data"]
  pin_top_level_index_and_filter=true
  enable_index_compression=false
  read_amp_bytes_per_bit=21474836480
  format_version=5
  block_align=false
  metadata_block_size=4096
  block_size_deviation=10
  partition_filters=false
  block_size=32768
  index_block_restart_interval=1
  no_block_cache=true
  checksum=kCRC32c
  whole_key_filtering=true
  index_shortening=kShortenSeparators
  data_block_index_type=kDataBlockBinarySearch
  index_type=kBinarySearch
  verify_compression=false
  filter_policy=nullptr
  data_block_hash_table_util_ratio=0.750000
  pin_l0_filter_and_index_blocks_in_cache=false
  block_restart_interval=16
  cache_index_and_filter_blocks_with_high_priority=true
  cache_index_and_filter_blocks=false
  hash_index_allow_collision=true
  flush_block_policy_factory=FlushBlockBySizePolicyFactory

So the target file size should be 128 MB, but the average file size in level 6 is only about 2.5 MB. Ideally there should be only about 8 files in level 6.

Thanks for the help!

@jay-zhuang
Contributor

jay-zhuang commented May 17, 2021

One possibility is that the small files are trivially moved from L0 -> L6 (as they're already sorted, compaction is skipped); notably, in your log the cumulative compaction write is 0.
You could try running a manual compaction to see whether the L6 files get compacted together.
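For reference, a minimal sketch of such a manual compaction through the C++ API (assuming db and cf_handle already exist; the helper name here is made up):

#include <rocksdb/db.h>

// Compact the full key range of one column family. Passing nullptr for both
// endpoints means "the whole key space". The call blocks the calling thread
// until the compaction finishes, but the DB stays readable and writable from
// other threads.
rocksdb::Status CompactWholeCF(rocksdb::DB* db,
                               rocksdb::ColumnFamilyHandle* cf_handle) {
  return db->CompactRange(rocksdb::CompactRangeOptions(), cf_handle,
                          /*begin=*/nullptr, /*end=*/nullptr);
}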

@iFA88
Author

iFA88 commented May 18, 2021

I have 928 lines with trivial_move in an 11-hour log.

{"time_micros": 1621317290331689, "job": 1686, "event": "trivial_move", "destination_level": 6, "files": 1, "total_files_size": 29179
469}

The only way the files finally get merged/compacted together is a manual compaction, but that takes a long time and my biggest problem is that the whole DB is not accessible during that time. It would be great if this happened during normal operation.
It seems compaction's only job here is to turn WAL files into SSTs without any real compaction. I also don't understand why files are moved more than one level in a single step; that is definitely wrong.
This database (secondary) now has 15k files and takes 795 GB of storage. I have another database (primary) with almost the same data, which has 30k files and takes 769 GB. The secondary was manually compacted one week ago.
There is the secondary LOG (13MB) : https://www.fusionsolutions.io/doc/LOG

@jay-zhuang
Contributor

One suggestion we have is to set bottommost_compression, to ZSTD for example:

CompressionType bottommost_compression = kDisableCompressionOption;

Because it's different from the compression on your other levels (Snappy, from your log), it will force a compaction on the last level, which reduces the file count and also your storage footprint (ZSTD compresses much better than Snappy).
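A minimal sketch of that change in the C++ options (the helper name is made up; everything else stays whatever you already configure):

#include <rocksdb/options.h>

rocksdb::ColumnFamilyOptions WithZstdBottommost(rocksdb::ColumnFamilyOptions cf_opts) {
  cf_opts.compression = rocksdb::kSnappyCompression;  // non-bottommost levels, as in your log
  cf_opts.bottommost_compression = rocksdb::kZSTD;    // different codec => bottommost level gets rewritten
  return cf_opts;
}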

The only way the files finally get merged/compacted together is a manual compaction

Just FYI, you may need to set the manual compaction's BottommostLevelCompaction to kForce to force the compaction:
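A minimal sketch of that call (again assuming db and cf_handle already exist; the function name is made up):

#include <rocksdb/db.h>
#include <rocksdb/options.h>

rocksdb::Status ForceBottommostCompaction(rocksdb::DB* db,
                                          rocksdb::ColumnFamilyHandle* cf_handle) {
  rocksdb::CompactRangeOptions cro;
  // Rewrite bottommost-level files even when they overlap nothing, instead of
  // leaving them in place.
  cro.bottommost_level_compaction = rocksdb::BottommostLevelCompaction::kForce;
  return db->CompactRange(cro, cf_handle, /*begin=*/nullptr, /*end=*/nullptr);
}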


In general, RocksDB tries to reduce write amplification by avoiding any unnecessary compaction. The trade-off is more small files.

but that takes a long time and my biggest problem is that the whole DB is not accessible during that time. It would be great if this happened during normal operation.

Manual compaction is triggered by the user and runs in the background; the DB is still accessible during that time.

It seems compaction's only job here is to turn WAL files into SSTs without any real compaction. I also don't understand why files are moved more than one level in a single step; that is definitely wrong.

I guess you have level_compaction_dynamic_level_bytes enabled, which again tries to reduce write amplification by moving the SST files to higher levels faster.

// We will pick a base level b >= 1. L0 will be directly merged into level b,
// instead of always into level 1. Level 1 to b-1 need to be empty.

I think your use case is special:

  1. You have lots of column families, so it's likely to generate small SST files on Level 0. Increasing the write buffer/WAL might help (see the sketch after this list).
  2. The writes are strictly time-series ordered, so the output can be trivially moved, which is good for write amplification,
  3. but it seems you prefer large files over low write amplification.
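A minimal sketch of point 1, with a made-up helper name and purely illustrative sizes:

#include <rocksdb/options.h>

rocksdb::ColumnFamilyOptions WithLargerWriteBuffer(rocksdb::ColumnFamilyOptions cf_opts) {
  cf_opts.write_buffer_size = 64 << 20;          // e.g. 64 MB memtables instead of 8-16 MB
  cf_opts.min_write_buffer_number_to_merge = 2;  // merge two memtables into one L0 file per flush
  return cf_opts;
}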

So I think changing bottommost_compression is the best workaround for that. I'm going to close this issue; please feel free to re-open.

@iFA88
Author

iFA88 commented May 20, 2021

I need to reopen this issue; sadly, your suggestions did not help.

You can see here that an SST file was created and immediately trivially moved from L0 down to L6 without any compaction:

> cat LOG | grep 123814
2021/05/20-09:59:33.609574 7f720b7fe700 [flush_job.cc:350] [table:blocks:index:byTimestamp:data] [JOB 23459] Level-0 flush table #123814: started
2021/05/20-09:59:33.612678 7f720b7fe700 EVENT_LOG_v1 {"time_micros": 1621497573612658, "cf_name": "table:blocks:index:byTimestamp:data", "job": 23459, "event": "table_file_creation", "file_number": 123814, "file_size": 85799, "table_properties": {"data_size": 84926, "index_size": 90, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 1, "index_value_is_delta_encoded": 1, "filter_size": 0, "raw_key_size": 130432, "raw_average_key_size": 16, "raw_value_size": 0, "raw_average_value_size": 0, "num_data_blocks": 5, "num_entries": 8152, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "", "column_family_name": "table:blocks:index:byTimestamp:data", "column_family_id": 13, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "Snappy", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1621495878, "oldest_key_time": 1621495878, "file_creation_time": 1621497573}}
2021/05/20-09:59:33.612691 7f720b7fe700 [flush_job.cc:401] [table:blocks:index:byTimestamp:data] [JOB 23459] Level-0 flush table #123814: 85799 bytes OK
2021/05/20-09:59:33.613561 7f720b7fe700 (Original Log Time 2021/05/20-09:59:33.612850) [memtable_list.cc:447] [table:blocks:index:byTimestamp:data] Level-0 commit table #123814 started
2021/05/20-09:59:33.613565 7f720b7fe700 (Original Log Time 2021/05/20-09:59:33.613457) [memtable_list.cc:503] [table:blocks:index:byTimestamp:data] Level-0 commit table #123814: memtable #1 done
2021/05/20-09:59:34.251224 7f7231ffb700 (Original Log Time 2021/05/20-09:59:34.249607) [db_impl/db_impl_compaction_flush.cc:2680] [table:blocks:index:byTimestamp:data] Moving #123814 to level-4 85799 bytes
2021/05/20-09:59:35.334217 7f7231ffb700 (Original Log Time 2021/05/20-09:59:35.333278) [db_impl/db_impl_compaction_flush.cc:2680] [table:blocks:index:byTimestamp:data] Moving #123814 to level-5 85799 bytes
2021/05/20-09:59:40.689105 7f7231ffb700 (Original Log Time 2021/05/20-09:59:40.688077) [db_impl/db_impl_compaction_flush.cc:2680] [table:blocks:index:byTimestamp:data] Moving #123814 to level-6 85799 bytes

I know the data size is small, but the result is simply CRAZY:

** Compaction Stats [table:blocks:index:byTimestamp:data] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   16.33 KB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     10.3      0.53              0.20        99    0.005       0      0
  L4     88/0    4.00 MB   1.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L5    100/0    5.53 MB   1.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L6    338/0   55.52 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
 Sum    527/0   65.06 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     10.3      0.53              0.20        99    0.005       0      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [table:blocks:index:byTimestamp:data] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
High      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     10.2      0.53              0.20        98    0.005       0      0
User      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     14.5      0.00              0.00         1    0.001       0      0
Uptime(secs): 133817.4 total, 600.0 interval
Flush(GB): cumulative 0.005, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.01 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.5 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

The files NEED to be merged together, but the DB does not do that and I don't know why not.
Options for this CF:

CFOptions "table\:blocks\:index\:byTimestamp\:data"]
  sample_for_compression=0
  compaction_pri=kByCompensatedSize
  merge_operator=nullptr
  compaction_filter_factory=nullptr
  memtable_factory=SkipListFactory
  memtable_insert_with_hint_prefix_extractor=nullptr
  comparator=leveldb.BytewiseComparator
  target_file_size_base=33554432
  max_sequential_skip_in_iterations=8
  compaction_style=kCompactionStyleLevel
  max_bytes_for_level_base=4194304
  bloom_locality=0
  write_buffer_size=8388608
  compression_per_level=
  memtable_huge_page_size=0
  max_successive_merges=0
  arena_block_size=1048576
  memtable_whole_key_filtering=false
  target_file_size_multiplier=1
  max_bytes_for_level_multiplier_additional=1:1:1:1:1:1:1
  num_levels=7
  min_write_buffer_number_to_merge=2
  max_write_buffer_number_to_maintain=0
  max_write_buffer_number=4
  compression=kSnappyCompression
  level0_stop_writes_trigger=36
  level0_slowdown_writes_trigger=20
  compaction_filter=nullptr
  level0_file_num_compaction_trigger=2
  max_compaction_bytes=536870912
  compaction_options_universal={allow_trivial_move=false;stop_style=kCompactionStopStyleTotalSize;compression_size_percent=-1;max_size_amplification_percent=200;max_merge_width=4294967295;min_merge_width=2;size_ratio=1;}
  memtable_prefix_bloom_size_ratio=0.000000
  max_write_buffer_size_to_maintain=0
  hard_pending_compaction_bytes_limit=274877906944
  ttl=0
  table_factory=BlockBasedTable
  soft_pending_compaction_bytes_limit=68719476736
  prefix_extractor=nullptr
  bottommost_compression=kDisableCompressionOption
  force_consistency_checks=false
  paranoid_file_checks=true
  compaction_options_fifo={allow_compaction=false;max_table_files_size=1073741824;}
  max_bytes_for_level_multiplier=10.000000
  optimize_filters_for_hits=false
  level_compaction_dynamic_level_bytes=true
  inplace_update_num_locks=10000
  inplace_update_support=false
  periodic_compaction_seconds=0
  disable_auto_compactions=false
  report_bg_io_stats=false
  
[TableOptions/BlockBasedTable "table\:blocks\:index\:byTimestamp\:data"]
  pin_top_level_index_and_filter=true
  enable_index_compression=false
  read_amp_bytes_per_bit=21474836480
  format_version=5
  block_align=false
  metadata_block_size=4096
  block_size_deviation=10
  partition_filters=false
  block_size=32768
  index_block_restart_interval=1
  no_block_cache=true
  checksum=kCRC32c
  whole_key_filtering=true
  index_shortening=kShortenSeparators
  data_block_index_type=kDataBlockBinarySearch
  index_type=kBinarySearch
  verify_compression=false
  filter_policy=nullptr
  data_block_hash_table_util_ratio=0.750000
  pin_l0_filter_and_index_blocks_in_cache=false
  block_restart_interval=16
  cache_index_and_filter_blocks_with_high_priority=true
  cache_index_and_filter_blocks=false
  hash_index_allow_collision=true
  flush_block_policy_factory=FlushBlockBySizePolicyFactory

@jay-zhuang
Contributor

Your bottommost_compression is still disabled:

bottommost_compression=kDisableCompressionOption

@iFA88
Author

iFA88 commented May 20, 2021

Your bottommost_compression is still disabled:

bottommost_compression=kDisableCompressionOption

Okay, but I don't need that. Right now all levels (except 0) are compressed with Snappy and I'm fine with that.

Besides, compression should not have anything to do with COMPACTION.

There are two questions:

  1. Why are small files trivially moved to higher levels without any merge?
  2. I have target_file_size_base=33554432; why is that completely ignored?

@jay-zhuang
Contributor

FYI, in RocksDB the bottommost level is the highest-numbered level, Level 6 in your case.

  1. Why are small files trivially moved to higher levels without any merge?

RocksDB tries to reduce write amplification; merging small files into a large one requires IO, which increases write amplification. target_file_size is a target, and to reduce IO the output may not always reach the target file size.
Again, for your use case it seems you don't care about the extra IO, so changing the bottommost compression will force the merge that you want.

@iFA88
Author

iFA88 commented May 20, 2021

Yeah, I have a much bigger problem when I have 60k files in my directory. FYI, after a manual compaction I will have about 15k files and about 20-30 GB less storage, which matters a lot; you know, lots of small files greatly reduce performance.
I can compare my DB with other applications which also use an older RocksDB, and there compaction works great.
Sadly, my python-rocksdb does not support bottommost_compression, so I need to code that first.
Thanks.

@ajkr
Contributor

ajkr commented May 20, 2021

Maybe we should change RocksDB to not do trivial move on input files whose size is way different from the configured target output file size. I can think of two cases where this condition would be expected: (1) compacting from L0 to the base level; or (2) compacting from the base level or above to the next level when target_file_size_multiplier > 1.
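Roughly, the check could look like the sketch below. This is not RocksDB code; the function, its parameters, and the 50% threshold are all hypothetical, only meant to illustrate the condition and its two expected exceptions:

#include <cstdint>

// Hypothetical helper: allow the trivial-move fast path only when the input
// file is reasonably close to the target output file size.
bool AllowTrivialMove(uint64_t input_file_size, uint64_t target_output_file_size,
                      bool is_l0_to_base_level, double target_file_size_multiplier) {
  if (is_l0_to_base_level) {
    return true;  // (1) L0 -> base level: small flushed files are expected
  }
  if (target_file_size_multiplier > 1.0) {
    return true;  // (2) the per-level target grows, so smaller inputs are expected
  }
  constexpr double kMinSizeRatio = 0.5;  // hypothetical threshold
  return input_file_size >=
         static_cast<uint64_t>(kMinSizeRatio * target_output_file_size);
}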

@iFA88
Author

iFA88 commented May 22, 2021

Hey! Sadly, bottommost_compression did not solve the trivial_move issue. I ran my DB for 12 hours and there are a lot of trivial_move events without any compacting/compressing. I set it to LZ4, which is different from the compression on the other levels (Snappy).
Maybe there is a size limit for compacting files?

@jay-zhuang
Contributor

Hey! Sadly, bottommost_compression did not solve the trivial_move issue. I ran my DB for 12 hours and there are a lot of trivial_move events without any compacting/compressing. I set it to LZ4, which is different from the compression on the other levels (Snappy).
Maybe there is a size limit for compacting files?

It only does a non-trivial-move compaction for the bottommost level, not the other levels. On the other hand, the bottommost level holds the majority of your data, so it should compact most of the files together. You may need to run a manual compaction after changing the compression option. You can check whether there are still small files compressed with LZ4, or bottommost files that are not compressed with LZ4.

@jay-zhuang jay-zhuang changed the title Too many small files in higher levels Avoid trivial move if the file is far smaller than the target size May 26, 2021
@jay-zhuang jay-zhuang reopened this May 26, 2021
@jay-zhuang jay-zhuang changed the title Avoid trivial move if the file is far smaller than the target size Avoid trivial move if SST file is far smaller than the target size May 26, 2021
@jay-zhuang
Contributor

Maybe we should change RocksDB to not do trivial move on input files whose size is way different from the configured target output file size.

@ajkr how should we improve the compaction picker to do that? If a small SST is selected for compaction and has no overlap in the next level, should it add or wait for another file on the same input level to do the compaction? Or force a compaction with other files on the output level that it doesn't overlap?

@iFA88
Author

iFA88 commented Jul 12, 2021

Hey, sorry for the late response; I have just resynced my database with the latest options.

A long LOG file, options and stats:
https://www.fusionsolutions.io/doc/resynclog.tar.gz

My DB is now 616 GB with 69,754 files. After compaction this would be about 24k files and 610 GB.
Some column families have a lot of small files, and every time these only got a trivial move, never a compaction.

@iFA88
Author

iFA88 commented Jul 15, 2021

DB sync almost done.
892,367,654K bytes in 91748 files

** Compaction Stats [table:blocks:index:byBaseFeePerGas:data] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0    3.17 KB   0.5      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     18.8      0.73              0.61        65    0.011       0      0
  L1    240/0    3.98 MB   1.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0
  L2    428/0   14.19 MB   0.4      0.0     0.0      0.0       0.0      0.0       0.0   0.2     24.4      4.1      0.56              0.48        65    0.009   2135K      0
 Sum    669/0   18.17 MB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.2     10.6     12.4      1.28              1.09       130    0.010   2135K      0
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [table:blocks:index:byBaseFeePerGas:data] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Low      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0     24.4      4.1      0.56              0.48        65    0.009   2135K      0
High      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     18.7      0.72              0.61        64    0.011       0      0
User      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     26.5      0.01              0.00         1    0.007       0      0
Uptime(secs): 324679.4 total, 600.0 interval
Flush(GB): cumulative 0.013, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.02 GB write, 0.00 MB/s write, 0.01 GB read, 0.00 MB/s read, 1.3 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for 

@iFA88
Author

iFA88 commented Jul 16, 2021

My DB has now been synced and uses 917,179,880K bytes in 95,133 files. After manual compaction the final result is 905,813,589K bytes in 14,690 files.

@solicomo
Contributor

Maybe we should change RocksDB to not do trivial move on input files whose size is way different from the configured target output file size.

@ajkr how should we improve the compaction picker to do that? If a small SST is selected for compaction and has no overlap in the next level, should it add or wait for another file on the same input level to do the compaction? Or force a compaction with other files on the output level that it doesn't overlap?

It'd be easier to expand inputs to include one more file on the same input level.
If there is no objection I can create a PR.
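As a sketch of that idea (not RocksDB's actual compaction-picker code; the types, names, and threshold are invented for illustration, and the overlap check against the output level is omitted): keep pulling neighbouring files from the same input level into the compaction while the picked set stays far below the target file size.

#include <cstdint>
#include <vector>

struct FileMeta { uint64_t size; };  // stand-in for RocksDB's file metadata

// Expand the chosen inputs with same-level neighbours until they add up to a
// reasonable fraction of the target output file size.
void ExpandSmallInputs(std::vector<FileMeta>& inputs,
                       const std::vector<FileMeta>& level_files, size_t next,
                       uint64_t target_file_size) {
  uint64_t total = 0;
  for (const auto& f : inputs) total += f.size;
  while (total < target_file_size / 2 && next < level_files.size()) {
    inputs.push_back(level_files[next]);
    total += level_files[next].size;
    ++next;
  }
}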
