
my.cnf tuning

Yoshinori Matsunobu edited this page Sep 6, 2020 · 21 revisions

MyRocks configuration example for uses other than linkbench

[mysqld]
rocksdb
default-storage-engine=rocksdb
skip-innodb
default-tmp-storage-engine=MyISAM
binlog_format=ROW
collation-server=latin1_bin
transaction-isolation=READ-COMMITTED

rocksdb_max_open_files=-1
rocksdb_wal_recovery_mode=2
rocksdb_max_background_jobs=8
rocksdb_max_total_wal_size=4G
rocksdb_block_size=16384
rocksdb_block_cache_size=32G
rocksdb_table_cache_numshardbits=6

# rate limiter
rocksdb_bytes_per_sync=4194304
rocksdb_wal_bytes_per_sync=4194304
rocksdb_rate_limiter_bytes_per_sec=104857600 #100MB/s. Increase if you're running on higher spec machines

# triggering compaction if there are many sequential deletes (Deletion Triggered Compaction)
rocksdb_compaction_sequential_deletes_count_sd=1
rocksdb_compaction_sequential_deletes=199999
rocksdb_compaction_sequential_deletes_window=200000

# read free replication
rocksdb_rpl_lookup_rows=0

rocksdb_default_cf_options=write_buffer_size=128m;target_file_size_base=32m;max_bytes_for_level_base=512m;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;max_write_buffer_number=2;compression_per_level=kLZ4Compression;bottommost_compression=kZSTD;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;compaction_pri=kMinOverlappingRatio

MyRocks configuration example for linkbench

[mysqld]
rocksdb
default-storage-engine=rocksdb
skip-innodb
default-tmp-storage-engine=MyISAM
binlog_format=ROW
collation-server=latin1_bin
transaction-isolation=READ-COMMITTED

rocksdb_max_open_files=-1
rocksdb_wal_recovery_mode=2
rocksdb_max_background_jobs=8
rocksdb_max_total_wal_size=4G
rocksdb_block_size=16384
rocksdb_block_cache_size=32G
rocksdb_table_cache_numshardbits=6

# rate limiter
rocksdb_bytes_per_sync=4194304
rocksdb_wal_bytes_per_sync=4194304
rocksdb_rate_limiter_bytes_per_sec=104857600 #100MB/s

# triggering compaction if there are many sequential deletes
rocksdb_compaction_sequential_deletes_count_sd=1
rocksdb_compaction_sequential_deletes=199999
rocksdb_compaction_sequential_deletes_window=200000

# read free replication
rocksdb_rpl_lookup_rows=0

rocksdb_default_cf_options=write_buffer_size=128m;target_file_size_base=32m;max_bytes_for_level_base=512m;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;max_write_buffer_number=2;compression_per_level=kLZ4Compression;bottommost_compression=kZSTD;compression_opts=-14:6:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0};level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;compaction_pri=kMinOverlappingRatio

rocksdb_override_cf_options=cf_link_pk={prefix_extractor=capped:20};rev:cf_link_id1_type={prefix_extractor=capped:20}
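
The two examples differ mainly inside the long rocksdb_default_cf_options string (whole_key_filtering, memtable_prefix_bloom_size_ratio, prefix_extractor), which is hard to compare by eye. The following is a minimal sketch, not MyRocks's own parser, that splits such a string into a nested dictionary so two configurations can be compared. The function names split_top_level and parse_cf_options are hypothetical, and the sketch assumes the only nesting is through {...} groups as in the examples above.

```python
def split_top_level(s, sep=";"):
    """Split s on sep, ignoring separators nested inside {...}."""
    parts, depth, cur = [], 0, []
    for ch in s:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    if cur:
        parts.append("".join(cur))
    return parts


def parse_cf_options(s):
    """Parse 'k=v;k={k=v;...}' into a dict; {...} values become nested dicts."""
    out = {}
    for item in split_top_level(s):
        if not item:
            continue
        # Split on the first '=' only, so values may contain '=' or ':'.
        key, _, value = item.partition("=")
        if value.startswith("{") and value.endswith("}"):
            value = parse_cf_options(value[1:-1])
        out[key.strip()] = value
    return out


# Shortened sample taken from the linkbench example above.
sample = (
    "write_buffer_size=128m;"
    "block_based_table_factory={cache_index_and_filter_blocks=1;"
    "filter_policy=bloomfilter:10:false;whole_key_filtering=0};"
    "prefix_extractor=capped:12"
)
opts = parse_cf_options(sample)
print(opts["block_based_table_factory"]["whole_key_filtering"])
```

The same function can be applied to a rocksdb_override_cf_options value, where each top-level key is a column family name.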

Tuning Tips

  • Character Sets
    • MyRocks gives better performance with case sensitive collations (latin1_bin, utf8_bin, binary).
  • Transaction
    • The Read Committed isolation level is recommended. MyRocks's transaction isolation implementation differs from InnoDB's and is closer to PostgreSQL's, whose default isolation level is Read Committed.
  • Compression
    • Set kNoCompression (or kLZ4Compression) on L0-1 or L0-2
    • In the bottommost level, using a stronger compression algorithm (ZSTD) is recommended.
    • For ZSTD compression, set the compression level to match your workload. The above examples (compression_opts=-14:6:0) use ZSTD compression level 6, which gives better space savings if your application is not write intensive. For write-intensive applications, a lower level such as compression_opts=-14:1:0 (ZSTD compression level 1) costs less CPU.
    • For other levels, set kLZ4Compression.
  • Data blocks, files and compactions
    • Set level_compaction_dynamic_level_bytes=true
    • Set a proper rocksdb_block_size (default 4096). A larger block size reduces space but increases CPU overhead, because MyRocks has to decompress many more bytes per read. There is a trade-off between space and CPU usage.
    • Set rocksdb_max_open_files=-1. With a value greater than 0, RocksDB still uses table_cache, which locks a mutex every time you access a file. Setting -1 should give a much greater benefit because lookups no longer go through LRUCache to get the table they need.
    • Set a reasonable rocksdb_max_background_jobs.
    • Do not set target_file_size_base too small (32MB is generally sufficient). The default is 4MB, which is generally too small and creates too many sst files. Too many sst files make operations more difficult.
    • Set a rate limiter. Without one, compaction very often writes 300~500MB/s on pure flash, which may cause short stalls. In 4x MyRocks testing, a 40MB/s rate limiter per instance gave fairly stable results (less than 200MB/s peak in iostat).
  • Bloom Filter
    • Configure the bloom filter and prefix extractor. Full Filter is recommended (the block-based filter does not work for Get() + prefix bloom). The prefix extractor can be configured per column family and uses the first N bytes of the key as the prefix, where N is the prefix_extractor length. If using one BIGINT column as a primary key, the recommended prefix length is 12 (the first 4 bytes are the internal index id, followed by the 8-byte BIGINT).
    • Configure the Memtable bloom filter. It is useful for reducing CPU usage if you see high CPU usage at rocksdb::MemTable::KeyComparator. The size depends on the Memtable size. Set memtable_prefix_bloom_bits=41943040 for a 128MB Memtable (at about 30 bytes per key, 128MB holds roughly 4M keys; 4M keys * 10 bits per key).
  • Cache
    • Do not set block_cache in rocksdb_default_cf_options (block_based_table_factory). If you provide a block cache size in the default column family options, the same cache is NOT reused across the column families that inherit them.
    • Consider setting shared write buffer size (db_write_buffer_size)
    • Consider using compaction_pri=kMinOverlappingRatio for writing less on compaction.
    • Newer MyRocks/RocksDB versions have an option rocksdb_cache_dump=OFF, which reduces jemalloc metadata fragmentation (lower total memory usage).
  • Newer RocksDB revisions have several optimizations available through rocksdb_default_cf_options. For example:
    • max_write_buffer_size_to_maintain=16m caps the memory retained for immutable MemTables at 16m, which reduces overall memory usage.
    • Index block size reduction and performance -- block_based_table_factory={cache_index_and_filter_blocks=1;enable_index_compression=false;format_version=4;index_block_restart_interval=16;filter_policy=bloomfilter:10:false;whole_key_filtering=0}. These block_based_table_factory options reduce index block size and the CPU overhead of loading index blocks.
    • ttl=0;periodic_compaction_seconds=0 -- controls Periodic Compaction (which triggers compactions for old SST files).
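
The sizing arithmetic in the Bloom Filter tips can be sketched as follows. This is a rough illustration only: the helper names prefix_length and memtable_bloom_bits are hypothetical, and the 32-byte average key size is an assumption chosen so that a 128MB Memtable holds exactly 4M keys, which reproduces the 41943040 figure quoted above. Measure your own key sizes before relying on it.

```python
INDEX_ID_BYTES = 4   # internal index id that prefixes every key (from the tip above)
BIGINT_BYTES = 8     # storage size of one BIGINT key column


def prefix_length(key_column_bytes):
    """Prefix extractor length in bytes: index id plus the leading key columns."""
    return INDEX_ID_BYTES + sum(key_column_bytes)


def memtable_bloom_bits(memtable_bytes, avg_key_bytes, bits_per_key=10):
    """Estimate the Memtable bloom filter size in bits.

    avg_key_bytes is an assumption about the workload, not a MyRocks setting.
    """
    estimated_keys = memtable_bytes // avg_key_bytes
    return estimated_keys * bits_per_key


# One BIGINT primary key column: 4 + 8 bytes.
print(prefix_length([BIGINT_BYTES]))                      # 12
# Two BIGINT columns in the prefix: 4 + 8 + 8 bytes.
print(prefix_length([BIGINT_BYTES, BIGINT_BYTES]))        # 20
# 128MB Memtable, assumed 32 bytes per key, 10 bits per key.
print(memtable_bloom_bits(128 * 1024 * 1024, 32))         # 41943040
```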

Verifying parameters

To verify that configurations are set correctly, open the LOG file and search for the parameter name. The LOG file is located at $datadir/.rocksdb/LOG.
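
A minimal sketch of that search, assuming only that the LOG file is plain text. The helper name find_param is hypothetical. It does a plain substring match, so it makes no assumption about the exact line format RocksDB writes. The name recorded in the LOG may not carry the rocksdb_ prefix used in my.cnf, so searching for the shorter name (for example max_open_files) may be needed; check this against your own LOG.

```python
def find_param(log_path, name):
    """Return every line of the LOG file that mentions the parameter name."""
    matches = []
    # errors="replace" keeps the scan going if the file has undecodable bytes.
    with open(log_path, "r", errors="replace") as f:
        for line in f:
            if name in line:
                matches.append(line.rstrip("\n"))
    return matches


# Example usage (the path is a placeholder for your own $datadir):
#   for line in find_param("/path/to/datadir/.rocksdb/LOG", "max_open_files"):
#       print(line)
```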
