
MyRocks data size is greater than InnoDB #44

Closed
hermanlee opened this issue Sep 28, 2015 · 12 comments


Issue by BohuTANG
Thursday Jun 04, 2015 at 11:20 GMT
Originally opened as MySQLOnRocksDB#80


From our benchmarks on the same dataset for MyRocks/InnoDB/TokuDB, the data sizes are:

MyRocks: 43GB (the ./rocksdb dir)
InnoDB:  33GB (without compression)
TokuDB:  15GB (zlib compression, compression ratio about 2, so the raw size is about 30GB)

All MyRocks configuration is at its defaults; 'show engine rocksdb status' shows the following:

mysql> show engine rocksdb status\G;
*************************** 1. row ***************************
  Type: DBSTATS
  Name: rocksdb
Status: 
** DB Stats **
Uptime(secs): 79985.6 total, 1704.4 interval
Cumulative writes: 54K writes, 280M keys, 54K batches, 1.0 writes per batch, ingest: 27.78 GB, 0.36 MB/s
Cumulative WAL: 54K writes, 54K syncs, 1.00 writes per sync, written: 27.78 GB, 0.36 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 0 writes, 0 keys, 0 batches, 0.0 writes per batch, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

*************************** 2. row ***************************
  Type: CF_COMPACTION
  Name: __system__
Status: 
** Compaction Stats [__system__] **
Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)  KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      3.3         0       222    0.002          0       0      0
  L1      1/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.9     12.8      6.1         0        56    0.004          0    110K   110K
 Sum      1/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.9      4.2      4.2         1       278    0.002          0    110K   110K
 Int      0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000          0       0      0
Flush(GB): cumulative 0.002, interval 0.000
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

*************************** 3. row ***************************
  Type: CF_COMPACTION
  Name: default
Status: 
** Compaction Stats [default] **
Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)  KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      2/0          1   0.5      0.0     0.0      0.0      27.7     27.7       0.0   0.0      0.0     62.3       456      7841    0.058          0       0      0
  L1      8/0          8   0.8     30.8    27.8      3.0      30.7     27.7       0.0   1.1     41.8     41.7       753      3239    0.233          0    280M      0
  L2     68/0         99   1.0      0.0     0.0      0.0       0.0      0.0      27.7   0.0      0.0      0.0         0         0    0.000          0       0      0
  L3    543/0        998   1.0      0.0     0.0      0.0       0.0      0.0      27.7   0.0      0.0      0.0         0         0    0.000          0       0      0
  L4   5078/0       9998   1.0      4.7     3.2      1.5       4.5      3.1      24.5   1.4     30.1     29.3       158       786    0.202          0     57M   669K
  L5  15910/0      31843   0.3     45.9    24.3     21.5      45.3     23.7       3.3   1.9     19.4     19.1      2427      4600    0.528          0    252M  1024K
 Sum  21609/0      42947   0.0     81.3    55.3     26.0     108.2     82.2      83.1   3.9     21.9     29.2      3794     16466    0.230          0    589M  1693K
 Int      0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000          0       0      0
Flush(GB): cumulative 27.749, interval 0.000
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

3 rows in set (0.00 sec)

and

mysql> select * from ROCKSDB_CF_OPTIONS where value like '%snappy%'\G;
*************************** 1. row ***************************
    CF_NAME: __system__
OPTION_TYPE: COMPRESSION_TYPE
      VALUE: kSnappyCompression
*************************** 2. row ***************************
    CF_NAME: default
OPTION_TYPE: COMPRESSION_TYPE
      VALUE: kSnappyCompression
2 rows in set (0.00 sec)


Comment by mdcallag
Thursday Jun 04, 2015 at 11:22 GMT


Excellent. Can you also send me the values of any rocksdb config options set in my.cnf? It will take me a few hours to respond.


Comment by BohuTANG
Thursday Jun 04, 2015 at 11:52 GMT


There is no rocksdb configuration in my.cnf, so everything is at its defaults.


Comment by mdcallag
Thursday Jun 04, 2015 at 13:28 GMT


Can you tell me what is in your RocksDB LOG file (name "LOG") for "Compression algorithms supported"? Mine shows:

2015/06/04-04:33:23.366528 7faff86d38c0 Compression algorithms supported:
2015/06/04-04:33:23.366530 7faff86d38c0 Snappy supported: 1
2015/06/04-04:33:23.366531 7faff86d38c0 Zlib supported: 1
2015/06/04-04:33:23.366540 7faff86d38c0 Bzip supported: 1
2015/06/04-04:33:23.366542 7faff86d38c0 LZ4 supported: 1


Comment by yoshinorim
Thursday Jun 04, 2015 at 16:25 GMT


Could you try the following my.cnf settings and share the results?
This makes MyRocks use zlib level 2 compression for most levels (compression_per_level and compression_opts) and locates files more efficiently (level_compaction_dynamic_level_bytes). By default the RocksDB block size is 4KB; increasing it to 16KB will reduce space somewhat.

rocksdb_block_size=16384
rocksdb_max_total_wal_size=4096000000
rocksdb_block_cache_size=12G
rocksdb_default_cf_options=write_buffer_size=128m;target_file_size_base=32m;max_bytes_for_level_base=512m;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=10;level0_stop_writes_trigger=15;max_write_buffer_number=4;compression_per_level=kNoCompression:kNoCompression:kSnappyCompression:kZlibCompression:kZlibCompression:kZlibCompression:kZlibCompression;compression_opts=-14:2:0;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0;};prefix_extractor=capped:20;level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true
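For reference, a rough sketch (not MyRocks code) of how `compression_per_level` is applied: each entry maps to one LSM level, and, as I understand RocksDB's behavior, if the list is shorter than `num_levels` the last entry covers all remaining levels. The helper below is illustrative only.

```python
def codec_for_level(compression_per_level, level):
    """Pick the codec for an LSM level, mimicking (as an assumption)
    RocksDB's rule: a list shorter than num_levels is extended
    with its last entry for the deeper levels."""
    if not compression_per_level:
        return "kSnappyCompression"  # assumed column-family default
    if level < len(compression_per_level):
        return compression_per_level[level]
    return compression_per_level[-1]

# The list from the suggested my.cnf above:
opts = ("kNoCompression:kNoCompression:kSnappyCompression:"
        "kZlibCompression:kZlibCompression:kZlibCompression:"
        "kZlibCompression").split(":")

# L0/L1 stay uncompressed (hot, rewritten often), L2 uses snappy,
# and the large bottom levels use zlib, where space matters most.
print(codec_for_level(opts, 0))  # kNoCompression
print(codec_for_level(opts, 5))  # kZlibCompression
```

Keeping the top levels uncompressed trades a little disk space for lower CPU on the write path, since L0/L1 data is rewritten by compaction far more often than the bottom levels.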


Comment by mdcallag
Thursday Jun 04, 2015 at 17:12 GMT


Yoshi - before trying to tune, we need to confirm that compression was enabled in his RocksDB build. Then we can tune. MyRocks currently ships a poor default RocksDB configuration, and this issue can be kept open for that. On my test server the defaults are:

Started an instance locally with default my.cnf:
Options.max_open_files: 5000
Options.max_background_compactions: 1
Options.max_background_flushes: 1
--> max_open_files should be larger, max_background_compactions and max_background_flushes should be >= 4 for many systems

Compression algorithms supported:
Snappy supported: 1
Zlib supported: 1
Bzip supported: 1
LZ4 supported: 1

cache_index_and_filter_blocks: 1
index_type: 0
hash_index_allow_collision: 1
checksum: 1
no_block_cache: 0
block_cache: 0x7faff4c88078
block_cache_size: 8388608
block_cache_compressed: (nil)
block_size: 4096
block_size_deviation: 10
block_restart_interval: 16
filter_policy: nullptr
format_version: 2
--> block_size should be larger, since many blocks will be compressed and 4kb compressed to much less than 4kb can waste IO when file system page size is 4k

Options.write_buffer_size: 4194304
Options.max_write_buffer_number: 2
Options.compression: Snappy
Options.num_levels: 7
--> should use a larger value for write_buffer_size, default is 4M, maybe 64M

Options.min_write_buffer_number_to_merge: 1
--> probably OK

Options.level0_file_num_compaction_trigger: 4
Options.level0_slowdown_writes_trigger: 20
Options.level0_stop_writes_trigger: 24
--> probably OK

Options.target_file_size_base: 2097152
Options.max_bytes_for_level_base: 10485760
--> ugh, maybe 32MB for target_file_size_base and 512MB for max_bytes_for_level_base. Default here means that sizeof(L0) is 10M

Options.level_compaction_dynamic_level_bytes: 0
--> we want this to be 1

Options.soft_rate_limit: 0.00
Options.hard_rate_limit: 0.00
--> want these to be set, maybe 2.5 for soft and 3.0 for hard
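To see why the default target_file_size_base and max_bytes_for_level_base above are considered small, here is a back-of-the-envelope sketch of per-level capacity, assuming the standard leveled-compaction rule that level n's target is max_bytes_for_level_base times the level multiplier raised to (n - 1), with the default multiplier of 10 (an illustration, not actual RocksDB code):

```python
def level_targets(base_bytes, multiplier=10, num_levels=7):
    # Target sizes for L1..L(num_levels-1).
    # L0 is governed by file-count triggers, not a byte target.
    return [base_bytes * multiplier ** n for n in range(num_levels - 1)]

MB = 1024 ** 2
default = level_targets(10 * MB)    # default max_bytes_for_level_base=10MB
tuned   = level_targets(512 * MB)   # the suggested 512MB base

print(default[0] // MB, "MB for L1 by default")   # 10 MB
print(tuned[0] // MB, "MB for L1 when tuned")     # 512 MB
```

With a 10MB base, a 40GB dataset is pushed down to L5/L6 and every flushed byte is rewritten at each intermediate level, which inflates write amplification; a 512MB base keeps the tree shallower for the same data.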


Comment by BohuTANG
Friday Jun 05, 2015 at 09:31 GMT


This issue was due to *.sst files not being properly cleaned up on DROP DATABASE.
I cleaned the .rocksdb dir, re-installed the database, and ran the same benchmark with yoshinorim's configuration; it looks OK to me now:
datasize: 19GB (snappy)

one 33MB sst dump:

$./sst_dump --show_properties --file=../../myrocks_mysql/data/.rocksdb/002120.sst
from [] to []
Process ../../myrocks_mysql/data/.rocksdb/002120.sst
Sst file format: block-based
Table Properties:
------------------------------
  # data blocks: 3840
  # entries: 310960
  raw key size: 4975360
  raw average key size: 16.000000
  raw value size: 57838560
  raw average value size: 186.000000
  data block size: 33559362
  index block size: 126490
  filter block size: 0
  (estimated) table size: 33685852
  filter policy name: rocksdb.BuiltinBloomFilter
  # deleted keys: 0

(4975360 + 57838560) / 33559362 ≈ 1.87x compression ratio
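Recomputing that ratio from the sst_dump properties above (raw key bytes plus raw value bytes, divided by the compressed on-disk data block size):

```python
# Values taken from the sst_dump output for 002120.sst above.
raw_key_size    = 4_975_360
raw_value_size  = 57_838_560
data_block_size = 33_559_362  # compressed data blocks on disk

ratio = (raw_key_size + raw_value_size) / data_block_size
print(f"{ratio:.2f}x raw-to-compressed")  # 1.87x raw-to-compressed
```

Note this counts only data blocks; index and filter blocks add a small amount of uncompressed overhead on top.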

Another question: how to find the mapping between table and *.sst files?


Comment by mdcallag
Friday Jun 05, 2015 at 13:46 GMT


Space is eventually reclaimed as compaction runs. If one table accounts for the majority of space, then that will be easy to notice. Yoshi might know whether there is a way to force manual compaction after running DROP TABLE.

There is also a way to use different column families for different tables, which can make this easier to manage and monitor. I don't know whether we have documented that yet, but I will share some details soon.

We are working, or have worked, on ways to gather per-table metrics when many tables are in the same column family. It is a hard problem, but required when most tables are in one column family. I will ask internally about the status of that.



Comment by yoshinorim
Friday Jun 05, 2015 at 15:05 GMT


We have not started implementing a mapping between tables and *.sst files yet. I'll file another task to track this.


Comment by hermanlee
Friday Jun 05, 2015 at 16:00 GMT


We can also dump some of the rocksdb configuration options out through the information schema, rather than having to look through the rocksdb LOG file:

select * from information_schema.rocksdb_cf_options;

The db options are mostly available through:

show global variables like 'rocksdb%';


Comment by yoshinorim
Friday Jun 05, 2015 at 17:54 GMT


@BohuTANG : BTW, you can get the size of each MyRocks table via the usual MySQL commands (SHOW TABLE STATUS or SELECT FROM information_schema.tables). Use these commands to compare compression ratios between tables. MyRocks recalculates statistics every 600 seconds, configurable via the rocksdb_stats_dump_period_sec global variable. Note that SHOW TABLE STATUS / I_S do not include the size in the memtable (we are working on including the memtable size as well, not only the size from *.sst files).


Comment by maykov
Friday Jun 05, 2015 at 19:16 GMT


BohuTANG: optimize table t1; will run manual compaction for the table. However, if you have already dropped the table, I can't think of an easy way to trigger compaction. One thing you can do is stop mysql and then use the ldb tool to run compaction. If space matters more than deletion speed, you could do truncate table t1; optimize table t1; drop table t1; after https://reviews.facebook.net/D39579 is pushed.

There is no correspondence between .sst files and tables or databases. The data is spread across sst files in insertion order and then intermixed by the compaction process.

Yoshi, I have this task: MySQLOnRocksDB#55, to expose what is stored in each sst file through the information schema.

@yoshinorim

We're aware of an issue where DROP TABLE does not reclaim space correctly. We're working on a fix, and issue #60 is tracking the problem. Closing this issue; updates will be posted at #60.

spetrunia added a commit that referenced this issue Jan 5, 2016
Summary:
Add support for read and read write locks in RocksDB's LockTable.
The implementation is more concerned with correctness than with
concurrency.

Test Plan: Added unit test, enabled MTR tests

Reviewers: maykov, jtolmer, yoshinorim, hermanlee4

Reviewed By: hermanlee4

Differential Revision: https://reviews.facebook.net/D38265