
MyRocks optimizer stats

Reid Horuff edited this page Mar 16, 2017 · 5 revisions

An SST file may contain data belonging to multiple SQL indexes. To compute optimizer stats, MyRocks stores the following information per index, per SST file:

  • number of rows
  • size of the index on disk (i.e., compressed)
  • raw (uncompressed) data size
  • index cardinality (i.e., the number of distinct keys for each prefix size)

Since SST files are immutable, this data is computed once, stored in a file, and never changed again. Because a single index is split among multiple SST files, MyRocks merges the per-index stats of all files to produce global index statistics: every time a file is saved, its stats are added to the global per-index stats, and every time a file is dropped, its stats are deducted from them. Stats for all indexes in a table can be recreated with the ANALYZE TABLE statement.
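The add-on-save, deduct-on-drop bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the MyRocks implementation: the names `IndexStats`, `GlobalStats`, `file_added` and `file_dropped` are hypothetical, and cardinality is modelled as a plain list with one distinct-key count per prefix size.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexStats:
    """Per-index stats for one SST file (hypothetical layout)."""
    rows: int = 0
    disk_size: int = 0   # compressed size on disk
    raw_size: int = 0    # raw data size
    distinct_keys_per_prefix: List[int] = field(default_factory=list)

    def combine(self, other: "IndexStats", sign: int) -> None:
        """Add (sign=+1) or deduct (sign=-1) another file's stats."""
        self.rows += sign * other.rows
        self.disk_size += sign * other.disk_size
        self.raw_size += sign * other.raw_size
        # Pad both lists to the same length before combining them.
        n = max(len(self.distinct_keys_per_prefix),
                len(other.distinct_keys_per_prefix))
        mine = self.distinct_keys_per_prefix + [0] * (n - len(self.distinct_keys_per_prefix))
        theirs = other.distinct_keys_per_prefix + [0] * (n - len(other.distinct_keys_per_prefix))
        self.distinct_keys_per_prefix = [a + sign * b for a, b in zip(mine, theirs)]


class GlobalStats:
    """Global per-index stats, keyed by a hypothetical integer index id."""

    def __init__(self) -> None:
        self.per_index: Dict[int, IndexStats] = {}

    def _apply(self, file_stats: Dict[int, IndexStats], sign: int) -> None:
        for index_id, stats in file_stats.items():
            self.per_index.setdefault(index_id, IndexStats()).combine(stats, sign)

    def file_added(self, file_stats: Dict[int, IndexStats]) -> None:
        self._apply(file_stats, +1)

    def file_dropped(self, file_stats: Dict[int, IndexStats]) -> None:
        self._apply(file_stats, -1)
```

Because each file's stats never change, deducting exactly what was added when the file was saved keeps the global totals consistent without rescanning any data.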

To approximate the number of records between two keys, MyRocks obtains the approximate data size between the two keys and divides it by the average record size for that table, yielding an approximate record count for the key range.
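The range estimate above is a single division, shown here as a minimal sketch. The function name and parameters are hypothetical, and computing the average record size as total bytes divided by total rows is an assumption about how that average is obtained.

```python
def estimate_rows_in_range(approx_bytes_in_range: int,
                           total_bytes: int,
                           total_rows: int) -> int:
    """Estimate the record count in a key range from its approximate size.

    Assumption: average record size = total_bytes / total_rows for the table.
    """
    if total_bytes <= 0 or total_rows <= 0:
        return 0  # no stats available, so no meaningful estimate
    avg_record_size = total_bytes / total_rows
    return round(approx_bytes_in_range / avg_record_size)
```

For example, a table with 10,000 rows in 1,000,000 bytes has an average record size of 100 bytes, so a range holding roughly 50,000 bytes is estimated at about 500 records.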

Index data that is still in the memtable and has not been flushed to disk is handled by special logic: each record is assumed to occupy 100 bytes, and the number of rows is derived from the size of the data in the memtable.
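As a rough sketch of the memtable rule above (the constant and function names are hypothetical, and rounding down is an assumption):

```python
MEMTABLE_BYTES_PER_RECORD = 100  # fixed per-record approximation from the text


def estimate_memtable_rows(memtable_bytes: int) -> int:
    """Approximate the row count of unflushed data from its size in the memtable."""
    return max(memtable_bytes, 0) // MEMTABLE_BYTES_PER_RECORD
```

With this rule, 4,096 bytes of memtable data would be counted as 40 rows, regardless of the actual row width.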

Summing the stats for the same index across files is a good approximation, but it has limitations. Multiple SST files may contain different values for the same key, and some of these entries may be delete markers (i.e., tombstones). For instance, after massive deletes an LSM tree may contain both a value and a tombstone for each key, so the summed stats can overstate the number of live rows.
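A toy model makes the discrepancy concrete. Each "file" below is a Python dict from key to value, ordered oldest first, with a sentinel standing in for a tombstone. All names are hypothetical, and the assumption that per-file row counts include only value entries (not tombstones) is made purely for illustration.

```python
TOMBSTONE = object()  # sentinel standing in for a delete marker


def summed_row_estimate(files):
    """Sum per-file row counts, as the merged stats would (values only)."""
    return sum(
        sum(1 for v in f.values() if v is not TOMBSTONE)
        for f in files
    )


def live_rows(files):
    """Resolve keys across files (oldest first) and count surviving rows."""
    merged = {}
    for f in files:
        merged.update(f)  # newer entries shadow older ones
    return sum(1 for v in merged.values() if v is not TOMBSTONE)


older = {"a": 1, "b": 2, "c": 3}
newer = {"a": TOMBSTONE, "b": TOMBSTONE, "c": 30}
```

Here `summed_row_estimate([older, newer])` counts four rows (three old values plus one new value), while `live_rows([older, newer])` finds only one, since "a" and "b" were deleted and "c" was overwritten.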

The table information_schema.rocksdb_index_file_map also exposes the size and record count of the data residing in SST files.
