Data Loading

One of the most common causes of stalls in LSM-tree databases like RocksDB is burst writes. Several types of data migration jobs, such as online schema changes and data loading, generate massive writes. For example, the InnoDB to MyRocks migration needed to dump data from InnoDB and load it into MyRocks. When MemTable flush and compaction cannot keep up with a heavy write ingestion rate, writes, including those from user queries, stall until flush and compaction make sufficient progress. Throttling was one way to mitigate this, but it increased overall migration time.

To optimize burst writes and ensure they do not interfere with user queries, we implemented a bulk loading capability in RocksDB, and MyRocks takes advantage of it. With bulk loading, MyRocks uses the RocksDB file ingestion API to create SST files. The new SST files are ingested directly into Lmax, and the RocksDB manifest is updated automatically and atomically to reflect the new data files. Bulk loading lets data migration jobs bypass the MemTable and the compactions through each RocksDB sorted run level, eliminating write stalls from those jobs. Bulk loading requires that ingested key ranges never overlap with existing data. For us, massive data writes typically occurred when creating new tables, so bulk loading supports this scenario.

The figure below shows a benchmark loading our test tables, which had both primary and secondary keys, into InnoDB and MyRocks, with and without bulk loading. The x-axis is the number of tables loaded in parallel, and the y-axis is rows inserted per second. With MyRocks bulk loading, insert throughput scaled linearly up to at least 20 tables, and was 2.2 times higher than InnoDB with one table at concurrency 1, and 5.7 times higher with 20 tables at concurrency 20. Bulk loading eliminated stalls caused by these migration jobs.

[Figure: BulkLoadingBench — rows inserted per second vs. number of tables loaded in parallel, for InnoDB and MyRocks with and without bulk loading]

On this page, we explain how to load data into MyRocks tables with bulk loading.

Sorted bulk loading (recommended)

If your source data is guaranteed to be loaded in primary key order, then this method is recommended. Sorted bulk loading steps are as follows.

  1. Create source data ordered by primary key

For example, if you migrate from a table in a different instance, you can dump the table ordered by primary key by running mysqldump with --order-by-primary.
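
Alternatively, you can write out the rows in primary key order with SELECT ... INTO OUTFILE, which pairs with the LOAD DATA statements used below. A minimal sketch (the database, table, column, and path names are placeholders):

SELECT * FROM source_db.t1 ORDER BY id INTO OUTFILE '/tmp/t1.txt';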

  2. Create empty MyRocks tables

Bulk loading supports secondary indexes too. However, the destination table must be empty if it has secondary indexes, because the RocksDB bulk loading API requires that key ranges do not overlap.
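
A minimal sketch of an empty destination table with a secondary index (the name and columns are hypothetical):

CREATE TABLE t1 (
  id BIGINT NOT NULL PRIMARY KEY,
  value VARCHAR(255),
  KEY idx_value (value)
) ENGINE=RocksDB;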

  3. Load into the tables with bulk loading enabled

First, run the following commands to change session variables.

SET SESSION sql_log_bin=0;
SET SESSION rocksdb_bulk_load_allow_sk=1;
SET SESSION rocksdb_bulk_load=1;

Setting sql_log_bin=0 avoids writing to the binary log. We recommend running bulk loading without going through binary logging (and replication). If you run bulk loading with binary logging enabled, replicas will hit severe lag, because the rocksdb_bulk_load* session variables are not written to the binary log, so replicas run the data loading without bulk loading, which takes much longer.

rocksdb_bulk_load is a session variable that enables bulk loading. With rocksdb_bulk_load=1, MyRocks enters a special mode that writes all inserts into the bottommost RocksDB level, skipping the MemTable and the subsequent compactions. This is a very efficient way to load data.

rocksdb_bulk_load_allow_sk is a session variable that enables bulk loading for secondary keys. Without rocksdb_bulk_load_allow_sk, only the primary key is written via bulk loading, and secondary keys go through the regular MemTable/WAL/flush/compaction write path. With rocksdb_bulk_load_allow_sk, MyRocks writes secondary key entries to temporary files (under rocksdb_tmpdir) in sorted order, then loads them into the secondary keys in batch when bulk loading finishes. MyRocks considers bulk loading finished when a LOAD DATA statement targets a different table, or when rocksdb_bulk_load_allow_sk is disabled.
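
If you want those temporary files on a different filesystem, you can point rocksdb_tmpdir elsewhere before loading (the path below is a placeholder):

SET SESSION rocksdb_tmpdir='/data/tmp';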

Loading into primary keys can be done with typical bulk loading commands, such as multi-row INSERT statements (e.g. mysqldump output) or LOAD DATA.

LOAD DATA INFILE ... INTO TABLE t1 ...;
LOAD DATA INFILE ... INTO TABLE t2 ...;
LOAD DATA INFILE ... INTO TABLE t3 ...;
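
Multi-row INSERT statements work the same way; a minimal sketch (the table and values are hypothetical):

INSERT INTO t1 VALUES (1,'a'),(2,'b'),(3,'c');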

After you finish loading the tables, disable bulk loading:

SET SESSION rocksdb_bulk_load=0;
SET SESSION rocksdb_bulk_load_allow_sk=0;

When converting large MyISAM/InnoDB tables, you can create the MyRocks tables as below.

SET SESSION sql_log_bin=0;
SET SESSION rocksdb_bulk_load_allow_sk=1;
SET SESSION rocksdb_bulk_load=1;
ALTER TABLE large_myisam_table ENGINE=RocksDB;
SET SESSION rocksdb_bulk_load=0;
SET SESSION rocksdb_bulk_load_allow_sk=0;

The rocksdb_bulk_load mode operates with a few conditions:

  1. None of the data being bulk loaded can overlap with existing data in the table. The easiest way to ensure this is to always bulk load into an empty table, but the mode also allows loading some data into the table, doing other operations, and then returning to bulk load additional data, as long as there is no overlap between what is being loaded and what already exists.

  2. The data may not be visible until bulk load mode is ended (i.e. rocksdb_bulk_load is set to zero again). The mode works by building up SST files that are later added as-is to the database. Until a particular SST file has been added, its data is not visible to the rest of the system, so a SELECT on a table currently being bulk loaded will only show older data and will likely not show the most recently added rows. Ending bulk load mode causes the most recent SST file to be added. When bulk loading multiple tables, starting on a new table triggers adding the most recent SST file to the system; as a result, it is inadvisable to interleave INSERT statements to two or more tables during bulk load mode. See the illustration below.
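
A minimal illustration of the visibility rule in condition 2, assuming an initially empty table t1:

SET SESSION rocksdb_bulk_load=1;
INSERT INTO t1 VALUES (1),(2),(3);
SELECT COUNT(*) FROM t1;           -- may return 0; the rows sit in a pending SST file
SET SESSION rocksdb_bulk_load=0;   -- ends bulk load mode and ingests the SST file
SELECT COUNT(*) FROM t1;           -- returns 3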

By default, the rocksdb_bulk_load mode expects all data to be inserted in primary key order (or in exactly reversed order). If the data is in reverse order (i.e. descending on a normally ordered primary key, or ascending on a reverse-ordered primary key), the rows are cached in chunks to switch the order to match the expected one.

Inserting one or more rows out of order will result in an error, and may leave some of the data inserted in the table and some not. To resolve the problem, either fix the order of the data being inserted, or truncate the table and restart.

Unsorted bulk loading

If your data is not in primary key order, then this method is recommended. With this method, secondary keys do not need to be dropped and restored. However, writes to the primary key no longer go directly to SST files; they are written to temporary files to be sorted first, so there is extra cost to this method.

To allow for loading unsorted data:

SET SESSION sql_log_bin=0;
SET SESSION rocksdb_bulk_load_allow_sk=1;
SET SESSION rocksdb_bulk_load_allow_unsorted=1;
SET SESSION rocksdb_bulk_load=1;
...
SET SESSION rocksdb_bulk_load=0;
SET SESSION rocksdb_bulk_load_allow_unsorted=0;
SET SESSION rocksdb_bulk_load_allow_sk=0;

Note that rocksdb_bulk_load_allow_unsorted can only be changed while rocksdb_bulk_load is disabled (set to 0). In this mode, all input data goes through an intermediate step that writes the rows to temporary SST files, sorts the rows in primary key order, and then writes them to the final SST files in the correct order.

Other Approaches

If rocksdb_bulk_load does not fit your needs, there are two other settings that might help: unique_checks and rocksdb_commit_in_the_middle. By default, unique_checks is enabled and rocksdb_commit_in_the_middle is disabled.

If unique_checks is disabled, MyRocks skips unique constraint checking. In other words, you must ensure yourself that the source data does not violate unique constraints. If a unique constraint is violated, older rows are silently overwritten by newer rows with the same key. REPLACE and INSERT ON DUPLICATE KEY UPDATE statements are not allowed with this option (errors are returned).
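
A minimal sketch (the load statement is a placeholder):

SET SESSION unique_checks=0;
LOAD DATA INFILE ... INTO TABLE t1 ...;
SET SESSION unique_checks=1;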

If rocksdb_commit_in_the_middle is enabled, MyRocks implicitly commits every rocksdb_bulk_load_size records (1,000 by default) in the middle of your transaction. If your data load fails in the middle of a statement (LOAD DATA or a bulk INSERT), the rows are not entirely rolled back; some of the rows remain stored in the table. To restart the data load, you will need to truncate the table and load the data again.
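
For example (the batch size shown is illustrative, and the load statement is a placeholder):

SET SESSION rocksdb_commit_in_the_middle=1;
SET SESSION rocksdb_bulk_load_size=10000;
LOAD DATA INFILE ... INTO TABLE t1 ...;
SET SESSION rocksdb_commit_in_the_middle=0;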

If you load a large amount of data without enabling rocksdb_bulk_load or rocksdb_commit_in_the_middle, make sure each transaction is small enough, because all modifications of an ongoing transaction are kept in memory.
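
One simple way to keep transactions small is to split the source file beforehand; a minimal sketch, assuming autocommit is enabled (the default) and hypothetical file and table names:

-- with autocommit enabled, each LOAD DATA is its own transaction
LOAD DATA INFILE '/tmp/t1.part1.txt' INTO TABLE t1;
LOAD DATA INFILE '/tmp/t1.part2.txt' INTO TABLE t1;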
