
Faster Deletes


MyRocks provides session variables that make deletes faster by sacrificing some transaction semantics:

  • Skip holding row locks (rocksdb-master-skip-tx-api)
  • Blind Deletes by primary key (rocksdb-blind-delete-primary-key)
  • Implicit commits on range deletes (rocksdb-commit-in-the-middle)

Skipping row locks

With the rocksdb-master-skip-tx-api session variable enabled, MyRocks does not use RocksDB's Transaction API and uses RocksDB's WriteBatch API instead. Since the WriteBatch API is significantly faster than the Transaction API, you'll be able to update or delete many more rows per second. However, this comes with an important limitation: with rocksdb-master-skip-tx-api enabled, the session no longer holds any row locks. It is therefore the user's responsibility to make sure that row updates/deletes do not conflict. Improper use of this feature can cause data corruption or inconsistency.
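
As a rough sketch (assuming a table t with an integer primary key id; note that the variable name uses underscores in SQL):

    SET SESSION rocksdb_master_skip_tx_api = ON;
    -- No row locks are taken from here on; the application must guarantee
    -- that no other session updates or deletes these rows concurrently.
    DELETE FROM t WHERE id = 1;
    DELETE FROM t WHERE id = 2;
    SET SESSION rocksdb_master_skip_tx_api = OFF;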

Blind Deletes by primary key

With the rocksdb-blind-delete-primary-key session variable enabled, MyRocks skips verifying whether rows exist before executing deletes, which speeds them up. This is called a "Blind Delete". Blind Delete is used only if ALL of the following conditions are met:

  1. The rocksdb-blind-delete-primary-key session variable is ON

  2. The target table has a primary key and no secondary keys

  3. Rows are deleted by primary key

  4. Only a single table is given in the DELETE statement

For example, Blind Delete is not used for DELETE .. WHERE id < 10, because the rows are not identified by exact primary key values. If Blind Delete is not used, MyRocks falls back to traditional deletes (looking up rows and issuing deletes only if the rows exist).
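
A short sketch of which statements qualify, assuming a table t whose only key is the integer primary key id:

    SET SESSION rocksdb_blind_delete_primary_key = ON;

    -- Qualifies: single table, rows identified by exact primary key values
    DELETE FROM t WHERE id = 42;
    DELETE FROM t WHERE id IN (1, 2, 3);

    -- Does not qualify: range predicate, so MyRocks falls back to
    -- the traditional lookup-then-delete path
    DELETE FROM t WHERE id < 10;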

If there are many known primary keys to delete, you can get very high delete throughput by combining the rocksdb-blind-delete-primary-key and rocksdb-master-skip-tx-api session variables. Bulk Delete syntax is highly recommended to eliminate query parsing overhead (i.e. DELETE FROM t WHERE id IN (1, 2, 3, 4, 5, 6, ...., 10000)).
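
Combining both variables might look like the following sketch (the IN list is shortened here; in practice it would contain thousands of keys):

    SET SESSION rocksdb_blind_delete_primary_key = ON;
    SET SESSION rocksdb_master_skip_tx_api = ON;
    -- One statement with many keys amortizes query parsing overhead
    DELETE FROM t WHERE id IN (1, 2, 3, 4, 5);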

With Blind Deletes, be careful not to execute deletes for the same primary key multiple times. MyRocks no longer checks whether keys exist; it simply issues a Delete for each given primary key. Deleting the same key many times generates many tombstones in RocksDB, which can slow down read performance. On slaves, you also need to configure Read Free Replication so that repeated deletes for the same key do not stop replication.

Implicit commits on range deletes

If you want to delete many rows whose primary keys are not known, the rocksdb-commit-in-the-middle session variable is helpful. With this session variable enabled, MyRocks implicitly commits transactions every 1000 rows (configurable via rocksdb-bulk-load-size). In general, modifying too many rows within a single transaction causes performance and memory-consumption issues. By committing implicitly along the way, bulk modifications issued by a single query become much easier to manage.
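
A minimal sketch, assuming a table t with an indexed created_at column to drive the range delete:

    SET SESSION rocksdb_commit_in_the_middle = ON;
    -- Optional: change the implicit commit interval from the default 1000 rows
    SET SESSION rocksdb_bulk_load_size = 1000;
    -- Commits implicitly every rocksdb_bulk_load_size rows
    DELETE FROM t WHERE created_at < '2017-01-01';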

If a statement fails partway through, the rows processed so far may already be committed. You should therefore not enable this variable unless it is safe to re-run your statements.

Using commit-in-the-middle inside explicit transactions is also not recommended, because it implicitly commits your transaction, making rollbacks impossible. In addition, commit-in-the-middle uses a separate iterator to scan for keys, so any modifications local to the current transaction will not be visible in that same transaction.
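
To illustrate the rollback pitfall (a hypothetical sequence, not a recommended pattern):

    SET SESSION rocksdb_commit_in_the_middle = ON;
    BEGIN;
    -- Rows are committed implicitly every 1000 rows while this runs
    DELETE FROM t WHERE created_at < '2017-01-01';
    ROLLBACK;  -- too late: rows already committed in the middle stay deleted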
