Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
Fractal trees store messages in non-leaf buffers. Optimize table pushes a broadcast optimize messages into the root of the fractal tree. Optimize table also flushes messages that are buffered in the non-leaf nodes of the fractal tree to the leaf nodes. When an optimize message is executed on a leaf entry, it cause the leaf entry to be garbage collected.
When optimize table completes, the fractal tree will have most if not all of the buffered messages pushed down to the leaf nodes where they are applied to the leaf entries.
MySQL 5.5 schedules optimize table with a shared no read no write meta data lock. The MDL_SHARED_NO_READ_WRITE meta data locking level is used. This lock prohibits concurrent reads and concurrent writes to the table while the table is being optimized.
TokuDB allows concurrent reads and writes to the table while the table is being optimized. We added a handler flag (HA_CAN_WRITE_DURING_OPTIMIZE) which TokuDB sets. This causes MySQL to downgrade the meta data lock from MDL_SHARED_NO_READ_WRITE to MDL_SHARED_WRITE, which allows concurrent reads and writes.
For MySQL 5.6 and MariaDB 10, we translate optimize table to alter table reorganize. This translation leverages the MDL manipulation built into the inplace alter table code, and this allows use to provide online optimize table without a patch to the MySQL and MariaDB code.
While optimize table is running, it periodically updates the process info with its progress.
Table optimization is a necessary evil; tables sometimes need to be optimized to reclaim space or to improve query performance. Unfortunately, MySQL blocks writes to a table while it is being optimized. Because optimization time is proportional to the table size, writes can be blocked for a long time. Fractal Tree indexes support online optimization; however, the MySQL metadata lock gets in the way of writing while optimizing. We will describe a simple patch to MySQL that enables online optimization of TokuDB tables.
Why do tables need to be optimized? Here are some reasons.
- Insertions with random keys can result in a tree with underutilized leaf blocks. Many tree algorithms split nodes in half when they become full. If these nodes are stored in fixed sized blocks, like many B-trees do, then there can be a lot of wasted space. Table optimization of B-trees write blocks with less fragmentation. In contrast, Fractal Tree indexes do not have this problem since we use variable sized blocks.
- As B-tree’s age, the leaf nodes that are adjacent in the key sort order are spread all over the disk. As a result, range queries spend a lot of time waiting on disk seeks. Table optimization builds a new tree with leaf blocks written in sort order. In contrast, Fractal Tree indexes do not have this problem since we write very large blocks.
- TokuDB’s Fractal Tree indexes may need to be optimized to take advantage of all the latest and greatest algorithms delivered in newer versions of the software. For example, basement nodes were recently introduced into the TokuDB product. Table optimization converts blocks from the old format to the new format, which results in much better performance.
To keep this operation hot, we added a storage engine flag that states that the storage engine can optimize a table with concurrent reads and writes. If this flag is set for the optimize operation, the table’s metadata lock is downgraded to a level that allows concurrent reads and writes to the table. Since the TokuDB storage engine implements this flag, TokuDB tables can be read and written while optimize is running. A change to enable hot table optimization for TokuDB (and we assume other storage engines), can be found in this patch, which we think could be useful for both MySQL v5.5 as well as the upcoming MySQL v5.6.