
Clone SST Copy

Herman Lee edited this page Feb 17, 2023 · 2 revisions

MyRocks rolling checkpoint support does not fit well into the existing clone handlerton API. The rolling checkpoint should not happen too early in the clone process, so that the lifetime of the final checkpoint is minimized. It also cannot happen too late: once RocksDB file deletions are disabled and the participating storage engines are synchronized, the RocksDB checkpoint can no longer be rolled. That point is reached in the middle of InnoDB clone_copy. Thus neither of the existing clone handlerton APIs, clone_begin or clone_copy, fits the copying of the rolling checkpoint.

Extend the donor-side handlerton clone API with a new precopy method:

  • using Clone_sstcopy_t = int (*)(handlerton *hton, THD *thd, const uchar *loc, uint loc_len, uint task_id, Ha_clone_cbk *cbk): similar to clone_copy, this method can perform any non-consistent data copy that makes the eventual consistent data copy smaller. For MyRocks, it performs the rolling checkpoint copy.
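As a rough illustration of how a donor might drive the new method, here is a minimal, self-contained C++ model. It is a sketch, not the server's code: only the Clone_sstcopy_t signature comes from this page, while the stand-in THD, Ha_clone_cbk and handlerton types, the clone_sstcopy field name, Clone_engine, run_sstcopy and myrocks_clone_sstcopy are all hypothetical. Engines that leave the hook null are simply skipped, which is one plausible way to treat engines with nothing to precopy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for server types so the sketch compiles on its own;
// the real definitions live in the MySQL server sources.
using uchar = unsigned char;
using uint = unsigned int;
struct THD {};
struct Ha_clone_cbk {};

struct handlerton;

// Precopy method type, with the signature given on this page.
using Clone_sstcopy_t = int (*)(handlerton *hton, THD *thd, const uchar *loc,
                                uint loc_len, uint task_id, Ha_clone_cbk *cbk);

// Hypothetical, heavily simplified handlerton: only the field this sketch
// needs. nullptr means the engine has no non-consistent data to precopy.
struct handlerton {
  std::string name;
  Clone_sstcopy_t clone_sstcopy = nullptr;
};

// Hypothetical: an engine taking part in the clone, with its clone locator.
struct Clone_engine {
  handlerton *hton;
  std::vector<uchar> loc;
};

// Hypothetical MyRocks precopy. Real code would copy the rolling checkpoint's
// SST files through cbk; this stub only counts invocations.
int g_myrocks_sstcopy_calls = 0;
int myrocks_clone_sstcopy(handlerton *, THD *, const uchar *, uint, uint,
                          Ha_clone_cbk *) {
  ++g_myrocks_sstcopy_calls;
  return 0;
}

// Hypothetical donor-side driver: call the precopy method of every engine
// that provides one, stopping at the first non-zero error code.
int run_sstcopy(const std::vector<Clone_engine> &engines, THD *thd,
                uint task_id, Ha_clone_cbk *cbk) {
  for (const Clone_engine &e : engines) {
    assert(e.hton != nullptr);
    if (e.hton->clone_sstcopy == nullptr) continue;  // nothing to precopy
    const int err = e.hton->clone_sstcopy(e.hton, thd, e.loc.data(),
                                          static_cast<uint>(e.loc.size()),
                                          task_id, cbk);
    if (err != 0) return err;
  }
  return 0;
}
```

Passing each engine its own locator mirrors how the existing clone methods take a per-engine loc/loc_len pair.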

To perform the SST copy, introduce a new InnoDB clone stage between the page copy and redo log copy stages, keeping the redo log archiver running throughout it. Two alternative designs were considered: doing the precopy during the page copy to redo log copy stage transition, and doing it as part of the page copy stage. The former breaks clone autotuning, because new thread admission is blocked while a stage transition is underway. The latter had clone restart correctness issues.

Thus the InnoDB clone stages become:

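[Figure: InnoDB clone stages, with the new SST copy stage between page copy and redo log copy (image: innodb-clone-patch-5)]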
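The stage ordering can be modelled with a small C++ sketch. It is illustrative only: the enumerator names loosely follow the stock InnoDB clone snapshot states, SST_COPY is a hypothetical name for the added stage, and the rule that the redo log archiver runs from page copy through redo log copy is this sketch's own assumption, inferred from the requirement that the archiver keep running across the SST copy stage.

```cpp
#include <cassert>

// Illustrative donor-side InnoDB clone stages after this change. Names loosely
// mirror stock InnoDB snapshot states; SST_COPY is a hypothetical name.
enum class Clone_stage { INIT, FILE_COPY, PAGE_COPY, SST_COPY, REDO_COPY, DONE };

// Stage order: the SST copy stage sits between page copy and redo log copy.
Clone_stage next_stage(Clone_stage s) {
  switch (s) {
    case Clone_stage::INIT:      return Clone_stage::FILE_COPY;
    case Clone_stage::FILE_COPY: return Clone_stage::PAGE_COPY;
    case Clone_stage::PAGE_COPY: return Clone_stage::SST_COPY;
    case Clone_stage::SST_COPY:  return Clone_stage::REDO_COPY;
    case Clone_stage::REDO_COPY: return Clone_stage::DONE;
    case Clone_stage::DONE:      break;
  }
  assert(false && "no stage after DONE");
  return Clone_stage::DONE;
}

// Assumption for this sketch: the redo log archiver is already running during
// page copy and stops only after redo log copy, so it spans SST_COPY.
bool redo_archiver_running(Clone_stage s) {
  return s == Clone_stage::PAGE_COPY || s == Clone_stage::SST_COPY ||
         s == Clone_stage::REDO_COPY;
}

// Only the added stage invokes the engines' precopy (Clone_sstcopy_t) method.
bool runs_sstcopy(Clone_stage s) { return s == Clone_stage::SST_COPY; }
```

Because SST_COPY is a regular stage rather than part of a transition, new worker threads can still be admitted while it runs, which is the autotuning property the design relies on.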

For the interplay between the added stage and Cross-Engine Synchronization for Clone, see Clone SST Copy and Synchronization Together.
