
Add BackupEngine feature to exclude files #11030

Closed
wants to merge 10 commits

Conversation

pdillinger (Contributor)

Summary: We have a request for RocksDB to essentially support
disconnected incremental backup. In other words, if there is limited
or no connectivity to the primary backup dir, we should still be able to
take an incremental backup relative to that primary backup dir,
assuming some metadata about that primary dir is available (and
obviously anticipating that the primary backup dir will be fully
available if a restore is needed).

To support that, this feature allows the API user to "exclude" DB
files from backup. This only applies to files that can be shared
between backups (sst and blob files), and excluded files are
tracked in the backup metadata sufficiently to ensure they are
restored at restore time. At restore time, the user provides
a set of alternate backup directories (as open BackupEngines, which
can be read-only), and excluded files must be found in one of the
backup directories ("included" in some backup).

This feature depends on backup schema version 2 features, though
schema version 2.0 support is not sufficient to read / restore a
backup with exclusions. This change updates the schema version to
2.1 because of this feature, so that it's easy to recognize whether
a RocksDB release supports this feature, while backups not using the
feature are fully compatible with 2.0.

Also in this PR:

  • Stacked on Fix bug updating latest backup on delete #11029
  • Allow progress_callback to be empty, not just no-op function, and
    recover from exceptions thrown by BackupEngine callbacks.
  • The internal-only AsBackupEngine() function is working around the
    diamond hierarchy of BackupEngineImplThreadSafe to get to the
    internals, without using confusing features like virtual inheritance.

Test Plan: unit tests added / updated

From the stacked commit (#11029):

Summary: Previously, the "latest" valid backup would not be updated on
delete.

Test Plan: unit test included (added to an existing test for efficiency)
@facebook-github-bot

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot

@pdillinger has updated the pull request. You must reimport the pull request before landing.

ajkr (Contributor) commented Dec 21, 2022

How does this approach compare to the user's Env presenting a "virtual" shared directory containing all files pooled across the various alternate_dirs? The approach in this PR appears to give flexibility to the application in deciding what files to exclude, and hides a detail that is not appropriate for the Env abstraction layer (which directory is the shared directory). A downside might be API complexity, probably for a single-user feature. Are there other tradeoffs, or what is the most compelling reason for this approach?

pdillinger (Contributor, Author)

> How does this approach compare to the user's Env presenting a "virtual" shared directory containing all files pooled across the various alternate_dirs?

To the BackupEngine, files in the recognized directories that are not owned by any backup are garbage files, which are not usable as is and often deleted. To present such remote files in a way that would allow them to "work" for incremental backups, without similar changes to what is done here, you would also need to create fake or unavailable backups referencing those unavailable files.

Also discussing in private channels.

ajkr (Contributor) left a comment


There is one design alternative mentioned in a comment. I don't know if it's better and don't feel strongly, so I am accepting anyway. LGTM, thanks!

```cpp
try {
  progress_callback();
} catch (const std::exception& exn) {
  io_s = IOStatus::Aborted("Exception in exclude_files_callback: " +
```

s/exclude_files_callback/progress_callback/

```cpp
  break;
} catch (...) {
  io_s =
      IOStatus::Aborted("Unknown exception in exclude_files_callback");
```

same

Comment on lines 2498 to 2510

```cpp
if (excludable_items != nullptr && shared && shared_checksum &&
    need_to_copy) {
  ROCKS_LOG_INFO(options_.info_log, "Copying (if not excluded) %s to %s",
                 fname.c_str(), copy_dest_path->c_str());
  excludable_items->emplace_back(std::move(copy_or_create_work_item),
                                 std::move(after_copy_or_create_work_item));
} else {
  ROCKS_LOG_INFO(options_.info_log, "Copying %s to %s", fname.c_str(),
                 copy_dest_path->c_str());
  files_to_copy_or_create_.write(std::move(copy_or_create_work_item));
  backup_items_to_finish.push_back(
      std::move(after_copy_or_create_work_item));
}
```

Adding work items and processing them feels a bit more scattered. What if you queue the work items no matter excludable or not (still keeping track of excludable filenames and still invoking the callback in CreateNewBackup*()), then make the worker threads wait for the exclusion list to be ready and let them use it to skip file copies, and then call AddExcludedFile for excluded files in the finishing loop?

pdillinger (Author)

Right now the file copying can begin while the checkpoint is being taken. I was worried about regressing that by forcing all file copying to wait for the checkpoint to finish, even though it's probably a minor difference.

@facebook-github-bot

@pdillinger merged this pull request in 02f2b20.
