Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New backup meta schema, with file temperatures #9660

Closed
wants to merge 5 commits into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Mar 5, 2022

Summary:
The primary goal of this change is to add support for backing up and
restoring (applying on restore) file temperature metadata, without
committing to either the DB manifest or the FS reported "current"
temperatures being exclusive "source of truth".

To achieve this goal, we need to add temperature information to backup
metadata, which requires updated backup meta schema. Fortunately I
prepared for this in #8069, which began forward compatibility in version
6.19.0 for this kind of schema update. (Previously, backup meta schema
was not extensible! Making this schema update public will allow some
other "nice to have" features like taking backups with hard links, and
avoiding crc32c checksum computation when another checksum is already
available.) While schema version 2 is newly public, the default schema
version is still 1. Until we change the default, users will need to set
to 2 to enable features like temperature data backup+restore. New
metadata like temperature information will be ignored with a warning
in versions before this change and since 6.19.0. The metadata is
considered ignorable because a functioning DB can be restored without
it.

Some detail:

  • Some renaming because "future schema" is now just public schema 2.
  • Initialize some atomics in TestFs (linter reported)
  • Add temperature hint support to SstFileDumper (used by BackupEngine)

Test Plan: related unit test majorly updated for the new functionality,
including some shared testing support for tracking temperatures in a FS.

Some other tests and testing hooks into production code also updated for
making the backup meta schema change public.

Summary:
The primary goal of this change is to add support for backing up and
restoring (applying on restore) file temperature metadata, without
committing to either the DB manifest or the FS reported temperatures
being exclusive "source of truth".

To achieve this goal, we need to add temperature information to backup
metadata, which requires updated backup meta schema. Fortunately I
prepared for this in facebook#8609, which began forward compatibility in version
6.19.0 for this kind of schema update. (Previously, backup meta schema
was not extensible! Making this schema update public will allow some
other "nice to have" features like taking backups with hard links, and
avoiding crc32c checksum computation when another checksum is already
available.) While schema version 2 is newly public, the default schema
version is still 1. Until we change the default, users will need to set
to 2 to enable features like temperature data backup+restore. New
metadata like temperature information will be ignored with a warning
in versions before this change and since 6.19.0. The metadata is
considered ignorable because a functioning DB can be restored without
it.

Some detail:
* Some renaming because "future schema" is now just public schema 2.
* Initialize some atomics in TestFs (linter reported)

TODO?: consolidate terminology around either "actual" or "current"
rather than mixing the two.

Test Plan: related unit test majorly updated for the new functionality,
including some shared testing support for tracking temperatures in a FS.

Some other tests and testing hooks into production code also updated for
making the backup meta schema change public.
@pdillinger
Copy link
Contributor Author

pdillinger commented Mar 7, 2022

Requesting @ajkr review publishing backup schema_version=2 and @jay-zhuang look at temperature-related functionality and testing

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -1424,14 +1442,20 @@ IOStatus BackupEngineImpl::CreateNewBackupWithMetadata(
item.result.wait();
auto result = item.result.get();
item_io_status = result.io_status;
Temperature temp = result.expected_src_temperature;
if (result.current_src_temperature != Temperature::kUnknown &&
(temp == Temperature::kUnknown ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that if the DB runs with non tiered storage mode, we will always go to check temperature from the file system?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like filesystem is always queried no matter the storage mode.

Copy link
Contributor Author

@pdillinger pdillinger Mar 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FileSystem object is always queried, which for non-custom FS is a no-op. Therefore, putting this behind an option would just create one more way to do the wrong thing with probably no tangible benefit vs. just always trying to do the right thing. Querying the Temperature of an open file should be an in-memory operation, because the custom FileSystem should know what tier it is talking to. And if you really want to hide Temperature info or avoid querying, you can do that with a FileSystem wrapper that always returns Temperature::kUnknown.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if Siying is satisfied with the answer to his question

@@ -1424,14 +1442,20 @@ IOStatus BackupEngineImpl::CreateNewBackupWithMetadata(
item.result.wait();
auto result = item.result.get();
item_io_status = result.io_status;
Temperature temp = result.expected_src_temperature;
if (result.current_src_temperature != Temperature::kUnknown &&
(temp == Temperature::kUnknown ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like filesystem is always queried no matter the storage mode.

// option is false (default), (b) overrides (a) if both are not UNKNOWN.
// When true, (a) overrides (b) if both are not UNKNOWN. Regardless of this
// setting, a known temperature overrides UNKNOWN.
bool current_temperatures_override_manifest = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it feels more natural to restore the DB files to the pre-backup storage state, i.e., according to GetTemperature(). It'd be a behavior change though.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants