Skip to content

Commit

Permalink
Make backups openable as read-only DBs (#8142)
Browse files Browse the repository at this point in the history
Summary:
A current limitation of backups is that you don't know the
exact database state of when the backup was taken. With this new
feature, you can at least inspect the backup's DB state without
restoring it by opening it as a read-only DB.

Rather than add something like OpenAsReadOnlyDB to the BackupEngine API,
which would inhibit opening stackable DB implementations read-only
(if/when their APIs support it), we instead provide a DB name and Env
that can be used to open as a read-only DB.

Possible follow-up work:

* Add a version of GetBackupInfo for a single backup.
* Let CreateNewBackup return the BackupID of the newly-created backup.

Implementation details:

Refactored ChrootFileSystem to split off new base class RemapFileSystem,
which allows more general remapping of files. We use this base class to
implement BackupEngineImpl::RemapSharedFileSystem.

To minimize API impact, I decided to just add these fields `name_for_open`
and `env_for_open` to those set by GetBackupInfo when
include_file_details=true. Creating the RemapSharedFileSystem adds a bit
to the memory consumption, perhaps unnecessarily in some cases, but this
has been mitigated by (a) only initialize the RemapSharedFileSystem
lazily when GetBackupInfo with include_file_details=true is called, and
(b) using the existing `shared_ptr<FileInfo>` objects to hold most of the
mapping data.

To enhance API safety, RemapSharedFileSystem is wrapped by new
ReadOnlyFileSystem which rejects any attempts to write. This uncovered a
couple of places in which DB::OpenForReadOnly would write to the
filesystem, so I fixed these. Added a release note because this affects
logging.

Additional minor refactoring in backupable_db.cc to support the new
functionality.

Pull Request resolved: #8142

Test Plan:
new test (run with ASAN and UBSAN), added to stress test and
ran it for a while with amplified backup_one_in

Reviewed By: ajkr

Differential Revision: D27535408

Pulled By: pdillinger

fbshipit-source-id: 04666d310aa0261ef6b2385c43ca793ce1dfd148
  • Loading branch information
pdillinger authored and facebook-github-bot committed Apr 6, 2021
1 parent 09528f9 commit 879357f
Show file tree
Hide file tree
Showing 19 changed files with 908 additions and 360 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -651,6 +651,7 @@ set(SOURCES
env/env_hdfs.cc
env/file_system.cc
env/file_system_tracer.cc
env/fs_remap.cc
env/mock_env.cc
file/delete_scheduler.cc
file/file_prefetch_buffer.cc
Expand Down
4 changes: 4 additions & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
### Behavior Changes
* `ColumnFamilyOptions::sample_for_compression` now takes effect for creation of all block-based tables. Previously it only took effect for block-based tables created by flush.
* `CompactFiles()` can no longer compact files from lower level to up level, which has the risk to corrupt DB (details: #8063). The validation is also added to all compactions.
* Fixed some cases in which DB::OpenForReadOnly() could write to the filesystem. If you want a Logger with a read-only DB, you must now set DBOptions::info_log yourself, such as using CreateLoggerFromOptions().

### Bug Fixes
* Use thread-safe `strerror_r()` to get error messages.
Expand All @@ -18,6 +19,9 @@
* Added `TableProperties::slow_compression_estimated_data_size` and `TableProperties::fast_compression_estimated_data_size`. When `ColumnFamilyOptions::sample_for_compression > 0`, they estimate what `TableProperties::data_size` would have been if the "fast" or "slow" (see `ColumnFamilyOptions::sample_for_compression` API doc for definitions) compression had been used instead.
* Update DB::StartIOTrace and remove Env object from the arguments as its redundant and DB already has Env object that is passed down to IOTracer::StartIOTrace

### New Features
* Added the ability to open BackupEngine backups as read-only DBs, using BackupInfo::name_for_open and env_for_open provided by BackupEngine::GetBackupInfo() with include_file_details=true.

## 6.19.0 (03/21/2021)
### Bug Fixes
* Fixed the truncation error found in APIs/tools when dumping block-based SST files in a human-readable format. After fix, the block-based table can be fully dumped as a readable file.
Expand Down
2 changes: 2 additions & 0 deletions TARGETS
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,7 @@ cpp_library(
"env/file_system.cc",
"env/file_system_tracer.cc",
"env/fs_posix.cc",
"env/fs_remap.cc",
"env/io_posix.cc",
"env/mock_env.cc",
"file/delete_scheduler.cc",
Expand Down Expand Up @@ -526,6 +527,7 @@ cpp_library(
"env/file_system.cc",
"env/file_system_tracer.cc",
"env/fs_posix.cc",
"env/fs_remap.cc",
"env/io_posix.cc",
"env/mock_env.cc",
"file/delete_scheduler.cc",
Expand Down
12 changes: 7 additions & 5 deletions db/db_impl/compacted_db_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,13 @@ extern void MarkKeyMayExist(void* arg);
extern bool SaveValue(void* arg, const ParsedInternalKey& parsed_key,
const Slice& v, bool hit_and_return);

CompactedDBImpl::CompactedDBImpl(
const DBOptions& options, const std::string& dbname)
: DBImpl(options, dbname), cfd_(nullptr), version_(nullptr),
user_comparator_(nullptr) {
}
CompactedDBImpl::CompactedDBImpl(const DBOptions& options,
const std::string& dbname)
: DBImpl(options, dbname, /*seq_per_batch*/ false, +/*batch_per_txn*/ true,
/*read_only*/ true),
cfd_(nullptr),
version_(nullptr),
user_comparator_(nullptr) {}

CompactedDBImpl::~CompactedDBImpl() {
}
Expand Down
5 changes: 3 additions & 2 deletions db/db_impl/db_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -146,10 +146,11 @@ void DumpSupportInfo(Logger* logger) {
} // namespace

DBImpl::DBImpl(const DBOptions& options, const std::string& dbname,
const bool seq_per_batch, const bool batch_per_txn)
const bool seq_per_batch, const bool batch_per_txn,
bool read_only)
: dbname_(dbname),
own_info_log_(options.info_log == nullptr),
initial_db_options_(SanitizeOptions(dbname, options)),
initial_db_options_(SanitizeOptions(dbname, options, read_only)),
env_(initial_db_options_.env),
io_tracer_(std::make_shared<IOTracer>()),
immutable_db_options_(initial_db_options_),
Expand Down
11 changes: 7 additions & 4 deletions db/db_impl/db_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,8 @@ class Directories {
class DBImpl : public DB {
public:
DBImpl(const DBOptions& options, const std::string& dbname,
const bool seq_per_batch = false, const bool batch_per_txn = true);
const bool seq_per_batch = false, const bool batch_per_txn = true,
bool read_only = false);
// No copying allowed
DBImpl(const DBImpl&) = delete;
void operator=(const DBImpl&) = delete;
Expand Down Expand Up @@ -1236,7 +1237,7 @@ class DBImpl : public DB {
virtual bool OwnTablesAndLogs() const { return true; }

// Set DB identity file, and write DB ID to manifest if necessary.
Status SetDBId();
Status SetDBId(bool read_only);

// REQUIRES: db mutex held when calling this function, but the db mutex can
// be released and re-acquired. Db mutex will be held when the function
Expand Down Expand Up @@ -2231,9 +2232,11 @@ class DBImpl : public DB {
BlobFileCompletionCallback blob_callback_;
};

extern Options SanitizeOptions(const std::string& db, const Options& src);
extern Options SanitizeOptions(const std::string& db, const Options& src,
bool read_only = false);

extern DBOptions SanitizeOptions(const std::string& db, const DBOptions& src);
extern DBOptions SanitizeOptions(const std::string& db, const DBOptions& src,
bool read_only = false);

extern CompressionType GetCompressionFlush(
const ImmutableCFOptions& ioptions,
Expand Down
14 changes: 10 additions & 4 deletions db/db_impl/db_impl_files.cc
Original file line number Diff line number Diff line change
Expand Up @@ -854,7 +854,7 @@ uint64_t PrecomputeMinLogNumberToKeep2PC(
return min_log_number_to_keep;
}

Status DBImpl::SetDBId() {
Status DBImpl::SetDBId(bool read_only) {
Status s;
// Happens when immutable_db_options_.write_dbid_to_manifest is set to true
// the very first time.
Expand All @@ -865,9 +865,15 @@ Status DBImpl::SetDBId() {
// it is no longer available then at this point DB ID is not in Identity
// file or Manifest.
if (s.IsNotFound()) {
s = SetIdentityFile(env_, dbname_);
if (!s.ok()) {
return s;
// Create a new DB ID, saving to file only if allowed
if (read_only) {
db_id_ = env_->GenerateUniqueId();
return Status::OK();
} else {
s = SetIdentityFile(env_, dbname_);
if (!s.ok()) {
return s;
}
}
} else if (!s.ok()) {
assert(s.IsIOError());
Expand Down
12 changes: 7 additions & 5 deletions db/db_impl/db_impl_open.cc
Original file line number Diff line number Diff line change
Expand Up @@ -24,15 +24,17 @@
#include "util/rate_limiter.h"

namespace ROCKSDB_NAMESPACE {
Options SanitizeOptions(const std::string& dbname, const Options& src) {
auto db_options = SanitizeOptions(dbname, DBOptions(src));
Options SanitizeOptions(const std::string& dbname, const Options& src,
bool read_only) {
auto db_options = SanitizeOptions(dbname, DBOptions(src), read_only);
ImmutableDBOptions immutable_db_options(db_options);
auto cf_options =
SanitizeOptions(immutable_db_options, ColumnFamilyOptions(src));
return Options(db_options, cf_options);
}

DBOptions SanitizeOptions(const std::string& dbname, const DBOptions& src) {
DBOptions SanitizeOptions(const std::string& dbname, const DBOptions& src,
bool read_only) {
DBOptions result(src);

if (result.env == nullptr) {
Expand All @@ -50,7 +52,7 @@ DBOptions SanitizeOptions(const std::string& dbname, const DBOptions& src) {
&result.max_open_files);
}

if (result.info_log == nullptr) {
if (result.info_log == nullptr && !read_only) {
Status s = CreateLoggerFromOptions(dbname, result, &result.info_log);
if (!s.ok()) {
// No place suitable for logging
Expand Down Expand Up @@ -488,7 +490,7 @@ Status DBImpl::Recover(
if (!s.ok()) {
return s;
}
s = SetDBId();
s = SetDBId(read_only);
if (s.ok() && !read_only) {
s = DeleteUnreferencedSstFiles();
}
Expand Down
11 changes: 7 additions & 4 deletions db/db_impl/db_impl_readonly.cc
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,8 @@ namespace ROCKSDB_NAMESPACE {

DBImplReadOnly::DBImplReadOnly(const DBOptions& db_options,
const std::string& dbname)
: DBImpl(db_options, dbname) {
: DBImpl(db_options, dbname, /*seq_per_batch*/ false,
/*batch_per_txn*/ true, /*read_only*/ true) {
ROCKS_LOG_INFO(immutable_db_options_.info_log,
"Opening the db in read only mode");
LogFlush(immutable_db_options_.info_log);
Expand Down Expand Up @@ -131,8 +132,8 @@ Status DBImplReadOnly::NewIterators(
}

namespace {
// Return OK if dbname exists in the file system
// or create_if_missing is false
// Return OK if dbname exists in the file system or create it if
// create_if_missing
Status OpenForReadOnlyCheckExistence(const DBOptions& db_options,
const std::string& dbname) {
Status s;
Expand All @@ -143,14 +144,16 @@ Status OpenForReadOnlyCheckExistence(const DBOptions& db_options,
uint64_t manifest_file_number;
s = VersionSet::GetCurrentManifestPath(dbname, fs.get(), &manifest_path,
&manifest_file_number);
} else {
// Historic behavior that doesn't necessarily make sense
s = db_options.env->CreateDirIfMissing(dbname);
}
return s;
}
} // namespace

Status DB::OpenForReadOnly(const Options& options, const std::string& dbname,
DB** dbptr, bool /*error_if_wal_file_exists*/) {
// If dbname does not exist in the file system, should not do anything
Status s = OpenForReadOnlyCheckExistence(options, dbname);
if (!s.ok()) {
return s;
Expand Down
54 changes: 42 additions & 12 deletions db_stress_tool/db_stress_test_base.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1368,8 +1368,13 @@ Status StressTest::TestBackupRestore(
}
}
std::vector<BackupInfo> backup_info;
// If inplace_not_restore, we verify the backup by opening it as a
// read-only DB. If !inplace_not_restore, we restore it to a temporary
// directory for verification.
bool inplace_not_restore = thread->rand.OneIn(3);
if (s.ok()) {
backup_engine->GetBackupInfo(&backup_info);
backup_engine->GetBackupInfo(&backup_info,
/*include_file_details*/ inplace_not_restore);
if (backup_info.empty()) {
s = Status::NotFound("no backups found");
from = "BackupEngine::GetBackupInfo";
Expand All @@ -1385,8 +1390,8 @@ Status StressTest::TestBackupRestore(
}
const bool allow_persistent = thread->tid == 0; // not too many
bool from_latest = false;
if (s.ok()) {
int count = static_cast<int>(backup_info.size());
int count = static_cast<int>(backup_info.size());
if (s.ok() && !inplace_not_restore) {
if (count > 1) {
s = backup_engine->RestoreDBFromBackup(
RestoreOptions(), backup_info[thread->rand.Uniform(count)].backup_id,
Expand All @@ -1404,7 +1409,9 @@ Status StressTest::TestBackupRestore(
}
}
}
if (s.ok()) {
if (s.ok() && !inplace_not_restore) {
// Purge early if restoring, to ensure the restored directory doesn't
// have some secret dependency on the backup directory.
uint32_t to_keep = 0;
if (allow_persistent) {
// allow one thread to keep up to 2 backups
Expand Down Expand Up @@ -1432,10 +1439,21 @@ Status StressTest::TestBackupRestore(
for (auto name : column_family_names_) {
cf_descriptors.emplace_back(name, ColumnFamilyOptions(restore_options));
}
s = DB::Open(DBOptions(restore_options), restore_dir, cf_descriptors,
&restored_cf_handles, &restored_db);
if (!s.ok()) {
from = "DB::Open in backup/restore";
if (inplace_not_restore) {
BackupInfo& info = backup_info[thread->rand.Uniform(count)];
restore_options.env = info.env_for_open.get();
s = DB::OpenForReadOnly(DBOptions(restore_options), info.name_for_open,
cf_descriptors, &restored_cf_handles,
&restored_db);
if (!s.ok()) {
from = "DB::OpenForReadOnly in backup/restore";
}
} else {
s = DB::Open(DBOptions(restore_options), restore_dir, cf_descriptors,
&restored_cf_handles, &restored_db);
if (!s.ok()) {
from = "DB::Open in backup/restore";
}
}
}
// Note the column families chosen by `rand_column_families` cannot be
Expand Down Expand Up @@ -1476,17 +1494,29 @@ Status StressTest::TestBackupRestore(
}
}
}
if (backup_engine != nullptr) {
delete backup_engine;
backup_engine = nullptr;
}
if (restored_db != nullptr) {
for (auto* cf_handle : restored_cf_handles) {
restored_db->DestroyColumnFamilyHandle(cf_handle);
}
delete restored_db;
restored_db = nullptr;
}
if (s.ok() && inplace_not_restore) {
// Purge late if inplace open read-only
uint32_t to_keep = 0;
if (allow_persistent) {
// allow one thread to keep up to 2 backups
to_keep = thread->rand.Uniform(3);
}
s = backup_engine->PurgeOldBackups(to_keep);
if (!s.ok()) {
from = "BackupEngine::PurgeOldBackups";
}
}
if (backup_engine != nullptr) {
delete backup_engine;
backup_engine = nullptr;
}
if (s.ok()) {
// Preserve directories on failure, or allowed persistent backup
if (!allow_persistent) {
Expand Down
Loading

0 comments on commit 879357f

Please sign in to comment.