Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail DB::Open() if logger cannot be created #9984

Closed

Conversation

riversand963
Copy link
Contributor

@riversand963 riversand963 commented May 12, 2022

For regular db instance and secondary instance, we return error and refuse to open DB if Logger creation fails.

Our current code allows it, but it is really difficult to debug because
there will be no LOG files. The same for OPTIONS file, which will be explored in another PR.

Furthermore, Arena::AllocateAligned(size_t bytes, size_t huge_page_size, Logger* logger) has an
assertion as the following:

#ifdef MAP_HUGETLB
if (huge_page_size > 0 && bytes > 0) {
  assert(logger != nullptr);
}
#endif

It can be removed.

Test Plan:
make check

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ajkr
Copy link
Contributor

ajkr commented May 16, 2022

Before reviewing this PR, a bigger question: for non-read-only instance,
should we return error and refuse to open DB if Logger creation in
SanitizeOptions() fails?

I think that would be good. Also consider requiring failure for OPTIONS file errors - there is currently a fail_if_options_file_error flag that is false by default.

@jay-zhuang
Copy link
Contributor

+1 for failing the open if Logger creation failed.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@riversand963 riversand963 changed the title Remove an assertion of logger in Arena Fail DB::Open() if logger cannot be created May 26, 2022
@riversand963
Copy link
Contributor Author

At this point, I expect there will be a lot of errors due to a bug related to non-empty options.db_log_dir.

@riversand963
Copy link
Contributor Author

riversand963 commented May 26, 2022

Also consider requiring failure for OPTIONS file errors - there is currently a fail_if_options_file_error flag that is false by default.

We can explore this in another PR since it's another behavior change.
Somebody asked a similar question in #9941

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Before reviewing this PR, a bigger question: for non-read-only instance,
should we return error and refuse to open DB if Logger creation in
`SanitizeOptions()` fails?
Our current code allows it, but it is really difficult to debug because
there will be no LOG files.

Summary:
Arena::AllocateAligned(size_t bytes, size_t huge_page_size, Logger* logger) has an
assertion as the following:

```cpp
\#ifdef MAP_HUGETLB
if (huge_page_size > 0 && bytes > 0) {
  assert(logger != nullptr);
}
\#endif
```

This assertion is based on wrong assumption. The logger comes from
ConcurrentArena::AllocateAligned(..., Logger* logger = nullptr) and can
be nullptr by default. Tracing back further, logger comes from
immutable_db_options.info_log which can be nullptr if
`CreateLoggerFromOptions()` return non-ok status. In this case, RocksDB
currently still proceeds.

```cpp
if (result.info_log == nullptr && !read_only) {
  Status s = CreateLoggerFromOptions(...);
  if (!s.ok()) {
    result.info_log = nullptr;
  }
}
```

An actual occurence of the backtrace will be provided

Test Plan:
make check
@facebook-github-bot
Copy link
Contributor

@riversand963 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@riversand963 riversand963 deleted the fix-assertion-hugetlb-alloc branch May 27, 2022 15:09
@ajkr
Copy link
Contributor

ajkr commented May 28, 2022

Also consider requiring failure for OPTIONS file errors - there is currently a fail_if_options_file_error flag that is false by default.

We can explore this in another PR since it's another behavior change.
Somebody asked a similar question in #9941

I remembered realizing a problem with this last year - #8275 (comment)

facebook-github-bot pushed a commit that referenced this pull request May 31, 2022
Summary:
After #9984, BackupEngineTest.Concurrency becomes flaky.

During DB::Open(), someone else can rename/remove the LOG file, causing
this thread's `CreateLoggerFromOptions()` to fail. The reason is that the operation sequence
of "FileExists -> Rename" is not atomic. It's possible that a FileExists() returns OK, but the file
gets deleted before Rename(), causing the latter to return IOError with PathNotFound subcode.

Although it's not encouraged to concurrently modify the contents of the directories managed by
the database instance in this case, we can still perform some simple handling to make DB::Open()
more robust. In this case, we can check if a racing thread has deleted the original LOG file, we can
allow this thread to continue creating a new LOG file.

Pull Request resolved: #10069

Test Plan: ~/gtest-parallel/gtest-parallel -r 100 ./backup_engine_test --gtest_filter=BackupEngineTest.Concurrency

Reviewed By: jay-zhuang

Differential Revision: D36736913

Pulled By: riversand963

fbshipit-source-id: 3cbe92d77ca175e55e586bdb1a32ac8107217ae6
facebook-github-bot pushed a commit that referenced this pull request Jun 22, 2022
Summary:
#9984 changes the behavior of RocksDB: if logger creation failed during `SanitizeOptions()`,
`DB::Open()` will fail. However, since `SanitizeOptions()` is called in `DBImpl::DBImpl()`, we cannot
directly expose the error to caller without some additional work.
This is a first version proposal which:
- Adds a new member `init_logger_creation_s` to `DBImpl` to store the result of init logger creation
- Checks the error during `DB::Open()` and return it to caller if non-ok

This is not very ideal. We can alternatively move the logger creation logic out of the `SanitizeOptions()`.
Since `SanitizeOptions()` is used in other places, we need to check whether this change breaks anything
in case other callers of `SanitizeOptions()` assumes that a logger should be created.

Pull Request resolved: #10223

Test Plan: make check

Reviewed By: pdillinger

Differential Revision: D37321717

Pulled By: riversand963

fbshipit-source-id: 58042358a86369d606549dd9938933dd47591c4b
pdillinger pushed a commit to pdillinger/rocksdb that referenced this pull request Jun 22, 2022
Summary:
facebook#9984 changes the behavior of RocksDB: if logger creation failed during `SanitizeOptions()`,
`DB::Open()` will fail. However, since `SanitizeOptions()` is called in `DBImpl::DBImpl()`, we cannot
directly expose the error to caller without some additional work.
This is a first version proposal which:
- Adds a new member `init_logger_creation_s` to `DBImpl` to store the result of init logger creation
- Checks the error during `DB::Open()` and return it to caller if non-ok

This is not very ideal. We can alternatively move the logger creation logic out of the `SanitizeOptions()`.
Since `SanitizeOptions()` is used in other places, we need to check whether this change breaks anything
in case other callers of `SanitizeOptions()` assumes that a logger should be created.

Pull Request resolved: facebook#10223

Test Plan: make check

Reviewed By: pdillinger

Differential Revision: D37321717

Pulled By: riversand963

fbshipit-source-id: 58042358a86369d606549dd9938933dd47591c4b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants