Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some fixes and enhancements to ldb repair #8544

Closed
wants to merge 7 commits into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Jul 16, 2021

Summary:

  • Basic handling of SST file with just range tombstones rather than
    failing assertion about smallest_seqno <= largest_seqno
  • Adds --verbose option so that there exists a way to see the INFO
    output from Repairer.

Test Plan: unit test added, manual testing for --verbose

Summary:
* Properly handles SST file with just range tombstones rather than
failing assertion about smallest_seqno <= largest_seqno
* Adds --verbose option so that there exists a way to see the INFO
output from Repairer.

Test Plan: TODO: unit test
@pdillinger pdillinger removed the request for review from ajkr July 21, 2021 14:59
Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the range tombstones can stick out arbitrarily past the file's MANIFEST-specified bounds. This doc (https://github.com/cockroachdb/pebble/blob/master/docs/range_deletions.md) explains why the final state we are in is necessary (although tbh I still feel we arrived there by chance).

So including range tombstones can extend a file's bounds beyond what it is supposed to cover. Meanwhile (before this PR) excluding range tombstones could make a file's bounds too small. Maybe we can just add a comment that the current solution is known to be problematic.

@ajkr
Copy link
Contributor

ajkr commented Jul 21, 2021

I wonder if we should add table properties for smallest/largest key stored to MANIFEST, at least when they don't match the smallest/largest key in the file.

@pdillinger
Copy link
Contributor Author

pdillinger commented Jul 21, 2021

I see, it should ideally have the extra logic that is in compaction_job.cc but not in builder.cc (re: UpdateBoundariesForRange)?

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 21, 2021
Summary: FileOptions has an implicit conversion from EnvOptions and some
internal APIs take `const FileOptions&` and save the reference, which is
counter to Google C++ guidelines,

> Avoid defining functions that require a const reference parameter to outlive the call, because const reference parameters bind to temporaries. Instead, find a way to eliminate the lifetime requirement (for example, by copying the parameter), or pass it by const pointer and document the lifetime and non-null requirements.

This is at least a problem for repair.cc, which passes an EnvOptions to
TableCache(), which would save a reference to the temporary copy as
FileOptions. This was unfortunately only caught as a side effect of
changes in facebook#8544.

This change fixes the repair.cc case and updates the involved internal
APIs that save a reference to use `const FileOptions*` instead.

Unfortunately, I don't know how to get any of our sanitizers to reliably
report bugs like this, so I can't rule out more existing in our
codebase.

Test Plan: Test that issues seen with facebook#8544 are fixed (can reproduce on
AWS EC2)
@ajkr
Copy link
Contributor

ajkr commented Jul 21, 2021

I see, it should ideally have the extra logic that is in compaction_job.cc but not in builder.cc (re: UpdateBoundariesForRange)?

Right, CompactionJob uses its knowledge about the neighboring files to decide.

@pdillinger
Copy link
Contributor Author

Unfortunately, there is some other issue (alleged memory leak) being reported by ASAN with this change, even with fix #8571 applied.

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 27, 2021
Summary: This appears to be little used code so not a major bug, but is
blocking facebook#8544

Test Plan: Added regression test to the end of
DBRangeDelTest::TableEvictedDuringScan. Without this fix, ASAN reports
memory leak.
facebook-github-bot pushed a commit that referenced this pull request Jul 28, 2021
…8589)

Summary:
This appears to be little used code so not a major bug, but is
blocking #8544

Pull Request resolved: #8589

Test Plan:
Added regression test to the end of
DBRangeDelTest::TableEvictedDuringScan. Without this fix, ASAN reports
memory leak.

Reviewed By: ajkr

Differential Revision: D29943623

Pulled By: pdillinger

fbshipit-source-id: f7115fa6d4440aef83888ff609aa03d09216463b
facebook-github-bot pushed a commit that referenced this pull request Jul 28, 2021
Summary:
FileOptions has an implicit conversion from EnvOptions and some
internal APIs take `const FileOptions&` and save the reference, which is
counter to Google C++ guidelines,

> Avoid defining functions that require a const reference parameter to outlive the call, because const reference parameters bind to temporaries. Instead, find a way to eliminate the lifetime requirement (for example, by copying the parameter), or pass it by const pointer and document the lifetime and non-null requirements.

This is at least a problem for repair.cc, which passes an EnvOptions to
TableCache(), which would save a reference to the temporary copy as
FileOptions. This was unfortunately only caught as a side effect of
changes in #8544.

This change fixes the repair.cc case and updates the involved internal
APIs that save a reference to use `const FileOptions*` instead.

Unfortunately, I don't know how to get any of our sanitizers to reliably
report bugs like this, so I can't rule out more existing in our
codebase.

Pull Request resolved: #8571

Test Plan:
Test that issues seen with #8544 are fixed (can reproduce on
AWS EC2)

Reviewed By: ajkr

Differential Revision: D29943890

Pulled By: pdillinger

fbshipit-source-id: 95f9c5251548777b4dc994c1a083dd2add5799c9
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pdillinger
Copy link
Contributor Author

@ajkr After more consideration (but not willing to invest a lot of time into this flawed repair tool), do you think it's better only to run this logic if counter==0, i.e. the SST file is only range tombstones?

@ajkr
Copy link
Contributor

ajkr commented Jul 28, 2021

@ajkr After more consideration (but not willing to invest a lot of time into this flawed repair tool), do you think it's better only to run this logic if counter==0, i.e. the SST file is only range tombstones?

Hm, not particularly. A couple options that feel consistent to me are:

  • Always consider range tombstone boundaries that stick out past the point keys (i.e., this PR).
  • Never consider range tombstone boundaries sticking out past the point keys. In case of counter == 0 this would mean dropping the file.

@facebook-github-bot
Copy link
Contributor

@pdillinger merged this pull request in 0804b44.

jaepil pushed a commit to deepsearch-hq/rocksdb-backup that referenced this pull request Aug 1, 2021
…acebook#8589)

Summary:
This appears to be little used code so not a major bug, but is
blocking facebook#8544

Pull Request resolved: facebook#8589

Test Plan:
Added regression test to the end of
DBRangeDelTest::TableEvictedDuringScan. Without this fix, ASAN reports
memory leak.

Reviewed By: ajkr

Differential Revision: D29943623

Pulled By: pdillinger

fbshipit-source-id: f7115fa6d4440aef83888ff609aa03d09216463b
jaepil pushed a commit to deepsearch-hq/rocksdb-backup that referenced this pull request Aug 1, 2021
Summary:
FileOptions has an implicit conversion from EnvOptions and some
internal APIs take `const FileOptions&` and save the reference, which is
counter to Google C++ guidelines,

> Avoid defining functions that require a const reference parameter to outlive the call, because const reference parameters bind to temporaries. Instead, find a way to eliminate the lifetime requirement (for example, by copying the parameter), or pass it by const pointer and document the lifetime and non-null requirements.

This is at least a problem for repair.cc, which passes an EnvOptions to
TableCache(), which would save a reference to the temporary copy as
FileOptions. This was unfortunately only caught as a side effect of
changes in facebook#8544.

This change fixes the repair.cc case and updates the involved internal
APIs that save a reference to use `const FileOptions*` instead.

Unfortunately, I don't know how to get any of our sanitizers to reliably
report bugs like this, so I can't rule out more existing in our
codebase.

Pull Request resolved: facebook#8571

Test Plan:
Test that issues seen with facebook#8544 are fixed (can reproduce on
AWS EC2)

Reviewed By: ajkr

Differential Revision: D29943890

Pulled By: pdillinger

fbshipit-source-id: 95f9c5251548777b4dc994c1a083dd2add5799c9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants