New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WritePrepared Txn: Compaction/Flush #2926
Conversation
Work in progress: Pending test. |
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@yiwu-arbug has updated the pull request. View: changes, changes since last import |
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@yiwu-arbug has updated the pull request. View: changes, changes since last import |
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
5a64352
to
363ddc2
Compare
@yiwu-arbug has updated the pull request. |
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
363ddc2
to
aa79ae5
Compare
@yiwu-arbug has updated the pull request. |
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
db/compaction_iterator.cc
Outdated
@@ -292,6 +302,18 @@ void CompactionIterator::NextFromInput() { | |||
ikey_.user_key = current_key_.GetUserKey(); | |||
} | |||
|
|||
if (UNLIKELY(!current_key_committed_)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put a TODO to change this to LIKELY when our default txn policy changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will refactor a bit to make this cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1st pass
db/compaction_iterator.cc
Outdated
@@ -563,10 +588,13 @@ void CompactionIterator::PrepareOutput() { | |||
} | |||
} | |||
|
|||
inline SequenceNumber CompactionIterator::findEarliestVisibleSnapshot( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why inline is removed?
db/compaction_job_test.cc
Outdated
versions_.get(), &shutting_down_, &log_buffer, | ||
nullptr, nullptr, nullptr, &mutex_, &bg_error_, | ||
snapshots, earliest_write_conflict_snapshot, | ||
nullptr, table_cache_, &event_logger, false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extend this test to include snapshot_checker?
@@ -1213,6 +1219,10 @@ class DBImpl : public DB { | |||
std::unordered_map<uint64_t, uint64_t> prepared_section_completed_; | |||
std::mutex prep_heap_mutex_; | |||
|
|||
// Callback for compaction to check if a key is visible to a snapshot. | |||
// REQUIRES: mutex held |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
smart pointers are not thread-safe, and I'm not making it constant since I set it during WritePrepareTxnDB::init() which is after DBImpl open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it any overhead? Does compaction already holds the mutex_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only set the field once. Compaction hold mutex on-and-off. Just make sure the pointer is fetched while compaction is holding mutex.
db/db_impl_open.cc
Outdated
@@ -892,7 +892,7 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd, | |||
std::unique_ptr<InternalIterator>(mem->NewRangeTombstoneIterator(ro)), | |||
&meta, cfd->internal_comparator(), | |||
cfd->int_tbl_prop_collector_factories(), cfd->GetID(), cfd->GetName(), | |||
snapshot_seqs, earliest_write_conflict_snapshot, | |||
snapshot_seqs, earliest_write_conflict_snapshot, nullptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why no snapshot_checker is passed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wanting to avoid adding yet another option to db_options, and we don't really need snapshot_checker before WritePreparedTxnDB init, which is after DBImpl open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then if we use SetSnapshotChecker to set it, does anybody uses this changed API of BuildTable to set the snapshot checker? If not can we remove it?
db/flush_job_test.cc
Outdated
FlushJob flush_job( | ||
dbname_, versions_->GetColumnFamilySet()->GetDefault(), db_options_, | ||
*cfd->GetLatestMutableCFOptions(), env_options_, versions_.get(), &mutex_, | ||
&shutting_down_, {}, kMaxSequenceNumber, nullptr, &job_context, nullptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we extend the test to work with snapshot checker?
db/repair.cc
Outdated
nullptr /* internal_stats */, TableFileCreationReason::kRecovery, | ||
nullptr /* event_logger */, 0 /* job_id */, Env::IO_HIGH, | ||
nullptr /* table_properties */, -1 /* level */, current_time); | ||
{}, kMaxSequenceNumber, nullptr /*snapshot_checker*/, kNoCompression, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does not repair need snapshot checker? does not it flush memtables when they are full?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transaction is not involved in repair (let me know if I'm wrong) and snapshot_checker is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could repair result into memtable flush? Does it involve any memtable insert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was assuming these will be no preparing transaction on db open, but I was wrong since there can be preparing transactions recovering from logs. Let me think..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the repair code doesn't support transactions, since it simply replay logs and write them into sst file but don't even look at prepare/commit markers.
db/snapshot_checker.h
Outdated
SequenceNumber snapshot_sequence) const { | ||
return sequence <= snapshot_sequence; | ||
} | ||
#endif // ROCKSDB_LITE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
! ROCKSDB_LITE
db/snapshot_checker.h
Outdated
#endif // !ROCKSDB_LITE | ||
}; | ||
|
||
#ifdef ROCKSDB_LITE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not moving it to cc file?
db/snapshot_checker.h
Outdated
|
||
bool SnapshotChecker::IsInSnapshot(SequenceNumber sequence, | ||
SequenceNumber snapshot_sequence) const { | ||
return sequence <= snapshot_sequence; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing. If we do not support WritePrepared in ROCKSDB_LITE, then SnapshotChecker should be set to nullptr. This function should never be called in that case so it should throw an exception if it is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snapshot_checker is not being used in LITE mode. The implementation here is just a place holder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that but providing an implementation that is never used could be confusing. Return true all the time with a proper in-line comment would make it more clear.
port::RWMutex old_commit_map_mutex_; | ||
port::RWMutex commit_cache_mutex_; | ||
port::RWMutex snapshots_mutex_; | ||
mutable port::RWMutex prepared_mutex_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why mutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding const
keyword to IsInSnapshot
call and mutable
here means holding mutex doesn't violate const
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2nd pass is done.
Instead of passing a blind nullptr to the functions it would be great if you add a const variable, assigned to nullptr with some comments that why it makes sense to pass nullptr in each case.
db/compaction_iterator.cc
Outdated
if (snapshot_checker_ != nullptr) { | ||
return findEarliestVisibleSnapshotWithSnapshotChecker(in, prev_snapshot); | ||
} | ||
SequenceNumber prev = kMaxSequenceNumber; | ||
for (const auto cur : *snapshots_) { | ||
assert(prev == kMaxSequenceNumber || prev <= cur); | ||
if (cur >= in) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference between this func and findEarliestVisibleSnapshotWithSnapshotChecker is the check in this line. Can we instead of adding findEarliestVisibleSnapshotWithSnapshotChecker, simply put the if-then condition here: i.e., if (snapshot_checker_ != nullptr) call snapsh_checker else (cur >= in).
ASSERT_EQ(expected_versions[i].sequence, versions[i].sequence); | ||
ASSERT_EQ(expected_versions[i].type, versions[i].type); | ||
if (versions[i].type != kTypeDeletion && | ||
versions[i].type != kTypeSingleDeletion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about Range Delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RangeDelete is not supported for transactions (seems even for StackableDB).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you assert that here? Just in case it changes in future.
ASSERT_OK(db->Put(WriteOptions(), "a", "dummy")); | ||
ASSERT_OK(db->Put(WriteOptions(), "z", "dummy")); | ||
ASSERT_OK(db->CompactRange(CompactRangeOptions(), nullptr, nullptr)); | ||
VerifyInternalKeys({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to make it easier to verify by eyeball checking? Currently it is not clear whether this values actually test the implementation or they are set to make the test pass. For example we can add an item to the verify list right after the final insertion for the key. This would not still help figuring the sequence number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any idea here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by adding the expected_seq variable.
delete transaction; | ||
} | ||
|
||
TEST_P(WritePreparedTransactionTest, CompactionShouldKeepSnapshotVisibleKeys) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we disable auto compaction to make sure the test is deterministic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
ASSERT_OK(db->Flush(FlushOptions())); | ||
db->ReleaseSnapshot(snapshot1); | ||
db->ReleaseSnapshot(snapshot3); | ||
// Dummy keys to avoid trivial move. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "move" mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compaction can trivially move file from lower level to upper level, which is called trivial move. When it happen CompactionIterator is not called and we are not testing actual compaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great if you put this response as an in-line comment.
VerifyInternalKeys({ | ||
{"key1", "value1_2", 5, kTypeValue}, | ||
// "value1_1" is visible to snapshot2. Also keys at bottom level visible | ||
// to earliest snapshot will output with seq = 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that IsInSnapshot should always return true for seq == 0? Can you add a unit test for that and make the code changes in IsInSnapshot to let the test pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do in a separate PR.
ASSERT_OK(db->Put(WriteOptions(), "z", "dummy")); | ||
ASSERT_OK(db->CompactRange(CompactRangeOptions(), nullptr, nullptr)); | ||
VerifyInternalKeys({ | ||
{"key1", "value1_2", 5, kTypeValue}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again it is hard to see if the seq 5 is verifying the code or it is a number we came up with to make the test pass. How about adding a variable exp_seq and ++ it after each operation. This way there would be less likely that we have got the math wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any thought?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the expected_seq variable.
} | ||
|
||
TEST_P(WritePreparedTransactionTest, | ||
CompactionShouldKeepSequenceForUncommittedKeys) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name sounds similar to CompactionShouldKeepUncommittedKeys. Perhaps add some comments to clarify what the test does and how these two are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment added.
aa79ae5
to
ac4d8fa
Compare
@yiwu-arbug has updated the pull request. |
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@yiwu-arbug has updated the pull request. View: changes, changes since last import |
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
e4fd2b2
to
19a84ed
Compare
@yiwu-arbug has updated the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@yiwu-arbug has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
// number equal to 0 if the key is not visible to earliest snapshot, based on | ||
// commit sequence number. | ||
TEST_P(WritePreparedTransactionTest, | ||
CompactionShouldKeepSequenceForUncommittedKeys) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will disable this test before landing, since without #2972 it will fail.
10452b2
to
254fab1
Compare
@yiwu-arbug has updated the pull request. |
@yiwu-arbug has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@@ -400,17 +400,21 @@ class Repairer { | |||
int64_t _current_time = 0; | |||
status = env_->GetCurrentTime(&_current_time); // ignore error | |||
const uint64_t current_time = static_cast<uint64_t>(_current_time); | |||
// Only TransactionDB make use of snapshot_checker and repair doesn't | |||
// currently support TransactionDB with uncommitted prepared keys in WAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Repair do any compaction? Does it need to be aware of uncommitted values? If yes, should we add a TODO to upgrade repair later?
a9fcbc1
to
939fc2d
Compare
@yiwu-arbug has updated the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yiwu-arbug is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
Update Compaction/Flush to support WritePreparedTxnDB: Add SnapshotChecker which is a proxy to query WritePreparedTxnDB::IsInSnapshot. Pass SnapshotChecker to DBImpl on WritePreparedTxnDB open. CompactionIterator use it to check if a key has been committed and if it is visible to a snapshot. In CompactionIterator:
Test Plan:
See the new tests.