Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WritePrepared Txn: Compaction/Flush #2926

Closed
wants to merge 16 commits into from

Conversation

yiwu-arbug
Copy link
Contributor

@yiwu-arbug yiwu-arbug commented Sep 25, 2017

Summary:
Update Compaction/Flush to support WritePreparedTxnDB: Add SnapshotChecker which is a proxy to query WritePreparedTxnDB::IsInSnapshot. Pass SnapshotChecker to DBImpl on WritePreparedTxnDB open. CompactionIterator use it to check if a key has been committed and if it is visible to a snapshot. In CompactionIterator:

  • check if key has been committed. If not, output uncommitted keys AS-IS.
  • use SnapshotChecker to check if key is visible to a snapshot when in need.
  • do not output key with seq = 0 if the key is not committed.

Test Plan:
See the new tests.

@yiwu-arbug
Copy link
Contributor Author

Work in progress: Pending test.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request. View: changes, changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request. View: changes, changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request.

@yiwu-arbug yiwu-arbug changed the title [WIP][Transaction] Compaction/Flush support for WritePreparedTxnDB [Transaction] Compaction/Flush support for WritePreparedTxnDB Sep 27, 2017
@yiwu-arbug yiwu-arbug changed the title [Transaction] Compaction/Flush support for WritePreparedTxnDB [Transaction] WritePrepared Txn: Compaction/Flush Sep 27, 2017
@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yiwu-arbug yiwu-arbug changed the title [Transaction] WritePrepared Txn: Compaction/Flush WritePrepared Txn: Compaction/Flush Sep 27, 2017
@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -292,6 +302,18 @@ void CompactionIterator::NextFromInput() {
ikey_.user_key = current_key_.GetUserKey();
}

if (UNLIKELY(!current_key_committed_)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put a TODO to change this to LIKELY when our default txn policy changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will refactor a bit to make this cleaner.

Copy link
Contributor

@maysamyabandeh maysamyabandeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1st pass

@@ -563,10 +588,13 @@ void CompactionIterator::PrepareOutput() {
}
}

inline SequenceNumber CompactionIterator::findEarliestVisibleSnapshot(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why inline is removed?

versions_.get(), &shutting_down_, &log_buffer,
nullptr, nullptr, nullptr, &mutex_, &bg_error_,
snapshots, earliest_write_conflict_snapshot,
nullptr, table_cache_, &event_logger, false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we extend this test to include snapshot_checker?

@@ -1213,6 +1219,10 @@ class DBImpl : public DB {
std::unordered_map<uint64_t, uint64_t> prepared_section_completed_;
std::mutex prep_heap_mutex_;

// Callback for compaction to check if a key is visible to a snapshot.
// REQUIRES: mutex held
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smart pointers are not thread-safe, and I'm not making it constant since I set it during WritePrepareTxnDB::init() which is after DBImpl open.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it any overhead? Does compaction already holds the mutex_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only set the field once. Compaction hold mutex on-and-off. Just make sure the pointer is fetched while compaction is holding mutex.

@@ -892,7 +892,7 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd,
std::unique_ptr<InternalIterator>(mem->NewRangeTombstoneIterator(ro)),
&meta, cfd->internal_comparator(),
cfd->int_tbl_prop_collector_factories(), cfd->GetID(), cfd->GetName(),
snapshot_seqs, earliest_write_conflict_snapshot,
snapshot_seqs, earliest_write_conflict_snapshot, nullptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why no snapshot_checker is passed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wanting to avoid adding yet another option to db_options, and we don't really need snapshot_checker before WritePreparedTxnDB init, which is after DBImpl open.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then if we use SetSnapshotChecker to set it, does anybody uses this changed API of BuildTable to set the snapshot checker? If not can we remove it?

FlushJob flush_job(
dbname_, versions_->GetColumnFamilySet()->GetDefault(), db_options_,
*cfd->GetLatestMutableCFOptions(), env_options_, versions_.get(), &mutex_,
&shutting_down_, {}, kMaxSequenceNumber, nullptr, &job_context, nullptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we extend the test to work with snapshot checker?

db/repair.cc Outdated
nullptr /* internal_stats */, TableFileCreationReason::kRecovery,
nullptr /* event_logger */, 0 /* job_id */, Env::IO_HIGH,
nullptr /* table_properties */, -1 /* level */, current_time);
{}, kMaxSequenceNumber, nullptr /*snapshot_checker*/, kNoCompression,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does not repair need snapshot checker? does not it flush memtables when they are full?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transaction is not involved in repair (let me know if I'm wrong) and snapshot_checker is not needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could repair result into memtable flush? Does it involve any memtable insert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was assuming these will be no preparing transaction on db open, but I was wrong since there can be preparing transactions recovering from logs. Let me think..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the repair code doesn't support transactions, since it simply replay logs and write them into sst file but don't even look at prepare/commit markers.

SequenceNumber snapshot_sequence) const {
return sequence <= snapshot_sequence;
}
#endif // ROCKSDB_LITE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! ROCKSDB_LITE

#endif // !ROCKSDB_LITE
};

#ifdef ROCKSDB_LITE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not moving it to cc file?


bool SnapshotChecker::IsInSnapshot(SequenceNumber sequence,
SequenceNumber snapshot_sequence) const {
return sequence <= snapshot_sequence;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing. If we do not support WritePrepared in ROCKSDB_LITE, then SnapshotChecker should be set to nullptr. This function should never be called in that case so it should throw an exception if it is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snapshot_checker is not being used in LITE mode. The implementation here is just a place holder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that but providing an implementation that is never used could be confusing. Return true all the time with a proper in-line comment would make it more clear.

port::RWMutex old_commit_map_mutex_;
port::RWMutex commit_cache_mutex_;
port::RWMutex snapshots_mutex_;
mutable port::RWMutex prepared_mutex_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why mutable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding const keyword to IsInSnapshot call and mutable here means holding mutex doesn't violate const.

Copy link
Contributor

@maysamyabandeh maysamyabandeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2nd pass is done.
Instead of passing a blind nullptr to the functions it would be great if you add a const variable, assigned to nullptr with some comments that why it makes sense to pass nullptr in each case.

if (snapshot_checker_ != nullptr) {
return findEarliestVisibleSnapshotWithSnapshotChecker(in, prev_snapshot);
}
SequenceNumber prev = kMaxSequenceNumber;
for (const auto cur : *snapshots_) {
assert(prev == kMaxSequenceNumber || prev <= cur);
if (cur >= in) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference between this func and findEarliestVisibleSnapshotWithSnapshotChecker is the check in this line. Can we instead of adding findEarliestVisibleSnapshotWithSnapshotChecker, simply put the if-then condition here: i.e., if (snapshot_checker_ != nullptr) call snapsh_checker else (cur >= in).

ASSERT_EQ(expected_versions[i].sequence, versions[i].sequence);
ASSERT_EQ(expected_versions[i].type, versions[i].type);
if (versions[i].type != kTypeDeletion &&
versions[i].type != kTypeSingleDeletion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about Range Delete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RangeDelete is not supported for transactions (seems even for StackableDB).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you assert that here? Just in case it changes in future.

ASSERT_OK(db->Put(WriteOptions(), "a", "dummy"));
ASSERT_OK(db->Put(WriteOptions(), "z", "dummy"));
ASSERT_OK(db->CompactRange(CompactRangeOptions(), nullptr, nullptr));
VerifyInternalKeys({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make it easier to verify by eyeball checking? Currently it is not clear whether this values actually test the implementation or they are set to make the test pass. For example we can add an item to the verify list right after the final insertion for the key. This would not still help figuring the sequence number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any idea here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by adding the expected_seq variable.

delete transaction;
}

TEST_P(WritePreparedTransactionTest, CompactionShouldKeepSnapshotVisibleKeys) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we disable auto compaction to make sure the test is deterministic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

ASSERT_OK(db->Flush(FlushOptions()));
db->ReleaseSnapshot(snapshot1);
db->ReleaseSnapshot(snapshot3);
// Dummy keys to avoid trivial move.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "move" mean here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compaction can trivially move file from lower level to upper level, which is called trivial move. When it happen CompactionIterator is not called and we are not testing actual compaction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great if you put this response as an in-line comment.

VerifyInternalKeys({
{"key1", "value1_2", 5, kTypeValue},
// "value1_1" is visible to snapshot2. Also keys at bottom level visible
// to earliest snapshot will output with seq = 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that IsInSnapshot should always return true for seq == 0? Can you add a unit test for that and make the code changes in IsInSnapshot to let the test pass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do in a separate PR.

ASSERT_OK(db->Put(WriteOptions(), "z", "dummy"));
ASSERT_OK(db->CompactRange(CompactRangeOptions(), nullptr, nullptr));
VerifyInternalKeys({
{"key1", "value1_2", 5, kTypeValue},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again it is hard to see if the seq 5 is verifying the code or it is a number we came up with to make the test pass. How about adding a variable exp_seq and ++ it after each operation. This way there would be less likely that we have got the math wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any thought?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the expected_seq variable.

}

TEST_P(WritePreparedTransactionTest,
CompactionShouldKeepSequenceForUncommittedKeys) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name sounds similar to CompactionShouldKeepUncommittedKeys. Perhaps add some comments to clarify what the test does and how these two are different.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment added.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request. View: changes, changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

// number equal to 0 if the key is not visible to earliest snapshot, based on
// commit sequence number.
TEST_P(WritePreparedTransactionTest,
CompactionShouldKeepSequenceForUncommittedKeys) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will disable this test before landing, since without #2972 it will fail.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request. View: changes, changes since last import

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -400,17 +400,21 @@ class Repairer {
int64_t _current_time = 0;
status = env_->GetCurrentTime(&_current_time); // ignore error
const uint64_t current_time = static_cast<uint64_t>(_current_time);
// Only TransactionDB make use of snapshot_checker and repair doesn't
// currently support TransactionDB with uncommitted prepared keys in WAL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Repair do any compaction? Does it need to be aware of uncommitted values? If yes, should we add a TODO to upgrade repair later?

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has updated the pull request.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiwu-arbug is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yiwu-arbug yiwu-arbug deleted the txn_compaction branch October 6, 2017 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants