-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rocksandra] Cassandra compaction filter for purge expired columns and rows #2588
Conversation
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks! |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks! |
@wpc updated the pull request - view changes |
@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@wpc updated the pull request - view changes - changes since last import |
@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@wpc updated the pull request - view changes - changes since last import |
@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@wpc updated the pull request - view changes - changes since last import |
@wpc updated the pull request - view changes - changes since last import |
@wpc updated the pull request - view changes - changes since last import |
@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@wpc updated the pull request - view changes - changes since last import |
@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
utilities/cassandra/format.h
Outdated
@@ -82,8 +82,8 @@ class ColumnBase { | |||
virtual int8_t Index() const; | |||
virtual std::size_t Size() const; | |||
virtual void Serialize(std::string* dest) const; | |||
|
|||
static std::unique_ptr<ColumnBase> Deserialize(const char* src, | |||
virtual bool Collectable() const = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GarbageCollectable for clarification maybe?
utilities/cassandra/format.cc
Outdated
@@ -112,7 +116,11 @@ void ExpiringColumn::Serialize(std::string* dest) const { | |||
rocksdb::cassandra::Serialize<int32_t>(ttl_, dest); | |||
} | |||
|
|||
std::unique_ptr<ExpiringColumn> ExpiringColumn::Deserialize( | |||
bool ExpiringColumn::Collectable() const { | |||
return Timestamp() + ttl_ * 1000000 < time(NULL) * 1000000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- not sure if we should use c++ 11 time style here: http://en.cppreference.com/w/cpp/chrono/time_point
- I believe we should turn it into a tombstone first, then compaction filter kicks in according to gc_grace_period.
utilities/cassandra/format.cc
Outdated
// * and there are no live data for the key in higher level. | ||
// We cannot get info for the second condidtion yet, so we | ||
// will just keep tombstone around for now. | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one possible solution is that we could make the gc_grace_seconds as part of compaction filter's constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. But we still do not have information on whether there are live data on the higher level sstable files yet. It feels weird passing in gc_grace_seconds but has no place using it.
@wpc updated the pull request - view changes - changes since last import |
@scv119 I updated the PR:
|
@wpc updated the pull request - view changes - changes since last import |
@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@wpc updated the pull request - view changes - changes since last import |
// This source code is licensed under the BSD-style license found in the | ||
// LICENSE file in the root directory of this source tree. An additional grant | ||
// of patent rights can be found in the PATENTS file in the same directory. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This source code is also licensed under the GPLv2 license found in the
// COPYING file in the root directory of this source tree.
I believe we also need add this two lines to all source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We recently updated the license to Apache+GPLv2, and updated the copyright headers in all the files in RocksDB. So please use those headers here and in all the new files.
utilities/cassandra/format.cc
Outdated
} | ||
return RowValue(std::move(new_columns), last_modified_time_); | ||
|
||
return RowValue(columns_, last_modified_time_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicated returns
namespace rocksdb { | ||
namespace cassandra { | ||
|
||
class CassandraCompactionFilter : public CompactionFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might worth add some comments on what's the expected behavior of this filter, and more importantly, the option purge_ttl_on_expiration
utilities/cassandra/format.h
Outdated
@@ -159,6 +166,9 @@ class RowValue { | |||
// otherwise it returns the max timestamp of containing columns. | |||
int64_t LastModifiedTime() const; | |||
void Serialize(std::string* dest) const; | |||
RowValue PurgeTTL(bool* changed) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ttl or TTL ? not sure the convention in Rocksdb but I see both TTL and Ttl is used in this PR.
utilities/cassandra/format.cc
Outdated
} | ||
return RowValue(std::move(new_columns), last_modified_time_); | ||
|
||
return RowValue(columns_, last_modified_time_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this line?
utilities/cassandra/format.cc
Outdated
for (auto& column : columns_) { | ||
if(column->Expired()) { | ||
if(shared_ptr<Tombstone> tombstone = column->ToTombstone()) { | ||
new_columns.push_back(tombstone); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use move instead of copy here should give us some performance gain.
utilities/cassandra/format.cc
Outdated
} | ||
|
||
bool ExpiringColumn::Expired() const { | ||
return Timestamp() + ttl_ * 1000000 < NowMicros(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http://en.cppreference.com/w/cpp/chrono/duration there are a bunch of utilities could help the conversion.
@wpc updated the pull request - view changes - changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logics looks good to me, few nits:
- GPLv2 licence in the comment header?
- consider using std::chrono:: to do the cast between second and microsecond.
I'll leave this to @sagar0 to approve.
@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Thanks @wpc for contributing!
|
@@ -301,6 +302,51 @@ TEST(RowValueTest, RowWithColumns) { | |||
std::memcmp(dest.c_str(), dest.c_str() + r.Size(), r.Size()) == 0); | |||
} | |||
|
|||
TEST(RowValueTest, PurgeTTLShouldRemvoeAllColumnExpired) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/RemvoeAllColumn/RemoveAllColumns/
EXPECT_FALSE(changed); | ||
} | ||
|
||
TEST(RowValueTest, ExpiresTTLShouldConvertExpiredColumnsToTombstone) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Expires/Expired/
@wpc updated the pull request - view changes - changes since last import |
@wpc updated the pull request - view changes - changes since last import |
@wpc updated the pull request - view changes - changes since last import |
@sagar0 and @scv119 I updated the PR base on the review feedback:
|
@wpc has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Major changes in this PR: