Include estimated bytes deleted by range tombstones in compensated file size #10734
Conversation
cbi42 force-pushed from 704581b to d7ace18
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
cbi42 force-pushed from d7ace18 to 9b4e2e6
@cbi42 has updated the pull request. You must reimport the pull request before landing.
cbi42 force-pushed from 8575490 to 80f759d
cbi42 force-pushed from 80f759d to 3cbf870
cbi42 force-pushed from 3cbf870 to 745f152
@@ -227,6 +227,13 @@ bool VersionEdit::EncodeTo(std::string* dst) const {
      std::string unique_id_str = EncodeUniqueIdBytes(&unique_id);
      PutLengthPrefixedSlice(dst, Slice(unique_id_str));
    }
    if (f.compensated_range_deletion_size) {
      PutVarint32(dst, kCompensatedRangeDeletionSize);
This change would add a new field kCompensatedRangeDeletionSize (how many bytes the range tombstones in the current file overlap with older levels) and store it in kNewFile4 records. Wondering if you have any concerns or suggestions about this? @siying @pdillinger
It looks OK to me.
I have yet to reconcile this conceptual gap in my head: the MANIFEST entry for a kNewFile4 currently only contains properties of the file itself. The FileMetaData structure does also have properties of the file as it relates to the rest of the DB (compensated_file_size). But those properties are recomputed every time the file is loaded, so if the rest of the LSM changes, those changes can be reflected in those properties' values.
I am wondering if we can start off with something similar here. For example, what if you compute stats for average range tombstone bytes covered and use that together with the persisted num_range_deletions to fill in FileMetaData::compensated_range_deletion_size? The downside (we might've talked about this before...) of course is that it totally misses an occasional extra-wide range tombstone. I wonder if that can be fixed in an incremental improvement, though, like if we ever have a metric for key distance that can be used to measure the width of all range tombstones in a file. We could store that in kNewFile4 since it's entirely within the scope of the file.
Approach LGTM. A few comments on the details.
I like this approach in that it's much more efficient, especially when there are a lot of range tombstones in a file.
I was thinking it would be the average GetApproximateSize() across a sample of range tombstones we examine during the stats calculation phase of DB open. Note that comment is a month old and I'm OK with the approach in this PR now too.
cbi42 force-pushed from d92f4c1 to ab9f418
LGTM, thanks!
cbi42 force-pushed from ab9f418 to 9986747
Summary: Compensate file sizes in compaction picking so that files with range tombstones are preferred and get compacted down earlier, as they tend to delete a lot of data. This PR adds a compensated_range_deletion_size field in FileMetaData that is computed during flush/compaction and persisted in MANIFEST. This value is added to compensated_file_size, which will be used for compaction picking. Currently, for a file in level L, compensated_range_deletion_size is set to the estimated bytes deleted by range tombstones of this file in all levels > L. This helps to reduce space amp when data in older levels is covered by range tombstones in level L.
Test plan: Verified that compensated_range_deletion_size reduces space amp as intended, without affecting write amp too much. The experiment setup was favorable for this optimization: large range tombstones issued infrequently. Command used:
In this experiment, each thread wrote 16 range tombstones over the duration of 30 minutes; each range tombstone has width 5M, which is 10% of the key space width. Results show this PR generates a smaller DB size.
Compaction stats from this PR:
Compaction stats from main: