
Include estimated bytes deleted by range tombstones in compensated file size #10734

Closed (merged). Wanted to merge 8 commits from the compensate-range-deletion branch.

Conversation

@cbi42 (Member) commented on Sep 26, 2022:

Summary: compensate file sizes in compaction picking so that files with range tombstones are preferred, causing them to be compacted down earlier, since they tend to delete a lot of data. This PR adds a compensated_range_deletion_size field to FileMetaData that is computed during flush/compaction and persisted in the MANIFEST. This value is added to compensated_file_size, which is used for compaction picking. Currently, for a file in level L, compensated_range_deletion_size is set to the estimated bytes deleted by this file's range tombstones in all levels > L. This helps reduce space amp when data in older levels is covered by range tombstones in level L.
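
To make the definition above concrete, here is a minimal C++ sketch of the computation. The function, struct, and callback names are hypothetical stand-ins, not the PR's actual code; in RocksDB the per-range estimate would come from something like versions->ApproximateSize(), which is discussed later in this thread.

```cpp
// Minimal sketch of the idea described above -- not the PR's actual code.
#include <cstdint>
#include <string>
#include <vector>

struct RangeTombstone {
  std::string start_key;
  std::string end_key;
};

// Hypothetical callback: estimated bytes stored in key range [start, end)
// across levels [first_level, last_level]. Stand-in for
// Version::ApproximateSize().
using ApproxSizeFn = uint64_t (*)(const std::string& start,
                                  const std::string& end, int first_level,
                                  int last_level);

// For a file placed at `level`, sum the bytes its range tombstones cover in
// all older levels (> level). The PR stores this estimate in
// FileMetaData::compensated_range_deletion_size and persists it in MANIFEST.
uint64_t EstimateCompensatedRangeDeletionSize(
    const std::vector<RangeTombstone>& tombstones, int level, int num_levels,
    ApproxSizeFn approx_size) {
  uint64_t total = 0;
  for (const auto& t : tombstones) {
    total += approx_size(t.start_key, t.end_key, level + 1, num_levels - 1);
  }
  // Compaction picking then uses roughly:
  //   compensated_file_size = raw file size
  //                         + point-tombstone compensation
  //                         + compensated_range_deletion_size
  // so files carrying wide range tombstones look "bigger" and get picked
  // for compaction sooner.
  return total;
}
```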

Test plan:

  • Added unit tests.
  • Benchmarked to check whether the above definition of compensated_range_deletion_size reduces space amp as intended without hurting write amp too much. The experiment setup favors this optimization: large range tombstones issued infrequently. Command used:

```
./db_bench -benchmarks=fillrandom,waitforcompaction,stats,levelstats -use_existing_db=false -avoid_flush_during_recovery=true -write_buffer_size=33554432 -level_compaction_dynamic_level_bytes=true -max_background_jobs=8 -max_bytes_for_level_base=134217728 -target_file_size_base=33554432 -writes_per_range_tombstone=500000 -range_tombstone_width=5000000 -num=50000000 -benchmark_write_rate_limit=8388608 -threads=16 -duration=1800 --max_num_range_tombstones=1000000000
```

In this experiment, each thread wrote 16 range tombstones over the 30-minute duration; each range tombstone has a width of 5M keys, which is 10% of the key-space width. Results show this PR produces a smaller DB size.

Compaction stats from this PR:

```
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      2/0   31.54 MB   0.5      0.0     0.0      0.0       8.4      8.4       0.0   1.0      0.0     63.4    135.56            110.94       544    0.249       0      0       0.0       0.0
  L4      3/0   96.55 MB   0.8     18.5     6.7     11.8      18.4      6.6       0.0   2.7     65.3     64.9    290.08            284.03       108    2.686    284M  1957K       0.0       0.0
  L5     15/0   404.41 MB   1.0     19.1     7.7     11.4      18.8      7.4       0.3   2.5     66.6     65.7    292.93            285.34       220    1.332    293M  3808K       0.0       0.0
  L6    143/0    4.12 GB   0.0     45.0     7.5     37.5      41.6      4.1       0.0   5.5     71.2     65.9    647.00            632.66       251    2.578    739M    47M       0.0       0.0
 Sum    163/0    4.64 GB   0.0     82.6    21.9     60.7      87.2     26.5       0.3  10.4     61.9     65.4   1365.58           1312.97      1123    1.216   1318M    52M       0.0       0.0
```

Compaction stats from main:

```
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.0      0.0     0.0      0.0       8.4      8.4       0.0   1.0      0.0     60.5    142.12            115.89       569    0.250       0      0       0.0       0.0
  L4      3/0   85.68 MB   1.0     17.7     6.8     10.9      17.6      6.7       0.0   2.6     62.7     62.3    289.05            281.79       112    2.581    272M  2309K       0.0       0.0
  L5     11/0   293.73 MB   1.0     18.8     7.5     11.2      18.5      7.2       0.5   2.5     64.9     63.9    296.07            288.50       220    1.346    288M  4365K       0.0       0.0
  L6    130/0    3.94 GB   0.0     51.5     7.6     43.9      47.9      3.9       0.0   6.3     67.2     62.4    784.95            765.92       258    3.042    848M    51M       0.0       0.0
 Sum    144/0    4.31 GB   0.0     88.0    21.9     66.0      92.3     26.3       0.5  11.0     59.6     62.5   1512.19           1452.09      1159    1.305   1409M    58M       0.0       0.0
```

@cbi42 force-pushed the compensate-range-deletion branch 2 times, most recently from 704581b to d7ace18 on September 26, 2022.
@facebook-github-bot: @cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot: @cbi42 has updated the pull request. You must reimport the pull request before landing.


@cbi42 requested a review from ajkr on October 1, 2022.

```diff
@@ -227,6 +227,13 @@ bool VersionEdit::EncodeTo(std::string* dst) const {
       std::string unique_id_str = EncodeUniqueIdBytes(&unique_id);
       PutLengthPrefixedSlice(dst, Slice(unique_id_str));
     }
+    if (f.compensated_range_deletion_size) {
+      PutVarint32(dst, kCompensatedRangeDeletionSize);
```
@cbi42 (Member, Author) commented:

This change would add a new field kCompensatedRangeDeletionSize (how many bytes the range tombstones in the current file overlap with older levels) and store it in kNewFile4 records. Wondering if you have any concerns or suggestions about this? @siying @pdillinger
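
The hunk above cuts off right after the tag write. As context, here is a hedged sketch of what the complete encode (and a presumed matching decode case) could look like, following the tag-plus-length-prefixed-payload convention used by other kNewFile4 custom fields; anything beyond the two lines shown in the diff is an assumption, not the PR's verbatim code.

```cpp
// Encode side (VersionEdit::EncodeTo) -- sketch; only the first two lines
// inside the if-block appear in the diff above.
if (f.compensated_range_deletion_size) {
  PutVarint32(dst, kCompensatedRangeDeletionSize);
  std::string compensated_range_deletion_size;
  PutVarint64(&compensated_range_deletion_size,
              f.compensated_range_deletion_size);
  PutLengthPrefixedSlice(dst, Slice(compensated_range_deletion_size));
}

// Decode side -- a presumed matching case when reading kNewFile4 records:
// case kCompensatedRangeDeletionSize:
//   if (!GetVarint64(&field, &f.compensated_range_deletion_size)) {
//     return "Corrupt compensated_range_deletion_size entry";  // assumed message
//   }
//   break;
```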

A contributor replied:

It looks OK to me.

@ajkr (Contributor) left a comment:

I have yet to reconcile this conceptual gap in my head: the MANIFEST entry for a kNewFile4 currently contains only properties of the file itself. The FileMetaData structure also has properties describing the file as it relates to the rest of the DB (compensated_file_size), but those properties are recomputed every time the file is loaded, so if the rest of the LSM changes, those changes can be reflected in the properties' values.

I am wondering if we can start off with something similar here. For example, what if you compute stats for average range tombstone bytes covered and use that together with the persisted num_range_deletions to fill in FileMetaData::compensated_range_deletion_size? The downside (we might've talked about this before...) of course is that it totally misses an occasional extra-wide range tombstone. I wonder if that can be fixed with an incremental improvement, though, like if we ever have a metric for key distance that can be used to measure the width of all range tombstones in a file. We could store that in kNewFile4 since it's entirely within the scope of the file.

@ajkr (Contributor) left a comment:

Approach LGTM. A few comments on the details.

(Inline review threads on db/builder.cc and db/db_impl/db_impl_open.cc, all since resolved.)
@cbi42 (Member, Author) commented on Dec 15, 2022:

> compute stats for average range tombstone bytes covered and use that together with the persisted num_range_deletions to fill in FileMetaData::compensated_range_deletion_size

I like this approach in that it's much more efficient, especially when a file contains many range tombstones, since versions->ApproximateSize() is not very cheap. For "average range tombstone bytes", do you mean a fixed weight multiplied by the average value size of the file, as is done similarly for point tombstones? That would be a smaller, less risky change, and we could start off with a conservative weight (and maybe consider a weight that is a function of the file's level in the LSM tree).
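
For concreteness, the average-based alternative under discussion might look roughly like the sketch below; the function name, parameters, and the 0.5 weight are illustrative assumptions, not code or constants from the PR.

```cpp
#include <cstdint>

// Alternative heuristic: skip per-tombstone versions->ApproximateSize()
// calls and scale a cheap per-file average instead. This misses the
// occasional extra-wide range tombstone, but is inexpensive to recompute
// at DB open from the persisted num_range_deletions.
uint64_t EstimateViaAverage(uint64_t num_range_deletions,
                            uint64_t avg_bytes_covered_per_tombstone,
                            double weight = 0.5 /* placeholder */) {
  return static_cast<uint64_t>(weight * num_range_deletions *
                               avg_bytes_covered_per_tombstone);
}
```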

@ajkr (Contributor) commented on Dec 15, 2022:

> For "average range tombstone bytes", do you mean a fixed weight multiplied by the average value size of the file, as is done similarly for point tombstones?

I was thinking it would be the average GetApproximateSize() across a sample of range tombstones we examine during the stats calculation phase of DB open. Note that that comment is a month old, and I'm OK with the approach in this PR now too.


@ajkr (Contributor) left a comment:

LGTM, thanks!


@facebook-github-bot: @cbi42 merged this pull request in cc6f323.
