Skip to content

Add new heauristic 'num_collapsible_entry_reads_sampled' #14434

Closed
joshkang97 wants to merge 4 commits into
facebook:mainfrom
joshkang97:unwanted_io_stats
Closed

Add new heauristic 'num_collapsible_entry_reads_sampled' #14434
joshkang97 wants to merge 4 commits into
facebook:mainfrom
joshkang97:unwanted_io_stats

Conversation

@joshkang97
Copy link
Copy Markdown
Contributor

@joshkang97 joshkang97 commented Mar 6, 2026

Summary

Add per-file sampling of "collapsible" entry reads (single deletions, merges, and kNotFound results) that may later be used to help inform read-triggered compactions. This is a better metric than num_reads_sampled as it is more targeted towards reads that could be avoided via compaction.

The existing behavior of num_reads_sampled is that reads only gets sampled on iterator creation for a file. It is problematic because next/prev() calls are not sampled, nor are additional seeks().

This PR moves sampling to per-seek/next granularity within LevelIterator and adds a new num_collapsible_entry_reads_sampled counter that tracks how often a file serves entries that could be eliminated by compaction.

Note only L1+ files have iterator seeks/nexts/prevs sampled. Introducing this at L0 would require wrapping table reader iterators, introducing a performance cost.

Key changes

  • New counter num_collapsible_entry_reads_sampled in FileSampledStats tracks sampled reads that encounter deletions, single deletions, merges, or kNotFound results in both Get and Iterator paths.
  • Moved sampling from file-open to per-operation in LevelIterator: sampling now happens in SampleRead() called from Seek(), SeekForPrev(), SeekToFirst(), SeekToLast(), Next(), NextAndGetResult(), and Prev(). The should_sample parameter was removed from LevelIterator's constructor.
  • Differentiated sampling rate for Next() vs Seek(): should_sample_file_read_next() uses a 64x lower sampling rate (kFileReadSampleRate * 64) since Next() is cheaper than Seek() and called more frequently.
  • Collapsible tracking in Get path: Version::Get() now increments the collapsible counter when GetContext::State() is kNotFound, kMerge, or kDeleted.
  • Collapsible tracking in MultiGet path: MultiGetFromSST also increments the collapsible counter for the same states.

Test Plan

  • Added new DB tests for both num_reads_sampled and num_collapsible_entry_reads_sampled

Benchmark results (readrandom, readseq)

Setup: 1M keys, 16-byte keys, 100-byte values, no compression, fillrandom+compact

Benchmark Params ops/s (main) ops/s (feature) % change
readrandom seed=1, threads=1 387,194 389,449 +0.6%
readseq seed=1, threads=1 5,598,371 5,572,975 -0.5%

No meaningful performance regression observed — differences are within run-to-run noise.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 6, 2026

✅ clang-tidy: No findings on changed lines

Completed in 291.3s.

@joshkang97 joshkang97 marked this pull request as ready for review March 6, 2026 22:23
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 6, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D95613793.

@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 6, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D95613793.

Copy link
Copy Markdown
Contributor

@xingbowang xingbowang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.
I don't see num_collapsible_entry_reads_sampled is used. I guess there is a follow up change to leverage it in the compaction logic.

Comment on lines +23 to +24
// Decrease probability of sampling next() to discount it as it is cheaper
// than seek()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more accurate, it is more like next() is called a lot more frequent than seek. Therefore, we want to lower the sampling rate to avoid introducing performance penalty. I guess this is why we use a counter instead of random to make sampling decision as well.

Copy link
Copy Markdown
Contributor Author

@joshkang97 joshkang97 Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual idea is to decrease probability but keep the amount the same. The reasoning is that the read cost of a seek() is ~64x (very loose estimation) as expensive as a next().

And yep the lowered probability lowers the cost of sample_collapsible_entry_file_read_inc(), and using a counter is more performant than division

Comment on lines +31 to +39
inline void sample_file_read_inc(const FileMetaData* meta) {
meta->stats.num_reads_sampled.fetch_add(kFileReadSampleRate,
std::memory_order_relaxed);
}

inline void sample_collapsible_entry_file_read_inc(const FileMetaData* meta) {
meta->stats.num_collapsible_entry_reads_sampled.fetch_add(
kFileReadSampleRate, std::memory_order_relaxed);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we increase different amount based on whether it is seek or next, as their sampling rate is different.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied in my other comment

@meta-codesync meta-codesync Bot closed this in 42eff8b Mar 9, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 9, 2026

@joshkang97 merged this pull request in 42eff8b.

doxtop pushed a commit to flyingw/rocksdb that referenced this pull request Apr 7, 2026
)

Summary:
Add per-file sampling of "collapsible" entry reads (single deletions, merges, and kNotFound results) that may later be used to help inform read-triggered compactions. This is a better metric than `num_reads_sampled` as it is more targeted towards reads that could be avoided via compaction.

The existing behavior of `num_reads_sampled` is that reads only gets sampled on iterator creation for a file. It is problematic because next/prev() calls are not sampled, nor are additional seeks().

This PR moves sampling to per-seek/next granularity within `LevelIterator` and adds a new `num_collapsible_entry_reads_sampled` counter that tracks how often a file serves entries that could be eliminated by compaction.

 Note only L1+ files have iterator seeks/nexts/prevs sampled. Introducing this at L0 would require wrapping table reader iterators, introducing a performance cost.

## Key changes

- **New counter `num_collapsible_entry_reads_sampled`** in `FileSampledStats` tracks sampled reads that encounter deletions, single deletions, merges, or kNotFound results in both Get and Iterator paths.
- **Moved sampling from file-open to per-operation** in `LevelIterator`: sampling now happens in `SampleRead()` called from `Seek()`, `SeekForPrev()`, `SeekToFirst()`, `SeekToLast()`, `Next()`, `NextAndGetResult()`, and `Prev()`. The `should_sample` parameter was removed from `LevelIterator`'s constructor.
- **Differentiated sampling rate for Next() vs Seek()**: `should_sample_file_read_next()` uses a 64x lower sampling rate (`kFileReadSampleRate * 64`) since Next() is cheaper than Seek() and called more frequently.
- **Collapsible tracking in Get path**: `Version::Get()` now increments the collapsible counter when `GetContext::State()` is `kNotFound`, `kMerge`, or `kDeleted`.
- **Collapsible tracking in MultiGet path**: `MultiGetFromSST` also increments the collapsible counter for the same states.

Pull Request resolved: facebook#14434

Test Plan:
- Added new DB tests for both num_reads_sampled and num_collapsible_entry_reads_sampled

### Benchmark results (readrandom, readseq)

Setup: 1M keys, 16-byte keys, 100-byte values, no compression, fillrandom+compact

| Benchmark  | Params             | ops/s (main) | ops/s (feature) | % change |
|------------|--------------------|-------------|--------------------------|----------|
| readrandom | seed=1, threads=1  | 387,194     | 389,449                  | +0.6%    |
| readseq    | seed=1, threads=1  | 5,598,371   | 5,572,975                | -0.5%    |

No meaningful performance regression observed — differences are within run-to-run noise.

Reviewed By: xingbowang

Differential Revision: D95613793

Pulled By: joshkang97

fbshipit-source-id: 9dd09c9b7527b148424bde5686f4157c7a9e1214
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants