-
Notifications
You must be signed in to change notification settings - Fork 6.7k
Add support in CompactForTieringCollectorFactory to collect data write time #13569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8d97c51 to
aa28469
Compare
|
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
pdillinger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looking good. I would like to see a SST building-heavy performance test between (a) no tracking before this change, (b) no tracking after this change, (c) with tracking.
|
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
I did some benchmarking with this test script:
I ran the script for three rounds, and it shows consistently a minor regression from (a) to (b), and bigger regression from (b) to (c): 979631 vs. 972503 vs. 937697 (ops/second) 975044 vs. 969513 vs. 935618 (ops/second) |
|
I'd hate for everyone to pay a small tax for this feature, and a sizable one for those actually using it. Can we re-work this so that the inner loop of compaction is unmodified? E.g. pass a time mapping on collector creation and let the collector do any mapping itself? Also, what do we expect the statistics for aggregation to look like? Do we need to map each entry's time before inputting it into statistics or can we gather statistics and then map the results? Percentile data (e.g. boundary for each 10% of entries by age), for example, can be mapped afterwards with no loss of information. |
Thanks for checking this and offering these improvement ideas. I think both of these are promising. I will try these ideas out. Let me put this PR back to draft state. |
This PR adds support in
CompactForTieringCollectorFactoryandCompactForTieringCollectorto collect and aggregate data write time metric.In order to do this, an internal API
TableBuilder::SetSeqnoTimeForTrackingWriteTimeis added to be called before table building starts, so that write time tracking per data entry is available while table is being built.Two public API
TablePropertiesCollector::AddUserKeyWithWriteTime(...)andTablePropertiesCollector::WriteTimeTrackingEnabled(...)are added to enable table builder passing per data entry write time to table property collectors.This PR also include implementation for the util method
GetDataCollectionUnixWriteTimeInfoForFileto create aDataCollectionUnixWriteTimeInfofrom table properties.A follow up is needed to aggregate
DataCollectionUnixWriteTimeInfo, such as across all the files on a level.Test plan:
Added unit tests