significantly improve DB compilation speed #58
Conversation
This pull request was exported from Phabricator. Differential Revision: D54191772
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@            Coverage Diff             @@
##             main      #58       +/-  ##
==========================================
- Coverage   53.57%   30.82%   -22.75%
==========================================
  Files         113       26       -87
  Lines       10489     2073     -8416
==========================================
- Hits         5619      639     -4980
+ Misses       4519     1393     -3126
+ Partials      351       41      -310
```

View full report in Codecov by Sentry.
Force-pushed: 018f10f → 1217ef4 → 0793212 → 448733e → b40a405 → ab4f5b5 → 269d34b → 56bda5f → 37b2471
Force-pushed: 64dbf83 → 6c412fa
Summary: Our bottleneck was, unsurprisingly, sorting the entire dataset, which is CPU-bound and single-threaded. But we don't need to do it like this. We sorted globally to guarantee that the same keys end up in the same buckets, combined with RocksDB's requirement that data be [pre-sorted when we use SST ingestion](https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files). Instead, with this change, we use fast hashing to put the same keys into the same buckets, and sort each bucket separately, in parallel, so it is ready for ingestion.

NOTE: while RocksDB is happy with individually sorted SSTs, for real performance we need to either globally sort the data or call compaction after the ingestion is done. This diff opted for sorting in memory using merge sort, as this brings significant performance wins at a small memory cost (~300 MB on 50M keys).

The result is a significant speedup in DB compilation, by around 50s on our biggest internal shard with 54M records.

Reviewed By: deathowl

Differential Revision: D54191772
Force-pushed: 6c412fa → cc99763

This pull request was exported from Phabricator. Differential Revision: D54191772

Landed in dd16e29. Closing.
Summary:
Our bottleneck was, unsurprisingly, sorting the entire dataset, which is CPU-bound and single-threaded.
But we don't need to do it like this.
We sorted globally to guarantee that the same keys end up in the same buckets, combined with RocksDB's requirement that data be pre-sorted when we use SST ingestion.
Instead, with this change, we use fast hashing to put the same keys into the same buckets, and sort each bucket separately, in parallel, so it is ready for ingestion.
The result is a significant speedup in DB compilation, up to 2x for big shards.
Reviewed By: t3lurid3
Differential Revision: D54191772
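The earlier summary notes that, to get a globally sorted stream out of the individually sorted buckets, the diff merges them in memory rather than relying on post-ingestion compaction. One way to sketch that merge step is a heap-based k-way merge; the names below (`mergeSorted`, the `item` struct) are hypothetical and not taken from the project's code.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item tracks the current head element of one sorted bucket.
type item struct {
	val    string
	bucket int // which bucket this element came from
	idx    int // its position within that bucket
}

// minHeap orders items by value, so Pop always yields the globally
// smallest remaining element.
type minHeap []item

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].val < h[j].val }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	it := old[n-1]
	*h = old[:n-1]
	return it
}

// mergeSorted performs a k-way merge of individually sorted buckets,
// producing one globally sorted slice.
func mergeSorted(buckets [][]string) []string {
	h := &minHeap{}
	for b, bucket := range buckets {
		if len(bucket) > 0 {
			heap.Push(h, item{bucket[0], b, 0})
		}
	}
	var out []string
	for h.Len() > 0 {
		it := heap.Pop(h).(item)
		out = append(out, it.val)
		if next := it.idx + 1; next < len(buckets[it.bucket]) {
			heap.Push(h, item{buckets[it.bucket][next], it.bucket, next})
		}
	}
	return out
}

func main() {
	fmt.Println(mergeSorted([][]string{{"a", "m"}, {"b", "z"}, {"c"}}))
}
```

Because the merge only ever holds one head element per bucket on the heap, its extra memory is proportional to the number of buckets plus the output buffer, which is consistent with the small overhead (~300 MB on 50M keys) the summary reports for the in-memory merge.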