chore(pageserver): improve in-memory layer vectored get #7467
Conversation
2912 tests run: 2778 passed, 0 failed, 134 skipped (full report)

Code coverage* (full report)
* collected from Rust tests only

The comment gets automatically updated with the latest test results: e401025 at 2024-04-29T18:10:16.732Z
...and another benefit here is that it saves 50% of the memory when freezing the in-memory layer. The current implementation gets all keys out of the index and sorts them; now we can write them out directly without sorting, because the BTreeMap already keeps them sorted. 0298e9e
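As a rough illustration (hypothetical types, not the pageserver's actual index), iterating a `BTreeMap` already yields entries in ascending key order, so freezing can serialize the index without the intermediate collect-and-sort pass:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the in-memory layer index: key -> value offset.
// Freezing writes entries straight out: BTreeMap iteration is already in
// ascending key order, so no explicit sort step is needed.
fn freeze(index: BTreeMap<u64, u64>) -> Vec<(u64, u64)> {
    index.into_iter().collect()
}

fn main() {
    let index = BTreeMap::from([(30, 300), (10, 100), (20, 200)]);
    println!("{:?}", freeze(index)); // entries come out sorted by key
}
```

With a `HashMap` index, the same freeze would need to collect all entries and sort them first, which is where the extra transient memory went.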
Benchmark result:
- baseline: …
- this commit: …

Note that …
Hard to tell from this one test. Might be worth asking for Christian's opinion, as he's been staring at these numbers lately.
@VladLazar thinking about the test plan, what about adding histogram metrics in prod for the latency of accessing each type of layer? That is, for each get-page-at-lsn request, we record the breakdown of time spent on the open layer, the N in-memory layers, and the M on-disk/remote layers: three numbers. I can have a pull request ready tomorrow and get the metrics merged into the next release, so that this pull request can be merged in the release after the next one.
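A minimal stdlib sketch of the proposed breakdown (the type and field names here are illustrative, not the pageserver's actual code; a real implementation would feed these durations into histogram metrics rather than keep them in a struct):

```rust
use std::time::{Duration, Instant};

// Hypothetical per-request breakdown of time spent in each layer type
// while serving one get-page-at-lsn request.
#[derive(Default, Debug)]
struct ReadBreakdown {
    open_layer: Duration,
    in_memory_layers: Duration,
    on_disk_layers: Duration,
}

// Run `f`, adding its wall-clock time to the given bucket.
fn timed<T>(bucket: &mut Duration, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    *bucket += start.elapsed();
    out
}

fn main() {
    let mut b = ReadBreakdown::default();
    timed(&mut b.open_layer, || { /* probe the open layer */ });
    timed(&mut b.in_memory_layers, || { /* probe N frozen in-memory layers */ });
    timed(&mut b.on_disk_layers, || { /* probe M on-disk/remote layers */ });
    println!("{:?}", b);
}
```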
Force-pushed 75866ae to 42f3586
My (perhaps somewhat theoretical) concern is: how does this change impact ingest performance? In my previous comment I was asking what you think we should monitor to ensure there's no significant regression in ingest perf. Let's chat about it if anything is unclear. I agree with you that this change should have a positive impact on the read path. The metrics you are proposing for the read path would be nice, but perhaps intrusive/expensive.
@VladLazar for the write path, the code path that calls into the in-memory layer's put_value is
I can submit a pull request later today to add the second metric.
Makes sense. We can craft a query that takes pageserver_wal_ingest_records_committed, pageserver_wal_ingest_records_received, and pageserver_wal_ingest_records_filtered into account to see what happens to the ingest rate.
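For illustration, the per-second rate such a query derives from any one of those counters is just the delta between two samples divided by the sampling window (a sketch with made-up sample values, not the actual monitoring query):

```rust
// Illustrative only: per-second rate from two samples of a monotonically
// increasing counter (e.g. pageserver_wal_ingest_records_committed),
// taken `window_secs` apart. saturating_sub guards against counter resets.
fn rate(prev: u64, curr: u64, window_secs: f64) -> f64 {
    curr.saturating_sub(prev) as f64 / window_secs
}

fn main() {
    // Hypothetical samples 60 s apart: 6000 records committed in the window.
    println!("{}", rate(1_000, 7_000, 60.0)); // 100 records/s
}
```

Comparing this rate for the received, filtered, and committed counters before and after the change would show whether ingest throughput regressed.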
Plan to merge after #7515 gets released.
As a follow-up to #7467, also measure the ingestion operation speed. Signed-off-by: Alex Chi Z <chi@neon.tech>
Force-pushed 39079a2 to 41e043d
Squashed commit messages, each signed off by Alex Chi Z <chi@neon.tech>:
- fix ce
- no sort for freezing
- fix
- fix clippy
- remove sort comments
Force-pushed 41e043d to e401025
Problem
Previously, in #7375, we observed that for in-memory layers we need to iterate over every key in the keyspace in order to get the result. The operation can be more efficient if we use a BTreeMap as the in-memory layer representation, even when doing a vectored get over a dense keyspace. Imagine a case where the in-memory layer covers only a small part of the keyspace and most of the keys need to be found in lower layers: using a BTreeMap can significantly reduce probes for nonexistent keys.
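The effect can be sketched with the standard library alone (hypothetical key and value types, not the pageserver's actual ones): `BTreeMap::range` jumps straight to the keys the layer actually holds, instead of probing every candidate key in a dense range.

```rust
use std::collections::BTreeMap;

// Keys present in a sparse in-memory layer that a vectored get over the
// dense range `lo..hi` would actually visit. BTreeMap::range walks only
// the stored keys inside the range, not every key in [lo, hi).
fn keys_in_range(layer: &BTreeMap<u32, String>, lo: u32, hi: u32) -> Vec<u32> {
    layer.range(lo..hi).map(|(k, _)| *k).collect()
}

fn main() {
    // A layer covering only 2 keys out of a 1000-key range.
    let layer = BTreeMap::from([(100, "a".to_string()), (101, "b".to_string())]);
    println!("{:?}", keys_in_range(&layer, 0, 1000)); // [100, 101]
}
```

With a hash-map representation, finding the same two hits over that range would require probing all 1000 candidate keys.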
Summary of changes
Alternatively, I can proceed with having a separate read path for the scan interface and leave this piece of code untouched. However, it does not seem easy to efficiently implement scan without a BTreeMap.

Checklist before requesting a review
Checklist before merging