
chore(pageserver): improve in-memory layer vectored get #7467

Merged: 1 commit merged into main from skyzh/btreemap-inmem-layer on Apr 29, 2024

Conversation

skyzh (Member) commented Apr 22, 2024

Problem

Previously, in #7375, we observed that for in-memory layers we need to iterate over every key in the keyspace in order to get the result. The operation can be made more efficient by using a BTreeMap as the in-memory layer representation, even when doing a vectored get over a dense keyspace. Consider the case where the in-memory layer covers only a small part of the keyspace and most of the keys need to be found in lower layers: using a BTreeMap can significantly reduce probes for nonexistent keys.

Summary of changes

  • Use BTreeMap as in-memory layer representation.
  • Optimize the vectored get flow to utilize the range scan functionality of BTreeMap.

Alternatively, I could add a separate read path for the scan interface and leave this piece of code untouched, but it does not seem easy to implement scan efficiently without a BTreeMap.
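
To illustrate the intended read flow, here is a minimal sketch (hypothetical types and field names, not the actual pageserver code) contrasting per-key probing with a BTreeMap range scan:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

type Key = u64;
type Lsn = u64;

/// Hypothetical stand-in for the in-memory layer index:
/// for each key, the LSNs at which the layer holds a value.
struct InMemoryLayerIndex {
    index: BTreeMap<Key, Vec<Lsn>>,
}

impl InMemoryLayerIndex {
    /// Old approach: probe every key in the requested keyspace,
    /// including keys this layer has never seen.
    fn get_probe(&self, keyspace: Range<Key>) -> Vec<(Key, Lsn)> {
        let mut hits = Vec::new();
        for key in keyspace {
            if let Some(lsns) = self.index.get(&key) {
                if let Some(&lsn) = lsns.last() {
                    hits.push((key, lsn));
                }
            }
        }
        hits
    }

    /// New approach: a single range scan touches only the keys
    /// that actually exist in this layer.
    fn get_range_scan(&self, keyspace: Range<Key>) -> Vec<(Key, Lsn)> {
        self.index
            .range(keyspace)
            .filter_map(|(&key, lsns)| lsns.last().map(|&lsn| (key, lsn)))
            .collect()
    }
}
```

If the layer holds, say, 1,000 keys out of a keyspace of 1,000,000, the range scan walks only the 1,000 existing entries instead of probing all 1,000,000 keys.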

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat the commit message so it does not include the above checklist.

@skyzh skyzh requested a review from VladLazar April 22, 2024 19:23
@skyzh skyzh requested a review from a team as a code owner April 22, 2024 19:23
@skyzh skyzh changed the title chore(pageserver): make in-memory layer vectored get more efficient chore(pageserver): improve in-memory layer vectored get Apr 22, 2024
github-actions bot commented Apr 22, 2024

2912 tests run: 2778 passed, 0 failed, 134 skipped (full report)


Flaky tests (1)

Postgres 15

  • test_partial_evict_tenant[relative_spare]: release

Code coverage* (full report)

  • functions: 28.1% (6552 of 23322 functions)
  • lines: 46.7% (46278 of 99183 lines)

* collected from Rust tests only


This comment is automatically updated with the latest test results.
Last updated: e401025 at 2024-04-29T18:10:16.732Z

@skyzh skyzh mentioned this pull request Apr 22, 2024
skyzh (Member, Author) commented Apr 23, 2024

...and another benefit here is that it saves 50% of the memory when freezing the in-memory layer. The current implementation gets all keys out of the index and sorts them; now we can write them out directly without sorting, because the BTreeMap already keeps them sorted. 0298e9e
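
Roughly, the saving comes from not materializing a second, sorted copy of the index at freeze time; a hedged sketch with hypothetical types (not the actual freeze code):

```rust
use std::collections::{BTreeMap, HashMap};

type Key = u64;
type Value = Vec<u8>;

// Before: a hash index requires collecting and sorting all entries,
// temporarily holding a second copy of the whole key set.
fn freeze_with_sort(index: &HashMap<Key, Value>) -> Vec<(Key, &Value)> {
    let mut entries: Vec<(Key, &Value)> = index.iter().map(|(&k, v)| (k, v)).collect();
    entries.sort_by_key(|(k, _)| *k);
    entries
}

// After: a BTreeMap already iterates in key order, so the frozen
// layer can be written out directly, with no extra sorted copy.
fn freeze_in_order(index: &BTreeMap<Key, Value>) -> impl Iterator<Item = (&Key, &Value)> {
    index.iter()
}
```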

skyzh (Member, Author) commented Apr 23, 2024

benchmark result:

baseline

test_bulk_insert[neon-release-pg14].insert: 5.188 s
test_bulk_insert[neon-release-pg14].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg14].peak_mem: 379,360 MB
test_bulk_insert[neon-release-pg14].size: 0 MB
test_bulk_insert[neon-release-pg14].data_uploaded: 498 MB
test_bulk_insert[neon-release-pg14].num_files_uploaded: 3
test_bulk_insert[neon-release-pg14].wal_written: 345 MB
test_bulk_insert[neon-release-pg14].wal_recovery: 6.429 s
test_bulk_insert[neon-release-pg15].insert: 5.280 s
test_bulk_insert[neon-release-pg15].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg15].peak_mem: 384,768 MB
test_bulk_insert[neon-release-pg15].size: 0 MB
test_bulk_insert[neon-release-pg15].data_uploaded: 495 MB
test_bulk_insert[neon-release-pg15].num_files_uploaded: 3
test_bulk_insert[neon-release-pg15].wal_written: 345 MB
test_bulk_insert[neon-release-pg15].wal_recovery: 6.535 s
test_bulk_insert[neon-release-pg16].insert: 5.147 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg16].peak_mem: 377,120 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 498 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 3
test_bulk_insert[neon-release-pg16].wal_written: 346 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 6.437 s

this commit

test_bulk_insert[neon-release-pg14].insert: 5.725 s
test_bulk_insert[neon-release-pg14].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg14].peak_mem: 382,400 MB
test_bulk_insert[neon-release-pg14].size: 0 MB
test_bulk_insert[neon-release-pg14].data_uploaded: 498 MB
test_bulk_insert[neon-release-pg14].num_files_uploaded: 3
test_bulk_insert[neon-release-pg14].wal_written: 345 MB
test_bulk_insert[neon-release-pg14].wal_recovery: 6.376 s
test_bulk_insert[neon-release-pg15].insert: 5.343 s
test_bulk_insert[neon-release-pg15].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg15].peak_mem: 373,008 MB
test_bulk_insert[neon-release-pg15].size: 0 MB
test_bulk_insert[neon-release-pg15].data_uploaded: 495 MB
test_bulk_insert[neon-release-pg15].num_files_uploaded: 3
test_bulk_insert[neon-release-pg15].wal_written: 345 MB
test_bulk_insert[neon-release-pg15].wal_recovery: 6.223 s
test_bulk_insert[neon-release-pg16].insert: 5.510 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg16].peak_mem: 384,576 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 498 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 3
test_bulk_insert[neon-release-pg16].wal_written: 346 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 6.260 s

Note that peak_mem should be divided by 1024 due to the difference in units between macOS and Linux. There is some slowdown, but the difference is not very significant. Also, there are no pageserver_writes, so the benefit of skipping the sort before freezing is not visible in this benchmark.

VladLazar (Contributor) commented

> There is some slowdown, but the difference is not very significant. Also, there are no pageserver_writes, so the benefit of skipping the sort before freezing is not visible in this benchmark.

Hard to tell from this one test. Might be worth asking for Christian's opinion, as he's been staring at these numbers lately.
I'd personally be fine with merging this if you have a plan for what to monitor in prod.

@skyzh skyzh requested a review from problame April 23, 2024 17:06
@problame problame removed their request for review April 23, 2024 17:07
@skyzh skyzh added the run-benchmarks label (indicates to the CI that benchmarks should be run for PRs marked with this label) Apr 24, 2024
skyzh (Member, Author) commented Apr 24, 2024

@VladLazar thinking about the test plan, what about having a histogram metric in prod for the latency of accessing each type of layer? I.e., for each get-page-at-LSN request, we record a breakdown of the time spent on the open layer, the N in-memory layers, and the M on-disk/remote layers; these are three numbers. I can have a pull request ready tomorrow and get the metrics merged into the next release, so that we can merge this pull request in the release after the next one.
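
Something along these lines, as a sketch only (the metric name, labels, and direct use of the prometheus crate here are assumptions, not what would actually land):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

// Hypothetical metric: per-request time spent in each layer kind
// on the get-page-at-lsn read path.
static GETPAGE_LAYER_SECONDS: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "pageserver_getpage_layer_seconds",
        "Time spent per layer kind while serving get_page_at_lsn",
        &["layer_kind"] // "open", "in_memory", "on_disk"
    )
    .expect("failed to register metric")
});

// Record how long a visit to one layer of the given kind took.
fn record_layer_visit(layer_kind: &str, seconds: f64) {
    GETPAGE_LAYER_SECONDS
        .with_label_values(&[layer_kind])
        .observe(seconds);
}
```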

@skyzh skyzh force-pushed the skyzh/btreemap-inmem-layer branch from 75866ae to 42f3586 Compare April 24, 2024 20:07
VladLazar (Contributor) commented Apr 25, 2024

> @VladLazar thinking about the test plan, what about having a histogram metric in prod for the latency of accessing each type of layer? I.e., for each get-page-at-LSN request, we record a breakdown of the time spent on the open layer, the N in-memory layers, and the M on-disk/remote layers; these are three numbers. I can have a pull request ready tomorrow and get the metrics merged into the next release, so that we can merge this pull request in the release after the next one.

My (perhaps somewhat theoretical) concern is: how does this change impact ingest performance? In my previous comment I was asking what you think we should monitor to ensure there's no significant regression in ingest perf. Let's chat about it if anything is unclear.

I agree that this change should have a positive impact on the read path. The metrics you are proposing for the read path would be nice, but perhaps intrusive/expensive.

skyzh (Member, Author) commented Apr 25, 2024

@VladLazar for the write path, the code path that calls into the in-memory layer's put_value is Timeline::put_batch. I can think of two places to monitor:

  • the wal_ingest series of metrics, e.g., pageserver_wal_ingest_records_committed, which already exists.
  • add a new timer histogram for DatadirModification::commit that measures commit speed.

I can submit a pull request later today to add the second metric.
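
For the second item, a minimal sketch of what such a timer could look like using the prometheus crate's histogram timer (the metric name is illustrative, not necessarily the one that landed in #7515):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram, Histogram};

// Hypothetical metric measuring how long DatadirModification::commit takes.
static COMMIT_SECONDS: Lazy<Histogram> = Lazy::new(|| {
    register_histogram!(
        "pageserver_datadir_modification_commit_seconds",
        "Time spent in DatadirModification::commit"
    )
    .expect("failed to register metric")
});

// Wrap the commit closure so its duration is observed.
fn commit_with_timing(do_commit: impl FnOnce() -> anyhow::Result<()>) -> anyhow::Result<()> {
    // The timer observes the elapsed time when it is dropped,
    // so the commit is measured even on early returns.
    let _timer = COMMIT_SECONDS.start_timer();
    do_commit()
}
```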

VladLazar (Contributor) commented

> @VladLazar for the write path, the code path that calls into the in-memory layer's put_value is Timeline::put_batch. I can think of two places to monitor:
>
>   • the wal_ingest series of metrics, e.g., pageserver_wal_ingest_records_committed, which already exists.
>   • add a new timer histogram for DatadirModification::commit that measures commit speed.
>
> I can submit a pull request later today to add the second metric.

Makes sense. We can craft a query which takes into account pageserver_wal_ingest_records_committed, pageserver_wal_ingest_records_received, and pageserver_wal_ingest_records_filtered to see what happens to the ingest rate.

skyzh (Member, Author) commented Apr 25, 2024

plan to merge after #7515 gets released

skyzh added a commit that referenced this pull request Apr 25, 2024
As a follow-up to #7467, also measure the ingestion operation speed.

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh skyzh force-pushed the skyzh/btreemap-inmem-layer branch from 39079a2 to 41e043d Compare April 26, 2024 17:41
@skyzh skyzh enabled auto-merge (squash) April 29, 2024 16:34
Signed-off-by: Alex Chi Z <chi@neon.tech>

fix ce

Signed-off-by: Alex Chi Z <chi@neon.tech>

no sort for freezing

Signed-off-by: Alex Chi Z <chi@neon.tech>

fix

Signed-off-by: Alex Chi Z <chi@neon.tech>

fix clippy

Signed-off-by: Alex Chi Z <chi@neon.tech>

remove sort comments

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh skyzh force-pushed the skyzh/btreemap-inmem-layer branch from 41e043d to e401025 Compare April 29, 2024 16:34
@skyzh skyzh merged commit 11945e6 into main Apr 29, 2024
51 of 52 checks passed
@skyzh skyzh deleted the skyzh/btreemap-inmem-layer branch April 29, 2024 17:16
skyzh added a commit that referenced this pull request May 21, 2024
The metric was added in #7515 to observe whether #7467 introduces any perf regressions.

The change was deployed on 5/7 and no changes were observed in the metrics, so it is safe to remove the metric now.

Signed-off-by: Alex Chi Z <chi@neon.tech>