
chore(pageserver): improve in-memory layer vectored get #7467

Merged: 1 commit merged into main from skyzh/btreemap-inmem-layer on Apr 29, 2024

Conversation

skyzh (Member) commented Apr 22, 2024

Problem

Previously, in #7375, we observed that for in-memory layers we need to iterate over every key in the keyspace in order to get the result. The operation can be made more efficient by using a BTreeMap as the in-memory layer representation, even when doing a vectored get over a dense keyspace. Consider the case where the in-memory layer covers only a small part of the keyspace and most of the keys need to be found in lower layers: using a BTreeMap can significantly reduce probes for nonexistent keys.

Summary of changes

  • Use BTreeMap as in-memory layer representation.
  • Optimize the vectored get flow to utilize the range scan functionality of BTreeMap.

Alternatively, I could add a separate read path for the scan interface and leave this piece of code untouched, but it does not seem easy to implement scan efficiently without a BTreeMap.
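
To illustrate the intended read flow, here is a minimal sketch (hypothetical types and field names, not the actual pageserver code) contrasting per-key probing with a BTreeMap range scan:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

type Key = u64;
type Lsn = u64;

/// Hypothetical stand-in for the in-memory layer index:
/// for each key, the LSNs at which the layer holds a value.
struct InMemoryLayerIndex {
    index: BTreeMap<Key, Vec<Lsn>>,
}

impl InMemoryLayerIndex {
    /// Old approach: probe every key in the requested keyspace,
    /// including keys this layer has never seen.
    fn get_probe(&self, keyspace: Range<Key>) -> Vec<(Key, Lsn)> {
        let mut hits = Vec::new();
        for key in keyspace {
            if let Some(lsns) = self.index.get(&key) {
                if let Some(&lsn) = lsns.last() {
                    hits.push((key, lsn));
                }
            }
        }
        hits
    }

    /// New approach: a single range scan touches only the keys
    /// that actually exist in this layer.
    fn get_range_scan(&self, keyspace: Range<Key>) -> Vec<(Key, Lsn)> {
        self.index
            .range(keyspace)
            .filter_map(|(&key, lsns)| lsns.last().map(|&lsn| (key, lsn)))
            .collect()
    }
}
```

If the layer holds, say, 1,000 keys out of a keyspace of 1,000,000, the range scan walks only the 1,000 existing entries instead of probing all 1,000,000 keys.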

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat the commit message so it does not include the above checklist.

@skyzh skyzh requested a review from VladLazar April 22, 2024 19:23
@skyzh skyzh requested a review from a team as a code owner April 22, 2024 19:23
@skyzh skyzh changed the title chore(pageserver): make in-memory layer vectored get more efficient chore(pageserver): improve in-memory layer vectored get Apr 22, 2024
github-actions bot commented Apr 22, 2024

2912 tests run: 2778 passed, 0 failed, 134 skipped (full report)


Flaky tests (1)

Postgres 15

  • test_partial_evict_tenant[relative_spare]: release

Code coverage* (full report)

  • functions: 28.1% (6552 of 23322 functions)
  • lines: 46.7% (46278 of 99183 lines)

* collected from Rust tests only


This comment is automatically updated with the latest test results.
Last updated: e401025 at 2024-04-29T18:10:16.732Z

@skyzh skyzh mentioned this pull request Apr 22, 2024
skyzh (Member, Author) commented Apr 23, 2024

...and another benefit here is that it saves 50% of the memory when freezing the in-memory layer. The current implementation gets all keys out of the index and sorts them; now we can write them out directly without sorting, because the BTreeMap already keeps them sorted. 0298e9e
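
Roughly, the saving comes from not materializing a second, sorted copy of the index at freeze time; a hedged sketch with hypothetical types (not the actual freeze code):

```rust
use std::collections::{BTreeMap, HashMap};

type Key = u64;
type Value = Vec<u8>;

// Before: a hash index requires collecting and sorting all entries,
// temporarily holding a second copy of the whole key set.
fn freeze_with_sort(index: &HashMap<Key, Value>) -> Vec<(Key, &Value)> {
    let mut entries: Vec<(Key, &Value)> = index.iter().map(|(&k, v)| (k, v)).collect();
    entries.sort_by_key(|(k, _)| *k);
    entries
}

// After: a BTreeMap already iterates in key order, so the frozen
// layer can be written out directly, with no extra sorted copy.
fn freeze_in_order(index: &BTreeMap<Key, Value>) -> impl Iterator<Item = (&Key, &Value)> {
    index.iter()
}
```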

skyzh (Member, Author) commented Apr 23, 2024

benchmark result:

baseline

test_bulk_insert[neon-release-pg14].insert: 5.188 s
test_bulk_insert[neon-release-pg14].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg14].peak_mem: 379,360 MB
test_bulk_insert[neon-release-pg14].size: 0 MB
test_bulk_insert[neon-release-pg14].data_uploaded: 498 MB
test_bulk_insert[neon-release-pg14].num_files_uploaded: 3
test_bulk_insert[neon-release-pg14].wal_written: 345 MB
test_bulk_insert[neon-release-pg14].wal_recovery: 6.429 s
test_bulk_insert[neon-release-pg15].insert: 5.280 s
test_bulk_insert[neon-release-pg15].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg15].peak_mem: 384,768 MB
test_bulk_insert[neon-release-pg15].size: 0 MB
test_bulk_insert[neon-release-pg15].data_uploaded: 495 MB
test_bulk_insert[neon-release-pg15].num_files_uploaded: 3
test_bulk_insert[neon-release-pg15].wal_written: 345 MB
test_bulk_insert[neon-release-pg15].wal_recovery: 6.535 s
test_bulk_insert[neon-release-pg16].insert: 5.147 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg16].peak_mem: 377,120 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 498 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 3
test_bulk_insert[neon-release-pg16].wal_written: 346 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 6.437 s

this commit

test_bulk_insert[neon-release-pg14].insert: 5.725 s
test_bulk_insert[neon-release-pg14].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg14].peak_mem: 382,400 MB
test_bulk_insert[neon-release-pg14].size: 0 MB
test_bulk_insert[neon-release-pg14].data_uploaded: 498 MB
test_bulk_insert[neon-release-pg14].num_files_uploaded: 3
test_bulk_insert[neon-release-pg14].wal_written: 345 MB
test_bulk_insert[neon-release-pg14].wal_recovery: 6.376 s
test_bulk_insert[neon-release-pg15].insert: 5.343 s
test_bulk_insert[neon-release-pg15].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg15].peak_mem: 373,008 MB
test_bulk_insert[neon-release-pg15].size: 0 MB
test_bulk_insert[neon-release-pg15].data_uploaded: 495 MB
test_bulk_insert[neon-release-pg15].num_files_uploaded: 3
test_bulk_insert[neon-release-pg15].wal_written: 345 MB
test_bulk_insert[neon-release-pg15].wal_recovery: 6.223 s
test_bulk_insert[neon-release-pg16].insert: 5.510 s
test_bulk_insert[neon-release-pg16].pageserver_writes: 0 MB
test_bulk_insert[neon-release-pg16].peak_mem: 384,576 MB
test_bulk_insert[neon-release-pg16].size: 0 MB
test_bulk_insert[neon-release-pg16].data_uploaded: 498 MB
test_bulk_insert[neon-release-pg16].num_files_uploaded: 3
test_bulk_insert[neon-release-pg16].wal_written: 346 MB
test_bulk_insert[neon-release-pg16].wal_recovery: 6.260 s

Note that peak_mem should be divided by 1024 due to the difference in units between macOS and Linux. There is some slowdown, but the difference is not very significant. Also, there are no pageserver_writes, so the benefit of skipping the sort before freezing is not visible in this benchmark.

VladLazar (Contributor) commented

> There is some slowdown, but the difference is not very significant. Also, there are no pageserver_writes, so the benefit of skipping the sort before freezing is not visible in this benchmark.

Hard to tell from this one test. Might be worth asking for Christian's opinion, as he's been staring at these numbers lately.
I'd personally be fine with merging this if you have a plan for what to monitor in prod.

@skyzh skyzh requested a review from problame April 23, 2024 17:06
@problame problame removed their request for review April 23, 2024 17:07
@skyzh skyzh added the run-benchmarks label (indicates to the CI that benchmarks should be run for PRs marked with this label) Apr 24, 2024
skyzh (Member, Author) commented Apr 24, 2024

@VladLazar thinking about the test plan, what about having a histogram metric in prod for the latency of accessing each type of layer? I.e., for each get-page-at-LSN request, we record a breakdown of the time spent on the open layer, the N in-memory layers, and the M on-disk/remote layers; these are three numbers. I can have a pull request ready tomorrow and get the metrics merged into the next release, so that we can merge this pull request in the release after the next one.
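
Something along these lines, as a sketch only (the metric name, labels, and direct use of the prometheus crate here are assumptions, not what would actually land):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram_vec, HistogramVec};

// Hypothetical metric: per-request time spent in each layer kind
// on the get-page-at-lsn read path.
static GETPAGE_LAYER_SECONDS: Lazy<HistogramVec> = Lazy::new(|| {
    register_histogram_vec!(
        "pageserver_getpage_layer_seconds",
        "Time spent per layer kind while serving get_page_at_lsn",
        &["layer_kind"] // "open", "in_memory", "on_disk"
    )
    .expect("failed to register metric")
});

// Record how long a visit to one layer of the given kind took.
fn record_layer_visit(layer_kind: &str, seconds: f64) {
    GETPAGE_LAYER_SECONDS
        .with_label_values(&[layer_kind])
        .observe(seconds);
}
```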

@skyzh skyzh force-pushed the skyzh/btreemap-inmem-layer branch from 75866ae to 42f3586 Compare April 24, 2024 20:07
VladLazar (Contributor) commented Apr 25, 2024

> @VladLazar thinking about the test plan, what about having a histogram metric in prod for the latency of accessing each type of layer? I.e., for each get-page-at-LSN request, we record a breakdown of the time spent on the open layer, the N in-memory layers, and the M on-disk/remote layers; these are three numbers. I can have a pull request ready tomorrow and get the metrics merged into the next release, so that we can merge this pull request in the release after the next one.

My (perhaps somewhat theoretical) concern is: how does this change impact ingest performance? In my previous comment I was asking what you think we should monitor to ensure there's no significant regression in ingest perf. Let's chat about it if anything is unclear.

I agree that this change should have a positive impact on the read path. The metrics you are proposing for the read path would be nice, but perhaps intrusive/expensive.

skyzh (Member, Author) commented Apr 25, 2024

@VladLazar for the write path, the code path that calls into the in-memory layer's put_value is Timeline::put_batch. I can think of two places to monitor:

  • the wal_ingest series of metrics, e.g., pageserver_wal_ingest_records_committed, which already exists.
  • add a new timer histogram for DatadirModification::commit that measures commit speed.

I can submit a pull request later today to add the second metric.
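
For the second item, a minimal sketch of what such a timer could look like using the prometheus crate's histogram timer (the metric name is illustrative, not necessarily the one that landed in #7515):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_histogram, Histogram};

// Hypothetical metric measuring how long DatadirModification::commit takes.
static COMMIT_SECONDS: Lazy<Histogram> = Lazy::new(|| {
    register_histogram!(
        "pageserver_datadir_modification_commit_seconds",
        "Time spent in DatadirModification::commit"
    )
    .expect("failed to register metric")
});

// Wrap the commit closure so its duration is observed.
fn commit_with_timing(do_commit: impl FnOnce() -> anyhow::Result<()>) -> anyhow::Result<()> {
    // The timer observes the elapsed time when it is dropped,
    // so the commit is measured even on early returns.
    let _timer = COMMIT_SECONDS.start_timer();
    do_commit()
}
```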

VladLazar (Contributor) commented

> @VladLazar for the write path, the code path that calls into the in-memory layer's put_value is Timeline::put_batch. I can think of two places to monitor:
>
>   • the wal_ingest series of metrics, e.g., pageserver_wal_ingest_records_committed, which already exists.
>   • add a new timer histogram for DatadirModification::commit that measures commit speed.
>
> I can submit a pull request later today to add the second metric.

Makes sense. We can craft a query which takes into account pageserver_wal_ingest_records_committed, pageserver_wal_ingest_records_received, and pageserver_wal_ingest_records_filtered to see what happens to the ingest rate.

skyzh (Member, Author) commented Apr 25, 2024

plan to merge after #7515 gets released

skyzh added a commit that referenced this pull request Apr 25, 2024
As a follow-up to #7467, also measure the ingestion operation speed.

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh skyzh force-pushed the skyzh/btreemap-inmem-layer branch from 39079a2 to 41e043d Compare April 26, 2024 17:41
@skyzh skyzh enabled auto-merge (squash) April 29, 2024 16:34
Signed-off-by: Alex Chi Z <chi@neon.tech>

fix ce

Signed-off-by: Alex Chi Z <chi@neon.tech>

no sort for freezing

Signed-off-by: Alex Chi Z <chi@neon.tech>

fix

Signed-off-by: Alex Chi Z <chi@neon.tech>

fix clippy

Signed-off-by: Alex Chi Z <chi@neon.tech>

remove sort comments

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh skyzh force-pushed the skyzh/btreemap-inmem-layer branch from 41e043d to e401025 Compare April 29, 2024 16:34
@skyzh skyzh merged commit 11945e6 into main Apr 29, 2024
51 of 52 checks passed
@skyzh skyzh deleted the skyzh/btreemap-inmem-layer branch April 29, 2024 17:16
skyzh added a commit that referenced this pull request May 21, 2024
The metric was added in #7515 to observe whether #7467 introduces any perf regressions.

The change was deployed on 5/7 and no changes were observed in the metrics, so it is safe to remove the metric now.

Signed-off-by: Alex Chi Z <chi@neon.tech>