concurrency-limit initial logical size calculation #5955
Before this patch, there was no concurrency limit on initial logical size computations.

In an experiment with a PS with 20k tenants, 1 timeline each, all tenants inactive in SKs / not present in storage broker, all logical size calculations are spawned by MetricsCollection, i.e., the consumption metrics worker. Before this patch, these timelines would all do their initial logical size calculation in parallel, leading to extreme thrashing in the page cache and virtual file cache.

With this patch, the virtual file cache thrashing is reduced significantly (from 80k `open` system calls/second to ~500 `open` system calls/second during loading).

This patch uses the existing background tasks semaphore to limit concurrency, which is generally the right call for background activity; a minimal sketch of the mechanism follows below. However, due to logical size's involvement in `PageserverFeedback` towards safekeepers, I think we need a priority-boosting mechanism, e.g., if we're still calculating but the walreceiver is actively asking, skip the semaphore. That's fairly easy to implement, but I want some feedback on the general idea first before implementing it. See also the FIXME in the block comment added in this commit.

NB: when evaluating, keep in mind that the consumption metrics worker persists its interval across restarts; delete the state file on disk to get predictable (and, I believe, worst-case in terms of concurrency during PS restart) behavior.
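To make the mechanism concrete, here is a minimal sketch of semaphore-limited size calculation, assuming tokio's `Semaphore`; `BACKGROUND_TASKS_SEMAPHORE` and `calculate_logical_size_inner` are hypothetical placeholders, not the pageserver's actual identifiers.

```rust
use once_cell::sync::Lazy;
use tokio::sync::Semaphore;

// Hypothetical shared limit for all background work; the actual patch
// reuses the pageserver's existing background-tasks semaphore.
static BACKGROUND_TASKS_SEMAPHORE: Lazy<Semaphore> =
    Lazy::new(|| Semaphore::new(num_cpus::get()));

async fn initial_logical_size_calculation() -> anyhow::Result<u64> {
    // Wait here for a permit instead of letting thousands of calculations
    // thrash the page cache and virtual file cache in parallel.
    let _permit = BACKGROUND_TASKS_SEMAPHORE.acquire().await?;
    // The actual calculation runs while the permit is held.
    calculate_logical_size_inner().await
}

// Placeholder for the real walk over the timeline's data.
async fn calculate_logical_size_inner() -> anyhow::Result<u64> {
    Ok(0)
}
```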
```rust
//
// FIXME: with the current code, walreceiver requests would also hit this semaphore
// and get queued behind other background operations. That's bad because walreceiver_connection
// will push the not-precise value as `current_timeline_size` in the `PageserverFeedback`
```
I think it would be zero; there is no not-precise value anymore. If I recall correctly, this will lead either to no throttling, or to not registering the new value if a higher value has already been calculated.
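For what it's worth, the priority boost floated in the PR description could look roughly like this; the `TaskKind` enum and the function signature here are hypothetical, not the actual pageserver API.

```rust
use tokio::sync::{Semaphore, SemaphorePermit};

// Hypothetical task-kind discriminator; the real pageserver has a richer enum.
enum TaskKind {
    WalReceiverConnection,
    MetricsCollection,
}

// If the walreceiver is actively waiting on the value for `PageserverFeedback`,
// skip the background-tasks semaphore; all other callers queue for a permit.
async fn acquire_permit_for<'a>(
    task_kind: &TaskKind,
    background_semaphore: &'a Semaphore,
) -> Option<SemaphorePermit<'a>> {
    match task_kind {
        // Priority boost: no queueing behind background work.
        TaskKind::WalReceiverConnection => None,
        _ => Some(
            background_semaphore
                .acquire()
                .await
                .expect("semaphore is never closed"),
        ),
    }
}
```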
```rust
// Example query to show different causes of initial size calculation spawning:
//
// https://neonprod.grafana.net/explore?panes=%7B%22wSx%22:%7B%22datasource%22:%22grafanacloud-logs%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22sum%20by%20%28task_kind%29%20%28count_over_time%28%7Bneon_service%3D%5C%22pageserver%5C%22,%20neon_region%3D%5C%22us-west-2%5C%22%7D%20%7C%3D%20%60logical%20size%20computation%20from%20context%20of%20task%20kind%60%20%7C%20regexp%20%60logical%20size%20computation%20from%20context%20of%20task%20kind%20%28%3FP%3Ctask_kind%3E.%2A%29%60%20%5B1m%5D%29%29%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22grafanacloud-logs%22%7D,%22editorMode%22:%22code%22,%22step%22:%221m%22%7D%5D,%22range%22:%7B%22from%22:%221700637500615%22,%22to%22:%221700639648743%22%7D%7D%7D&schemaVersion=1&orgId=1
```
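For readability, the URL-encoded LogQL expression in that link decodes to roughly:

```logql
sum by (task_kind) (
  count_over_time(
    {neon_service="pageserver", neon_region="us-west-2"}
      |= `logical size computation from context of task kind`
      | regexp `logical size computation from context of task kind (?P<task_kind>.*)`
      [1m]
  )
)
```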
I would not include this; it's of no use here.
2394 tests run: 2298 passed, 1 failed, 95 skipped (full report)

Failures on Postgres 15
Flaky tests (5): Postgres 16, Postgres 14

Test coverage report is not available.
The comment gets automatically updated with the latest test results.
85445cd at 2023-11-28T16:33:51.260Z :recycle:
These will help us answer questions such as:
- when & at what do calculations get started after PS restart?
- how often is the API to get the current incrementally-computed logical size called, and does it return Exact vs Approximate?

I'd also be interested in a histogram of how much wall clock time size calculations take, but I don't know good bucket sizes, and logging it would introduce yet another per-timeline log message during startup; I don't think that's worth it just yet. A hedged sketch of what such counters could look like follows below.

Context
- https://neondb.slack.com/archives/C033RQ5SPDH/p1701197668789769
- #5962
- #5963
- #5955
- neondatabase/cloud#7408
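To make the questions above measurable, counters along these lines would do. This is an illustrative sketch using the `prometheus` crate; the metric and label names are hypothetical, not the ones the actual PR registers.

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

// How many initial logical size calculations were started, broken down by
// the task kind that triggered them (e.g. MetricsCollection, WalReceiver).
static INITIAL_LOGICAL_SIZE_START: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "pageserver_initial_logical_size_start_total",
        "Initial logical size calculations started, by causing task kind",
        &["task_kind"]
    )
    .unwrap()
});

// Calls to the incremental current-logical-size accessor, broken down by
// whether the returned value was Exact or Approximate.
static CURRENT_LOGICAL_SIZE_ACCESS: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "pageserver_current_logical_size_access_total",
        "Reads of the incrementally-computed logical size, by accuracy",
        &["accuracy"] // "exact" | "approximate"
    )
    .unwrap()
});
```

A call site would then do e.g. `INITIAL_LOGICAL_SIZE_START.with_label_values(&["MetricsCollection"]).inc();`.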
v2 in #6000