
controller: limit Reconciler concurrency #7493

Merged (3 commits, Apr 25, 2024)
Conversation

@jcsp (Contributor) commented Apr 24, 2024

Problem

Storage controller memory can spike very high if we have many tenants and they all try to reconcile at the same time.

Related:

Not closing those issues in this PR, because the test coverage for them will be in #7475

Summary of changes

  • Add a CLI arg --reconciler-concurrency, defaulted to 128
  • Add a semaphore to Service with this many units
  • In maybe_reconcile_shard, try to acquire semaphore unit. If we can't get one, return a ReconcileWaiter for a future sequence number, and push the TenantShardId onto a channel of delayed IDs.
  • In process_result, consume from the channel of delayed IDs if there are semaphore units available and call maybe_reconcile_shard again for these delayed shards.
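The steps above can be sketched as a small model. This is an illustrative simplification, not the real storage controller code: `ReconcileLimiter`, `try_reconcile`, and `on_reconcile_done` are hypothetical names, a plain counter and `VecDeque` stand in for the async semaphore and the channel of delayed `TenantShardId`s, and a `u64` stands in for the shard ID.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Hypothetical model of the PR's scheme: a counting limit on concurrent
// reconcilers, with shards that can't get a unit pushed onto a queue of
// delayed IDs to be retried when a unit frees up.
struct ReconcileLimiter {
    units: Mutex<usize>,           // free "semaphore" units
    delayed: Mutex<VecDeque<u64>>, // stand-in for the channel of delayed TenantShardIds
}

impl ReconcileLimiter {
    fn new(concurrency: usize) -> Arc<Self> {
        Arc::new(Self {
            units: Mutex::new(concurrency),
            delayed: Mutex::new(VecDeque::new()),
        })
    }

    /// Analogue of maybe_reconcile_shard: try to take a unit; on failure,
    /// record the shard as delayed and report that no reconciler was spawned
    /// (the real code returns a ReconcileWaiter for a future sequence number).
    fn try_reconcile(&self, shard_id: u64) -> bool {
        let mut units = self.units.lock().unwrap();
        if *units > 0 {
            *units -= 1;
            true
        } else {
            self.delayed.lock().unwrap().push_back(shard_id);
            false
        }
    }

    /// Analogue of process_result: return the unit, then immediately retry
    /// one delayed shard if any are waiting.
    fn on_reconcile_done(&self) -> Option<u64> {
        *self.units.lock().unwrap() += 1;
        let next = self.delayed.lock().unwrap().pop_front();
        next.filter(|&id| self.try_reconcile(id))
    }
}
```

With `concurrency = 2`, a third shard is delayed rather than spawning a reconciler, and completing any running reconcile pulls it off the delayed queue.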

This has been tested in #7475, but that PR will land separately because it contains other changes and its test needs stabilizing. This change is worth merging sooner because it fixes a practical issue with larger shard counts.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/bug Issue Type: Bug c/storage/controller Component: Storage Controller labels Apr 24, 2024
@jcsp jcsp requested a review from a team as a code owner April 24, 2024 08:44
@jcsp jcsp requested a review from arssher April 24, 2024 08:44
@jcsp jcsp requested review from petuhovskiy and VladLazar and removed request for arssher April 24, 2024 09:09

github-actions bot commented Apr 24, 2024

2766 tests run: 2645 passed, 0 failed, 121 skipped (full report)


Code coverage* (full report)

  • functions: 28.1% (6481 of 23054 functions)
  • lines: 47.0% (46060 of 98092 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
4f025d2 at 2024-04-25T08:06:52.653Z :recycle:

Review threads (resolved):
  • storage_controller/src/tenant_shard.rs
  • storage_controller/src/service.rs
@jcsp (Contributor, Author) commented Apr 25, 2024

Two commits added:

  • One to improve behavior (the earlier code worked, but under some conditions would end up waiting for a background reconcile to kick the delayed shards)
  • One for the review comments on naming etc.

@jcsp jcsp requested a review from VladLazar April 25, 2024 07:21
@jcsp jcsp merged commit e8814b6 into main Apr 25, 2024
53 checks passed
@jcsp jcsp deleted the jcsp/storcon-concurrency-limit branch April 25, 2024 09:46
jcsp added a commit that referenced this pull request Apr 29, 2024
…7495)

## Problem

Previously, we tried to send compute notifications in startup_reconcile
before completing that function, with a time limit. Any notifications
that don't complete within the time limit leave tenants with their
`pending_compute_notification` flag set, which causes them to spawn a
Reconciler the next time the background reconciler loop runs.

This causes two problems:
- Spawning a lot of reconcilers after startup caused a spike in memory
(this is addressed in #7493)
- After #7493, spawning lots of
reconcilers will block some other operations, e.g. a tenant creation
might fail due to lack of reconciler semaphore units while the
controller is busy running all the Reconcilers for its startup compute
notifications.

When the code was first written, ComputeHook didn't have internal
ordering logic to ensure that notifications for a shard were sent in the
right order. Since that was added in
#7088, we can use it to avoid
waiting for notifications to complete in startup_reconcile.

Related to: #7460

## Summary of changes

- Add a `notify_background` method to ComputeHook.
- Call this from startup_reconcile instead of doing notifications inline
- Process completions from `notify_background` in `process_results`, and
if a notification failed then set the `pending_compute_notification`
flag on the shard.

The result is that we will only spawn lots of Reconcilers if the compute
notifications _fail_, not just because they take some significant amount
of time.
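The completion-processing step can be sketched as follows. This is a hedged illustration, not the real ComputeHook API: `NotifyResult`, `Shard`, and `process_results` are hypothetical names, and a std mpsc channel stands in for whatever mechanism `notify_background` actually uses to report completions.

```rust
use std::sync::mpsc;

// Hypothetical completion record from a background compute notification.
#[derive(Debug, PartialEq)]
struct NotifyResult {
    shard_id: u64,
    ok: bool,
}

// Minimal stand-in for a tenant shard's notification state.
struct Shard {
    id: u64,
    pending_compute_notification: bool,
}

/// Sketch of the process_results side: drain completed notifications and
/// set pending_compute_notification only for the ones that failed, so a
/// Reconciler is spawned later only on actual failure.
fn process_results(shards: &mut [Shard], rx: &mpsc::Receiver<NotifyResult>) {
    for result in rx.try_iter() {
        if !result.ok {
            if let Some(shard) = shards.iter_mut().find(|s| s.id == result.shard_id) {
                // Failed notification: flag the shard so the background
                // reconcile loop retries it.
                shard.pending_compute_notification = true;
            }
        }
    }
}
```

The design point is that slow-but-successful notifications no longer set the flag at all, so startup no longer produces a burst of Reconcilers merely because notifications took time.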

Test coverage for this case is in
#7475
jcsp added a commit that referenced this pull request Apr 30, 2024
## Problem

Storage controller was observed to have unexpectedly large memory
consumption when loaded with many thousands of shards.

This was recently fixed:
- #7493

...but we need a general test that the controller is well behaved with
thousands of shards.

Closes: #7460
Closes: #7463

## Summary of changes

- Add test test_storage_controller_many_tenants to exercise the system's
behaviour with a more substantial workload. This test measures memory
consumption and reproduces #7460 before the other changes in this PR.
- Tweak reconcile_all's return value to make it nonzero if it spawns no
reconcilers, but _would_ have spawned some reconcilers if they weren't
blocked by the reconcile concurrency limit. This makes the test's
reconcile_until_idle behave as expected (i.e. not complete until the
system is nice and calm).
- Fix an issue where tenant migrations would leave a spurious secondary
location when migrated to some location that was not already their
secondary (this was an existing low-impact bug that tripped up the
test's consistency checks).
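The return-value tweak can be illustrated with a toy model. This is an assumption-laden sketch, not the controller's actual signature: `reconcile_all` here just counts work, where the real function returns a value the test's `reconcile_until_idle` polls.

```rust
/// Hypothetical model: reconcile_all reports how many reconciles it spawned
/// *plus* how many it wanted to spawn but deferred behind the concurrency
/// limit. A "wait until idle" loop that treats zero as idle therefore keeps
/// polling while shards are blocked on semaphore units, instead of exiting
/// early.
fn reconcile_all(wants_reconcile: usize, free_units: usize) -> usize {
    let spawned = wants_reconcile.min(free_units);
    let deferred = wants_reconcile - spawned;
    spawned + deferred
}

/// The system is idle only when nothing was spawned and nothing is blocked.
fn is_idle(wants_reconcile: usize, free_units: usize) -> bool {
    reconcile_all(wants_reconcile, free_units) == 0
}
```

Under the old behaviour, the deferred count would have been dropped and `is_idle` would return true while blocked work remained queued.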

On the test with 8000 shards, the resident memory per shard is about
20KiB. This is not really per-shard memory: the primary source of memory
growth is the number of concurrent network/db clients we create.

With 8000 shards, the test takes 125s to run on my workstation.