Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage controller: fix heatmaps getting disabled during shard split #8197

Merged
merged 2 commits into from
Jun 28, 2024

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Jun 28, 2024

Problem

At the start of do_tenant_shard_split, we drop any secondary location for the parent shards. The reconciler uses presence of secondary locations as a condition for enabling heatmaps.

On the pageserver, child shards inherit their configuration from parents, but the storage controller assumes the child's ObservedState is the same as the parent's config from the prepare phase. The result is that some child shards end up with inaccurate ObservedState, and until something next migrates or restarts, those tenant shards aren't uploading heatmaps, so their secondary locations are downloading everything that was resident at the moment of the split (including ancestor layers which are often cleaned up shortly after the split).

Closes: #8189

Summary of changes

  • Use PlacementPolicy to control enablement of heatmap upload, rather than the literal presence of secondaries in IntentState: this way we avoid switching them off during shard split
  • test: during tenant split test, assert that the child shards have heatmap uploads enabled.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/bug Issue Type: Bug c/storage/controller Component: Storage Controller labels Jun 28, 2024
@jcsp jcsp requested a review from VladLazar June 28, 2024 10:50
Copy link

github-actions bot commented Jun 28, 2024

2940 tests run: 2823 passed, 0 failed, 117 skipped (full report)


Flaky tests (2)

Postgres 16

Postgres 15

  • test_timeline_size_quota_on_startup: release

Code coverage* (full report)

  • functions: 32.7% (6910 of 21115 functions)
  • lines: 50.1% (54154 of 108067 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
13d7f11 at 2024-06-28T17:30:46.651Z :recycle:

@jcsp jcsp force-pushed the jcsp/issue-8189-split-heatmaps branch from 12daac9 to 1f7d48c Compare June 28, 2024 11:39
@jcsp jcsp force-pushed the jcsp/issue-8189-split-heatmaps branch from 1f7d48c to a9878b5 Compare June 28, 2024 15:33
@jcsp jcsp force-pushed the jcsp/issue-8189-split-heatmaps branch from a9878b5 to 13d7f11 Compare June 28, 2024 16:45
@jcsp jcsp marked this pull request as ready for review June 28, 2024 16:45
@jcsp jcsp requested a review from a team as a code owner June 28, 2024 16:45
Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the issue was that the observed state was HA aware, but the intent wasn't. This made the tenant shard dirty() and triggered a post shard split reconcile which unset the HA tenant config stuff.


Looks good - nice find!

@jcsp jcsp merged commit b8bbaaf into main Jun 28, 2024
59 of 60 checks passed
@jcsp jcsp deleted the jcsp/issue-8189-split-heatmaps branch June 28, 2024 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/controller Component: Storage Controller t/bug Issue Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage controller: after shard split, some shards end up with heatmap uploads disabled.
2 participants