Epic: seamless tenant migration between pageservers #5199

Open
8 of 10 tasks
jcsp opened this issue Sep 5, 2023 · 4 comments
Labels: t/Epic (Issue type: Epic)

Comments

jcsp (Contributor) commented Sep 5, 2023

Motivation

This follows on from #5050. Whereas #5050 makes it safe to attach a tenant to a new pageserver without detaching it from the old one, this epic will make tenant migration seamless from Postgres's point of view.

See RFC: #5029

This ticket covers the pageserver side of the work; control plane changes are described elsewhere.

DoD

For failover (i.e. unplanned migration), only the most recent LSNs become unavailable for reads. They become available again within a time bounded by how long the new pageserver takes to download the writes made since the old pageserver's remote_consistent_lsn.

For planned migrations, Postgres experiences no loss of read availability whatsoever.
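To illustrate why a planned migration need not lose read availability, here is a minimal sketch that models a migration as a sequence of stages and checks that compute always reads from an attached location. The mode names (`AttachedSingle`, `AttachedMulti`, `AttachedStale`, `Secondary`) follow the location states described in RFC #5029 as best we understand them; the per-mode comments, the four-stage ordering, and all type and function names (`Stage`, `planned_migration`, `reads_always_served`) are hypothetical simplifications, not the pageserver's or control plane's actual code.

```rust
/// Hypothetical location modes, named after the states described in RFC #5029.
/// The comments paraphrase our reading of the RFC and are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocationMode {
    /// The only attached location: ingests WAL, serves reads, uploads and deletes.
    AttachedSingle,
    /// Attached while another location is also attached: may upload, but defers deletions.
    AttachedMulti,
    /// Superseded by a newer generation: keeps serving reads, no longer uploads or deletes.
    AttachedStale,
    /// Not attached; keeps a warm copy of layers so a later attach is fast.
    Secondary,
}

/// One stage of a planned migration (hypothetical): the mode of each location
/// and whether compute currently reads from the destination.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Stage {
    pub origin: LocationMode,
    pub destination: LocationMode,
    pub compute_on_destination: bool,
}

/// Simplified, illustrative stage sequence for a planned migration.
pub fn planned_migration() -> Vec<Stage> {
    use LocationMode::*;
    vec![
        // 1. Warm up the destination as a secondary while the origin stays in charge.
        Stage { origin: AttachedSingle, destination: Secondary, compute_on_destination: false },
        // 2. Origin steps down to stale; destination attaches under a newer generation.
        Stage { origin: AttachedStale, destination: AttachedMulti, compute_on_destination: false },
        // 3. Once the destination has caught up, compute is switched over.
        Stage { origin: AttachedStale, destination: AttachedMulti, compute_on_destination: true },
        // 4. Origin is demoted to secondary; destination becomes the sole attachment.
        Stage { origin: Secondary, destination: AttachedSingle, compute_on_destination: true },
    ]
}

/// The "no loss of read availability" property: at every stage, the location
/// that compute reads from is in one of the attached modes.
pub fn reads_always_served(stages: &[Stage]) -> bool {
    stages.iter().all(|s| {
        let serving = if s.compute_on_destination { s.destination } else { s.origin };
        matches!(
            serving,
            LocationMode::AttachedSingle | LocationMode::AttachedMulti | LocationMode::AttachedStale
        )
    })
}
```

The key point the sketch captures is that the old location keeps serving reads in a stale mode until compute has been switched, so there is no window in which compute points at an unattached location.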

Tasks

  1. c/storage/pageserver t/feature
  2. a/tech_debt c/storage/pageserver (assignee: jcsp)
  3. c/storage/controller t/feature
  4. a/test c/storage/pageserver
jcsp added the t/Epic (Issue type: Epic) label Sep 5, 2023
jcsp changed the title "Epic: seamless tenant migration" to "Epic: seamless tenant migration between pageservers" Sep 5, 2023
jcsp self-assigned this Sep 11, 2023
koivunej (Contributor) commented:
Before we can enable generations and these changes, we will need a way to stop consumption_metrics.rs from reporting on a pageserver that holds a previous (superseded) generation number.

jcsp (Contributor, Author) commented Oct 6, 2023

This is on pause to avoid merging more code until we can deploy existing changes, which will require https://github.com/neondatabase/cloud/issues/6600

NanoBjorn (Contributor) commented Jan 25, 2024

@jcsp, is there an issue tracking the changes to /re-attach (and possibly /validate)? As I understand the RFC, we should return the mode in the /re-attach response, but then the generation number there should also become optional for secondaries.

It is also unclear what we should do about /validate. If the old primary calls it in the middle of a migration, when the new primary is already AttachedMulti but the compute has not yet switched, the pageserver will invalidate the tenant. What then happens to the compute that has not yet switched?

jcsp (Contributor, Author) commented Jan 26, 2024

I just responded to a similar ping on https://github.com/neondatabase/cloud/issues/9614#issuecomment-1911064489. There's no change to /validate.

Including attachment states and/or secondary tenants in /re-attach responses is not essential for correctness here; it is more of an optimization that lets tenants omitted from /re-attach transition smoothly to secondary rather than detaching.
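The optimization discussed in this exchange can be sketched as a startup decision on the pageserver. Following NanoBjorn's reading of the RFC, the sketch assumes a /re-attach response that carries a mode per tenant, with the generation optional for secondaries. Every name here (`ReAttachMode`, `ReAttachEntry`, `StartupAction`, `startup_action`) is hypothetical and does not describe the real pageserver or control plane API.

```rust
use std::collections::HashMap;

/// Hypothetical mode returned for each tenant in a /re-attach response.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReAttachMode {
    Attached,
    Secondary,
}

/// Hypothetical per-tenant /re-attach entry: secondaries carry no generation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReAttachEntry {
    pub mode: ReAttachMode,
    pub generation: Option<u32>,
}

/// What the pageserver does at startup with a tenant it finds on local disk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupAction {
    Attach { generation: u32 },
    KeepAsSecondary,
    Detach,
}

/// Decide, per locally present tenant, how to treat it given the (hypothetical)
/// /re-attach response keyed by tenant ID.
pub fn startup_action(tenant_id: &str, response: &HashMap<String, ReAttachEntry>) -> StartupAction {
    match response.get(tenant_id) {
        // Attached tenants come with the generation they must attach under.
        Some(ReAttachEntry { mode: ReAttachMode::Attached, generation: Some(g) }) => {
            StartupAction::Attach { generation: *g }
        }
        // Malformed entry (attached without a generation): refuse to attach.
        Some(ReAttachEntry { mode: ReAttachMode::Attached, generation: None }) => StartupAction::Detach,
        // The optimization: keep local layers warm instead of discarding them.
        Some(ReAttachEntry { mode: ReAttachMode::Secondary, .. }) => StartupAction::KeepAsSecondary,
        // Omitted entirely: without mode information, fall back to detaching.
        None => StartupAction::Detach,
    }
}
```

Without the mode field, the `Secondary` arm never fires and every tenant not listed as attached is detached. That remains correct, which is consistent with the point that this is an optimization rather than a correctness requirement.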
