Give possibility for restoring DC using mapping sourceDC -> destinationDC #3829

Open
karol-kokoszka opened this issue Apr 29, 2024 · 6 comments

Comments

@karol-kokoszka
Collaborator

#3871

Right now, there is no option in the Scylla Manager restore task to restore just a single data center (DC) from the backup location. This could lead to problematic situations, particularly when:

  • Encryption at Rest (EaR) is enabled,
  • Two DCs use different encryption keys,
  • Encryption keys are stored in different cloud regions, and
  • There is only one backup location available.

To address this, we would need to make the encryption keys multi-regional to facilitate the restoration process in such scenarios.

The location flag may not be very intuitive here, as its [dc] part defines the destination DC, not the DC whose data is read from the backup. We need to discuss during Manager planning whether a new flag specifying the source DC is necessary.
If we could restore just a single DC, we could restore DC by DC and avoid the need to create multi-regional keys.

(cc: @tzach)

@tzach
Collaborator

tzach commented Apr 30, 2024

To address this, we would need to make the encryption keys multi-regional to facilitate the restoration process in such scenarios.

Agreed, but how is this a Scylla Manager issue to fix?

@karol-kokoszka
Collaborator Author

We could potentially address the problem by allowing a restore of just a single DC from the location bucket.
That is something we don't support at the moment (possibly by mistake).

@rayakurl

rayakurl commented May 7, 2024

@tzach - we need a resolution. For now, almost all SCT tests are failing since they are multi DC. We will add a couple of pipelines for single DC + encryption, but we are disabling the multi DC jobs as they are constantly failing. @mikliapko as discussed, please create a task for the new pipelines and disable the multi DC ones for now. Thanks

mikliapko added a commit to mikliapko/scylla-cluster-tests that referenced this issue May 14, 2024
Since there is an issue with multiDC cluster restore when the EaR is
turned on (scylladb/scylla-manager#3829),
it was decided to temporarily switch the main part of jobs to run on
singleDC cluster. Only one multiDC cluster job is left for enterprise
version 2022 where EaR is not implemented.
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this issue May 15, 2024
Since there is an issue with multiDC cluster restore when the EaR is
turned on (scylladb/scylla-manager#3829),
it was decided to temporarily switch the main part of jobs to run on
singleDC cluster. Only one multiDC cluster job is left for enterprise
version 2022 where EaR is not implemented.
fruch pushed a commit to scylladb/scylla-cluster-tests that referenced this issue May 19, 2024
Since there is an issue with multiDC cluster restore when the EaR is
turned on (scylladb/scylla-manager#3829),
it was decided to temporarily switch the main part of jobs to run on
singleDC cluster. Only one multiDC cluster job is left for enterprise
version 2022 where EaR is not implemented.

(cherry picked from commit 4da831d)
@karol-kokoszka
Collaborator Author

grooming notes

The initial idea is to add a new flag to the restore CLI, so that it's possible to define the origin DC from the backup location.
Then, data from this DC is going to be restored to the specified destination.

@mikliapko SCT will have to be updated to test the single-DC restore scenario.

@Michal-Leszczynski Michal-Leszczynski added this to the 3.4 milestone Sep 24, 2024
@Michal-Leszczynski Michal-Leszczynski self-assigned this Sep 24, 2024
Michal-Leszczynski added a commit that referenced this issue Oct 8, 2024
Michal-Leszczynski added a commit that referenced this issue Oct 8, 2024
@Michal-Leszczynski
Collaborator

The initial idea is to add a new flag to the restore CLI, so that it's possible to define the origin DC from the backup location.
Then, data from this DC is going to be restored to the specified destination.

After giving it some more thought, I wouldn't recommend adding it in such a way.
The need for this feature arose from #3871, where it could be used to restore DC by DC.
This is problematic, as a restore task not only downloads and load&streams the data, but also:

  • disables and enables tombstone_gc
  • drops and creates views
  • runs repair

So running many restore tasks, one by one, DC by DC, would result in lots of redundant work.
Also, it could theoretically (not sure about that) lead to data resurrection, as tombstone_gc would be enabled in between DC restorations.
Not to mention that it would be the user's responsibility to remember to restore all DCs from the backup.
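
To make the redundancy argument concrete, here is a toy model (not Scylla Manager code) that counts how often each step runs when three DCs are restored in one task versus in three per-DC tasks. The step labels and the `count_steps` helper are hypothetical names chosen for illustration, and the order of the labels carries no meaning.

```python
from collections import Counter

# Cluster-wide work that, per the list above, every restore task performs on
# top of downloading and load&streaming the data. These are illustrative
# labels only, not Scylla Manager identifiers.
CLUSTER_WIDE_STEPS = [
    "disable tombstone_gc",
    "enable tombstone_gc",
    "drop views",
    "create views",
    "run repair",
]


def count_steps(tasks):
    """Count how often each step runs across a sequence of restore tasks.

    `tasks` is a list of tasks; each task is the list of DC names it restores.
    """
    counts = Counter()
    for dcs in tasks:
        # Every task repeats the cluster-wide steps, however many DCs it covers.
        counts.update(CLUSTER_WIDE_STEPS)
        for dc in dcs:
            counts[f"download and load&stream {dc}"] += 1
    return counts


one_task = count_steps([["dc1", "dc2", "dc3"]])
per_dc_tasks = count_steps([["dc1"], ["dc2"], ["dc3"]])

for step in CLUSTER_WIDE_STEPS:
    print(f"{step}: {one_task[step]} vs {per_dc_tasks[step]}")
```

In this model the data transfer work is identical in both cases; only the cluster-wide steps multiply with the number of tasks.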

A better idea could be to extend restore with a flag like --dc-mapping (string -> list of strings).
This would allow the user to specify which DC from the backup should be restored by which DCs in the restored cluster.
It has a few benefits;
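
As a rough illustration of the "string -> list of strings" shape only: the sketch below models the mapping as a plain dictionary and checks it against the DCs present in the backup and in the target cluster. The command-line syntax of the flag is not defined in this thread, so none is shown here. The function name `check_dc_mapping`, the DC names, and the rule that every backup DC must be mapped are all assumptions made for this example.

```python
def check_dc_mapping(mapping, backup_dcs, cluster_dcs):
    """Return a list of problems found in a source DC -> destination DCs mapping.

    An empty list means the mapping is consistent under the assumptions of
    this sketch. The real flag may well apply different rules.
    """
    problems = []
    for src, dsts in mapping.items():
        if src not in backup_dcs:
            problems.append(f"source DC {src!r} is not in the backup")
        if not dsts:
            problems.append(f"source DC {src!r} has an empty destination list")
        for dst in dsts:
            if dst not in cluster_dcs:
                problems.append(f"destination DC {dst!r} is not in the cluster")
    # Assumption: flag backup DCs that were left out, so the user does not
    # have to remember them by hand.
    for src in sorted(set(backup_dcs) - set(mapping)):
        problems.append(f"backup DC {src!r} has no destination in the mapping")
    return problems


# Hypothetical DC names: dc1 is restored by one DC, dc2 by two, dc3 is forgotten.
mapping = {"dc1": ["dc_a"], "dc2": ["dc_b", "dc_c"]}
problems = check_dc_mapping(
    mapping,
    backup_dcs=["dc1", "dc2", "dc3"],
    cluster_dcs=["dc_a", "dc_b", "dc_c"],
)
for problem in problems:
    print(problem)
```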

@karol-kokoszka karol-kokoszka changed the title Give possibility for restoring just a single DC Give possibility for restoring DC using mapping sourceDC -> destinationDC Oct 10, 2024
@mikliapko

A better idea could be to extend restore with a flag like --dc-mapping (string -> list of strings).

@Michal-Leszczynski
When it is ready, could you please provide an example input for this flag?
I will switch some of our SCT tests back to run on a multiDC cluster.

@karol-kokoszka karol-kokoszka removed this from the 3.4 milestone Oct 21, 2024
@karol-kokoszka karol-kokoszka added this to the 3.5 milestone Oct 21, 2024