Schedule and replicate bootstrap daily backup #3557

Merged
merged 3 commits into from
Oct 14, 2021

alexandre-allard
Contributor

Component: salt, backup

Context:
Currently, we only create a backup when performing an action on the cluster (bootstrap, upgrade, downgrade, or restore), and we keep that backup locally.
This means that if no such action is performed for a long time, the latest backup may be very old by the time we need it.
Also, since the backup is not copied to other nodes, if we actually lose the bootstrap node 💥 we will be unable to restore a new one.

Summary:
Schedule daily backups on the bootstrap node and copy the resulting archive to the master nodes.
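
As a rough illustration of the idea in the summary (not the code from this PR), the two steps can be sketched in Python: build a timestamped archive, then copy it to one location per master node. The function names `create_backup` and `replicate` are hypothetical, and local directories stand in for the master nodes; the daily trigger itself (a cron entry, a timer, or similar) is assumed to live outside this snippet.

```python
import shutil
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def create_backup(sources, backup_dir, now=None):
    """Pack the existing `sources` into a timestamped .tar.gz archive.

    Hypothetical helper: missing sources are skipped, and each source is
    stored under its base name only.
    """
    now = now or datetime.now(timezone.utc)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    archive = backup_dir / f"{now:%Y%m%d_%H%M%S}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for source in map(Path, sources):
            if source.exists():
                tar.add(source, arcname=source.name)
    return archive


def replicate(archive, replica_dirs):
    """Copy the archive into each replica directory.

    Assumption: one directory per master node; a real deployment would
    transfer the file over the network instead.
    """
    copies = []
    for replica_dir in map(Path, replica_dirs):
        replica_dir.mkdir(parents=True, exist_ok=True)
        copies.append(Path(shutil.copy2(archive, replica_dir)))
    return copies
```

A scheduled task would call `create_backup` once a day and pass the returned path to `replicate`.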

Acceptance criteria:

@NicolasT
Contributor

NicolasT commented Oct 6, 2021

Instead of making a backup in one place, then replicating the backup(s) to all master nodes, would it make sense instead to create a (local) backup on every master node? What are these nodes missing in order to generate such backup? Basically: which data is currently not present on non-bootstrap master nodes for them to be able to become bootstrap? I guess mainly CA things?

Could we instead work on that, which would also further lead towards 'Bootstrap HA'?

@alexandre-allard alexandre-allard force-pushed the improvement/schedule-bootstrap-backup branch from d1160f7 to 69bdfe8 Compare October 6, 2021 08:42

@TeddyAndrieux
Collaborator

> Instead of making a backup in one place, then replicating the backup(s) to all master nodes, would it make sense instead to create a (local) backup on every master node? What are these nodes missing in order to generate such backup? Basically: which data is currently not present on non-bootstrap master nodes for them to be able to become bootstrap? I guess mainly CA things?
>
> Could we instead work on that, which would also further lead towards 'Bootstrap HA'?

The configuration files (/etc/metalk8s/{bootstrap,solutions}.yaml) are missing as well.
I agree that a better approach could be to start thinking about a way to have everything available on "some nodes", but it's not as simple as for the CA, which shouldn't change very often.

🤔
But maybe the way to go is a Job (or a script) running on the master nodes that generates the backup and, for now, retrieves the information it needs from the bootstrap node (so today it would be a Job that needs the bootstrap node to be up, even though it generates the backup locally on every master node).
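
To make the gap concrete, here is a minimal sketch of how a master node could report which backup inputs it lacks before trying to build an archive locally. The helper `missing_backup_inputs` is hypothetical, and the list contains only paths named in this pull request (the two configuration files and one CA certificate); the real set of required files is likely larger.

```python
from pathlib import Path

# Paths mentioned in this pull request; an assumption, not an exhaustive list.
REQUIRED_FILES = (
    "etc/metalk8s/bootstrap.yaml",
    "etc/metalk8s/solutions.yaml",
    "etc/metalk8s/pki/nginx-ingress/ca.crt",
)


def missing_backup_inputs(root="/"):
    """Return the absolute paths of required files that are absent under `root`."""
    root = Path(root)
    return [f"/{rel}" for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

A node for which this list is empty would have the files named here, while a non-empty list names what still has to be fetched from the bootstrap node.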

@alexandre-allard
Contributor Author

@NicolasT Yes, it is mainly the certificates and the metalk8s/solutions configuration files.
We do want to work on that. This PR is meant as a first iteration that delivers the feature quickly and gives us time to think about how to achieve that with a proper design.

@alexandre-allard alexandre-allard force-pushed the improvement/schedule-bootstrap-backup branch from 69bdfe8 to 40214fb Compare October 6, 2021 14:42

@alexandre-allard alexandre-allard force-pushed the improvement/schedule-bootstrap-backup branch 4 times, most recently from 2b1b3a6 to 1547888 Compare October 8, 2021 08:50
@alexandre-allard alexandre-allard marked this pull request as ready for review October 8, 2021 08:50
@alexandre-allard alexandre-allard requested a review from a team as a code owner October 8, 2021 08:50

Collaborator

@TeddyAndrieux TeddyAndrieux left a comment

A few comments.
I think you should add a changelog entry and, by the way, I'm not sure this should go in 2.10.

@@ -23,6 +23,10 @@ mine_functions:
- mine_function: hashutil.base64_encodefile
- /etc/metalk8s/pki/nginx-ingress/ca.crt

backup_ca_b64:
Collaborator

It feels a bit weird to me to have a "backup" CA, but I have no better idea. Maybe just rename it everywhere so that it's clear this is the "backup_server" CA.

Collaborator

By the way, do we really need a CA for this backup server? Couldn't we just use a self-signed server certificate? 🤔

Contributor Author

Maybe it would be sufficient, I don't know.

salt/_modules/metalk8s.py (outdated, resolved)
salt/_modules/metalk8s.py (outdated, resolved)
salt/metalk8s/backup/certs/ca.sls (outdated, resolved)
scripts/backup.sh.in (outdated, resolved)
@alexandre-allard alexandre-allard force-pushed the improvement/schedule-bootstrap-backup branch from 1547888 to 3fa4a32 Compare October 11, 2021 09:08

@alexandre-allard
Contributor Author

/reset

@alexandre-allard alexandre-allard force-pushed the improvement/schedule-bootstrap-backup branch 2 times, most recently from af4a836 to f3366f6 Compare October 11, 2021 13:44

@alexandre-allard
Contributor Author

/approve

@alexandre-allard alexandre-allard force-pushed the improvement/schedule-bootstrap-backup branch from f3366f6 to e9e78c5 Compare October 12, 2021 14:55

@alexandre-allard alexandre-allard force-pushed the improvement/schedule-bootstrap-backup branch from e9e78c5 to 746c52b Compare October 14, 2021 12:36

Collaborator

@TeddyAndrieux TeddyAndrieux left a comment

Let's go for it

@bert-e
Contributor

bert-e commented Oct 14, 2021

Build failed

The build for commit did not succeed in branch w/2.11/improvement/schedule-bootstrap-backup.

The following options are set: approve

@bert-e
Contributor

bert-e commented Oct 14, 2021

Build failed

The build for commit did not succeed in branch improvement/schedule-bootstrap-backup.

The following options are set: approve

@bert-e
Contributor

bert-e commented Oct 14, 2021

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/2.10

  • ✔️ development/2.11

The following branches will NOT be impacted:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Oct 14, 2021

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/2.10

  • ✔️ development/2.11

The following branches have NOT changed:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

Please check the status of the associated issue None.

Goodbye alexandre-allard-scality.

@bert-e bert-e merged commit 746c52b into development/2.10 Oct 14, 2021
@bert-e bert-e deleted the improvement/schedule-bootstrap-backup branch October 14, 2021 23:53