Backup/migrate cluster? #24229

Open
ariscn opened this Issue Apr 14, 2016 · 37 comments

Comments

ariscn commented Apr 14, 2016

I've searched the docs, but can't find an answer to an important question: how do I back up and migrate or restore a Kubernetes cluster?

There's quite a bit of gray around what qualifies as "the cluster" - running containers, node IP addresses, etc. But at the very least, I'd expect a way to dump the configuration of all high level resources (services, replication controllers, deployments, etc).

Does this exist? If so, is there an existing issue tracking progress? If this is a bad idea, what's the Right Way (tm) to have an insurance policy for disaster recovery in the event of catastrophic cluster failure?

zhouhaibing089 (Contributor) commented Apr 14, 2016

I am thinking of backing up the etcd storage.
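
With etcd v3 that could be as simple as the following (a sketch; the endpoint and cert paths are assumptions, adjust for your deployment):

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/backups/etcd-snapshot.db   # consistent snapshot of the keyspace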

goblain (Contributor) commented Apr 14, 2016

@zhouhaibing089 : this is not really an option, as too much changes between separate deployments of Kubernetes (e.g. IP addresses).

@ariscn : to recover from a catastrophic failure, you should be able to quickly provision a new, clean k8s cluster and then restore the services you had deployed on it. The biggest issue is obviously persistent data (PVs), but in a catastrophic scenario your disaster recovery plan should include restoring data from backups kept outside the k8s cluster. You might also snapshot some PVs and provision new volumes for the new cluster from those snapshots, but that is something I don't have experience with. All in all, I think restoring services on a new cluster is not something you can expect from k8s itself; you need your own tooling around this, as no one else knows what you deploy to the cluster and how it behaves in case of failure.

lavalamp (Member) commented Apr 14, 2016

IMO, you should really be keeping your config in a version control system. You want to store the objects before the cluster applies defaults; exporting from the cluster will include the cluster-applied defaults.
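
In that model the repo, not the cluster, is the source of truth, and a rebuild is mechanical; roughly (a sketch, repo path made up):

git clone git@example.com:infra/k8s-config.git   # manifests as committed, pre-defaulting
kubectl apply -f k8s-config/                     # recreate everything on the target cluster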

Definitely backing up etcd data is a really good idea (GKE does this for you!) but not for the purpose of exporting to a new cluster; all sorts of things (endpoints, node names, etc) are different between clusters.

Migrating between clusters is a use case that the "ubernetes" effort will eventually address or make easier.

@mml @quinton-hoole anything to add?

pikeas commented Apr 15, 2016

If storing config in VCS is a best practice, this should be documented, and probably emphasized.

I'll go further and propose that Kubernetes should do this for me - the master receives all API calls that add high-level resources, so it seems in scope for Kubernetes to automatically log configuration changes to an attached volume / log file for playback, VCS, etc.

It's like a database server that doesn't do snapshots - from an operational perspective, that's a hair-on-fire problem.

colhom commented Apr 21, 2016

Similar to #21582

@quinton-hoole quinton-hoole added this to the next-candidate milestone Apr 21, 2016

@quinton-hoole quinton-hoole self-assigned this Apr 21, 2016

quinton-hoole (Member) commented Apr 21, 2016

The approach suggested by @goblain in #24229 (comment) is a sound one right now.

The ubernetes project docs/proposals/federation.md will help you in the future. We're busy working on that right now.

ashw7n (Contributor) commented Aug 8, 2016

@lavalamp do you know if the mechanism GKE uses for etcd backup/restore is public? This is something we (eBay) are interested in, and we're ready to contribute if the GKE solution is not open source. Any suggestions on this front?

lavalamp (Member) commented Aug 8, 2016

@ashw7n It is not public, so go ahead.

I will note that backup/restore and cluster replication / service turn-up in a new cluster are separate problems and it's not plausible that one mechanism could solve both. Backing up a cluster (etcd data dir) is only useful if that cluster somehow ends up with a corrupt etcd. Restoring that backup in a different cluster is going to do very unpredictable things, as the nodes, routes, IP addresses, load balancers, etc are all different between clusters.

I believe there's an "export" thingy in the API that Red Hat added to solve this latter case in OpenShift.

colhom commented Aug 8, 2016

@lavalamp I believe you're talking about kubectl get --export? I find said thingy very useful for doing production cluster migrations, though it doesn't do *quite* what you'd expect w.r.t. stripping non-portable API objects and metadata (#21582), leaving some of that up to the operator. I feel it could do more automation around the common case of migrating a cluster's workload.
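
For reference, the shape of that workflow (resource name made up; expect some hand-editing):

kubectl get deployment my-app --export -o yaml > my-app.yaml   # strips status and some metadata
# non-portable bits (nodePorts, PV references, etc.) may still need manual cleanup
kubectl --context=new-cluster apply -f my-app.yaml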

lavalamp (Member) commented Aug 8, 2016

Yeah, that's what I'm talking about.

If it were me, I would keep all config in source control and write something that applied it to a cluster, so there'd be nothing to strip out. Honestly I think it's a little crazy not to keep your config in source control, but I understand it's not possible for everything...


colhom commented Aug 9, 2016

A huge boon of Kubernetes is generic handling of deployed applications. An operator can take a cluster with 100s of different applications, each deployed differently and by a different team, and theoretically migrate/backup all those applications in an entirely homogeneous way, without interacting w/ deployment tooling for each.

The operator can even be open-ended about what applications are being touched: (for instance) select all namespaces with migrate_v1.4='true', export to file, and re-create on the new v1.4 cluster.

This statement has caveats of course, but generally I think it's a good goal.

rca commented Aug 14, 2016

+1 for kubectl get --export; thanks for the tip!

@lavalamp I think I'm pretty close with respect to keeping things in version control, but am missing a way to have what's in version control define what's in the cluster. For example, say I create a service foo and create a services/foo.yml file. I then apply it to the cluster with kubectl apply -f services/foo.yml. When I no longer want the service, I remove it from the cluster with kubectl delete service foo, but services/foo.yml remains in VC. Furthermore, I've found myself forgetting to commit assets that are running in the cluster into VC. Long story short, the two -- VC and Kubernetes -- are detached.

I know I can be more diligent about certain things, however, I'm looking for a workflow to ensure these are always in sync. Any pointers?

The other thing that's completely out of sync is secrets. I've intentionally kept secrets out of VC, but that's one more thing that's just floating in Kubernetes and not persisted anywhere else. Suggestions on managing these would also be great.

Thanks.

lavalamp (Member) commented Aug 15, 2016

Yeah, I don't know of anything off the top of my head here. I think there's room for a GitHub/k8s cluster syncing service; it shouldn't be that hard to make (famous last words), at least not a very basic one.

I would always commit your changes (including a deletion) before executing the kubectl command.

Secret management is not something I am expert enough in to have an opinion about :)
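
Re the deletion case, even something as simple as this discipline would help (a sketch, using the file names from your example):

git rm services/foo.yml
git commit -m "retire the foo service"   # record the intent first
kubectl delete service foo               # then make the cluster match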


rca commented Sep 1, 2016

After looking around a bit with respect to secrets, I found ansible-vault, which will encrypt a file so it can be kept in version control. In this file I keep simple key=value entries, and I'm working on a tool to generate the secrets yaml for Kubernetes to ingest.
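
The rough shape of it (file and secret names are just examples, untested):

ansible-vault encrypt secrets.env                   # encrypted blob is safe to commit
ansible-vault view secrets.env > /tmp/secrets.env   # decrypt just-in-time
kubectl create secret generic app-secrets --from-env-file=/tmp/secrets.env
rm /tmp/secrets.env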

Any thoughts on whether ansible-vault, or more specifically an encrypted blob in version control, is a good approach to secrets management are most welcome.

Thanks.

vendrov (Contributor) commented Sep 30, 2016

@rca The "spread" project is trying to achieve the goal of linking git and Kubernetes files:
https://github.com/redspread/spread

rca commented Oct 11, 2016

@keglevich3 cool, thanks for the tip! Will check them out.

timothysc (Member) commented Nov 16, 2016

/cc @detiber

pieterlange commented Mar 4, 2017

For anyone still stumbling over this ticket, I made a minimalistic "rancid clone" for Kubernetes here: https://github.com/pieterlange/kube-backup

This basically loops over configured resources and commits changes to a git repo (from a Job within the cluster). Hope this helps someone.
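
The core of it is not much more than this (a simplified sketch, not the actual code):

for type in deployment daemonset service configmap; do   # the configured resource types
  kubectl get "$type" --all-namespaces -o yaml > "state/${type}.yaml"
done
git -C state add -A
git -C state commit -m "cluster state $(date -u +%Y-%m-%dT%H:%M:%SZ)" && git -C state push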

/edit: questions & comments welcome as github issues, but you can also find me on slack.

ReneSaenz (Contributor) commented Jun 26, 2017

@pieterlange How can I email you? I have a question.

luxas (Member) commented Jun 26, 2017

Also @mhausenblas dropped this in the sig-cluster-lifecycle chat today: https://hackernoon.com/introducing-reshifter-for-kubernetes-backup-restore-migration-upgrade-ffaf78da36

Probably useful for subscribers to this thread, thanks @mhausenblas for working on it!

mhausenblas commented Jun 26, 2017

Thank you very much for pointing this out here @luxas! I'd be delighted to demo it in the SIG Cluster Lifecycle as well. Will join tomorrow and let's take it from there?

luxas (Member) commented Jun 26, 2017

Will join tomorrow and let's take it from there?

Yup!

mhausenblas commented Jul 1, 2017

A quick update after the first week working on ReShifter:

  • A more or less stable alpha release is now available: v0.3.4, including support for etcd 2 and 3, remote backups to S3, and a CLI tool.
  • I've tried to capture the design and the reasoning for the approach in the architecture document

It would be wonderful if folks could test ReShifter and provide feedback on both the tools (app, API, and CLI) and the design.

fejta-bot commented Dec 31, 2017

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

pires (Member) commented Dec 31, 2017

@jbeda @timothysc @ncdc maybe Ark's design answers most, if not all, of this?

mhausenblas commented Dec 31, 2017

@pires I'd agree that Ark addresses many use cases. I wonder if we want to invest some work here, maybe make it part of the official docs?

pikeas commented Jan 1, 2018

Please add a link to Ark?

pieterlange commented Jan 1, 2018

https://github.com/heptio/ark

errordeveloper (Member) commented Jan 22, 2018

Sounds like this should stay open.

/remove-lifecycle stale
/lifecycle frozen

iSynth commented Aug 10, 2018

The lack of import/export of K8s cluster configuration is a huge step back, and it was the decisive factor in our rejection of Kubernetes. These days, the absence of such a feature in a mainstream system is not acceptable.

redbaron (Contributor) commented Aug 10, 2018

@iSynth, that is like saying you reject GNU/Linux because there is no blessed backup solution.

iSynth commented Aug 10, 2018

@redbaron, Linux doesn't have backup solutions?! Hm... Backup solutions are services and systems that run on Linux, and one of these mainstream systems is K8s, which does not know how to do this.

redbaron (Contributor) commented Aug 10, 2018

If you squint, you can see GNU/Linux as a runtime platform: it provides primitives such as signals, filesystems, and processes; these primitives are accessible via an API (syscalls), and you use them to build your application. Then you recognize the need for backups, and you build or buy a backup solution which uses the same primitives and runs on the same runtime platform to save the state of your apps. You don't expect the runtime platform to magically know how to back up all your apps.

Going one (or more) levels up there is Kubernetes, which is fundamentally the same: a bunch of well-defined primitives accessible via an API (REST) which you use to run your application. Why do your requirements suddenly change such that you demand it provide a backup solution for your apps? All it can do is provide primitives to make that easier.

goblain (Contributor) commented Aug 10, 2018

2+ years after my first comment here I still think that is the way to go, and that's how I work. Recently, though, I decided to try out an approach to cluster backup and recovery targeted at kubeadm-provisioned clusters. There are a lot of gotchas in my particular case, like the fact that I assume the cluster as a whole keeps the same identity (i.e. same DNS names) and the same certs (CA at minimum), and that I don't run stateful services on it at the moment. The bottom line is that, at this point, a snapshot of etcd + certs + kubeadm gives me a cluster I can completely shut down for the night (as in: wipe all VMs, including the control plane), leaving only the backup.tgz, and then restore to a fully operational state, with all the services that were running on it, by starting new VMs in the morning before the devs need it. They can use it as if it never went down (apart from the night-long gap in the external Prometheus used for metrics).
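
The backup step itself boils down to roughly this (a sketch; paths assume a default kubeadm layout, and etcdctl will additionally need the usual endpoint/cert flags):

ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db
cp -a /etc/kubernetes/pki /backup/pki     # cluster CA and certs, so identity survives
tar czf backup.tgz -C /backup etcd-snapshot.db pki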

@iSynth that said, I think it's obvious there are ways to back up and restore a cluster, and there are workflows that allow recovery to a different cluster, etc. They are just not possible as a universal feature of Kubernetes because of how different each environment can be, and how different the software deployed on top of it can be. Pretty much meaning, as always in the open-source world, that you must put some effort into research to find (or maybe help build) the right solution for you. It's there, you just need to look for it. :)

sys-ops commented Nov 9, 2018

  • OLD MASTER NODE

kubectl get all --all-namespaces -o yaml > kubernetes_all_objects_exported.yaml

  • NEW MASTER NODE

kubectl create -f ./kubernetes_all_objects_exported.yaml
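
(Note that such a dump still contains cluster-populated fields like status, clusterIP, and resourceVersion, so expect to strip or ignore some of them before kubectl create succeeds cleanly on the new cluster.)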
