k8s-snap

Features

  • Take a snapshot of all k8s resources which have the verbs 'list', 'get', 'create' and 'delete'.
  • Restore k8s resources from a snapshot on any k8s cluster.
  • Select the resources to restore according to 'excludeApiPathes' and 'excludeNamespaces' in a preference.
  • Backup data is stored on S3.
  • Runs on a k8s cluster using CRDs.

Restore details

  • Resources are restored with 'create', not 'update'.
  • Apps (deployments, statefulsets, daemonsets) are restored after all other resources.
  • PV definitions and PV/PVC bindings are restored for the specified storageclasses.
  • Token secrets, resources with owner references, and endpoints that share a name with a service are not restored.

TODO

  • Overwrite existing resources for specified API paths.
  • 'Include' contexts in preferences. Currently only 'Exclude' is supported.

Options

$ /k8s-snap-controller \
--namespace=k8s-snap
All options are optional.

param                     default    description
kubeconfig                           Path to a kubeconfig. Only required if out-of-cluster.
master                               The address of the Kubernetes API server. Overrides any value in kubeconfig. Only required if out-of-cluster.
namespace                 k8s-snap   Namespace for k8s-snap
backupthreads             5          Number of backup threads
restorethreads            2          Number of restore threads
housekeepstore            true       Check and clean up orphan files on the object store regularly (every 300 seconds)
restoresnapshots          true       Restore snapshots from the object store on start
validatefileinfo          true       Validate size and timestamp of files on the object store
maxretryelaspsedminutes   5          Max elapsed minutes to retry a snapshot
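
In a cluster deployment these options are passed to the controller container as args. The following is a minimal sketch only, assuming the flags above map directly to container args; the image name and values are placeholders and the actual manifest is artifacts/deploy.yaml.

# Illustrative container section only, not the repository's artifacts/deploy.yaml
containers:
- name: k8s-snap-controller
  image: your-registry/k8s-snap:latest   # placeholder image
  args:
  - "--namespace=k8s-snap"
  - "--backupthreads=5"
  - "--restorethreads=2"
  - "--housekeepstore=true"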

Deploy

$ kubectl apply -f artifacts/crd.yaml
$ kubectl apply -f artifacts/namespace-rbac.yaml

Set access/secret key in artifacts/cloud-credential.yaml and create a secret.

apiVersion: v1
kind: Secret
metadata:
  namespace: k8s-snap
  name: k8s-snap-ap-northeast-1
data:
  accesskey: [base64 access_key]
  secretkey: [base64 secret_key]

$ kubectl apply -f artifacts/cloud-credential.yaml
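
The data field takes base64-encoded values. If you prefer plain text, Kubernetes also accepts values via stringData and encodes them on write; an equivalent sketch (not one of the repository's artifacts, key values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  namespace: k8s-snap
  name: k8s-snap-ap-northeast-1
stringData:
  accesskey: YOUR_ACCESS_KEY   # placeholder, plain text
  secretkey: YOUR_SECRET_KEY   # placeholder, plain text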

Make a bucket for snapshots.
Set object store endpoint and bucket name in artifacts/objectstore-config.yaml and create a config.

apiVersion: clustersnapshot.rywt.io/v1alpha1
kind: ObjectstoreConfig
metadata:
  name: k8s-snap-ap-northeast-1
  namespace: k8s-snap
spec:
  region: ap-northeast-1
  endpoint: ap-northeast-1.amazonaws.com
  bucket: k8s-snap
  cloudCredentialSecret: k8s-snap-ap-northeast-1

$ kubectl apply -f artifacts/objectstore-config.yaml

Set image and registry key in artifacts/deploy.yaml and deploy.
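
The parts to edit are the controller image and the pull secret for your registry. A rough sketch of the relevant pod template fields, with placeholder values (the full Deployment lives in artifacts/deploy.yaml):

# Illustrative fragment only; edit the corresponding fields in artifacts/deploy.yaml
spec:
  template:
    spec:
      imagePullSecrets:
      - name: your-registry-key                # placeholder registry key secret
      containers:
      - name: k8s-snap-controller
        image: your-registry/k8s-snap:latest   # placeholder image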

$ kubectl apply -f artifacts/deploy.yaml

To take a snapshot

Create a snapshot resource

apiVersion: clustersnapshot.rywt.io/v1alpha1
kind: Snapshot
metadata:
  name: cluster01-001
  namespace: k8s-snap
spec:
  clusterName: cluster01
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN3akND...
        server: https://cluster01.kubernetes.rywt.io:6443
      name: cluster
    contexts:
    - context:
        cluster: cluster
        user: remote-user
      name: context
    current-context: context
    kind: Config
    preferences: {}
    users:
    - name: remote-user
      user:
        token: eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVz....
  ttl: 720h
  availableUntil: 2020-07-01T02:03:04Z

Snapshot status

$ kubectl get snapshots.clustersnapshot.rywt.io -n k8s-snap
NAME             CLUSTERNAME   BACKUPDATE             STATUS
cluster02-001    cluster02     2019-04-19T08:35:23Z   Completed
cluster02-002    cluster02     2019-04-22T05:04:35Z   InProgress
scluster02-001   sccluster02   2019-04-12T03:49:30Z   Completed
phase              description
"" (empty string)  Just created
InQueue            Waiting to be processed
InProgress         Taking the snapshot
Failed             An error occurred while taking the snapshot
Completed          Snapshot done and available for restore

Completed snapshot status example

$ kubectl get snapshots.clustersnapshot.rywt.io -n k8s-snap scluster01-001 -o json | jq .status
{
  "availableUntil": "2019-06-19T03:45:08Z",     /*** Snapshot custom resource and data will be deleted on .. ***/
  "contents": [                                 /*** K8s resources backuped in snapshot ***/
    "/api/v1/namespaces/default/configmaps/kubelet-broken-pipe",
    "/api/v1/namespaces/default/endpoints/kubernetes",
    "/api/v1/namespaces/default/persistentvolumeclaims/test-pvc",
    "/api/v1/namespaces/default/secrets/default-token-gj2xd",
    :
  ],
  "numberOfContents": 429,                      /*** Number of backuped k8s resources in snapshot ***/
  "phase": "Completed",                         /*** Status of snapshot ***/
  "reason": "",
  "snapshotResourceVersion": "4521912",         /*** K8s ResourceVersion on which resources in snapshot synced ***/
  "snapshotTimestamp": "2019-05-20T03:45:08Z",  /*** Timestamp corresponding to the ResourceVersion ***/
  "storedFileSize": 138145,                     /*** File size on object store ***/
  "storedTimestamp": "2019-05-20T03:45:08Z"     /*** File timestamp on object store ***/
}

Failed snapshot status example

$ kubectl get snapshots.clustersnapshot.rywt.io -n k8s-snap scluster01-002 -o json | jq .status
{
  "availableUntil": null,
  "contents": null,
  "numberOfContents": 0,
  "phase": "Failed",
  "reason": "Unauthorized",      /*** Error message on snapshot failure including go library error message. Non predictable. ***/
  "snapshotResourceVersion": "",
  "snapshotTimestamp": null,
  "storedFileSize": 0,
  "storedTimestamp": null
}

To restore

Set up a restore preference

Edit artifacts/preference.yaml and create a preference (an illustrative example follows the table below).

Preference items:

item                        description                                    format
excludeNamespaces           Namespaces to exclude                          match exactly
excludeCRDs                 CRDs to exclude                                contains
excludeApiPathes            API paths to exclude                           prefix,contains or prefix
(ToDo) includeNamespaces    Namespaces to include                          match exactly
(ToDo) includeCRDs          CRDs to include                                contains
(ToDo) includeApiPathes     API paths to include                           prefix,contains or prefix
restoreAppApiPathes         API paths to restore after other resources     prefix,contains or prefix
restoreNfsStorageClasses    Storageclasses to rebind PV/PVC                prefix
(ToDo) restoreOptions       excludeContext, overwriteExisting, etc.
  • Currently only 'exclude' contexts are valid in a preference.
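
As an illustration only, a preference that excludes kube-system might look like the following. The kind name and field values here are assumptions for the sketch; the authoritative example is artifacts/preference.yaml.

apiVersion: clustersnapshot.rywt.io/v1alpha1
kind: RestorePreference              # assumed kind name, see artifacts/preference.yaml
metadata:
  name: exclude-kube-system          # referenced by restorePreferenceName in a Restore resource
  namespace: k8s-snap
spec:
  excludeNamespaces:
  - kube-system
  excludeApiPathes:
  - /api/v1,secrets/default-token    # 'prefix,contains' form
  restoreAppApiPathes:
  - /apis/apps/v1                    # 'prefix' form
  restoreNfsStorageClasses:
  - nfs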

Create a restore resource

apiVersion: clustersnapshot.rywt.io/v1alpha1
kind: Restore
metadata:
  name: cluster02-cluster01-001-001
  namespace: k8s-snap
spec:
  clusterName: cluster02
  snapshotName: cluster01-001
  restorePreferenceName: exclude-kube-system
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN3akND...
        server: https://cluster02.kubernetes.rywt.io:6443
      name: cluster
    contexts:
    - context:
        cluster: cluster
        user: remote-user
      name: context
    current-context: context
    kind: Config
    preferences: {}
    users:
    - name: remote-user
      user:
        token: eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVz....
  ttl: 168h
  availableUntil: 2020-07-01T01:02:03Z
  • Set ttl in time.Duration format (h/m/s). If not set, it defaults to 168h0m0s (= 7 days).
  • Spec.TTL is ignored when Spec.AvailableUntil is set.

Restore status

$ kubectl get restores.clustersnapshot.rywt.io -n k8s-snap
NAME                            CLUSTER      SNAPSHOT         TIMESTAMP              RV        EXCLUDED   CREATED   UPDATED   EXIST     FAILED    STATUS
scluster02-scluster01-001-001   scluster02   scluster01-001   2019-05-20T03:46:15Z   8514809   50         46        0         13        0         Completed
scluster02-scluster01-002-001   scluster02   scluster01-002   <no value>                       0          0         0         0         0         Failed
scluster02-scluster01-001-002   scluster02   scluster01-001   <no value>                       0          0         0         0         0         InProgress
phase              description
"" (empty string)  Just created
InQueue            Waiting to be processed
InProgress         Restore in progress
Failed             An error occurred during the restore
Completed          Restore done

Completed restore status example

$ kubectl get restores.clustersnapshot.rywt.io -n k8s-snap scluster02-scluster01-001-001 -o json | jq .status
{
  "alreadyExisted": [             /*** K8s resources existed and not tried to update ***/
    "/api/v1/namespaces/default",
    "/api/v1/persistentvolumes/pvc-4ae96dfc-625b-11e9-9d94-005056bc5df9",
    "/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/remote-user",
    "/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/logfilter-controller",
    :
  ],
  "created": [                   /*** Created k8s resources ***/
    "/api/v1/namespaces/fluent-bit",
    "/apis/rbac.authorization.k8s.io/v1/namespaces/default/rolebindings/clusterrolebinding-dtwx4",
    "/apis/rbac.authorization.k8s.io/v1/namespaces/default/rolebindings/clusterrolebinding-vmsqk",
    "/api/v1/namespaces/fluent-bit/configmaps/cattle-agent-no-such-host",
    :
  ],
  "excluded": [                  /*** K8s resources excluded in restoring by some reason - resource-path,(reason) ***/
    "/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/canal-calico,(not-binded-to-ns)",
    "/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/canal-flannel,(not-binded-to-ns)",
    "/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/cattle-admin-binding,(not-binded-to-ns)",
    "/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/cluster-admin,(not-binded-to-ns)",
    :
  ],
  "failed": null,                /*** K8s resources tried to create but failed - resource-path,error-message(<300chars) ***/
  "numAlreadyExisted": 13,       /*** Number of existed and not tried to update ***/
  "numCreated": 46,              /*** Number of created ***/
  "numExcluded": 50,             /*** Number of excluded in restoring by some reason ***/
  "numFailed": 0,                /*** Number of tried to create but failed ***/
  "numPreferenceExcluded": 319,  /*** Number of excluded in preference ***/
  "numSnapshotContents": 429,    /*** Number of k8s resources in the snapshot ***/
  "numUpdated": 0,               /*** Number of updated ***/
  "phase": "Completed",          /*** Status of restore ***/
  "preserveUntil": "2019-05-27T03:46:15Z",     /*** Restore custom resource will be deleted on .. ***/
  "reason": "",                  /*** Error message on restore failure including go library error message. Non predictable. ***/
  "restoreResourceVersion": "8514809",         /*** K8s ResourceVersion at restore finished ***/
  "restoreTimestamp": "2019-05-20T03:46:15Z",  /*** Timestamp corresponding to the ResourceVersion ***/
  "updated": null                /*** Updated k8s resources ***/
}

Failed restore status example

$ kubectl get restores.clustersnapshot.rywt.io -n k8s-snap scluster02-scluster01-002-001 -o json | jq .status
{
  "alreadyExisted": null,
  "created": null,
  "excluded": null,
  "failed": null,
  "numAlreadyExisted": 0,
  "numCreated": 0,
  "numExcluded": 0,
  "numFailed": 0,
  "numPreferenceExcluded": 0,
  "numSnapshotContents": 0,
  "numUpdated": 0,
  "phase": "Failed",
  "preserveUntil": null,
  "reason": "Snapshot data is not in status 'Completed'",   /*** Error message on restore failure including go library error message. Non predictable. ***/
  "restoreResourceVersion": "",
  "restoreTimestamp": null,
  "updated": null
}

To delete a snapshot

Snapshot resources and their files on the object store are deleted automatically when the TTL expires.
You can also delete a snapshot manually:

$ kubectl delete snapshots.clustersnapshot.rywt.io -n k8s-snap cluster01-001

The corresponding file on the object store is deleted automatically as well.