
Enable recovery using existing bucket #28

Merged 1 commit into openshift:master on Jan 14, 2020

Conversation

@dak1n1
Contributor

dak1n1 commented Dec 18, 2019

If the managed-velero-operator namespace or CR gets deleted,
usually we would have no way of knowing which S3 bucket is associated
with the operator. We would be left with unmanaged S3 resources.

This commit adds the ability for managed-velero-operator to recover
from namespace deletion by searching for an existing S3 bucket
associated with velero backups for the cluster on which it is running.

https://issues.redhat.com/browse/OSD-2602
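The recovery described in the commit message boils down to checking each bucket's tag set. A minimal sketch of that check, assuming plain-map tag sets (the real operator fetches tags with the AWS SDK's GetBucketTagging; the helper names here are hypothetical, and the exact cluster-association tag key is an assumption based on the test output later in this PR):

```go
package main

import "fmt"

// Tag keys observed in this PR's test output. The cluster tag key format is
// an assumption; the PR refers to it as the "infraName" tag.
const (
	veleroBackupTag = "velero.io/backup-location"
	clusterTagFmt   = "kubernetes.io/cluster/%s"
)

// isVeleroBucket reports whether a bucket's tags mark it as holding velero
// backups. In the real operator the tags come from GetBucketTagging; here
// they are passed in as a plain map for illustration.
func isVeleroBucket(tags map[string]string) bool {
	_, ok := tags[veleroBackupTag]
	return ok
}

// ownedByCluster reports whether the bucket is tagged for this cluster.
func ownedByCluster(tags map[string]string, infraName string) bool {
	return tags[fmt.Sprintf(clusterTagFmt, infraName)] == "owned"
}

func main() {
	tags := map[string]string{
		veleroBackupTag: "default",
		"kubernetes.io/cluster/dakini20200110-h8pzr": "owned",
	}
	fmt.Println(isVeleroBucket(tags))                         // true
	fmt.Println(ownedByCluster(tags, "dakini20200110-h8pzr")) // true
}
```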

@openshift-ci-robot openshift-ci-robot requested review from jewzaam and mwoodson Dec 18, 2019
@dak1n1 dak1n1 force-pushed the dak1n1:restore_from_tag branch 3 times, most recently from dd3829f to 8626ffb Dec 20, 2019
@dak1n1 dak1n1 force-pushed the dak1n1:restore_from_tag branch from 8626ffb to ac2529f Jan 11, 2020
@openshift-ci-robot openshift-ci-robot added size/L and removed size/M labels Jan 11, 2020
@dak1n1 dak1n1 force-pushed the dak1n1:restore_from_tag branch 2 times, most recently from 9e65e79 to 38492b6 Jan 11, 2020
@dak1n1 dak1n1 changed the title [WIP] Enable recovery using existing bucket Enable recovery using existing bucket Jan 11, 2020
@dak1n1

Contributor Author

dak1n1 commented Jan 11, 2020

Tested in stage. quay.io/sedgar/managed-velero-operator:v0.1.61-a3103de

It successfully recovers a bucket that has the right tags, and ignores the other buckets.

{"level":"info","ts":1578709818.3073783,"logger":"controller_velero","msg":"Reconciling Velero Installation","Request.Namespace":"openshift-velero","Request.Name":"cluster"}
{"level":"info","ts":1578709821.003564,"logger":"controller_velero","msg":"No S3 bucket defined. Searching for existing bucket to use"}
{"level":"info","ts":1578709824.879911,"logger":"controller_velero","msg":"Recovered existing bucket: managed-velero-backups-60b51325-7705-4e51-a81b-65b0f1e3e70b"}
{"level":"info","ts":1578709824.9836366,"logger":"controller_velero","msg":"Status updated for cluster","Request.Namespace":"openshift-velero","Request.Name":"cluster"}
{"level":"info","ts":1578709824.9838963,"logger":"controller_velero","msg":"Reconciling Velero Installation","Request.Namespace":"openshift-velero","Request.Name":"cluster"}
{"level":"info","ts":1578709827.3900568,"logger":"controller_velero","msg":"Verifing S3 Bucket exists","Request.Namespace":"openshift-velero","Request.Name":"cluster","S3Bucket.Name":"managed-velero-backups-60b51325-7705-4e51-a81b-65b0f1e3e70b","S3Bucket.Region":"us-east-1"}
{"level":"info","ts":1578709827.4830933,"logger":"controller_velero","msg":"Enforcing S3 Bucket encryption","Request.Namespace":"openshift-velero","Request.Name":"cluster","S3Bucket.Name":"managed-velero-backups-60b51325-7705-4e51-a81b-65b0f1e3e70b","S3Bucket.Region":"us-east-1"}
@dak1n1

Contributor Author

dak1n1 commented Jan 11, 2020

/test verify

@dak1n1 dak1n1 force-pushed the dak1n1:restore_from_tag branch from 38492b6 to a3103de Jan 11, 2020
@dak1n1

Contributor Author

dak1n1 commented Jan 11, 2020

/test verify

@dak1n1 dak1n1 requested review from cblecker and removed request for mwoodson and jewzaam Jan 11, 2020
@dak1n1

Contributor Author

dak1n1 commented Jan 12, 2020

I actually need to make another change to this, since the existing buckets don't have the infraName tag yet and won't be recovered unless I add in a case for that.

@dak1n1 dak1n1 changed the title Enable recovery using existing bucket [WIP] Enable recovery using existing bucket Jan 12, 2020
@dak1n1

Contributor Author

dak1n1 commented Jan 13, 2020

Tested the latest commit in stage. quay.io/sedgar/managed-velero-operator:v0.1.62-96a457e

If it doesn't find a bucket that matches the infraName, it falls back to the first bucket it sees that is tagged for velero backups. This allows it to recover a bucket even when the infraName tag is absent, which will be the case for all existing buckets when this rolls out.
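That two-tier selection (prefer an infraName match, else fall back to the first velero-tagged bucket) can be sketched as follows; the type and function names are hypothetical stand-ins, and in the real operator the tag sets come from ListBuckets plus GetBucketTagging calls:

```go
package main

import "fmt"

// bucketTags pairs a bucket name with its tag set, standing in for the
// results of ListBuckets + GetBucketTagging.
type bucketTags struct {
	name string
	tags map[string]string
}

// selectBucket prefers a velero-tagged bucket that also carries the
// cluster's infraName tag; failing that, it falls back to the first
// velero-tagged bucket seen (covering pre-rollout buckets that lack the
// infraName tag). Returns "" when no candidate exists.
func selectBucket(buckets []bucketTags, infraTagKey string) string {
	fallback := ""
	for _, b := range buckets {
		if _, ok := b.tags["velero.io/backup-location"]; !ok {
			continue // not a velero backup bucket
		}
		if _, ok := b.tags[infraTagKey]; ok {
			return b.name // exact cluster match wins
		}
		if fallback == "" {
			fallback = b.name // remember the first velero bucket
		}
	}
	return fallback
}

func main() {
	buckets := []bucketTags{
		{"image-registry-bucket", map[string]string{"Name": "registry"}},
		{"managed-velero-backups-08b7ad44", map[string]string{"velero.io/backup-location": "default"}},
	}
	// No infraName match exists, so the first velero-tagged bucket is chosen.
	fmt.Println(selectBucket(buckets, "kubernetes.io/cluster/dakini20200110-h8pzr"))
}
```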

And this only happens in cases where the CR was deleted and re-created without a bucket name specified. Here is an empty CR being created, and how the reconcile loop handles it. (I'm also showing the list of buckets in this account prior to CR creation.)

[dakini@nibbana ~]$ aws s3 ls |awk '{print $3}'
dakini20200110-h8pzr-image-registry-us-east-1-elmisfstsishqwyq
managed-velero-backups-08b7ad44-5d73-4c22-9f7c-99e334e52154
managed-velero-backups-0eff6c60-bc01-44ec-863b-2855024b7306
managed-velero-backups-2d1a51ad-c9c5-441f-a34a-6173aa643dd3
managed-velero-backups-60b51325-7705-4e51-a81b-65b0f1e3e70b
managed-velero-backups-689cbfb1-c5c5-4c57-84ea-9a5116a7b5d3
managed-velero-backups-a0f92088-fe9b-4583-ba7f-8b628b00c0ed
managed-velero-backups-c008ac78-13d9-4d52-94f1-1969a676f73a
managed-velero-backups-c4c43916-ff9c-4788-aebe-76d926154738

[dakini@nibbana ~]$ for bucket in $(aws s3 ls |awk '{print $3}'); do echo $bucket ; aws s3api get-bucket-tagging --bucket=$bucket ; done
dakini20200110-h8pzr-image-registry-us-east-1-elmisfstsishqwyq
{
    "TagSet": [
        {
            "Key": "kubernetes.io/cluster/dakini20200110-h8pzr",
            "Value": "owned"
        },
        {
            "Key": "Name",
            "Value": "dakini20200110-h8pzr-image-registry"
        }
    ]
}
managed-velero-backups-08b7ad44-5d73-4c22-9f7c-99e334e52154
{
    "TagSet": [
        {
            "Key": "velero.io/backup-location",
            "Value": "default"
        }
    ]
}
managed-velero-backups-0eff6c60-bc01-44ec-863b-2855024b7306
{
    "TagSet": [
        {
            "Key": "velero.io/backup-location",
            "Value": "default"
        }
    ]
}
managed-velero-backups-2d1a51ad-c9c5-441f-a34a-6173aa643dd3
{
    "TagSet": [
        {
            "Key": "velero.io/backup-location",
            "Value": "default"
        }
    ]
}
managed-velero-backups-60b51325-7705-4e51-a81b-65b0f1e3e70b

An error occurred (NoSuchTagSet) when calling the GetBucketTagging operation: The TagSet does not exist
managed-velero-backups-689cbfb1-c5c5-4c57-84ea-9a5116a7b5d3
{
    "TagSet": [
        {
            "Key": "velero.io/backup-location",
            "Value": "default"
        }
    ]
}
managed-velero-backups-a0f92088-fe9b-4583-ba7f-8b628b00c0ed
{
    "TagSet": [
        {
            "Key": "velero.io/backup-location",
            "Value": "default"
        }
    ]
}
managed-velero-backups-c008ac78-13d9-4d52-94f1-1969a676f73a

An error occurred (NoSuchTagSet) when calling the GetBucketTagging operation: The TagSet does not exist
managed-velero-backups-c4c43916-ff9c-4788-aebe-76d926154738
{
    "TagSet": [
        {
            "Key": "velero.io/backup-location",
            "Value": "default"
        }
    ]
}
[dakini@nibbana ~]$ oc create -f /home/dakini/work/20200110/velero2.yaml
velero.managed.openshift.io/cluster created
{"level":"info","ts":1578945499.709985,"logger":"controller_velero","msg":"Reconciling Velero Installation","Request.Namespace":"openshift-velero","Request.Name":"cluster"}
{"level":"info","ts":1578945501.818652,"logger":"controller_velero","msg":"No S3 bucket defined. Searching for existing bucket to use"}
{"level":"info","ts":1578945502.0414138,"logger":"controller_velero","msg":"Recovered existing bucket: managed-velero-backups-08b7ad44-5d73-4c22-9f7c-99e334e52154"}
{"level":"info","ts":1578945502.0464835,"logger":"controller_velero","msg":"Status updated for cluster","Request.Namespace":"openshift-velero","Request.Name":"cluster"}
Member

cblecker left a comment

This looks really great! Mostly some style nits, but nothing major.

After reviewing, yeah, let's drop the last commit. 1 hour after deploying this, all buckets should have the right tags, and we shouldn't need to recover in that time frame.

deploy/credential_request.yaml (outdated)
cmd/manager/main.go (outdated)
cmd/manager/main.go (outdated)
pkg/controller/velero/controller.go (outdated)
pkg/s3/bucket.go
func TagBucket(s3Client *s3.S3, bucketName string, backUpLocation string) error {
// TagBucket adds tags to an S3 bucket. The tags are used to indicate that velero backups
// are stored in the bucket, and to identify the associated cluster.
func TagBucket(s3Client *s3.S3, bucketName string, backUpLocation string, infraName string) error {

@cblecker

cblecker Jan 13, 2020

Member

Out of scope for this PR, but I think we should probably refactor this eventually to be a generic function that gets called multiple times for each time we want to tag a bucket. Just a thought for future work.

func TagBucket(s3Client *s3.S3, bucketName string, backUpLocation string) error {
// TagBucket adds tags to an S3 bucket. The tags are used to indicate that velero backups
// are stored in the bucket, and to identify the associated cluster.
func TagBucket(s3Client *s3.S3, bucketName string, backUpLocation string, infraName string) error {
err := ClearBucketTags(s3Client, bucketName)

@cblecker

cblecker Jan 13, 2020

Member

Hmmm. I didn't realize we actually wiped tags every time we call this. This isn't really idempotent, is it?

Is this intended? What were your thoughts behind this, @c-e-brumm?

@c-e-brumm

c-e-brumm Jan 14, 2020

Contributor

The idea was that if we always delete and reset the tags, they will always be the tags we want. This way a new tag can be added or an existing tag changed in one place in the code, and it will always take effect on both existing and new clusters. We won't need additional logic to validate tags if they get reset every time. Heavy-handed but simple was the thought.
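The clear-then-reset pattern described here can be sketched as below. The helper names and the in-memory tag store are hypothetical stand-ins; the real code clears tags and then writes the full set via the AWS SDK (the PR shows a ClearBucketTags call inside TagBucket), and the exact cluster tag key is an assumption:

```go
package main

import "fmt"

// desiredTags is the single place where the bucket's tag set is defined.
// Resetting to exactly this set on every reconcile keeps the operation
// idempotent without any tag-diffing logic.
func desiredTags(backupLocation, infraName string) map[string]string {
	return map[string]string{
		"velero.io/backup-location":          backupLocation,
		"kubernetes.io/cluster/" + infraName: "owned",
	}
}

// tagBucket wipes the bucket's tags and rewrites the desired set; the
// store map stands in for S3's per-bucket tag storage.
func tagBucket(store map[string]map[string]string, bucket, backupLocation, infraName string) {
	delete(store, bucket)                                  // analogous to ClearBucketTags
	store[bucket] = desiredTags(backupLocation, infraName) // analogous to PutBucketTagging
}

func main() {
	s := map[string]map[string]string{}
	tagBucket(s, "managed-velero-backups-example", "default", "dakini20200110-h8pzr")
	tagBucket(s, "managed-velero-backups-example", "default", "dakini20200110-h8pzr") // second call is a no-op in effect
	fmt.Println(s["managed-velero-backups-example"]["velero.io/backup-location"])     // default
}
```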

@cblecker

cblecker Jan 14, 2020

Member

I gotcha. There are some potential issues I see with this, so we may want to revisit it at some point to make this code more robust.

pkg/s3/bucket.go (outdated)
deploy/credential_request.yaml (outdated)
@dak1n1 dak1n1 force-pushed the dak1n1:restore_from_tag branch 2 times, most recently from 6f964a7 to b174e17 Jan 14, 2020
If the managed-velero-operator namespace or CR gets deleted,
usually we would have no way of knowing which S3 bucket is associated
with the operator. We would be left with unmanaged S3 resources.

This commit adds the ability for managed-velero-operator to recover
from namespace deletion by searching for an existing S3 bucket
associated with velero backups for the cluster on which it is running.

https://issues.redhat.com/browse/OSD-2602
@dak1n1 dak1n1 force-pushed the dak1n1:restore_from_tag branch from b174e17 to a28325a Jan 14, 2020
@dak1n1

Contributor Author

dak1n1 commented Jan 14, 2020

Tested with the most recent commit. quay.io/sedgar/managed-velero-operator:v0.1.61-a28325a
Bucket recovery is still working.

@dak1n1 dak1n1 changed the title [WIP] Enable recovery using existing bucket Enable recovery using existing bucket Jan 14, 2020
@cblecker

Member

cblecker commented Jan 14, 2020

/lgtm
/approve

@openshift-ci-robot


openshift-ci-robot commented Jan 14, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cblecker, dak1n1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit c5ab0f3 into openshift:master Jan 14, 2020
3 of 4 checks passed:
- tide: Not mergeable.
- ci/prow/images: Job succeeded.
- ci/prow/test: Job succeeded.
- ci/prow/verify: Job succeeded.
dak1n1 added a commit to dak1n1/managed-cluster-config that referenced this pull request Jan 14, 2020
This allows the managed-velero-operator to recover an existing S3 bucket.
openshift/managed-velero-operator#28
dak1n1 added a commit to dak1n1/managed-cluster-config that referenced this pull request Jan 14, 2020
This allows the managed-velero-operator to recover an existing S3 bucket.
openshift/managed-velero-operator#28