
Remove all versions of a file from the S3 bucket #9171

Merged
6 commits merged into kubernetes:master from hakman:remove-s3-versions on May 27, 2020

Conversation

@hakman (Member) commented May 24, 2020

Kops uses versioned S3 buckets for the state store, which is also used for storing etcd backups.
Currently versions are not deleted and the bucket usage keeps growing over time.

etcd-manager should be adjusted to use this change.

The change works for both versioned and unversioned buckets. In the case of an unversioned bucket, the version ID is "null" and deletion still works as expected.

This won't fix older versions that were not deleted, but going forward removal should be permanent.
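The deletion logic being proposed can be sketched roughly as follows. This is a simplified illustration with an in-memory stand-in for a versioned bucket, not the actual kops code (the real change uses the AWS SDK's ListObjectVersions and DeleteObject calls); all type and function names here are hypothetical:

```go
package main

import "fmt"

// objectVersion models one version of an S3 key. For an unversioned
// bucket, S3 reports the literal version ID "null".
type objectVersion struct {
	Key       string
	VersionID string
}

// versionedBucket is a tiny in-memory stand-in for a versioned S3
// bucket, used only to illustrate the deletion logic.
type versionedBucket struct {
	versions []objectVersion
}

// listVersions mimics ListObjectVersions filtered to an exact key.
func (b *versionedBucket) listVersions(key string) []objectVersion {
	var out []objectVersion
	for _, v := range b.versions {
		if v.Key == key {
			out = append(out, v)
		}
	}
	return out
}

// deleteVersion mimics DeleteObject with an explicit VersionId, which
// permanently removes that version instead of adding a delete marker.
func (b *versionedBucket) deleteVersion(key, versionID string) {
	kept := b.versions[:0]
	for _, v := range b.versions {
		if !(v.Key == key && v.VersionID == versionID) {
			kept = append(kept, v)
		}
	}
	b.versions = kept
}

// removeAllVersions deletes every version of key, the core idea of
// this PR; it returns how many versions were removed.
func removeAllVersions(b *versionedBucket, key string) int {
	n := 0
	for _, v := range b.listVersions(key) {
		b.deleteVersion(v.Key, v.VersionID)
		n++
	}
	return n
}

func main() {
	b := &versionedBucket{versions: []objectVersion{
		{Key: "backups/etcd/main/1", VersionID: "v1"},
		{Key: "backups/etcd/main/1", VersionID: "v2"},
		{Key: "cluster.spec", VersionID: "null"}, // unversioned-style ID
	}}
	fmt.Println(removeAllVersions(b, "backups/etcd/main/1")) // 2
	fmt.Println(len(b.listVersions("backups/etcd/main/1")))  // 0
	fmt.Println(removeAllVersions(b, "cluster.spec"))        // 1
}
```

Because deletion targets each version ID explicitly, no delete marker is left behind, which is why the removal is permanent on versioned buckets and still works on unversioned ones.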

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 24, 2020
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 24, 2020
@hakman (Member Author) commented May 24, 2020

/cc @rifelpet

util/pkg/vfs/s3fs.go (inline review thread; outdated, resolved)
@rifelpet (Member) commented:

Hm, this would affect every S3 object that Kops manages, correct? And we'll need a separate etcd-manager PR specifically for its backups?

I definitely think having etcd-manager delete all versions of its backups when it deletes the latest version of the object makes sense in essentially all situations. I'm wondering if there are situations where users would not want this behavior for all Kops objects though.

Maybe it's worth listing the situations in which Kops removes S3 objects. Is it ever more than the kops delete commands? Could it be possible for someone to mistakenly delete a cluster or instance group? If they'd only been using kops edit rather than storing their manifests locally, maybe their only recovery option would be restoring the latest version of the deleted manifest object; then a kops update cluster --yes would restore the cloud resources.

I'm wondering how we could go about making this optional, if we decide there are valid cases where we wouldn't want this behavior.

@olemarkus (Member) commented:

There is a similar case for Vault as well. There I permanently remove all versions, but that may indeed not necessarily be what users want.

@hakman (Member Author) commented May 24, 2020

This would indeed affect any S3 object that Kops manages.
etcd-manager vendors kops and uses vfs from 1.16.0 at the moment, so it will have to be updated to use a new tag.

Seems to me that at the moment Kops only uses the concept of versions to keep some sort of history for existing files. Not sure anyone expects to keep the versions of deleted files.
In the case of "delete cluster", deletion is probably expected to be permanent, not to keep around N invisible files.

Maybe removing an instance group by mistake would be inconvenient, not being able to restore it from S3 history. Though, we should check whether this is the intended purpose of S3 versions. If we really want this, we could copy the file to some backup location instead.
That being said, we could add a new function like RemoveAll that would be used especially by etcd-manager, and maybe in other places if someone decides it is more appropriate than the current "fake" delete. It would be more complicated though, as it would mean adding this to all vfs types, not just S3.

@rifelpet (Member) commented:

If we added a new RemoveAll(), we could have the other VFS backends' RemoveAll() just call Remove() until someone implements the version removal if applicable.

@hakman (Member Author) commented May 25, 2020

> If we added a new RemoveAll(), we could have the other VFS backends' RemoveAll() just call Remove() until someone implements the version removal if applicable.

Sounds good to me. I'll go ahead and do the change.
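The fallback idea above can be sketched like this. This is a hypothetical, trimmed-down version of the kops vfs.Path interface with an in-memory backend, purely to show the delegation pattern; the names and map-based store are assumptions, not the real implementation:

```go
package main

import "fmt"

// Path is a trimmed-down, hypothetical stand-in for the kops vfs.Path
// interface, showing only the two methods discussed in this thread.
type Path interface {
	// Remove deletes the file.
	Remove() error
	// RemoveAll completely deletes the file (with all its versions
	// and markers), where the backend supports versions.
	RemoveAll() error
}

// memfsPath stands in for a backend with no native notion of versions.
type memfsPath struct {
	store map[string]string
	key   string
}

func (p *memfsPath) Remove() error {
	if _, ok := p.store[p.key]; !ok {
		return fmt.Errorf("file not found: %s", p.key)
	}
	delete(p.store, p.key)
	return nil
}

// RemoveAll simply delegates to Remove until (or unless) version-aware
// deletion is implemented for this backend, as suggested above.
func (p *memfsPath) RemoveAll() error {
	return p.Remove()
}

func main() {
	store := map[string]string{"cluster.spec": "..."}
	var p Path = &memfsPath{store: store, key: "cluster.spec"}
	fmt.Println(p.RemoveAll()) // <nil>
	fmt.Println(len(store))    // 0
}
```

The S3 backend would then be the only implementation with a genuinely different RemoveAll, and callers such as etcd-manager can use the interface without caring which backend they hit.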

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 25, 2020
@johngmyers (Member) commented:

Kops references are: DeleteAllClusterState, DeleteInstanceGroup, DeleteKeysetItem (for the legacy format), DeleteSSHCredential, DeleteSecret, and dead code that I'll be removing in a separate PR.

Deleting the cluster and instance group are the only cases where I could consider a user wanting the backups. Not that I want the backups.

@hakman (Member Author) commented May 25, 2020

There is also the option to rename the previous Remove() to SoftRemove() and use it only for the cluster and instance group.
The current behaviour is only valid for S3, not for any other VFS backend, so we should also take this into consideration for the final decision.

@hakman (Member Author) commented May 27, 2020

@justinsb any thoughts on how best to handle this?

return err
}

klog.V(8).Infof("removing file %s", p)
A reviewer (Member) commented:

nit: "removing all versions of %s" or "remove file %s (all versions)" might be clearer

@@ -59,6 +59,9 @@ type Path interface {
// Remove deletes the file
Remove() error

// RemoveAll completely deletes the file (with all its versions and markers)
RemoveAll() error
A reviewer (Member) commented:
nit: RemoveAllVersions might be clearer (vs this being a recursive delete or something akin to MkdirAll)

Prefix: aws.String(p.key),
}

response, err := client.ListObjectVersions(request)
A reviewer (Member) commented:
TBD: will this require new IAM permissions?
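The thread doesn't settle the IAM question, but listing and deleting specific object versions generally involve two actions beyond the usual s3:DeleteObject: s3:ListBucketVersions at the bucket level and s3:DeleteObjectVersion at the object level. A hypothetical policy fragment (the bucket name is a placeholder) might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListVersions",
      "Effect": "Allow",
      "Action": "s3:ListBucketVersions",
      "Resource": "arn:aws:s3:::example-kops-state-store"
    },
    {
      "Sid": "DeleteVersions",
      "Effect": "Allow",
      "Action": "s3:DeleteObjectVersion",
      "Resource": "arn:aws:s3:::example-kops-state-store/*"
    }
  ]
}
```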

@justinsb (Member) commented:

So IIRC kops predates versioned buckets, or at least I wasn't aware of them at the time. People started using them (quite rightly) for the kops state store, to recover from disasters where they accidentally deleted some of those files.

The interaction with etcd-manager backups was, however, unfortunate! We added that later, and I'm pretty sure it was not the intention of people that enabled versioned buckets to keep their backups forever.

So I'm all in favor of what we have here, where we delete those old versions, but only for etcd backup files.

I do wonder whether this will require new IAM permissions, but I think we should:

  1. try it and see
  2. support the etcd-manager backup store being on a different bucket. I have a related change for jwks support, so I can take a look at that.

In any case, first step is likely to get this merged; then we can get it into etcd-manager; then we can add any missing permissions to kops...

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 27, 2020
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman, justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2020
@k8s-ci-robot k8s-ci-robot merged commit 092cf13 into kubernetes:master May 27, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone May 27, 2020
@hakman (Member Author) commented May 27, 2020

Thanks for the explanation @justinsb. I created #9188 to address the nits.

k8s-ci-robot added a commit that referenced this pull request May 29, 2020
…9188-upstream-release-1.17

Automated cherry pick of #9171: Remove all versions of a file form the S3 bucket #9188: Fix nits for removal of S3 file versions
k8s-ci-robot added a commit that referenced this pull request May 29, 2020
…9188-upstream-release-1.16

Automated cherry pick of #9171: Remove all versions of a file form the S3 bucket #9188: Fix nits for removal of S3 file versions
@hakman hakman deleted the remove-s3-versions branch June 2, 2020 16:56