Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: Ensure that the order returned by ListObjects is consistent #999

Merged
merged 1 commit into from Oct 30, 2018

Conversation

bashofmann
Copy link
Contributor

When a backup is deleted, the delete method uses ListObjects to get a list of
files it needs to delete in s3. Different s3 implementations may return
the object lists in different, even non-deterministic orders, which can
result in the deletion not working because ark tries to delete a non empty folder
before it tries to delete the files in the folder.

Signed-off-by: Bastian Hofmann bashofmann@gmail.com

@ncdc
Copy link
Contributor

ncdc commented Oct 25, 2018

@bashofmann could you provide an example of when you ran into an error? s3 doesn't have real folders - they exist because of the / delimiter in an object's key. We only delete objects, so I'm surprised you'd run into any issues.

@bashofmann
Copy link
Contributor Author

At least in our quobyte s3, if you have these objects:

2018-10-24 13:56         0   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/
2018-10-24 12:48       914   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/ark-backup.json
2018-10-24 12:48       604   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024144838-logs.gz
2018-10-24 12:48       223   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024144838-results.gz
2018-10-24 13:56       604   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024155623-logs.gz
2018-10-24 13:56        82   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024155623-results.gz
2018-10-24 12:48      2368   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/test-backup-logs.gz
2018-10-24 12:48      5237   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/test-backup.tar.gz

and you try to delete the first entry, you get:

$ s3cmd rm s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/
ERROR: Parameter problem: Expecting S3 URI with a filename or --recursive: s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/

This is also visible in the ark logs where it tries to delete s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/ before the objects "in" it.

@ncdc
Copy link
Contributor

ncdc commented Oct 25, 2018

Ark should not be trying to delete a pseudo-directory (s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/ in your example). If you see it doing this, that's a problem. I wonder if Quobyte is returning this when Ark is listing objects? I just tested with s3 and it only returns real keys.

@bashofmann
Copy link
Contributor Author

Quobyte definitely returns this and I think in quobyte this pseudo-directory is actually a real key that also needs to be deleted.

If I have the entries from above:

2018-10-24 13:56         0   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/
2018-10-24 12:48       914   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/ark-backup.json
2018-10-24 12:48       604   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024144838-logs.gz
2018-10-24 12:48       223   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024144838-results.gz
2018-10-24 13:56       604   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024155623-logs.gz
2018-10-24 13:56        82   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024155623-results.gz
2018-10-24 12:48      2368   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/test-backup-logs.gz
2018-10-24 12:48      5237   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/test-backup.tar.gz

and I delete everything but the pseudo-directory

❯ s3cmd rm s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/ark-backup.json
delete: 's3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/ark-backup.json'

❯ s3cmd rm s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024144838-logs.gz
delete: 's3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024144838-logs.gz'

❯ s3cmd rm s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024144838-results.gz
delete: 's3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024144838-results.gz'

❯ s3cmd rm s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024155623-results.gz
delete: 's3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024155623-results.gz'

❯ s3cmd rm s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024155623-logs.gz
delete: 's3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/restore-test-backup-20181024155623-logs.gz'

❯ s3cmd rm s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/test-backup-logs.gz
delete: 's3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/test-backup-logs.gz'

❯ s3cmd rm s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/test-backup.tar.gz
delete: 's3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/test-backup.tar.gz'

the pseudo-directory is still listed as an existing key and must be explicitly deleted

❯ s3cmd la --recursive | grep test-backup
2018-10-25 14:58         0   s3://metakube-cluster-backup-z87l9ckv7f-ark/test-backup/

@ncdc
Copy link
Contributor

ncdc commented Oct 25, 2018

Ok, so this sounds like Quobyte is not conforming to how s3 normally functions.

Can you try running this and let us know the output?

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	if len(os.Args) != 3 {
		log.Fatalf("usage: %s BUCKET PREFIX\n", os.Args[0])
	}

	bucket := os.Args[1]
	prefix := os.Args[2]

	s, err := session.NewSession()
	if err != nil {
		log.Fatalf("error creating aws session/client: %v\n", err)
	}

	req := &s3.ListObjectsV2Input{
		Bucket: &bucket,
		Prefix: &prefix,
	}

	var ret []string
	c := s3.New(s)
	err = c.ListObjectsV2Pages(req, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
		for _, obj := range page.Contents {
			ret = append(ret, *obj.Key)
		}
		return !lastPage
	})

	if err != nil {
		log.Fatalf("error listing: %v\n", err)
	}

	for _, s := range ret {
		fmt.Println(s)
	}
}

go run main.go metakube-cluster-backup-z87l9ckv7f-ark test-backup

@ncdc
Copy link
Contributor

ncdc commented Oct 25, 2018

(against a backup "folder" where the files still exist)

@bashofmann
Copy link
Contributor Author

go run main.go metakube-cluster-backup-nnlxpg5dxm-ark backup-tutorial-abc
backup-tutorial-abc/
backup-tutorial-abc/ark-backup.json
backup-tutorial-abc/backup-tutorial-abc-logs.gz
backup-tutorial-abc/backup-tutorial-abc.tar.gz

@ncdc
Copy link
Contributor

ncdc commented Oct 25, 2018

Ok. I think a better fix here would be to adjust the aws objectstore listobjects method to omit keys that end in / from the returned list. wdyt @heptio/ark-team?

@bashofmann
Copy link
Contributor Author

But with Quobyte this backup-tutorial-abc/ key needs to be deleted as well, otherwise it is still there.

@ncdc
Copy link
Contributor

ncdc commented Oct 25, 2018

Ok, that makes sense. Just need some more clarifying details:

  1. If you were to run s3cmd rm --recursive against Quobyte, does that work correctly?
  2. Could you please add a comment for why the sort is there?

We'll eventually want a unit test, but this file isn't set up to do unit testing easily. I can extract some changes I made from #629 to make it possible to unit test this, but that can be after this merges.

@bashofmann
Copy link
Contributor Author

bashofmann commented Oct 26, 2018

1:

Not really unfortunately, even though s3cmd says it deleted "the folder", it's still there:

❯ s3cmd rm --recursive s3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/
delete: 's3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/'
delete: 's3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/ark-backup.json'
delete: 's3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/test-backup-logs.gz'
delete: 's3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/test-backup.tar.gz'

❯ s3cmd la --recursive | grep "backup-fhgbvx65xg-ark/test-backup"
2018-10-26 13:31         0   s3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/

❯ s3cmd rm --recursive s3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/
delete: 's3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/'

❯ s3cmd la --recursive | grep "backup-fhgbvx65xg-ark/test-backup"

❯ s3cmd rm --recursive s3://metakube-cluster-backup-fhgbvx65xg-ark/test-backup/
WARNING: Remote list is empty.

2: I'll add a comment in a few minutes.

@ncdc ncdc self-assigned this Oct 26, 2018
@ncdc ncdc added this to the v0.10.0 milestone Oct 26, 2018
@bashofmann bashofmann force-pushed the fix-backup-file-deletion-order branch from de12b4f to 6b9a7b7 Compare October 26, 2018 13:52
@ncdc
Copy link
Contributor

ncdc commented Oct 26, 2018

LGTM. Let's get 1 more 👍 from another reviewer, and then you can squash & we'll merge. @wwitzel3 ptal

@nrb
Copy link
Contributor

nrb commented Oct 30, 2018

👍 from me

When a backup is deleted, the delete method uses ListObjects to get a list of
files it needs to delete in s3. Different s3 implementations may return
the object lists in different, even non-deterministic orders, which can
result in the deletion not working because ark tries to delete a non empty folder
before it tries to delete the files in the folder.

Signed-off-by: Bastian Hofmann <bashofmann@gmail.com>
@bashofmann bashofmann force-pushed the fix-backup-file-deletion-order branch from 6b9a7b7 to 8bbfc53 Compare October 30, 2018 19:39
@bashofmann
Copy link
Contributor Author

Commits are squashed.

@ncdc
Copy link
Contributor

ncdc commented Oct 30, 2018

Thanks @bashofmann!

@ncdc ncdc merged commit cb0e6f4 into vmware-tanzu:master Oct 30, 2018
@nrb
Copy link
Contributor

nrb commented Oct 31, 2018

@wwitzel3 Is there a backport PR for this yet?

@wwitzel3 wwitzel3 mentioned this pull request Nov 1, 2018
@wwitzel3
Copy link
Contributor

wwitzel3 commented Nov 1, 2018

Backport PR #1028

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants