blob: support walking through paths #241

Closed
dhowden opened this issue Jul 26, 2018 · 9 comments

6 participants
@dhowden
commented Jul 26, 2018

Currently there is no way to read through paths in a blob store (i.e. to iterate over all paths in a bucket or under a particular path, equivalent to filepath.Walk in the std lib). This is maybe stretching the idea of a "blob store" and so might sit better somewhere else, but S3 and GCS have methods which implement this, and the functionality is very useful.

Are there plans to implement this in go-cloud?

Context: we have a storage-abstraction library which we're now planning to migrate to go-cloud instead of maintaining our own bridges to different platforms (sajari/storage#5). Our library also implements path-walking which we have found very useful, but as yet there is no equivalent for go-cloud that we can see :) (https://github.com/sajari/storage/blob/master/walk.go).

@zombiezen

Contributor

commented Jul 26, 2018

Curious, what sort of guarantees do you need from visiting the blobs? The reason that Bucket currently doesn't support a "list" operation is that there's quite a bit of variance between cloud providers on what they offer (consistency, etc.). From what we've seen, most devs who need this type of functionality build a secondary index in a more strongly consistent datastore.

@dhowden

Author

commented Jul 26, 2018

We typically use this functionality to perform idempotent operations on each object in a bucket (tools and background processes that aggregate information, perform transformations, etc.), so an eventual-consistency guarantee has been sufficient for us.

If implemented, the relevant API methods should document that the call is based on a (potentially) eventually-consistent data source, which is unfortunate. Understandable if this is a deal-breaker for inclusion.

@buchanae

commented Jul 31, 2018

We also have a storage library which we'd like to see shared/normalized/eliminated by a larger community. Ours has stat and list.

Eventual consistency is acceptable for us. It seems better to have the lowest common denominator than to have nothing.

Side note: the term "walk" might not fit object/blob storage, where directories are not a first-class concept.

@JohnEmhoff

commented Jul 31, 2018

We have a similar need -- our data is effectively write-once, read-many, where attributes of an object are stored in a key with a prefix of the owning object's key. Eventual consistency would be fine for us.

@kidtronnix

commented Aug 15, 2018

+1

@zombiezen

Contributor

commented Sep 14, 2018

Summarizing some offline discussion with @ijt, @neild, and others: if/when we do this, we should probably give the operation a name like InconsistentList (even for backends that support strong consistency) so that it clearly communicates expectations.

@vangent vangent self-assigned this Sep 27, 2018

@vangent

Contributor

commented Oct 4, 2018

  • S3 has ListObjects
    • Takes a Prefix, MaxKeys (int64, page size), and Marker (string, starting point).
    • There's a helper for iterating through pages of results via ListObjectsPages
  • S3 also has ListObjectsV2 which looks pretty similar.
  • GCS has Objects
    • Takes a prefix.
    • Returns an iterator, which returns a PageInfo you can use for pagination.
  • Azure has ListBlobs
    • Takes Prefix, Marker, MaxResults.
  • For all three, you can specify a Delimiter in the request, which does some kind of pretend-it's-a-directory thing that I don't understand yet.

@zombiezen

Contributor

commented Oct 4, 2018

The delimiter allows you to treat the key space semi-hierarchically. For example, if you had the following blobs:

  • foo
  • bar
  • baz/quux
  • baz/fido

And you did a list with empty prefix with the slash delimiter, you would get back foo, bar, and baz/ (the API makes some indication that baz/ is a dummy entry).
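The delimiter behavior described above can be sketched as a grouping step: any key whose remainder (after the prefix) contains the delimiter is collapsed into a single pseudo-directory entry ending at the first delimiter. `listWithDelimiter` is a hypothetical helper for illustration; real APIs return these entries in a separate field (e.g. "common prefixes") and in lexical order, whereas this sketch preserves input order.

```go
package main

import (
	"fmt"
	"strings"
)

// listWithDelimiter splits the keys under prefix into plain objects and
// distinct pseudo-directories (prefix + text up to and including the first
// delimiter after prefix). An empty delim disables grouping.
func listWithDelimiter(keys []string, prefix, delim string) (objects, dirs []string) {
	seen := map[string]bool{}
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) {
			continue
		}
		rest := k[len(prefix):]
		if i := strings.Index(rest, delim); delim != "" && i >= 0 {
			dir := prefix + rest[:i+len(delim)]
			if !seen[dir] {
				seen[dir] = true
				dirs = append(dirs, dir)
			}
			continue
		}
		objects = append(objects, k)
	}
	return objects, dirs
}

func main() {
	keys := []string{"foo", "bar", "baz/quux", "baz/fido"}
	objs, dirs := listWithDelimiter(keys, "", "/")
	fmt.Println(objs, dirs) // [foo bar] [baz/]
}
```

Listing again with prefix `baz/` and the same delimiter returns `baz/quux` and `baz/fido` as plain objects, which is how a client "descends" into a pseudo-directory.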

@vangent

Contributor

commented Oct 18, 2018

This is done, except for exposing As for List and Delimiter support, which are tracked separately in #541 and #542.
