Add s3util.ListObjects(url string, c *Config) (*ListObjectsResult, error) #7

hnakamur · 2013-06-05T16:06:32Z

This is a function for the GET Bucket (List Objects) API.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html


kr · 2013-07-14T23:19:38Z

I thought I replied here before, apologies for missing that.

The name ListObjects seems redundant, why not just List?
(Or perhaps Readdir?)

The public interface here seems a bit complicated. Two new
types introduced. Is it possible to implement
http://godoc.org/os#FileInfo and avoid introducing a new
public type Content?

Also, is there any way to avoid exposing fields like Marker
and IsTruncated? Those are implementation details, which
ideally s3util would handle automatically. That is, is something
like this signature possible?

func Open(url string, c *Config) (*File, error)

func (f *File) List(n int) ([]os.FileInfo, error)

hnakamur · 2013-07-16T13:43:54Z

Hi, thanks for your reply.

When we add an API for listing buckets in the future,
I suppose it will be named ListBuckets.
With ListBuckets alongside, I thought ListObjects was a better name than List.
List may be OK, though.

I think Readdir is confusing.
With the name Readdir, I would expect the result to be the entries of a single directory,
as with the os.File.Readdir API.

I implemented the ListObjects API as a low level primitive API corresponding to:
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html
Two new types are needed to contain all response elements.

I think your signature for List() is somewhat misleading.
It gave me the impression that f is a directory and that I would get the entries in it.

I think this is better.
func List(marker *File, n int) ([]os.FileInfo, error)

Before we think about function signatures, we have to consider
a gotcha about directory names on S3.
The Amazon S3 web console names directories with
the suffix '/' (e.g. 'foo/'). Tools like S3Fox, on the other hand,
use the suffix '_$folder$' (e.g. 'foo_$folder$').

We also need to pass directory names with these suffixes
when we use them as markers for the next ListObjects call,
so we cannot just trim and throw away the directory suffixes.

If we use os.File for file or directory entries, Name() must
return names with directory suffixes, and we have to
define another function
func TrimDirectorySuffix(f *File) string
to get a directory name without '/' or '_$folder$'.
I would not like to call TrimDirectorySuffix(file.Name()) to get the actual names.
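
A minimal sketch of the suffix handling discussed above, assuming only the two suffixes named in this thread. The helper name splitDirSuffix and its return values are hypothetical and are not part of s3util. The caller keeps the original key, because that key is what a later marker would need.

```go
package main

import (
	"fmt"
	"strings"
)

// Suffixes that mark a pseudo-directory key, as described in this thread.
const (
	slashSuffix  = "/"
	folderSuffix = "_$folder$"
)

// splitDirSuffix (hypothetical helper) returns the key without a directory
// suffix and reports whether one was present. It does not modify the key
// itself, so the caller can still use the original key as a marker.
func splitDirSuffix(key string) (name string, isDir bool) {
	for _, s := range []string{slashSuffix, folderSuffix} {
		if strings.HasSuffix(key, s) {
			return strings.TrimSuffix(key, s), true
		}
	}
	return key, false
}

func main() {
	for _, key := range []string{"foo/", "foo_$folder$", "foo.txt"} {
		name, isDir := splitDirSuffix(key)
		fmt.Printf("key=%q name=%q dir=%v\n", key, name, isDir)
	}
}
```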

Could you tell me what you think?

kr · 2013-07-16T18:39:16Z

Package s3util isn't really meant for low level functions.
It's for convenient high level access. For example, the
user shouldn't have to keep track of the marker; this
package should do it for them.

S3 doesn't have directories, but it's possible to treat
objects as if they were in a hierarchy, and the amazon
api and docs encourage this. It seems reasonable to
present files that way. There's no way we could use
os.File, but we could make an s3.File that's analogous.
A File that corresponds to an actual object would
need to present the exact path of that object as its name.
A File that corresponds to an intermediate level of
hierarchy (aka a directory) would need to present as its
name the path up to that point, not including the trailing
path separator.

Since '/' is already the path separator, creating an empty
object ending in '/' causes a level of hierarchy to appear
with no extra logic. It seems unwise to use any other suffix
for these pseudo-directories.

Given the following objects:

sample.jpg
photos/2006/January/sample.jpg
photos/2006/February/sample2.jpg
photos/2006/February/sample3.jpg
photos/2006/February/sample4.jpg

This api could produce the following listings:

For "/":
photos
sample.jpg

For "/photos":
2006

For "/photos/2006":
February
January

etc.

Why can't List work for listing both buckets and objects?

hnakamur · 2013-07-17T14:58:19Z

Now I understand that s3util is meant for high level access. Thanks for your explanation.

As for directory suffixes, I wish all tools out there used only '/'. In reality, there are already
a lot of directories with either suffix, '/' or '_$folder$', so I think it would be better for
s3util to handle both.

Your listing is a breadth-first search, but the S3 List API is a depth-first search.
The S3 List API also limits the number of entries it returns: at most 1000 per call.
So we need to call the S3 List API multiple times when we have a lot of entries.
In fact, we need to traverse all entries to get the top-level listing.

I would like to control the number of S3 API calls because they cost money.
I would also like to process partial listings as they arrive, before I have the complete listing.

Yes, maybe List can work for listing both buckets and objects.

kr · 2013-07-18T18:53:04Z

The page you linked above,
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html,
shows the first step in a breadth-first search under heading
"Sample Request Using Prefix and Delimiter".

The key seems to be to supply the path separator as the delimiter param.
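
As a rough illustration of supplying the delimiter, the query string for one list request could be built as below. The parameter names prefix, delimiter, marker, and max-keys are the ones documented for GET Bucket (List Objects). The function listQuery is a hypothetical helper, not s3util code.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// listQuery (hypothetical) builds the query string for one list request,
// always using "/" as the delimiter. Empty or zero arguments are omitted.
func listQuery(prefix, marker string, maxKeys int) string {
	v := url.Values{}
	v.Set("delimiter", "/")
	if prefix != "" {
		v.Set("prefix", prefix)
	}
	if marker != "" {
		v.Set("marker", marker)
	}
	if maxKeys > 0 {
		v.Set("max-keys", strconv.Itoa(maxKeys))
	}
	// Encode sorts by parameter name and percent-encodes the values.
	return v.Encode()
}

func main() {
	fmt.Println(listQuery("photos/", "", 100))
}
```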

The design I suggest would perform exactly one S3 call per call to List.
Hopefully this is sufficient to control costs.

Just like for os.File.Readdir, List can let the user decide how many results
to get at once (up to the amazon limit), and continue where it left off in a
subsequent call.

hnakamur · 2013-07-21T06:02:39Z

I read samples in
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html

By setting delimiter=/, you get only directory entries. So you have to do an extra API call to get the entries in each directory. And those results have files and subdirectories mixed.

Using only the marker parameter, without a delimiter, the number of API calls needed is int((entries - 1) / 1000) + 1, where 1000 is the maximum number of entries per API call. This is the minimum you can get.
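
The call-count formula above, written out as a small sketch. The function name flatListCalls is hypothetical. Go's integer division truncates toward zero, so an empty bucket still counts as one call.

```go
package main

import "fmt"

// maxKeysPerCall is the per-request limit mentioned in this thread.
const maxKeysPerCall = 1000

// flatListCalls (hypothetical) returns the number of list requests needed to
// read entries keys without a delimiter: int((entries - 1) / 1000) + 1.
func flatListCalls(entries int) int {
	return (entries-1)/maxKeysPerCall + 1
}

func main() {
	for _, n := range []int{1, 1000, 1001, 2500} {
		fmt.Printf("%d entries -> %d calls\n", n, flatListCalls(n))
	}
}
```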

kr · 2013-07-21T19:28:26Z

Files and directories aren't mixed. Files are listed in Contents,
and directories are in CommonPrefixes. In this example (copied
from amazon), the file is sample.html and the directory is photos.

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Prefix></Prefix>
  <Marker></Marker>
  <MaxKeys>1000</MaxKeys>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>sample.html</Key>
    <LastModified>2011-02-26T01:56:20.000Z</LastModified>
    <ETag>&quot;bf1d737a4d46a19f3bced6905cc8b902&quot;</ETag>
    <Size>142863</Size>
    <Owner>
      <ID>canonical-user-id</ID>
      <DisplayName>display-name</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
  <CommonPrefixes>
    <Prefix>photos/</Prefix>
  </CommonPrefixes>
</ListBucketResult>

Doing a breadth-first traversal might still take a few more api calls than
the flat listing, but it seems much more convenient.
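
To illustrate the split between Contents and CommonPrefixes, here is a sketch that decodes a shortened copy of the response above with encoding/xml. The struct and function names are hypothetical and cover only the fields used here; they are not the types from this pull request.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// listBucketResult (hypothetical) maps only the elements this sketch reads.
type listBucketResult struct {
	IsTruncated    bool           `xml:"IsTruncated"`
	Contents       []content      `xml:"Contents"`
	CommonPrefixes []commonPrefix `xml:"CommonPrefixes"`
}

type content struct {
	Key  string `xml:"Key"`
	Size int64  `xml:"Size"`
}

type commonPrefix struct {
	Prefix string `xml:"Prefix"`
}

// sampleXML is a shortened copy of the Amazon example quoted in this thread.
const sampleXML = `<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Prefix></Prefix>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>sample.html</Key>
    <Size>142863</Size>
  </Contents>
  <CommonPrefixes>
    <Prefix>photos/</Prefix>
  </CommonPrefixes>
</ListBucketResult>`

// parseList returns file keys (from Contents) and directory prefixes
// (from CommonPrefixes) found in one list response.
func parseList(data []byte) (files, dirs []string, err error) {
	var r listBucketResult
	if err = xml.Unmarshal(data, &r); err != nil {
		return nil, nil, err
	}
	for _, c := range r.Contents {
		files = append(files, c.Key)
	}
	for _, p := range r.CommonPrefixes {
		dirs = append(dirs, p.Prefix)
	}
	return files, dirs, nil
}

func main() {
	files, dirs, err := parseList([]byte(sampleXML))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("files:", files)
	fmt.Println("dirs:", dirs)
}
```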

hnakamur · 2013-07-22T15:37:51Z

Thank you again for your explanation.
I confirmed with my sample program that files are listed in Contents and directories in CommonPrefixes.

I tried to implement the proposed APIs, but found that we cannot get LastModified for directories.
Is it OK that f.ModTime() returns the zero value for time.Time if f is a directory?

kr · 2013-07-22T19:13:46Z

> Is it OK that f.ModTime() returns the zero value for time.Time if f is a directory?

Yes, that seems reasonable. Also for Size() etc. Since directories
don't really exist, they can't have metadata.

hnakamur · 2013-07-23T00:01:02Z

Oh, I was wrong about directories. I knew the S3 console creates entries for directories, but I thought we could not get them when a delimiter is specified. Actually, we can.

An empty directory created with the S3 console:

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>go-s3</Name>
  <Prefix>s3util/foo/</Prefix>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>s3util/foo/</Key>
    <LastModified>2013-06-07T07:52:45.000Z</LastModified>
    <ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
    <Size>0</Size>
    <Owner>
      <ID>a42a235b94cfe0f3fd630844e076307918c210d57a6e3499e813f564588716a4</ID>
      <DisplayName>hnakamur</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>

A file uploaded to the directory above:

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>go-s3</Name>
  <Prefix>s3util/hoge/</Prefix>
  <Marker/>
  <MaxKeys>1000</MaxKeys>
  <Delimiter>/</Delimiter>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>s3util/hoge/</Key>
    <LastModified>2013-07-22T23:31:55.000Z</LastModified>
    <ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
    <Size>0</Size>
    <Owner>
      <ID>a42a235b94cfe0f3fd630844e076307918c210d57a6e3499e813f564588716a4</ID>
      <DisplayName>hnakamur</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
  <Contents>
    <Key>s3util/hoge/list_local.go.bak</Key>
    <LastModified>2013-07-22T23:36:04.000Z</LastModified>
    <ETag>"afda40162cce64840ffd7aae3b2d3094"</ETag>
    <Size>894</Size>
    <Owner>
      <ID>a42a235b94cfe0f3fd630844e076307918c210d57a6e3499e813f564588716a4</ID>
      <DisplayName>hnakamur</DisplayName>
    </Owner>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>

When I created my directory structures on S3 for my experiments while implementing ListObjects(), I initially uploaded files with 3Hub: Amazon S3 Client (for Mac OS X).
This tool creates directory names with '_$folder$' suffixes, like S3Fox Organizer (S3Fox).
Then I removed the directory entries with '_$folder$' suffixes in the S3 console.
So now there are no entries for those directories.

If you use only the S3 console to create directories,
you get directory entries like the examples above. Sorry for the confusion.
In this case, you can get metadata for directories.

Of course, if you use only the S3 APIs, you can create file entries without parent directory entries.
In this case, you cannot get metadata for directories.

kr · 2013-07-23T03:27:22Z

Yes, in my interpretation, s3util/foo/ is technically an empty file,
and s3util/foo is the directory that holds it. The file's basename
(returned from method Name on FileInfo) would be the empty string.
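
A small sketch of that interpretation, splitting a key at its last '/'. The helper splitKey is hypothetical. Note that the standard path.Base would not give this result, since it removes trailing slashes before taking the last element.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKey (hypothetical) splits an object key at its last "/" into the
// directory part and the basename. A key ending in "/" has an empty basename.
func splitKey(key string) (dir, base string) {
	i := strings.LastIndex(key, "/")
	if i < 0 {
		return "", key
	}
	return key[:i], key[i+1:]
}

func main() {
	for _, key := range []string{"s3util/foo/", "s3util/hoge/list_local.go.bak", "sample.jpg"} {
		dir, base := splitKey(key)
		fmt.Printf("key=%q dir=%q base=%q\n", key, dir, base)
	}
}
```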

hnakamur · 2013-07-23T17:10:29Z

Thanks for your comment. I am closing this pull request because I opened another pull request, #14, for the new APIs.

hnakamur and others added 10 commits June 6, 2013 01:00

Add s3util.ListObjects(url string, c *Config) (*ListObjectsResult, er…

630eaab

…ror)

Trim directory key suffix "/" as well as "_$folder$". The AWS Console…

da8d96a

… uses "/", while thirdparty tools like S3fox, 3Hub and NativeS3FileSystem. Sort entries after trimming "_$folder$" suffixes.

Move defer reader.Close() to ListObjects().

fde893c

move example code to separage package

80e9b00

Add Path field to Contents in order to get the original key later to …

6757571

…use as a marker for the next query.

Group folder suffix consts.

de06da6

Rename Contents type to Content.

731c974

Add field comment to Content.Path

3522273

Stop sorting contents because a dir with "_$folder$" suffix may be in…

dcdf7c1

… later list results.

Move public function and type upper in the source file.

d4f288c

hnakamur mentioned this pull request Jul 23, 2013

Add s3util.List() for getting entries in a directory #14

Closed

hnakamur closed this Jul 23, 2013
