
S3 FS implementation #90

Closed
wants to merge 1 commit into from

Conversation

rgarcia

@rgarcia rgarcia commented May 22, 2016

This is a mostly-working/mostly-tested implementation of S3 as an FS.

There are some gaps in the implementation, but it is otherwise pretty functional:

  • It doesn't support reads or writes at a non-zero offset
  • It doesn't support Truncate
  • It doesn't support Seek with whence set to 2 (seeking relative to the end of the file)
  • It doesn't support opening a file in append mode

All of these would increase the complexity of the implementation significantly (and some are of questionable semantics in an eventually consistent store like S3, e.g. append), so for now I've left them out.
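For what it's worth, the non-zero-offset read gap could in principle be closed with S3's ranged GETs (GetObjectInput accepts a Range parameter). A minimal, self-contained sketch of the header construction, not part of this PR (the helper name is hypothetical):

```go
package main

import "fmt"

// rangeFor returns the HTTP Range header value that S3's GetObject accepts,
// which is one way a read at a non-zero offset could be served.
func rangeFor(offset, length int64) string {
	if length <= 0 {
		// Open-ended range: read from offset to the end of the object.
		return fmt.Sprintf("bytes=%d-", offset)
	}
	// Byte ranges are inclusive on both ends.
	return fmt.Sprintf("bytes=%d-%d", offset, offset+length-1)
}

func main() {
	fmt.Println(rangeFor(100, 50)) // bytes=100-149
	fmt.Println(rangeFor(100, 0))  // bytes=100-
}
```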

In order to reduce clutter in the root package github.com/spf13/afero, I put the whole implementation and its tests in a subpackage, github.com/spf13/afero/s3. To leverage the generic Fs tests in afero_test.go, I copy-pasted them into a subpackage, github.com/spf13/afero/tests, and added a few tests of my own to that file as I went along.

In order to run the tests you must pass a bucket: go test -v github.com/spf13/afero/s3 -s3bucket <test bucket>. You also need to set AWS_REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY.

Let me know what you think 😄

@CLAassistant

CLAassistant commented May 22, 2016

CLA assistant check
All committers have signed the CLA.

@dim

dim commented Jun 28, 2016

👍

@ns-cweber

Could this get some love? I am very interested in using this in my project. Depending on a fork is never ideal.

if os.IsNotExist(err) {
return fs.OpenFile(name, os.O_CREATE, 0777)
}
return (*S3File)(nil), err


Any reason you're returning a valid interface to a nil pointer? While everyone should be checking the error, CacheOnReadFs checks the interface for nil, and this breaks in hard-to-troubleshoot ways. While I don't think anyone should be ignoring the error, returning a valid interface isn't doing anyone any favors either.
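For anyone unfamiliar with the gotcha described here: an interface value that holds a typed nil pointer is itself non-nil, so callers who check the interface against nil get surprised. A self-contained illustration, using a hypothetical stand-in for the PR's S3File:

```go
package main

import "fmt"

// S3File is a simplified stand-in for the PR's file type.
type S3File struct{}

func (f *S3File) Close() error { return nil }

// Closer is the interface the caller sees; *S3File satisfies it.
type Closer interface{ Close() error }

// open returns a typed nil pointer alongside an error, mimicking
// `return (*S3File)(nil), err` from the PR.
func open() (Closer, error) {
	var f *S3File // nil pointer
	return f, fmt.Errorf("boom")
}

func main() {
	f, err := open()
	fmt.Println(err != nil) // true
	// The interface holds a (*S3File)(nil), so the interface itself is NOT nil:
	fmt.Println(f == nil) // false
}
```

Returning a plain `nil` instead of `(*S3File)(nil)` avoids the trap.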

if strings.Contains(err.Error(), "404") {
statDir, err := fs.statDirectory(name)
return statDir, err
}

@ns-cweber ns-cweber Feb 18, 2017


Rather than checking the string, I think it would be better to check the error code:

if err, ok := err.(awserr.Error); ok {
    if err.Code() == "NotFound" {
        return fs.statDirectory(name)
    }
}


@ns-cweber ns-cweber Feb 18, 2017


Here's a runnable example to play around with (just manipulate the bucket name):

package main

import (
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess, err := session.NewSession()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(-1)
	}

	if _, err := s3.New(sess).HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String("bucketName"),
		Key:    aws.String("/missing/key/"),
	}); err != nil {
		if err, ok := err.(awserr.Error); ok {
			fmt.Println("CODE:", err.Code())
		}
	}
}


var s3bucket string

func init() {

@nicored nicored Apr 10, 2017


This should be done as an integration test. For unit tests, you could monkey-patch the s3 methods instead (a conventional mock can't really be injected here).

// my_test.go
...
monkey.PatchInstanceMethod(reflect.TypeOf(s3Session), "GetObject", s3_mocks.GetObjectMockFn())
...

// mocks_s3.go

import (
	"bytes"
	"io"

	"github.com/aws/aws-sdk-go/service/s3"
)

// ClosingBuffer gives bytes.Buffer a no-op Close so it satisfies io.ReadCloser.
type ClosingBuffer struct {
	*bytes.Buffer
}

func (cb *ClosingBuffer) Close() error {
	return nil
}

func GetObjectMockFn() func(_ *s3.S3, _ *s3.GetObjectInput) (*s3.GetObjectOutput, error) {
	return func(_ *s3.S3, _ *s3.GetObjectInput) (*s3.GetObjectOutput, error) {
		// Hand back an open body; the caller closes it after reading.
		var body io.ReadCloser = &ClosingBuffer{bytes.NewBufferString(`This is the content of a file`)}
		return &s3.GetObjectOutput{
			Body: body,
		}, nil
	}
}
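As an aside, a monkey-patch-free alternative is to have the FS depend on a narrow interface and inject a test double. The types below are simplified stand-ins for the SDK's, purely to keep the sketch self-contained:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Simplified stand-ins for the SDK's request/response types.
type GetObjectInput struct{ Key string }
type GetObjectOutput struct{ Body io.ReadCloser }

// objectGetter is the narrow interface the FS code would depend on,
// instead of the concrete *s3.S3 client.
type objectGetter interface {
	GetObject(*GetObjectInput) (*GetObjectOutput, error)
}

// fakeS3 is a test double implementing just that one method.
type fakeS3 struct{ content string }

func (f *fakeS3) GetObject(_ *GetObjectInput) (*GetObjectOutput, error) {
	return &GetObjectOutput{Body: io.NopCloser(bytes.NewBufferString(f.content))}, nil
}

// readAll is a hypothetical consumer that only needs objectGetter.
func readAll(c objectGetter, key string) (string, error) {
	out, err := c.GetObject(&GetObjectInput{Key: key})
	if err != nil {
		return "", err
	}
	defer out.Body.Close()
	b, err := io.ReadAll(out.Body)
	return string(b), err
}

func main() {
	s, _ := readAll(&fakeS3{content: "hello"}, "some/key")
	fmt.Println(s) // hello
}
```

In production the real client is passed in; in tests the double is, with no reflection involved.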


Maybe Minio as a semi-functional mock?


Which would mean that your unit tests would depend on Minio, which I'm against. Unit tests should not depend on external services to work. Minio would be a good candidate for integration testing, though then I'd rather connect my tests to a test bucket in S3.
I've chatted with a few people who run their unit tests as integration tests to avoid rewriting parts of their tests, which sounds wrong to me.
Here we're using the AWS SDK, which we assume is stable and well tested, so what you want to do is test your app logic and mock what the SDK functions return. That also lets you test how you handle errors you wouldn't get when AWS behaves fine, and gives you more flexibility to cover all the edge cases you can think of. I agree that writing mocks often adds complexity to tests, but it's worth it.


Yeah, I can see where you're coming from. Using S3 itself as an integration test sounds potentially slow (and $) too, but I guess that depends on a lot of factors.

With mock vs integration... neither way seems to be perfect, both being needed. 😉


Arguably, storage is really really cheap these days: $0.0245 per GB for standard storage.


@nicored nicored left a comment


Some concerns about unit tests

@jcftang

jcftang commented Oct 5, 2017

I know this has been sitting around for a while, but it would be nice to see if one of the S3 backends could be merged in.

@aviau

aviau commented Apr 27, 2018

Any news on this? I would also love to see this merged.

@rgarcia Would you rebase your PR?

@aviau

aviau commented Jul 19, 2018

Someone should probably move this PR to absfs/afero

mrjoshuak added a commit to absfs/afero that referenced this pull request Aug 20, 2018
@mrjoshuak mrjoshuak mentioned this pull request Aug 20, 2018
@@ -24,7 +24,7 @@ import (
// readDirNames reads the directory named by dirname and returns
Collaborator

Can you also update the function name in the comment?

panic("read after close")
}
if f.offset != 0 {
panic("TODO: non-offset == 0 read")
Collaborator

Maybe open a follow-up issue to track this.

Collaborator

On a second thought, this should probably be fixed from the get-go.

case 1:
f.offset += int(offset)
case 2:
// can probably do this if we had GetObjectOutput (ContentLength)
Collaborator

panic("write after close")
}
if f.offset != 0 {
panic("TODO: non-offset == 0 write")
Collaborator

This seems like it should be fixed, otherwise Write on existing files isn't useful?

Bucket: aws.String(f.bucket),
Key: aws.String(f.name),
Body: readSeeker,
ServerSideEncryption: aws.String("AES256"),
Collaborator

For my own info: how is the key managed server-side? Is it fully managed by Amazon?

Key: aws.String(name),
})
}
return file, err
Collaborator

nit:

return file, nil

Key: aws.String(nameClean),
})
if err != nil {
if strings.Contains(err.Error(), "404") {
Collaborator

awserr is a thing so please try to cast to typed errors

https://docs.aws.amazon.com/sdk-for-go/api/aws/awserr/

@0xmichalis
Collaborator

@rgarcia if you have time to address the outstanding comments, I can help with merging.

@rgarcia
Author

rgarcia commented Mar 27, 2020

Thanks for the review @Kargakis and hi everyone. I made this PR almost 4 years ago and since then my team's priorities have shifted and we no longer use afero. So unfortunately it's very unlikely that I will find the time to finish the work here.

@dim

dim commented Mar 30, 2020

Thanks @rgarcia. That's why we came up with our own spin on afero, https://github.com/bsm/bfs; it has adapters for in-memory, file system, S3, GCS, FTP, and soon Azure Blob.

@fclairamb

Hi guys,

I reworked a fork of this implementation here and I'm interested in your feedback on it. It's based on @wreulicke's fork of @aviau's fork of @rgarcia's code and basically fixes some limitations it had. The biggest limitation was that it didn't handle streaming of files (for upload or download); I used a pipe and a goroutine to handle that.

In my humble opinion:

  • Each Fs implementation should live in its own repository, so that developers can choose the ones they want
  • You could recommend some of them, or even create an afero-contrib GitHub organization where the core Fs implementations are shared

I intend to build some other afero.Fs implementations for a multi-backend FTP server that was initially only a library.

@0xmichalis
Collaborator

0xmichalis commented May 20, 2020 via email

@pinpox

pinpox commented Jun 17, 2022

Is there still work being done on this?
