serve: add s3 server #6461
Conversation
This is great work :-)
If you can get the s3 integration tests more or less passing then this will be a useful feature. I would have thought you could do auth with the existing httplib auth?
Sounds OK to me. Note you only need to do this on backends with the Feature
Hmm... Assuming for a moment that we only support listing with a prefix which is a directory boundary (which is all rclone uses) then you can use the max depth controls on listing.
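For illustration, a minimal sketch of that bounded-depth listing, assuming rclone's fs/walk API (the helper name listDirBoundary is hypothetical, not code from this PR):

```go
package s3sketch

import (
	"context"
	"fmt"

	"github.com/rclone/rclone/fs"
	"github.com/rclone/rclone/fs/walk"
)

// listDirBoundary serves a listing whose prefix ends on a directory
// boundary. maxLevel=1 restricts the walk to the immediate children
// of dir, which is all a delimiter="/" ListObjects needs - no full
// recursive scan of the remote.
func listDirBoundary(ctx context.Context, f fs.Fs, dir string) error {
	return walk.ListR(ctx, f, dir, false, 1, walk.ListAll,
		func(path string, entries fs.DirEntries, err error) error {
			if err != nil {
				return err
			}
			for _, entry := range entries {
				fmt.Println(entry.Remote())
			}
			return nil
		})
}
```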
It would probably be more efficient to set up listing filters and just use
Metadata is an optional feature. If the backend doesn't support it, that is fine. You can probably disable the
Line 101 in 7e54782
If this is serving a bucket-based backend then it could potentially be more efficient, but we don't expose the prefix listings that they have - maybe we should?
Passed with the latest commit (on my end). Note that I used a small cache to store metadata, which is volatile and won't be transferred to any backend. Maybe we can change this later to adapt it to fs/vfs.
added
serve/httplib doesn't support per-request auth like AWS Signature V4 etc., so I made a separate one for serve s3.
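For illustration, a minimal sketch of what per-request parsing of a V4 Authorization header involves - extractAccessKey and the verify callback are hypothetical names, not this PR's signature package:

```go
package s3sketch

import (
	"errors"
	"net/http"
	"strings"
)

// extractAccessKey pulls the access key ID out of a V4 header such as:
// AWS4-HMAC-SHA256 Credential=AKID/20220101/us-east-1/s3/aws4_request, SignedHeaders=..., Signature=...
func extractAccessKey(authHeader string) (string, error) {
	const scheme = "AWS4-HMAC-SHA256 "
	if !strings.HasPrefix(authHeader, scheme) {
		return "", errors.New("unsupported authorization scheme")
	}
	for _, part := range strings.Split(authHeader[len(scheme):], ",") {
		part = strings.TrimSpace(part)
		if strings.HasPrefix(part, "Credential=") {
			cred := strings.TrimPrefix(part, "Credential=")
			return strings.SplitN(cred, "/", 2)[0], nil // key ID is the first slash-separated field
		}
	}
	return "", errors.New("no Credential element in authorization header")
}

// authMiddleware rejects requests whose signature cannot be verified.
func authMiddleware(next http.Handler, verify func(r *http.Request, accessKey string) error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key, err := extractAccessKey(r.Header.Get("Authorization"))
		if err != nil || verify(r, key) != nil {
			http.Error(w, "SignatureDoesNotMatch", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```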
I've modified the prefixParser to assume that slash is the default delimiter when not specified. That should route most requests to the directory-based handler rather than the fuzzy search.
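For illustration, a hypothetical splitPrefix showing the idea (this is not the PR's prefixParser):

```go
package s3sketch

import "strings"

// splitPrefix splits a ListObjects prefix at the last "/" into the
// directory to list and the object-name prefix to match within it.
// With the default "/" delimiter, "photos/2022/img" becomes
// dir="photos/2022", namePrefix="img", so the request can be served
// by one directory listing instead of a bucket-wide fuzzy search.
func splitPrefix(prefix string) (dir, namePrefix string) {
	idx := strings.LastIndex(prefix, "/")
	if idx < 0 {
		return "", prefix
	}
	return prefix[:idx], prefix[idx+1:]
}
```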
Good idea, but it might need some work on the fs/vfs interface?
This is looking good - well done :-) Do you want me to review the code yet? I should probably give it a go and try to break it!
Almost done - I think you can start reviewing the code.
Looking very nice :-)
See inline for comments.
cmd/serve/s3/auth.go
Outdated
"github.com/rclone/rclone/cmd/serve/s3/signature"
)

func (p *Server) authMiddleware(handler http.Handler) http.Handler {
Would it be possible to make the auth proxy work with this server?
https://github.com/rclone/rclone/blob/master/cmd/serve/proxy/proxy.go
}

// newBackend creates a new SimpleBucketBackend.
func newBackend(fs *vfs.VFS, opt *Options) gofakes3.Backend {
It should be possible to implement this without using the VFS, using the lower level fs commands only.
However the VFS provides caching etc. which could be useful. It will introduce inefficiencies though, of course... For example you can run ListR directly on bucket-based backends, but via the VFS it will list each directory one by one.
Considering the high request rate of S3, it should be better to use the VFS on backends with strict rate limits. I've added some logic for bucket-based backends to execute fs.ListR directly - hope this will work.
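For illustration, a sketch of that feature check, assuming the fs.Features and fs/walk APIs (listAll is a hypothetical helper, and walk.ListR can itself exploit a native ListR, so the explicit branch mainly documents the intent):

```go
package s3sketch

import (
	"context"

	"github.com/rclone/rclone/fs"
	"github.com/rclone/rclone/fs/walk"
)

// listAll lists dir recursively. On bucket-based backends that provide
// a native ListR, one recursive API call returns everything; elsewhere
// we fall back to walking each directory in turn.
func listAll(ctx context.Context, f fs.Fs, dir string, cb fs.ListRCallback) error {
	if features := f.Features(); features.BucketBased && features.ListR != nil {
		return features.ListR(ctx, dir, cb)
	}
	// maxLevel < 0 means unlimited recursion in fs/walk.
	return walk.ListR(ctx, f, dir, false, -1, walk.ListAll,
		func(path string, entries fs.DirEntries, err error) error {
			if err != nil {
				return err
			}
			return cb(entries)
		})
}
```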
Is it possible to also have a VFS passthrough mode, or to disable the VFS entirely? I may need it because I will load-balance the rclone-based S3 server across multiple nodes (they restrict it by IP traffic), so having a VFS layer may cause cache invalidation problems.
cmd/serve/s3/help.go
Outdated
Please note that some clients may require HTTPS endpoints.
See [#SSL](#ssl-tls) for SSL configuration.

Use ` + `--host-bucket` + ` if you want to use bucket name as a part of
Is this what we call path style vs virtual host style in the s3 backend? If so we should probably name it similarly:
--s3-force-path-style If true use path style access if false use virtual hosted style (default true)
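For reference, the two addressing styles, with a hedged sketch of telling them apart (bucketFromRequest and serverHost are hypothetical names, not the PR's implementation):

```go
package s3sketch

import (
	"net/http"
	"strings"
)

// bucketFromRequest resolves the bucket name for both addressing styles:
//   path style:           http://s3.example.com/mybucket/key
//   virtual-hosted style: http://mybucket.s3.example.com/key
// serverHost is the configured bare hostname (e.g. "s3.example.com").
func bucketFromRequest(r *http.Request, serverHost string, pathStyle bool) (bucket, key string) {
	if pathStyle {
		parts := strings.SplitN(strings.TrimPrefix(r.URL.Path, "/"), "/", 2)
		bucket = parts[0]
		if len(parts) > 1 {
			key = parts[1]
		}
		return bucket, key
	}
	host := strings.Split(r.Host, ":")[0] // drop any port
	bucket = strings.TrimSuffix(host, "."+serverHost)
	return bucket, strings.TrimPrefix(r.URL.Path, "/")
}
```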
Changed. Currently using `--force-path-style`.
Do you want to merge this for 1.60, which I want to release in the next week or two? We could merge it with beta in the docs.
Seems that some race issues occurred with
I'm not sure what caused that to error there. I forked your repo and ran the action myself, and it originally errored out somewhere completely different; after updating the repo with the 100+ commits from the upstream main branch and running the tests again in GitHub, things look promising. The GitHub Actions don't seem to fail on the same test every time, which makes me think it has nothing to do with the code changes and everything to do with the GitHub Actions job runners, perhaps deprioritizing these long-running tasks for free GitHub accounts like my own. Also, running make racequicktest locally, your serve tests pass just fine on my MacBook; it only complains about osxfuse not being installed (I never installed it, so that makes sense for the mount tests on macOS). I would say this should be fine, even ignoring the randomly failing GitHub Action tests that don't pertain to the changes in this PR. In any case, I am super excited to have this feature and plan to test it with some of my hobby projects when it makes it to beta or release. I will be able to test it with a few different cloud backends and can use it for large "buckets" with 4mil "objects" easily whenever this PR lands.
I can confirm that, re-running the failed job with zero code changes, it eventually passed (3rd try): https://github.com/blaineam/rclone/actions/runs/3589303971/jobs/6041966413
@Mikubill do you want me to merge this for rclone 1.61? I think it is done, isn't it, +/- a few test failures.
I set up a docker container to test it with one of my backends and I could not get it to work with my Mountain Duck client, at least not fully. It loaded the list of "buckets", but all the spaces were replaced with + signs and I couldn't open any of the buckets. It may have been a misconfiguration of my Mountain Duck, because the s3 tests seem to be passing, but I thought I should report this concern. Thinking on this, I don't think buckets on AWS S3 can have spaces, so I'm going to try a different root - this is probably fine.
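Not a diagnosis of this PR, but for reference: one classic way spaces turn into + signs is mixing up query-style and path-style URL escaping, which Go's net/url keeps separate:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// Query unescaping treats "+" as a space; path unescaping does not.
	q, _ := url.QueryUnescape("my+bucket") // "my bucket"
	p, _ := url.PathUnescape("my+bucket")  // "my+bucket"
	fmt.Println(q, "|", p)

	// Conversely, query *escaping* a name with spaces produces "+",
	// while path escaping produces "%20":
	fmt.Println(url.QueryEscape("my bucket")) // "my+bucket"
	fmt.Println(url.PathEscape("my bucket"))  // "my%20bucket"
}
```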
Thank you for testing writes @blaineam. I did some more testing locally and figured out that it only affects uploads with v4 authorization. After tracing the issue through the code, I ended up in gofakes3, which was missing support for chunked multipart uploads and had a bug in the chunked reader implementation. I created a pull request rclone/gofakes3#1 with the required fixes. With these fixes applied, everything works now in my limited testing with the minio client (with and without v4 authorization).
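For anyone curious, the aws-chunked framing (STREAMING-AWS4-HMAC-SHA256-PAYLOAD) that such a reader must parse looks roughly like the sketch below. It decodes the framing only and deliberately skips signature verification, so it is an illustration, not the gofakes3 fix:

```go
package main

import (
	"bufio"
	"bytes"
	"io"
	"strconv"
	"strings"
)

// decodeAWSChunked decodes a body in aws-chunked format: each chunk is
//   <hex size>;chunk-signature=<sig>\r\n<data>\r\n
// and a zero-size chunk terminates the stream.
func decodeAWSChunked(body io.Reader) ([]byte, error) {
	br := bufio.NewReader(body)
	var out bytes.Buffer
	for {
		header, err := br.ReadString('\n')
		if err != nil {
			return nil, err
		}
		// Size comes before the ";chunk-signature=..." extension.
		sizeHex := strings.SplitN(strings.TrimRight(header, "\r\n"), ";", 2)[0]
		size, err := strconv.ParseInt(sizeHex, 16, 64)
		if err != nil {
			return nil, err
		}
		if size == 0 {
			return out.Bytes(), nil // final chunk; trailers may follow
		}
		if _, err := io.CopyN(&out, br, size); err != nil {
			return nil, err
		}
		if _, err := br.Discard(2); err != nil { // consume trailing \r\n
			return nil, err
		}
	}
}
```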
That explains why my 1.4MB PDF doesn't have any issues - it likely fits within a single chunk. Nice work patching the gofakes3 fork!
This looks amazing!
@ncw does your milestone addition mean this feature will finally land? Like many others I'd really love to see it coming, but this PR seems a bit stalled. Unfortunately MinIO with rclone via FUSE doesn't seem to be really stable. Just curious whether I can hope for this? :)
I would love to help get this PR merged, but I'm not sure what issues need to be fixed to get there. All the issues I discovered in my testing are fixed, and I've been successfully running this branch in a staging environment for a few weeks now without any problems.
GitHub says that there are merge conflicts, to begin with.
Could you perhaps share your parameters? I compiled the PR successfully and started the s3 server, but was not able to add buckets etc.
This seems to work in general, but I have trouble using it with authentication and then uploading something with mc, having set the same key & secret
Perhaps try JuiceFS. It supports WebDAV/SFTP storage (rclone serve'd), is POSIX compliant, and can run an S3 gateway based on MinIO.
The issue I faced was the one described and fixed by @oniumy; it just never ended up in this PR. So a
I've created a PR that solves the conflicts: #7062
Hi, I gave this a try and it seems to work, but the process gets OOM killed after a while. How to reproduce:
$ cat ~/.mc/config.json
....
  "rclone": {
    "url": "http://localhost:8081",
    "accessKey": "test",
    "secretKey": "test",
    "api": "S3v4",
    "path": "auto"
  }
...
$ mc mb rclone/foo
$ mc cp /tmp/largefile rclone/foo/largefile
$ mc cp /tmp/largefile rclone/foo/largefile
...
It looks like each upload consumes about 2x the file size in memory, and the memory is not released (fast enough?) after the upload, so eventually you get an OOM kill.
The branch from #7062 behaves the same.
@tomyl hi, could you try to set
@individual-it Thanks, that indeed fixes the OOM panic for me. I tried
Hmm, this really needs fixing so the file gets streamed and not held in memory.
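One direction such a fix could take - a hedged sketch that spools the body to a temporary file with io.Copy, so memory use stays constant regardless of object size (spoolToFile is a hypothetical helper, not this PR's code):

```go
package s3sketch

import (
	"io"
	"os"
)

// spoolToFile streams an upload body to a temporary file instead of
// buffering it in memory, so a PUT of any size costs O(1) RAM.
func spoolToFile(body io.Reader) (*os.File, int64, error) {
	f, err := os.CreateTemp("", "s3-upload-")
	if err != nil {
		return nil, 0, err
	}
	n, err := io.Copy(f, body) // streams through a small fixed buffer
	if err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, 0, err
	}
	// Rewind so the caller can read the spooled data back.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		f.Close()
		os.Remove(f.Name())
		return nil, 0, err
	}
	return f, n, nil
}
```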
}

hasher := md5.New()
w := io.MultiWriter(f, hasher)
This md5 hash object gets initialized and then receives a copy of all the bytes from the input, yet hasher.Sum(nil) is never called and the md5 result seems never to be used at all. There is a report that PUT operations use up memory equal to twice the input file size. Could the md5 hasher be grabbing extra memory until .Sum() is called, releasing it only when the hasher object goes out of scope?
And if nothing is ever done with the md5sum, why not just remove it entirely and avoid using any memory for the md5 calculation?
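For what it's worth, Go's md5.New() keeps only a fixed-size block buffer and digest state, so the hasher itself shouldn't grow with the input; the doubled memory more likely comes from the body being buffered elsewhere. Unused work is still worth removing, though. A hedged sketch of the two options - finish the digest and use it, or drop the hasher and io.Copy straight to the file (writeWithETag is a hypothetical helper):

```go
package s3sketch

import (
	"crypto/md5"
	"encoding/hex"
	"io"
	"os"
)

// writeWithETag copies body to f while computing the MD5 digest and
// returns it as an S3-style ETag. If the caller doesn't need the ETag,
// the MultiWriter and hasher can simply be dropped in favour of
// io.Copy(f, body).
func writeWithETag(f *os.File, body io.Reader) (string, error) {
	hasher := md5.New()
	if _, err := io.Copy(io.MultiWriter(f, hasher), body); err != nil {
		return "", err
	}
	return `"` + hex.EncodeToString(hasher.Sum(nil)) + `"`, nil
}
```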
I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.65. Thank you everyone for contributing - it is a great feature. @Mikubill would you like to move your gofakes3 repo to the rclone org and become a member of the rclone project? I think we need to do major surgery on it to stop it caching multipart uploads in memory, so it would be good if it was in the rclone project at that point.
Great to hear, and wow - thank you everyone for making this feature come true. I am happy to move gofakes3 to rclone and help strengthen the codebase in the future.
@Mikubill great to hear that :-) Drop me an email at nick@craig-wood.com and we can discuss mechanics.
What is the purpose of this change?
Implements s3 server for rclone - #3382
Built with gofakes3 and serve/httplib. Some integration tests have passed so far, but still more work to do:
Problems
ListObjects needs to list all files to apply the prefix filter. In the implementation I used walk.ListR(ctx, fs), but this seems to be an expensive operation, especially when there are thousands of files in the directory.
Checklist