serve: add s3 server - the second #7062
Conversation
There was an issue when using the mc (minio) client.
I note that this doesn't credit any of the original authors... What I would do is squash Mikubill's work into a single commit. Rebase and amend that commit until it compiles. Then put any changes you make in a separate commit.
@ncw no question, the work is done by @Mikubill, I didn't mean to claim it. I've done as you suggested and now there are two commits: the first makes the server basically work.
The second commit:
I don't understand why the
Nice - thank you. Can you change the commits so their first lines are in house style? Something like:
Thank you. I'll re-run the CI now. I won't be able to give this a thorough review until after the (delayed) 1.63 is out the door, but I'd like to merge it for 1.64.
Force-pushed from 8da1b83 to ee645f4
Is this still a candidate for 1.64? We could merge it as experimental? What do you think?
@ncw from my point of view this is ready to be merged. Yesterday I did a rebase on current master and all tests pass.
This is looking excellent :-) Especially good work on the tests!
I put some minor comments inline. The most concerning is probably the VFS vs Fs one - we should probably have a plan for changing that even if we don't do it immediately.
What is the status of this library? github.com/Mikubill/gofakes
I see you forked it. Do you plan to upstream your changes? Or pull back changes from the upstream?
I think we can get this merged for v1.64 :-)
```go
func (db *s3Backend) getObjectsListArbitrary(bucket string, prefix *gofakes3.Prefix, response *gofakes3.ObjectList) error {

	// ignore error - vfs may have uncommitted updates, such as new dir etc.
	_ = walk.ListR(context.Background(), db.fs.Fs(), bucket, false, -1, walk.ListObjects, func(entries fs.DirEntries) error {
```
It's best to use either the VFS or the lower level Fs layer. Trying to mix and match causes problems, as the VFS's view of the world is different from the Fs's, as you've discovered. Just wondering why you used `walk.ListR` here?
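For contrast, a minimal sketch of what a pure-VFS listing could look like, assuming `db.fs` is a `*vfs.VFS` as the `db.fs.Fs()` call above suggests; the helper name `walkVFS` and the callback shape are illustrative, not part of the PR:

```go
import (
	"path"

	"github.com/rclone/rclone/vfs"
)

// walkVFS recursively visits every file node below dirPath using only
// the VFS layer, so the listing also sees uncommitted VFS state such
// as freshly created directories or in-flight uploads.
func (db *s3Backend) walkVFS(dirPath string, fn func(p string, node vfs.Node)) error {
	node, err := db.fs.Stat(dirPath)
	if err != nil {
		return err
	}
	dir, ok := node.(*vfs.Dir)
	if !ok {
		fn(dirPath, node) // plain file
		return nil
	}
	children, err := dir.ReadDirAll()
	if err != nil {
		return err
	}
	for _, child := range children {
		childPath := path.Join(dirPath, child.Name())
		if child.IsDir() {
			if err := db.walkVFS(childPath, fn); err != nil {
				return err
			}
		} else {
			fn(childPath, child)
		}
	}
	return nil
}
```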
cmd/serve/s3/logger.go (outdated)
```go
case gofakes3.LogInfo:
	fs.Debugf(nil, fmt.Sprintln(v...))
default:
	panic("unknown level")
```
panic on an unknown log level seems a little unkind! Perhaps make it `fallthrough` to the error case?
cmd/serve/s3/logger.go (outdated)
```go
// fs.Infof(nil, fmt.Sprintln(v...))
switch level {
case gofakes3.LogErr:
	fs.Errorf(nil, fmt.Sprintln(v...))
```
Note that those `nil`s should probably be something like "serve s3" so the logs have a context?
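Putting both suggestions together, a sketch of what the logger could look like - this assumes gofakes3's `LogErr`/`LogWarn`/`LogInfo` constants and the `Print` method of its `Logger` interface; the receiver type name `s3Logger` is illustrative:

```go
// Print routes gofakes3 log output to rclone's logger with a
// "serve s3" context, and treats unknown levels as errors rather
// than panicking.
func (l *s3Logger) Print(level gofakes3.LogLevel, v ...interface{}) {
	switch level {
	case gofakes3.LogErr:
		fs.Errorf("serve s3", fmt.Sprintln(v...))
	case gofakes3.LogWarn:
		fs.Infof("serve s3", fmt.Sprintln(v...))
	case gofakes3.LogInfo:
		fs.Debugf("serve s3", fmt.Sprintln(v...))
	default: // unknown level: fall back to the error case
		fs.Errorf("serve s3", fmt.Sprintln(v...))
	}
}
```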
cmd/serve/s3/s3_test.go (outdated)
```go
	return
}

func RandString(n int) string {
```
This looks like a re-implementation of `lib/random.String()`?
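For illustration, swapping in the existing helper might look like this - the `randomBucketName` wrapper is hypothetical, and `random.String(n)` is rclone's existing helper returning a random string of length n:

```go
import "github.com/rclone/rclone/lib/random"

// randomBucketName uses rclone's existing helper instead of a local
// RandString re-implementation in the test.
func randomBucketName() string {
	return "test-" + random.String(16)
}
```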
```md
Please note that some clients may require HTTPS endpoints.
See [#SSL](#ssl-tls) for SSL configuration.
```
It would be nice (future work) to get this to support the auth proxy. That allows different login credentials to choose a different vfs under the control of a short program which opens up the use of the rclone servers amazingly.
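To make the idea concrete, here is a sketch of such a short program, following the protocol described in rclone's auth proxy docs for the existing servers (login as JSON on stdin, a backend config as JSON on stdout); the per-user root path is purely illustrative:

```go
// Hypothetical auth proxy program: rclone runs it per login attempt,
// passing {"user": ..., "pass": ...} on stdin; the program answers
// with the backend config to serve for that user.
package main

import (
	"encoding/json"
	"os"
)

func main() {
	var in struct {
		User string `json:"user"`
		Pass string `json:"pass"`
	}
	if err := json.NewDecoder(os.Stdin).Decode(&in); err != nil {
		os.Exit(1)
	}
	out := map[string]string{
		"type":     "local",               // serve from local disk
		"_root":    "/srv/s3/" + in.User,  // per-user root (illustrative)
		"_obscure": "pass",                // tell rclone to obscure this key
		"pass":     in.Pass,
	}
	_ = json.NewEncoder(os.Stdout).Encode(out)
}
```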
cmd/serve/s3/utils.go (outdated)
```go
switch b := node.(type) {
case vfs.Node:
	o = b.DirEntry().(fs.Object)
```
Note that `o` may be `nil` here. This happens when a file is in the process of being uploaded but hasn't made it to the storage yet. In fact I worked around this exact problem in rclone serve sftp here: 7fc573d
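A sketch of the kind of guard that could follow the type switch, assuming the surrounding function returns a gofakes3 error and `objectName` holds the requested key (both assumptions; `gofakes3.KeyNotFound` is the error helper its backends use for missing keys):

```go
// o can be nil when the file only exists in the VFS cache and has not
// reached the underlying storage yet, so guard before dereferencing.
if o == nil {
	return nil, gofakes3.KeyNotFound(objectName)
}
```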
cmd/serve/s3/utils.go (outdated)
```go
case vfs.Node:
	o = b.DirEntry().(fs.Object)
case fs.DirEntry:
	o = b.(fs.Object)
```
Why not use `case fs.Object`? Then you can avoid a potential panic:

```go
case fs.Object:
	o = b
```
It seems like the operation is single-threaded: if one request has a long blocking task, everything else waits. The current copy feature also seems suboptimal: with super large, deep folders with lots of items (40 directories, more than 10K files each), the whole copy operation would get stuck on the listing. Also, I suggest removing the lock entirely and turning to optimistic concurrency control instead, since S3 is an eventually consistent provider, and strong guarantees of things like bucket listing consistency, or whether there are new updates or not, don't matter.
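As a sketch of the optimistic-concurrency idea (all types and names here are illustrative, not from the PR): keep shared state behind an atomic pointer, copy-on-write, and retry on conflict instead of holding one big lock:

```go
import "sync/atomic"

type bucketSnapshot struct {
	objects map[string]string // key -> etag, illustrative payload
}

var state atomic.Pointer[bucketSnapshot]

func init() {
	state.Store(&bucketSnapshot{objects: map[string]string{}})
}

// putObject publishes a new snapshot with compare-and-swap, retrying
// if another writer got there first - no global mutex, and readers
// always see a consistent (if slightly stale) snapshot.
func putObject(key, etag string) {
	for {
		old := state.Load()
		next := &bucketSnapshot{objects: make(map[string]string, len(old.objects)+1)}
		for k, v := range old.objects {
			next.objects[k] = v
		}
		next.objects[key] = etag
		if state.CompareAndSwap(old, next) {
			return
		}
	}
}
```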
@individual-it we missed v1.64 but there is always v1.65! Did you see the review comments above?
@ncw I have made the requested changes. Could you review it again, please?
cmd/serve/s3/backend.go (outdated)
```go
	return result, err
}

hasher := md5.New()
```
I was looking at the memory usage on Mikubill's original PR and noticed that this hash object never has its Sum() method called, so it soaks up memory and CPU calculating the MD5 sum but never actually collects it for any purpose. There was an observation that memory usage rises as an object is uploaded, up to 2x object size, and then drops back down - I suspect this MD5 object contributes to it.
Maybe just remove it and see how that affects memory usage? There doesn't seem to be any place where Sum() would be used in the current code anyway.
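The change is essentially a deletion; a before/after sketch, assuming the hasher was fed through an io.MultiWriter alongside the destination (the surrounding plumbing is assumed, not quoted from the PR):

```go
import (
	"crypto/md5"
	"io"
)

// Before (sketch): every uploaded byte also went through an MD5 hasher
// whose Sum() was never read - wasted work on the upload path.
func uploadWithHasher(dst io.Writer, src io.Reader) error {
	hasher := md5.New()
	_, err := io.Copy(io.MultiWriter(dst, hasher), src)
	return err // hasher.Sum(nil) is never called anywhere
}

// After (sketch): stream straight to the destination.
func upload(dst io.Writer, src io.Reader) error {
	_, err := io.Copy(dst, src)
	return err
}
```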
@sychan I removed the hasher. Please have a look.
Looks good, thanks!
Let me know when you are ready for another review. I'd like to merge this for v1.65. Maybe we'll mark it as experimental to start with.
@ncw PR is ready for another round of review
When using the mc (minio) client, file encodings were wrong; see rclone/gofakes3#2 for details.
I tested this feature with a quite large amount of data, serving it via S3. Everything has worked well so far. It would be great to see this merged any time soon.
Tests failed on Windows. What does it suggest?

```
--- FAIL: TestMkdirAllOnDrive (0.01s)
    mkdir_windows_test.go:82:
        Error Trace: D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:82
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:117
        Error:       Received unexpected error:
                     mkdir \\?\: The filename, directory name, or volume label syntax is incorrect.
        Test:        TestMkdirAllOnDrive
--- FAIL: TestMkdirAllOnUnusedDrive (0.00s)
    mkdir_windows_test.go:92:
        Error Trace: D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:92
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:97
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:135
        Error:       Should be true
        Test:        TestMkdirAllOnUnusedDrive
        Messages:    mkdir \\?\: The filename, directory name, or volume label syntax is incorrect.
    mkdir_windows_test.go:92:
        Error Trace: D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:92
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:98
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:135
        Error:       Should be true
        Test:        TestMkdirAllOnUnusedDrive
        Messages:    mkdir \\?\: The filename, directory name, or volume label syntax is incorrect.
    mkdir_windows_test.go:92:
        Error Trace: D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:92
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:99
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:135
        Error:       Should be true
        Test:        TestMkdirAllOnUnusedDrive
        Messages:    mkdir \\?\: The filename, directory name, or volume label syntax is incorrect.
    mkdir_windows_test.go:92:
        Error Trace: D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:92
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:100
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:135
        Error:       Should be true
        Test:        TestMkdirAllOnUnusedDrive
        Messages:    mkdir \\?\: The filename, directory name, or volume label syntax is incorrect.
    mkdir_windows_test.go:92:
        Error Trace: D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:92
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:101
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:135
        Error:       Should be true
        Test:        TestMkdirAllOnUnusedDrive
        Messages:    mkdir \\?\: The filename, directory name, or volume label syntax is incorrect.
    mkdir_windows_test.go:92:
        Error Trace: D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:92
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:102
                     D:/a/rclone/rclone/lib/file/mkdir_windows_test.go:135
        Error:       Should be true
        Test:        TestMkdirAllOnUnusedDrive
        Messages:    mkdir \\?\: The filename, directory name, or volume label syntax is incorrect.
FAIL
FAIL    github.com/rclone/rclone/lib/file    0.092s
```
These are failing in master... Something has changed in the CI environment. I will fix this separately! I'm planning to try to merge this PR this week. I'll start by pulling it locally and giving it a good test!
I have pulled this locally and I'm polishing it, so please don't add any more commits. There appears to be a problem with upload - if you upload a 1G file then memory usage rises by roughly the size of the file. Also, does anyone know why there is so much locking in the backend? Otherwise this is looking excellent :-) I will merge it with an experimental tag in due course.
I have finished my polishing of this code! It is v1.65.0-beta.7484.b80be190d.pr-7062-serve-s3 on branch pr-7062-serve-s3 (uploaded in 15-30 mins). For some reason I couldn't push it back to this PR like I normally do. The main changes are in b80be19.
I wrote a lot of documentation and did a lot of testing, and I'm very happy with the code - well done everyone :-) We need to fix this at some point (from the docs):
We can fix that at some point, but I put a workaround into rclone so you can disable multipart uploads when using rclone as a client for this server (see the sketch below). I propose to merge this - any objections? We can add more PRs on top, but let's get this main part of the work merged for v1.65.
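As an illustration of avoiding multipart uploads from the client side (a hedged guess at usage, not necessarily the exact workaround referenced above): rclone's S3 backend has an `--s3-upload-cutoff` option, and raising it makes files below that size go as single-part uploads. The remote name `serves3:` is hypothetical:

```sh
# Copy to an rclone "serve s3" endpoint while avoiding multipart
# uploads for files up to 5 GiB (the flag's maximum).
rclone copy ./data serves3:bucket --s3-upload-cutoff 5G
```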
I would be very happy if it could get merged (and maybe tested by a wider audience).
I guess it is to prevent TOCTOU; I already raised this concern before.
I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.65. Thank you everyone for contributing - it is a great feature. @wizpresso-steve-cy-fan I did remove the locking. I think it is OK for an eventually consistent system like S3, which doesn't give the guarantees a file system does.
The memory usage used to increase by 2x the upload size until the unused MD5 hasher object was removed; now I guess it only increases by 1x object size. A quick poke through the gofakes3 code seems to show that on an S3 PUT they read the whole object into memory, which is probably why you're still seeing a lot of memory usage. This new capability will be great for downloading over S3, but pretty limited for uploading larger objects. https://github.com/johannesboyne/gofakes3/blob/master/backend/s3mem/backend.go#L224
@sychan - it is only multipart uploads which use a lot of extra memory. I tweaked the
What is the purpose of this change?
Implements an S3 server for rclone.
This is basically #6461 by @Mikubill, rebased on current master with the conflicts resolved.
Also I've removed the HTTP auth (I don't understand why it was needed) and added the fix by @oniumy to get S3 auth working (rclone/gofakes3#1).
Give Honor to Whom It Is Due:
Checklist