S3: Prefixes containing '//' cause rclone commands to fail / look for the wrong objects #5858

nkemnitz opened this issue Dec 4, 2021 · 11 comments


nkemnitz commented Dec 4, 2021

What is the problem you are having with rclone?

I have an object in an S3-compatible bucket whose key contains two consecutive forward slashes, e.g. s3remote://bucket/odd//name/indeed. Any rclone command that tries to read objects under that prefix, such as rclone lsf s3remote://bucket/odd//name/, fails with the error message:

2021/12/04 00:05:06 NOTICE: S3 bucket bucket path odd//name/: Odd name received "odd/name/indeed"

After some quick investigation, I am fairly confident it is just a setting that needs to be enabled on the S3 SDK client; see aws/aws-sdk-go#2559. I tried adding a simple .WithDisableRestProtocolURICleaning(true) to the code below, but it didn't change the behavior:

rclone/backend/s3/s3.go, lines 1806 to 1812 in 408d9f3:

awsConfig := aws.NewConfig().
	WithMaxRetries(ci.LowLevelRetries).
	WithCredentials(cred).
	WithHTTPClient(client).
	WithS3ForcePathStyle(opt.ForcePathStyle).
	WithS3UseAccelerate(opt.UseAccelerateEndpoint).
	WithS3UsEast1RegionalEndpoint(endpoints.RegionalS3UsEast1Endpoint)

I am not yet familiar with Go (or with the rclone code base), so I might have missed something extremely obvious. Any ideas? If it turns out to be an easy fix, I will try to get a PR working.
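
For reference, here is a sketch of the change I attempted (the surrounding names -- ci, cred, client, opt -- come from the rclone code quoted above; WithDisableRestProtocolURICleaning is the aws-sdk-go option from the linked issue):

// Same config chain as above, with REST-protocol URI cleaning disabled
// so that "//" sequences are preserved in request paths.
awsConfig := aws.NewConfig().
	WithMaxRetries(ci.LowLevelRetries).
	WithCredentials(cred).
	WithHTTPClient(client).
	WithS3ForcePathStyle(opt.ForcePathStyle).
	WithS3UseAccelerate(opt.UseAccelerateEndpoint).
	WithS3UsEast1RegionalEndpoint(endpoints.RegionalS3UsEast1Endpoint).
	WithDisableRestProtocolURICleaning(true)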

nkemnitz commented Dec 6, 2021

Minor addition: the change above works correctly, but it is not sufficient, because rclone itself also has a few places where it treats prefixes like filesystem paths and, in the process, destroys the prefix, e.g.:

rclone/backend/s3/s3.go, lines 1714 to 1719 in 408d9f3:

// split returns bucket and bucketPath from the rootRelativePath
// relative to f.root
func (f *Fs) split(rootRelativePath string) (bucketName, bucketPath string) {
	bucketName, bucketPath = bucket.Split(path.Join(f.root, rootRelativePath))
	return f.opt.Enc.FromStandardName(bucketName), f.opt.Enc.FromStandardPath(bucketPath)
}

path.Join(...) collapses the // into a single /. I changed it to a simple f.root + rootRelativePath for now, which allows me to correctly list the keys with prefix s3remote://bucket/odd//prefix/.
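
A minimal standalone sketch of the behavior (plain Go standard library, independent of rclone):

package main

import (
	"fmt"
	"path"
)

func main() {
	// path.Join runs path.Clean on its result, which collapses
	// consecutive slashes, so the "//" in the prefix is lost.
	fmt.Println(path.Join("odd//name", "indeed")) // prints "odd/name/indeed"

	// Plain string concatenation keeps the double slash intact.
	fmt.Println("odd//name" + "/" + "indeed") // prints "odd//name/indeed"
}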

Even with that change, there is still an issue if one wants to list only the keys with prefix s3remote://bucket/odd//, which should exclude e.g. s3remote://bucket/odd/less_odd_prefix/.

ncw commented Dec 6, 2021

This is a known issue - there are quite a few other issues about this!

The solution is to add an encoding which turns illegal paths into legal paths, i.e. maps object names starting with / to a Unicode equivalent and back again - see https://rclone.org/overview/#encoding for the current system.

Rclone expects paths to be valid file paths, so the encoding layer is the correct place to do this.
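
For illustration, the idea is roughly the following. This is a minimal sketch of the mapping only, not rclone's actual lib/encoder implementation; the function names are hypothetical, loosely echoing the FromStandardName/ToStandardName style used in the backend code:

package main

import (
	"fmt"
	"strings"
)

// U+FF0F FULLWIDTH SOLIDUS - the visually similar replacement that
// rclone's existing Slash encoding uses for "/" in names.
const fullwidthSlash = "／"

// encodeName maps an otherwise-illegal "/" inside a name to its
// Unicode lookalike on the way into rclone's path handling.
func encodeName(s string) string {
	return strings.ReplaceAll(s, "/", fullwidthSlash)
}

// decodeName reverses the mapping on the way back to the remote.
func decodeName(s string) string {
	return strings.ReplaceAll(s, fullwidthSlash, "/")
}

func main() {
	name := "odd//name" // cannot be represented as a normal file path
	enc := encodeName(name)
	fmt.Println(enc)             // odd／／name
	fmt.Println(decodeName(enc)) // odd//name
}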

nkemnitz commented Dec 6, 2021

Thanks - in the link you shared I can see / listed as a character that should be encoded by default. So if I understand correctly, the bug is that rclone is supposed to encode the two ASCII slashes // as their Unicode equivalents, and somehow that is not working?
Or do you mean it's not a bug, and I forgot to apply an option somewhere?

stekern commented Apr 15, 2022

I seem to have stumbled upon this behavior as well, and it was somewhat tricky to debug.

I'm using rclone to sync a local directory to S3, and I could not understand why rclone always re-uploaded every local file despite no changes being made.

I discovered that there was a double forward slash in the S3 prefix I uploaded to, and once I removed it, synchronization worked as expected -- that is, unchanged local files were skipped.

I think this behavior may be worth a mention in the documentation for the Amazon S3 backend, as it can silently lead to unnecessarily high bandwidth usage (and potentially storage charges if S3 versioning is enabled). I initially expected rclone either to handle a remote target containing such "unexpected" characters successfully, or to fail with an error or warning.

ncw commented Apr 15, 2022

@stekern do you mean you were writing rclone copy /blah s3://bucket/path or something else?

stekern commented Apr 15, 2022

@ncw I'm on macOS 12.3.1, using rclone 1.58.0, and the sync command is fairly vanilla: $ rclone sync "/absolute/path/to/example" "aws-s3:bucket-name/prefix//absolute/path/to/example".

ncw commented Apr 17, 2022

@stekern yes, this is the problem - I just wanted to check exactly what you were doing. This case doesn't work yet. It sounds like it could be worked around in the s3 backend, but for a proper fix I think we need to do the / encoding described above.

microyahoo commented:

I have hit the same issue - any progress on it? @ncw

yxmeco commented Mar 21, 2024

Has this problem been solved? I have encountered it as well.

I synchronize from remote A to remote B, and remote A contains some paths similar to /A/b/c.txt that get skipped during synchronization.

There is another question I hope to get help with: what do I need to do to force the directory structure to stay consistent when copying?

I checked

aws/aws-sdk-go#2559

https://github.com/zhucan/rclone/commits/v1.59.1-dev

but it seems that rclone never introduced this DisableRestProtocolURICleaning option.

yxmeco commented Mar 21, 2024

Does --local-encoding support something like a double slash? For a/b//c.txt I get:

ERROR : a/b/: Entry doesn't belong in directory "a/b" (same as directory) - ignoring

@ncw

nareshdh commented May 9, 2024

I am still getting

ERROR : folder/b/: Entry doesn't belong in directory "folder/b" (same as directory) - ignoring

because the object key in S3 has an extra '/' (folder/b//file). Has this problem been resolved? --local-encoding doesn't help.
