
Sync unnecessarily re-uploads some files to S3, when a marker filename contains a space #3799

Closed
markwainwright opened this issue Dec 10, 2019 · 3 comments


markwainwright commented Dec 10, 2019

What is the problem you are having with rclone?

Some files are being unnecessarily re-uploaded to S3 every time I run sync.

My understanding of the cause so far:

  • This happens when the NextMarker – the key of the last object returned in a single S3 ListObjects request – contains a space.
  • e.g. in the example below the object key a 1000.txt is encoded by AWS to a+1000.txt in the S3 ListObjects response XML.
  • When setting it to the next request's marker parameter, rclone encodes it again, so it becomes marker=a%2B1000.txt.
  • AWS decodes this to a+1000.txt rather than the expected a 1000.txt, so any subsequent objects that start with a and a space are omitted.
  • This makes rclone think they aren't on the remote, so it re-uploads them.
  • I've linked a log below (with request and response bodies) demonstrating this, as well as a minimal reproduction case.
  • By default this can only affect folders containing >1000 files, but it becomes less of an edge case when --fast-list is used, as it could affect any sync of >1000 files in total

What is your rclone version (output from rclone version)

v1.50.2-086-ga186284b-beta (also tested with v1.50.2)

Which OS you are using and how many bits (eg Windows 7, 64 bit)

macOS 10.14.6, 64-bit

Which cloud storage system are you using? (eg Google Drive)

AWS S3

The command you were trying to run (eg rclone copy /tmp remote:tmp)

This is my minimal reproduction case:

mkdir files
for i in {0001..1100}; do touch "files/a $i.txt"; done
rclone sync files/ "s3:<bucket name>" --config rclone.conf --use-server-modtime --update --log-level DEBUG --dump headers,bodies
rclone sync files/ "s3:<bucket name>" --config rclone.conf --use-server-modtime --update --log-level DEBUG --dump headers,bodies

rclone.conf:

[s3]
type = s3
provider = AWS
region = us-west-2
env_auth = true

On the second run (and all subsequent runs), the last 100 files are re-uploaded unnecessarily.

In my case, this was causing tens of GBs of photos to be re-uploaded every time I ran a sync.

A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)

Log from the second run: https://paste.ee/p/FQu37

ncw added a commit that referenced this issue Dec 11, 2019
Before this patch we were failing to URL-decode the NextMarker when
URL encoding was used for the listing.

The result of this was duplicated listing entries for directories
with >1000 entries where the NextMarker was a file containing a space.

ncw commented Dec 11, 2019

Excellent debugging :-) And thank you for the repro and log - both of which were very useful.

I managed to replicate this after setting the provider correctly in my config (yes, that is code for a 30-minute trip down a rabbit hole ;-)

This bug was introduced when we added URL encoding to the listings, to fix listings containing characters that aren't representable in XML (eg control characters).

Try this - it should fix it hopefully!

https://beta.rclone.org/branch/v1.50.2-085-g2dddcc52-fix-3799-s3-nextmarker-beta/ (uploaded in 15-30 mins)

@ncw ncw added this to the v1.51 milestone Dec 11, 2019
@markwainwright

@ncw Thanks for looking into this so quickly! I can confirm that your fix resolves the issue for me.

@ncw ncw closed this as completed in 0ecb8bc Dec 12, 2019

ncw commented Dec 12, 2019

Thanks for testing.

I've merged this to master now, which means it will be in the latest beta in 15-30 mins and released in v1.51.
