Some files are being unnecessarily re-uploaded to S3 every time I run sync.
My understanding of the cause so far:
This happens when the NextMarker – the key of the last object returned in a single S3 ListObjects request – contains a space.
e.g. in the example below, the object key a 1000.txt is encoded by AWS to a+1000.txt in the S3 ListObjects response XML.
When rclone sets it as the next request's marker parameter, it encodes the key again, so it becomes marker=a%2B1000.txt.
AWS decodes this to a+1000.txt rather than the expected a 1000.txt, so any subsequent objects whose keys start with a and a space are omitted from the listing.
This makes rclone think they aren't on the remote, so it re-uploads them.
I've linked a log below (with request and response bodies) demonstrating this, and a minimal reproduction case.
By default this can only affect folders containing >1000 files, but it becomes less of an edge case when --fast-list is used, as it could affect any sync of >1000 files in total.
What is your rclone version (output from rclone version)
v1.50.2-086-ga186284b-beta (also tested with v1.50.2)
Which OS you are using and how many bits (eg Windows 7, 64 bit)
macOS 10.14.6, 64-bit
Which cloud storage system are you using? (eg Google Drive)
AWS S3
The command you were trying to run (eg rclone copy /tmp remote:tmp)
Before this patch we were failing to URL decode the NextMarker when
URL encoding was used for the listing.
The result of this was duplicated listing entries for directories
with >1000 entries where the NextMarker was a file containing a space.
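A sketch of what the fix needs to do (the helper name below is hypothetical, not rclone's actual s3 backend code): when the listing was requested with URL encoding, NextMarker must be decoded once before it is reused as the next request's marker.

```go
package main

import (
	"fmt"
	"net/url"
)

// nextMarkerForRequest returns the marker to pass to the next ListObjects
// call, undoing the listing's URL encoding if it was in effect.
// (Illustrative helper -- not the actual rclone code.)
func nextMarkerForRequest(nextMarker string, urlEncoded bool) (string, error) {
	if !urlEncoded {
		return nextMarker, nil
	}
	return url.QueryUnescape(nextMarker)
}

func main() {
	m, err := nextMarkerForRequest("a+1000.txt", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(m) // a 1000.txt
}
```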
Excellent debugging :-) And thank you for the repro and log - both of which were very useful.
I managed to replicate this after setting the provider correctly in my config (yes that is code for a 30 minute trip down a rabbit hole ;-)
This bug was introduced when we added URL encoding to the listings to fix listings with non XML representable characters (eg control characters).
This is my minimal reproduction case:
rclone.conf:
On the second run (and all subsequent runs), the last 100 files are re-uploaded unnecessarily.
In my case, this was causing tens of GBs of photos to be re-uploaded every time I ran a sync.
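A hedged sketch of how such a reproduction might be set up (the issue's actual rclone.conf and file list are not shown on this page, so the names below are illustrative): create 1100 files whose sort order puts a key containing a space at the 1000-object page boundary, then sync twice.

```shell
# Illustrative reproduction sketch -- not the issue author's exact commands.
# With 1100 zero-padded names, the 1000th key in sorted order is
# "a 1000.txt", a key containing a space.
mkdir -p repro
for i in $(seq -w 1 1100); do touch "repro/a $i.txt"; done

# Then sync the same directory twice (remote name is a placeholder):
#   rclone sync repro remote:repro -vv   # first run uploads all 1100 files
#   rclone sync repro remote:repro -vv   # affected versions re-upload the last 100
```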
A log from the command with the -vv flag (eg output from rclone -vv copy /tmp remote:tmp)
Log from the second run: https://paste.ee/p/FQu37