Rclone failed to process files whose filename or path contains "‛" #6098

Open
wegood9 opened this issue Apr 11, 2022 · 3 comments


wegood9 commented Apr 11, 2022

The associated forum post URL from https://forum.rclone.org

What is the problem you are having with rclone?

What is your rclone version (output from rclone version)

rclone v1.58.0

Which OS you are using and how many bits (e.g. Windows 7, 64 bit)

debian 11.2 (64 bit)

Which cloud storage system are you using? (e.g. Google Drive)

SFTP and local filesystem

The command you were trying to run (e.g. rclone copy /tmp remote:tmp)

rclone copy /tmp/test /tmp/test2
rclone copy /tmp/test remote:test

A log from the command with the -vv flag (e.g. output from rclone -vv copy /tmp remote:tmp)

2022/04/11 12:14:16 DEBUG : rclone: Version "v1.58.0" starting with parameters ["rclone" "copy" "/tmp/test" "/tmp/test2" "-vvv"]
2022/04/11 12:14:16 DEBUG : Creating backend with remote "/tmp/test"
2022/04/11 12:14:16 DEBUG : Using config file from "/home/username/.config/rclone/rclone.conf"
2022/04/11 12:14:16 DEBUG : Creating backend with remote "/tmp/test2"
2022/04/11 12:14:16 DEBUG : Local file system at /tmp/test2: Waiting for checks to finish
2022/04/11 12:14:16 DEBUG : Local file system at /tmp/test2: Waiting for transfers to finish
2022/04/11 12:14:16 ERROR : test‛‛s: Failed to copy: failed to open source object: open /tmp/test/test‛‛s: no such file or directory
2022/04/11 12:14:16 DEBUG : tests: md5 = d41d8cd98f00b204e9800998ecf8427e OK
2022/04/11 12:14:16 INFO : tests: Copied (new)
2022/04/11 12:14:16 ERROR : Attempt 1/3 failed with 1 errors and: failed to open source object: open /tmp/test/test‛‛s: no such file or directory
2022/04/11 12:14:16 ERROR : test‛‛s: Failed to copy: failed to open source object: open /tmp/test/test‛‛s: no such file or directory
2022/04/11 12:14:16 DEBUG : Local file system at /tmp/test2: Waiting for checks to finish
2022/04/11 12:14:16 DEBUG : tests: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/04/11 12:14:16 DEBUG : tests: Unchanged skipping
2022/04/11 12:14:16 DEBUG : Local file system at /tmp/test2: Waiting for transfers to finish
2022/04/11 12:14:16 ERROR : Attempt 2/3 failed with 1 errors and: failed to open source object: open /tmp/test/test‛‛s: no such file or directory
2022/04/11 12:14:16 ERROR : test‛‛s: Failed to copy: failed to open source object: open /tmp/test/test‛‛s: no such file or directory
2022/04/11 12:14:16 DEBUG : tests: Size and modification time the same (differ by 0s, within tolerance 1ns)
2022/04/11 12:14:16 DEBUG : tests: Unchanged skipping
2022/04/11 12:14:16 DEBUG : Local file system at /tmp/test2: Waiting for checks to finish
2022/04/11 12:14:16 DEBUG : Local file system at /tmp/test2: Waiting for transfers to finish
2022/04/11 12:14:16 ERROR : Attempt 3/3 failed with 1 errors and: failed to open source object: open /tmp/test/test‛‛s: no such file or directory
2022/04/11 12:14:16 INFO :
Transferred: 0 B / 0 B, -, 0 B/s, ETA -
Errors: 1 (retrying may help)
Checks: 2 / 2, 100%
Transferred: 1 / 1, 100%
Elapsed time: 0.1s

2022/04/11 12:14:16 DEBUG : 4 go routines active
2022/04/11 12:14:16 Failed to copy: failed to open source object: open /tmp/test/test‛‛s: no such file or directory

Bug Description

If the source filename contains the special character ‛ (Unicode U+201B), rclone tries to process the file under a name with a doubled ‛, which of course doesn't exist.

In another scenario, I tried synchronizing files from a remote server to a local computer over SFTP. The local computer had only the directory structure, in which some folder names contain ‛. When copying, rclone showed notices like "Duplicate directory found in destination - ignoring", created those folders with a doubled "‛", and still treated them as non-existent when I ran rclone copy again after a successful transfer.

rclone dedupe also fails when run on two folders whose names contain a single ‛ and a doubled ‛ respectively. It prints "Can't have duplicate names here. Perhaps you wanted --by-hash ? Continuing anyway." and "error listing: directory not found".

Procedure to reproduce this bug:
mkdir /tmp/test && touch /tmp/test/test‛s && touch /tmp/test/tests
rclone copy /tmp/test /tmp/test2 -vv
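
To check whether an existing tree contains names that could trigger this, something like the following may help. It is a minimal Python sketch and not part of rclone; the function name `names_containing_quote` is made up for illustration.

```python
import os

QUOTE = "\u201b"  # ‛ (U+201B), the character this issue is about


def names_containing_quote(root):
    """Return sorted paths (relative to root) whose own name contains U+201B."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check directory names as well as file names, since both are affected.
        for name in dirnames + filenames:
            if QUOTE in name:
                full = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full, root))
    return sorted(hits)
```

For the reproduction above, `names_containing_quote("/tmp/test")` should list only the `test‛s` file and not `tests`.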

Member

ncw commented Apr 11, 2022

That Unicode character is used by rclone as part of its filename escaping routines - see https://rclone.org/overview/#restricted-filenames

For instance if you run rclone help flags local-encoding you'll get something like

  --local-encoding MultiEncoder   The encoding for the backend (default Slash,Dot)

Which means you can't store a file with a / in the name nor a file called . or .. on the local disk.

Let's take your example and see what is happening

$ mkdir /tmp/test && touch /tmp/test/test‛s && touch /tmp/test/tests
$ tree /tmp/test
/tmp/test
├── test‛s
└── tests

but

$ rclone tree /tmp/test
/
├── tests
└── test‛‛s

So what is happening here is that rclone is seeing there is a special quote character and doubling it up.

However, the reverse doesn't seem to be working.

So there is something up with the name encoding/decoding but I can't quite put my finger on what at the moment...

ncw added the bug label Apr 11, 2022
ncw added this to the v1.59 milestone Apr 11, 2022
Member

ncw commented Apr 12, 2022

The on-disk name is decoded from Local format to get a binary path, then re-encoded in rclone Standard format.

| Action | Result |
| --- | --- |
| On disk | test‛s |
| Decoded from Local | test‛s |
| Encoded as Standard | test‛‛s |

When an rclone name needs to be turned into an on disk name, it is decoded from Standard format then re-encoded in Local format.

| Action | Result |
| --- | --- |
| Rclone format | test‛‛s |
| Decoded from Standard | test‛s |
| Encoded as Local | test‛‛s |
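
The two conversions above can be sketched with a toy model. This is an illustration only, assuming the sole rule in play is the doubling of ‛; it is not rclone's encoder, and the names `toy_encode`, `toy_decode`, `disk_to_rclone` and `rclone_to_disk` are hypothetical.

```python
QUOTE = "\u201b"  # ‛ (U+201B), the escape character discussed here


def toy_encode(name: str) -> str:
    # Assumption: model only the doubling of ‛; real encodings do much more.
    return name.replace(QUOTE, QUOTE * 2)


def toy_decode(name: str) -> str:
    # Collapse each doubled ‛‛ to a single ‛; a lone ‛ passes through unchanged.
    return name.replace(QUOTE * 2, QUOTE)


def disk_to_rclone(on_disk: str) -> str:
    # Decode from the (toy) Local format, then encode as the (toy) Standard format.
    return toy_encode(toy_decode(on_disk))


def rclone_to_disk(rclone_name: str) -> str:
    # Decode from the (toy) Standard format, then encode as the (toy) Local format.
    return toy_encode(toy_decode(rclone_name))


on_disk = "test" + QUOTE + "s"          # the name the user created
rclone_name = disk_to_rclone(on_disk)   # gains a second ‛
looked_up = rclone_to_disk(rclone_name) # still has two ‛

print(looked_up == on_disk)  # False: rclone asks the disk for a name that is not there
```

In this model the name rclone later tries to open differs from the name on disk, which matches the "no such file or directory" error in the log.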

The point of the escape character was to make it so that characters which were to be encoded could be passed through unchanged.

So if Asterisk encoding was in effect we decode ＊ (the fullwidth asterisk, U+FF0A) to * and encode * to ＊. If there is a ＊ in the decoded form then we encode it as ‛＊.

The problem is that there are strings that don't round trip through a Decode -> Encode -> Decode cycle, whereas all strings can run through an Encode -> Decode -> Encode cycle unchanged.

This example uses Asterisk encoding as an illustration.

Local.Decode("test‛s") = "test‛s"
Local.Encode("test‛s") = "test‛‛s", OK false <-- this is the problem this issue is about

Local.Encode("test‛s") = "test‛‛s"
Local.Decode("test‛‛s") = "test‛s", OK true <-- Encode -> Decode -> Encode is fine

Local.Decode("star＊s") = "star*s"
Local.Encode("star*s") = "star＊s", OK true <-- this shows the normal case for * encoding

Local.Encode("star＊s") = "star‛＊s"
Local.Decode("star‛＊s") = "star＊s", OK true <-- Encode -> Decode -> Encode is fine

Local.Decode("star‛*s") = "star‛*s"
Local.Encode("star‛*s") = "star‛‛＊s", OK false <-- we can't have a * in the encoded state

Local.Encode("star‛*s") = "star‛‛＊s"
Local.Decode("star‛‛＊s") = "star‛*s", OK true <-- Encode -> Decode -> Encode is fine

Local.Decode("star‛＊s") = "star＊s"
Local.Encode("star＊s") = "star‛＊s", OK true <-- this shows how ‛ are supposed to be used

Local.Encode("star‛＊s") = "star‛‛‛＊s"
Local.Decode("star‛‛‛＊s") = "star‛＊s", OK true <-- Encode -> Decode -> Encode is fine
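
The asymmetry between the two directions can be checked mechanically. The following is a toy Python model, assuming only the Asterisk and quote rules described in this comment (* maps to ＊, U+FF0A, with ‛ as the escape). It is a sketch for illustration, not rclone's encoder, and `encode`, `decode` and `all_names` are hypothetical names.

```python
from itertools import product

QUOTE = "\u201b"               # ‛ (U+201B)
FULLWIDTH_ASTERISK = "\uff0a"  # ＊ (U+FF0A)


def encode(decoded: str) -> str:
    out = []
    for ch in decoded:
        if ch == QUOTE:
            out.append(QUOTE + QUOTE)               # escape the escape character
        elif ch == "*":
            out.append(FULLWIDTH_ASTERISK)          # * is not allowed when encoded
        elif ch == FULLWIDTH_ASTERISK:
            out.append(QUOTE + FULLWIDTH_ASTERISK)  # a literal ＊ gets quoted
        else:
            out.append(ch)
    return "".join(out)


def decode(encoded: str) -> str:
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        nxt = encoded[i + 1] if i + 1 < len(encoded) else ""
        if ch == QUOTE and nxt in (QUOTE, FULLWIDTH_ASTERISK):
            out.append(nxt)  # quoted character passes through literally
            i += 2
        elif ch == FULLWIDTH_ASTERISK:
            out.append("*")
            i += 1
        else:
            out.append(ch)   # includes a lone ‛ and a raw *
            i += 1
    return "".join(out)


ALPHABET = "s*" + FULLWIDTH_ASTERISK + QUOTE


def all_names(max_len):
    """Yield every string of length 1..max_len over the small test alphabet."""
    for n in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)


# Encode then Decode restores every name in the sample.
assert all(decode(encode(s)) == s for s in all_names(3))

# Decode then Encode changes some names: these are the problematic on-disk names.
unstable = [s for s in all_names(3) if encode(decode(s)) != s]
```

In this model `unstable` contains, among others, the single-character names ‛ and *, which correspond to the "OK false" cases listed above.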

When rclone creates all the file names, this doesn't matter, because rclone always creates files with Encoded names, which can always be decoded. However, when the user creates file names directly on the local disk or the remote cloud storage system, some names can't be Decoded uniquely, which causes this problem.

This explains what is causing the problem. I need to think more on how to fix it.

Member

ncw commented Apr 15, 2022

The fundamental flaw here is that there are Encoded states which can't be uniquely Decoded.

The rationale for adding the ‛ character was to make sure there are no Decoded states which can't be uniquely Encoded. However, that is the wrong way round: we don't ever care about Decoded states, as we always present Encoded states to the user, whether on the cloud storage, on the local disk, or on the command line (in Standard format). We need all Encoded states to be uniquely decodable.

So as a thought experiment, what if we switched the role of ‛ so that when Decoding we use it to escape characters we can't represent?

Using the Asterisk example above, this is what currently happens, which shows the failed round trip:

| Encoded | Decoded | Encoded |
| --- | --- | --- |
| star＊s | star*s | star＊s |
| star*s | star*s | star＊s |
| star‛s | star‛s | star‛‛s |

However with the sense of the quote encoding reversed, it does the right thing

| Encoded | Decoded | Encoded |
| --- | --- | --- |
| star＊s | star*s | star＊s |
| star*s | star‛*s | star*s |
| star‛s | star‛‛s | star‛s |
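
One way to read the reversed scheme is sketched below in Python. This is an interpretation of the thought experiment under the same toy Asterisk rules, not the prototype mentioned here and not rclone code; `decode_reversed` and `encode_reversed` are hypothetical names.

```python
from itertools import product

QUOTE = "\u201b"               # ‛ (U+201B)
FULLWIDTH_ASTERISK = "\uff0a"  # ＊ (U+FF0A)


def decode_reversed(encoded: str) -> str:
    # Assumption: quoting now happens while Decoding, so every encoded
    # (on-disk) name maps to a distinct decoded name.
    out = []
    for ch in encoded:
        if ch == FULLWIDTH_ASTERISK:
            out.append("*")         # the normal Asterisk mapping
        elif ch in ("*", QUOTE):
            out.append(QUOTE + ch)  # escape what would otherwise be ambiguous
        else:
            out.append(ch)
    return "".join(out)


def encode_reversed(decoded: str) -> str:
    out = []
    i = 0
    while i < len(decoded):
        ch = decoded[i]
        nxt = decoded[i + 1] if i + 1 < len(decoded) else ""
        if ch == QUOTE and nxt in ("*", QUOTE):
            out.append(nxt)  # drop the escape, keep the literal character
            i += 2
        elif ch == "*":
            out.append(FULLWIDTH_ASTERISK)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Every on-disk name over a small alphabet survives Decode then Encode.
ALPHABET = "s*" + FULLWIDTH_ASTERISK + QUOTE
for n in range(1, 4):
    for chars in product(ALPHABET, repeat=n):
        name = "".join(chars)
        assert encode_reversed(decode_reversed(name)) == name
```

Under these assumptions the three rows of the second table round trip, including the `star‛s` case that fails with the current behaviour.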

When I get time, I'll make a prototype of this.
