The design of encoding for restricted characters is counter-intuitive #7456

URenko · 2023-11-26T11:35:51Z

The associated forum post URL from `https://forum.rclone.org`

https://forum.rclone.org/t/cant-access-file-with-in-the-name/43068

What is the problem you are having with rclone?

Can't access a file with ‛ (U+201B, SINGLE HIGH-REVERSED-9 QUOTATION MARK, the character used to escape restricted characters) in its name.

What is your rclone version (output from `rclone version`)

rclone v1.64.2
- os/version: debian 12.2 (64 bit)
- os/kernel: 6.1.0-13-amd64 (x86_64)
- os/type: linux
- os/arch: amd64
- go/version: go1.21.3
- go/linking: static
- go/tags: none

Which OS you are using and how many bits (e.g. Windows 7, 64 bit)

Linux, 64bit

Which cloud storage system are you using? (e.g. Google Drive)

Local and OneDrive, among others. Here I use local as the example.

The command you were trying to run (e.g. `rclone copy /tmp remote:tmp`)

echo 'here is the content' > '‛'
rclone cat ‛
rclone cat ‛‛
rclone cat ‛‛‛
rclone cat ‛‛‛‛

A log from the command with the `-vv` flag (e.g. output from `rclone -vv copy /tmp remote:tmp`)

rclone cat ‛ -vv
<7>DEBUG : rclone: Version "v1.64.2" starting with parameters ["rclone" "cat" "‛" "-vv"]
<7>DEBUG : rclone: systemd logging support activated
<7>DEBUG : Creating backend with remote "‛"
<7>DEBUG : Using config file from "/home/<myusername>/.config/rclone/rclone.conf"
<7>DEBUG : fs cache: renaming cache item "‛" to be canonical "/tmp/test/‛‛"
<3>ERROR : : error listing: directory not found
<7>DEBUG : 4 go routines active
Failed to cat with 2 errors: last error was: directory not found

After reading the code, my understanding is as follows:

For user input, rclone treats the path as already encoded with the encoding called "Standard", which is EncodeZero | EncodeSlash | EncodeCtl | EncodeDel | EncodeDot.
rclone then decodes it and re-encodes it with the backend's own encoding (FromStandardPath), and uses the resulting path to access the backend.
encoding = None, which corresponds to EncodeZero, does NOT mean no encoding. It actually means NUL (0x00) is encoded as ␀. Therefore, if we follow the design faithfully, there is no way to tell rclone not to encode at all.
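To make the point about None concrete, here is a minimal Go sketch of what this issue reports the "None" (EncodeZero) encoding doing to a name. It is a hypothetical toy, not rclone's lib/encoder code: the function encodeNone is made up for illustration, and escaping a literal ␀ as ‛␀ is an assumption about the quoting convention.

```go
package main

import (
	"fmt"
	"strings"
)

// quote is U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK, the escape rune.
const quote = '‛'

// encodeNone is a hypothetical toy model of what this issue reports the
// "None" (EncodeZero) encoding doing. It is NOT rclone's implementation.
func encodeNone(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch r {
		case 0:
			b.WriteRune('␀') // NUL (0x00) -> U+2400 SYMBOL FOR NULL
		case quote:
			b.WriteString("‛‛") // the escape rune itself is doubled
		case '␀':
			b.WriteString("‛␀") // assumed: a literal replacement rune is escaped
		default:
			b.WriteRune(r) // everything else passes through
		}
	}
	return b.String()
}

func main() {
	for _, name := range []string{"plain.txt", "a\x00b", "‛", "␀"} {
		fmt.Printf("%q -> %q\n", name, encodeNone(name))
	}
}
```

In this model only ordinary names pass through unchanged: NUL becomes ␀ and a lone ‛ becomes ‛‛, which is why "None" cannot be read as "encoding disabled".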

However, FromStandardPath contains a short-circuit:

If the target encoding is equal to "Standard", FromStandardPath does nothing and simply returns the input path.

Therefore, one workaround for my problem is to use "Standard" as the encoding for the backend:

$ rclone cat ‛ --local-encoding None,Slash,Ctl,Del,Dot
here is the content

I think this issue reflects that the current design needs improvement, as currently:

None actually does NUL (0x00) → ␀ and ‛ → ‛‛
None,Slash,Ctl,Del,Dot behaves as if encoding were disabled

which is counter-intuitive.
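The contrast between these two settings can be reproduced with a small Go model of the decode/re-encode flow described above, including the short-circuit in FromStandardPath. Everything here is hypothetical and heavily simplified: the Flags type, the functions and the toy Standard set (which only models NUL and DEL, not Slash, Ctl or Dot) are made up for illustration, and doubling ‛ on encode follows the behaviour reported in this issue rather than rclone's source.

```go
package main

import (
	"fmt"
	"strings"
)

// Flags is a hypothetical, simplified encoding bitmask (NOT rclone's
// encoder.MultiEncoder). Only NUL, DEL and the escape rune are modelled.
type Flags uint

const (
	EncodeZero Flags = 0                      // "None": NUL is still encoded
	EncodeDel  Flags = 1                      // DEL (0x7F) <-> ␡ (U+2421)
	Standard         = EncodeZero | EncodeDel // toy stand-in for "Standard"
)

// quote is U+201B, the rune used to escape other runes.
const quote = '‛'

// isReplacement reports whether r is a replacement rune under f.
func isReplacement(f Flags, r rune) bool {
	return r == '␀' || (f&EncodeDel != 0 && r == '␡')
}

// encode replaces restricted runes, doubles the quote rune and escapes
// literal replacement runes with the quote.
func encode(f Flags, s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r == quote:
			b.WriteString("‛‛")
		case isReplacement(f, r):
			b.WriteRune(quote)
			b.WriteRune(r)
		case r == 0:
			b.WriteRune('␀')
		case f&EncodeDel != 0 && r == 0x7F:
			b.WriteRune('␡')
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// decode reverses encode; a lone quote that escapes nothing is kept as-is.
func decode(f Flags, s string) string {
	rs := []rune(s)
	var b strings.Builder
	for i := 0; i < len(rs); i++ {
		r := rs[i]
		switch {
		case r == quote && i+1 < len(rs) && (rs[i+1] == quote || isReplacement(f, rs[i+1])):
			b.WriteRune(rs[i+1]) // take the escaped rune literally
			i++
		case r == '␀':
			b.WriteRune(0)
		case f&EncodeDel != 0 && r == '␡':
			b.WriteRune(0x7F)
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// fromStandardPath treats p as Standard-encoded user input and re-encodes
// it for the backend, except that a Standard target short-circuits.
func fromStandardPath(target Flags, p string) string {
	if target == Standard {
		return p // the short-circuit: no decode, no re-encode
	}
	return encode(target, decode(Standard, p))
}

func main() {
	for _, in := range []string{"‛", "‛‛", "‛‛‛", "‛‛‛‛"} {
		fmt.Printf("input %q: None -> %q, Standard -> %q\n",
			in, fromStandardPath(EncodeZero, in), fromStandardPath(Standard, in))
	}
}
```

In this model, with a None target the inputs ‛, ‛‛, ‛‛‛ and ‛‛‛‛ map to ‛‛ or ‛‛‛‛, so a backend file literally named ‛ can never be addressed, while a Standard target returns the input unchanged and reaches it directly, mirroring the `--local-encoding None,Slash,Ctl,Del,Dot` workaround.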



ncw · 2023-11-27T11:16:10Z

I think the handling of ‛ is wrong in the current design - see #6098 for an explanation.

I think this issue reflects that the current design needs improvement, as currently:

None actually does NUL (0x00) → ␀ and ‛ → ‛‛

None,Slash,Ctl,Del,Dot behaves as if encoding were disabled

That is interesting and probably explains some of the confusion this topic generates. Some backends use the Standard encoding directly, so they have been skipping some encoding, whereas others don't.

PS: If I were doing the character encodings from scratch again, I wouldn't choose the wide letters, as these are widely used in CJK languages, which I didn't know at the time I chose them.

ncw added the bug and encoding labels Nov 27, 2023

pitsi mentioned this issue Dec 11, 2023: Rclone 1.65 fails to list the contents of a specific folder which contains an onenote file #7499 (Closed)

ncw mentioned this issue Apr 15, 2024: Files on Windows vfs lost when file names contain CJK punctuations #7760 (Open)

URenko mentioned this issue Apr 22, 2024: fix issues related to encoding #7791 (Open)

URenko mentioned this issue May 6, 2024: Windows: Destination Encoding characters are improperly escaped #7824 (Open)
