-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/fs: Encode filenames containing reserved characters (fixes #1734) #7876
Conversation
b9b0db3
to
5cec9fa
Compare
4bb4f36
to
cbd4e10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just read some comments and thought briefly about the overall thing, on the assumption that you might want this merged at some point.
lib/fs/encoder.go
Outdated
// * For example, if the user selects the fat encoder, saves | ||
// some encoded filenames, and then changes the encoder back to | ||
// standard. On ext4 filesystems, this could lead to filenames being | ||
// written both with their original names (using the standard | ||
// encoder), and their encoded names (using the fat encoder). | ||
// | ||
// That's why the standard encoder must ignore encoded filenames when | ||
// found on disk, to avoid "seeing" both the original filename, and | ||
// its encoded equivalent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that file names with the magic unicode characters in them, which are synced using passthrough
or in previous versions of Syncthing, will be deleted when switching to standard
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing. I think you're right, the passthrough
encoder needs to be the default. I need to rethink this whole concept. Edit: The passthrough
encoder is now the default.
// to "see" these encoded filenames, they need to select an encoder, | ||
// such as the fat encoder, that will decode these filenames, or | ||
// configure all connected devices to use the passthrough encoder. | ||
names = f.encoder.filter(names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this result in an inconsistent state accessible through a filesystem interface, as in a file might be filtered here, but I can stat, delete, ... it? I'd expect the same treatments for names on all filesystem methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. Presumably, Syncthing wouldn't "see" the files to know to stat or delete them, as the directory list and walk functions ignore these files on disk. But perhaps it found them in the database, from a previous scan. If we want all fs functions to skip over these files (i.e., treat them as non-existent) I will need to add logic to do so. I think this logic could live in fs.rooted(), which can return a not found error. This is a good idea. I'll need to add some tests for this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm about to push these changes to this PR. Unfortunately, I recreated the add/fs-encoder branch off of main, and will force-push these changes. I hope this is acceptable. My apologies if not.
Why would that be the case? If a single node enables fat encoding, it should still receive and send the unencoded names that include the reserved characters over the wire, so from the perspective of other nodes nothing changes. Right? And why would you want to enable the fat encoder on a non-fat linux node? |
As an alternative to the proposed encoding here, Samba has to solve the same problem. There a recommended (as in: used as the primary example) replacement strategy is to replace the reserved characters with rare unicode characters that look similar:
This has the advantage that the characters are still recognizable when viewed with another app, unlike the private use area replacements. reference: https://www.samba.org/samba/docs/current/man-html/vfs_catia.8.html |
To quote Shakespear: That way madness lies |
The Samba guys apparently disagree. Of course there is a trade between simplicity and having everything configurable, and if there was a solution that would just work for everyone, simplicity would be the way to go. An option that would be simpler on the configuration would be to have another encoder that encodes to Samba's visually similar characters. |
If a node uses the Why? Because all nodes default to the Why? Because we want to remain backwards compatible, and we need to avoid creating both encoded filenames (
You wouldn't. You would enable the |
We could add easily add this encoding logic to this PR, but the characters But using this encoding doesn't solve the problem of existing files with these encoded characters in their filenames. Syncthing will see these filenames as encoded, and will then decode them, leading to out-of-sync errors. As they say, the devil's in the details. |
As far as I can tell, escape characters can only end up on non-updated nodes if a node first enables the I haven't read through all the changes of this PR in detail, but in my understanding it should work as follows: Say node A is a linux system that allows all characters in filenames, and node B is a windows system that doesn't allow colons, among others. Currently node A sends file Only if then node B switches back to the
That risk always exists, also when using PUA unicode characters. They are 'private use area', so any app can use them however it wants to. There's bound to be someone somewhere trying to sync files that use those PUA characters. Which is why this encoder logic should be opt-in and allow the user to make a choice. Although as far as I understand Syncthing, it won't lead to out-of-sync errors, Syncthing will just assume that the decoded name is the real name and sync that. But I agree such an encoder could be added as an alternative to the |
I agree, but since how Syncthing works now is effectively running as the One way to solve this issue is: on startup, the If on that first scan, encoded filenames are found, the folder is paused, and the GUI would report the number of files found, and maybe the first 10 filenames, or so, and asks the user to either:
If the user makes a selection, the EncoderSelected flag is set to true, and the folder is unpaused. When EncoderSelected is set, the filenames are not scanned on future startups. If EncoderSelected is false, and the The EncoderSelected flag can be set/reset via folder options. For CLI users, we add an Lastly, If this makes sense, and there is interest, I would consider adding this logic to this PR. But I would prefer this PR be accepted first, before I start on that effort. |
Hm. I really don't like pausing folders because some condition was detected during an upgrade. The likelihood of discovering that situation is very low, as upgrades usually happen without user interaction and without any notice. It causes trouble for someone relying on the synchronization and noticing a paused folder only after some data is missing, e.g. when used as part of a backup strategy. |
@acolomb It doesn't need to pause the folder. It could simply default to the |
### Purpose Firefox uses the last specified favicon link for bookmarks, but only if it is available on initial page load. Remove the second link and use ng-href to change the icon instead. I'm not really familiar with AngularJS, feel free to offer suggestions for improvements. ### Testing Briefly tested on Firefox 124.0.2 and Chrome 123.0.6312.105.
### Purpose In preparation for quic-go v0.43.0. see quic-go/quic-go#4455
This sounds way too complicated and magical for the quite niche risk it tries to prevent. Note that unwanted encoded filenames can only spread after manual changes (changing to the fat encoder) and then downgrading or switching back the encoder. And we are not looking at data loss or data corruption, only at duplicated files or conflicts*, which a simple script could detect and fix. (Documentation could include such a script, in case people are in need of it.) * at least, based on my limited knowledge of how Syncthing works. If this could result in data loss the situation is different. If we are discussing other ways to implement this feature, let me make a different proposal: Let there be two settings:
Selecting an encoder requires the relevant disallowed file name characters to also be checked. If an option to disallow characters is selected, Syncthing will reject incoming files containing those characters and ignore them if it finds them on its disk (unless they are used by an active encoder). In both cases a message will be shown in the GUI and the folder will show as 'out of sync'. The documentation might encourage users to verify everything still syncs correctly after changing one of these settings. Or maybe that is superfluous, changing folder settings always risks breaking the sync. For existing folders we default to no encoder and no characters disallowed. For new folders we could default to disallowing the \xf000 range, in order to make it easier for users to enable the fat encoder. Users could just change the settings if they wanted something different. Some considerations:
If the |
@JanKanis I really appreciate your feedback, and you have some great ideas that I haven't considered before. Perhaps we move this discussion to a wiki page, where we come up with a design document, and see if it gets any traction. If it does, I am willing to scrap this PR, and steal its best bits on a new one. |
@rasa Sounds fine to me. While I am a developer, I don't know anything about Syncthing development procedures. I just ended up at this PR because I want to sync files from linux to android that contain reserved characters. I just had a look at the wiki and it looks like design work happens there, so go ahead. |
#9528) ### Purpose Adds a new metric `syncthing_connections_active` which equals to the amount of active connections per device. Fixes #9527 <!-- Describe the purpose of this change. If there is an existing issue that is resolved by this pull request, ensure that the commit subject is on the form `Some short description (fixes #1234)` where 1234 is the issue number. --> ### Testing I've manually tested it by running syncthing with these changes locally and examining the returned metrics from `/metrics`. I've done the following things: - Connect & disconnect a device - Increase & decrease the number of connections and verify that the value of the metric matches with the amount displayed in the GUI. ### Documentation https://github.com/syncthing/docs/blob/main/includes/metrics-list.rst needs to be regenerated with [find-metrics.go](https://github.com/syncthing/docs/blob/main/_script/find-metrics/find-metrics.go) ## Authorship Your name and email will be added automatically to the AUTHORS file based on the commit metadata. --------- Co-authored-by: Jakob Borg <jakob@kastelo.net>
Based on user request from Weblate, user @Scrambled777. Looks promising based on the profile: https://hosted.weblate.org/user/Scrambled777/
…to fs/add-encoder
@JanKanis Sorry for the long delay. I finally got the initial draft posted to #9539 . Let me know your thoughts. I know it's different from what you described, but I think it captures the essence of your ideas: only two encoders, and allowing the FAT encoder to re-encode if needed (though using a different encoding scheme, as I thought long and hard about your idea that "For new folders we could default to disallowing the \xf000 range". But in the end, I think if a user wants this functionality they can add If you and others give me the go ahead, I will close this PR, and start a new one, since the logic is going to be much simpler. Thanks again for the excellent feedback! /cc @calmh @AudriusButkevicius @imsodin @bt90 @tomasz1986 |
@rasa Nice writeup. I would like to make some comments, but I can't edit the wiki due to not being a member of the Syncthing github organisation. How about you open up the wiki for public editing? |
@JanKanis I don't know if I have the power to do that, or if the gods would look kind on me if I did. So I moved it to #9539 instead. If you'd rather I move it to a Google Doc, so you can feedback inline, lemme know. Thanks again for your help flushing this out. |
Purpose
Many commonly used filesystems do not allow certain characters or names in file and directory names. For example, the NTFS, exFAT, FAT32 and reFS filesystems do not allow the characters
/\"*:<>?|
in the file name or directory name component of a path. On Windows. reserved filenames such asCON
andNul.txt
, and filenames ending with a period, or space are also not allowed. Syncthing currently rejects these files causing out-of-sync errors.Windows Subsystem for Linux (WSL), Cygwin, Msys2, git-bash, Linux's CIFS driver, and other similar systems solve this issue by encoding these characters, to allow filenames containing these characters to be saved to disk. They do this by replacing the reserved characters with UNICODE characters in the private use area (\uf000-\uf0ff). This conversion system was first developed in Interix over 25 years ago.
This PR adds an advanced folder option titled "Encoder Type" which has three options:
passthrough
,fat
, andstandard
.passthrough
encoder sends and receives encoded filenames without decoding them. This is how Syncthing currently works, and is the default option.fat
option encodes as described above.standard
option does not encode, it only decodes. While thestandard
encoder doesn't encode, it ignores all other encoded filenames, in case a user sets the encoder type, and then sets it back tostandard
.Here's a table of the different encoders:
passthrough
fat
"*:<>?|
,\x01-\x1f
standard
Windows reserved filenames
If this PR is accepted, we may want to address Windows reserved filenames such as
CON
andNul.txt
, and filenames ending with a period, or space, via two new encoders:windows
"*:<>?|
, trailing spaces and periods,\x01-\x1f
con
encoded asco\uf06e
)Reserved filenames are incompatible with WSL/Cygwin/git-bash/Msysw, etc, as they are encoded, whereas WSL saves reserved filenames as is.
wsl
"*:<>?|
,\x01-\x1f
con
saved ascon
via\\?\
prefix)On Windows, reserved filenames cannot easily be opened in File Explorer, CMD, etc. (The user would need to prefix the full path with
\\?\
).Unencoded characters
None of the encoders encode NUL (
\x00
). While some filesystems, such as HFS+ allow file and directory names to contain slash (/
) characters, none of the encoders in this PR encode this character. On Windows, this PR does not encode file and directory names containing backslashes (\
). These unencoded characters will result in out-of-sync errors on systems that don't allow these characters.Testing
All the tests pass.
go vet
is clean.go lint
is clean.go fmt
produces no output.I have tested it with Windows, Linux, Android 11, macOS and iOS clients pushing changes to a Windows client. Directory listings showed the correct filenames in WSL, Cygwin, Msys2, and git-bash environments. I also created filenames containing reserved characters in WSL, Cygwin, Msys2 and git-bash, and these transferred successfully to Windows, Linux, Android 11, macOS, and iOS clients.
Documentation
If this PR is accepted, I will draft a PR to update the documentation.
Testing
To test this PR, simply switch to this branch on all nodes. The
passthrough
encoder is enabled by default.To test the other encoders, all nodes need to be set to either
standard
orfat
. Typically, unix like systems will usestandard
and Windows (NTFS), and FAT-based systems such as Android, will usefat
.While
standard
can't be used on FAT/NTFS-based systems without errors,fat
can be used on any system, but it makes little sense, if the underlying file system is not FAT/NTFS-based.Authorship
My name and email are already in the AUTHORS file.
Links
For reference, see:
https://cygwin.com/cygwin-ug-net/using-specialnames.html
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx
https://en.wikipedia.org/wiki/Filename#In_Windows
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
For implementations, see
https://github.com/mirror/newlib-cygwin/blob/fb01286fab9b370c86323f84a46285cfbebfe4ff/winsup/cygwin/path.cc#L435
https://github.com/billziss-gh/winfsp/blob/6e3a8f70b2bd958960012447544d492fc6a2f1af/src/shared/ku/posix.c#L1250
https://github.com/torvalds/linux/blob/master/fs/cifs/cifs_unicode.h#L27