lib/fs: Add encoder filesystem (fixes #1734) #7828
I have a few questions regarding the behaviour of such files in Windows. I have not tested the actual PR, so some of them may be a non-issue.
@tomasz1986 All good questions. Thanks for asking.
I renamed to and from filenames with reserved characters on each platform, and saw no problems, other than on my Android client, which allows spaces and dots at the end of names but, for some reason, doesn't like the reserved characters on its root file system. Moving files works as expected, with the above proviso. Deleting works as expected.
Yes. The only issue is that the 0xf0xx characters display funny in CMD, and File Explorer. Of course, if you installed a font that populated these 0xf0xx characters with glyphs, this display issue would be solved.
Yes, I've only tested on NTFS so far. According to https://en.m.wikipedia.org/wiki/Comparison_of_file_systems#Limits this should work on ReFS. It's not clear if it will work on exFAT, FAT32, or other filesystems. I will test exFAT and FAT32 (on Windows) and report back. If it does work on exFAT, it would be trivial to enable this fix on any platform that can mount exFAT partitions. BTW, I haven't tested on macOS, but will soon. |
What happens if someone syncs I suspect this will cause data loss, hence any form of translation is lossy. Also, I'd expect (have not gone through the code in detail) that this causes ping pong, namely, I sync |
That would be an issue, if
You're right. This PR should be fixed to flag received files containing 0xf0xx characters as invalid. This could easily happen if someone turns on the "Auto Fix Invalid Files" switch, and then turns it off after syncing files. These 0xf0xx characters would then be synced as is, causing duplicate files to appear, but no data loss that I can see (as they really are the same file, in a sense).
I haven't seen that yet, but it's certainly possible. I will report any issues found. For now, let's consider this PR a draft, until I test on exFAT, FAT32, macOS, and address syncing files with 0xf0xx characters issue noted above. Thanks for the feedback. |
This is an interesting PR. It has some promise. It seems like names with private-space characters must be rejected on the disk side when this option is disabled (scanning error), and similarly rejected on the network side when the option is enabled (as other invalid file names are today), as these can't be represented on the local filesystem any more. This might mean we can no longer sync files with such names in a regular setup, something that might annoy existing users.
This might in fact be to everyone's advantage, as the user sees it, thinks "what the hell?" and fixes the filename to something portable. :) The display might be even worse if the Windows box isn't running UTF-8 at all. (I also think the naming leaves something to be desired, but that can be discussed once it's technically sound.) |
OK, so I tested it, and it works fine in Windows on exFAT and FAT32 partitions. It also works fine on macOS, on an APFS filesystem. The only issue (other than my Android not creating files with I will now update this PR to flag received files that contain 0xf0xx characters as invalid. We should decide if we'll want to allow users to use the Private Use characters 0xf000-0xf07f for some other use. If so, we could add another config option to enable this (which would toggle off the "Auto Fix Invalid Files" config option).
Why do we need to reject items with private unicode chars when scanning without this option enabled? If it shares such items with a device using this option, that other device will just reject them.
@imsodin :
The problem comes when a user enables this option, shares some files, and then disables the option. Here's an example:
The quick fix is for device A to simply ignore all files containing \uf0xx characters when the option is disabled. This is effectively the old functionality, where the filesystem simply failed writing "B>A", leading to an out of sync error. Of course, the duplicate file issue can still appear if the user downgrades the client on device A.
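The quick fix described above amounts to a check for private-use characters in a name. The following is a minimal sketch of such a check, not code from this PR; the function name `hasEncodedRune` is hypothetical, and the range is the \uf000-\uf07f range discussed in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// hasEncodedRune is a hypothetical helper (not from the PR). It reports
// whether name contains any rune in the private use range \uf000-\uf07f
// that the encoder would use for reserved characters.
func hasEncodedRune(name string) bool {
	return strings.IndexFunc(name, func(r rune) bool {
		return r >= 0xf000 && r <= 0xf07f
	}) >= 0
}

func main() {
	// "B\uf03eA" is what "B>A" would look like on disk after encoding,
	// assuming '>' (0x3e) is mapped to U+F03E.
	for _, name := range []string{"B>A", "B\uf03eA", "plain.txt"} {
		fmt.Printf("%+q ignore=%v\n", name, hasEncodedRune(name))
	}
}
```

With the option disabled, a device could skip any name for which this check returns true, which mirrors the old behaviour of simply failing to store such a file.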
After seeing how the different filesystems handle filenames, how does this sound as a plan:
In the UI, we show:
And an input field that accepts:
The default option would be The The |
A NewEncoderFilesystem ensures that paths that contain characters that are reserved on filesystems such as NTFS can be safely stored. It does this by replacing the reserved characters with Unicode characters in the private use area (\uf000-\uf07f). This conversion is compatible with Cygwin, Git-Bash, Msys2, Windows Subsystem for Linux (WSL), and other platforms.
For reference, see:
https://cygwin.com/cygwin-ug-net/using-specialnames.html
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx
https://en.wikipedia.org/wiki/Filename#In_Windows
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
For implementations, see
https://github.com/mirror/newlib-cygwin/blob/fb01286fab9b370c86323f84a46285cfbebfe4ff/winsup/cygwin/path.cc#L435
https://github.com/billziss-gh/winfsp/blob/6e3a8f70b2bd958960012447544d492fc6a2f1af/src/shared/ku/posix.c#L1250
Signed-off-by: Ross Smith II <ross@smithii.com>
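To illustrate the replacement described in the commit message, here is a minimal sketch in Go. It is not the PR's implementation: the names `encodeName`, `decodeName`, `reserved`, and `privateBase` are hypothetical, and the mapping (adding 0xf000 to the ASCII code point) is assumed from the Cygwin-style convention the message refers to.

```go
package main

import (
	"fmt"
	"strings"
)

// reserved holds the characters the PR description lists as reserved on
// NTFS/exFAT-style filesystems.
const reserved = `"*:<>?|`

// privateBase is the start of the private use range \uf000-\uf07f.
// Offsetting the ASCII code point by this value is an assumption here.
const privateBase = 0xf000

// encodeName replaces each reserved character with its private-use twin.
func encodeName(name string) string {
	return strings.Map(func(r rune) rune {
		if strings.ContainsRune(reserved, r) {
			return privateBase + r
		}
		return r
	}, name)
}

// decodeName maps private-use twins of reserved characters back to ASCII
// and leaves every other rune untouched.
func decodeName(name string) string {
	return strings.Map(func(r rune) rune {
		if r >= privateBase && r <= privateBase+0x7f {
			if c := r - privateBase; strings.ContainsRune(reserved, c) {
				return c
			}
		}
		return r
	}, name)
}

func main() {
	enc := encodeName(`a:b?.txt`)
	fmt.Printf("%+q\n", enc)     // escaped form of the encoded name
	fmt.Println(decodeName(enc)) // prints a:b?.txt
}
```

Note that a name which already contains one of these private-use characters decodes to a different name than it started with, which is the lossy case raised earlier in the thread.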
I would mark this PR as draft, if I could, as it needs a lot more testing, though it would be safe to deploy as it defaults to the current functionality, unless the user changes an advanced setting. The code works running on Windows, and syncing to and fro with Windows, Linux, Android, macOS, and iOS clients. But I am seeing errors when I restart Syncthing. It seems some code paths use a BasicFilesystem, instead of an EncoderFilesystem. I will delve into this, and see if I can pinpoint the issues. Also, I need to create tests for the android, plan9, ios, and safe encoders. I also need to rewrite the windows encoder test to pass on non-Windows systems, as users can mount exFAT partitions on any OS. And I need to test it on other OSs, and especially the All feedback is appreciated.
Wow, you are really moving ahead quickly :) That's also why
is lagging behind. There's a lot to unpack in the implementation and while I think I get it conceptually it still needs thinking about the consequences of the behaviour. It adds complexity both to the implementation and user scenarios. Currently things are simple: We do whatever works, and have a few conditions where we say "this doesn't work" if the result isn't desirable (mostly windows being windows :) ). I do think the mechanism quite nicely compartmentalizes all the complexity. My recommendation: Don't spend too much time on polish and testing all the corner cases until "concept review" happened, i.e. there's a general consensus that the way chosen here is good to go.
I think this is slowly getting out of hand, because something that was a few tens of lines has now got to nearly 2k lines. As suggested, you should wait for feedback before investing more time, just in case you are investing your time into something that is going the wrong way. |
I have significant concerns with this. It has expanded somewhat in scope since I looked. Some current thoughts, not necessarily things that are broken with the current approach but that need addressing one way or the other at some point.
So there's a lot to unpack here. Given that this affects interoperability between systems in various ways the testing matrix quickly becomes an n-dimensional beast that we'll never be sure of. I think for this to have a chance to live it needs to be short, sweet, and unambiguously correct in all the simple cases we can dream up. That means starting with the simplest thing that could possibly work. Lots of different encoders and autodetection and stuff is probably the antithesis of that. |
Thank you all for the valuable feedback. Let's close this PR, as it was always just a proof-of-concept/draft PR. If I can develop something that actually works, that is worthy of review, and that incorporates the feedback above, I will create another PR. To respond to @calmh's feedback:
The system where the functionality was turned off will then share the encoded files, and receiving systems that have more accommodating filesystems, such as ext4, will see duplicate files. This can be addressed by having the basicFS reject encoded files, which is effectively how things work now: files that can't be stored on the underlying filesystem are simply rejected.
The various encoders address the limitations imposed by the filesystem and the OS. For example, on the Android OS, on an exFAT/FAT32 filesystem, files with
That won't happen, as the filename decoder is shared across all encoders:
Sorry, I can't answer that. See next comment.
That's a good question. Forgive me for asking a follow up: If a user turns on encryption, will an external tool see the original unencoded filename? If not, then does it make sense for external wrappers/tools to be looking at our file store? If so, how is file-content encryption any different than file-name encoding? In a sense, this PR simply "encrypts" the filenames to hide their names from legacy-hobbled OSes, using an industry standard filename-encoding scheme (Microsoft in WSL, etc).
Wow, that certainly sounds ominous. In response, I would say that Interix solved the interoperability issue of Windows failing to host Linux files 25+ years ago using this encoding hack, so it's a tried and tested solution, that numerous systems have since implemented, such as WSL. Implementing this logic in Syncthing will simply add it to the growing list of environments that support this hack.
Ok, stay tuned. Thanks to all again for your critical, yet non-discouraging, feedback. |
See #7876 |
Superseded by #7876
Purpose
A NewEncoderFilesystem ensures that paths that contain characters (such as "*:<>?|) that are reserved on certain filesystems (NTFS, exFAT, etc.) can be safely stored. It does this by transparently replacing the reserved characters with Unicode characters in the private use area (\uf000-\uf07f). This conversion is compatible with Cygwin, Git-Bash, Msys2, Windows Subsystem for Linux (WSL), and other platforms.
Testing
All the tests pass. go vet is clean. go lint is clean. go fmt produces no output.
I have tested it with Linux, Android, macOS and iOS clients pushing changes to a Windows client. Directory listings showed the correct filenames in WSL, Cygwin, Msys2, and git-bash environments. I also created filenames containing reserved characters in WSL, Cygwin, Msys2 and git-bash, and these transferred successfully to Linux, Android, macOS, and iOS clients.
On my Android fuse filesystem, files with <>:"|?* are rejected, but files ending in a dot or a space are accepted.
Documentation
If this PR is accepted, I will submit a PR to update the documentation.
Authorship
My name and email are already in the AUTHORS file.
Links
For reference, see:
https://cygwin.com/cygwin-ug-net/using-specialnames.html
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx
https://en.wikipedia.org/wiki/Filename#In_Windows
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file
For implementations, see
https://github.com/mirror/newlib-cygwin/blob/fb01286fab9b370c86323f84a46285cfbebfe4ff/winsup/cygwin/path.cc#L435
https://github.com/billziss-gh/winfsp/blob/6e3a8f70b2bd958960012447544d492fc6a2f1af/src/shared/ku/posix.c#L1250