Check file names for conflicts on Windows #3810
var data syscall.Win32finddata
handle, err := syscall.FindFirstFile(pathp, &data)
if err == syscall.ERROR_FILE_NOT_FOUND {
	return true
Leaks handle
If err != nil, no valid handle is returned.
path = filepath.Join(path, part)
pathp, err := syscall.UTF16PtrFromString(path)
if err != nil {
	return false
It'd be nice to understand under what circumstances this fails, and potentially bubble the error up if the failure is not an actual conflict.
Does this really avoid the issue, though? Shouldn't normalization also happen in the scanner on the Windows side, so that we don't get these entries in the index from there to start with?
This is not doing any normalization, it just checks if Windows resolves the file names as we expect. (e.g. don't open
The scanner enumerates the folder contents with FindFirstFile and FindNextFile, so the returned file names are already safe.
I think so. For good measure I also added the checks to
I'd argue that we should expand the paths and then do the checks if it's supposed to be read or not.
You mean in E.g. the file E.g. the file
I am perfectly fine missing look-alike short paths.
In reality, this appears as a change in the parent directory containing both (and even if it doesn't, the inotify implementations aggregate changes into a common parent), so (almost?) all inotify implementations will send a change to the directory containing both FOOB~1 and foobar.
This can't happen.
I don't see what the problem is with scanning the parent when we are in doubt. We will end up with both folders in the database until it gets cleared on the next full scan. If you want I can also move the checks into the scanner and just handle conflicting paths as non-existing. It's the same outcome but less expensive as we don't rescan the whole parent.
I only tested this on Linux, it might be different on Windows. If I rename the folder
It doesn't aggregate two changes in one directory.
Right, that example is wrong.
I moved the checks into the scanner; it now just skips subs that are conflicting or traverse a symlink. This solution is much simpler. I also added a check that I previously missed: conflicting paths are now correctly marked as deleted in the database.
I now also tested it on Windows and the behavior is the same as on Linux.
What happens when I rename a directory
The folder FOO and its content get added to the database. The database entry foo and its "children" get updated as deleted. Without this PR the folder FOO and its content get added to the database, but foo doesn't get updated, because Syncthing tests the existence with Lstat("foo") but Windows really does Lstat("FOO"). With this PR it behaves as expected.
So the only thing that bugs me about this PR is the fact that we just add another case where we fail.
If you have the folders foo and FOO on a Linux machine, you can't sync both to Windows. I don't see a way around this; you just can't have both folders on Windows. The current behavior is that it mixes the folders foo and FOO together on Windows (at least when Syncthing thinks that foo and FOO exist) and then syncs the mess back to Linux. With this PR it only syncs FOO (or whatever folder already exists) to Windows and shows an error for the other.
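To illustrate which entries of a case-sensitive listing cannot coexist on Windows, here is a minimal sketch. The helper name caseConflicts is hypothetical and not part of this PR, and strings.ToLower is only an approximation of how NTFS compares names.

```go
package main

import (
	"sort"
	"strings"
)

// caseConflicts groups names that differ only in casing and returns every
// group with more than one member. Hypothetical helper for illustration.
// Assumption: strings.ToLower approximates the file system's case folding;
// NTFS uses its own upcase table, so edge cases may differ.
func caseConflicts(names []string) [][]string {
	groups := make(map[string][]string)
	for _, n := range names {
		key := strings.ToLower(n)
		groups[key] = append(groups[key], n)
	}
	var out [][]string
	for _, g := range groups {
		if len(g) > 1 {
			sort.Strings(g) // stable order inside a group
			out = append(out, g)
		}
	}
	// Stable order across groups, so the result does not depend on map iteration.
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}
```

For a listing containing foo, FOO and bar, this reports a single group holding FOO and foo; only one of the two can be synced to a case-insensitive file system.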
This problem is solved by the same solution as the casing problem. Important: There is a problem with updating on Windows. Suppose someone changed the casing of a folder on one Windows machine (e.g. renamed foo to Foo). As I described above, Syncthing now thinks that both foo and Foo exist. On other synced Windows machines the folder is still called foo, but Syncthing also thinks that Foo and foo exist. With this PR Syncthing correctly detects that foo doesn't exist on the first machine and that Foo doesn't exist on the other machines. The result is that the first machine sends an update where foo is deleted and the others send an update where Foo is deleted.
How about a couple of tests on the check conflict stuff? Only running on Windows is fine for now, although we should ideally do the same on Mac in the long term.
I think it's more that we previously had a number of cases where we fail, and we've reduced that set. Yes there are still failure cases, but they're a subset of what we had before. That's got to be a good thing, right? (Someone tell me to shut up if I'm wrong).
If this actually solves the case-only rename case on Windows and fails more cleanly in some other screwed up scenarios, that's good enough for me. I still want tests though.
@st-review lgtm It would be sweet if someone else who understands Windows and is sober would review this as well before merging.
@calmh: Noted! Need another LGTM or explicit merge command.
@st-review merge lib/model: Handle filename conflicts on Windows.
GitHub-Pull-Request: #3810 LGTM: calmh
Have you read that part?
I think it's not any worse than it is now; case-only renames on Windows didn't work previously.
But it will delete the files. If we remove the conflicting entries from the local index or invalidate the whole database, it will only show a sync conflict.
Well, this hasn't been released yet, so you can write a db migration.
Won't all this break spectacularly if there are two Windows machines currently syncing, with one having a tree rooted in
That's exactly what I warned about.
If we wipe the index or remove conflicting entries, it will be less spectacular. No files will be deleted, but it will show a sync conflict until the user manually renames the folder on one machine. Another way (without requiring user action) would be to search the database for conflicting entries and decide on one in a deterministic manner (e.g. lexicographic order).
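A minimal sketch of the deterministic variant, assuming byte-wise lexicographic order as the tie-breaker and strings.ToLower as the case folding. The helper resolveConflicts is hypothetical; the real change would have to operate on database entries in lib/db rather than on a slice of names.

```go
package main

import (
	"sort"
	"strings"
)

// resolveConflicts keeps, for each set of names that differ only in casing,
// the lexicographically smallest one and reports the others as dropped.
// Hypothetical helper for illustration only. Every device that runs it on
// the same set of names arrives at the same decision.
func resolveConflicts(names []string) (keep, drop []string) {
	sorted := append([]string(nil), names...) // do not modify the caller's slice
	sort.Strings(sorted)
	seen := make(map[string]bool)
	for _, n := range sorted {
		key := strings.ToLower(n) // assumption: approximates the file system's folding
		if seen[key] {
			drop = append(drop, n)
			continue
		}
		seen[key] = true
		keep = append(keep, n)
	}
	return keep, drop
}
```

Because uppercase letters sort before lowercase ones in byte order, FOO wins over Foo and foo under this rule. Whether that is the name the user wants is a separate question, which is the argument for showing a conflict instead.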
Right. I missed that consequence, sorry. I'll revert & reopen this as that's dangerous as it is, and merge with the solution in place.
What solution do you like more? Wiping the index and showing conflicts (and letting the user decide) or automatically deciding for one of the conflicting names (e.g.
Database migrations are handled a bit half-heartedly. There is a config version, which gets updated by migrations, see lib/config/config.go. Other code can trigger on the version change, typically in cmd/syncthing/main.go; there is an example there that drops delta indexes based on migrating from config version 15. The actual change, whatever it is, would need to live in lib/db.
As for what to do... Wiping the index is rather unsafe, especially if the cluster is out of sync at the time. For a cluster of Windows devices, the proper action seems to me like it would be to resolve the setup into a consistent casing; that is, correct on-disk casing to in-index casing and we're back in sync. If we have
When there are case sensitive machines involved as well this is less straightforward. I'm not sure there is an automated solution that properly covers it. Of course, there is no way to tell if a cluster is composed of only case sensitive devices, only case insensitive devices, or a mix. This brings me back to thinking that we should only support case insensitive file handling by default, meaning that there could not exist both
Can the case insensitivity be made a configuration flag? Otherwise I would be concerned about this possibly breaking Unix systems.
I've added an implementation of CheckNameConflict and tests for Unix. If Syncthing runs on Linux and syncs a folder that lives on FAT or NTFS, paths are case-insensitive and short names exist. The file system on Android is also mounted case-insensitive.
Access to the folder is required, otherwise checking for name conflicts is not possible. We need to keep track of the migration status for each folder, because a folder might be unhealthy at the moment and can only be migrated later.
This doesn't solve short names, this PR or something similar is required anyway. Migrating will also be the same because someone might already have short names in the database, which need to be cleaned up in order to prevent data loss. (Although that problem is easier to ignore.)
Protect against scanning of short names on Windows. (maybe caused syncthing#3800) Protect against the possibility of traversing Symlinks.
This reverts commit bf3da33.
Prevent scanner from following Symlinks and scanning the contents of colliding paths on Windows (e.g. scanning "foo" for the sub path "Foo")
Make CheckNameConflict system-independent and move system-dependent code into function FindRealFileName.
Prevent deletion of files with conflicting names in the cluster
I've added a version number for each folder to the database to keep track of the individual update status. The folder gets updated before the first scan when the folder is healthy. Another problem is that we have to filter conflicting files from indexes that are sent to previous versions of Syncthing.
To be continued, I guess
Was the problem fixed in another manner?
The problem is still present. The proper solution is what is so far lacking, I think.
Purpose
On Windows different paths can point to the same file. This is the case for paths with different casing (e.g. foo and FOO) or short names (e.g. foo.barbaz and FOO~1.BAR).
This causes unexpected complications when syncing with non-Windows systems and can lead to inadvertently deleted files.
Furthermore it causes security issues.
External tools like syncthing-inotify can wreak havoc by passing conflicting names to Syncthing.
This PR adds checks to avoid these conflicts.