Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check file names for conflicts on Windows #3810

Closed
wants to merge 13 commits into from

Conversation

Unrud
Copy link
Contributor

@Unrud Unrud commented Dec 17, 2016

Purpose

On Windows different paths can point to the same file. This is the case for paths with different casing (e.g. foo and FOO) or short names (e.g. foo.barbaz and FOO~1.BAR).
This causes unexpected complications when syncing with non-Windows systems and can lead to inadvertently deleted files.
Furthermore it causes security issues.

External tools like syncthing-inotiify can wreak havoc by passing conflicting names to syncthing.

This PR adds checks to avoid this conflicts.

var data syscall.Win32finddata
handle, err := syscall.FindFirstFile(pathp, &data)
if err == syscall.ERROR_FILE_NOT_FOUND {
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaks handle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If err != nil no valid handle is returned

path = filepath.Join(path, part)
pathp, err := syscall.UTF16PtrFromString(path)
if err != nil {
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to understand under what circumstances this fails, and potentially bubble the error up if the failure is not an actual conflict.

@calmh
Copy link
Member

calmh commented Dec 17, 2016

Does this really avoid the issue though? Should not normalization also happen in the scanner on the Windows side so that we don't get these entries in the index to start with from there?

@Unrud
Copy link
Contributor Author

Unrud commented Dec 17, 2016

Should not normalization also happen in the scanner on the Windows side

This is not doing any normalization, it just checks if Windows resolves the file names as we expect. (e.g. don't open foo/bar when the remote site requested FOO/bar or .stignore when an atacker requests STIGNO~1)

so that we don't get these entries in the index to start with from there?

The scanner enumerates the folder contents with FindFirstFile and FindNextFile, the returned file names are already safe.
I guess we could check if the sub folder paths that are passed to internalScanFolderSubdirs are conflict-free. But I don't think that this is a big issue.

@Unrud
Copy link
Contributor Author

Unrud commented Dec 17, 2016

Does this really avoid the issue though?

I think so. For good measure I also added the checks to internalScanFolderSubdirs just in case an external tool like syncthing-inotify does something strange.

@AudriusButkevicius
Copy link
Member

AudriusButkevicius commented Dec 17, 2016

I'd argue that we should expand the paths and then do the checks if its supposed to be read or not.

@Unrud
Copy link
Contributor Author

Unrud commented Dec 18, 2016

I'd argue that we should expand the paths and then do the checks if its supposed to be read or not.

You mean in internalScanFolderSubdirs? It's a very rare case that an external tool sends a short path or wrong casing (otherwise there would be a lot of bug reports like #3800) and we can't just expand the paths from external tools.

E.g. the file FOOB~1 (a long path that looks like a short path) gets renamed to foobar . Inotify sends the sub paths FOOB~1 and foobar. We expand FOOB~1 to foobar. We missed the deletion of FOOB~1.

E.g. the file FOOB~1 (a long path that looks like a short path) and the file foobar exist and FOOB~1 gets deleted. Inotify sends the sub path FOOB~1. We expand FOOB~1 to foobar. We scan foobar (no harm down) but miss the deletion of FOOB~1.

@AudriusButkevicius
Copy link
Member

I am perfectly fine missing look alike short paths.

@canton7
Copy link
Member

canton7 commented Dec 18, 2016

E.g. the file FOOB~1 (a long path that looks like a short path) gets renamed to foobar . Inotify sends the sub paths FOOB~1 and foobar.

In reality, this appears as a change in the parent directory containing both (and even if it doesn't, the inotify implementations aggregate changes into a common parent), so (almost?) all inotify implementations will send a change to the directory containing both FOOB~1 and foobar.

E.g. the file FOOB~1 (a long path that looks like a short path) and the file foobar exist and FOOB~1 gets deleted. Inotify sends the sub path FOOB~1. We expand FOOB~1 to foobar.

This can't happen.

C:\Users\Antony\tmp>mkdir FOOBAR~1

C:\Users\Antony\tmp>mkdir foobarbaz

C:\Users\Antony\tmp>dir /x
 Volume in drive C has no label.
 Volume Serial Number is 6C48-4D99

 Directory of C:\Users\Antony\tmp

18/12/2016  12.04 pm    <DIR>                       .
18/12/2016  12.04 pm    <DIR>                       ..
18/12/2016  12.04 pm    <DIR>          FOOBAR~2     foobarbaz
18/12/2016  12.03 pm    <DIR>                       FOOBAR~1
               0 File(s)              0 bytes
               4 Dir(s)  26,276,069,376 bytes free

C:\Users\Antony\tmp>rmdir FOOBAR~1

C:\Users\Antony\tmp>dir /x
 Volume in drive C has no label.
 Volume Serial Number is 6C48-4D99

 Directory of C:\Users\Antony\tmp

18/12/2016  12.04 pm    <DIR>                       .
18/12/2016  12.04 pm    <DIR>                       ..
18/12/2016  12.04 pm    <DIR>          FOOBAR~2     foobarbaz
               0 File(s)              0 bytes
               3 Dir(s)  26,276,069,376 bytes free

@Unrud
Copy link
Contributor Author

Unrud commented Dec 18, 2016

I am perfectly fine missing look alike short paths.

I don't see what the problem is with scanning the parent when we are in doubt. We will end up with both folders in the database until it gets cleared on the next full scan.

If you want I can also move the checks into the scanner and just handle conflicting paths as non-existing. It's the same outcome but less expensive as we don't rescan the whole parent.

In reality, this appears as a change in the parent directory containing both

I only tested this on Linux, it might be different on Windows. If I rename the folder foo to bar, the sub paths [bar foo] get passed to the scanner. Anyway that is really not something we should rely on.

and even if it doesn't, the inotify implementations aggregate changes into a common parent

It doesn't aggregate two changes in one directory.

This can't happen.

Right, that example is wrong.

@Unrud
Copy link
Contributor Author

Unrud commented Dec 21, 2016

I am perfectly fine missing look alike short paths.

I moved the checks into the scanner, it now just skips subs that are conflicting or traverse a Symlink. This solution is much simpler.

I also added a check that I previously missed, conflicting paths are no correctly marked as deleted in the database.

If I rename the folder foo to bar, the sub paths [bar foo] get passed to the scanner.

I now also tested it on Windows and the behavior is the same as on Linux

@calmh
Copy link
Member

calmh commented Dec 21, 2016

What happens when I rename a directory foo to FOO on Windows with this in place?

@Unrud
Copy link
Contributor Author

Unrud commented Dec 21, 2016

The folder FOO and it's content gets added to the database. The database entry foo and its "children" get updated as deleted.

Without this PR the folder FOO and it's content gets added to the database, but foo doesn't get updated, because Syncting tests the existence with Lstat("foo") but Windows really does Lstat("FOO").
I tested this between two Windows machines. The initial state is, that the folder foo exists on both machines. After renaming foo to FOO on one mashine, the global index contains 2 folders and foo doesn't get renamed on the second machine. If you also throw Linux into the mix, it gets the folders FOO and foo.

With this PR it behaves as expected.

@AudriusButkevicius
Copy link
Member

So the only thing that bugs me about this PR is the fact that we just add another case where we fail..
I'd rather we did some hack to work around the issue, and sacrifice something (such as not being able to sync 8.3 look-a-like folders) for the sake of keeping things in sync.

@Unrud
Copy link
Contributor Author

Unrud commented Dec 21, 2016

So the only thing that bugs me about this PR is the fact that we just add another case where we fail..

If you have the folders foo and FOO on a Linux machine, you can't sync both to Windows. I don't see a way around this, you just can't have both folders on Windows.

The current behavior is that it mixes the folders foo and FOO together on Windows (at least when Syncthing thinks that foo and FOO exist) and then syncs the mess back to Linux. With this PR it only syncs FOO (or whatever folder already exists) to Windows and shows a error for the other.

such as not being able to sync 8.3 look-a-like folders

This problem is solved by the same solution as the casing problem.
According to the Windows documentation short names don't follow a consistent scheme across file systems. Meaning you can't reliably detect/filter them beforehand.

Important: There is a problem with updating on Windows. If someone changed the casing of a folder on one Windows machine (e.g. renamed foo to Foo). As I described above Syncthing now thinks that foo and Foo exist. On other synced Windows machines the folder is still called foo but Syncthing also thinks that Foo and foo exist. With this PR Syncthing correctly detects that foo doesn't exist on the first machine and that Foo doesn't exist on the other machines. The result of this is that the first machine sends an update where foo is deleted and the others send an update where Foo is deleted.
The local database needs to be invalidated on Windows and the folders need to be fully re-scanned.
Edit: It would also be enough to scan the local index for conflicting entries (that are not marked as deleted) and remove them.

@calmh
Copy link
Member

calmh commented Dec 21, 2016

How about a couple of tests on the check conflict stuff? Only running on Windows is fine for now, although we should ideally do the same on Mac in the long term.

@canton7
Copy link
Member

canton7 commented Dec 21, 2016

So the only thing that bugs me about this PR is the fact that we just add another case where we fail..

I think it's more that we previously had a number of cases where we fail, and we've reduced that set. Yes there are still failure cases, but they're a subset of what we had before.

That's got to be a good thing, right? (Someone tell me to shut up if I'm wrong).

@calmh
Copy link
Member

calmh commented Dec 21, 2016

If this actually solves the case only rename case on Windows and fails more cleanly in some other screwed up scenarios that's good enough for me. I still want tests though.

@calmh
Copy link
Member

calmh commented Dec 22, 2016

@st-review lgtm

It would be sweet if someone else who understands windows and is sober would review this as well before merging.

@st-review
Copy link

@calmh: Noted! Need another LGTM or explicit merge command.

@AudriusButkevicius
Copy link
Member

@st-review merge

lib/model: Handle filename conflicts on Windows.

@st-review
Copy link

👌 Merged as 01e50eb. Thanks, @Unrud!

@st-review st-review closed this Dec 22, 2016
st-review pushed a commit that referenced this pull request Dec 22, 2016
@Unrud
Copy link
Contributor Author

Unrud commented Dec 23, 2016

Important: There is a problem with updating on Windows.

Have you read that part?

@AudriusButkevicius
Copy link
Member

I think its not any worse than it is now, case only renames on windows didn't work previously.

@Unrud
Copy link
Contributor Author

Unrud commented Dec 23, 2016

I think its not any worse than it is now, case only renames on windows didn't work previously.

But it will delete the files. If we remove the conflicting entries from the local index or invalidate the whole database, it will only show a sync conflict.
Anyway, I just wanted to make sure that you didn't miss my warning.

@AudriusButkevicius
Copy link
Member

Well this hasn't been released yet, so you can write a db migration

@calmh
Copy link
Member

calmh commented Dec 23, 2016

Won't all this break spectacularly if there are two Windows machines currently syncing, with one having a tree rooted in Foo and the other having the same tree rooted in foo? They would currently mostly be happily syncing, as you'd expect since Windows is case insensitive. With this in place that's no longer possible, regardless of migrations or index wipes?

@Unrud
Copy link
Contributor Author

Unrud commented Dec 23, 2016

Won't all this break spectacularly if there are two Windows machines currently syncing, with one having a tree rooted in Foo and the other having the same tree rooted in foo?

That's exactly what I warned about.

regardless of migrations or index wipes?

If the wipe the index or remove conflicting entries, it will be less spectacular. No files will be deleted, but it will show a sync conflict until the user manually renamed the folder on one machine.

Another (without requiring user action) way would be to search the database for conflicting entries and decide for one in a deterministic manner (e.g. lexicographic order).
E.g. we found foo, Foo and FOO are conflicting and decide for FOO.
Next we check if the folder/file foo or Foo exist and rename it to FOO.
All machines are now in the same state and syncing works fine.

@calmh
Copy link
Member

calmh commented Dec 23, 2016

Right. I missed that consequence, sorry. I'll revert & reopen this as that's dangerous as it is, and merge with the solution in place.

@calmh calmh reopened this Dec 23, 2016
@Unrud
Copy link
Contributor Author

Unrud commented Dec 23, 2016

merge with the solution in place.

What solution do you like more? Wiping the index and showing conflicts (and letting the user decide) or automatically deciding for one of the conflicting names (e.g. Foo or foo).
Is there a place for database migrations?

@calmh
Copy link
Member

calmh commented Dec 26, 2016

Database migrations are handled a bit half-heartedly. There is a config version, which gets updated by migrations, see lib/config/config.go. Other code can trigger on the version change, typically in cmd/syncthing/main.go, there is an example there that drops delta indexes based on migrating from config version 15 that you can see as an example. The actual change, whatever it is, would need to live in lib/db.

As for what to do... Wiping the index is rather unsafe, especially if the cluster is out sync at the time.

For a cluster of Windows devices, the proper action seems to me like it would be to resolve the setup into a consistent casing; that is, correct on-disk casing to in-index casing and we're back in sync. If we have Foo, Foo/bar and foo/baz we'd correct the directory name to Foo whatever it was before, Foo/bar is already OK and foo/baz would need to be updated in the index as Foo/baz.

When there are case sensitive machines involved as well this is less straightforward. I'm not sure there is an automated solution that properly covers it. Of course, there is no way to tell if a cluster is composed of only case sensitive devices, only case insensitive devices, or a mix.

This brings me back to thinking that we should only support case insensitive file handling by default, meaning that there could not exist both Foo/baz and foo/baz in the index. (#2739)

@Smiley73
Copy link
Contributor

Can the case insensitivity be made a configuration flag? Otherwise I would be concerned about this possibly breaking Unix systems.

@Unrud
Copy link
Contributor Author

Unrud commented Dec 27, 2016

I've added an implementation of CheckNameConflict and tests for Unix. If Syncthing runs on Linux and syncs a folder that lives on FAT or NTFS paths are case-insensitive and short names exist. The file system on Android is also mounted case-insensitive.

Database migrations are handled a bit half-heartedly.

Access to the folder is required, otherwise checking for name conflicts is not possible. We need to keep track of the migration status for each folder, because a folder might be unhealthy at the moment and can only be migrated later.

This brings me back to thinking that we should only support case insensitive file handling by default,

This doesn't solve short names, this PR or something similar is required anyway. Migrating will also be the same because someone might already have short names in the database, which need to be cleaned up in order to prevent data loss. (Although that problem is easier to ignore.)

Unrud added 13 commits January 19, 2017 12:44
Protect against scanning of short names on Windows. (maybe caused syncthing#3800)
Protect against the possibility of traversing Symlinks.
Prevent scanner from following Symlinks and scanning the contents of colliding paths on Windows (e.g. scanning "foo" for the sub path "Foo")
Make CheckNameConflict system-independent and move system-dependent code into function FindRealFileName.
Prevent deletion of files with conflicting names in the cluster
@Unrud
Copy link
Contributor Author

Unrud commented Jan 19, 2017

I've added a version number for each folder to the database to keep track of the individual update status. The folder gets updated before the first scan when the folder is healthy.
As for now I just marked files that don't exist but have conflicting names as invalid to prevent the deletion of files in the cluster.
Automatic resolution of the conflicts will be added later, although I'm not sure if it's a good idea to just select one of the conflicting casings. The user renamed the files for a reason and might not be very happy if we randomly undo this changes.

Another problem is that we have to filter conflicting files from indexes that are sent to previous versions of syncthing.

@calmh
Copy link
Member

calmh commented Jun 25, 2017

To be continued, I guess

@calmh calmh closed this Jun 25, 2017
@Unrud
Copy link
Contributor Author

Unrud commented Jun 25, 2017

To be continued, I guess

Was the problem fixed in another manner?

@calmh
Copy link
Member

calmh commented Jun 25, 2017

The problem is still present. The proper solution is what is so far lacking, I think.

@st-review st-review added the pr-merged Legacy label used in the past for pull requests merged to the main tree label Jan 15, 2018
@st-review st-review added the frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion label Jul 15, 2018
@syncthing syncthing locked and limited conversation to collaborators Jul 15, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Issues closed and untouched for a long time, together with being locked for discussion pr-merged Legacy label used in the past for pull requests merged to the main tree
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants