Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block paths which might be misinterpreted as alternate file streams #686

Closed

Conversation

latkin
Copy link

@latkin latkin commented Mar 3, 2016

ref #679

Windows disallows the colon : character in file names. However many
win32 file APIs allow path specifications of the form <file path>:<stuff>
when reading or writing files. These are interpreted as pointing to
the alternate data stream named <stuff> within the <file path> file.

Documentation on alternate data streams:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364404(v=vs.85).aspx

Git for Windows, ignorant of file streams, will incorrectly map a Unix
file named like foo:bar into the bar alternate stream of a file
foo. This results in an unexpected file foo with size 0 in the working
tree, and (depending on core.fscache setting), the expected "foo:bar" file
being flagged as deleted (or maybe not).

It would be preferrable if Git for Windows detected such files and issued
errors, similar to how it does for various other invalid path situations.
This would help reduce pain and make things less confusing for those
working in a mixed Unix/Windows team.

This change adds a check for ':' so that we never accidentally unpack a file
into an alternate stream by accident. Any file path with a ':' is considered
invalid, which is perfectly sensible for the purposes of git.

If such a file is indeed detected and blocked, users can instruct git to
totally ignore it via git update-index --assume-unchanged, just like
they need to today for other invalid path situations.

NB - a determined Windows user can still confuse the system in certain ways
by explicitly creating alternate streams, but that requires exceptional
user effort and is judged to be not worth pursuing at this time.

Signed-off-by: Lincoln Atkinson lincoln.atkinson@hotmail.com

ref git-for-windows#679

Windows disallows the colon `:` character in file names. However many
win32 file APIs allow path specifications of the form `<file path>:<stuff>`
when reading or writing files. These are interpreted as pointing to
the *alternate data stream* named `<stuff>` within the `<file path>` file.

Documentation on alternate data streams:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364404(v=vs.85).aspx

Git for Windows, ignorant of file streams, will incorrectly map a Unix
file named like `foo:bar` into the `bar` alternate stream of a file
`foo`. This results in an unexpected file `foo` with size 0 in the working
tree, and (depending on core.fscache setting), the expected "foo:bar" file
being flagged as deleted (or maybe not).

It would be preferrable if Git for Windows detected such files and issued
errors, similar to how it does for various other invalid path situations.
This would help reduce pain and make things less confusing for those
working in a mixed Unix/Windows team.

This change adds a check for ':' so that we never accidentally unpack a file
into an alternate stream by accident. Any file path with a ':' is considered
invalid, which is perfectly sensible for the purposes of git.

If such a file is indeed detected and blocked, users can instruct git to
totally ignore it via `git update-index --assume-unchanged`, just like
they need to today for other invalid path situations.

NB - a determined Windows user can still confuse the system in certain ways
by explicitly creating alternate streams, but that requires exceptional
user effort and is judged to be not worth pursuing at this time.

Signed-off-by: Lincoln Atkinson <lincoln.atkinson@hotmail.com>
@latkin latkin force-pushed the latkin-block-alternate-streams branch from 4e1a55d to c4a42f2 Compare March 4, 2016 02:31
@kblees
Copy link

kblees commented Mar 4, 2016

I wonder if it would make sense to refuse other special characters as well. E.g. on Unix it is perfectly valid to create a file with a backslash...I haven't checked how Git for Windows deals with that.

Edit: sorry, is_dir_sep already checks for that...I had in mind that the backslash-check was in Git for Windows only...

Other special characters would include glob chars (?*[]), $PATH separators (:;), control chars (ASCII 0 - 31) and perhaps shell operators (<>|&$...).

And maybe make the feature configurable to be more acceptable to upstream folks?

@latkin
Copy link
Author

latkin commented Mar 4, 2016

@kblees this would not be wrong to do, but it's not necessary. Eventually these paths are passed through mingw to Windows file APIs, which can be relied on to return errors if the paths are totally bogus/contain illegal chars.

Issues only arise when the path in question is not valid as a literal file name, but is still acceptable to Windows APIs due to some kind of special encoded format.

e.g. there is already a check for this in the same function I updated

    if (has_dos_drive_prefix(path))
        return 0;

This catches files named like d:foobar.txt, where the path is not valid as a filename, but encodes a valid path on the D:\ drive.

I am merely extending this check to also catch encodings of alternate data streams.

@kblees
Copy link

kblees commented Mar 7, 2016

@kblees this would not be wrong to do, but it's not necessary.

This depends on what you want to achieve with this patch.

If you just want proper error detection on Windows, the additional check should be made Windows-specific (e.g. wrapped in a macro that evaluates to nothing on other platforms, similar to has_dos_drive_prefix).

If, on the other hand, you want to prevent illegal file names on all platforms (which IMO would be a Good Thing in a multi-platform project), the check should be platform independent (as your patch currently is), it should probably be configurable, and it should check for more illegal characters (as Git on Mac/Unix will obviously not be able to call Windows APIs for file name validation).

@latkin
Copy link
Author

latkin commented Mar 7, 2016

All I aim to do with this patch is to handle this one Windows-specific edge case. I didn't realize code from this repo flowed back to non-Windows repos, thank you for raising that.

I've refactored into a macro-style approach with a no-op implementation for non-Windows, matching has_dos_drive_prefix. I hope this is what you had in mind.

I'd be happy to add some tests, but I'm finding it challenging to understand the test system or see where is a sensible place to plug in. It's somewhat difficult, too, as it's not possible to add a file with a problematic filename. Is there a facility for unit-testing?

@dscho
Copy link
Member

dscho commented Mar 8, 2016

Thank you, @latkin, for your contribution!

Since we are a (Windows-specific) downstream project of the Git project, we inherit their rules. And some of those rules are very strict because the Git developers not only care a lot about clean code, but also about a clean history. A lot. 😄

So when the patch that claims to refactor a filename check actually changes the formatting of dozens of comments, it is safe to assume that the Git maintainer would simply reject the patch.

The reason this patch would be rejected is not just arbitrary pettiness, but it is a very good reason: every patch has to be reviewed carefully to ensure that it does not introduce inadvertent regressions. And it is just really, really (and unnecessarily) hard to review a patch that is dominated by formatting changes (that would even violate the coding conventions of Git) and whose actually functional changes are consequently really hard to find.

If you do not believe me how hard it is to see whether your changes contain any bugs, have a look yourself: https://github.com/git-for-windows/git/pull/686/files (GitHub even refuses to show the diff for mingw.c because there are so many changes).

The aim of the Git for Windows project always has been to collect, vet and mature Windows-specific patches with the end goal of contributing the patches upstream. That means if our patches do not conform to the upstream conventions, either I have to clean them up, or I have to ask the contributors to change them to conform to Git's norms.

The good news is that I actually had worked on the problem this Pull Request tries to address. Well, not the alternate file streams, but preventing Git from trying to check out file names that are illegal in DOS file names (whose convention NTFS inherited). The issue came up in the context of preventing anything to be checked out into .git/ by mistake or intent.

I brushed up the patch and pushed it here: master...dscho:dos-filenames

@latkin please have a look and comment.

@latkin
Copy link
Author

latkin commented Mar 8, 2016

Oh sheesh, I did not realize so much formatting had changed, of course you shouldn't have to deal with that. I must have accidentally triggered auto-format in Visual Studio inadvertently. My apologies!

@latkin latkin force-pushed the latkin-block-alternate-streams branch from ad8b721 to 1ee81d7 Compare March 8, 2016 22:12
… Windows

Signed-off-by: Lincoln Atkinson <lincoln.atkinson@hotmail.com>
@latkin latkin force-pushed the latkin-block-alternate-streams branch from 1ee81d7 to e6806a6 Compare March 8, 2016 22:14
@latkin
Copy link
Author

latkin commented Mar 8, 2016

@dscho I have corrected the egregious formatting changes, sorry about that

@dscho
Copy link
Member

dscho commented Mar 9, 2016

@latkin have you had a chance to look at my dos-filenames branch?

@latkin
Copy link
Author

latkin commented Jul 19, 2016

Re: dos-filenames branch - that looks good.

8.3 short names won't always be problematic - an isolated file named ABCDEF~1.TXT can be created and modified without trouble or ambiguity. Problems do arise, though, when ABCDEF~1.TXT and ABCDEFGHI.TXT are next to each other in the same repo. Blocking that risk altogether seems acceptable to me, even if it's not always strictly necessary.

This PR has been untouched for 4 months. Please let me know what else is required for acceptance, or if you aren't interested and would prefer to use the approach in your branch, go ahead and close this out.

@dscho
Copy link
Member

dscho commented Jul 19, 2016

8.3 short names won't always be problematic - an isolated file named ABCDEF~1.TXT can be created and modified without trouble or ambiguity.

The issue is not so much whether you can create it. The issue is whether it can interfere with your collaborators' work trees' files. And a file ABCDEF~1.TXT can very much interfere with an existing file called abcdefghijklmnopqrstuvwxyz.txt.

This PR has been untouched for 4 months.

Sorry, I should have made it clearer that I require some effort on your side, too. I am a maintainer and need to rely on contributors to perform the bulk of the work. Anything else would not scale.

@latkin
Copy link
Author

latkin commented Jul 19, 2016

Yes, I am asking explicitly what is that remaining effort that you require? What still needs to be done here? I am happy to continue volunteering my time to take care of it.

Otherwise, if this isn't a contribution you are interested in, please close it.

@dscho
Copy link
Member

dscho commented Jul 20, 2016

what is that remaining effort

That is easy.

  • As I hinted by providing an alternative patch series, the change presented in this PR is not complete enough. It only excludes file names containing coions. There are many more characters that are illegal in Windows' file names.
  • I specifically asked you to have a look at my patch and comment.
  • The dos-filenames topic branch needs to be split up, and augmented by tests.
  • The dos-filenames branch needs to be rebased to upstream's master branch, for easy submission directly to the upstream Git project.

In other words, I would strongly suggest to take the dos-filenames branch as a starting point, clean it up to the point where it can be submitted to the Git mailing list, and then do exactly that.

By the way, I would like to point out that I suggested to rewrite the history to avoid uglifying verify_path() in the first commit only to rectify that problem (that did not exist before this patch series) in the second commit. This has not happened, and while it is not necessary if going the dos-filenames route I outlined, it would be good to be kept in mind when working on polishing that branch: upstream Git strongly prefers patch series that tell a nice and cohesive story.

@dscho
Copy link
Member

dscho commented Oct 13, 2017

@latkin to be quite honest, I would be really excited and thankful if you continued to work on this.

@latkin
Copy link
Author

latkin commented Oct 13, 2017

Ah, had forgotten about this. I might be able to revisit in the next couple weeks, thanks for the reminder.

@assarbad
Copy link

@latkin

This catches files named like d:foobar.txt, where the path is not valid as a filename, but encodes a valid path on the D:\ drive.

I may be entirely wrong here, but I think d:foobar.txt is perfectly acceptable, at the very least to cmd.exe. But it's possible that this is specific to the command interpreter. Additionally, thinking of ADS, this name is also ambiguous in another way (file or folder in current directory, i.e. simply a relative file name). And yes, ADS can also piggyback on directories (which simply don't have a default data stream) ...

Anyway, my point here is. Suppose you have drives C: and D: and a file D:\baz\foobar.txt. Now suppose you open cmd.exe from the Run Dialog (Win+R) you should end up in your %USERPROFILE% and I will assume that this is C:\Users\Username on a modern Windows (7 and up). Now run the following sequence of commands (I am including the prompt - i.e. the part before the > to show where the command interpreter currently is):

C:\Users\Username> notepad D:foobar.txt

this should give an error in a freshly opened command prompt. Now run the following sequence:

C:\Users\Username> cd D:\baz
C:\Users\Username> notepad D:foobar.txt

That should open the file D:\baz\foobar.txt in Notepad. It is, btw, the reason one has to explicitly change the drive (either with cd /d or using D: in this case) to end up on D:.

Anyway, because Git for Windows also runs in the command interpreter, perhaps it's not the best idea to block this kind of path? ...

@dscho

  • As I hinted by providing an alternative patch series, the change presented in this PR is not complete enough. It only excludes file names containing coions. There are many more characters that are illegal in Windows' file names.

The official documentation resides over at "Naming Files, Paths, and Namespaces". Another quite useful and comprehensive article over on Project Zero explains some further intricacies of path names on Windows.

Since Git for Windows aims for the Win32 subsystem, it seems logical, however, to assume the strictest notion of what entails a valid path name (which includes stuff like no trailing dot - . - in file or folder names, which isn't even documented).

The alternative namespaces (Win32, Win32 device and NT "native") would in theory still allow to escape some of the rules imposed by Win32, but I am not sure how much of that would get swallowed by "Cygwin". That is, a file or folder named NUL or with a name ending in a dot can still be handled from Win32 by using a namespace closer to "native" to "escape" the path name translations ... (the Project Zero article has a great overview in that regard).

@dscho
Copy link
Member

dscho commented Dec 26, 2019

I'll close this, as it has been addressed as part of git@817ddd6.

@dscho dscho closed this Dec 26, 2019
@dscho
Copy link
Member

dscho commented Jan 1, 2020

For the record, we also integrated it into Git for Windows' master via #2440.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants