
win32: improve loading files from long paths #12119

Closed
wants to merge 1 commit into from


pgasior
Contributor

@pgasior pgasior commented Aug 9, 2023

Long path support on Windows in MPV didn't work in all scenarios:

To fix that, a function that converts a path to UNC form when needed was added. It prepends \\?\ and converts / to \. It is used to convert the file path before passing it to CreateFileW.
mp.utils.split_path in Lua and JS returns the directory with a trailing slash. mp.utils.readdir used a simple string append, which caused a double slash in the path. UNC paths do not support this, so mp_path_join is used instead.
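For illustration, a minimal C sketch of joining a directory and a file name without producing a doubled separator when the directory already ends in one (as the value returned by split_path does). join_path is a hypothetical name, not mpv's mp_path_join, whose exact behavior may differ (for example for absolute file names).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not mpv's mp_path_join): joins dir and name with
 * exactly one separator, so "C:/media/" + "a.mkv" does not become
 * "C:/media//a.mkv". Returns a malloc'd string, or NULL on failure. */
char *join_path(const char *dir, const char *name)
{
    size_t dlen = strlen(dir);
    size_t nlen = strlen(name);
    /* Add '/' only if dir is non-empty and doesn't already end in a separator. */
    int need_sep = dlen > 0 && dir[dlen - 1] != '/' && dir[dlen - 1] != '\\';
    char *out = malloc(dlen + need_sep + nlen + 1);
    if (!out)
        return NULL;
    memcpy(out, dir, dlen);
    if (need_sep)
        out[dlen] = '/';
    memcpy(out + dlen + need_sep, name, nlen + 1); /* also copies the '\0' */
    return out;
}
```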

@pgasior pgasior marked this pull request as draft August 9, 2023 21:32
@pgasior pgasior marked this pull request as ready for review August 9, 2023 21:42
osdep/io.c Outdated
char *filename_copy = talloc_strdup(talloc_ctx, path);
if (strlen(filename_copy) >= MAX_PATH &&
strncmp("\\\\?\\", filename_copy, 4) != 0) {
for (int i = 0; i < strlen(filename_copy); i++) {
Member

@avih avih Aug 10, 2023


The loop condition i < strlen(filename_copy) calls strlen on every iteration (i.e. once per character of the string), which makes it O(N^2). Big no.

Similar loop which is more correct would be (untested):

char *s = filename_copy;
while ((s = strchr(s, '/')))
    *s++ = '\\';

There might also be some existing string replace function, but not sure about that.

@avih
Member

avih commented Aug 10, 2023

The commit messages are missing an explanation.

In general, please explain the exact problem and solution in the commit messages. You can add more "meta" info on the PR web page, but the commit messages should be sufficient to understand what gets fixed, why, and how (where applicable and/or where it's not obvious).

The lua and js commits effectively fix identical code in identical way. They can be squashed into a single commit with a single useful commit message, or you can keep them separate, have a useful commit message at the first of them, and at the second refer to the first commit for full explanation.

To fix that, function that converts to UNC path when needed was added

This conversion tries to avoid touching existing UNC paths, but it only detects one form of a single UNC scheme - "dos drive".

E.g. it detects that \\?\c:\ is already UNC, but fails to detect \\.\c:\ or \\localhost\c$\ or \\whatever\share\ etc, and so it will try to make them UNC again - and break them while at it.

Also, as far as I know the \\?\ (and \\.\) scheme is only for "dos drive" absolute paths, so if the path is relative (like foo/bar relative to CWD), I think it would get broken by prefixing it with \\?\. I did not investigate it, but I think that if you want to make UNC from a relative path, you should first make it absolute, which as far as I know mpv very much tries to avoid.

Without giving it too much thought, I would think that a good enough detection is probably when the string begins with \\.
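A rough sketch of that detection, assuming (as suggested later in the thread) that // should count the same as \\. classify_path and enum path_kind are hypothetical names used only for illustration:

```c
#include <ctype.h>

/* Hypothetical classification, for illustration only. */
enum path_kind {
    PATH_UNC_LIKE,  /* starts with two separators: \\?\, \\.\, \\host\share, //host/share */
    PATH_DRIVE_ABS, /* <letter>:\ or <letter>:/ */
    PATH_OTHER,     /* relative, drive-relative ("c:foo") or rooted ("\foo") */
};

static int is_sep(char c) { return c == '/' || c == '\\'; }

enum path_kind classify_path(const char *p)
{
    /* Short-circuit evaluation never reads past the terminating '\0'. */
    if (is_sep(p[0]) && is_sep(p[1]))
        return PATH_UNC_LIKE;
    if (isalpha((unsigned char)p[0]) && p[1] == ':' && is_sep(p[2]))
        return PATH_DRIVE_ABS;
    return PATH_OTHER;
}
```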

I'm not too thrilled with making a copy unconditionally even if no change is required, but let's put that aside for now.

I think that a useful UNC normalization procedure would be anytime the length reaches MAX_PATH:

  • Convert any / to \.
  • Convert any sequence of consecutive \ into a single \, but don't make \\ into \ at the very beginning of the string.
  • If it begins with \\ then assume by now it's already valid UNC.
  • If it's absolute "dos drive" <drive-letter>:\<stuff> then make it UNC by prefixing it.
  • Else - not sure. Probably do nothing for now (e.g. relative path).
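The steps above could be sketched in C like this, under these assumptions: the caller has already decided that the length reaches MAX_PATH, the result is a newly allocated copy, and relative paths only get the separator cleanup (no prefix). unc_normalize is a hypothetical name, and it does not resolve . or .. components.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the outlined procedure. Returns a malloc'd copy,
 * or NULL on allocation failure. The length check is left to the caller. */
char *unc_normalize(const char *path)
{
    size_t len = strlen(path);
    char *out = malloc(len + 4 + 1); /* room for a "\\?\" prefix + '\0' */
    if (!out)
        return NULL;

    /* Steps 1+2: '/' -> '\', collapse runs of '\', but keep up to two
     * backslashes at the very beginning (output positions 0 and 1). */
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        char c = path[i] == '/' ? '\\' : path[i];
        if (c == '\\' && o > 1 && out[o - 1] == '\\')
            continue;
        out[o++] = c;
    }
    out[o] = '\0';

    /* Step 3: begins with \\ - assume it's already valid UNC. */
    if (o >= 2 && out[0] == '\\' && out[1] == '\\')
        return out;

    /* Step 4: absolute "dos drive" path - prefix it with \\?\ . */
    if (o >= 3 && isalpha((unsigned char)out[0]) && out[1] == ':' && out[2] == '\\') {
        memmove(out + 4, out, o + 1); /* shift right, including '\0' */
        memcpy(out, "\\\\?\\", 4);
        return out;
    }

    /* Step 5: anything else (e.g. relative) - no prefix for now. */
    return out;
}
```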

Lua and JS mp.utils.split_path returns directory with trailing slash. mp.utils.readdir used simple string append which caused double slash in path. UNC paths do not support this, so mp_path_join was used instead.

I can see the PR changes plain concat of DIR + '/' + FILE into mp_path_join, but I don't think I see anything about utils.split_path. Is it touched/fixed/addressed in some way?

[EDIT] Or maybe you mean that mp_path_join knows to join without creating a double slash in this case? I think this should instead be fixed while normalizing it as a UNC path, but using mp_path_join here doesn't hurt.

* autoload.lua failed to process files that exceeded MAX_PATH

Did you analyze this issue?
Do you know why it happens?
Does this get fixed in the current PR code? I don't think I see anything related here. I can see you apply the UNC thing in mp_stat and mp_open but not in mp_opendir or mp_readdir.

I'm also asking because while I believe that report, I don't understand why it happens after a quick look at the source code of mp_opendir and mp_readdir. I would think that it should either succeed to open the dir and then succeed to list all files in it, or fail to open the dir (maybe if it exceeds MAX_PATH) and then fail to list any file.

I don't think I see anywhere that MAX_PATH limit is applied to DIR plus FILE together...

But maybe I'm looking at it wrong. I do believe the report, but didn't try it out yet.

[EDIT] Ah, I see where it breaks, stat on the combined DIR+FILE breaks if the combined length reaches MAX_PATH, which is indeed fixed by using mp_path_join to ensure no double //, and then making it UNC.

@rossy thoughts on this PR approach, scope, and details?

@pgasior
Contributor Author

pgasior commented Aug 10, 2023

Thanks for the very informative comments. I looked again at the WinAPI docs and found something that I missed yesterday. Since Windows 10 1607, it should be possible to enable handling of long paths without converting them to UNC. It requires a change to the registry and to the application manifest. I'll test that later today.
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry

Would that be a better approach? On the other hand, it will still be broken for older Windows versions.

@avih
Member

avih commented Aug 10, 2023

To summarize the changes and addressed issues in this PR so far:

  1. Allow mp_open and mp_stat to work with paths which reach or exceed MAX_PATH, by first changing them to UNC. This allows opening long absolute DOS paths from the CLI or the client interface.
  2. Allow the lua/js readdir to list file names which, combined with the dir, reach or exceed MAX_PATH, by using mp_path_join instead of DIR + '/' + FILE, because that avoids the double / produced by the naive concat, which in turn results in a valid UNC conversion at stat on the combined DIR+FILE.

Known issues:

  • The UNC normalization fails to detect some forms of existing UNC paths (like \\.\whatever or \\host\share\whatever), it currently breaks relative paths, and it doesn't ensure that only single backslashes are used (doubled ones are valid in a normal path but invalid in UNC).
  • Unclear whether this fixes using the lua/js readdir on paths which reach or exceed MAX_PATH - before we add the file names at the dir. I would guess it would need UNC normalization also at mp_opendir.
  • the UNC normalization function makes a copy even if no change is required. Maybe split it to two functions - one which tests if a normalization is required, and one which does the copy and normalization?
  • The O(N^2) issue due to the loop condition at the UNC normalization.

@avih
Member

avih commented Aug 10, 2023

Windows 10 1607 it should be possible ...
It requires change to registry and to application manifest

Would that be better approach?

If this requires the registry change, then no.

mpv won't make that change, and we can't expect the user to do that either.

On the other hand, it will still be broken for older Windows versions.

True as well, and it might even prevent running on Win7 (not sure, but some manifest changes can make the binary incompatible with some earlier versions of Windows).

The UNC normalization code is not huge, so if it can be kept small and useful, it should be OK IMO.

@pgasior
Contributor Author

pgasior commented Aug 11, 2023

Updated code to address issues.
I changed the logic for UNC normalization.

  • Always convert relative paths to absolute, because we don't know if CWD + path will exceed MAX_PATH, and then apply UNC prefix
  • Apply GetFullPathNameW to absolute paths longer than MAX_PATH because it will normalize path separators, and then apply UNC prefix

mp_opendir and mp_readdir are still an issue and will fail on paths longer than MAX_PATH before the filename is added. Mingw-w64 provides an implementation of dirent.h that has an internal MAX_PATH limitation: https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-crt/misc/dirent.c
To fix that, a different implementation must be provided. https://github.com/win32ports/dirent_h looks promising, but might require some changes, as its way of handling UNC doesn't seem to work with the rest of my implementation.

@avih
Member

avih commented Aug 11, 2023

mp_opendir and mp_readdir are still an issue and will fail on paths longer than MAX_PATH before the filename is added. Mingw-w64 provides an implementation of dirent.h that has an internal MAX_PATH limitation

OK, assuming this is correct, then I'm not sure it's worth making mp_open and mp_stat work with long paths, while keeping mp_opendir broken for long paths, because invoking mpv with a directory argument - so that mpv enumerates the files and adds them to the playlist - is a very common use case, and this use case will remain broken with long paths.

So it would be a very partial fix without mp_opendir IMO.

For what it's worth, I don't think mp_readdir has an issue, because it only holds the file name, and the buffer has enough room for a single file (MAX_FILENAME * 3 as UTF8 which can hold MAX_FILENAME of wchar_t).
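A worked count of why MAX_FILENAME * 3 bytes should be enough: every UTF-16 code unit becomes at most 3 UTF-8 bytes, and a surrogate pair (2 units) becomes 4 bytes, which is less than 2 * 3. The helper below is a hypothetical sketch that only counts bytes; it is not mpv's conversion code.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the UTF-8 bytes needed for n UTF-16 code units. An unpaired
 * surrogate is counted as 3 bytes (the same size as U+FFFD). */
size_t utf8_bytes_for_utf16(const uint16_t *s, size_t n)
{
    size_t bytes = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t u = s[i];
        if (u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4; /* surrogate pair: 2 units -> 4 bytes */
            i++;
        } else {
            bytes += 3;
        }
    }
    return bytes;
}
```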

Fixing mp_opendir will require a new underlying implementation of [_w]opendir, and currently I'm not sure it's worth adding such an implementation to the mpv code to work around this issue.

So there's that...

Specifically about your changes:

I don't think we need the first commit ("player: use mp_path_join to construct path in script_readdir "), because the UNC normalization should ensure there are no double [back]slashes (except at the beginning).

The logic is this: the UNC normalization takes a path which, if it wasn't too long, would work fine, and then converts it to UNC.

UNC paths are stricter than non-UNC paths, and so the normalization process should process them to make them work as UNC. It already converts / to \, which is part of this strictness, and it should similarly ensure there are no \\ except maybe at the beginning.

And so the burden of making it work is on the normalization function, not on the caller, which uses a path that is generally correct but happens to be not strict enough for UNC.

So if we add this commit (which I think we should not), it would be after it already works without it because the normalization makes it work, and only if the join function does a better job in some sense than DIR + / + FILE.

  • Always convert relative paths to absolute, because we don't know if CWD + path will exceed MAX_PATH, and then apply UNC prefix

Not sure I get this. You want to normalize short relative paths at mp_open so that later, if they're concatenated to CWD, then it doesn't become too long?

At which stage exactly is CWD concatenated with the relative path after you normalize it?

mpv generally tries very hard to work with the user-provided paths and not extend them in artificial ways. I don't think it cares about CWD when it opens a relative dir.

In general, the UNC normalization is very low level at lowest libc wrappers (like mp_open). The caller doesn't know or care that mp_open normalizes it to make it work, and so only the places where the system API is used need this normalization, preferably via a wrapper - like at mp_open.

And so, if it gets concatenated and becomes too long, then you normalize it when the combined path is used with some low-level API - like mp_open or mp_stat. You don't need to normalize relative paths at mp_open if they're not too long for the underlying wrapped API.

Do correct me if I'm wrong.

  • Apply GetFullPathNameW to absolute paths longer than MAX_PATH because it will normalize path separators, and then apply UNC prefix

It seems to me you apply it to all normalized paths, which includes long absolute paths, and all relative paths?

I might understand if you do that for long relative paths to make them absolute, so that you can make them UNC, but I don't understand exactly what it does with absolute paths. Does it convert // (double slash) into \ (single backslash) ? Is the result guaranteed to be valid when prefixed with \\?\ ?

Also, are you sure it's working? from the GetFullPathNameW docs on [in] lpFileName:

By default, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, prepend \\?\ to the path

But I don't see you prepend \\?\ to the input... (I only see you prepend it to the result).

How does this length calculation work?

    size_t buffer_size = strlen(path) + 1;
    wchar_t *buffer = talloc_array(talloc_ctx, wchar_t, buffer_size + 4);

It looks to me that you pre-allocate for the wchar_t* result the length of the char* string, plus 1 (for final \0) plus 4 (for the prefixed \\?\).

At the very least, it's not obvious how you go from length of char to length in wchar_t.
EDIT - that part is probably OK, because N chars would become at most N wchar_ts if we're converting from UTF8.
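The claim in the EDIT can be checked with a small count: a UTF-8 sequence of 1, 2 or 3 bytes decodes to one UTF-16 unit, and a 4-byte sequence decodes to a surrogate pair (2 units), so N UTF-8 bytes never yield more than N wchar_t units. A hypothetical counting sketch, assuming valid UTF-8 input:

```c
#include <stddef.h>

/* Counts the UTF-16 code units produced by decoding a valid UTF-8 string.
 * No error handling: invalid input gives a meaningless count. */
size_t utf16_units_for_utf8(const char *s)
{
    size_t units = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                  /* continuation byte: no new code point */
        units += (*p >= 0xF0) ? 2 : 1; /* 4-byte lead byte -> surrogate pair */
    }
    return units;
}
```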

And then, if GetFullPathNameW converts a relative path into absolute, then it becomes longer, and then it would probably be too big for the buffer?

I'll stop here for now. It's been too long already...

@pgasior
Contributor Author

pgasior commented Aug 11, 2023

First commit ("player: use mp_path_join to construct path in script_readdir ") is still needed. When starting file that has long path from explorer it will pass UNC to mpv and mp.get_property("path") will return that UNC. Old implementation of mp.utils.readdir will then break UNC by appending extra /.

Regarding GetFullPathNameW: it also normalizes a full path if one is passed, i.e. it removes multiple slashes and replaces / with \. Buffer size won't be a problem. The function is called twice: the first call may fail when the buffer is too small, and it then returns the required size. Then talloc_realloc is called on the old buffer and the second call succeeds.
As for the length limitation, it looks like the documentation is wrong there, and it simply doesn't care about MAX_PATH:
https://stackoverflow.com/a/38038887
dotnet/runtime#14062 (comment)
rust-lang/rust#32689 (comment)
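The call-twice pattern described above can be sketched portably. Here fake_full_path is a hypothetical stand-in that mimics GetFullPathNameW's documented return contract (on success, the length written excluding the terminator; when the buffer is too small, the required size including the terminator; 0 on failure), so the retry logic can run anywhere. Real code would call GetFullPathNameW, which also takes a fourth lpFilePart argument, and the PR uses talloc_realloc rather than realloc.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for GetFullPathNameW's return contract. It just
 * prepends a pretend CWD; it does no real path resolution. */
static unsigned long fake_full_path(const wchar_t *in, unsigned long bufsize,
                                    wchar_t *buf)
{
    static const wchar_t prefix[] = L"C:\\cwd\\";
    size_t need = wcslen(prefix) + wcslen(in) + 1; /* includes '\0' */
    if (bufsize < need)
        return (unsigned long)need;                /* too small: required size */
    wcscpy(buf, prefix);
    wcscat(buf, in);
    return (unsigned long)(need - 1);              /* length without '\0' */
}

/* Call once with a guessed size; if the result says the buffer was too
 * small, grow it to the reported size and call again. */
wchar_t *full_path_alloc(const wchar_t *in)
{
    unsigned long size = (unsigned long)wcslen(in) + 1; /* initial guess */
    wchar_t *buf = malloc(size * sizeof(wchar_t));
    if (!buf)
        return NULL;
    unsigned long r = fake_full_path(in, size, buf);
    if (r >= size) {                   /* too small: r is the required size */
        wchar_t *grown = realloc(buf, r * sizeof(wchar_t));
        if (!grown) {
            free(buf);
            return NULL;
        }
        buf = grown;
        size = r;
        r = fake_full_path(in, size, buf);
    }
    if (r == 0 || r >= size) {         /* failure, or still too small */
        free(buf);
        return NULL;
    }
    return buf;
}
```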

Regarding CWD + filename: CreateFileW cares. If the CWD is very long and just a relative filename is passed, it will fail if the resulting absolute path (resolved internally by CreateFileW) is longer than MAX_PATH.

As for the usefulness of this change: it may look like a partial solution without changing [_w]opendir, but it solves the case where path < MAX_PATH and path + filename > MAX_PATH, and it ensures mpv doesn't break UNC if it gets one.
That's the issue I had, and the one in #11539. The directory didn't exceed MAX_PATH, but after appending the filename it did.
Changing [_w]opendir might be done in other PR.

@avih
Member

avih commented Aug 12, 2023

First commit ("player: use mp_path_join to construct path in script_readdir ") is still needed. When starting file that has long path from explorer it will pass UNC to mpv and mp.get_property("path") will return that UNC. Old implementation of mp.utils.readdir will then break UNC by appending extra /.

All the more reason to ensure it's getting normalized.

Many places use / to concatenate paths, because that's the standard. Even mpv itself does that when given a directory argument where it creates a playlist from the files at this directory. Third party scripts do that too, etc.

Similarly, code which deals with paths doesn't pay much attention to //, because that's standard too - also on windows, except with UNC.

We can't expect every piece of mpv code or 3rd party script to handle all paths with the care which UNC paths require (and if we do expect that, we should be ready for some disappointments).

If a path happens to be UNC, then it needs to be normalized at mp_open, mp_stat, mp_opendir and anywhere else which could break due to UNC strictness.

Buffer size won't be a problem. Function is called twice

Right, missed that. Thanks. I'm pretty sure at the very least it's guaranteed to become longer if it's relative, but not a big issue - it will be called twice. That's OK.

As for length limitation it looks like documentation is wrong there, and it just doesn't care about MAX_PATH

Regarding CWD + filename. CreateFileW cares. If CWD is very long and just relative filename is passed then it will fail if resulting absolute path(resolved internally in CreateFileW) is longer than MAX_PATH

Huh. So the docs on both of these APIs are wrong, and in different ways? The CreateFileW doc on [in] lpFileName says exactly the same thing which GetFullPathNameW says about [in] lpFileName - that it's limited to MAX_PATH unless prefixed with \\?\ (or the registry thing which removes the MAX_PATH limit) - and then it becomes up to (roughly) 32767.

It doesn't mention anywhere (that I could notice) that the length depends on CWD somehow.

Do you know whether this "long CWD path can break short relative paths" apply elsewhere too? like in stat, or other APIs which takes paths/files? Or is it only a CreateFileW thing?

Can you find any references, preferably official, to this notion other than from your experiments?

If eventually we accept that the docs are wrong as described, then this requires a comment at the source code which describes how it works, and why things are needed or not needed, and what we assume/accept/etc, on both those APIs.

So the answer to whether normalization is needed would probably be:

  • If it looks like UNC (which I think should include //foo/bar/baz) then normalize if it contains / anywhere, or \\ except at the beginning.
  • If it's not UNC and absolute, then normalize if it's a too-long.
  • If it's not UNC and relative, then normalize if it becomes too long together with CWD (calculate or estimate).

Does that sound about right?
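A minimal C sketch of the three rules above, assuming the caller supplies the CWD length (e.g. from getcwd) and using 260 as MAX_PATH. needs_normalization and SKETCH_MAX_PATH are hypothetical names; rooted paths like \foo and drive-relative paths like c:foo simply fall into the relative estimate here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_PATH 260 /* MAX_PATH in the Windows headers */

static bool is_sep(char c) { return c == '/' || c == '\\'; }

/* Hypothetical test following the three rules. cwd_len is only used for
 * paths that are neither UNC-like nor drive-absolute. */
bool needs_normalization(const char *path, size_t cwd_len)
{
    size_t len = strlen(path);

    /* Looks like UNC: normalize if it has '/' anywhere, or "\\" past the start. */
    if (is_sep(path[0]) && is_sep(path[1]))
        return strchr(path, '/') != NULL || strstr(path + 1, "\\\\") != NULL;

    /* Absolute "dos drive" path: normalize only if it's too long. */
    if (isalpha((unsigned char)path[0]) && path[1] == ':' && is_sep(path[2]))
        return len >= SKETCH_MAX_PATH;

    /* Relative: estimate the length of CWD + '\' + path. */
    return cwd_len + 1 + len >= SKETCH_MAX_PATH;
}
```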

Then, for the normalization itself, do we need anything other than ensuring it doesn't have / and doesn't have \\ except at the beginning (and prepending \\?\ for non-UNC absolute paths)?

If that's about it, then I prefer to do this in mpv code, because it's not entirely clear to me what GetFullPathNameW does, and I prefer the modifications to be as minimal and as controlled as possible.

Except for relative paths (which are too long with CWD), because making an absolute path from a relative one is, IMHO, best left for official APIs.

As for usefulness of this change. It may look like partial solution without changing [_w]opendir

It doesn't look partial. It is very much partial.

ensures mpv doesn't break UNC if it gets one.

That is probably true.

solves case where path < MAX_PATH and path + filename > MAX_PATH
That's the issue that me and #11539 had. Directory didn't exceed MAX_PATH, but after appending filename it did.

Sure, but you do realize it's a very very narrow use case of "support long paths" - where the path is just shy of MAX_PATH, and when adding few more chars of the filename then it becomes bigger than MAX_PATH.

If the original path was only few chars longer then the solution in this PR won't help it, and if it was only few chars shorter then there probably wouldn't be any issue to begin with.

To me, it's so narrow that it couldn't be seriously considered a solution, not even partial.

To work (with directory listings, and opening a directory) It needs the stars to align way more than they are likely to be aligned.

So I'd consider directory listing generally broken with long paths, except for some very lucky coincidences.

But we could say that it solves UNC inputs, and that it solves long absolute paths, and that it solves relative paths which are too long when considering CWD.

And this does have value, so let's continue with that premise.

@pgasior
Contributor Author

pgasior commented Aug 12, 2023

If a path happens to be UNC, then it needs to be normalized at mp_open, mp_stat, mp_opendir and anywhere else which could break due to UNC strictness.

You're right. I did some experiments and it seems GetFullPathNameW can normalize broken UNC too. I think it would be best to always call it, except when path is absolute and shorter than MAX_PATH. It is able to deal with very broken paths like c:\very\//\/\//\/broken\path even if it has UNC prefix. CreateFileW also accepts such broken paths if they are shorter than MAX_PATH.
I think mp_opendir, mp_stat and mp_open are the only functions that can potentially get broken paths from users or scripts. Not sure about mp_mkdir. With normalization in those 3 functions, I removed the commit that changed the path joining logic in scripts.

Do you know whether this "long CWD path can break short relative paths" apply elsewhere too? like in stat, or other APIs which takes paths/files? Or is it only a CreateFileW thing?

stat on Windows is mp_stat, which uses CreateFileW internally, so it is affected. That's why I also applied the same logic as with mp_open.
It seems like Windows has some kind of global limitation that the CWD can't be longer than MAX_PATH. If you launch a process from such a path, its CWD is set to C:\, and calling SetCurrentDirectory also fails for me, even with UNC, again contrary to what the docs say.

Can you find any references, preferably official, to this notion other than from your experiments?

I think the link I posted in previous comment to issue comment on official dotnet repo by Microsoft employee seems to be closest to official information. WinAPI docs seem to omit weird edge cases and have some parts of description copy-pasted.

Allow mp_open and mp_stat to work with paths which reach or exceed
MAX_PATH, by normalizing and changing them to UNC if needed. This
allows to open long absolute dos paths from CLI or the client interface.
Relative paths are also converted because CreateFileW fails when
cwd + path >= MAX_PATH. Using GetFullPathNameW has also nice side
effect of removing duplicated path separators and converting them to \.
@avih
Member

avih commented Aug 13, 2023

I think it would be best to always call it, except when path is absolute and shorter than MAX_PATH

As I said, mpv tries very hard to not mess with paths more than necessary, and when we do mess with them, it should be as minimal and controlled as possible, therefore I disagree with that statement.

It is able to deal with very broken paths like c:\very\//\/\//\/broken\path even if it has UNC prefix

First of all, this doesn't actually need normalization, right?

But if it had a UNC prefix, or if it was too long, then the only way it's broken is that it has '/' anywhere, or "\\" (two backslashes) except at the beginning, which is within my suggested normalization test, and it gets fixed by my suggested normalization procedure.

So please first reply to my question if the suggested normalization test and procedure cover the cases we know about, and if not - what's missing.

For reference, here's a possible implementation of that procedure (without prepending \\?\):

// convert all '/' to '\\', and then replace any sequence of '\\' with
// a single '\\' - except at the beginning where up to two are allowed.
void normalize_path_construction(char *path)
{
    char *s, *out;

    for (s = path; (s = strchr(s, '/')); *s++ = '\\') /* empty */;

    if (*path && (out = s = strstr(path + 1, "\\\\"))) {
        while ((*out++ = *s++)) {
            while (*s == '\\' && s[-1] == '\\')
                ++s;
        }
    }
}

calling SetCurrentDirectory...

mpv doesn't do that. How is this relevant?

Seems like Windows has some kind of global limitation that CWD can't be longer than MAX_PATH If you launch process from such path it's CWD is set to C:\

Here's how I interpret this statement: Calling getcwd in mpv incorrectly returns <drive-letter>:\ if the CWD is longer than MAX_PATH, therefore we can't test whether CWD + relative-path is longer than MAX_PATH, and so the solution is to always use GetFullPathNameW for relative paths.

If that's what you meant, then we should first try to fix getcwd, because it's within "handle long paths correctly" - which is the goal of this PR.

If that's not what you meant, then please be very specific on what you mean, and what exactly it implies about the solution.

Also, as I said, any implementation detail which is based on undocumented assumptions or knowledge must be explained in a comment (not the commit message), together with the best references we can find (links, etc), or else it will not be possible in the future to fix issues, because it would look incorrect by the official docs, and the actual assumptions for the code would be unknown and unverifiable.

According to you, these are the undocumented things the implementation is based on (feel free to make corrections or add more):

  • GetFullPathNameW can handle any input length even if it's not UNC.
  • CreateFileW fails with short relative paths if the absolute path is longer than MAX_PATH.
  • getcwd returns an incorrect path if the current dir is longer than MAX_PATH (though preferably we should fix it instead of basing code on this).

Also, extrapolating from the CreateFileW and getcwd issues, it would seem to me that "short relative path is broken if the absolute path reaches MAX_PATH" applies to most win32 APIs which take a path, so I'd suggest to test that with mkdir too, and possibly others.

@pgasior
Contributor Author

pgasior commented Aug 14, 2023

So the answer to whether normalization is needed would probably be:
If it looks like UNC (which I think should include //foo/bar/baz) then normalize if it contains / anywhere, or \\ except at the beginning.

CreateFileW will fail if UNC path contains \..\ or \.\. I'm not sure if doing relative resolution by hand is a good idea. There is possibility that something will still be missed. GetFullPathNameW handles that case too. Maybe only call it as last resort when path contains those patterns?
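If hand-written normalization is kept and GetFullPathNameW is used only as a last resort, a cheap check for . and .. components could look like the hypothetical sketch below (has_dot_segments is an illustrative name). It errs on the side of caution: the . in a \\.\ device prefix also counts as a match.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if any component is exactly "." or "..",
 * treating both '/' and '\' as separators. */
bool has_dot_segments(const char *path)
{
    const char *p = path;
    while (*p) {
        size_t n = strcspn(p, "/\\"); /* length of the current component */
        if ((n == 1 && p[0] == '.') || (n == 2 && p[0] == '.' && p[1] == '.'))
            return true;
        p += n;
        if (*p)
            p++;                      /* skip the separator */
    }
    return false;
}
```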

If it's not UNC and absolute, then normalize if it's a too-long.

Same as with UNC, needs additional path resolution if too long

If it's not UNC and relative, then normalize if it becomes too long together with CWD (calculate or estimate).

How would you like that estimation to work?

  • Call getcwd. Calculate the length of CWD + \ + path. If too long, manually join UNC, CWD, separator and path, then call normalize_path_construction on the result. Needs additional path resolution.
  • Call getcwd. Call normalize_path_construction. Calculate the length of CWD + \ + normalized_path. If too long, manually join UNC, CWD, separator and normalized_path. In some edge cases the path wouldn't need to be converted to UNC if normalization made it shorter. Needs additional path resolution.
  • Call GetFullPathNameW, discard result when shorter than MAX_PATH, use when longer.

I think those are all possible cases. So should mpv have code that would also do full path resolution, or should GetFullPathNameW be used? My current implementation works in all cases. I understand that you don't want to modify provided paths, but doing full path resolution just before call to WinAPI doesn't sound that bad. I can't think of any problem that may happen because of that.

If that's what you meant, then we should first try to fix getcwd, because it's within "handle long paths correctly" - which is the goal of this PR.

It is not broken. It seems to be internal Windows limitation that process can't be started with or have CWD longer than MAX_PATH. MPV getcwd on Windows is mp_win32_getcwd and currently calls GetFullPathNameW(L".", ...) but has same limitation as GetCurrentDirectoryW. When trying to launch process from such path:

  • Explorer: GetCurrentDirectoryW/GetFullPathNameW(L".", ...) always returns C:\WINDOWS\system32. Path is passed as UNC if it is on any drive other than C. If it is on C then it is passed as old DOS 8.3 format (C:\dev\PATH_E~1\VERYLO~1\VERYLO~1\1.mp4), but CreateFileW and _wopendir can handle that.
  • Powershell: GetCurrentDirectoryW/GetFullPathNameW(L".", ...) always returns C:\
  • CMD: Unable to start process

Explorer will always work because it uses absolute paths. In the case of the CLI it will only work with absolute paths. It will never work with relative paths due to the Windows limitation. Explorer and Powershell seem to be aware of that limitation and just change the CWD to be able to start a new process. So getcwd works correctly, because it will never get a path longer than MAX_PATH. It can't be trusted when launched from a long path, but it is also impossible to know when this happens.

Also, extrapolating from the CreateFileW and getcwd issues, it would seem to me that "short relative path is broken if the absolute path reaches MAX_PATH" applies to most win32 APIs which take a path, so I'd suggest to test that with mkdir too, and possibly others.

  • open -> mp_open - updated in this PR
  • creat -> mp_open
  • fopen -> mp_open
  • opendir -> mp_opendir - currently broken due to MinGW _wopendir
  • readdir -> mp_readdir - broken because of mp_opendir
  • closedir -> mp_closedir - broken because of mp_opendir
  • mkdir -> mp_mkdir - will need same changes as mp_open
  • getcwd - works. Explained in previous paragraph
  • stat -> mp_stat - updated in this PR
  • glob -> mp_glob - will need same changes as mp_open
  • exists -> will need same changes as mp_open
  • mp_subprocess2 - will need same changes as mp_open

I also did some research on alternative dirent implementations, and it seems that the R project recently introduced long path support https://blog.r-project.org/2023/03/07/path-length-limit-on-windows/
They hit the same limitation that mpv has with the MinGW-w64 provided functions and decided to write their own https://github.com/r-devel/r-svn/blob/9e5f453f2759f3693cb9f4d8fe0c32fa56709669/src/main/platform.c#L1243 They also seem to be using the manifest and registry approach.

@avih
Member

avih commented Aug 14, 2023

CreateFileW will fail if UNC path contains \..\ or \.\. I'm not sure if doing relative resolution by hand is a good idea

Please separate two things: normalization-test, and normalization-procedure.

The goal of the test is to rule out cases which don't need any modifications of the path (normalization). It's allowed to err on the side of caution, i.e. decide that something needs normalization even if it actually doesn't (but preferably that won't happen much).

Currently mpv doesn't normalize paths and it generally works well, so hopefully the test will be able to rule out the majority of cases from the need for normalization.

That's the goal with the test - to rule out the need for normalization, hopefully most of the time.

Then there's the normalization procedure, which modifies the path, if needed, so that it can be used with the target underlying API (assuming the path actually exists/valid/etc).

So in your reply to my questions about the test and normalization, you added the fact that normalized (UNC) path also cannot have relative components (. and ..), and maybe there are more things it normalizes which you don't know about, yes?

That's fine. For now, let's say the normalization procedure is (logically) this:

  • Convert using GetFullPathNameW.
  • If the result doesn't begin with \\ (two backslashes), then prepend \\?\.

Is this a good normalization procedure in your opinion which works as well as possible with all paths? (UNC and not-UNC, absolute/relative, short/long, etc.)

Or is there anything missing here? In which case, what would be a better procedure?
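For reference, a minimal sketch of just the second step of that procedure follows. It assumes the caller has already passed the path through GetFullPathNameW, and it uses narrow char strings instead of wchar_t so it stays portable; the helper name add_long_path_prefix is hypothetical, not an existing mpv function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for step 2 of the proposed procedure.
 * `full` is assumed to already be the output of GetFullPathNameW
 * (narrowed to char here so the sketch is portable).
 * Returns a newly allocated string the caller must free(),
 * or NULL if allocation fails. */
static char *add_long_path_prefix(const char *full)
{
    static const char prefix[] = "\\\\?\\";  /* the four characters \\?\ */
    assert(full != NULL);

    size_t len = strlen(full);
    /* Already begins with two backslashes (\\?\..., \\.\... or a UNC
     * \\server\share path): leave it as-is, per the proposal. */
    int keep = len >= 2 && full[0] == '\\' && full[1] == '\\';
    size_t plen = keep ? 0 : sizeof(prefix) - 1;

    char *out = malloc(plen + len + 1);
    if (!out)
        return NULL;
    memcpy(out, prefix, plen);
    memcpy(out + plen, full, len + 1);  /* includes the terminating NUL */
    return out;
}
```

With this rule, C:\videos\foo.mkv becomes \\?\C:\videos\foo.mkv, while \\server\share\foo.mkv is returned unchanged, so as written a long \\server\share path would presumably still be subject to the usual MAX_PATH handling.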

Now that we have a good normalization procedure, let's get back to the test which decides whether to normalize. Let's ignore relative paths for now, because this seems to be a big subject on its own.

So the overall test might be:

  • If it looks like UNC then normalize.
  • If it's absolute and too long then normalize.
  • (some way to test if a relative path needs normalization).

Assuming the normalization procedure is as described above (and the relative-path test and normalization work), does this test cover our test needs? most importantly, is there any case where it will decide that there's no need for normalization but it actually won't work without normalization?
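To make the outline concrete, one way it could look in C is sketched below. Everything here is an assumption for illustration: needs_normalization and its helpers are hypothetical names, SKETCH_MAX_PATH stands in for the Windows MAX_PATH constant (260), and the relative-path rule simply takes the CWD length as a parameter (the CWD + \ + path estimate). Rooted paths like \foo land in the relative branch, which overestimates their length and so errs on the side of normalizing; drive-relative paths like C:foo are treated the same way, which is only an approximation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_PATH 260  /* stands in for MAX_PATH from <windows.h> */

static bool is_sep(char c) { return c == '\\' || c == '/'; }

/* "Looks like UNC": starts with two separators, which also covers
 * \\?\... and \\.\... device paths. */
static bool looks_like_unc(const char *p)
{
    return is_sep(p[0]) && is_sep(p[1]);
}

/* Absolute path with a drive letter, e.g. C:\foo or C:/foo. */
static bool is_drive_absolute(const char *p)
{
    return ((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z')) &&
           p[1] == ':' && is_sep(p[2]);
}

/* Hypothetical decision function following the three-rule outline.
 * cwd_len is the length of the current directory; it is only used
 * for relative paths. */
static bool needs_normalization(const char *path, size_t cwd_len)
{
    assert(path != NULL);
    size_t len = strlen(path);

    if (looks_like_unc(path))            /* rule 1: looks like UNC */
        return true;
    if (is_drive_absolute(path))         /* rule 2: absolute and too long */
        return len >= SKETCH_MAX_PATH;
    /* rule 3: relative (also \foo and C:foo): CWD + '\' + path */
    return cwd_len + 1 + len >= SKETCH_MAX_PATH;
}
```

For example, needs_normalization("foo/bar.mkv", 250) is true (250 + 1 + 11 = 262), while the same path with a 20-character CWD is not normalized.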

Now that we have an outline of the test, let's get back to the relative path thing.

How would you like that estimation to work? Call getcwd. Calculate length of CWD + \ + path. If too long...

That's pretty much what I had in mind, yes. Something like:

if (relative && (CWDLENGTH + 1 + strlen(path) >= MAX_PATH))
  // we need to normalize

It is not broken. It seems to be an internal Windows limitation that a process can't be started with, or have, a CWD longer than MAX_PATH. mpv's getcwd on Windows is mp_win32_getcwd, which currently calls GetFullPathNameW(L".", ...) but has the same limitation as GetCurrentDirectoryW.

This is where I get a bit lost, so first, some focus.

For the sake of this discussion, let's assume the mpv binary is x:\mpv\mpv.exe.

Now, there's no problem with getcwd when it's short, right? (e.g. cd d:\a\b && x:\mpv\mpv foo/bar.mkv)

So our problematic use case is relative path (e.g. of a media file), and where the current working dir is (or we expect it to be) longer than MAX_PATH, and where we need to decide whether the relative path needs normalization, right?

Explorer: GetCurrentDirectoryW/GetFullPathNameW(L".", ...) always returns C:\WINDOWS\system32

Do correct me if I'm wrong, but when launching a media file in mpv from Explorer (double-clicking the media if mpv is associated with the type, or dragging a media file or directory onto mpv.exe), the media path[s] are always absolute, so this is irrelevant to the case of a relative media/file path with a long CWD, right?

Or am I missing the relevance?

CMD: Unable to start process

So you're saying that you can't cd into a long dir in cmd.exe?

Or that once you cd into a long path, you can't launch any process, not even a trivial hello-world C program?

Or that you can launch some programs but not mpv?

Powershell: GetCurrentDirectoryW/GetFullPathNameW(L".", ...) always returns C:\

Am I correct to assume that in such case normalization won't help either, because if GetFullPathNameW(L".", ...) returns C:\ instead of C:\some-long-path, then GetFullPathNameW(L"foo/bar.mkv"...) would result in C:\foo\bar.mkv - which is incorrect?

Basically, our ability to normalize a relative path depends on whether GetFullPathNameW can identify CWD. If it can, then getcwd would work correctly and its length would be correct, and the normalization would work. If it can't, then there's nothing we can do about it anyway, and even unconditional normalization would fail. It's simply beyond mpv's control.

And so the bottom line is that there's no realistic use case where mpv both needs and can deal with long CWD and relative path (explorer always launches with absolute paths, CMD.exe can't launch mpv at all, and with powershell normalization doesn't help anyway).

The only case where mpv can test and/or normalize it is when getcwd() works.

Apparently, currently it only works when CWDLENGTH < MAX_PATH.

And so, as far as I can tell, the only relative-path case we can improve compared to the current behavior is when CWD is shorter than MAX_PATH, but CWD + / + FILE is longer than MAX_PATH?

And for this case (and any other case where getcwd works), this would work, right?

if (relative && (strlen(getcwd(...)) + 1 + strlen(path) >= MAX_PATH))
    // need to normalize
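A slightly fuller version of that check, including the getcwd failure case discussed above, might look like the sketch below. Assumptions: it uses POSIX getcwd so it compiles outside Windows (in mpv the call would go through mp_win32_getcwd instead), relative_needs_normalization is a hypothetical name, and the fixed 4096-byte buffer is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_MAX_PATH 260  /* stands in for MAX_PATH */

/* Hypothetical check for a relative path. *cwd_ok reports whether the
 * current directory could be determined at all. If it could not, the
 * function returns false: per the reasoning above, normalizing would
 * not produce a correct absolute path anyway, so the path is passed on
 * unchanged. */
static bool relative_needs_normalization(const char *path, bool *cwd_ok)
{
    char cwd[4096];
    assert(path != NULL && cwd_ok != NULL);

    if (!getcwd(cwd, sizeof(cwd))) {
        *cwd_ok = false;
        return false;
    }
    *cwd_ok = true;
    return strlen(cwd) + 1 + strlen(path) >= SKETCH_MAX_PATH;
}
```

Returning false when the CWD is unknown mirrors the argument that such cases are beyond mpv's control, rather than trying to guess.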

Do you agree?

Does this conclude the discussion about normalization test and procedure?

If not, what's missing?

I also did some research on alternate dirent...

Nice, but forget it for now. It only adds noise. Let's assume that it works with the same limitations as CreateFileW.

  • open -> mp_open - updated in this PR
    ...

So, you say this test and maybe normalization would need to happen at mp_open, mp_opendir, mp_stat (the PR already has those), and also mp_mkdir, mp_glob, exists, and mp_subprocess2 (which would need to be added to the PR), yes?

Let's say we did all this, and now let's see what we've gained (do correct me if I'm wrong):

  • Absolute paths were limited to MAX_PATH, now there's no limit (well, up to ~32K).
  • UNC path could not be concatenated with a relative path using /, now it can.
  • Relative paths were limited to CWD+/+FILE < MAX_PATH, now the limit is CWD < MAX_PATH (can you confirm this?).
  • Enumerating files in a directory was limited to [CWD+]DIR+/+FILE < MAX_PATH, now the limit is [CWD+]DIR < MAX_PATH, and, once/if _wopendir gets fixed, it would have the same limits as above (absolute or UNC of any length, relative if getcwd works).

Is that about right?

@avih
Member

avih commented Aug 14, 2023

Wait a minute, how does one even create a dir-chain longer than 260?

I can't do that in Windows Explorer, and I can't do that in cmd.exe.

In explorer, if I create this dir at the root of some NTFS drive (that's 203 chars long)
START--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--PATH-A-200--END

and then try to create a dir longer than ~50 chars inside it, Explorer truncates the name.

In cmd.exe, it refuses to create a long dir inside it.

How realistic is this long-path use case at all? What are the use cases? how do users end up with long paths?

@pgasior
Contributor Author

pgasior commented Aug 14, 2023

It can be done with PowerShell: New-Item -Type Directory dirname. It should also be possible with a call to CreateDirectoryW. qBittorrent can also do that, and it is the main source of such paths for me. However, I encounter such paths very rarely.

@kasper93
Contributor

What if we just enable long paths in the manifest (<ws2:longPathAware>true</ws2:longPathAware>) instead of using UNC?

@avih
Copy link
Member

avih commented Aug 14, 2023

  • Absolute paths were limited to MAX_PATH, now there's no limit (well, up to ~32K).

Is this actually true? can mpv be launched with an absolute path to a media file where the absolute dir chain (before the final \filename.mkv) is longer than 260?

I have a hunch that as is, the most it can potentially improve is cases which previously were limited to ABS_DIR + FILE < 260 and now they would become limited a bit less - to ABS_DIR < 260.

I.e., if the absolute dir is longer than 260 chars, it just won't work, even with the PR.

And the reason it might not work, is maybe because mpv doesn't have the manifest thingy to enable long paths, and so it's still limited to dir-chains of at most 260 chars, even if the system has the registry thing to enable long paths.

If indeed that's the case, then I imagine getcwd and GetFullPathNameW would work just fine for relative paths with any length of CWD.

What if we just enable long paths in the manifest (<ws2:longPathAware>true</ws2:longPathAware>) instead of using UNC?

I don't think it's an instead thing. My hunch is that long UNC paths (where the dir chain > 260) just wouldn't work without the manifest.

@pgasior
Contributor Author

pgasior commented Aug 14, 2023

Is this actually true? can mpv be launched with an absolute path to a media file where the absolute dir chain (before the final \filename.mkv) is longer than 260?
I don't think it's an instead thing. My hunch is that long UNC paths (where the dir chain > 260) just wouldn't work without the manifest.

Long, correct UNC paths work in current mpv master, but only for loading a file. I didn't test the manifest thing, but if I understand correctly it would make CreateFileW accept long paths without the UNC prefix.

@kasper93
Contributor

kasper93 commented Aug 14, 2023

I don't think it's an instead thing. My hunch is that long UNC paths (where the dir chain > 260) just wouldn't work without the manifest.

UNC paths have always supported longer paths. The manifest and registry opt-in for long paths is for the common file APIs, to make them work with longer paths. This is new since Windows 10 1607, and the per-application opt-in exists only for backward compatibility: an application has to explicitly declare that it can work with longer paths.

Long correct UNC paths work in current mpv master, but only to load file. I didn't test manifest thing, but if I understand correctly it would make CreateFileW accept long paths without UNC prefix.

That's correct. I think manifest opt-in is a clear way to support it instead of converting everything to UNC.

@Hrxn
Contributor

Hrxn commented Oct 2, 2023

With the spate of recent commits improving mpv on Windows (Window affinity, etc.), wouldn't it be time to finally address this one here?

The relevant link to the WinAPI docs was posted earlier in this thread, but here it is again, linked directly to the relevant section:
Enable Long Paths in Windows 10, Version 1607, and Later

@pgasior

[..] Since Windows 10 1607 it should be possible to enable handling of long path without converting them to UNC. It requires change to registry and to application manifest. [..]

Yes, this is also how I understood this. We don't have to deal with extended-length paths by using the \\?\ path prefix.
NB: This is not necessarily a UNC path. The same prefix works with UNC paths, but also with non-UNC paths.
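For the UNC case specifically, Microsoft's documentation on the extended-length prefix describes the form \\?\UNC\server\share\... rather than simply prepending \\?\ to \\server\share. A minimal sketch of that rewrite follows; unc_to_extended is a hypothetical helper, and narrow char strings are used instead of wchar_t for portability.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rewrite a UNC path \\server\share\... into its
 * extended-length form \\?\UNC\server\share\...
 * Paths already using the \\?\ or \\.\ device syntax, and non-UNC
 * paths, are returned as a plain copy.
 * Caller frees the result; NULL on allocation failure. */
static char *unc_to_extended(const char *path)
{
    assert(path != NULL);
    size_t len = strlen(path);

    int is_device = len >= 4 && path[0] == '\\' && path[1] == '\\' &&
                    (path[2] == '?' || path[2] == '.') && path[3] == '\\';
    int is_unc = !is_device && len >= 2 && path[0] == '\\' && path[1] == '\\';

    const char *prefix = is_unc ? "\\\\?\\UNC\\" : "";  /* \\?\UNC\ */
    const char *rest = is_unc ? path + 2 : path;        /* drop leading \\ */
    size_t plen = strlen(prefix);
    size_t rlen = strlen(rest);

    char *out = malloc(plen + rlen + 1);
    if (!out)
        return NULL;
    memcpy(out, prefix, plen);
    memcpy(out + plen, rest, rlen + 1);  /* includes the terminating NUL */
    return out;
}
```

So \\server\share\a.mkv becomes \\?\UNC\server\share\a.mkv, while a path that already starts with \\?\ or \\.\ is left alone.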

First of all, I agree that mpv should never have to deal with the Windows registry, let alone make changes to it, god forbid. [1]

But we don't have to care about this, let that be the responsibility of the user.
On mpv's side, all that is needed is the application manifest, to be aware of LongPathsEnabled support.

All that needs to be done, as I understand it, is to embed the application manifest as a resource in mpv.exe.


[1]

But, on the other hand, I think a very legitimate argument could be made in support of mpv at least doing a simple check here, and then showing a warning in the terminal, or something.

I mean, mpv should already have everything included to do this. We have the run and subprocess commands, on every normal Windows installation on the planet is still the old PowerShell (C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe), and all that is now missing to query the registry here is that bit:

(Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem\' -Name 'LongPathsEnabled' -ErrorAction Ignore).LongPathsEnabled

If this returns 1, the registry key has been set correctly. If not, then not.

@bitraid

bitraid commented Oct 10, 2023

Huh. So the docs on both of these APIs are wrong, and in different ways? The CreateFileW doc on [in] lpFileName says exactly the same thing which GetFullPathNameW says about [in] lpFileName - that it's limited to MAX_PATH unless prefixed with \\?\ (or the registry thing which removes the MAX_PATH limit) - and then it becomes up to (roughly) 32767.

It doesn't mention anywhere (that I could notice) that the length depends on CWD somehow.

Do you know whether this "long CWD path can break short relative paths" applies elsewhere too? Like in stat, or other APIs which take paths/files? Or is it only a CreateFileW thing?

Can you find any references, preferably official, to this notion other than from your experiments?

I found this official documentation that should clear some things up:
https://learn.microsoft.com/en-us/archive/blogs/jeremykuhne/path-normalization

Normally any path passed to a Windows API is (effectively) passed to GetFullPathName() and normalized. There is one important exception- if you have a device path that begins with a question mark instead of a period. It must use the canonical backslash- if the path does not start with exactly \\?\ it will be normalized.

Why would you want to skip normalization? One reason is to get access to paths that are normally unavailable, but legal in NTFS/FAT/etc. A file or directory called "foo." for example, is impossible to access any other way. You also get to avoid some cycles by skipping normalization if you've already normalized.

The last reason is that the MAX_PATH check for path length is skipped as well, allowing for paths that are greater than 259 characters long. Most APIs will allow this, with some notable exceptions, such as Get/SetCurrentDirectory.

Skipping normalization and max path checks is the only difference between the two device path syntaxes- they are otherwise identical. Tread carefully with skipping normalization as you can easily create paths that are difficult for "normal" applications to deal with.

Paths that start with \\?\ are normalized if you explicitly pass them to GetFullPathName(). Don't forget, however, that rooting is different with device syntax (C:.. does not normalize the same as \\?\C:..). Note that you can pass > MAX_PATH paths to GetFullPathName() without \\?\. It supports arbitrary length paths (well, currently up to the maximum string size that Windows can handle, see UNICODE_STRING).

sfan5 added the priority:stale label ("Issue is too old and is unclear if it's even applicable anymore") on Dec 21, 2023
@kasper93
Contributor

Long path support has been enabled in #13134.

kasper93 closed this on Apr 13, 2024