-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os.path.realpath on Windows does not follow symbolic links #54158
Comments
In Lib/ntpath.py: # realpath is a no-op on systems without islink support
realpath = abspath However, Windows Vista and newer support symbolic links and other Python methods support them. (noticed this through source code inspection; haven't actually tested it) |
The issue bpo-12799 has been marked as a duplicate of this issue: |
This is a patch for the os.path.realpath() bug under Windows, http://bugs.python.org/issue9949 "os.path.realpath on Windows does not follow symbolic links" ntpath.diff fixes the realpath() function to resolve symbolic links to their targets (tested under Windows 7). Note: I tried to make the equivalent patch for Python 2.7, but it didn't work because the "from nt import _getfinalpathname" fails. |
It seems that the nt module is implemented in the posixmodule.c source file, and the Python 3 version contains the posix__getfinalpathname entry point, but the Python 2 version does not. I presume that PyWin32 could also be used to work around this. Too bad it isn't automatically included with Python: |
file, and the Python 3 version contains the posix__getfinalpathname entry
We can't use external packages in the standard library. If you're looking I think we can just implement the function we need. |
Excellent! The ntpath.py change is nearly identical in Python 2.7 to the change for Python 3.2. The only difference is that instead of: + elif isinstance(path, bytes): It is: + elif isinstance(path, unicode): |
I have attached a series of patches with (hopefully) provide more robust fix for this issue, against the Python 3.3 branch. It handles both bytes and str objects, paths that do not actually exist on the filesystem, and removal of the '\\?\' prefix returned by _getfinalpathname, unless the supplied path had that prefix already. A couple of the patches contain some separate infrastructure that could hopefully be used to simplify some of the other Windows code in posixmodule.c (One immediate possibility would be to combine the code provided in these patches with the symbolic-link resolution code in the Windows stat functions - there is quite a bit of duplication there that could be eliminated.) One thing these patches do not address is resolving a broken symbolic link. The Windows API function GetFinalPathNameByHandle does not handle this case (because CreateFile cannot be used to get a handle if the symbolic link is broken). This functionality could be implemented by manually following the reparse points, but that would basically require reimplementing GetFinalPathNameByHandle. Finally, this patch could be fairly easily backported to Python 3.2, but that shouldn't be done without careful consideration. It changes the return value from os.path.realpath on Windows even when there are no symbolic links in the path (the returned value will have the actual casing as stored on the filesystem, instead of the casing supplied by the user). I don't think it should be backported to Python 2.7, because that version, like all Python versions before Python 3.2 are unaware of symbolic links on Windows (e.g. lexists is the same function as exists). The patches were generated using git (I use git-hg) - if that format is a problem, let me know and I can regenerate them. |
Uploading a new series of patches - they are all the same as the first set, except for 0006-Make-realpath-follow-symbolic-links-on-Windows.patch. I realized that I could use os.readlink to handle broken symbolic links, so I changed the logic of os.path.realpath slightly to do that (including recursive symlink detection). |
The previous version of this patch did not handle bytes arguments correctly and could fail in conjunction with a non-ASCII compatible encoding. Also, if the result was a UNC path, it was not being handled correctly (the returned value would have started with a single backslash, not two). Version 3 of the patch fixes these issues. I am also attaching a single, condensed patch as requested by the Lifecycle of a Patch devguide. I can also provide a patch series as before if desired. |
Yet another patch to support symlink on Windows. |
I have unlinked my patch since it doesn't looks correct now. Sorry for disturbing. |
Daniel: It's taken two years, but I've reviewed your patch :). There are a few things that need to be addressed, but the basic change looks pretty good. If you're still interested in seeing this fixed, I look forward to reviewing an updated patch; otherwise, I'll try to pick this up myself soon. |
@zach - thanks for the review. Sorry that it has taken a few months to pick this issue back up. Anyway, here is an updated patch. It is pretty different than the previous patch, to the point that I would consider it a completely new patch. Anyway, here's a basic rundown of what the patch contains: Two functions in Modules/posixmodule.c were modified to handle bytes objects as well as str objects: win_readlink and _getfinalpathname. This was necessary to allow os.path.realpath() to continue returning bytes when supplied with bytes. I know you said in your review that you didn't care about this because using bytes paths on Windows is deprecated, but I think it still important to maintain backwards compatibility. A new implementation of realpath() was added to Lib\ntpath.py, used when _getfinalpathname is available (when _getfinalpathname is not available, realpath remains a synonym for abspath, as it was previously). The logic here has been completely reworked and is roughly modeled on the realpath implementation from Lib\posixpath.py. A number of tests were added to Lib\test\test_ntpath.py for realpath() functionality. Three of the tests were based on realpath() tests from test_posixpath.py, but one is completely new: test_realpath_broken_symlinks. A note about improved Windows support was added to the documentation for realpath(), but no other doc changes have been made. Anyway, feedback is welcome - I'd love to get this code in sufficient shape to see a working realpath() implementation for Windows land in Python. |
Any update on this? Would be great with a fix for python symlinks on Windows. |
FYI, there's been some discussion of this on bpo-37834, as the issues quickly became conflated. There's also bpo-14094 which is a dup of this one, but with a different patch. --- To move the relevant discussion here, my current PR is basically the tests from the last attached patch here plus a simpler implementation for non-strict resolution. Right now the big open question is how to deal with the \\?\ prefix. My last two commits will leave it alone if it was in the initial path, and otherwise strip it if the non-prefixed path resolves to the same path as the prefixed path (the _getfinalpathname always returns a prefixed path). The downside of this is that any unresolvable symlinks (dangling links, cycles, etc.) will now be returned with the prefix, whereas previously they were being returned without any attempt to resolve them being made (not because they are invalid, but because the old code wasn't attempting to resolve anything at all). I kinda feel like this is okay, but would appreciate any other opinions on the matter. The alternative is to always strip the prefix, which could make some valid paths become invalid (for example, the one in the new test_realpath_symlink_prefix). |
Another minor change worth calling out - I added a note to the docs that when a symlink cycle is detected, realpath() could return any member of the cycle, but does not guarantee which one. For our test cases it's generally stable enough, but if you change the starting point (e.g. from absolute to relative path) then it can change the result. It seemed easier all round just to not guarantee anything about which member of the cycle will be returned, rather than determining the exact rules, reproducing them across all platforms, and documenting them. |
There are multiple failures on several buildbots after commit 75e0649 was merged: BUILDBOT FAILURE REPORT Builder name: AMD64 Windows7 SP1 3.x Failed tests
Test leaking resources Build summary == Tests result: FAILURE then FAILURE == 384 tests OK. 10 slowest tests:
2 tests failed: 33 tests skipped: 2 re-run tests: Total duration: 13 min 17 sec Tracebacks
Current builder status The builder is failing currently Commits Other builds with similar failures
Steve, can you take a look? |
On it |
I suspect the relevant failure here (which is not listed in Pablo's post) is this one: ====================================================================== Traceback (most recent call last):
File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_ntpath.py", line 210, in test_realpath_curdir
tester("ntpath.realpath('/'.join(['.'] * 100))", expected)
File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_ntpath.py", line 30, in tester
raise TestFailed("%s should return: %s but returned: %s" \
test.support.TestFailed: ntpath.realpath('/'.join(['.'] * 100)) should return: C:\buildbot.python.org\3.x.kloth-win64\build\build\test_python_5340\test_python_worker_1408 but returned: \\?\C:\buildbot.python.org\3.x.kloth-win64\build\build\test_python_5340\test_python_worker_1408\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\. The path being normalized is longer than MAX_PATH, which is clearly causing failures with Windows APIs as implemented on Windows 7. I can add the normpath() call back in, which should remove all the dots before ever actually testing realpath() and get it back down under MAX_PATH. That won't help anyone using real long paths on Win7 though, but then again the only reason this test was passing before was because there was a normpath() call, so we're no worse off (apart from performance-wise). The venv issues seem unrelated. I'll get the realpath one fixed first and then take a look at those once I've finished today's tasks. |
Okay, the venv break is related (and it should have broken more frequently). The new realpath() implementation leaves the \\?\ prefix behind if the path doesn't exist, since that's the error you get when the path is longer than MAX_PATH and you're on a system that can't handle it. I think the best fix is probably to strip the prefix if the file didn't exist at the start and still doesn't exist afterwards. |
And sorry, the test_ntpath traceback was half listed in Pablo's message, but was missing the critical lines :) |
I'm tentatively reopening this issue for you to consider the following point, Steve. A real path is not always the same as a final path. We can find code that does What code like this needs from a realpath() implementation is a solid (real) path, not a final path. In other words, the caller wants a solidified form of Here's an example. In this first case "scripts" is a junction mount point that targets "C:/spam/etc/scripts":
>>> rel_eggs_right = os.path.relpath(eggs, scripts)
>>> print(rel_eggs_right)
..\dlls\eggs.dll
>>> os.symlink(rel_eggs_right, 'C:/spam/scripts/eggs_right.dll')
>>> os.path.exists('C:/spam/scripts/eggs_right.dll')
True
>>> scripts_final = os.path._getfinalpathname(scripts)[4:]
>>> print(scripts_final)
C:\spam\etc\scripts
>>> rel_eggs_wrong = os.path.relpath(eggs, scripts_final)
>>> print(rel_eggs_wrong)
..\..\dlls\eggs.dll
>>> os.symlink(rel_eggs_wrong, 'C:/spam/scripts/eggs_wrong.dll')
>>> os.path.exists('C:/spam/scripts/eggs_wrong.dll')
False If we remove the junction and replace it with a 'soft' symlink that targets the same directory, then using the final path works, and using the given path no longer works. >>> print(os.readlink('C:/spam/scripts'))
C:\spam\etc\scripts
>>> scripts_final = os.path._getfinalpathname(scripts)[4:]
>>> rel_eggs_right_2 = os.path.relpath(eggs, scripts_final)
>>> print(rel_eggs_right_2)
..\..\dlls\eggs.dll
>>> os.symlink(rel_eggs_right_2, 'C:/spam/scripts/eggs_right_2.dll')
>>> os.path.exists('C:/spam/scripts/eggs_right_2.dll')
True
>>> rel_eggs_wrong_2 = os.path.relpath(eggs, scripts)
>>> print(rel_eggs_wrong_2)
..\dlls\eggs.dll
>>> os.symlink(rel_eggs_wrong_2, 'C:/spam/scripts/eggs_wrong_2.dll')
>>> os.path.exists('C:/spam/scripts/eggs_wrong_2.dll')
False When the kernel traverses "scripts" as a soft link, it collapses to the target (i.e. "C:/spam/etc/scripts"), so our relative path that was computed from the final path is right in this case. On the other hand, if "scripts" is is a mount point (junction), it's a hard (solid) component. It does not collapse to the target (the kernel even checks the junction's security descriptor, which it does not do for a symlink), so ".." in the relative symlink traverses the junction component as if it were an actual directory. What we need is an implementation of realpath("C:/spam/scripts") that returns "C:\\spam\\scripts" when "scripts" is a mount point and returns "C:\\spam\\etc\\scripts" when "scripts" is a symlink. This means we need an implementation of realpath() that looks a lot like posixpath.realpath. Generally a mount point should be walked over like a directory, just as mount points are handled in Unix. The difference is that a mount point in Windows is allowed to target a symlink. (This is a design flaw; Unix doesn't allow it.) Since we need to know the target of a junction, we have to read the reparse point, until we hit a real directory target. As long as it targets another junction, it remains a hard component. As soon as it targets a symlink, however, it becomes a soft component that needs to be resolved. If the junction targets a name surrogate reparse point that we can't read, then our only option is to get a final path. This is dysfunctional. We should raise an exception for this case. Code can handle the exception and knowingly get a final path instead of a real path. This also means we can't reliably compute a real path for a remote path (UNC) because we can't manually evaluate the target of a remote junction. A remote junction is meaningless to us. If we're evaluating a UNC path and reach a junction, we have to give up on a real path and settle for a final path. We can get a final path because that lets the kernel in the server talk to our kernel to resolve any combination of mount points (handled on the server side) and symlinks (handled on our side). This case should also raise an exception. Aware code can handle it by getting a real path and taking appropriate measures. |
I don't know how common this scenario is, but I can certainly say that it's never worked on Windows. You'd also end up with a relative symlink in a real directory somewhere (that the junction was pointing at) that is unable to reach Relative symlinks are nearly as evil as symlink loops :) Given there is no POSIX concept of a "final" path, a real path is the closest analogy. If this is the quality of edge case where that happens to break down, I'm okay with leaving it broken. (Similarly for the junction/symlink combination on remote systems. I don't see any way that POSIX emulation can properly support that, but it seems like the far more rare case - and I say this as someone whose employer *lives and breathes* unnecessarily complicated SMB shares ;) I've never seen this become an issue.) |
That should be "by getting a final path". |
With ntpath.realpath == ntpath.abspath, it actually works fine for just junctions because in most cases a mount point in Windows should be handled simply as just a directory, not resolved as its target. Revisit my example. This is what I showed when "C:/spam/scripts" is a junction. It's only wrong if the junction targets a directory symlink -- a chimeric mount-point-link that NT has made the mistake of allowing. (The kernel developers should have known to disallow this when they introduced Unix-like symlinks. When evaluating a mount point that targets a symlink, or any name-surrogate reparse point, it should fail the call as an invalid reparse point. For simplicity and our sanity, a mount point should always have consistent semantics in path parsing as a hard component that behaves like a regular directory. Then we could *always* handle it as just a directory -- whether it's local or remote. They messed up badly, IMO.) |
At this point, I'm inclined to say let's wait and see what the 3.8.0b4 feedback looks like. Given that WSL has been fudging the boundaries here and hasn't suffered as a result, I don't expect much of a problem (and this change does actually make realpath() consistent between Windows native and WSL, which is nice). But I'm definitely open to making changes to this again if it turns out we need to. |
It seems like Windows buildbots are back to green (fixed), thanks ;-) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: