-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
__code__. co_filename should always be an absolute path #64642
Comments
There are many issues on tracker related to the relative paths in co_filename. Most of them are about introspection functions in inspect module--which are usually broken after 'os.chdir'. Test case: create a file t.py: def foo(): pass
print(foo.__code__.co_filename) Execute it with '$ python t.py' -> it will print 't.py'. Ideally, when executing a python file, interpreter should expand all relative paths for __file__ and __code__.co_filename attributes. |
One of the side effect of my PR 14053 is that tracebacks are more verbose: the filename is now absolute rather than relative. Currently: $ python3 x.py
Traceback (most recent call last):
File "x.py", line 4, in <module>
func()
File "x.py", line 2, in func
bug
NameError: name 'bug' is not defined With my PR: $ ./python x.py
Traceback (most recent call last):
File "/home/vstinner/prog/python/master/x.py", line 4, in <module>
func()
File "/home/vstinner/prog/python/master/x.py", line 2, in func
bug
NameError: name 'bug' is not defined |
The site module tries to compute the absolute path of __file__ and __cached__ attributes of all modules in sys.modules: def abs_paths():
"""Set all module __file__ and __cached__ attributes to an absolute path"""
for m in set(sys.modules.values()):
if (getattr(getattr(m, '__loader__', None), '__module__', None) not in
('_frozen_importlib', '_frozen_importlib_external')):
continue # don't mess with a PEP 302-supplied __file__
try:
m.__file__ = os.path.abspath(m.__file__)
except (AttributeError, OSError, TypeError):
pass
try:
m.__cached__ = os.path.abspath(m.__cached__)
except (AttributeError, OSError, TypeError):
pass The __path__ attribute isn't updated. Another approach would be to hack importlib to compute the absolute path before loading a module, rather than trying to fix it *afterwards*. One pratical problem: posixpath and ntpath are not available when importlib is setup, these modules are implemented in pure Python and so must be imported. Maybe importlib could use a naive implementation of os.path.abspath(). Maybe the C function _Py_abspath() that I implemented in PR 14053 should be exposed somehow to importlib as a private function using a builtin module like _imp, so it can be used directly. |
Perhaps |
Effect of my PR 14053 using script.py: import sys
print(f"{__file__=}")
print(f"{sys.argv=}")
print(f"{sys.path[0]=}") Before: $ ./python script.py
__file__=script.py
sys.argv=['script.py']
sys.path[0]=/home/vstinner/prog/python/master With the change: $ ./python script.py
__file__=/home/vstinner/prog/python/master/script.py
sys.argv=['/home/vstinner/prog/python/master/script.py']
sys.path[0]=/home/vstinner/prog/python/master |
Example of case where a module path is still relative: import sys
import os
modname = 'relpath'
filename = modname + '.py'
sys.path.insert(0, os.curdir)
with open(filename, "w") as fp:
print("import sys", file=fp)
print("mod = sys.modules[__name__]", file=fp)
print("print(f'{__file__=}')", file=fp)
print("print(f'{mod.__file__=}')", file=fp)
print("print(f'{mod.__cached__=}')", file=fp)
__import__(modname)
os.unlink(filename) Output: __file__ and mod.__file__ are relative paths: not absolute paths. |
This change seems to produce a new warning on my Mac. ./Modules/getpath.c:764:23: warning: incompatible pointer types passing 'char [1025]' to parameter of type 'const wchar_t *' (aka 'const int *') [-Wincompatible-pointer-types]
_Py_isabs(execpath))
^~~~~~~~
./Include/fileutils.h:158:42: note: passing argument to parameter 'path' here
PyAPI_FUNC(int) _Py_isabs(const wchar_t *path);
^
1 warning generated. |
Isn't that change breaking compat ? I'm assuming many scripts in the wild sys.argv[0] and play with it assuming it's relative. I would expect such a change to be behind a flag or a __future__ import. |
I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered. That's straightforward to selectively revert, though: remove the "config->run_filename != NULL" clause at Line 2201 in 24dc2f8
That way the OS-provided argv entry will continue to be passed through to sys.argv[0], while everywhere else will get the absolute path. |
What is the use case for having a relative sys.argv[0]?
Right. It has been made on purpose. The rationale can be found in the first message of this issue. |
The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did. 3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past. Do we really want to make a habit of breaking a few things at every minor release ? It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech. I'm advocating that any breaking change,, even the tiniest, should be behind a __future__ import or a flag, with warnings, and then switched on for good on Python 4.0. Python is a language, not a simple library. Stability is key. Like Linus Torvald use to say: "do not break user space" The fact this change is easy to fix and migrate is not a proper reason to ignore this important software rule, IMO. So many people and so much code rely on Python that the tinest glitch may have massive impact down the line. Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest. Le 21/10/2019 à 13:26, STINNER Victor a écrit :
|
I did a quick search to see what code would break from sys.argv[0] going relative intext:"sys.argv[0]" ext:py site:github.com and while most uses of it are to print usage messages. There is potentially code relying on it being a relative path that will break from this change:
I think Michel and Nick are correct here and the status-quo should be maintained for sys.argv[0]. Michel should not have to enumerate the use cases for a relative sys.argv[0]. |
Nick Coghlan:
Even before this cases, there were different cases where Python does modify sys.argv:
At the C level, the Py_GetArgcArgv() function provides the "original" argc and argv: before they are modified. See open issues: |
I understand that the discussion is about my commit 3939c32 which changed multiple things: I understand that the complain is not about sys.modules['__main__'].__file__ nor sys.path[0], but only sys.argv[0]. The sys.argv documentation says: "The list of command line arguments passed to a Python script. argv[0] is the script name (it is operating system dependent whether this is a full pathname or not)." https://docs.python.org/dev/library/sys.html#sys.argv I understand that before my change, sys.argv[0] was sometimes relative, and sometimes absolute. With my change, sys.argv[0] should now always be asolute. There are cases where an absolute path is preferred. There are cases where a relative path is preferred. I'm trying to understand the scope of the issue. I don't know this code. It seems like sys.argv[0] is used to lookup in a dict, and that dict keys are supposed to be "module names". I don't understand in which cases sys.argv[0] could be a module *name*, since sys.argv[0] seems like a relative or absolute path. The code starts by sys.argv.pop(0): remove sys.argv[0].
if sys.argv[0].startswith('pyperun'): ... I understand that my change broke such test. Did this test work before my change when specifying the absolute path to run the script? Like path/to/pyperun.py rather than pyperun.py? Such code looks fragile: using os.path.basename() would be safer here. home = os.path.expanduser("~")
svnWorkingCopiesPath = os.path.join(home, "svnwork")
gitReposPath = os.path.join(home, "gitrepos")
if sys.argv[0].startswith(svnWorkingCopiesPath)
or sys.argv[0].startswith(gitReposPath): ... On my Linux, os.path.expanduser("~") is an absolute path and so svnWorkingCopiesPath and gitReposPath should be absolute paths, no? My change made this code more reliable, rather than breaking it, no? |
Michel Desmoulin: "The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did." You have to understand that the change has be done to fix some use cases and that Python 3.8.0 has been related with the change. If we change again the behavior, we will get Python < 3.8.0, Python 3.8.0 and Python > 3.8.0 behaving differently. I would prefer to fully understand all use cases before starting to discuss changing the behavior again.
You have to understand that it was known since the start and it was a deliberate choice after balancing advantages and disadvantages. https://www.python.org/dev/peps/pep-0492/#transition-plan I'm only aware of a single function in a single project, Twisted. Are you aware of more projects broken by this change? Did it take long to fix them? Balance it with the ability to use async and await in cases which were not possible in Python 3.6 because they were not real keywords. The Python 3.6 grammar was fully of hack to emulate keywords.
Usually, incompatible changes are discussed to balance if it's worth it or not. There is no plan to break all user code just for your pleasure. For the specific case of argv[0], the discussion occurred here on associated pull requests, and nobody mentioned any risk of backward compatibility. FYI I'm working on this topic and I just proposed a the PEP-606 "Python Compatibility Version" to propose one possible solution to make incompatible changes less painful to users.
My PEP-606 tries to explain incompatible changes are needed:
I understand that but such comment doesn't help the discussion. See my previous comments and please try to help to fill the missing information to help us to take better decisions. Since Python 3.8.0 has been release, a revert can make things even worse that the current state. |
This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date). Unless it was backported in a Python 3.8 PR that didn't link back to this issue or the original Python 3.9 PR? |
Oh, you're right. Sorry, I thought that I made the change before 3.8 branch was created. In that case, we can do whatever we want :-) (We are free to revert.) |
Please revert. An absolute path changes semantics in many real world situations (altering symlink traversals, etc). People expect the current sys.argv[0] behavior which is "similar to C argv" and matches what was passed on the interpreter command line. A getcwd() call doesn't even have to succeed. A single file python program should still be able to run in that environment rather than fail to start. To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename. Allow it to be '' if getcwd failed at source load time. Code could check if co_cwd exists and join it with the co_filename. Also let co_cwd remain empty when there is no valid co_filename for the code. Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state. That call is best avoided as modifying globals is impolite. |
Ok ok, I will revert my change, but only on sys.argv[0].
The current implementation leaves a path unchanged if it fails to make it absolute: config_run_filename_abspath: wchar_t *abs_filename;
if (_Py_abspath(config->run_filename, &abs_filename) < 0) {
/* failed to get the absolute path of the command line filename:
ignore the error, keep the relative path */
return _PyStatus_OK();
}
if (abs_filename == NULL) {
return _PyStatus_NO_MEMORY();
}
PyMem_RawFree(config->run_filename);
config->run_filename = abs_filename; with: /* Get an absolute path.
On error (ex: fail to get the current directory), return -1.
On memory allocation failure, set *abspath_p to NULL and return 0.
On success, return a newly allocated to *abspath_p to and return 0.
The string must be freed by PyMem_RawFree(). */
int _Py_abspath(const wchar_t *path, wchar_t **abspath_p); |
Do you mean that co_filename attribute of code objects must remain relative?
I thought that calling os.chdir("/") is a good practice when launching a program as a daemon. And thne call os.fork() twice to detach the process from its parent and run it in background. I commonly use os.chdir() in my programs for various reasons. A classical use case is to mimick a shell script doing something like (cd $DIRECTORY; command): temporary change the current directory. cwd parameter of subprocess.Popen helps to avoid changing the currently directory, but sometimes subprocess is not used at all. Maybe avoiding os.chdir() is a good practice, but it's hard to prevent users to call it. I don't see how os.chdir() is bad by design. I'm not sure that it's really useful to debate this neither :-) |
I think sys.argv[0] is the important one as program logic often tends to use that as an input. I'm honestly unsure of if this will be as much of a problem for .co_filename or sys.path[0]. |
(read that as: feel free to keep the behavior for co_filename and path[0] and lets see what happens :) |
Anything new here? 3.9.0a2 is due next week. |
I had it in my TODO list but I lost the bpo number. Thanks for the reminder :-) I wrote PR 17534 that I plan to merge as soon as the CI tests pass. |
I am not sure that if co_filename can break backwards compability or not (because until CodeType.replace; it was usual to break code object construction, people who are using code objects may take this too) but yes PR 17534 was necessary. My initial patch (PR 13527) proposed a future directive to do this (and poorly implemented the concept) which can be helpful (yet it needs a PEP). |
I reverted my change, so I remove "release blocker" priority. |
With the sys.argv[0] change reverted, I think this overall issue is fixed now - code objects will get absolute paths, while sys.argv[0] will continue to reflect how __main__ was identified. |
This says that Is there anything in the official Python documentation that specifies whether |
This issue is closed. I suggest you to open a new issue. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: