Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid adding an empty directory to sys.path when running a module with -m #77234

Closed
ztane mannequin opened this issue Mar 12, 2018 · 30 comments
Closed

Avoid adding an empty directory to sys.path when running a module with -m #77234

ztane mannequin opened this issue Mar 12, 2018 · 30 comments
Assignees
Labels
3.7 3.8 interpreter-core type-feature

Comments

@ztane
Copy link
Mannequin

@ztane ztane mannequin commented Mar 12, 2018

BPO 33053
Nosy @brettcannon, @ncoghlan, @vstinner, @jwilk, @ned-deily, @njsmith, @ericsnowcurrently, @eryksun, @ztane
PRs
  • #6231
  • #6237
  • #6236
  • #9589
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2018-03-25.13:50:03.376>
    created_at = <Date 2018-03-12.11:57:14.902>
    labels = ['interpreter-core', '3.8', 'type-feature', '3.7']
    title = 'Avoid adding an empty directory to sys.path when running a module with `-m`'
    updated_at = <Date 2018-09-26.16:09:42.678>
    user = 'https://github.com/ztane'

    bugs.python.org fields:

    activity = <Date 2018-09-26.16:09:42.678>
    actor = 'vstinner'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2018-03-25.13:50:03.376>
    closer = 'ncoghlan'
    components = ['Interpreter Core']
    creation = <Date 2018-03-12.11:57:14.902>
    creator = 'ztane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33053
    keywords = ['patch']
    message_count = 30.0
    messages = ['313641', '313966', '314020', '314021', '314023', '314025', '314028', '314036', '314040', '314081', '314088', '314089', '314092', '314130', '314192', '314214', '314233', '314248', '314276', '314403', '314404', '314411', '314412', '314414', '314415', '314431', '314432', '315043', '315079', '326481']
    nosy_count = 10.0
    nosy_names = ['brett.cannon', 'ncoghlan', 'vstinner', 'jwilk', 'ned.deily', 'njs', 'eric.snow', 'eryksun', 'ztane', 'Viv']
    pr_nums = ['6231', '6237', '6236', '9589']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue33053'
    versions = ['Python 3.7', 'Python 3.8']

    @ztane
    Copy link
    Mannequin Author

    @ztane ztane mannequin commented Mar 12, 2018

    I think this is a really stupid security bug. Running a module with -mmodule seems to add '' as a path in sys.path, and in front. This is doubly wrong, because '' will stand for whatever the current working directory might happen to be at the time of the subsequent import statements, i.e. it is far worse than https://bugs.python.org/issue16202

    I.e. whereas python3 /usr/lib/module.py wouldn't do that, python3 -mmodule would make it so that following a chdirs in code, imports would be executed from arbitrary locations. Verified on MacOS X, Ubuntu 17.10, using variety of Python versions up to 3.7.

    @ztane ztane mannequin added type-security interpreter-core labels Mar 12, 2018
    @jwilk
    Copy link
    Mannequin

    @jwilk jwilk mannequin commented Mar 16, 2018

    FWIW, this behavior is documented:

    https://docs.python.org/3/using/cmdline.html#cmdoption-m
    "As with the -c option, the current directory will be added to the start of sys.path."

    With the -c option, at least you could easily remove the sys.path element yourself:

    python -c 'import sys; sys.path.remove(""); ...'

    (This works, because sys is always a builtin module, so it won't be imported from cwd.)

    I don't see any obvious way to make "python -m foo" secure in untrusted cwd, though.
    The best I could come up with is:

    python -c 'import sys; sys.path.remove(""); import runpy; runpy._run_module_as_main("foo")'

    which is quite insane.

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 18, 2018

    This isn't considered a security issue, as running "python3" interactively behaves in exactly the same way (i.e. tracking changes to the current working directory for the duration of the session), and running "python3 script.py" adds the full path to the current directory.

    In all cases, the expectation is that end users will at least enable isolated mode if they don't want to risk importing arbitrary code from user controlled directories.

        $ echo "print('Hello')" > foo.py
        $ python3 -m foo
        Hello
        $ python3 -Im foo
        /usr/bin/python3: No module named foo

    However, I'm flagging this as an enhancement request for 3.8+ (with a reworded issue title), as the non-isolated -m switch algorithm for sys.path[0] calculation could be made more robust as follows:

    1. Start out with "os.getcwd()" rather than the empty string
    2. Once __main__.__file__ has been calculated, delete sys.path[0] if main was found somewhere else

    A potentially related enhancement would be to modify directory & zipfile execution to only look for __main__.py in sys.path[0] rather than searching the whole of sys.path (which is what currently happens).

    @ncoghlan ncoghlan added the 3.8 label Mar 18, 2018
    @ncoghlan ncoghlan changed the title Running a module with -m will add empty directory to sys.path Avoid adding an empty directory to sys.path when running a module with -m Mar 18, 2018
    @ncoghlan ncoghlan added type-feature and removed type-security labels Mar 18, 2018
    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 18, 2018

    Also, a small upstream community interaction tip: if you want people to seriously consider your requests for changes in default behaviour (which inevitably risk backwards compatibility breaks), don't start out by insulting them.

    Python's defaults are currently set up for a *trusted personal automation tool*, where the person writing the code is also the person running it.

    By starting out with an insult like "I think this is a really stupid security bug", you're actually saying "I know very little about Python's history, or the audiences it was originally written to serve, and instead of politely suggesting an alternative behaviour that would be more robust in the face of system configuration errors, I'm going to try to use shame, guilt, and embarrassment to get people to do work for me". That kind of behaviour *isn't* a good way to get your issues addressed, but it *is* a good way to encourage people to decide that volunteering as an open source maintainer isn't worth the associated hassles.

    The opening insult added nothing to your issue report, and could more productively have been replaced with an explanation of the expectations you had of the default behaviour, how you came by those expectations, and how the current behaviour failed to meet them.

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 18, 2018

    I've also separated out https://bugs.python.org/issue33095 (a docs issue about making isolated mode more discoverable) based on the jwilk's comment that it wasn't clear how to disable the default "add the current directory to sys.path" behaviour.

    @njsmith
    Copy link
    Contributor

    @njsmith njsmith commented Mar 18, 2018

    Whoa, wait, what?

    I agree that the original post is not as diplomatic as it could be, but my reaction to learning about this just now is also shock and confusion, so I guess I can sympathize with the OP a bit...

    The reason I'm surprised is that -- while this probably wasn't fully anticipated when -m was designed -- it's turned out to be a bit of a meme to replace calls like 'pip ...' with 'python -m pip ...', or 'virtualenv ...' with 'python -m virtualenv ...', etc. I thought these were generally pretty much equivalent. I definitely did *not* know that running 'python -m pip' could lead to executing arbitrary code from the cwd, and I'm sure I've run it inside e.g. random git checkouts. If someone had tried to spearphish me with this they would totally have succeeded. (I hope they haven't?)

    If you want to run a file in the current directory, is there any advantage to doing 'python -m myscript' instead of 'python myscript.py'? Could we declare that the latter is the One Obvious Way and remove support for the former entirely?

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 18, 2018

    "python -m mypkg.myscript" does the right thing as far as local packages are concerned, whereas "python -m mypkg/myscript.py" will set you up for double-import bugs.

    Note that you can almost always trigger arbitrary non-obvious code execution just by writing sitecustomize.py to the current directory, and any package you install can add a "<installation-site-packages>/arbitrary-code.pth" or "<user-site-packages>/arbitrary-code.pth" file that gets run at startup (setuptools has long relied on this to implement various features).

    Opting in to isolated mode turns *all* of those features off by saying "I'm expecting to run system code only here, not custom user code".

    @jwilk
    Copy link
    Mannequin

    @jwilk jwilk mannequin commented Mar 18, 2018

    -I implies -s, which is not something I want.

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 18, 2018

    https://bugs.python.org/issue13475 is the existing enhancement request to expose sys.path[0] management independently of the other execution isolation features.

    @njsmith
    Copy link
    Contributor

    @njsmith njsmith commented Mar 19, 2018

    @ncoghlan: The comparison I'm worried about is specifically this one: IIUC, right now it's safe to run 'pip --version' in an arbitrary directory, but it's not safe to run 'python -m pip --version' in an arbitrary directory. Am I wrong? (I actually couldn't convince either version to execute arbitrary code in 2 minutes of trying, but that's my understanding of the discussion so far.)

    If that's correct, then I don't think this is like... the hugest security bug ever, but... I also think that it's irresponsible for e.g. packaging.python.org to be recommending people run 'python -m pip' the way it does now, and we need to find some way to change things so our beginner docs aren't triggering arbitrary code execution in a rare and subtle case.

    We could add a bunch of warnings to packaging.python.org, explaining about how the code execution can be triggered, but that seems unsatisfactory given how those docs are targeted at beginners, plus there are so many places around the internet that recommend 'python -m pip' we'd never find them all.

    We could update all those docs to instead recommend 'python -Im pip', but that has the same problems: no-one will understand, and people won't do it.

    We could stop recommending 'python -m pip' entirely, but that runs into all the problems that have motivated this in the first place.

    So I think we should find a way to make it so 'python -m pip' *never* executes code from the current directory (modulo the usual caveats, like the user explicitly setting PYTHONPATH to an insecure value etc.).

    If 'python -m mypkg.myscript' is important, maybe we can make it 'PYTHONPATH=. python -m mypkg.myscript', or 'python -M mypkg.myscript', or making 'python mypkg/myscript.py' DTRT, or... something?

    @ztane
    Copy link
    Mannequin Author

    @ztane ztane mannequin commented Mar 19, 2018

    Took 2 seconds.

    % sudo python3 -mpip --version
    hello world
    Traceback (most recent call last):
      File "/usr/lib/python3.6/runpy.py", line 183, in _run_module_as_main
        mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
      File "/usr/lib/python3.6/runpy.py", line 142, in _get_module_details
        return _get_module_details(pkg_main_name, error)
      File "/usr/lib/python3.6/runpy.py", line 109, in _get_module_details
        __import__(pkg_name)
      File "/usr/lib/python3/dist-packages/pip/__init__.py", line 4, in <module>
        import locale
      File "/usr/lib/python3.6/locale.py", line 180, in <module>
        _percent_re = re.compile(r'%(?:\((?P<key>.*?)\))?'
    AttributeError: module 're' has no attribute 'compile'
    Error in sys.excepthook:
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 53, in apport_excepthook
        if not enabled():
      File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 28, in enabled
        return re.search(r'^\s*enabled\s*=\s*0\s*$', conf, re.M) is None
    AttributeError: module 're' has no attribute 'search'
    
    Original exception was:
    Traceback (most recent call last):
      File "/usr/lib/python3.6/runpy.py", line 183, in _run_module_as_main
        mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
      File "/usr/lib/python3.6/runpy.py", line 142, in _get_module_details
        return _get_module_details(pkg_main_name, error)
      File "/usr/lib/python3.6/runpy.py", line 109, in _get_module_details
        __import__(pkg_name)
      File "/usr/lib/python3/dist-packages/pip/__init__.py", line 4, in <module>
        import locale
      File "/usr/lib/python3.6/locale.py", line 180, in <module>
        _percent_re = re.compile(r'%(?:\((?P<key>.*?)\))?'
    AttributeError: module 're' has no attribute 'compile'

    Same for python -mhttp.server, say.

    ----

    I'd prefer there be a change that the default be always safe from some version on, so that the REPL can do whatever it does, but -m etc probably shouldn't even have neither the initial current directory nor the current current directory in the path unless the interactive session is requested. I am not worried about the garbage that the user would have installed in their own directories breaking things.

    @njsmith
    Copy link
    Contributor

    @njsmith njsmith commented Mar 19, 2018

    Ah, yeah, I see:

    ~/t$ echo 'print("hi")' > re.py
    ~/t$ pip --version
    pip 9.0.1 from /home/njs/.user-python3.5-64bit/local/lib/python3.5/site-packages (python 3.5)
    ~/t$ python -m pip --version
    hi
    Traceback (most recent call last):
    [...]

    But if I create a sitecustomize.py or an io.py in the current directory (knowing that 'import io' happens implicitly startup), then those *don't* seem to get picked up by 'python -m pip' or 'python -c ...' or plain 'python'. I guess the cwd doesn't get added to sys.path until after initial bootstrapping is finished.

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 19, 2018

    It occurs to me that there may be some additional unshared context here: the way python -m pip searches for the module to execute is much closer to the way Windows searches for a command like pip (i.e. current directory first) than it is to the way *nix systems search for it (i.e. if the current directory isn't on PATH it must be specified explicitly as "./pip"). If you spend a lot of time on Windows systems, or working with Windows users, it becomes a habit to assume that folks aren't going to expect it to be safe to run arbitrary commands from arbitrary directories.

    This behaviour means that if you want to intercept "python -m pip", the current easiest filename to use to intercept it is just "pip.py", similar to the way you can use "pip.exe" or "python.exe" to intercept those commands on Windows.

    I do think switching from a Windows-style default search behaviour to a *nix style default search behaviour is potentially reasonable, but the related backwards compatibility considerations would likely push it into being a PEP level change.

    I'll also note that Nathaniel's right that I was wrong about sitecustomize always being easy to intercept from the current directory, though, as sys.path[0] gets set after "import site" has already executed.

    I was just confusing myself, as my default approach to double-checking the behaviour of the "-m" switch is to run "python -m site", but that's misleading in this case since doing that also re-runs the site and user customisation modules (and will pick them up from the current working directory) - it's closely akin to testing python3 -c "import runpy; runpy.run_module('site', run_name='__main__')"

    @eryksun
    Copy link
    Contributor

    @eryksun eryksun commented Mar 20, 2018

    the way python -m pip searches for the module to execute is much
    closer to the way Windows searches for a command like pip (i.e.
    current directory first)

    That's classic Windows behavior. However, search paths for CreateProcess and the loader are composed on demand, which allows different behavior to be selected easily enough. The current behavior depends on a mix of environment variables, registry settings, reserved names (known DLLs, API sets), application and DLL manifests, .local redirection, and in-process configuration. For example, to skip the working directory when searching for DLLs, there's the CWDIllegalInDllSearch registry setting, SetDllDirectoryW(L""), or SetDefaultDllDirectories (the latter removes PATH as well, so it's not suited for loosely-couple systems). To skip the working directory when searching for executables, define the environment variable "NoDefaultCurrentDirectoryInExePath".

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 21, 2018

    (Adding the other import system maintainers to the nosy list, along with Ned as the release manager for 3.6 and 3.7)

    Summarizing my current thoughts on this:

    I think the fact that "-m" currently adds the empty directory as sys.path[0] instead of the fully resolved cwd is definitely worth changing for 3.8, and is at least arguably a bugfix that should be applied to 3.7 prior to release. (It's probably too late in 3.6's lifecycle to be worth backporting that far, even if we do reclassify this part as a bugfix instead of an enhancement)

    The above is what I consider to be in scope for *this* particular issue.

    (The details of the current behaviour probably arose from my borrowing an existing code path originally designed for "-c" rather than from a deliberate design decision)

    As a separate question, I'd personally be open to the idea of:

    1. Deprecating the implicit addition of the current working directory to sys.path (and have that emit a deprecation warning in 3.8)

    2. Add support for an explicit leading dot on "-m" arguments to say "Run this from the current directory" (the -m switch was added in 2.4, while the explicit relative import syntax wasn't chosen until 2.5, so this wasn't an available option for the original feature design). Note: given implicit namespace packages, it may make sense to allow additional leading dots in order to add a parent directory to sys.path instead of the current directory.

    3. Add a "--basepath <dir>" argument to have "-m" resolve explicit relative imports against a directory other than the current one (this would be a different potential spelling for https://bugs.python.org/issue13475 )

    4. Once the current directory is no longer added implicitly (in 3.9 at the earliest), if runpy fails to find the main module, then check the current directory and ask "Did you mean '.name'?" in the error message.

    However, that second part is the part I consider out of scope for this issue, and think would require a PEP due to the backwards compatibility challenges.

    @brettcannon
    Copy link
    Member

    @brettcannon brettcannon commented Mar 21, 2018

    The proposed change for 3.8 and 3.7 seems reasonable to me.

    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Mar 22, 2018

    I am OK with changing '' to the initial resolved working directory for 3.7 as long as we get that change in now, i.e. before the 3.7.0b3 ABI freeze next Monday. And I agree that we should not attempt to change this behavior for 3.6.x at this stage of its life cycle.

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 22, 2018

    Brett or Eric, any chance one of you could run with this for 3.7b3? I already have a startup refactoring related regression that I'm aiming to have fixed before then.

    Thanks to Victor's refactoring work, there's at least a clear interception point now where we can treat the empty string differently depending on whether the command line option was -c, -m, or not specified at all: https://github.com/python/cpython/blob/master/Python/pathconfig.c#L259

    As an initial attempt, I think the necessary fix will be along the lines of checking for 'n == 0 && argc > 1 && wcscmp(argv0, L"-m") == 0', and resolving the current working directory in that case.

    @brettcannon
    Copy link
    Member

    @brettcannon brettcannon commented Mar 22, 2018

    I can't make any promises unfortunately.

    @ncoghlan ncoghlan self-assigned this Mar 25, 2018
    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 25, 2018

    PR posted with the change to use an absolute path for the starting working directory in the "-m" case.

    That PR also includes a change to improve the fidelity of the test suite: back when I first wrote test_cmd_line_script, I was mainly focused on testing the runpy aspects, and not the sys.path initialisation aspects, so the way the tests worked didn't really check the latter properly.

    The test updates get rid of the launch script that was previously confusing matters, and instead test sys.path[0] initialisation properly (relying on PYTHONPATH for the zipimport related cases where just changing the working directory isn't sufficient).

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 25, 2018

    It turned out some tests in CPython's own test suite were implicitly relying on the old behaviour where the current working directory automatically ended up on sys.path (see the changes to test_bdb and test_doctest in the updated PR).

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 25, 2018

    Initial fix has been merged to master, CI runs pending for the backport to 3.7 and a follow-up master branch PR to remove a debugging print I noticed when resolving a test_import conflict in the backport.

    I won't get to merging those until some time after work tomorrow (probably 8 pm'ish in UTC+10), so if anyone wanted to merge them before that, it would likely be a good idea :)

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 25, 2018

    3.7 CI finished before I logged off for the night, so this is good to go for 3.7.0b3 now :)

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 25, 2018

    New changeset a9e5d0e by Nick Coghlan in branch 'master':
    bpo-33053: Remove test_cmd_line_script debugging print (GH-6237)
    a9e5d0e

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 25, 2018

    Marking as fixed, since this is now the version likely to go out in 3.7.0b3 - if we find further problems with it (beyond the potential enhancement discussed above to make local directory usage opt-in), then those can go in a new issue.

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 25, 2018

    New changeset d5d9e02 by Nick Coghlan in branch 'master':
    bpo-33053: -m now adds *starting* directory to sys.path (GH-6231)
    d5d9e02

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Mar 25, 2018

    New changeset ee37845 by Nick Coghlan in branch '3.7':
    bpo-33053: -m now adds *starting* directory to sys.path (GH-6231) (bpo-6236)
    ee37845

    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Apr 6, 2018

    (See bpo-33185 for regression.)

    @ncoghlan
    Copy link
    Contributor

    @ncoghlan ncoghlan commented Apr 8, 2018

    Some notes from my investigation of bpo-33185 that seem more appropriate here, rather than on that issue:

    • several of the developer-centric utilities in the standard library have a shared need to be friendly to imports from the current working directory.
    • timeit uses os.curdir, but could be switched to os.getcwd()
    • pydoc uses a literal '.', but could be switched to os.getcwd()
    • trace, profile, cProfile, pdb, doctest, and IDLE's pyshell all add the directory containing the file under test

    Aside from switching pydoc from a literal '.' to os.curdir, I'm not going to change any of those (hence why I'm putting these notes here), but I wanted to capture this info in case does decide to follow through on a "less isolated than isolated mode, but still omits the current directory from sys.path" execution mode.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Sep 26, 2018

    New changeset 43500a5 by Victor Stinner in branch '3.6':
    bpo-28655: Fix test_import.test_missing_source_legacy() (GH-9589)
    43500a5

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 3.8 interpreter-core type-feature
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants