[security] CVE-2021-3426: Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem #87154

Closed
hroncok mannequin opened this issue Jan 21, 2021 · 33 comments
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-security A security issue

Comments

@hroncok
Mannequin

hroncok mannequin commented Jan 21, 2021

BPO 42988
Nosy @malemburg, @gpshead, @vstinner, @ned-deily, @ambv, @serhiy-storchaka, @JulienPalard, @hroncok, @frenzymadness, @miss-islington, @Fidget-Spinner
PRs
  • bpo-42988: Improve pydoc web server security #24285
  • bpo-42988: Fix security issue in the pydoc server #24337
  • bpo-42988: Remove the pydoc getfile feature #25015
  • [3.9] bpo-42988: Remove the pydoc getfile feature (GH-25015) #25064
  • [3.8] bpo-42988: Remove the pydoc getfile feature (GH-25015) #25065
  • [3.7] bpo-42988: Remove the pydoc getfile feature (GH-25015) #25066
  • [3.6] bpo-42988: Remove the pydoc getfile feature (GH-25015) #25067
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-03-29.19:39:54.149>
    created_at = <Date 2021-01-21.12:18:37.837>
    labels = ['type-security', '3.8', '3.9', '3.10', '3.7', 'library']
    title = '[security] CVE-2021-3426: Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem'
    updated_at = <Date 2021-03-29.19:39:54.148>
    user = 'https://github.com/hroncok'

    bugs.python.org fields:

    activity = <Date 2021-03-29.19:39:54.148>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-03-29.19:39:54.149>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2021-01-21.12:18:37.837>
    creator = 'hroncok'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42988
    keywords = ['patch']
    message_count = 30.0
    messages = ['385412', '385413', '385415', '385418', '385420', '385421', '385422', '385435', '385460', '385485', '385488', '385489', '385492', '385503', '385710', '385721', '385866', '386221', '386222', '388399', '388451', '388452', '388455', '388645', '389452', '389695', '389699', '389700', '389710', '389711']
    nosy_count = 11.0
    nosy_names = ['lemburg', 'gregory.p.smith', 'vstinner', 'ned.deily', 'lukasz.langa', 'serhiy.storchaka', 'mdk', 'hroncok', 'frenzy', 'miss-islington', 'kj']
    pr_nums = ['24285', '24337', '25015', '25064', '25065', '25066', '25067']
    priority = 'critical'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue42988'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

    @hroncok
    Mannequin Author

    hroncok mannequin commented Jan 21, 2021

    Hello Python security,
    a Fedora user has reported the following security vulnerability to us (I was able to verify it):

    Running pydoc -p allows other local users to extract arbitrary files.

    Steps to Reproduce:

    1. start pydoc on a port
    2. as a different user guess or extract the port
    3. call getfile on the server to extract arbitrary files, e.g. http://localhost:8888/getfile?key=/home/dave/.ssh/id_rsa

    Actual results:
    any local user on the multi-user system can read all my keys and secrets

    Expected results:
    Access is prevented.

    Additional info:
    At least a warning should be printed, that this is insecure on multi-user systems.

    Python notebook works around this by providing a token that is required to access the notebook. Depending on the system, being able to read arbitrary files can allow an attacker to impersonate me, e.g. by stealing my SSH key (if it is not encrypted).

    I've originally reported this to security@python.org but I was asked to open a public issue here.

    @hroncok hroncok mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-security A security issue labels Jan 21, 2021
    @JulienPalard
    Member

    Nice find! I am able to reproduce it too in many Python releases.

    I see different ways we can fix it:

    # Using a random secret generated at startup time

    The secret could be used in several ways, for example by adding an HMAC to getfile URLs so the server can verify that it generated them.

    Con: getfile URLs won't work from one run to the next (the secret should be random and regenerated at every start), and can't be shared (does anyone share them in the first place?).
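
A minimal sketch of the signed-URL idea, assuming HMAC-SHA256 and a `sig` query parameter. Neither the parameter name nor the helper functions exist in pydoc; they are hypothetical and only illustrate the mechanism.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlencode

# Hypothetical per-process secret, regenerated on every server start.
SERVER_SECRET = secrets.token_bytes(32)


def sign_path(path, secret=SERVER_SECRET):
    """Return a hex HMAC-SHA256 signature for a file path."""
    return hmac.new(secret, path.encode("utf-8"), hashlib.sha256).hexdigest()


def make_getfile_url(path):
    """Build a signed link; the 'sig' parameter is an assumption, not pydoc's API."""
    return "/getfile?" + urlencode({"key": path, "sig": sign_path(path)})


def is_valid_request(path, signature):
    """Check the signature in constant time before serving the file."""
    return hmac.compare_digest(sign_path(path), signature)
```

A request for a path the server never linked to would carry no valid signature, so it could be rejected before any file is opened.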

    # Allowlist according to sys.path

    In the getfile implementation, we can check whether the requested path is under a directory listed in sys.path.

    Con: If someone has ~/ in sys.path, their whole home directory is still accessible. The same happens if someone starts the server with python -m pydoc from their home directory, because Python prepends the cwd to sys.path.
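
A rough sketch of such a check, assuming the comparison is done on resolved paths so that `..` components and symlinks cannot escape the allowed directories. The function name and the `search_path` parameter are hypothetical.

```python
import os
import sys


def is_under_sys_path(requested, search_path=None):
    """Return True if `requested` resolves to a location inside a sys.path entry."""
    entries = sys.path if search_path is None else search_path
    # realpath collapses ".." components and follows symlinks before comparing.
    target = os.path.realpath(requested)
    for entry in entries:
        # An empty entry means the current working directory.
        root = os.path.realpath(entry or os.getcwd())
        try:
            if os.path.commonpath([root, target]) == root:
                return True
        except ValueError:
            # Raised on Windows when the paths are on different drives.
            continue
    return False
```

The con described above still applies: if the home directory is on sys.path, everything below it passes this check.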

    # Allowlist populated while generating links

    The idea is: each time the server generates a getfile link, the target is added to an allowlist.

    Each time a getfile link is requested, if the file is not in the allowlist, the request is denied.

    Con: Refreshing a page won't work after a server restart (the allowlist starts out empty).
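
This approach could look roughly like the following, assuming an in-memory set that lives for the duration of the server process. `LinkAllowlist` is a hypothetical helper, not part of pydoc.

```python
import os
from urllib.parse import quote


class LinkAllowlist:
    """Remember every path the server has linked to (hypothetical helper)."""

    def __init__(self):
        self._allowed = set()

    def make_link(self, path):
        """Record the path and return the getfile URL that points at it."""
        real = os.path.realpath(path)
        self._allowed.add(real)
        return "/getfile?key=" + quote(real)

    def is_allowed(self, requested):
        """Only paths that were previously handed out as links are served."""
        return os.path.realpath(requested) in self._allowed
```

Because the set is rebuilt from nothing at startup, a bookmarked getfile URL is refused until the page that links to it has been rendered again.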

    # fnmatch allowlist

    We could allow only .py files.

    Con: Secrets stored in .py files under a user's project could still be leaked.
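
A small sketch of a pattern-based filter, assuming the allowlist contains only `*.py`. The constant and function names are made up for illustration.

```python
import fnmatch
import os

ALLOWED_PATTERNS = ("*.py",)  # assumption: only Python sources are served


def matches_allowlist(path, patterns=ALLOWED_PATTERNS):
    """Return True if the file name matches one of the allowed patterns."""
    name = os.path.basename(path)
    # fnmatchcase avoids platform-dependent case folding.
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
```

This blocks a request for an SSH key but still lets through any `.py` file, including a project's settings module.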

    -----------------

    My personal preference goes to the allowlist populated while generating links.

    @vstinner
    Member

    Downstream Fedora issue: https://bugzilla.redhat.com/show_bug.cgi?id=1917807

    @malemburg
    Member

    Looking at the _url_handler() code in pydoc.py, this was clearly not written with web server standards in mind. None of the handlers apply security checks on the user input and there are most likely several other vulnerabilities in there to be found.

    It's not just the getfile query allowing reading arbitrary files. The user may well have code in his or her Python installation which is not meant to be published to other users on the same server.

    I'd suggest to print a big warning on the console, explaining that the web server will potentially make all content accessible by the user visible to anyone else on the same server.

    Perhaps adding some extra check to the html_getfile() handler would be good as well, making sure that the path is on sys.path and maps to a Python file (there could be non-Python file resources in package dirs as well).

    Alternatively, perhaps the whole getfile logic could be removed and the web server just provide the path to the source file (as file:// link), so that the user can easily open it, but needs access permissions for this to be successful.

    @vstinner
    Member

    I searched for "pydoc by Ka-Ping Yee" in Google and only found two online pydoc services:

    The first one runs on Python 2.7 which doesn't have the getfile feature (added in Python 3.2), the second one is a static website.

    => there is no public vulnerable website: good!

    I don't think that pydoc is commonly used to run a server on the Internet.

    @vstinner vstinner changed the title Information disclosure via pydoc -p Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem Jan 21, 2021
    @vstinner
    Member

    An option is also to remove the whole getfile feature. It was added in bpo-2001 by:

    commit 7bb30b7
    Author: Nick Coghlan <ncoghlan@gmail.com>
    Date: Fri Dec 3 09:29:11 2010 +0000

    Improve Pydoc interactive browsing (bpo-2001).  Patch by Ron Adam.
    
    * A -b option to start an enhanced browsing session.
    * Allow -b and -p options to be used together.
    * Specifying port 0 will pick an arbitrary unused socket port.
    * A new browse() function to start the new server and browser.
    * Show Python version information in the header.
    * A *Get* field which takes the same input as the help() function.
    * A *Search* field which replaces the Tkinter search box.
    * Links to *Module Index*, *Topics*, and *Keywords*.
    * Improved source file viewing.
    * An HTMLDoc.filelink() method.
    * The -g option and the gui() and serve() functions are deprecated.
    

    @vstinner
    Member

    The getfile feature is used to display the source code of a Python module.

    For example, on the difflib documentation, there is a link to difflib.py. If you click it, a webpage displays the content of the file.

    I suggest to remove the whole feature. I don't think that it's so useful, compared to the vulnerability.

    @Fidget-Spinner
    Member

    I created a PR to remove the getfile function - now it just places the hyperlinked file path there but clicking on it won't render the file contents.

    Personally I agree with Marc-Andre Lemburg's comments on how _url_handler probably has other vulnerabilities somewhere. But I don't really see an easy solution other than removing the web server altogether. It uses http.server, which has a disclaimer on the docs page saying it isn't recommended for production. Someone looking hard enough can probably find a few more vulnerabilities in http.server itself rather than just pydoc.

    I think the "Allowlist populated while generating links" suggested by Julien is pretty clever.

    I thought about the file:// approach too - it's probably the most secure. But it would require a lot of change (and also converting all the .py files to .html initially).

    Maybe I'll make a PR exploring the other approaches if the current one isn't favorable.

    Thanks for your time.

    @vstinner
    Member

    I'd suggest to print a big warning on the console, explaining that the web server will potentially make all content accessible by the user visible to anyone else on the same server.

    I dislike this idea. If there are vulnerabilities, they should be fixed. Users usually have no idea what to do when seeing such a warning.

    @malemburg
    Member

    On 22.01.2021 01:28, STINNER Victor wrote:

    STINNER Victor <vstinner@python.org> added the comment:

    > I'd suggest to print a big warning on the console, explaining that the web server will potentially make all content accessible by the user visible to anyone else on the same server.

    I dislike this idea. If there are vulnerabilities, they should be fixed. Users usually have no idea what to do when seeing such a warning.

    The problem is that neither the docs nor the help text in the command
    make it clear what exactly is exposed via the web server pydoc
    launches.

    While the getfile API endpoint can be used to view non-Python files
    as well (which is certainly not intended), the tool also makes available
    all Python modules which can be found on sys.path of the user starting
    pydoc -p. It shows all doc-strings, functions, the class structure and
    literal values of any constants found in those modules.

    In a corporate environment this can easily result in data leaks
    of e.g. unreleased software, personal information, disclosure of
    NDA protected code, designs, algorithms and other secrets.

    Fixing just getfile or replacing those links with file:// ones will
    only address one part of the problem. The other is educating the
    user about possible consequences of running a server on the machine
    -- just like you warn users about deleting files before going ahead
    with it.

    Python's http.server at least warns about this in the docs:
    https://docs.python.org/3/library/http.server.html
    and limits the serving to the current dir (and subdirs).

    My guess is that pydoc -p really is just intended to be useful
    for the current user. Rather than having it serve files under
    a blanket URL, it could restrict browsing to a random URL
    token generated at pydoc startup and open this in the browser
    via the "b" command or the -b option, e.g.

    """
    Server ready at http://localhost:8080/uLy6t87AD-ScPthd/
    Server commands: [b]rowser, [q]uit
    server>
    """

    That would make it harder to guess the base URL and limit
    exposure.
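
A minimal sketch of the token-prefixed URL scheme described above, assuming the token is generated with `secrets.token_urlsafe` and checked on every request. The names `URL_TOKEN`, `base_url` and `strip_token` are hypothetical.

```python
import hmac
import secrets

# Hypothetical per-run token; regenerated each time the server starts.
URL_TOKEN = secrets.token_urlsafe(16)


def base_url(host="localhost", port=8080):
    """URL to print at startup and open with the [b]rowser command."""
    return f"http://{host}:{port}/{URL_TOKEN}/"


def strip_token(request_path):
    """Return the path after the token prefix, or None if the token is wrong."""
    parts = request_path.split("/", 2)  # "", token, rest
    if len(parts) < 3:
        return None
    # compare_digest avoids leaking the token through timing differences.
    if not hmac.compare_digest(parts[1].encode("utf-8"), URL_TOKEN.encode("utf-8")):
        return None
    return "/" + parts[2]
```

A handler would call `strip_token` first and answer with an error whenever it returns `None`, so another local user who only knows the port cannot reach any page.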

    @serhiy-storchaka
    Member

    Why not limit the serving to sys.path?

    @vstinner
    Member

    Fidget-Spinner wrote on the PR:

    AFAIK no. However, pydoc currently works by calling inspect on files it sees in path, and this may reveal private code as Marc-Andre Lemburg pointed out on the bpo. I will try the random url token he suggested via secrets.token_urlsafe to see if it helps.

    pydoc shows global constant values in the doc. So yes, if you find a settings.py of a Django project, you can discover secrets.

    I'm working on bpo-42955 "Add sys.module_names: list of stdlib module names (Python and extension modules)".

    One option would be to restrict pydoc to stdlib modules by default, and ask users to opt in to discovery of any module installed on the system (sys.path).
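
As a sketch of the opt-in idea: Python 3.10 and later expose `sys.stdlib_module_names`, which could back such a filter. The function names and the `allow_third_party` flag below are hypothetical, and on older interpreters the fallback treats nothing as stdlib.

```python
import sys


def is_stdlib_module(name):
    """Return True if the top-level package of `name` is part of the stdlib."""
    # sys.stdlib_module_names exists on Python 3.10+; older versions get an
    # empty set here, so nothing is treated as stdlib.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return name.partition(".")[0] in stdlib


def visible_modules(names, allow_third_party=False):
    """Filter module names; third-party modules require an explicit opt-in."""
    return [n for n in names if allow_third_party or is_stdlib_module(n)]
```

With this default, a project's `settings` module would stay hidden unless the user explicitly asks for everything on sys.path.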

    @vstinner
    Member

    Python's http.server at least warns about this in the docs:
    https://docs.python.org/3/library/http.server.html
    and limits the serving to the current dir (and subdirs).

    I would be fine with a warning in the pydoc documentation, but I dislike warnings displayed on the command line. When I see such a warning, I feel that the machine considers that I'm dumb and I have no idea of what I am doing.

    If it's unsafe, can we make it safe by default?

    @Fidget-Spinner
    Member

    I have updated the PR to do the following:

    • removed html_getfile
    • implemented a unique secret as suggested above
    Now it says:
    >>> python.exe -m pydoc -b
    Server ready at http://localhost:52035/Y1YzOyEbitE9BB_dtH0YXbMgGXbcg3ytXLpvpg8P7GEM3z1hlCkTXgxaojtAordVqs2s6oHZHPMbXqq9mXq_wbJCVW8jnHrgQeYE5hFUQuI/

    FWIW, it seems that Jupyter notebook server deals with the same problems in a similar manner: https://jupyter-notebook.readthedocs.io/en/stable/security.html#security-in-the-jupyter-notebook-server

    I removed the warning message in the PR because I think this is secure enough.

    @serhiy-storchaka
    Member

    PR 24337 uses a different approach. It keeps compatibility, but checks that the argument is the file path of the source of one of the modules (using the same algorithm as /search).
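
The general shape of such a check might look like the sketch below. It is not the code from PR 24337: it assumes `pkgutil.iter_modules()` and `importlib.util.find_spec()` as a stand-in for the module scan, and it only covers top-level modules.

```python
import importlib.util
import os
import pkgutil


def module_source_paths(names=None):
    """Collect source file paths of importable top-level modules."""
    if names is None:
        # pkgutil.iter_modules() lists top-level modules found on sys.path.
        names = [info.name for info in pkgutil.iter_modules()]
    paths = set()
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except Exception:
            continue  # broken or unusual modules are simply skipped
        if spec is not None and spec.has_location and spec.origin:
            if spec.origin.endswith(".py"):
                paths.add(os.path.realpath(spec.origin))
    return paths


def is_module_source(requested, names=None):
    """Serve a path only if it is the source file of a known module."""
    return os.path.realpath(requested) in module_source_paths(names)
```

Arbitrary files such as SSH keys fail this test, while every module source on sys.path still passes it.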

    @Fidget-Spinner
    Member

    @serhiy,

    While this approach solves the getfile problem, I don't think this will solve the other problem of pydoc leaking secrets stored in python files:

    Quoting from Marc-Andre Lemburg's message:

    the tool also makes available all Python modules which can be found on sys.path of the user starting pydoc -p. It shows all doc-strings, functions, the class structure and literal values of any constants found in those modules.
    In a corporate environment this can easily result in data leaks of e.g. unreleased software, personal information, disclosure of NDA protected code, designs, algorithms and other secrets.

    Quoting from Victor's messages:

    pydoc shows global constant values in the doc. So yes, if you find a settings.py of a Django project, you can discover secrets.

    Ultimately, the problem seems to be that .py files (other than those in the stdlib) may contain sensitive info, which pydoc can read.

    @ned-deily
    Member

    Resolution of this issue is blocking 3.7.x and 3.6.x security releases and threatens to block upcoming maintenance releases.

    @ned-deily ned-deily changed the title Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem [security] Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem Jan 28, 2021
    @vstinner
    Member

    vstinner commented Feb 3, 2021

    While this vulnerability is bad, it only impacts the very few users who run the pydoc server. I suggest not holding the upcoming Python release (remove the "release blocker" priority) just for this one. If it's fixed before: great! But IMO it can wait for another Python release.

    @hroncok
    Mannequin Author

    hroncok mannequin commented Feb 3, 2021

    I agree.

    @hroncok
    Mannequin Author

    hroncok mannequin commented Mar 10, 2021

    Todd Cullum from Red Hat Security team:

    "I don't have an account on Python's tracker, would you mind forwarding to upstream on my behalf that this is not only locally exploitable, but it can be exploited by actors on the adjacent network as well because 6a396c9 was introduced in Python 3.7.0 alpha 1. I just used the -n option and got to read some of my own files using my cell phone on the WiFi. It does require the port to be unblocked by firewall though."

    @hroncok
    Mannequin Author

    hroncok mannequin commented Mar 10, 2021

    This is now CVE-2021-3426.

    @hroncok hroncok mannequin changed the title [security] Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem [security] CVE-2021-3426: Information disclosure via pydoc -p: /getfile?key=path allows to read arbitrary file on the filesystem Mar 10, 2021
    @vstinner
    Member

    I created https://python-security.readthedocs.io/vuln/pydoc-getfile.html to track this vulnerability. There is no CVE section yet since the CVE is currently only *RESERVED*.

    @vstinner
    Member

    Fedora downstream issue: https://bugzilla.redhat.com/show_bug.cgi?id=1937476

    @gpshead
    Member

    gpshead commented Mar 14, 2021

    FWIW, I don't think we should even have a server feature in pydoc...

    @vstinner
    Member

    vstinner commented Mar 24, 2021

    The "pydoc -p port" command only listens on the local link ("localhost") by default, even if it's possible to listen on another IPv4 address using the -n HOSTNAME command line option.

    While the "getfile" feature is convenient when the pydoc server is accessed from a different machine, I don't think that it's worth it, compared to the security risks and the complexity of PR 24285 and PR 24337 fixes.

    I propose to simply remove the "getfile" feature with PR 25015, but keep links using file:// scheme. So we delegate the security to the web browser. If the web browser is allowed to read sensitive data of a local Python file, it's not our problem: pydoc doesn't make things worse.
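
A short sketch of what generating such a link could look like, assuming `pathlib.Path.as_uri()` is used to build the URL. `file_link` is a hypothetical helper and not the code from PR 25015.

```python
import html
import pathlib


def file_link(path):
    """Render a source path as a file:// hyperlink instead of a getfile URL."""
    # as_uri() requires an absolute path, hence resolve().
    uri = pathlib.Path(path).resolve().as_uri()
    return '<a href="{}">{}</a>'.format(html.escape(uri), html.escape(str(path)))
```

The server never reads the file itself here; whether the link opens depends on the browser and on the file permissions of the user clicking it.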

    @vstinner
    Member

    New changeset 9b99947 by Victor Stinner in branch 'master':
    bpo-42988: Remove the pydoc getfile feature (GH-25015)
    9b99947

    @miss-islington
    Contributor

    New changeset 7e38d33 by Miss Islington (bot) in branch '3.8':
    bpo-42988: Remove the pydoc getfile feature (GH-25015)
    7e38d33

    @miss-islington
    Contributor

    New changeset ed753d9 by Miss Islington (bot) in branch '3.9':
    bpo-42988: Remove the pydoc getfile feature (GH-25015)
    ed753d9

    @ned-deily
    Member

    New changeset 7c2284f by Miss Islington (bot) in branch '3.7':
    bpo-42988: Remove the pydoc getfile feature (GH-25015) (GH-25066)
    7c2284f

    @ned-deily
    Member

    New changeset 5b1e502 by Miss Islington (bot) in branch '3.6':
    bpo-42988: Remove the pydoc getfile feature (GH-25015) (GH-25067)
    5b1e502

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @apetro76

    Hello Python security, a Fedora user has reported the following security vulnerability to us (I was able to verify it):

    Running pydoc -p allows other local users to extract arbitrary files.

    Steps to Reproduce:

    1. start pydoc on a port
    2. as a different user guess or extract the port
    3. call getfile on the server to extract arbitrary files, e.g. http://localhost:8888/getfile?key=/home/dave/.ssh/id_rsa

    Actual results: any local user on the multi-user system can read all my keys and secrets

    Expected results: Access is prevented.

    Additional info: At least a warning should be printed, that this is insecure on multi-user systems.

    Python notebook works around this by providing a token that is required to access the notebook. Depending on the system, being able to read arbitrary files can allow an attacker to impersonate me, e.g. by stealing my SSH key (if it is not encrypted).

    I've originally reported this to security@python.org but I was asked to open a public issue here.

    Curious: what is actually being called out as a vulnerability here? The fact that pydoc doesn't enforce authentication? This seems to work the same way as any other service or application: when you access it, you are granted the permissions of the account running it when it interacts with the operating system. I can think of dozens of applications and protocols that would be subject to this same finding. For example, running python -m http.server 80 . from a home directory does the same thing (it still works even on version 3.11). Does that mean that Python 3.11 is broken?

    @gpshead
    Member

    gpshead commented Aug 28, 2023

    It's a matter of meeting expectations for users of a tool so that they know when they're choosing to expose themselves to risks.

    Python's standard library http.server module is documented as not suitable for production use, i.e. the user chooses to run it despite being warned. That isn't a flaw, that is their choice.

    pydoc's former built in web server could reasonably have been expected not to serve other data, as that isn't what the user asked for or would ever assume it would allow. Thus it was reasonable to consider it doing so a vulnerability.

    @apetro76

    Thanks for the explanation, that does make a bit more sense. Personally I am not familiar with pydoc, so when reading the vulnerability through the first time it sounded no different than if nginx, Apache or IIS was misconfigured, or running a TFTP, FTP, or SMB share that disclosed sensitive documents, or for that matter any other production application that is used for file sharing. I appreciate the explanation and will chalk it up to my ignorance of pydoc and how it's used.
