Add sys.stdlib_module_names: list of stdlib module names (Python and extension modules) #87121

Closed
vstinner opened this issue Jan 18, 2021 · 33 comments
Labels
3.10 stdlib

Comments

@vstinner
Copy link
Member

@vstinner vstinner commented Jan 18, 2021

BPO 42955
Nosy @ronaldoussoren, @vstinner, @aroberge, @serhiy-storchaka, @SylvainDe, @corona10, @shihai1991
PRs
  • #24238
  • #24254
  • #24258
  • #24329
  • #24332
  • #24353
  • #25122
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-01-25.12:30:06.176>
    created_at = <Date 2021-01-18.10:08:34.048>
    labels = ['library', '3.10']
    title = 'Add sys.stdlib_module_names: list of stdlib module names (Python and extension modules)'
    updated_at = <Date 2021-04-01.00:28:36.061>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-04-01.00:28:36.061>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-01-25.12:30:06.176>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2021-01-18.10:08:34.048>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42955
    keywords = ['patch']
    message_count = 33.0
    messages = ['385180', '385181', '385186', '385194', '385197', '385200', '385201', '385208', '385250', '385251', '385252', '385253', '385254', '385255', '385270', '385272', '385273', '385300', '385310', '385311', '385312', '385317', '385350', '385364', '385493', '385623', '385624', '385625', '385670', '385671', '385817', '388419', '389942']
    nosy_count = 7.0
    nosy_names = ['ronaldoussoren', 'vstinner', 'aroberge', 'serhiy.storchaka', 'SylvainDe', 'corona10', 'shihai1991']
    pr_nums = ['24238', '24254', '24258', '24329', '24332', '24353', '25122']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue42955'
    versions = ['Python 3.10']

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 18, 2021

    Some use cases require knowing whether a module comes from the stdlib. For example, in bpo-42923 ("Py_FatalError(): dump the list of extension modules") I would like to dump only the extension modules which don't come from the stdlib.

    Stdlib modules are special. For example, the maintenance and updates are connected to the Python lifecycle. Stdlib modules cannot be updated with "pip install --upgrade". They are shipped with the system ("system" Python). They are usually "read only": on Unix, only the root user can write into /usr directory where the stdlib is installed, whereas modules installed with "pip install --user" can be modified by the current user.

    There is a third party project on PyPI which contains the list of stdlib modules:
    https://pypi.org/project/stdlib-list/

    There is already sys.builtin_module_names:
    "A tuple of strings giving the names of all modules that are compiled into this Python interpreter."
    https://docs.python.org/dev/library/sys.html#sys.builtin_module_names

    I propose to add a similar sys.module_names tuple of strings (module names).

    There are different constraints:

    • If we add a public sys attribute, users will likely expect the list to be (1) exhaustive (2) correct
    • Some extensions are not built if there are missing dependencies. Dependencies are checked after Python "core" (the sys module) is built.
    • Some extensions are not available on some platforms.
    • This list should be maintained.

    Should we only list top level packages, or also submodules? For example, only list "asyncio", or list the 31 submodules (asyncio.base_events, asyncio.futures, ...)? Maybe it can be decided on a case by case basis. For example, I consider that "os.path" is a stdlib module, even if it's just an alias to "posixpath" or "ntpath" depending on the platform.

    I propose to include all extensions in the list, even if they are not built/available on some platforms. For example, "winsound" would also be listed on Linux, even if the extension is specific to Windows.

    I also propose to include stdlib module names even if they are overridden at runtime using PYTHONPATH with a different implementation. For example, "asyncio" would be in the list, even if a user creates an "asyncio.py" file. The list would not depend on sys.path.

    --

    Another option is to add an attribute to modules to mark them as coming from the stdlib. The API would be an attribute: module.__stdlib__ (bool).

    The attribute could be set directly in the module code. For example, add "__stdlib__ = True" in Python modules. Similar idea for C extension modules.

    Or the attribute could be set after importing the module, in the import site. But we don't control how stdlib modules are imported.

    --

    For the specific case of bpo-42923, another option is to use a list of stdlib paths, and check module.__file__ to check if a module is a stdlib module, and also use sys.builtin_module_names. And so don't add any API.

    @vstinner vstinner added 3.10 stdlib labels Jan 18, 2021
    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 18, 2021

    On Fedora 33, the stdlib lives in two main directories:

    • /usr/lib64/python3.9: Python modules
    • /usr/lib64/python3.9/lib-dynload: C extension modules

    Example:

    >>> import os.path
    >>> import _asyncio
    >>> os.path.dirname(os.__file__)  # Python module
    '/usr/lib64/python3.9'
    >>> os.path.dirname(_asyncio.__file__)  # C extension module
    '/usr/lib64/python3.9/lib-dynload'

    The Python stdlib path can be retrieved with:

    >>> import sysconfig; sysconfig.get_paths()['stdlib']
    '/usr/lib64/python3.9'

    But I'm not sure how to retrieve /usr/lib64/python3.9/lib-dynload path. "platstdlib" is not what I expect:

    >>> import sysconfig; sysconfig.get_paths()['platstdlib']
    '/usr/lib64/python3.9'

    I found DESTSHARED in the Makefile:

    >>> sysconfig.get_config_var('DESTSHARED')
    '/usr/lib64/python3.9/lib-dynload'
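The path-based option mentioned above (check module.__file__ against the stdlib directories, plus sys.builtin_module_names) could look roughly like the sketch below. The helper names stdlib_dirs() and is_stdlib_module() are made up for illustration, and skipping "site-packages" and "dist-packages" is an assumption about common layouts, not something the thread specifies.

```python
import sys
import sysconfig
from pathlib import Path


def stdlib_dirs():
    """Return the directories that hold the stdlib on this build (hypothetical helper)."""
    paths = sysconfig.get_paths()
    candidates = [paths["stdlib"], paths["platstdlib"]]
    # DESTSHARED is a Makefile variable: it is None on builds without a
    # Makefile (for example on Windows), so it is filtered out below.
    candidates.append(sysconfig.get_config_var("DESTSHARED"))
    return [Path(p).resolve() for p in candidates if p]


def is_stdlib_module(module):
    """Guess whether an already imported module object comes from the stdlib."""
    if module.__name__ in sys.builtin_module_names:
        return True
    filename = getattr(module, "__file__", None)
    if not filename:
        return False
    path = Path(filename).resolve()
    for base in stdlib_dirs():
        if base in path.parents:
            # Assumption: third party code installed below the stdlib
            # directory lives in site-packages or dist-packages.
            first = path.relative_to(base).parts[0]
            if first not in ("site-packages", "dist-packages"):
                return True
    return False
```

This only works on module objects that are already imported, and it depends on the installation layout, which is one reason a plain list of names is attractive.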

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Jan 18, 2021

    There is no technical difference between stdlib modules and other modules. Stdlib modules are only special in context of copyright and responsibility.
    What if we set __author__ = 'PSF' in every stdlib module? Or other attributes which would allow to group modules by origin. Perhaps other authors would use that feature too.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 18, 2021

    I wrote PR 24238 which implements a hardcoded list of stdlib module names. The PR uses the CI to ensure that the list always remains up to date. If tomorrow a new module is added, a CI job fails ("file not up to date").

    > What if we set __author__ = 'PSF' in every stdlib module? Or other attributes which would allow to group modules by origin. Perhaps other authors would use that feature too.

    Hum, that would go against my bpo-42923 use case.

    Also for bpo-42923, I would prefer a tuple of strings ;-)

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 18, 2021

    Use cases for sys.module_names:

    ---

    The isort project uses the following script to generate the list of stdlib modules:
    https://github.com/PyCQA/isort/blob/develop/scripts/mkstdlibs.py

    The script uses sphinx.ext.intersphinx.fetch_inventory(...)["py:module"]. This API uses objects.inv from the online Python documentation. Example of Python 3.9:

    https://docs.python.org/3.9/objects.inv

    On the "dev" version, mkstdlibs.py lists 211 modules.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 18, 2021

    > (...) This API uses objects.inv from the online Python documentation.

    By the way, there is also this documentation page listing Python stdlib modules:
    https://docs.python.org/dev/py-modindex.html

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 18, 2021

    Another way to gather the list of Python modules: pydoc help("modules") uses pkgutil.walk_packages().

    pydoc uses ModuleScanner which uses sys.builtin_module_names and pkgutil.walk_packages().

    pkgutil.walk_packages() calls pkgutil.iter_modules() which iterates on sys.meta_path and sys.path, and calls iter_modules() on each importer.

    For FileImporter, it iterates on os.listdir() on the importer path.

    For zipimporter, it iterates on zipimport._zip_directory_cache[importer.archive].
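As a rough illustration of the pkgutil approach, the sketch below lists the top level names that pkgutil.iter_modules() finds in the pure Python stdlib directory. The function name is made up for this example. It only sees what is installed on the current system, and it misses built-in modules and extensions stored in another directory such as lib-dynload.

```python
import pkgutil
import sysconfig


def list_stdlib_python_modules():
    """Return top level module names found in the stdlib directory (sketch)."""
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    # iter_modules() yields ModuleInfo(module_finder, name, ispkg) tuples
    # for each module or package found directly in the given directories.
    return {info.name for info in pkgutil.iter_modules([stdlib_dir])}


if __name__ == "__main__":
    names = list_stdlib_python_modules()
    print(len(names), "modules found in the stdlib directory")
```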

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 18, 2021

    Note for myself: regen-keyword should be included in "make regen-all".

    @ronaldoussoren
    Copy link
    Contributor

    @ronaldoussoren ronaldoussoren commented Jan 19, 2021

    A list of stdlib modules/extensions is IMHO problematic for maintenance, esp. if you consider that not all modules/extensions are installed on all systems (both because dependencies aren't present and because packagers have decided to unbundle parts of the stdlib).

    Wouldn't it be sufficient to somehow mark the stdlib entries on sys.path? Although that might give misleading answers with tools like pyinstaller/py2exe/py2app that package an application and its dependencies into a single zipfile.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 19, 2021

    Ronald Oussoren:

    A list of stdlib modules/extensions is IMHO problematic for maintenance, esp. if you consider that not all modules/extensions are installed on all systems (both because dependencies aren't present and because packagers have decided to unbundle parts of the stdlib).

    My PR 24238 adds a list of module names (tuple of str): sys.module_names. It includes modules which are not available on the platform. For example, "winsound" is listed on Linux but not available, and "tkinter" is listed even if the extension could not be built (missing dependency).

    I tried to make it clear in the sys.module_names documentation (see my PR).

    Making the list depend on whether each module is built would cause other issues. For example, for the "isort" (sort and group imports) use case, you want to know on Linux if "import winsound" is a stdlib import or a third party import (same for Linux-only modules on Windows). Moreover, there is a practical issue: extension modules are only built by setup.py *after* the sys module is built. I don't want to rebuild the sys module if building an extension failed.

    Even if a module is available (listed in sys.module_names and the file is present on disk), it doesn't mean that "it works". For example, "import multiprocessing" fails if there is no working lock implementation. Other modules have similar issues.

    > Wouldn't it be sufficient to somehow mark the stdlib entries on sys.path? Although that might give misleading answers with tools like pyinstaller/py2exe/py2app that package an application and its dependencies into a single zipfile.

    Having to actually import modules to check if it's a stdlib module or not is not convenient. Many stdlib modules have side effects on import. For example, "import antigravity" opens a web browser. An import can open files, spawn threads, run programs, etc.

    @ronaldoussoren
    Copy link
    Contributor

    @ronaldoussoren ronaldoussoren commented Jan 19, 2021

    > Wouldn't it be sufficient to somehow mark the stdlib entries on sys.path? Although that might give misleading answers with tools like pyinstaller/py2exe/py2app that package an application and its dependencies into a single zipfile.

    Having to actually import modules to check if it's a stdlib module or not is not convenient. Many stdlib modules have side effects on import. For example, "import antigravity" opens a web browser. An import can open files, spawn threads, run programs, etc.

    You wouldn't necessarily have to import a module to test, this is something that could be added to importlib. One (poorly thought out) option is to add sys._stdlib_path with the subsection of sys.path that contains the stdlib, and a function in importlib that returns if a spec is for a stdlib module.

    The disadvantage of this, or for the most part anything but your initial proposal, is that it might not be safe to use a function in importlib in Py_FatalError.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 19, 2021

    Myself:

    Having to actually import modules to check if it's a stdlib module or not is not convenient. Many stdlib modules have side effects on import.

    The "trace" use case needs an exhaustive list of all module names. It is even less convenient to have to list all Python modules available on the system only to check modules coming from the stdlib.

    Ronald:

    You wouldn't necessarily have to import a module to test, this is something that could be added to importlib. One (poorly thought out) option is to add sys._stdlib_path with the subsection of sys.path that contains the stdlib, and a function in importlib that returns if a spec is for a stdlib module.

    I'm not sure how it would work. I listed different cases which have different constraints:

    • From a module name, check if it's part of the stdlib or not
    • From a module object, check if it's part of the stdlib or not

    For the test on the module name, how would it work with sys._stdlib_path? Should you import the module and then check if its path comes from sys._stdlib_path?

    Ronald:

    The disadvantage of this, or for the most part anything but your initial proposal, is that it might not be safe to use a function in importlib in Py_FatalError.

    PR 24254 is a working implementation of my use case: only list third party extension modules on a Python fatal error. It relies on PR 24238 sys.module_names list. The implementation works when called from a signal handler (when faulthandler catches fatal signals like SIGSEGV), it avoids memory allocations on the heap (one of the limits of a signal handler).

    @ronaldoussoren
    Copy link
    Contributor

    @ronaldoussoren ronaldoussoren commented Jan 19, 2021

    On 19 Jan 2021, at 12:30, STINNER Victor <report@bugs.python.org> wrote:

    Ronald:
    > You wouldn't necessarily have to import a module to test, this is something that could be added to importlib. One (poorly thought out) option is to add sys._stdlib_path with the subsection of sys.path that contains the stdlib, and a function in importlib that returns if a spec is for a stdlib module.

    I'm not sure how it would work. I listed different cases which have different constraints:

    • From a module name, check if it's part of the stdlib or not
    • From a module object, check if it's part of the stdlib or not

    For the test on the module name, how would it work with sys._stdlib_path? Should you import the module and then check if its path comes from sys._stdlib_path?

    For a module name use importlib.util.find_spec() to locate the module (or toplevel package if this is a module in package). The new importlib function could then use the spec and sys._stdlib_path to check if the spec is one for a stdlib module. This is pretty handwavy, but I do something similar in py2app (but completely based on paths calculated outside of the import machinery).

    For a module object you can extract the spec from the object and use the same function.

    Ronald:
    > The disadvantage of this, or for the most part anything but your initial proposal, is that it might not be safe to use a function in importlib in Py_FatalError.

    PR 24254 is a working implementation of my use case: only list third party extension modules on a Python fatal error. It relies on PR 24238 sys.module_names list. The implementation works when called from a signal handler (when faulthandler catches fatal signals like SIGSEGV), it avoids memory allocations on the heap (one of the limits of a signal handler).

    I think we agree on that point: my counter proposal won’t work in the faulthandler scenario, and may be problematic in the Py_FatalError case as well.

    Ronald
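The spec-based idea above could be sketched as follows. Neither sys._stdlib_path nor an importlib helper exists, so _stdlib_path() and spec_is_stdlib() below are hypothetical stand-ins built on sysconfig. Treating every "frozen" spec as stdlib and skipping "site-packages" and "dist-packages" are simplifying assumptions.

```python
import importlib.util
import sysconfig
from pathlib import Path


def _stdlib_path():
    """Hypothetical stand-in for the proposed sys._stdlib_path."""
    paths = sysconfig.get_paths()
    candidates = [paths["stdlib"], paths["platstdlib"],
                  sysconfig.get_config_var("DESTSHARED")]
    return [Path(p).resolve() for p in candidates if p]


def spec_is_stdlib(spec):
    """Guess from a module spec whether the module belongs to the stdlib."""
    if spec is None or spec.origin is None:
        return False
    if spec.origin in ("built-in", "frozen"):
        return True
    origin = Path(spec.origin).resolve()
    for base in _stdlib_path():
        if base in origin.parents:
            first = origin.relative_to(base).parts[0]
            if first not in ("site-packages", "dist-packages"):
                return True
    return False


def name_is_stdlib(name):
    """Check a module name: locate the top level package without importing it."""
    top_level = name.partition(".")[0]
    return spec_is_stdlib(importlib.util.find_spec(top_level))
```

Unlike a static list of names, this answers for the module that would really be imported with the current sys.path, at the cost of running the import machinery.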

    @ronaldoussoren
    Copy link
    Contributor

    @ronaldoussoren ronaldoussoren commented Jan 19, 2021

    BTW. A list of stdlib module names is not sufficient to determine if a module is in the stdlib, thanks to .pth files it is possible to have entries on sys.path before the stdlib.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 19, 2021

    Ronald:

    BTW. A list of stdlib module names is not sufficient to determine if a module is in the stdlib, thanks to .pth files it is possible to have entries on sys.path before the stdlib.

    Yeah, I wrote it in my first message. I solved this issue with documentation:

    "Note: If a third party module has the same name than a standard library module and it comes before the standard library in sys.path, it overrides the standard library module on import."

    I updated sys.module_names documentation in my PR.

    IMO it's an acceptable limitation.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 19, 2021

    Ronald:

    I think we agree on that point: my counter proposal won’t work in the faulthandler scenario, and may be problematic in the Py_FatalError case as well.

    The API providing a tuple of str (sys.module_names) works with the 4 use cases that I listed:

    • faulthandler/Py_FatalError (dump third party extensions of sys.modules)
    • isort (group stdlib imports)
    • trace (don't trace stdlib modules)
    • pypt (ignore stdlib modules when computing dependencies)

    Ronald:

    Although that might give misleading answers with tools like pyinstaller/py2exe/py2app that package an application and its dependencies into a single zipfile.

    These tools worked for years without sys.module_names and don't need to be modified to use sys.module_names.

    sys.module_names is well defined, extract of its doc:

    "The list is the same on all platforms. Modules which are not available on some platforms and modules disabled at Python build are also listed."

    These packaging tools may require further checks than just checking if the name is in sys.module_names. These tools are complex anyway ;-)

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 19, 2021

    > One (poorly thought out) option is to add sys._stdlib_path with the subsection of sys.path that contains the stdlib, and a function in importlib that returns if a spec is for a stdlib module.

    There is already sysconfig.get_paths()['stdlib']. Maybe we need to add a new key for extension modules.

    I don't think that these two options are exclusive. For me, it can even be a similar but different use case.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 19, 2021

    New changeset cad8020 by Victor Stinner in branch 'master':
    bpo-42955: Add Python/module_names.h (GH-24258)
    cad8020

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 20, 2021

    sys.module_names also solves the following old StackOverflow question:
    "How to check if a module/library/package is part of the python standard library?"
    https://stackoverflow.com/questions/22195382/how-to-check-if-a-module-library-package-is-part-of-the-python-standard-library

    "I have installed sooo many libraries/modules/packages with pip and now I cannot differentiate which is native to the python standard library and which is not. This causes problem when my code works on my machine but it doesn't work anywhere else."

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 20, 2021

    The pants project also uses a list of stdlib module names, similar to isort, to infer dependencies:
    https://github.com/pantsbuild/pants/tree/master/src/python/pants/backend/python/dependency_inference/python_stdlib

    "Pants is a scalable build system for monorepos: codebases containing multiple projects, often using multiple programming languages and frameworks, in a single unified code repository."

    See https://blog.pantsbuild.org/dependency-inference/

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 20, 2021

    Yet another use case: the friendly-traceback project detects typos on imports using the list of stdlib module names:
    https://aroberge.github.io/friendly-traceback-docs/docs/html/tracebacks_en_3.8.html#standard-library-module

    For example, it suggests replacing "import Tkinter" with "import tkinter".

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 20, 2021

    Another use case similar to the "isort" use case: a vim plugin that inserts an import checks whether the module name is a stdlib module. It checks the module path (for Python stdlib modules) and uses a hardcoded list of builtin stdlib modules.

    https://github.com/mgedmin/python-imports.vim

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 20, 2021

    Another example: the unimport project ("linter, formatter for finding and removing unused import statements") uses the following function to guess whether a module name is part of the stdlib:

    def is_std(package: str) -> bool:
        """Returns True if package module came with from Python."""
    
        if package in C.BUILTIN_MODULE_NAMES:
            return True
        spec = get_spec(package)
        if spec and isinstance(spec.origin, str):
            return any(
                (
                    spec.origin.startswith(C.STDLIB_PATH),
                    spec.origin in ["built-in", "frozen"],
                    spec.origin.endswith(".so"),
                )
            )
        else:
            return False

    https://github.com/hakancelik96/unimport/blob/c9bd5de99bd8a5239d3dee2c3ff633979bb3ead2/unimport/utils.py#L61-L77

    @SylvainDe
    Copy link
    Mannequin

    @SylvainDe SylvainDe mannequin commented Jan 20, 2021

    For similar reasons as friendly-traceback, I'd be interested in a list of stdlib modules to be able to provide additional information in case of exceptions.

    For instance:

    string.ascii_lowercase
    Before: NameError("name 'string' is not defined",)
    After: NameError("name 'string' is not defined. Did you mean to import string first?",)

    from maths import pi
    Before: ImportError('No module named maths',)
    After: ImportError("No module named maths. Did you mean 'math'?",)

    choice
    Before: NameError("name 'choice' is not defined",)
    After: NameError("name 'choice' is not defined. Did you mean 'choice' from random (not imported)?",)

    from itertools import pi
    Before: ImportError('cannot import name pi',)
    After: ImportError("cannot import name pi. Did you mean 'from math import pi'?",)

    The first 2 cases only use the module name but the last 2 cases will actually import the modules to get the names in it and I want to do this for modules which are safe to import (no side-effect expected).

    Source: https://github.com/SylvainDe/DidYouMean-Python/blob/master/didyoumean/didyoumean_internal.py#L30 (but if you are interested in that kind of features, I'd recommend using friendly-traceback instead)

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 22, 2021

    Another potential use case: restrict pydoc web server to stdlib modules, see bpo-42988.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 25, 2021

    New changeset db584bd by Victor Stinner in branch 'master':
    bpo-42955: Add sys.modules_names (GH-24238)
    db584bd

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 25, 2021

    I merged my PR, thanks for the feedback and reviews.

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 25, 2021

    New changeset 4833591 by Victor Stinner in branch 'master':
    bpo-42955: Fix sys.module_names doc (GH-24329)
    4833591

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 25, 2021

    New changeset 9852cb3 by Victor Stinner in branch 'master':
    bpo-42955: Rename module_names to sys.stdlib_module_names (GH-24332)
    9852cb3

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 25, 2021

    Update: attribute renamed to sys.stdlib_module_names.

    @vstinner vstinner changed the title Add sys.module_names: list of stdlib module names (Python and extension modules) Add sys.stdlib_module_names: list of stdlib module names (Python and extension modules) Jan 25, 2021
    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Jan 27, 2021

    New changeset 64fc105 by Victor Stinner in branch 'master':
    bpo-42955: Remove sub-packages from sys.stdlib_module_names (GH-24353)
    64fc105

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Mar 10, 2021

    I also created a thread on python-dev to discuss this new feature:
    https://mail.python.org/archives/list/python-dev@python.org/thread/BTX7SH2CR66QCLER2EXAK2GOUAH2U4CL/

    @vstinner
    Copy link
    Member Author

    @vstinner vstinner commented Apr 1, 2021

    New changeset ad493ed by Victor Stinner in branch 'master':
    bpo-42955: Add _overlapped to sys.stdlib_module_names (GH-25122)
    ad493ed

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022