Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"import random" should import hashlib on demand (nor load OpenSSL) #80740

Closed
vstinner opened this issue Apr 8, 2019 · 15 comments
Closed

"import random" should import hashlib on demand (nor load OpenSSL) #80740

vstinner opened this issue Apr 8, 2019 · 15 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir

Comments

@vstinner
Copy link
Member

vstinner commented Apr 8, 2019

BPO 36559
Nosy @brettcannon, @rhettinger, @vstinner, @tiran, @methane, @xdegaye
PRs
  • bpo-36559: random: import hashlib on demand #12728
  • bpo-36559: random module: optimize sha512 import #12742
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-04-10.20:18:43.217>
    created_at = <Date 2019-04-08.13:11:01.718>
    labels = ['3.8', 'library']
    title = '"import random" should import hashlib on demand (nor load OpenSSL)'
    updated_at = <Date 2019-04-10.20:18:43.216>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-04-10.20:18:43.216>
    actor = 'rhettinger'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-04-10.20:18:43.217>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2019-04-08.13:11:01.718>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36559
    keywords = ['patch']
    message_count = 15.0
    messages = ['339637', '339638', '339652', '339666', '339673', '339674', '339689', '339690', '339720', '339759', '339763', '339791', '339822', '339834', '339887']
    nosy_count = 6.0
    nosy_names = ['brett.cannon', 'rhettinger', 'vstinner', 'christian.heimes', 'methane', 'xdegaye']
    pr_nums = ['12728', '12742']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue36559'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 8, 2019

    Currently, when the random module is imported, the hashlib module is always imported which loads the OpenSSL library, whereas hashlib is only needed when a Random() instance is created with a string seed.

    For example, "rnd = random.Random()" and "rnd = random.Random(12345)" don't need hashlib.

    Example on Linux:

    $ python3
    Python 3.7.2 (default, Mar 21 2019, 10:09:12) 
    >>> import os, sys
    
    >>> 'hashlib' in sys.modules
    False
    >>> res=os.system(f"grep ssl /proc/{os.getpid()}/maps")
    
    >>> import random
    >>> 'hashlib' in sys.modules
    True
    >>> res=os.system(f"grep ssl /proc/{os.getpid()}/maps")
    7f463ec38000-7f463ec55000 r--p 00000000 00:2a 5791335                    /usr/lib64/libssl.so.1.1.1b
    7f463ec55000-7f463eca5000 r-xp 0001d000 00:2a 5791335                    /usr/lib64/libssl.so.1.1.1b
    ...

    Attached PR only imports hashlib on demand.

    Note: I noticed this issue while working on adding OpenSSL 1.1.1 support to Python 3.4 :-)

    @vstinner vstinner added 3.8 only security fixes stdlib Python modules in the Lib dir labels Apr 8, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 8, 2019

    In the past, some developers complained when an import has been removed in a stdlib module. I vaguely recall code using "import os" to get the "errno" module from "os.errno". That's "What's New in Python 3.7" contains:

    "Several undocumented internal imports were removed. One example is that os.errno is no longer available; use import errno directly instead. Note that such undocumented internal imports may be removed any time without notice, even in micro version releases."

    For this reason, I don't think that stable versions (2.7 and 3.7) should be modified.

    @rhettinger
    Copy link
    Contributor

    Why do we care about this particular import? It doesn't see slow in any way. In general, we don't do deferred imports unless there is a compelling reason (i.e. it is very slow or it is sometimes unavailable). Otherwise, it is a false optimization.

    @brettcannon
    Copy link
    Member

    Could you explain a bit more, Victor, about why you want to avoid importing hashlib and OpenSSL so much?

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 8, 2019

    Raymond:

    In general, we don't do deferred imports unless there is a compelling reason (i.e. it is very slow or it is sometimes unavailable).

    While I was working on adding OpenSSL 1.1.1 to Python 3.4, my _hashopenssl module was broken. In that case, "import random" fails with ImportError because of hashlib failures. I was surprised, since random doesn't need hashlib at startup.

    I would like to be able to use "import random" even if hashlib is broken. For me, random is a key component, whereas I see hashlib more as optional. (Even if in practice, it should always be available).

    Raymond:

    Otherwise, it is a false optimization.

    Brett:

    Could you explain a bit more, Victor, about why you want to avoid importing hashlib and OpenSSL so much?

    Well, OpenSSL is not a random tiny library. For example, loading it increases Python RSS of around 2.7 MiB.

    Example with script x.py:
    ---

    import os
    os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")
    import random
    os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")

    Output without the change on Fedora 29:

    VmRSS: 7396 kB
    VmRSS: 11796 kB # +4.4 MiB

    With the change:

    VmRSS: 7272 kB
    VmRSS: 8988 kB # +1.7 MiB

    Another example is that OpenSSL loads the libz.so dynamic library by dependency.

    I would prefer to minimize Python footprint if possible.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 8, 2019

    Raymond:

    Why do we care about this particular import? It doesn't see slow in any way.

    I ran a benchmark:

    $ ./python -m perf command -o ref.json -v -- ./python -c 'import random'
    $ # apply patch
    $ ./python -m perf command -o patch.json -v -- ./python -c 'import random'
    $ ./python -m perf compare_to ref.json patch.json 
    Mean +- std dev: [ref] 13.8 ms +- 0.2 ms -> [patch] 11.8 ms +- 0.2 ms: 1.18x faster (-15%)

    My PR 12728 makes Python startup 2 ms faster which I consider as quite significant on 13.8 ms.

    I know that some users are fighting to get a faster Python startup. Mercurial is just one example ;-)

    @rhettinger
    Copy link
    Contributor

    In general, deferred imports are code smell that should avoided unless really necessary. They create an on-going maintenance burden (there's a reason most modules don't do this and put their imports at the top).

    FWIW, a broken hashlib is a localized bug, not an optimization problem. It doesn't affect any user with a build that passes the test suite.

    Running "python -v" shows that "random" is not part of the normal startup, so deferring the import saves zero for normal startup. It only affects modules that specifically import random.

    IIRC, Mercurial uses hashing extensively, so deferring the import doesn't help them at all.

    This is minor change, so I suppose we could let it go through; however, it seems somewhat arbitrary and the reasons offered seem dubious. For the most part, it isn't a good practice.

    @rhettinger
    Copy link
    Contributor

    One other thought. This code has been present for over a decade. There is no evidence that anyone has ever wanted random to defer one of its imports. This seems like an invented problem.

    @tiran
    Copy link
    Member

    tiran commented Apr 9, 2019

    You could also use the internal _sha512 module. It's always present, small, lean and provides a SHA512 implementation with sufficient performance.

    @xdegaye
    Copy link
    Mannequin

    xdegaye mannequin commented Apr 9, 2019

    You could also use the internal _sha512 module. It's always present

    This is not true at the moment, the _sha512 module is not present when openssl is missing. This is a bug in setup.py that prevents building the _sha512 module when openssl is missing. See bpo-36544.

    It is possible that this issue (since it started because of hashlib failures) and also bpo-36414 are caused by this problem.

    @tiran
    Copy link
    Member

    tiran commented Apr 9, 2019

    Thanks for pointing to the other issue. This is clearly a regression and should be fixed ASAP. I have ACKed your PR and pushed it.

    @vstinner
    Copy link
    Member Author

    vstinner commented Apr 9, 2019

    Note: *Technically*, you can disable the compilation of the _sha512 module using "*disabled*" in Modules/Setup, but I'm not sure if it's a common use case. At least, it makes sense to me when we are sure that OpenSSL and _hashlib are available ;-) I didn't want to mention that since I'm not sure that it's really relevant in this discussion. (I was aware of the regression, and hopefully it's now fixed!)

    @rhettinger
    Copy link
    Contributor

    You could also use the internal _sha512 module.
    It's always present, small, lean and provides a SHA512
    implementation with sufficient performance.

    I suppose we could do this but it borders on telling folks that we're worried about using our own public APIs, that importing hashlib is bad for them. It shouldn't be that way, hashlib is a collection of hash functions -- it is clear why this import isn't small and fast. It suggests that something is wrong with the implementation.

    The focus on client code in random seems like the wrong focus. That is just typical of what other clients would do.

    @methane
    Copy link
    Member

    methane commented Apr 10, 2019

    • Is hashlib large, slow to import library? -- Yes.
    • random module use hashlib only for specific (rare) use case? -- Yes.
    • Does user of random module needs hashlib in most case? -- No. For example, tmpfile user may not need hashlib.

    I'm +1 on PR 12728.

    @rhettinger
    Copy link
    Contributor

    New changeset d914596 by Raymond Hettinger (Christian Heimes) in branch 'master':
    bpo-36559: random module: optimize sha512 import (GH-12742)
    d914596

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants