Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

site.py imports relatively large sysconfig module. #73771

Closed
methane opened this issue Feb 17, 2017 · 22 comments
Closed

site.py imports relatively large sysconfig module. #73771

methane opened this issue Feb 17, 2017 · 22 comments
Labels
3.7 performance stdlib

Comments

@methane
Copy link
Member

@methane methane commented Feb 17, 2017

BPO 29585
Nosy @malemburg, @gpshead, @vstinner, @tiran, @ned-deily, @merwok, @methane
PRs
  • #136
  • #2476
  • #2477
  • #2478
  • #2483
  • #2927
  • #2928
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-07-28.12:35:27.441>
    created_at = <Date 2017-02-17.09:57:03.001>
    labels = ['3.7', 'library', 'performance']
    title = 'site.py imports relatively large `sysconfig` module.'
    updated_at = <Date 2017-07-28.12:35:27.441>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2017-07-28.12:35:27.441>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-07-28.12:35:27.441>
    closer = 'methane'
    components = ['Library (Lib)']
    creation = <Date 2017-02-17.09:57:03.001>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29585
    keywords = []
    message_count = 22.0
    messages = ['287981', '287983', '287984', '287985', '287988', '287990', '287997', '287999', '288000', '288001', '288012', '288020', '288057', '297192', '297194', '297197', '297258', '299368', '299371', '299379', '299380', '299383']
    nosy_count = 7.0
    nosy_names = ['lemburg', 'gregory.p.smith', 'vstinner', 'christian.heimes', 'ned.deily', 'eric.araujo', 'methane']
    pr_nums = ['136', '2476', '2477', '2478', '2483', '2927', '2928']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue29585'
    versions = ['Python 3.7']

    @methane
    Copy link
    Member Author

    @methane methane commented Feb 17, 2017

    site.py uses sysconfig (and sysconfigdata, _osx_support) module for user-site package.

    But sysconfig module is not so lightweight, and very rarely used.
    Actually speaking, only tests and distutils uses sysconfig in stdlibs.

    And it takes about 7% of startup time, only for searching user-site path.

    I tried to port minimal subset of sysconfig into site.py (GH-136).
    But 'PYTHONFRAMEWORK' is only in sysconfigdata. So I couldn't get rid sysconfig dependency completely.

    How can I do to solve this?

    a) Drop "osx_framework_user" (~/Library/Python/3.7/) support completely.
    b) Add "sys._osx_framework" attribute
    c) Create minimal sysconfigdata only for site.py
    d) anything else?

    @methane methane added 3.7 stdlib performance labels Feb 17, 2017
    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 17, 2017

    Instead of using slow sysconfig and loading the big _sysconfig_data dictionary in memory, would it be possible to extract the minimum set of sysconfig needed by the site module and put it in a builtin module? In site.py, I only found 4 variables:

        from sysconfig import get_config_var
        USER_BASE = get_config_var('userbase')
    
        from sysconfig import get_path
                USER_SITE = get_path('purelib', 'osx_framework_user')
        USER_SITE = get_path('purelib', '%s_user' % os.name)
    
                from sysconfig import get_config_var
                framework = get_config_var("PYTHONFRAMEWORK")

    Because of the site module, the _sysconfig_data module dictionary is always loaded in memory even for for a dummy print("Hello World!").

    I suggest to start building a _site builtin module: subset of site.py which would avoid sysconfig and reimplement things in C for best performances.

    speed.python.org:

    • python_startup: 14 ms
    • python_startup_nosite: 8 ms

    Importing site takes 6 ms: 42% of 14 ms...

    I'm interested to know if it would be possible to reduce these 6 ms by rewriting some parts of site.py in C.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 17, 2017

    Serhiy collected interesting numbers, copy/paste of this message:
    http://bugs.python.org/issue28637#msg280380

    On my computer:

    Importing empty module: 160 us
    Creating empty class: 30 us
    Creating empty function: 0.16 us
    Creating empty Enum/IntEnum: 125/150 us
    Creating Enum/IntEnum member: 25/27 us
    Creating empty namedtuple: 600 us
    Creating namedtuple member: 50 us
    Importing the itertools module: 40 us
    Importing the io module: 900 us
    Importing the os module: 1600 us
    Importing the functools module: 2100 us
    Importing the re module (with all sre submodules): 3300 us
    Python startup time: 43000 us

    @tiran
    Copy link
    Member

    @tiran tiran commented Feb 17, 2017

    What's your platform, Inada? Are you running macOS? I optimized site.py for Linux and BSD users a couple of years ago.

    @methane
    Copy link
    Member Author

    @methane methane commented Feb 17, 2017

    Christian: I'm using macOS on office and Linux on home.

    sysconfig is imported even on Linux
    https://github.com/python/cpython/blob/master/Lib/site.py#L247-L248
    https://github.com/python/cpython/blob/master/Lib/site.py#L263-L271

    @malemburg
    Copy link
    Member

    @malemburg malemburg commented Feb 17, 2017

    I don't think rewriting party of site.py in C is a good idea. It's a rather maintenance intense module.

    However, optimizing access is certainly something that's possible, e.g. by placing the few variables that are actually needed by site.py into a bootstrap module for sysconfig, which only contains the few variables needed by interpreter startup.

    Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Feb 17, 2017

    Marc-Andre Lemburg added the comment:

    I don't think rewriting party of site.py in C is a good idea. It's a rather maintenance intense module.

    However, optimizing access is certainly something that's possible, e.g. by placing the few variables that are actually needed by site.py into a bootstrap module for sysconfig, which only contains the few variables needed by interpreter startup.

    Right, I don't propose to rewrite the 598 lines of site.py in C, but
    only rewrite the parts which have a huge impact on the startup time.
    It seems like the minimum part would be to write a _site module which
    provide the 4 variables currently read from sysconfig.

    I'm proposing to add a new private module because I don't want to
    pollute site which already contains too many things.

    I looked at site.py history: I don't see *major* changes last 2 years.
    Only small enhancements, updates and fixes.

    Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.

    I don't think that it's worth it to reimplement partially sysconfig in
    C. This module is huge, complex, and platform dependant.

    Well, I'm not sure about what is the best approach, but I'm sure that
    we can do something to optimize site.py. 6 ms is a lot!

    I never liked site.py. It seems like a huge workaround. I also dislike
    having a different behaviour if site is imported or not.

    That's why I asked Steve Dower to removing the code to create the
    cpXXX alias for the mbcs codec from site.py to encodings/init.py:
    see commit f5aba58. I'm happy that
    this code was removed from site.py!

    @tiran
    Copy link
    Member

    @tiran tiran commented Feb 17, 2017

    Instead of _site, would it make sense to include the four vars in sys, perhaps as named structure like sys.flags?

    @methane
    Copy link
    Member Author

    @methane methane commented Feb 17, 2017

    @methane
    Copy link
    Member Author

    @methane methane commented Feb 17, 2017

    no-site
    python_startup_no_site: Median +- std dev: 9.13 ms +- 0.02 ms

    default:
    python_startup: Median +- std dev: 15.6 ms +- 0.0 ms

    GH-136 + skip abs_paths().
    python_startup: Median +- std dev: 14.2 ms +- 0.0 ms

    profile of GH-136 + skip abs_paths():
    https://gist.github.com/methane/26fc0a2382207655a6819a92f867620c

    Most of time is consumed by importlib.

    @malemburg
    Copy link
    Member

    @malemburg malemburg commented Feb 17, 2017

    On 17.02.2017 13:06, STINNER Victor wrote:

    > Alternatively, sysconfig data could be made available via a C lookup function; with the complete dictionary only being created on demand. get_config_var() already is such a lookup API which could be used as front-end.

    I don't think that it's worth it to reimplement partially sysconfig in
    C. This module is huge, complex, and platform dependant.

    Sorry, I was just referring to the data part of sysconfig,
    not sysconfig itself.

    Having a lookup function much like we have for unicodedata
    makes things much more manageable, since you don't need to
    generate a dictionary in memory for all the values in the
    config data. Creating that dictionary takes a while (in terms
    of ms).

    @methane
    Copy link
    Member Author

    @methane methane commented Feb 17, 2017

    I create bpo-29592 for abs_paths(). Let's focus on sysconfig in this issue.

    PR 136 ports really needed part of sysconfig into site.py already.
    'PYTHONFRAMEWORK' on macOS is the only variable we need import from sysconfig.

    Adding site.cfg like pyvenv.cfg make sense?

    @methane
    Copy link
    Member Author

    @methane methane commented Feb 18, 2017

    PR 136 now adds sys._framework and 'PYTHONFRAMEWORK' macro in pyconfig.h.

    @methane
    Copy link
    Member Author

    @methane methane commented Jun 28, 2017

    New changeset a8f8d5b by INADA Naoki in branch 'master':
    bpo-29585: optimize site.py startup time (GH-136)
    a8f8d5b

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Jun 28, 2017

    Test fails on macOS:

    http://buildbot.python.org/all/builders/x86-64%20Sierra%203.x/builds/402/steps/test/logs/stdio

    ======================================================================
    FAIL: test_getsitepackages (test.test_site.HelperFunctionsTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/Users/buildbot/buildarea/3.x.billenstein-sierra/build/Lib/test/test_site.py", line 266, in test_getsitepackages
        self.assertEqual(len(dirs), 2)
    AssertionError: 1 != 2

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Jun 28, 2017

    New changeset b01c574 by Victor Stinner in branch 'master':
    bpo-29585: Define PYTHONFRAMEWORK in PC/pyconfig.h (bpo-2477)
    b01c574

    @methane
    Copy link
    Member Author

    @methane methane commented Jun 29, 2017

    New changeset 6b42eb1 by INADA Naoki in branch 'master':
    bpo-29585: Fix sysconfig.get_config_var("PYTHONFRAMEWORK") (GH-2483)
    6b42eb1

    @methane methane closed this as completed Jun 29, 2017
    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Jul 28, 2017

    test_get_path fails on macOS installed framework builds:

    ======================================================================
    FAIL: test_get_path (test.test_site.HelperFunctionsTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/Users/nad/Projects/PyDev/active/dev/3x/root/fwd_macports/Library/Frameworks/pytest_10.12.framework/Versions/3.7/lib/python3.7/test/test_site.py", line 188, in test_get_path
        sysconfig.get_path('purelib', os.name + '_user'))
    AssertionError: '/Users/nad/Library/pytest_10.12/3.7/lib/python/site-packages' != '/Users/nad/Library/pytest_10.12/3.7/lib/python3.7/site-packages'
    - /Users/nad/Library/pytest_10.12/3.7/lib/python/site-packages
    + /Users/nad/Library/pytest_10.12/3.7/lib/python3.7/site-packages
    ?                                               +++

    Ran 27 tests in 0.471s

    FAILED (failures=1, skipped=4)

    @ned-deily ned-deily reopened this Jul 28, 2017
    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Jul 28, 2017

    New changeset c22bd58 by Ned Deily in branch 'master':
    bpo-28095: Re-enable temporarily disabled part of test_startup_imports on macOS (bpo-2927)
    c22bd58

    @methane
    Copy link
    Member Author

    @methane methane commented Jul 28, 2017

    https://docs.python.org/3.6/library/site.html#site.USER_SITE

    ~/Library/Python/X.Y/lib/python/site-packages for Mac framework builds

    So it seems I broke sysconfig.get_path('purelib', 'posix_user').

    @methane
    Copy link
    Member Author

    @methane methane commented Jul 28, 2017

    https://github.com/python/cpython/pull/136/files

    • if sys.platform == 'darwin':
    •    from sysconfig import get_config_var
      
    •    if get_config_var('PYTHONFRAMEWORK'):
      
    •        USER_SITE = get_path('purelib', 'osx_framework_user')
      
    •        return USER_SITE
      

    + if USER_SITE is None:
    + USER_SITE = _get_path(userbase)

    OK, I need to use osx_framework_user instead of os.name + '_user' on framework build.

    @methane
    Copy link
    Member Author

    @methane methane commented Jul 28, 2017

    New changeset ba9ddb7 by INADA Naoki in branch 'master':
    bpo-29585: fix test fail on macOS Framework build (GH-2928)
    ba9ddb7

    @methane methane closed this as completed Jul 28, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 performance stdlib
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants