Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Microoptimize PyType_Lookup for cache hits #87618

Closed
DinoV opened this issue Mar 9, 2021 · 7 comments
Closed

Microoptimize PyType_Lookup for cache hits #87618

DinoV opened this issue Mar 9, 2021 · 7 comments
Labels
3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@DinoV
Copy link
Contributor

DinoV commented Mar 9, 2021

BPO 43452
Nosy @vstinner, @DinoV, @methane, @pablogsal
PRs
  • bpo-43452: Microoptimizations to PyType_Lookup #24804
  • bpo-43452: Document the PyType_Lookup optimizations in the What's New for 3.10 #24949
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-03-20.19:13:03.911>
    created_at = <Date 2021-03-09.19:21:17.225>
    labels = ['interpreter-core', '3.10']
    title = 'Microoptimize PyType_Lookup for cache hits'
    updated_at = <Date 2021-03-27.00:02:33.128>
    user = 'https://github.com/DinoV'

    bugs.python.org fields:

    activity = <Date 2021-03-27.00:02:33.128>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-03-20.19:13:03.911>
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation = <Date 2021-03-09.19:21:17.225>
    creator = 'dino.viehland'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43452
    keywords = ['patch']
    message_count = 7.0
    messages = ['388377', '388485', '388506', '389134', '389173', '389418', '389577']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'dino.viehland', 'methane', 'pablogsal']
    pr_nums = ['24804', '24949']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue43452'
    versions = ['Python 3.10']

    @DinoV
    Copy link
    Contributor Author

    DinoV commented Mar 9, 2021

    The common case going through _PyType_Lookup is to have a cache hit. There's some small tweaks which can make this a little cheaper:

    1. the name field identity is used for a cache hit, and is kept alive by the cache. So there's no need to read the hash code of the name - instead the address can be used as the hash.

    2. There's no need to check if the name is cachable on the lookup either, it probably is, and if it is, it'll be in the cache.

    3. If we clear the version tag when invalidating a type then we don't actually need to check for a valid version tag bit.

    @DinoV DinoV added 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Mar 9, 2021
    @methane
    Copy link
    Member

    methane commented Mar 11, 2021

    $ ./python -m pyperf compare_to -G --min-speed=3 master.json pytype.json
    Slower (1):
    - unpack_sequence: 62.2 ns +- 0.6 ns -> 66.1 ns +- 0.9 ns: 1.06x slower

    Faster (29):

    • nbody: 182 ms +- 1 ms -> 152 ms +- 2 ms: 1.19x faster
    • regex_effbot: 4.00 ms +- 0.05 ms -> 3.45 ms +- 0.08 ms: 1.16x faster
    • spectral_norm: 178 ms +- 1 ms -> 158 ms +- 2 ms: 1.13x faster
    • crypto_pyaes: 140 ms +- 1 ms -> 125 ms +- 1 ms: 1.12x faster
    • dulwich_log: 88.0 ms +- 11.8 ms -> 80.1 ms +- 0.5 ms: 1.10x faster
    • scimark_fft: 522 ms +- 4 ms -> 477 ms +- 14 ms: 1.09x faster
    • mako: 18.2 ms +- 0.1 ms -> 16.7 ms +- 0.1 ms: 1.09x faster
    • raytrace: 542 ms +- 4 ms -> 497 ms +- 3 ms: 1.09x faster
    • regex_dna: 251 ms +- 1 ms -> 230 ms +- 1 ms: 1.09x faster
    • regex_v8: 29.6 ms +- 0.2 ms -> 27.2 ms +- 0.1 ms: 1.09x faster
    • pidigits: 227 ms +- 0 ms -> 210 ms +- 0 ms: 1.08x faster
    • fannkuch: 564 ms +- 7 ms -> 525 ms +- 4 ms: 1.08x faster
    • telco: 7.70 ms +- 0.12 ms -> 7.17 ms +- 0.18 ms: 1.07x faster
    • float: 130 ms +- 1 ms -> 122 ms +- 1 ms: 1.07x faster
    • unpickle_pure_python: 350 us +- 4 us -> 327 us +- 3 us: 1.07x faster
    • scimark_sparse_mat_mult: 6.86 ms +- 0.08 ms -> 6.47 ms +- 0.07 ms: 1.06x faster
    • chaos: 122 ms +- 1 ms -> 115 ms +- 1 ms: 1.06x faster
    • nqueens: 113 ms +- 1 ms -> 107 ms +- 1 ms: 1.06x faster
    • json_dumps: 15.4 ms +- 0.1 ms -> 14.7 ms +- 0.1 ms: 1.05x faster
    • pickle_pure_python: 505 us +- 6 us -> 480 us +- 5 us: 1.05x faster
    • logging_simple: 9.88 us +- 0.17 us -> 9.40 us +- 0.18 us: 1.05x faster
    • deltablue: 8.19 ms +- 0.13 ms -> 7.79 ms +- 0.15 ms: 1.05x faster
    • scimark_lu: 190 ms +- 3 ms -> 182 ms +- 5 ms: 1.05x faster
    • scimark_monte_carlo: 125 ms +- 2 ms -> 120 ms +- 2 ms: 1.05x faster
    • pickle: 11.3 us +- 0.2 us -> 10.8 us +- 0.1 us: 1.05x faster
    • logging_silent: 189 ns +- 4 ns -> 182 ns +- 4 ns: 1.04x faster
    • pickle_dict: 31.7 us +- 0.2 us -> 30.5 us +- 0.1 us: 1.04x faster
    • django_template: 51.6 ms +- 0.7 ms -> 49.8 ms +- 0.6 ms: 1.04x faster
    • pyflate: 755 ms +- 7 ms -> 730 ms +- 7 ms: 1.03x faster

    Benchmark hidden because not significant (30): 2to3, chameleon, genshi_text, genshi_xml, go, hexiom, json_loads, logging_format, meteor_contest, pathlib, pickle_list, python_startup, python_startup_no_site, regex_compile, richards, scimark_sor, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, tornado_http, unpickle, unpickle_list, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

    Geometric mean: 1.04x faster

    @vstinner
    Copy link
    Member

    Geometric mean: 1.04x faster

    Hey, it's good to see my new pyperf feature being used :-D IMO it helps to more easily compare a large set of benchmarks.

    @DinoV
    Copy link
    Contributor Author

    DinoV commented Mar 20, 2021

    Setup a micro-benchmark, foo.c:

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>
    #include <string.h>
    #include <stdlib.h>
    
    int
    main(int argc, char *argv[])
    {
        wchar_t *program = Py_DecodeLocale(argv[0], NULL);
        if (program == NULL) {
            fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
            exit(1);
        }
        Py_SetProgramName(program);  /* optional but recommended */
        Py_Initialize();
        PyObject *pName = PyUnicode_DecodeFSDefault("foo");
        if (pName == NULL) { printf("no foo\n"); PyErr_Print(); }
        PyObject *pModule = PyImport_Import(pName);
        if (pModule == NULL) { printf("no mod\n"); PyErr_Print(); return 0; }
        PyObject *cls = PyObject_GetAttrString(pModule, "C");
        if (cls == NULL) { printf("no cls\n"); }
        PyObject *fs[20];
        for(int i = 0; i<20; i++) {
             char buf[4];
             sprintf(buf, "f%d", i);
             fs[i] = PyUnicode_DecodeFSDefault(buf);
        }
        for(int i = 0; i<100000000; i++) {
             for(int j = 0; j<20; j++) {
                 if(_PyType_Lookup(cls, fs[j])==NULL) {
    		printf("Uh oh\n");
    	     }
             }
        }
    
       if (Py_FinalizeEx() < 0) {
            exit(120);
        }
        PyMem_RawFree(program);
        return 0;
    }

    Lib/foo.py:
    import time

    class C:
        pass
    
    for i in range(20):
        setattr(C, f"f{i}", lambda self: None)

    obj hash: 0m6.222s
    str hash: 0m6.327s
    baseline: 0m6.784s

    @pablogsal
    Copy link
    Member

    New changeset ee48c7d by Dino Viehland in branch 'master':
    bpo-43452: Micro-optimizations to PyType_Lookup (GH-24804)
    ee48c7d

    @pablogsal
    Copy link
    Member

    New changeset a054f6b by Pablo Galindo in branch 'master':
    bpo-43452: Document the PyType_Lookup optimizations in the What's New for 3.10 (GH-24949)
    a054f6b

    @pablogsal
    Copy link
    Member

    Seems that commit ee48c7d has introduced a regression:

    https://bugs.python.org/issue43636

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants