Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use malloc() for better performance of some list operations #78332

Closed
sir-sigurd mannequin opened this issue Jul 18, 2018 · 4 comments
Closed

use malloc() for better performance of some list operations #78332

sir-sigurd mannequin opened this issue Jul 18, 2018 · 4 comments
Labels
3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@sir-sigurd
Copy link
Mannequin

sir-sigurd mannequin commented Jul 18, 2018

BPO 34151
Nosy @rhettinger, @scoder, @zhangyangyu, @sir-sigurd, @tirkarthi
PRs
  • bpo-34151: Improve performance of some list operations #8332
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-08-11.13:14:26.195>
    created_at = <Date 2018-07-18.17:31:49.018>
    labels = ['interpreter-core', '3.8', 'performance']
    title = 'use malloc() for better performance of some list operations'
    updated_at = <Date 2018-08-11.13:14:26.194>
    user = 'https://github.com/sir-sigurd'

    bugs.python.org fields:

    activity = <Date 2018-08-11.13:14:26.194>
    actor = 'xiang.zhang'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-08-11.13:14:26.195>
    closer = 'xiang.zhang'
    components = ['Interpreter Core']
    creation = <Date 2018-07-18.17:31:49.018>
    creator = 'sir-sigurd'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34151
    keywords = ['patch']
    message_count = 4.0
    messages = ['321905', '321914', '321925', '323415']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'scoder', 'xiang.zhang', 'sir-sigurd', 'xtreak']
    pr_nums = ['8332']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue34151'
    versions = ['Python 3.8']

    @sir-sigurd
    Copy link
    Mannequin Author

    sir-sigurd mannequin commented Jul 18, 2018

    Currently list concatenation, slicing and repeating operations are using PyList_New() which allocates memory for the items by calloc(). malloc() could be used instead, since the allocated memory is overwritten by mentioned operations.

    I made benchmarks with this script:

    NAME=list-malloc-master.json

    python -m perf timeit --name slice0 -s "l = [None]*1000000" "l[:0]" --duplicate=2048 --append $NAME
    python -m perf timeit --name slice1 -s "l = [None]*1000000" "l[:1]" --duplicate=1024 --append $NAME
    python -m perf timeit --name slice2 -s "l = [None]*1000000" "l[:2]" --duplicate=1024 --append $NAME
    python -m perf timeit --name slice3 -s "l = [None]*1000000" "l[:3]" --duplicate=1024 --append $NAME
    python -m perf timeit --name slice1000000 -s "l = [None]*1000000" "l[:1000000]" --append $NAME

    python -m perf timeit --name cat0 -s "l = [None]*0" "l + l" --duplicate=1024 --append $NAME
    python -m perf timeit --name cat1 -s "l = [None]*1" "l * 1" --duplicate=1024 --append $NAME
    python -m perf timeit --name cat2 -s "l = [None]*2" "l * 1" --duplicate=1024 --append $NAME
    python -m perf timeit --name cat3 -s "l = [None]*3" "l * 1" --duplicate=1024 --append $NAME
    python -m perf timeit --name cat1000000 -s "l = [None]*1000000" "l * 1" --append $NAME

    python -m perf timeit --name 1x0 -s "l = [None]" "l * 0" --duplicate=1024 --append $NAME
    python -m perf timeit --name 1x1 -s "l = [None]" "l * 1" --duplicate=1024 --append $NAME
    python -m perf timeit --name 1x2 -s "l = [None]" "l * 2" --duplicate=1024 --append $NAME
    python -m perf timeit --name 1x3 -s "l = [None]" "l * 3" --duplicate=1024 --append $NAME
    python -m perf timeit --name 1x1000000 -s "l = [None]" "l * 1000000" --append $NAME

    Here's comparison table:

    +--------------+--------------------+------------------------------+
    | Benchmark | list-malloc-master | list-malloc |
    +==============+====================+==============================+
    | slice1 | 84.5 ns | 59.6 ns: 1.42x faster (-30%) |
    +--------------+--------------------+------------------------------+
    | slice2 | 71.6 ns | 61.8 ns: 1.16x faster (-14%) |
    +--------------+--------------------+------------------------------+
    | slice3 | 74.4 ns | 63.6 ns: 1.17x faster (-15%) |
    +--------------+--------------------+------------------------------+
    | slice1000000 | 4.39 ms | 4.08 ms: 1.08x faster (-7%) |
    +--------------+--------------------+------------------------------+
    | cat0 | 23.9 ns | 24.9 ns: 1.04x slower (+4%) |
    +--------------+--------------------+------------------------------+
    | cat1 | 73.2 ns | 51.9 ns: 1.41x faster (-29%) |
    +--------------+--------------------+------------------------------+
    | cat2 | 61.6 ns | 53.1 ns: 1.16x faster (-14%) |
    +--------------+--------------------+------------------------------+
    | cat3 | 63.0 ns | 54.3 ns: 1.16x faster (-14%) |
    +--------------+--------------------+------------------------------+
    | cat1000000 | 4.38 ms | 4.08 ms: 1.07x faster (-7%) |
    +--------------+--------------------+------------------------------+
    | 1x0 | 27.1 ns | 27.7 ns: 1.02x slower (+2%) |
    +--------------+--------------------+------------------------------+
    | 1x1 | 72.9 ns | 51.9 ns: 1.41x faster (-29%) |
    +--------------+--------------------+------------------------------+
    | 1x2 | 60.9 ns | 52.9 ns: 1.15x faster (-13%) |
    +--------------+--------------------+------------------------------+
    | 1x3 | 62.5 ns | 54.8 ns: 1.14x faster (-12%) |
    +--------------+--------------------+------------------------------+
    | 1x1000000 | 2.67 ms | 2.34 ms: 1.14x faster (-12%) |
    +--------------+--------------------+------------------------------+

    Not significant (1): slice0

    @sir-sigurd sir-sigurd mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jul 18, 2018
    @scoder
    Copy link
    Contributor

    scoder commented Jul 18, 2018

    Nice! Patch looks good to me, minus the usual naming nit-pick.

    @scoder scoder added the 3.8 (EOL) end of life label Jul 18, 2018
    @rhettinger
    Copy link
    Contributor

    +1 This looks like a reasonable improvement.

    @zhangyangyu
    Copy link
    Member

    New changeset 2fc4697 by Xiang Zhang (Sergey Fedoseev) in branch 'master':
    bpo-34151: Improve performance of some list operations (GH-8332)
    2fc4697

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants