use map function instead of genexpr in capwords #89388

Closed
speedrun-program mannequin opened this issue Sep 16, 2021 · 3 comments
Labels
performance (Performance or resource usage), stdlib (Python modules in the Lib dir)

Comments

@speedrun-program
Mannequin

speedrun-program mannequin commented Sep 16, 2021

BPO 45225
Nosy @rhettinger, @speedrun-program
PRs
  • bpo-45225: use map function instead of genexpr in capwords #28342
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-09-16.19:50:29.576>
    created_at = <Date 2021-09-16.18:54:54.177>
    labels = ['library', 'performance']
    title = 'use map function instead of genexpr in capwords'
    updated_at = <Date 2021-09-16.19:50:29.575>
    user = 'https://github.com/speedrun-program'

    bugs.python.org fields:

    activity = <Date 2021-09-16.19:50:29.575>
    actor = 'rhettinger'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-09-16.19:50:29.576>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2021-09-16.18:54:54.177>
    creator = 'speedrun-program'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45225
    keywords = []
    message_count = 3.0
    messages = ['401981', '401985', '401986']
    nosy_count = 2.0
    nosy_names = ['rhettinger', 'speedrun-program']
    pr_nums = ['28342']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue45225'
    versions = []

    @speedrun-program
    Mannequin Author

    speedrun-program mannequin commented Sep 16, 2021

    In string.py, the capwords function passes str.join a generator expression, but the map function
    could be used instead. This is how capwords is currently written:

    --------------------

    def capwords(s, sep=None):
        """capwords(s [,sep]) -> string
        
        Split the argument into words using split, capitalize each
        word using capitalize, and join the capitalized words using
        join.  If the optional second argument sep is absent or None,
        runs of whitespace characters are replaced by a single space
        and leading and trailing whitespace are removed, otherwise
        sep is used to split and join the words.
        
        """
        return (sep or ' ').join(x.capitalize() for x in s.split(sep))

    --------------------

    This is how capwords could be written:

    --------------------

    def capwords(s, sep=None):
        """capwords(s [,sep]) -> string
        
        Split the argument into words using split, capitalize each
        word using capitalize, and join the capitalized words using
        join.  If the optional second argument sep is absent or None,
        runs of whitespace characters are replaced by a single space
        and leading and trailing whitespace are removed, otherwise
        sep is used to split and join the words.
        
        """
        return (sep or ' ').join(map(str.capitalize, s.split(sep)))

    --------------------

    These are the benefits:

    1. Faster execution: in the timings below, the map version takes about 9% to 35% less time, with the largest relative gains (roughly 30%) for inputs of 10 to 10**8 words.

    2. Very slightly smaller .py and .pyc file sizes.

    3. Source code is slightly more concise.
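Because the change is meant to leave results unchanged for str inputs, a quick equivalence check along these lines could accompany it. This is only a minimal sketch: the two functions are copied from this report, while the sample inputs are illustrative assumptions and only str arguments are exercised.

```python
def capwords_current(s, sep=None):
    # Generator-expression version (the existing implementation).
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_new(s, sep=None):
    # map-based version (the proposed implementation).
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


# Illustrative (string, separator) pairs, assumed for this sketch.
samples = [
    ("", None),
    ("hello world", None),
    ("  leading and   trailing  ", None),
    ("a-b-c", "-"),
    ("mixed CASE words", None),
]

# Both versions should agree on every sample.
for s, sep in samples:
    assert capwords_current(s, sep) == capwords_new(s, sep)

print(capwords_new("hello world"))  # Hello World
```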

    This is the performance test code in ipython:

    --------------------

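    # Current implementation: str.join consumes a generator expression.
    def capwords_current(s, sep=None):
        return (sep or ' ').join(x.capitalize() for x in s.split(sep))

    # Proposed implementation: str.join consumes a map object.
    def capwords_new(s, sep=None):
        return (sep or ' ').join(map(str.capitalize, s.split(sep)))

    # tests[n] holds 10**n words for n in 0..8; tests[9] holds 5 * 10**8 words.
    tests = ["a " * 10**n for n in range(9)]
    tests.append("a " * (10**9 // 2)) # I only have 16GB of RAM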

    --------------------
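For anyone without IPython, roughly the same comparison can be run with the standard-library timeit module. The following is a sketch under assumptions: the helper name best_time and its number/repeat defaults are hypothetical choices, and the inputs are much smaller than the ones above so the script finishes in a few seconds. The lambda adds a small constant overhead to both measurements.

```python
import timeit


def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


def best_time(func, arg, number=200, repeat=3):
    """Return the best per-call time in seconds (hypothetical helper)."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Inputs of 1, 10, 100 and 1000 words (smaller than in the report).
for n in range(4):
    text = "a " * 10**n
    t_cur = best_time(capwords_current, text)
    t_new = best_time(capwords_new, text)
    print(f"10**{n} words: current {t_cur * 1e6:.2f} us, new {t_new * 1e6:.2f} us")
```

Absolute numbers will differ by machine and Python version, so only the relative difference between the two columns is meaningful.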

    These are the results of a performance test using %timeit in ipython:

    --------------------

    %timeit x = capwords_current("")
    835 ns ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    %timeit x = capwords_new("")
    758 ns ± 35.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


    %timeit x = capwords_current(tests[0])
    977 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    %timeit x = capwords_new(tests[0])
    822 ns ± 30 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


    %timeit x = capwords_current(tests[1])
    3.07 µs ± 88.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    %timeit x = capwords_new(tests[1])
    2.17 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


    %timeit x = capwords_current(tests[2])
    28 µs ± 896 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    %timeit x = capwords_new(tests[2])
    19.4 µs ± 352 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


    %timeit x = capwords_current(tests[3])
    236 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit x = capwords_new(tests[3])
    153 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


    %timeit x = capwords_current(tests[4])
    2.12 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %timeit x = capwords_new(tests[4])
    1.5 ms ± 9.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


    %timeit x = capwords_current(tests[5])
    23.8 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    %timeit x = capwords_new(tests[5])
    15.6 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


    %timeit x = capwords_current(tests[6])
    271 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit x = capwords_new(tests[6])
    192 ms ± 807 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


    %timeit x = capwords_current(tests[7])
    2.66 s ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit x = capwords_new(tests[7])
    1.95 s ± 26.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


    %timeit x = capwords_current(tests[8])
    25.9 s ± 80.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit x = capwords_new(tests[8])
    18.4 s ± 123 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


    %timeit x = capwords_current(tests[9])
    6min 17s ± 29 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

    %timeit x = capwords_new(tests[9])
    5min 36s ± 24.8 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

    --------------------
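To make the relative difference easier to read, the mean timings reported above can be turned into a "fraction of time saved" per input size. This is a small arithmetic sketch: the numbers are copied from the %timeit output in this comment, and the helper name time_saved is an assumption introduced for illustration.

```python
def time_saved(current, new):
    """Fraction of the current running time that the new version saves."""
    return (current - new) / current


# (words in input, current mean in seconds, new mean in seconds),
# copied from the %timeit results in this report.
reported = [
    (0, 835e-9, 758e-9),
    (10**0, 977e-9, 822e-9),
    (10**1, 3.07e-6, 2.17e-6),
    (10**2, 28e-6, 19.4e-6),
    (10**3, 236e-6, 153e-6),
    (10**4, 2.12e-3, 1.5e-3),
    (10**5, 23.8e-3, 15.6e-3),
    (10**6, 271e-3, 192e-3),
    (10**7, 2.66, 1.95),
    (10**8, 25.9, 18.4),
    (10**9 // 2, 6 * 60 + 17, 5 * 60 + 36),
]

for words, current, new in reported:
    print(f"{words:>11,d} words: {time_saved(current, new):.0%} less time")
```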

    @speedrun-program (mannequin) added the stdlib and performance labels on Sep 16, 2021
    @rhettinger
    Contributor

    New changeset a59ede2 by speedrun-program in branch 'main':
    bpo-45225: use map function instead of genexpr in capwords (GH-28342)
    a59ede2

    @rhettinger
    Contributor

    Thanks for the PR.

    @ezio-melotti transferred this issue from another repository on Apr 10, 2022