Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyUnicodeWriter: change the overallocation factor for Windows #63780

Closed
vstinner opened this issue Nov 14, 2013 · 8 comments
Closed

PyUnicodeWriter: change the overallocation factor for Windows #63780

vstinner opened this issue Nov 14, 2013 · 8 comments
Labels
OS-windows performance Performance or resource usage

Comments

@vstinner
Copy link
Member

BPO 19581
Nosy @tim-one, @pitrou, @vstinner, @serhiy-storchaka
Files
  • writer_overallocate_factor.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-11-18.20:30:50.933>
    created_at = <Date 2013-11-14.09:37:25.675>
    labels = ['OS-windows', 'performance']
    title = 'PyUnicodeWriter: change the overallocation factor for Windows'
    updated_at = <Date 2013-11-18.20:30:50.932>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2013-11-18.20:30:50.932>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-11-18.20:30:50.933>
    closer = 'vstinner'
    components = ['Windows']
    creation = <Date 2013-11-14.09:37:25.675>
    creator = 'vstinner'
    dependencies = []
    files = ['32610']
    hgrepos = []
    issue_num = 19581
    keywords = ['patch']
    message_count = 8.0
    messages = ['202823', '202835', '202849', '202851', '202855', '202861', '203265', '203320']
    nosy_count = 5.0
    nosy_names = ['tim.peters', 'pitrou', 'vstinner', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue19581'
    versions = ['Python 3.4']

    @vstinner
    Copy link
    Member Author

    PyUnicodeWriter currently overallocates the internal buffer by 25%. On Windows, PyUnicodeWriter is slower than PyAccu API. With an overallocation factor of 50%, PyUnicodeWriter is fastter.

    See this message for the benchmark:
    http://bugs.python.org/issue19513#msg202312

    We might also change the factor on all platform, performances are almost the same with a factor of 25% or 50% on Linux.

    @vstinner vstinner added OS-windows performance Performance or resource usage labels Nov 14, 2013
    @pitrou
    Copy link
    Member

    pitrou commented Nov 14, 2013

    Your patch implies that the two only supported OSes are Linux and Windows :-) To be honest I think we should have one single overallocation factor for all OSes. If 50% is ok on Linux too, then let it be 50%.

    (did you run stringbench to see if it made a difference?)

    @serhiy-storchaka
    Copy link
    Member

    Many collections in other programming languages use overallocating rate 100%. Perhaps overallocating rate 12.5% for Python lists is good compromise between speed and memory usage (as far as Python lists have no explicit resize method), but for short-living objects we can use larger overallocating rate.

    In theory for overallocating rate r long list needs in average O(1/log(1+r)) reallocations and O((r+1)/r) copyings per element. I.e. 50% overallocating rate is 3-3.4 times more efficient than current 12.5% overallocating rate and 100% overallocating rate is 1.5-1.7 times more efficient than 50% overallocating rate (in terms of numbers of reallocations and copyings).

    I'm interesting what will happened when increase overallocating rate (50% or 100%) for PyAccu.

    @vstinner
    Copy link
    Member Author

    I chose 25% on Linux after some micro-benchmarks on str%args and str.format(args). If the buffer is too large, the final resize (because PyUnicodeObject must have the exact size) is slow. I suppose that realloc() can avoid copying data if the new is is very close, but has to allocate a new memory block and copy data if the new size is higher than a threshold. It's how _PyObject_Realloc() for example.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 14, 2013

    Did you find a difference for small strings vs. large strings?

    @vstinner
    Copy link
    Member Author

    Did you find a difference for small strings vs. large strings?

    When I replaced PyAccu with PyUnicodeWriter for str%args and str.format(args), I ran a lot of benchmarks with short, medium and large strings. See for example:
    http://bugs.python.org/file25687/REPORT_64BIT_2.7_3.2_writer

    See issues bpo-14716 and bpo-14744 for old benchmark results. If I remember correctly, this is the script used to run the benchmark:

    https://bitbucket.org/haypo/misc/src/7c2deb7a37353b41a45564ce6a98e07bbe0c691b/python/bench_str.py

    The script should be run using:

    https://bitbucket.org/haypo/misc/src/7c2deb7a37353b41a45564ce6a98e07bbe0c691b/python/benchmark.py?at=default

    I was concerned by performances on short strings because most calls to str%args are short strings.

    @vstinner
    Copy link
    Member Author

    Your patch implies that the two only supported OSes are Linux and Windows :-)

    It more means that Windows memory allocator is different to the one used on all other operating systems.

    Well, if you are not convinced, we can keep the overallocation factor of 25%: performances are not so bad. The difference between 25% and 50% is low.

    --

    Benchmark on patched repr(list) (to use PyUnicodeWriter, see issue bpo-19513) using different overallocation factors on Linux. The best factor is 25% (the current factor).

    Platform: Linux-3.9.4-200.fc18.x86_64-x86_64-with-fedora-18-Spherical_Cow
    Bits: int=32, long=64, long long=64, size_t=64, void*=64
    Timer: time.perf_counter
    SCM: hg revision=00348c0518f8+ tag=tip branch=default date="2013-11-18 10:04 +0100"
    Python unicode implementation: PEP-393
    CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
    CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
    Python version: 3.4.0a4+ (default:00348c0518f8+, Nov 18 2013, 10:11:42) [GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
    Timer precision: 40 ns

    -----------------------------+-------------+----------------+----------------+--------------
    Tests                        |   writer-25 |    writer-12.5 |      writer-50 |    writer-100
    -----------------------------+-------------+----------------+----------------+--------------
    list("a")                    |  247 ns () |  272 ns (+10%) |   265 ns (+7%) | 283 ns (+14%)
    list("abc")                  |  435 ns (
    ) |   407 ns (-7%) |         430 ns |        427 ns
    ["a"](100)                  | 8.26 us () |         8.4 us |  8.76 us (+6%) |       7.93 us
    ["abc"](100)                | 7.97 us () | 8.75 us (+10%) | 9.19 us (+15%) |       8.37 us
    ["a" * 100](100)            | 35.7 us () |        36.8 us |        36.6 us |       35.6 us
    ["a"](10**6)                |   73 ms () |  77.4 ms (+6%) | 84.9 ms (+16%) | 78.3 ms (+7%)
    ["abc"](10**6)              | 76.6 ms () |  81.1 ms (+6%) | 90.3 ms (+18%) | 82.4 ms (+8%)
    ["a" * 100](10**5)          | 35.1 ms () |        35.3 ms |        35.8 ms | 37.6 ms (+7%)
    list(range(10**6))           | 93.4 ms () |  102 ms (+10%) |  103 ms (+10%) | 105 ms (+12%)
    list(map(str, range(10**6))) |   97 ms (
    ) |        94.1 ms |        99.7 ms | 91.2 ms (-6%)
    -----------------------------+-------------+----------------+----------------+--------------
    Total                        |  375 ms (*) |         390 ms |  414 ms (+10%) |  394 ms (+5%)
    -----------------------------+-------------+----------------+----------------+--------------

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 18, 2013

    New changeset 093b9838a41c by Victor Stinner in branch 'default':
    Issue bpo-19581: Change the overallocation factor of _PyUnicodeWriter on Windows
    http://hg.python.org/cpython/rev/093b9838a41c

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    OS-windows performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants