Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array compare is hideously slow #68888

swanson mannequin opened this issue Jul 23, 2015 · 8 comments

array compare is hideously slow #68888

swanson mannequin opened this issue Jul 23, 2015 · 8 comments
3.7 performance stdlib


Copy link

@swanson swanson mannequin commented Jul 23, 2015

BPO 24700
Nosy @rhettinger, @pitrou, @MojoVampire, @adrian17, @lou1306
  • #3009
  • Files
  • Testing comparison with subclassed array
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-08-17.12:46:35.899>
    created_at = <Date 2015-07-23.23:01:14.149>
    labels = ['3.7', 'library', 'performance']
    title = 'array compare is hideously slow'
    updated_at = <Date 2017-08-17.12:46:35.898>
    user = '' fields:

    activity = <Date 2017-08-17.12:46:35.898>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-08-17.12:46:35.899>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2015-07-23.23:01:14.149>
    creator = 'swanson'
    dependencies = []
    files = ['46675']
    hgrepos = []
    issue_num = 24700
    keywords = []
    message_count = 8.0
    messages = ['247234', '247236', '288661', '299811', '299860', '299900', '300415', '300416']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'pitrou', 'josh.r', 'swanson', 'Adrian Wielgosik', 'lou1306']
    pr_nums = ['3009']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = ''
    versions = ['Python 3.7']

    Copy link
    Mannequin Author

    @swanson swanson mannequin commented Jul 23, 2015

    Comparing two array.array objects is much slower than it ought to be.

    The whole point of array.array is to be efficient:
    "array — Efficient arrays of numeric values"

    But comparing them is orders of magnitude less efficient than comparing tuples or lists of numbers. It would seem that array's __eq__ is converting every single number into an int, float, etc. first instead of just comparing the arrays in their native format.
    If arrays can be copied efficiently, and bytearray can be copied or compared efficiently, there's no good reason for array's equality test to be so stupid.

    example code:

    from timeit import timeit
    setup = '''
    from array import array
    a = array("I", %s)
    ac = a.__copy__
    b = ac()
    t = tuple(a)
    u = t[:1] + t[1:]
    for init in ("[1]*10", "[0xffffffff]*10", "[1]*1000", "[0xffffffff]*1000"):
     print("\n", init)
     for action in ("ac()", "a == b", "a == ac()", "t == u"):
      print(("%6.2f" % timeit(action, setup % init)), action)


    0.31 ac()
    0.50 a == b
    0.73 a == ac()
    0.17 t == u

    0.29 ac()
    1.59 a == b
    1.87 a == ac()
    0.15 t == u

    0.84 ac()
    37.06 a == b
    37.72 a == ac()
    2.91 t == u

    0.84 ac()
    146.03 a == b
    145.97 a == ac()
    2.90 t == u

    @swanson swanson mannequin added stdlib performance labels Jul 23, 2015
    Copy link

    @MojoVampire MojoVampire mannequin commented Jul 24, 2015

    You're correct about what is going on; aside from bypassing a bounds check (when not compiled with asserts enabled), the function it uses to get each index is the same as that used to implement indexing at the Python layer. It looks up the getitem function appropriate to the type code over and over, then calls it to create the PyLongObject and performs a rich compare.

    The existing behavior is probably necessary to work with array subclasses, but it's also incredibly slow as you noticed. Main question is whether to keep the slow path for subclasses, or (effectively) require that array subclasses overriding __getitem__ also override he rich comparison operators to make them work as expected.

    For cases where the signedness and element size are identical, it's trivial to acquire readonly buffers for both arrays and directly compare the memory (with memcmp for EQ/NE or size 1 elements, wmemcmp for appropriately sized wider elements, and simple loops for anything else).

    Copy link

    @lou1306 lou1306 mannequin commented Feb 27, 2017

    I noticed the issue is still there in Python 3.6.
    But I can't see why array subclasses should be the reason for this implementation.
    By looking at getarrayitem, it looks like __getitem__ does not play any role in how the elements are accessed.
    Consider the attached example, where SubclassedArray.__getitem__ is overridden to always return 0: nonetheless, equality checks with an array.array containing the same elements always succeed.

    For cases where the signedness and element size are identical, it's trivial to acquire readonly buffers for both arrays and directly compare the memory

    I would argue that _integerness_ sholud also be identical: otherwise
    array("l", range(10)) == array("f", range(10))
    would evaluate to False, while it is True in the current implementation.

    Copy link

    @adrian17 adrian17 mannequin commented Aug 6, 2017

    Added a PR with a fast path that triggers when compared arrays store values of the same type. In this fast path, no Python objects are created. For big arrays the runtime reduction can reach 50-100x.

    It's possible to optimize the comparison loop a bit more - for example array('B') comparison could use memcmp and become extra 10x faster, and other types could receive similar treatment - but I wanted to keep the code relatively simple.
    Code duplication with macros is a bit ugly, but that's the best I could come up with so far.

    Benchmark results:

    Test Before After % of old time
    Equal, same stored type (new fast path)
    array('i', range(0)) 20.4ns 22.07ns 108.15%
    array('i', range(1)) 33.39ns 22.32ns 66.86%
    array('i', range(10)) 152.0ns 31.21ns 20.54%
    array('i', range(10000)) 447.7us 6.571us 1.47%
    array('i', range(100000)) 4.626ms 67.24us 1.45%
    array('i', [1<<30]*100000)) 5.234ms 65.8us 1.26%
    array('B', range(10)) 151.8ns 28.53ns 18.79%
    array('B', range(250)) 3.14us 194.5ns 6.19%
    array('B', [1,2,3]*1000) 37.76us 2.088us 5.53%
    array('d', range(10)) 311.9ns 31.22ns 10.01%
    array('d', range(100000)) 2.889ms 99.3us 3.44%
    Equal, different types (slow path)
    array('X', range(0)) 20.37ns 19.45ns 95.48%
    array('X', range(1)) 34.87ns 34.42ns 98.72%
    array('X', range(10)) 169.1ns 169.0ns 99.97%
    array('X', range(10000)) 462.2us 444.8us 96.23%
    array('X', range(100000)) 4.752ms 4.571ms 96.20%
    Not equal: first element (X) different
    array('i', [X] + [1,2,3]*10000) 42.77ns 21.84ns 51.06%
    Not equal: last element (X) different
    array('i', [1,2,3]*10000 + [X]) 375.4us 19.8us 5.27%

    Copy link

    @pitrou pitrou commented Aug 7, 2017

    Not saying that this issue shouldn't be fixed, but Numpy arrays are a much more powerful and well-optimized replacement for array.array.

    Copy link

    @rhettinger rhettinger commented Aug 8, 2017

    The PR looks very reasonable. This is a nice optimization.

    @pitrou pitrou added the 3.7 label Aug 8, 2017
    Copy link

    @pitrou pitrou commented Aug 17, 2017

    New changeset 7c17e23 by Antoine Pitrou (Adrian Wielgosik) in branch 'master':
    bpo-24700: Add a fast path for comparing array.array of equal type (bpo-3009)

    Copy link

    @pitrou pitrou commented Aug 17, 2017

    Adrian's PR was merged. Thank you Adrian!

    @pitrou pitrou closed this as completed Aug 17, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    3.7 performance stdlib
    None yet

    No branches or pull requests

    2 participants