Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C accelerator for collections.Counter is slow #62794

Closed
scoder opened this issue Jul 30, 2013 · 14 comments
Closed

C accelerator for collections.Counter is slow #62794

scoder opened this issue Jul 30, 2013 · 14 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@scoder
Copy link
Contributor

scoder commented Jul 30, 2013

BPO 18594
Nosy @rhettinger, @jcea, @scoder, @serhiy-storchaka, @phmc
Files
  • counterbench.py: Benchmark different methods of counting
  • collections_Counter_without_C_accelerator_with_comprehension.patch: collections.Counter without C accelerator patch
  • performance_comparision.csv: performance comparison with and without C accelerator
  • fix_counter.diff: Repair the test for choosing the fast path
  • fix_counter2.diff: Make the slow path match the pure Python version.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2013-10-02.04:44:43.471>
    created_at = <Date 2013-07-30.06:49:20.393>
    labels = ['library', 'performance']
    title = 'C accelerator for collections.Counter is slow'
    updated_at = <Date 2013-10-04.23:53:36.091>
    user = 'https://github.com/scoder'

    bugs.python.org fields:

    activity = <Date 2013-10-04.23:53:36.091>
    actor = 'python-dev'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2013-10-02.04:44:43.471>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2013-07-30.06:49:20.393>
    creator = 'scoder'
    dependencies = []
    files = ['31085', '31774', '31775', '31918', '31932']
    hgrepos = []
    issue_num = 18594
    keywords = ['patch']
    message_count = 14.0
    messages = ['193914', '193928', '193929', '197806', '197807', '198675', '198683', '198685', '198712', '198753', '198756', '198757', '198817', '198971']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'jcea', 'scoder', 'python-dev', 'serhiy.storchaka', 'pconnell', 'Anoop.Thomas.Mathew']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue18594'
    versions = ['Python 3.3', 'Python 3.4']

    @scoder
    Copy link
    Contributor Author

    scoder commented Jul 30, 2013

    The C accelerator for the collections.Counter class (_count_elements() in _collections.c) is slower than the pure Python versions for data that has many unique entries. This is because the fast path for dicts is not taken (Counter is a subtype of dict) and the slower fallback path raises exceptions for each value that wasn't previously seen. This can apparently make it slower than calling get() on Python side.

    My suggestion is to drop the fallback path from the accelerator completely and to only call the C function when it's safe to use it, e.g. when "type(self) is Counter" and not a subclass.

    @scoder scoder added stdlib Python modules in the Lib dir performance Performance or resource usage labels Jul 30, 2013
    @serhiy-storchaka
    Copy link
    Member

    Are there any cases where the counter class with the C accelerator is faster than the pure Python version? Here is a benchmarking script (modification of Roy Smith's script [1]) and looks as the pure Python version is faster even for data that has not many unique entries.

    [1] http://permalink.gmane.org/gmane.comp.python.general/738820

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jul 30, 2013

    That sounds like a good idea, Stefan.

    @AnoopThomasMathew
    Copy link
    Mannequin

    AnoopThomasMathew mannequin commented Sep 15, 2013

    40% faster collections.Counter() . Removed C accelerator. Patch attached. Passes all tests. Results comparison follows.

    @AnoopThomasMathew
    Copy link
    Mannequin

    AnoopThomasMathew mannequin commented Sep 15, 2013

    Performance comparison with and without patch applied.

    @rhettinger rhettinger self-assigned this Sep 30, 2013
    @rhettinger
    Copy link
    Contributor

    A C-accelerator should ALWAYS be able to beat a pure python version if it does the same steps but without the overhead of the eval-loop. And in special cases such as type(self)==Counter, it can do much better.

    To resolve this report, the C accelerator needs to be fixed. As Stefan pointed-out, the fast-path isn't being triggered because of PyDict_CheckExact test. And, the fallback path can be sped-up as well (by doing the same steps in C as are being done with the pure python code).

    @rhettinger
    Copy link
    Contributor

    Repaired version
    ----------------

    $ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
    100 loops, best of 3: 14.3 msec per loop
    $ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
    10 loops, best of 3: 40.8 msec per loop

    Current with accelerator
    ------------------------

    $ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
    10 loops, best of 3: 61.7 msec per loop
    $ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
    10 loops, best of 3: 118 msec per loop

    Current without accelerator
    ---------------------------

    $ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
    10 loops, best of 3: 54.9 msec per loop
    $ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
    10 loops, best of 3: 80.8 msec per loop

    @scoder
    Copy link
    Contributor Author

    scoder commented Sep 30, 2013

    Patch LGTM and seems to work well, according to your numbers.

    Only minor nitpick would be that the method references could be decref-ed earlier, but that would complicate the code a bit.

    @serhiy-storchaka
    Copy link
    Member

    Benchmarking results look great.

    But isn't _PyObject_LookupSpecial() more appropriate function for special methods lookup than PyObject_GetAttrString()?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 1, 2013

    New changeset 6aef095fdb30 by Raymond Hettinger in branch '3.3':
    Issue bpo-18594: Fix the fast path for collections.Counter().
    http://hg.python.org/cpython/rev/6aef095fdb30

    @rhettinger
    Copy link
    Contributor

    Attaching a patch for the slow path. Makes the code exactly match the pure python version. This kicks in whether someone has subclassed Counter and overridden either __getitem__ or __setitem__.

    @scoder
    Copy link
    Contributor Author

    scoder commented Oct 1, 2013

    Can you update the benchmark numbers to show what the difference is compared to pure Python (and to the fastpath) now?

    One more thing: the fastpath depends on .__getitem__() and friends, whereas the fallback path depends on .get(). What if someone overrides .get() but not .__getitem__()? (Might be a hypothetical case...)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 2, 2013

    New changeset 1ee6f8a96fb9 by Raymond Hettinger in branch '3.3':
    Issue bpo-18594: Fix the fallback path in collections.Counter().
    http://hg.python.org/cpython/rev/1ee6f8a96fb9

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 4, 2013

    New changeset e4cec1116e5c by Raymond Hettinger in branch '3.3':
    Issue bpo-18594: Make the C code more closely match the pure python code.
    http://hg.python.org/cpython/rev/e4cec1116e5c

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants