Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enum member lookup is 20x slower than normal class attribute lookup #67674

Closed
craigh mannequin opened this issue Feb 19, 2015 · 18 comments
Closed

Enum member lookup is 20x slower than normal class attribute lookup #67674

craigh mannequin opened this issue Feb 19, 2015 · 18 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@craigh
Copy link
Mannequin

craigh mannequin commented Feb 19, 2015

BPO 23486
Nosy @warsaw, @vstinner, @larryhastings, @ethanfurman, @serhiy-storchaka
Files
  • test.py
  • issue23486.stoneleaf.01.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ethanfurman'
    closed_at = <Date 2015-03-11.15:43:55.595>
    created_at = <Date 2015-02-19.21:59:13.317>
    labels = ['library', 'performance']
    title = 'Enum member lookup is 20x slower than normal class attribute lookup'
    updated_at = <Date 2015-03-11.15:45:36.889>
    user = 'https://bugs.python.org/craigh'

    bugs.python.org fields:

    activity = <Date 2015-03-11.15:45:36.889>
    actor = 'ethan.furman'
    assignee = 'ethan.furman'
    closed = True
    closed_date = <Date 2015-03-11.15:43:55.595>
    closer = 'python-dev'
    components = ['Library (Lib)']
    creation = <Date 2015-02-19.21:59:13.317>
    creator = 'craigh'
    dependencies = []
    files = ['38177', '38182']
    hgrepos = []
    issue_num = 23486
    keywords = ['patch']
    message_count = 18.0
    messages = ['236234', '236236', '236237', '236239', '236243', '236258', '236259', '236260', '236559', '237816', '237844', '237846', '237847', '237865', '237866', '237867', '237877', '237878']
    nosy_count = 8.0
    nosy_names = ['barry', 'vstinner', 'larry', 'craigh', 'eli.bendersky', 'ethan.furman', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue23486'
    versions = ['Python 3.5']

    @craigh
    Copy link
    Mannequin Author

    craigh mannequin commented Feb 19, 2015

    Running the attached test script:

    $ time python test.py enum

    real 0m6.546s
    user 0m6.530s
    sys 0m0.007s
    $ time python test.py int

    real 0m0.384s
    user 0m0.377s
    sys 0m0.000s

    I encountered this with a script that yielded a sequence of objects (potentially a few hundred thousand of them) and categorized them with instances of an Enum subclass. The consumer of that iteration processes each object with a switch-case-like comparison of the category, checking it sequentially against each instance of the Enum. This seems like a fairly common use case.

    From cProfile it looks like EnumMeta.__getattr__ and _is_dunder are the main bottlenecks:

    [...]
    7/1 0.000 0.000 0.000 0.000 abc.py:194(subclasscheck)
    1 0.000 0.000 0.001 0.001 enum.py:1(<module>)
    3 0.000 0.000 0.000 0.000 enum.py:132(<genexpr>)
    2000021 0.988 0.000 0.988 0.000 enum.py:16(_is_dunder)
    19 0.000 0.000 0.000 0.000 enum.py:24(_is_sunder)
    2000002 1.825 0.000 2.813 0.000 enum.py:241(getattr)
    17 0.000 0.000 0.000 0.000 enum.py:282(setattr)
    3 0.000 0.000 0.000 0.000 enum.py:342(get_mixins)
    3 0.000 0.000 0.000 0.000 enum.py:387(find_new)
    [...]

    @craigh craigh mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Feb 19, 2015
    @ethanfurman
    Copy link
    Member

    Craig Holmquist wrote:
    ---------------------

    The consumer of that iteration processes each object with a switch-case-like
    comparison of the category, checking it sequentially against each instance
    of the Enum.

    So for every object you compare against every Enum member? Is there a reason you don't just use the lookup capability?

      class Category(Enum):
        tiny = 1
        medium = 2
        large = 3
    
      cat = Category(obj.category) # assumes obj.category is 1, 2, or 3

    @craigh
    Copy link
    Mannequin Author

    craigh mannequin commented Feb 19, 2015

    I may not have been clear before. What I mean is, code like this:

    for obj in get_objects():
        if obj.category == Cat.cat1:
            #do something
        elif obj.category == Cat.cat2:
            #do something else
        elif obj.category == Cat.cat3 or obj.category == Cat.cat4:
            #...

    obj.category is already an instance of Cat, in other words. The consumer is using it to determine what to do with each obj.

    @craigh
    Copy link
    Mannequin Author

    craigh mannequin commented Feb 19, 2015

    It seems like performance is drastically improved by doing this:

    class Category(Enum):
        tiny = 1
        medium = 2
        large = 3
    
    tiny = Category.tiny
    medium = Category.medium

    In other words, resolving Category.tiny and Category.medium is what's slow. The comparison itself is very fast.

    @ethanfurman
    Copy link
    Member

    Yup, you have it figured out. It's the lookup that is the slowdown.

    When performance is an issue one of the standard tricks is to create a local name, like you did with "tiny = Category.tiny".

    For the curious (taken from the docstring for Enum.__getattr__):

    We use __getattr__ instead of descriptors or inserting into the enum
    class' __dict__ in order to support `name` and `value` being both
    properties for enum members (which live in the class' __dict__) and
    enum members themselves.
    

    It is possible to store all the enum members /except/ for 'name' and 'value' in the class' __dict__, but I'm not sure it's worth the extra complication.

    @ethanfurman ethanfurman changed the title Enum comparisons are 20x slower than comparing equivalent ints Enum member lookup is 20x slower than normal class attribute lookup Feb 19, 2015
    @ethanfurman
    Copy link
    Member

    Out of curiousity I tried: took two new lines, one modified line, and one comment. :)

    @ethanfurman
    Copy link
    Member

    Oh, and the slowdown dropped from 20 to 3 (for non-DynamicClassAttributes -- which is probably more than 99% of them).

    @vstinner
    Copy link
    Member

    I don't understand the patch, but 3x slower instead of 20x slower is a huge optimization :-) Do you plan to change Python 3.5 *and* Python 3.4?

    @ethanfurman
    Copy link
    Member

    This isn't a change to the API or any visible user behavior (besides performance), so I don't see a reason to not add it to 3.4.

    @ethanfurman ethanfurman self-assigned this Mar 10, 2015
    @ethanfurman
    Copy link
    Member

    Larry, I have a very small patch (~4 lines) that does change user behavior or the API, but does have a significant performance boost.

    I'm still learning what is/is not okay to add to maintenance releases, so wanted to run this by you.

    @larryhastings
    Copy link
    Contributor

    My inclination is 3.5 only. Barry, do you want to argue for this going into 3.4?

    @ethanfurman
    Copy link
    Member

    Argh, sorry -- that was supposed to be *does not* change user behavior nor the API, it's *just* a performance increase.

    Does that change your inclination?

    @larryhastings
    Copy link
    Contributor

    Oh, I read the code. But it's a performance hack, and the rules say we only accept security fixes and bug fixes at this stage of the release, and they're the rules for good reasons.

    @warsaw
    Copy link
    Member

    warsaw commented Mar 11, 2015

    Poor performance could fall under the category of bug fixes, so for an in-maintenance mode release, a fix that does not in any way change user visible behavior could be acceptable. It would probably be fine for 3.4 but I'm just +0 on it. Larry's call.

    @serhiy-storchaka
    Copy link
    Member

    This is not a regression (there were no enums before 3.4), slow down is not critical (only constant factor, not increased computational complexity), there is a workaround, and the code that just use constants that were converted to IntEnum is not affected. I'm -0 on it.

    @ethanfurman
    Copy link
    Member

    In getting everything fixed up and tested I realized there was one slight user-facing change: with this patch it is now possible to say:

      SomeEnum.SomeMember = SomeMember

    In other words, it is possible to set a value on the class as long as it is the same value that already existed.

    3.5 sounds good.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 11, 2015

    New changeset 2545bfe0d273 by Ethan Furman in branch 'default':
    Close bpo-23486: performance boost for enum member lookup
    https://hg.python.org/cpython/rev/2545bfe0d273

    @python-dev python-dev mannequin closed this as completed Mar 11, 2015
    @ethanfurman
    Copy link
    Member

    Slight reordering of code removed the one user visible change.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants