-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
enum.Flag instance creation is slow #82226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Consider the following example from enum import Flag
F = Flag("F", list("abcdefghijklm"))
for idx in range(2**len(F) - 1):
F(idx) creating all possible combos of 13 flags, so 8192 instances (yes, I know the instances are cached later, but the initial cost is still significant). Locally, this takes over 4.5s to execute; profiling shows that ~30% of that time is spent executing enum._is_power_of_two(!): def _power_of_two(value):
if value < 1:
return False
return value == 2 ** _high_bit(value) Indeed, replacing the implementation of _power_of_two by import math
_powers_of_two = {
2 ** i for i in range(math.ceil(math.log2(sys.maxsize)) + 1)}
def _power_of_two(value):
return (value in _powers_of_two if value <= sys.maxsize
else value == 2 ** _high_bit(value)) shaves off ~30% of the runtime (obviously this won't help for huge (>sys.maxsize) flag values, but these are rarer I would think). An even better fix, I think, would be for Flag to cache flags_to_check = [
(m, v)
for v, m in list(flag._value2member_map_.items())
if m.name is not None or _power_of_two(v)
] as this actually needlessly introduces quadratic complexity when iterating over all possible combos (because the size of _value2member_map_ is growing). (Also, it's not actually clear to me how one can end up with unnamed power-of-two instances in _value2member_map?) |
The fast method to check if the value is a power of two:
|
Actually, microoptimizing _power_of_two is a red herring and the problem is the quadratic complexity of _decompose (value2member_map becomes bigger and bigger at each iteration). Replacing the implementation of _decompose by the following fixes the performance issue (the original snippet executes in ~150ms, which may be even more optimizable but already a 30x improvement :)), and still passes test_enum.py (from cpython 3.7.4): def _decompose(flag, value):
"""Extract all members from the value."""
# _decompose is only called if the value is not named
not_covered = value
members = []
for member in flag:
member_value = member.value
if member_value and member_value & value == member_value:
members.append(member)
not_covered &= ~member_value
if issubclass(flag, IntFlag) and value > 0:
while not_covered:
new_value = 2 ** _high_bit(not_covered)
if new_value not in flag._value2member_map_:
# construct singleton pseudo-members
pseudo_member = int.__new__(flag, value)
pseudo_member._name_ = None
pseudo_member._value_ = value
flag._value2member_map_.setdefault(value, pseudo_member)
members.append(flag(new_value))
not_covered &= ~new_value
if not members and value in flag._value2member_map_:
members.append(flag._value2member_map_[value])
members.sort(key=lambda m: m._value_, reverse=True)
if len(members) > 1 and members[0].value == value:
# we have the breakdown, don't need the value member itself
members.pop(0)
return members, not_covered Checking for issubclass(flag, IntFlag) is a bit ugly but that's because Flag and IntFlag really need two separate implementations of _decompose -- the former wants to error out on uncovered values, the latter should create them on-the-fly. |
Now fixed, I believe. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: