-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Improve performance of enum_ operators by going back to specific implementation #5887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ementation test_enum needs a patch because ops are now overloaded and this affects their docstrings.
This does cause more move constructions, as shown by the needed update to test_copy_move. Up to reviewers whether they want more code size or more moves.
…see mostly-not-red tests
|
test failures look like they're caused by disagreement on how many move operations we're performing and are caused by the "outline call_impl to save on code size" commit specifically. I am unclear about how important it is to minimize the number of move operations we perform, so I've tentatively just added another commit that should make the tests work for C++17, and we can talk about what to do from here. |
Add a static_assert to document and enforce that function_ref is trivially copyable, ensuring safe pass-by-value usage. This also documents the lifetime safety guarantees: function_ref is created from cap->f which lives in the capture object, and is only used synchronously within call_impl without being stored beyond its scope.
Undefine all enum operator macros after their last use to prevent macro pollution and follow the existing code pattern. This matches the cleanup pattern used for the previous enum operator macros.
Rename the macro to be more specific and avoid potential clashes with public macros. The new name clearly indicates it's scoped to enum operations and describes its purpose (throwing a type error).
Replace vague comments about 'extensions to <functional>' and 'functions' with a clearer description that this is a header-only class template similar to std::function but with non-owning semantics. This makes it clear that it's template-only and requires no additional library linking.
|
Hi @swolchok I used Cursor to review this PR. It generated the four added commits, which are all of the relatively minor and polishing kind:
I'm waiting for the CI to see if they work on all platforms. I also added two sections to the PR description, to explain that there is a behavior change, and why we're still optimizing a deprecated type. Could you please review my commits and the new sections in the PR description? |
LGTM |
Thanks! I think this is ready for merging, but ... wrt my comment from yesterday: Because of the behavior change, I think it's best to merge this PR only after the |
Description
This improves the performance of
enum_operators by no longer attempting to funnel them all through a generic implementation, which caused additional overhead related to callingint().Behavior change: Multiple operator overloads
This PR changes how enum operators are implemented for convertible enums (enums that can be implicitly converted to their underlying integer type). Previously, operators like
__eq__,__ne__, and arithmetic operators used a single generic implementation that handled both enum-to-enum and enum-to-scalar comparisons internally.New implementation:
__eq__now has two separate overloads:__eq__(self: MyEnum, other: MyEnum, /) -> bool- for enum-to-enum comparison__eq__(self: MyEnum, other: int, /) -> bool- for enum-to-scalar comparison<,>,<=,>=,&,|,^, etc.) now have separate overloads for enum-to-enum and enum-to-scalar operationsImpact:
int()calls) by using direct C++ comparisons, resulting in the ~2x performance improvement shown in the benchmarksRationale for optimizing
py::enum_While
py::enum_was declared deprecated in pybind11 v3.0.0 in favor ofpy::native_enum, many existing codebases still rely onpy::enum_and cannot be migrated overnight. Large projects with extensive enum usage require careful planning and testing to transition topy::native_enum. This optimization provides immediate performance benefits for these existing codebases during their migration period, reducing the performance gap betweenpy::enum_andpy::native_enumfrom approximately 18x slower to approximately 9x slower (based on the benchmark results below). For new code,py::native_enumremains the recommended choice as it offers the best performance and is the long-term supported API.Benchmark results
using https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47 (the current tip of the benchmark-updates branch):
Enum equality comparison
Command:
python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x != x'Times are nsec/loop
M4 Mac, before: 165, 167, 166, 164, 167
Mac, after: 78.9, 78.9, 79.7, 79.9, 80.5
Enum ordering comparison
Command:
python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x < x'Mac, before: 170, 168, 168, 171, 168
Mac, after: 79.5, 78.8, 80.8, 81.3, 82.3
(i.e., no difference between
!=and<)Compare to performance of calling a method of a simple pybinded class:
Command:
python -m timeit --setup 'from pybind11_benchmark import MyInt; x = MyInt()' 'x.get()'Mac: 54.6, 54.6, 54.9, 55.3, 55.3
Also compare to performance using a
py::native_enum:Command:
python -m timeit --setup 'from pybind11_benchmark import MyNativeEnum; x = MyNativeEnum.THREE' 'x < x'Mac: 9.12, 9.13, 9.2, 9.21, 9.34
(I note that the above benchmarks do have a tendency toward monotonically increasing times across runs, but that effect seems to be much smaller than the effect of the code changes.)
Code size:
py::arithmeticenum_ before this PR as measured on my Mac by adding an extra enum to the pybind11_benchmark (specifically https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47) was a little over 8 KiB of__text, plus some about 1000 bytes of__gcc_except_taband negligible amounts in other sections. After this PR, the marginal cost increases to a little over 17000 bytes of__text, almost 2000 bytes of__gcc_except_tab, and a few hundred bytes in other sections. I believe @Skylion007 previously mentioned that this seemed like a reasonable order of magnitude of marginal cost.__textfell by about 12500 bytes and__gcc_except_tabfell by a little over 2000 bytes, though there were negligible size increases in other sections.Suggested changelog entry:
py::enum_s, thoughpy::native_enumis still much faster.