Improve performance of enum_ operators by going back to specific implementation #5887

swolchok · 2025-10-31T20:12:00Z

Description

This improves the performance of enum_ operators by no longer attempting to funnel them all through a generic implementation, which caused additional overhead related to calling int().

Behavior change: Multiple operator overloads

This PR changes how enum operators are implemented for convertible enums (enums that can be implicitly converted to their underlying integer type). Previously, operators like __eq__, __ne__, and arithmetic operators used a single generic implementation that handled both enum-to-enum and enum-to-scalar comparisons internally.

New implementation:

- For convertible enums, operators now use multiple type-specific overloads instead of a single generic implementation.
- For example, __eq__ now has two separate overloads:
  - __eq__(self: MyEnum, other: MyEnum, /) -> bool - for enum-to-enum comparison
  - __eq__(self: MyEnum, other: int, /) -> bool - for enum-to-scalar comparison
- Similarly, the ordering and bitwise operators (<, >, <=, >=, &, |, ^, etc.) now have separate overloads for enum-to-enum and enum-to-scalar operations.
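The difference between the two approaches can be pictured with a pure-Python analogy. This is only a sketch: `MyEnum`, `eq_generic`, `eq_overloaded`, and the overload table are hypothetical stand-ins, not pybind11 code, and the real dispatch happens in C++.

```python
class MyEnum:
    """Hypothetical stand-in for a convertible py::enum_ type."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value


def eq_generic(a, b):
    # Old style (analogy): one implementation handles every operand type
    # by converting both sides through int() first.
    return int(a) == int(b)


def eq_enum_enum(a, b):
    # New style, overload 1: enum-to-enum, compares stored values directly.
    return a.value == b.value


def eq_enum_int(a, b):
    # New style, overload 2: enum-to-scalar, no conversion of the enum side.
    return a.value == b


# Overloads are tried in order; the first one whose parameter type matches wins.
EQ_OVERLOADS = [(MyEnum, eq_enum_enum), (int, eq_enum_int)]


def eq_overloaded(a, b):
    for other_type, impl in EQ_OVERLOADS:
        if isinstance(b, other_type):
            return impl(a, b)
    # Assumption of this sketch: unmatched operand types fall through.
    return NotImplemented
```

Both paths give the same answers for enum and integer operands; only the route taken to reach them differs.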

Impact:

- Performance: This change eliminates Python object conversion overhead (int() calls) by using direct C++ comparisons, resulting in the ~2x performance improvement shown in the benchmarks.
- Docstrings: When pybind11 generates docstrings for operators with multiple overloads, it lists all available signatures. This means the docstring format changes from showing a single signature to showing multiple signatures (one per overload).
- API compatibility: The runtime behavior remains the same - users can still compare enums to enums or enums to scalars exactly as before. Only the internal implementation and docstring format have changed.
- Test updates: The enum operator docstring tests were updated to accommodate the new multi-overload docstring format by checking that the docstring starts with the operation name and contains the expected signature(s) anywhere in the docstring (not necessarily at the start).
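The relaxed docstring check described above could look roughly like the following. This is a sketch, not the actual test code: the helper name `docstring_matches` is hypothetical, and `SAMPLE_DOC` is a hand-written approximation of a multi-overload docstring rather than output captured from a real build.

```python
from typing import Sequence


def docstring_matches(doc: str, op_name: str, signatures: Sequence[str]) -> bool:
    """Return True if doc starts with op_name and contains every signature."""
    return doc.startswith(op_name) and all(sig in doc for sig in signatures)


# Hand-written approximation of an overloaded operator docstring (assumption).
SAMPLE_DOC = (
    "__eq__(*args, **kwargs)\n"
    "Overloaded function.\n"
    "\n"
    "1. __eq__(self: MyEnum, other: MyEnum, /) -> bool\n"
    "\n"
    "2. __eq__(self: MyEnum, other: int, /) -> bool\n"
)

EXPECTED_SIGNATURES = [
    "__eq__(self: MyEnum, other: MyEnum, /) -> bool",
    "__eq__(self: MyEnum, other: int, /) -> bool",
]

ok = docstring_matches(SAMPLE_DOC, "__eq__", EXPECTED_SIGNATURES)
```

Because the signatures are matched anywhere in the string, the same check passes for both a single-signature docstring and a numbered multi-overload one.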

Rationale for optimizing `py::enum_`

While py::enum_ was declared deprecated in pybind11 v3.0.0 in favor of py::native_enum, many existing codebases still rely on py::enum_ and cannot be migrated overnight. Large projects with extensive enum usage require careful planning and testing to transition to py::native_enum. This optimization provides immediate performance benefits for these existing codebases during their migration period, reducing the performance gap between py::enum_ and py::native_enum from approximately 18x slower to approximately 9x slower (based on the benchmark results below). For new code, py::native_enum remains the recommended choice as it offers the best performance and is the long-term supported API.

Benchmark results

using https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47 (the current tip of the benchmark-updates branch):

Enum equality comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x != x'
Times are nsec/loop

M4 Mac, before: 165, 167, 166, 164, 167
Mac, after: 78.9, 78.9, 79.7, 79.9, 80.5

Enum ordering comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x < x'

Mac, before: 170, 168, 168, 171, 168
Mac, after: 79.5, 78.8, 80.8, 81.3, 82.3

(i.e., no difference between != and <)

Compare to performance of calling a method of a simple pybind11-bound class:
Command: python -m timeit --setup 'from pybind11_benchmark import MyInt; x = MyInt()' 'x.get()'

Mac: 54.6, 54.6, 54.9, 55.3, 55.3

Also compare to performance using a py::native_enum:
Command: python -m timeit --setup 'from pybind11_benchmark import MyNativeEnum; x = MyNativeEnum.THREE' 'x < x'

Mac: 9.12, 9.13, 9.2, 9.21, 9.34

(I note that the above benchmarks do have a tendency toward monotonically increasing times across runs, but that effect seems to be much smaller than the effect of the code changes.)
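As a rough cross-check of the "~2x" and "18x to 9x" figures, the timings listed above can be summarized by their medians. The choice of the median as the summary statistic is an assumption made here for illustration; the numbers themselves are copied from the runs above.

```python
from statistics import median

# nsec/loop timings copied from the benchmark runs above.
timings_ns = {
    "enum_ != before": [165, 167, 166, 164, 167],
    "enum_ != after": [78.9, 78.9, 79.7, 79.9, 80.5],
    "enum_ < before": [170, 168, 168, 171, 168],
    "enum_ < after": [79.5, 78.8, 80.8, 81.3, 82.3],
    "MyInt.get()": [54.6, 54.6, 54.9, 55.3, 55.3],
    "native_enum <": [9.12, 9.13, 9.2, 9.21, 9.34],
}

med = {name: median(runs) for name, runs in timings_ns.items()}

# Speedup of py::enum_ operators from this PR.
speedup_ne = med["enum_ != before"] / med["enum_ != after"]
speedup_lt = med["enum_ < before"] / med["enum_ < after"]

# Remaining gap between py::enum_ and py::native_enum for `x < x`.
gap_before = med["enum_ < before"] / med["native_enum <"]
gap_after = med["enum_ < after"] / med["native_enum <"]

print(f"speedup !=: {speedup_ne:.2f}x, speedup <: {speedup_lt:.2f}x")
print(f"gap vs native_enum: {gap_before:.1f}x before, {gap_after:.1f}x after")
```

Under this summary both operators speed up by a little over 2x, and the gap to py::native_enum shrinks from roughly 18x to roughly 9x, consistent with the figures quoted in the rationale.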

Code size:

The marginal code cost of one py::arithmetic enum_ before this PR, as measured on my Mac by adding an extra enum to pybind11_benchmark (specifically https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47), was a little over 8 KiB of __text, plus about 1000 bytes of __gcc_except_tab and negligible amounts in other sections. After this PR, the marginal cost increases to a little over 17000 bytes of __text, almost 2000 bytes of __gcc_except_tab, and a few hundred bytes in other sections. I believe @Skylion007 previously mentioned that this seemed like a reasonable order of magnitude of marginal cost.
Interestingly, the baseline size of pybind11_benchmark at that commit decreased: __text fell by about 12500 bytes and __gcc_except_tab fell by a little over 2000 bytes, though there were negligible size increases in other sections.
The second commit on this branch, entitled "outline call_impl to save on code size", is specifically a code size mitigation. It is not necessary for correctness and can be dropped if we don't feel it is worthwhile.
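To see how the baseline decrease and the higher marginal cost trade off, the approximate figures above can be combined in a back-of-the-envelope calculation. This is only a sketch: the byte counts are rounded from the prose ("a little over", "almost", "about"), and the assumption that cost scales linearly with the number of extra enums is introduced here, not measured.

```python
KIB = 1024

# Approximate marginal cost of one extra py::arithmetic enum_, in bytes.
marginal_before = {"__text": 8 * KIB, "__gcc_except_tab": 1000}
marginal_after = {"__text": 17000, "__gcc_except_tab": 2000}

# Approximate change in the baseline size of pybind11_benchmark, in bytes.
baseline_change = {"__text": -12500, "__gcc_except_tab": -2000}


def net_change(num_extra_enums):
    """Estimated size change from this PR for a build with extra enums added.

    Assumes (hypothetically) that each extra enum costs the same amount.
    """
    return {
        section: baseline_change[section]
        + num_extra_enums * (marginal_after[section] - marginal_before[section])
        for section in marginal_before
    }


for n in range(4):
    print(n, net_change(n))
```

Under these rounded numbers, the estimated __text change is still negative with one extra enum and turns positive at two, so whether the PR is a net size win depends on how many enums a project binds.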

Suggested changelog entry:

Improve performance of operators for py::enum_s, though py::native_enum is still much faster.

swolchok · 2025-11-03T23:17:31Z

test failures look like they're caused by disagreement on how many move operations we're performing and are caused by the "outline call_impl to save on code size" commit specifically. I am unclear about how important it is to minimize the number of move operations we perform, so I've tentatively just added another commit that should make the tests work for C++17, and we can talk about what to do from here.

Add a static_assert to document and enforce that function_ref is trivially copyable, ensuring safe pass-by-value usage. This also documents the lifetime safety guarantees: function_ref is created from cap->f which lives in the capture object, and is only used synchronously within call_impl without being stored beyond its scope.

Undefine all enum operator macros after their last use to prevent macro pollution and follow the existing code pattern. This matches the cleanup pattern used for the previous enum operator macros.

Rename the macro to be more specific and avoid potential clashes with public macros. The new name clearly indicates it's scoped to enum operations and describes its purpose (throwing a type error).

Replace vague comments about 'extensions to <functional>' and 'functions' with a clearer description that this is a header-only class template similar to std::function but with non-owning semantics. This makes it clear that it's template-only and requires no additional library linking.

rwgk · 2025-11-11T20:11:05Z

Hi @swolchok I used Cursor to review this PR. It generated the four added commits, which are all of the relatively minor and polishing kind:

279c72a Add static assertion for function_ref lifetime safety in call_impl
a580cce Add #undef cleanup for enum operator macros
0287ec6 Rename PYBIND11_THROW to PYBIND11_ENUM_OP_THROW_TYPE_ERROR
35b4b8f Clarify comments in function_ref.h

I'm waiting for the CI to see if they work on all platforms.

I also added two sections to the PR description, to explain that there is a behavior change, and why we're still optimizing a deprecated type.

Could you please review my commits and the new sections in the PR description?

swolchok · 2025-11-11T20:54:21Z

> Could you please review my commits and the new sections in the PR description?

LGTM

rwgk · 2025-11-11T22:00:57Z

> > Could you please review my commits and the new sections in the PR description?
>
> LGTM

Thanks! I think this is ready for merging, but ...

wrt my comment from yesterday:

> Because of the behavior change, I think it's best to merge this PR only after the v3.0.2 patch release.

I.e. I plan to merge this along with #5879 to start v3.1.0a0.

swolchok added 3 commits October 31, 2025 12:49

Improve performance of enum_ operators by going back to specific implementation

f34a039

test_enum needs a patch because ops are now overloaded and this affects their docstrings.

outline call_impl to save on code size

a65e79d

This does cause more move constructions, as shown by the needed update to test_copy_move. Up to reviewers whether they want more code size or more moves.

add function_ref.h to PYBIND11_HEADERS.

ffb981c

swolchok requested a review from henryiii as a code owner October 31, 2025 21:33

swolchok added 2 commits November 3, 2025 15:16

Update test_copy_move tests with C++17 passing values just so we can see mostly-not-red tests

65e5866

Remove stray TODO

729e9f8

swolchok added 2 commits November 5, 2025 09:32

fix clang-tidy

f24ded5

fix clang-tidy again. add function_ref.h to test_files.py

c032b53

rwgk mentioned this pull request Nov 11, 2025

Expand float and complex strict mode to allow ints and ints/float. #5879

Open

rwgk added 5 commits November 10, 2025 22:13

Merge branch 'master' into swolchok→enum-perf

bd014c0

Add #undef cleanup for enum operator macros

a580cce

Undefine all enum operator macros after their last use to prevent macro pollution and follow the existing code pattern. This matches the cleanup pattern used for the previous enum operator macros.

Rename PYBIND11_THROW to PYBIND11_ENUM_OP_THROW_TYPE_ERROR

0287ec6

Rename the macro to be more specific and avoid potential clashes with public macros. The new name clearly indicates it's scoped to enum operations and describes its purpose (throwing a type error).
