@swolchok
Contributor

@swolchok swolchok commented Oct 31, 2025

Description

This improves the performance of enum_ operators by no longer attempting to funnel them all through a generic implementation, which caused additional overhead related to calling int().

Behavior change: Multiple operator overloads

This PR changes how enum operators are implemented for convertible enums (enums that can be implicitly converted to their underlying integer type). Previously, operators like __eq__, __ne__, and arithmetic operators used a single generic implementation that handled both enum-to-enum and enum-to-scalar comparisons internally.

New implementation:

  • For convertible enums, operators now use multiple type-specific overloads instead of a single generic implementation
  • For example, __eq__ now has two separate overloads:
    • __eq__(self: MyEnum, other: MyEnum, /) -> bool - for enum-to-enum comparison
    • __eq__(self: MyEnum, other: int, /) -> bool - for enum-to-scalar comparison
  • Similarly, the ordering and bitwise operators (<, >, <=, >=, &, |, ^, etc.) now have separate overloads for enum-to-enum and enum-to-scalar operations
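
The two-overload dispatch described in the list above can be modelled in pure Python. This is only an illustrative sketch, not the PR's code: pybind11 does this in C++, and `MyEnum`, `eq_enum_enum`, `eq_enum_int`, and `dispatch_eq` are hypothetical names introduced here. Returning `NotImplemented` when no overload matches is likewise an assumption of the sketch.

```python
class MyEnum:
    """Hypothetical stand-in for a convertible enum bound with py::enum_."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def eq_enum_enum(a, b):
    # Models __eq__(self: MyEnum, other: MyEnum, /) -> bool
    if not isinstance(b, MyEnum):
        raise TypeError("other is not a MyEnum")
    return a.value == b.value


def eq_enum_int(a, b):
    # Models __eq__(self: MyEnum, other: int, /) -> bool
    if not isinstance(b, int):
        raise TypeError("other is not an int")
    return a.value == b


def dispatch_eq(a, b):
    # Try each overload in turn; the first one that accepts the argument wins.
    for overload in (eq_enum_enum, eq_enum_int):
        try:
            return overload(a, b)
        except TypeError:
            continue
    # Assumption of this sketch: no matching overload yields NotImplemented.
    return NotImplemented


ONE = MyEnum("ONE", 1)
print(dispatch_eq(ONE, MyEnum("ONE", 1)))  # enum-to-enum overload
print(dispatch_eq(ONE, 1))                 # enum-to-scalar overload
```

Each overload only handles one argument type, so neither has to inspect the operand and convert it the way a single generic implementation would.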

Impact:

  • Performance: This change eliminates Python object conversion overhead (int() calls) by using direct C++ comparisons, resulting in the ~2x performance improvement shown in the benchmarks
  • Docstrings: When pybind11 generates docstrings for operators with multiple overloads, it lists all available signatures. This means the docstring format changes from showing a single signature to showing multiple signatures (one per overload)
  • API compatibility: The runtime behavior remains the same - users can still compare enums to enums or enums to scalars exactly as before. Only the internal implementation and docstring format have changed
  • Test updates: The enum operator docstring tests were updated to accommodate the new multi-overload docstring format by checking that the docstring starts with the operation name and contains the expected signature(s) anywhere in the docstring (not necessarily at the start)
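
The relaxed docstring check described in the last bullet could look roughly like the following. This is a minimal sketch under stated assumptions: `docstring_matches` is a hypothetical helper, and `EXAMPLE_DOC` is a hand-written string in the style pybind11 uses for overloaded functions, not text copied from the PR's test suite.

```python
# Illustrative docstring for an operator with two overloads (assumed format).
EXAMPLE_DOC = (
    "__eq__(*args, **kwargs)\n"
    "Overloaded function.\n"
    "\n"
    "1. __eq__(self: MyEnum, other: MyEnum, /) -> bool\n"
    "\n"
    "2. __eq__(self: MyEnum, other: int, /) -> bool\n"
)


def docstring_matches(doc, op_name, expected_signatures):
    """Return True if doc starts with op_name and contains every signature."""
    if not doc.startswith(op_name):
        return False
    # Signatures may appear anywhere in the docstring, not only at the start.
    return all(signature in doc for signature in expected_signatures)


expected = [
    "__eq__(self: MyEnum, other: MyEnum, /) -> bool",
    "__eq__(self: MyEnum, other: int, /) -> bool",
]
print(docstring_matches(EXAMPLE_DOC, "__eq__", expected))
```

A check of this shape accepts both the single-signature and the multi-overload docstring format, because it does not require the signature to sit at the start of the docstring.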

Rationale for optimizing py::enum_

While py::enum_ was declared deprecated in pybind11 v3.0.0 in favor of py::native_enum, many existing codebases still rely on py::enum_ and cannot be migrated overnight. Large projects with extensive enum usage require careful planning and testing to transition to py::native_enum. This optimization provides immediate performance benefits for these existing codebases during their migration period, reducing the performance gap between py::enum_ and py::native_enum from approximately 18x slower to approximately 9x slower (based on the benchmark results below). For new code, py::native_enum remains the recommended choice as it offers the best performance and is the long-term supported API.

Benchmark results

using https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47 (the current tip of the benchmark-updates branch):

Enum equality comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x != x'
Times are nsec/loop

M4 Mac, before: 165, 167, 166, 164, 167
Mac, after: 78.9, 78.9, 79.7, 79.9, 80.5

Enum ordering comparison
Command: python -m timeit --setup 'from pybind11_benchmark import MyEnum; x = MyEnum.ONE' 'x < x'

Mac, before: 170, 168, 168, 171, 168
Mac, after: 79.5, 78.8, 80.8, 81.3, 82.3

(i.e., no difference between != and <)

Compare to the performance of calling a method of a simple pybind11-bound class:
Command: python -m timeit --setup 'from pybind11_benchmark import MyInt; x = MyInt()' 'x.get()'

Mac: 54.6, 54.6, 54.9, 55.3, 55.3

Also compare to performance using a py::native_enum:
Command: python -m timeit --setup 'from pybind11_benchmark import MyNativeEnum; x = MyNativeEnum.THREE' 'x < x'

Mac: 9.12, 9.13, 9.2, 9.21, 9.34

(I note that the above benchmarks do have a tendency toward monotonically increasing times across runs, but that effect seems to be much smaller than the effect of the code changes.)
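
The "~2x" and "18x to 9x" figures quoted in the description can be checked against the timings above. The sketch below is one way to do that arithmetic; using the median of each set of five runs is an assumption made here, since the PR does not say how its ratios were derived.

```python
from statistics import median

# Timings in nsec/loop, copied from the runs listed above.
ne_before = [165, 167, 166, 164, 167]
ne_after = [78.9, 78.9, 79.7, 79.9, 80.5]
lt_before = [170, 168, 168, 171, 168]
lt_after = [79.5, 78.8, 80.8, 81.3, 82.3]
native_lt = [9.12, 9.13, 9.2, 9.21, 9.34]

# Speedup of py::enum_ operators from this PR.
ne_speedup = median(ne_before) / median(ne_after)
lt_speedup = median(lt_before) / median(lt_after)

# Remaining gap between py::enum_ and py::native_enum for 'x < x'.
gap_before = median(lt_before) / median(native_lt)
gap_after = median(lt_after) / median(native_lt)

print(f"!= speedup: {ne_speedup:.2f}x")
print(f"<  speedup: {lt_speedup:.2f}x")
print(f"enum_ vs native_enum before: {gap_before:.1f}x slower")
print(f"enum_ vs native_enum after:  {gap_after:.1f}x slower")
```

With these medians, both operators come out a little over 2x faster, and the gap to py::native_enum goes from roughly 18x to roughly 9x, consistent with the description.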

Code size:

  • The marginal code size cost of one py::arithmetic enum_ before this PR, as measured on my Mac by adding an extra enum to pybind11_benchmark (specifically https://github.com/swolchok/pybind11_benchmark/tree/8a6f19d17c362dc2060dd8461b502b98c3226a47), was a little over 8 KiB of __text, plus about 1000 bytes of __gcc_except_tab and negligible amounts in other sections. After this PR, the marginal cost increases to a little over 17000 bytes of __text, almost 2000 bytes of __gcc_except_tab, and a few hundred bytes in other sections. I believe @Skylion007 previously mentioned that this seemed like a reasonable order of magnitude of marginal cost.
  • Interestingly, the baseline size of that commit of pybind11_benchmark decreased: __text fell by about 12500 bytes and __gcc_except_tab fell by a little over 2000 bytes, though there were negligible size increases in other sections.
  • The second commit on this branch, entitled "outline call_impl to save on code size", is specifically a code size mitigation. It is not necessary for correctness and can be dropped if we don't feel it is worthwhile.

Suggested changelog entry:

  • Improve performance of operators for py::enum_s, though py::native_enum is still much faster.

…ementation

test_enum needs a patch because ops are now overloaded and this affects their docstrings.
This does cause more move constructions, as shown by the needed update to test_copy_move. Up to reviewers whether they want more code size or more moves.
@swolchok swolchok requested a review from henryiii as a code owner October 31, 2025 21:33
@swolchok
Contributor Author

swolchok commented Nov 3, 2025

The test failures look like a disagreement on how many move operations we're performing, and they are caused specifically by the "outline call_impl to save on code size" commit. I am unclear about how important it is to minimize the number of move operations we perform, so I've tentatively just added another commit that should make the tests work for C++17, and we can talk about what to do from here.

rwgk added 5 commits November 10, 2025 22:13
Add a static_assert to document and enforce that function_ref is
trivially copyable, ensuring safe pass-by-value usage. This also
documents the lifetime safety guarantees: function_ref is created
from cap->f which lives in the capture object, and is only used
synchronously within call_impl without being stored beyond its scope.

Undefine all enum operator macros after their last use to prevent
macro pollution and follow the existing code pattern. This matches
the cleanup pattern used for the previous enum operator macros.

Rename the macro to be more specific and avoid potential clashes with
public macros. The new name clearly indicates it's scoped to enum
operations and describes its purpose (throwing a type error).

Replace vague comments about 'extensions to <functional>' and 'functions'
with a clearer description that this is a header-only class template
similar to std::function but with non-owning semantics. This makes it
clear that it's template-only and requires no additional library linking.
@rwgk
Collaborator

rwgk commented Nov 11, 2025

Hi @swolchok, I used Cursor to review this PR. It generated the four added commits, all of which are relatively minor polish:

  • 279c72a Add static assertion for function_ref lifetime safety in call_impl
  • a580cce Add #undef cleanup for enum operator macros
  • 0287ec6 Rename PYBIND11_THROW to PYBIND11_ENUM_OP_THROW_TYPE_ERROR
  • 35b4b8f Clarify comments in function_ref.h

I'm waiting for the CI to see if they work on all platforms.

I also added two sections to the PR description, to explain that there is a behavior change, and why we're still optimizing a deprecated type.

Could you please review my commits and the new sections in the PR description?

@swolchok
Contributor Author

> Could you please review my commits and the new sections in the PR description?

LGTM

@rwgk
Collaborator

rwgk commented Nov 11, 2025

> > Could you please review my commits and the new sections in the PR description?
>
> LGTM

Thanks! I think this is ready for merging, but ...

wrt my comment from yesterday:

Because of the behavior change, I think it's best to merge this PR only after the v3.0.2 patch release. I.e. I plan to merge this along with #5879 to start v3.1.0a0.
