Skip to content

Conversation

@a12k
Copy link
Contributor

@a12k a12k commented Jan 2, 2026

This PR fixes a missed optimization.

The function _Py_CallBuiltinClass_StackRefSteal uses STACKREFS_TO_PYOBJECTS. However, the code was not passing the PY_VECTORCALL_ARGUMENTS_OFFSET flag to the callee. This forced vector-callable types to reallocate and copy the arguments whenever they needed to prepend an argument.

Verification
Using LLDB to inspect long_vectorcall when reached via this path:

  • Before fix: nargsf = 1
  • After fix: nargsf = 0x8000000000000001 (High bit correctly set)

Benchmarks
I ran a small benchmark file allocating a class in a tight loop to trigger the path:

  • Baseline: 1.6193s
  • Optimized: 1.5560s
  • Result: ~3.9% speedup in this specific specialized call path.

gh-143361

Benchmark script
import time

class Worker:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

def run_benchmark(iterations):
    cls = Worker
    start = time.perf_counter()
    for _ in range(iterations):
        cls(1, 2, 3)
        cls(4, 5, 6)
        cls(7, 8, 9)
        cls(10, 11, 12)
        cls(13, 14, 15)
    return time.perf_counter() - start

run_benchmark(10000) #warm it up

ITERS = 1_000_000
print(f"Time: {run_benchmark(ITERS):.4f}s")

</details>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this, this is internal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentally went back and forth on this. It's true that guideline is that "strictly internal changes with no user-visible effects" do not warrant an entry. My reasoning for inclusion was that a 3.9% speedup in a core code path is indeed significant enough to external, end-users that this change warrants an entry. It also seems to be how this sort of change in ceval.c. has been handled in the past. However, I’ll gladly defer to the guidance of the core developers who review the PR.

@Fidget-Spinner Fidget-Spinner merged commit b538c28 into python:main Jan 2, 2026
93 of 95 checks passed
@a12k a12k deleted the fix-vectorcall-offset-ceval branch January 2, 2026 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants