
@mdboom
Contributor

@mdboom mdboom commented Apr 5, 2024

Some uops generate large templates for the JIT, and the hypothesis is that large code makes the JIT run more slowly. This "externalizes" _INIT_CALL_BOUND_METHOD_EXACT_ARGS, _INIT_CALL_PY_EXACT_ARGS, _COPY_FREE_VARS, and _SET_FUNCTION_ATTRIBUTE. See the original issue #117224 for an explanation of how these specific uops were chosen.

The code generator has a new annotation, externalize, which marks uops whose bodies should be moved into their own functions.

The code generator now produces two new files:

  • Python/executor_externals.c contains the body of the uops as functions
  • Include/internal/pycore_executor_externals.h contains the declarations for those functions

The generated code for each of these uops then simply calls the corresponding function.
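As a rough illustration of what externalizing does to a uop, here is a minimal C sketch using toy structures. Everything in it is hypothetical: ToyFrame, ToyFunction, toy_external_COPY_FREE_VARS and the opcode numbering are invented for the example and do not reproduce CPython's generated code or the real _COPY_FREE_VARS implementation. It only shows the shape of the change: the body that used to be expanded inline in the uop's case moves into a separately declared function, and the case shrinks to a call.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for interpreter state; these are NOT CPython's real
 * frame or function object layouts. */
#define TOY_NSLOTS 8

typedef struct {
    int localsplus[TOY_NSLOTS]; /* locals, then free variables */
    int nlocals;                /* index where free variables begin */
} ToyFrame;

typedef struct {
    int closure[4]; /* values captured by the function */
    int nfree;      /* number of closure entries in use */
} ToyFunction;

/* Plays the role of the generated header: only a declaration
 * (hypothetical name). */
void toy_external_COPY_FREE_VARS(ToyFrame *frame, const ToyFunction *func);

/* Before: the uop body is expanded inline in its case, so the
 * template for this uop carries the whole loop. */
static void run_uop_inline(int opcode, ToyFrame *frame, const ToyFunction *func)
{
    switch (opcode) {
    case 0: /* toy _COPY_FREE_VARS */
        for (int i = 0; i < func->nfree; i++) {
            frame->localsplus[frame->nlocals + i] = func->closure[i];
        }
        break;
    default:
        break;
    }
}

/* After: the case only calls the out-of-line function, so the
 * template shrinks to little more than a call. */
static void run_uop_external(int opcode, ToyFrame *frame, const ToyFunction *func)
{
    switch (opcode) {
    case 0: /* toy _COPY_FREE_VARS */
        toy_external_COPY_FREE_VARS(frame, func);
        break;
    default:
        break;
    }
}

/* Plays the role of the generated source file: the same body,
 * moved into its own function. */
void toy_external_COPY_FREE_VARS(ToyFrame *frame, const ToyFunction *func)
{
    assert(frame->nlocals + func->nfree <= TOY_NSLOTS);
    for (int i = 0; i < func->nfree; i++) {
        frame->localsplus[frame->nlocals + i] = func->closure[i];
    }
}

/* Returns 1 if the inline and externalized forms leave the frame
 * in the same state. */
int toy_forms_agree(void)
{
    ToyFunction func = {.closure = {10, 20, 30}, .nfree = 3};
    ToyFrame a = {.nlocals = 2};
    ToyFrame b = {.nlocals = 2};
    run_uop_inline(0, &a, &func);
    run_uop_external(0, &b, &func);
    return memcmp(a.localsplus, b.localsplus, sizeof a.localsplus) == 0;
}
```

Everything sits in one file here only so the sketch compiles on its own. In the PR the function bodies live in Python/executor_externals.c, outside the per-uop templates, which is presumably what keeps them from being inlined back into each template.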

@mdboom mdboom marked this pull request as draft April 5, 2024 18:25
@mdboom
Contributor Author

mdboom commented Apr 5, 2024

Converting to draft. On a second run of the benchmarks, this seems to have no effect, though it was 1% faster yesterday, so it's not a clear win. Given that, it's probably not worth adding this complexity to the generator, etc. But I'll leave this up for a while in case @brandtbucher, @markshannon, or anyone else sees any opportunities for improvement.

@mdboom
Contributor Author

mdboom commented May 28, 2024

Closing -- this seemed to be a dead end (at least for now).

@mdboom mdboom closed this May 28, 2024
@zooba
Member

zooba commented May 29, 2024

Did any of your testing check whether the functions were being immediately inlined back in? I believe the usual preference for compilers is to inline functions with only a single call site, and once you're running PGO they should all quite happily inline across source units.

It feels unlikely that the call overhead is being perfectly balanced by the smaller code size in the main loop. Possible, but I'd expect to see the needle move one way or the other.

@mdboom
Copy link
Contributor Author

mdboom commented May 29, 2024

This was all for the benefit of the copy-and-patch JIT and was measured there. So PGO/LTO don't apply, and I confirmed that the bytecode templates were getting smaller (indicating that the functions were not being inlined).



2 participants