Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove LOAD_CLOSURE #105775

Closed
markshannon opened this issue Jun 14, 2023 · 20 comments
Closed

Remove LOAD_CLOSURE #105775

markshannon opened this issue Jun 14, 2023 · 20 comments
Labels
performance Performance or resource usage

Comments

@markshannon
Copy link
Member

markshannon commented Jun 14, 2023

LOAD_CLOSURE is identical to LOAD_FAST_CHECK in every way except its name and number.

The justification for its existence is that "We keep LOAD_CLOSURE so that the bytecode stays more readable.".
Which is insufficient justification to keep it given that it:

  • Uses an instruction, a limited resource which adds bulk to the interpreter
  • Prevents superinstruction formation.
  • Prevents removal of checks for uninitialized variables.

Linked PRs

@markshannon markshannon added the performance Performance or resource usage label Jun 14, 2023
@heysujal
Copy link

I'd like to work on this issue

@markshannon
Copy link
Member Author

Are you sure? This is more complicated than it looks as it involves some awkward compiler internals.

@heysujal
Copy link

Oh, I thought it would be just deleting some statements

@markshannon
Copy link
Member Author

It should be. But the compiler handles local variables in a somewhat convoluted way

@heysujal
Copy link

Ok, then someone else can pick the issue up.

@polynomialherder
Copy link
Contributor

I've been working through this -- I'll update if I have questions or get stuck

@polynomialherder
Copy link
Contributor

@markshannon My plan is to add a member to the _PyCfgInstruction and _PyCompile_Instruction struct types instruction called i_fc or i_fromclosure that indicates whether the instruction originated in a closure, and then add a macro LOAD_FAST_CLOSURE (or similar) for adding the LOAD_FAST instructions with that member set to 1. Then I'll replace the existing LOAD_CLOSURE calls with those appropriately marked LOAD_FAST calls. Then downstream functions like fix_cell_offsets can check that member before processing. Let me know if there are any obvious problems with this design, or if you have thoughts on better approaches

For this unit of work, do we want to preserve the current performance characteristics to reduce risk? For instance, I see that replacing LOAD_CLOSURE with LOAD_FAST would immediately impact some of the optimization functions in flowgraph.c like insert_superinstructions and fast_scan_many_locals that presently don't operate on LOAD_CLOSURE instructions. I'm not sure what the consequences would be of allowing those functions to operate on LOAD_FAST instructions that would have previously been LOAD_CLOSURE, perhaps it would "just work" -- or whether we should add a check of that i_fc member to preserve the current behavior

Let me know your thoughts on this -- I'm pretty new to the cpython source code so still piecing together the local variable lifecycle. Thanks!

@markshannon
Copy link
Member Author

Rather than add special cases to _PyCfgInstruction and _PyCompile_Instruction would it make more sense use the names of the variables as operands, rather than indices into two different lists?
The numbering can then be done in the assembler.

There is no reason to treat LOAD_CLOSURE differently to LOAD_FAST in the optimizer.

@carljm
Copy link
Member

carljm commented Jun 22, 2023

If you don't want to tackle a full refactoring of how local variable and cell offsets are handled in this PR, a simpler incremental approach would be to make LOAD_CLOSURE a pseudo-instruction (similar to e.g. STORE_FAST_MAYBE_NULL) which is emitted in codegen (and thus can still be recognized in fix_cell_offsets) but is converted to LOAD_FAST in _PyCfg_ConvertPseudoOps. This would eliminate LOAD_CLOSURE as a real instruction that has to be handled at runtime.

The refactor of cell index handling could still be done as a follow-up PR, allowing removal of the LOAD_CLOSURE pseudo-op.

@gvanrossum
Copy link
Member

(From your post to core-mentorship:)

When I first set up my cpython development environment, I followed the guidance on the developer’s guide and have been studying the compiler design page closely. From my reading of it, when I remove an opcode, I need to

  • Remove references to the opcode from the code and update any logic impacted
  • Remove the opcode from Lib/opcode.py and the dis module documentation
  • Run make regen-opcode regen-opcode-targets to regenerate Include/opcode.h and Python/opcode_targets.h
  • Update the magic number in Lib/importlib/_bootstrap_external.py and PC/launcher.c
  • "Add the new bytecode target"
  • Run make regen-importlib

The first step is probably most of the work, but I'm a little bit unclear about the second to last step.

The compiler design page in the developer guide has written

Changes to Lib/importlib/_bootstrap_external.py will take effect only after running make regen-importlib. Running this command before adding the new bytecode target to Python/ceval.c will result in an error. You should only run make regen-importlib after the new bytecode target has been added.

It makes sense to me that if we’re introducing a new opcode, we should ideally have logic to handle the new instructions in ceval.c, but it’s not clear to me what error would be triggered if we’re not referencing the new instruction, so I feel I’m not understanding something. I also don't see most opcodes referenced in ceval.c -- for instance, no LOAD_FAST, STORE_FAST, or JUMP_* instructions, at least not explicitly.

The instructions may well be out of date. (A PR to update them would be welcome of course!)

As of 3.12, the cases in the switch are generated by tooling in Tool/cases_generator (check out the README.md there). This reads Python/bytecodes.c and writes Python/generated_cases.c.h, which in #included by ceval.c in the switch body. IMPORTANT: Whenever you edit bytecodes.c, run make regen-cases to regenerate the output files.
likely that Carl's suggestion is right: make LOAD_CLOSURE a pseudo op (search for "pseudo" in bytecodes.c) so the compiler can distinguish between it and LOAD_FAST if it needs to, make the assembler translate it to LOAD_FAST. I think that it's totally fine if a sequence of e.g. LOAD_FAST, LOAD_CLOSURE ends up being replaced by a single LOAD_FAST_LOAD_FAST instruction (I think that's what Mark meant).

If you are still stuck, can you push your work to a branch in your GitHub fork of the cpython repo and link it here, and post the errors you are getting (and from which command)?

@polynomialherder
Copy link
Contributor

Thank you all for the tips, this helped a lot. I'll proceed with the approach you all suggested, will report back soon with progress

polynomialherder added a commit to polynomialherder/cpython that referenced this issue Jun 24, 2023
This change demotes LOAD_CLOSURE from an instruction to a
pseudo-instruction. This enables superinstruction formation,
removal of checks for uninitialized variables, and frees
up an instruction.
@polynomialherder
Copy link
Contributor

As far as automated tests go, would adding a test for _PyCfg_ConvertPseudoOps be worthwhile? Or is there a better way to validate this change?

@gvanrossum
Copy link
Member

Usually when messing around with the interpreter, once you get it to build it's pretty solid, and if you get it to pass the test suite, it's golden. It seems you've already reached that stage. :-) Platinum would be to pass the buildbots, esp. the refleak subset. IIUC only triagers and core devs can trigger a buildbot run (by setting a magic label on the PR), and will do so as part of the PR review. I can do that for you.

Whether a test for _PyCfg_ConvertPseudoOps is needed, maybe @iritkatriel has an opinion.

@iritkatriel
Copy link
Member

_PyCfg_ConvertPseudoOps can be tested as part of the assembler. See Lib/test/test_compiler_assemble.py, which calls _PyCompile_Assemble. This is pretty new, so there aren't many tests there yet.

polynomialherder added a commit to polynomialherder/cpython that referenced this issue Jun 25, 2023
@polynomialherder
Copy link
Contributor

Thanks @iritkatriel -- just to make sure before I proceed, are you suggesting that I expose _PyCfg_ConvertPseudoOps in the same way that _PyCompile_Assemble has been exposed in _testinternalcapi.c so that it can be called from Lib/test_compiler_assemble.py (perhaps via helpers in test.support.bytecode_helper)?

@iritkatriel
Copy link
Member

I wouldn't do that. I'd just add a test to Lib/test/test_compiler_assemble.py that creates a code object from an instruction sequence that contains the pseudo ops, and then checks that the code works as expected.

If the pseudo ops are not replaced the code object can't possibly work.

polynomialherder added a commit to polynomialherder/cpython that referenced this issue Jun 27, 2023
This change adds a unit test for assembly
of code objects containing the LOAD_CLOSURE
pseud-instruction

We also correct the documentation and NEWS
descriptions to more accurately reflect the
effects of converting LOAD_CLOSURE to a pseudo-op
and the reason for its existence in the first place
polynomialherder added a commit to polynomialherder/cpython that referenced this issue Jun 27, 2023
- Describe the LOAD_CLOSURE change using more neutral language
- Add back in deleted whitespace
- Add an assertion to
  IsolatedAssembleTests.test_expression_with_pseudo_instruction_load_closure
- Regenerate generated files
@polynomialherder
Copy link
Contributor

As I wrote in the Core-mentorship thread, I'll start to think now about how to refactor the cell/local variable tracking so that we can remove LOAD_CLOSURE altogether, unless anyone else is actively working on that piece right now. If anyone has any thoughts on how to proceed OTTOYH, let me know -- otherwise I'll ping back with questions

I'll also keep an eye on the PR which converts LOAD_CLOSURE to a pseudo-op and continue to make changes there as needed

gvanrossum pushed a commit that referenced this issue Jun 29, 2023
This enables super-instruction formation,
removal of checks for uninitialized variables,
and frees up an instruction.
@polynomialherder
Copy link
Contributor

I'm still working through this -- I spent the last week writing simple examples and stepping through their evaluation in gdb to understand how the localsplus array is constructed by the compiler and ultimately referenced during evaluation

Something that confused me previously is that co_cellvars refers to cell variables contributed by the object under consideration to the scopes of child objects -- unlike co_freevars which refers to variables contributed to the object under consideration by its parent scopes (but excluding global variables).

In terms of possible approaches, @carljm commented that we could either refactor the cell/local offsets logic so that we can remove LOAD_CLOSURE altogether, or consider adding explicit LOAD_CLOSURE support to some optimizations -- I was planning to proceed with the former approach, but interested in whether @carljm has more to say on the latter or if anyone else has any thoughts (@markshannon? @iritkatriel?)

@carljm
Copy link
Member

carljm commented Jul 6, 2023

In terms of possible approaches, @carljm #106059 (comment) that we could either refactor the cell/local offsets logic so that we can remove LOAD_CLOSURE altogether, or consider adding explicit LOAD_CLOSURE support to some optimizations

I would say that the former is the ideal forward-looking approach, but may be high difficulty/complexity; it should be possible, but I haven't looked at it in detail. The latter is more of a short-term approach if for performance reasons we urgently want LOAD_CLOSURE to participate better in some specific bytecode optimizations, but we haven't been able to achieve the former yet.

@markshannon
Copy link
Member Author

I don't think this is urgent.
If it is too awkward to implement now, it can wait.

We plan (vaguely) to change the compiler to use names, rather than indices for load instructions.
Once that change has been made, removing LOAD_CLOSURE should be almost trivial.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

No branches or pull requests

6 participants