Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-99360: Avoid exc=None; del exc in bytecode where unnecessary #99361

Draft
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

sweeneyde
Copy link
Member

@sweeneyde sweeneyde commented Nov 11, 2022

Functional changes:

  • DELETE_FAST now assumes its argument is bound; it no longer raises UnboundLocalError by itself.
  • When the state of a local is uncertain, del x compiles to LOAD_FAST_CHECK x; POP_TOP; DELETE_FAST x
  • At the end of an exception handler except E as exc: ..., the exc=None before the del exc
    is omitted when not needed.

@sweeneyde sweeneyde marked this pull request as ready for review November 15, 2022 02:28
Python/compile.c Outdated
expand_del_noerror(basicblock *entryblock, int none_oparg)
{
for (basicblock *b = entryblock; b != NULL; b = b->b_next) {
int room_needed = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function would be simpler if you use insert_instruction(). Is it for optimization that you aren't?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was trying to avoid quadratic time, but I suppose that could only happen if one long basicblock had many DELETE_FAST_CHECK instructions, e.g.

def f(x):
    del x; del x; del x; del x; del x  
    del x; del x; del x; del x; del x  
    del x; del x; del x; del x; del x  

But such code is useless, so I'll try to clean this up.

The DELETE_NOERROR instructions should typically happen at basicblock boundaries, so I believe there shouldn't be any performance issues there.

Python/compile.c Outdated
break;
}
b->b_instr[i].i_opcode = del_opcode;
if (insert_instruction(b, i, &(struct instr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can avoid shifting here:

in compiler_nameop, add a couple of NOPs so that there is space for this expansion. If no expansion happens, the NOPs can be removed by a final remove_redundant_nops().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optimize_cfg removes NOPs already, so we'd have to either introduce another pseudo-opcode or shuffle around assemble() a bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a new pseudo instruction called BLANK that is ignored by the assembler (has size 0 and emits nothing).

@sweeneyde sweeneyde marked this pull request as draft November 21, 2022 07:34
@markshannon
Copy link
Member

Is it really worthwhile extending this to global, nonlocal and class-local variables? They will be vanishingly rare.
This seems like a lot of code for a relatively minor improvement.

I thought the original idea was to change the code for del local from DELETE_FAST to LOAD_FAST_CHECK; POP_TOP; DELETE_FAST, and the code cleaning up except Exception as var: to something like STORE_FAST_NONE_IF_NULL; DELETE_FAST
The existing pass that eliminates checks can then be modified to convert STORE_FAST_NONE_IF_NULL to NOP in the same way that it converts LOAD_FAST_CHECK to LOAD_FAST.

The peepholer can convert any residual LOAD_FAST; POP_TOP to NOP.
The assembler will need to convert STORE_FAST_NONE_IF_NULL to LOAD_CONST None; STORE_FAST.

@sweeneyde
Copy link
Member Author

Is it really worthwhile extending this to global, nonlocal and class-local variables? They will be vanishingly rare. This seems like a lot of code for a relatively minor improvement.

I agree that anything other than a local is not worth it, and the PR doesn't change compiler output except when deleting fast locals. I've updated the PR so that it's clearer that there are no changes to those other cases. Adding an extra ctx possibility into compiler_nameop was the cleanest implementation I could come up with to avoid duplicating the symtable logic.

Over half of the code added is tests and docs. The tests are checking the bytecode for using DEREF/GLOBAL/NAME for the exception name as well, since compile_nameop is implementing that deletion after this PR, rather than the callers of compile_nameop, even though there should be no change. Also, refactoring add_local_null_checks to include deletions actually makes it more understandable IMO.

I thought the original idea was to change the code for del local from DELETE_FAST to LOAD_FAST_CHECK; POP_TOP; DELETE_FAST, and the code cleaning up except Exception as var: to something like STORE_FAST_NONE_IF_NULL; DELETE_FAST The existing pass that eliminates checks can then be modified to convert STORE_FAST_NONE_IF_NULL to NOP in the same way that it converts LOAD_FAST_CHECK to LOAD_FAST.

The peepholer can convert any residual LOAD_FAST; POP_TOP to NOP. The assembler will need to convert STORE_FAST_NONE_IF_NULL to LOAD_CONST None; STORE_FAST.

The add_local_null_checks function does not convert LOAD_FAST_CHECK into LOAD_FAST, but the other way around: it switches from the unchecked version to the checked version where needed. This is easier to implement because any path through the code that leaves the local undefined is enough to switch to the unsafe version, so it's akin to

instr.checked = any(p.unsafe(local_i) for p in codepaths_to_instr)

Assuming this approach, there's a need for the dataflow analysis to turn NOP; DELETE_FAST into STORE_FAST_NONE_IF_NULL; DELETE_FAST. But then DELETE_FAST would have to turn into LOAD_FAST_CHECK;POP_TOP;DELETE_FAST or LOAD_CONST(None);STORE_FAST;DELETE_FAST, depending on in which context it was emitted (whether an unbound local should raise UnboundLocalError or not).

This PR's solution is to emit DELETE_FAST_NOERROR during code generation, let the dataflow analysis potentially mark it as DELETE_FAST_NOERROR_CHECK, then replace that with LOAD_CONST(None);STORE_FAST;DELETE_FAST.

It might be fine to move add_local_null_checks to before optimize_cfg, but the small downside is that different copies of inlined blocks must share the same dataflow analysis as each other, so locking in that order of doing things isn't ideal. This affects test.test_dis.test_from_traceback_dis, where one copy of a LOAD_FAST;RETURN_VALUE is converted to LOAD_FAST_CHECK and the other isn't.

Copy link
Member

@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the benefit this brings us is worth the additional complexity.

Is this still being considered for 3.13? Perhaps a future superoptimizer (#102869) could build on our already-existing liveness analysis for fast locals to achieve the same effect...

@carljm
Copy link
Member

carljm commented May 10, 2023

FWIW now that PEP 709 is merged, I've considered whether the same sub-scope isolation could be used for handling isolation of exception handler variables, instead of using a DELETE_FAST. I think it could, and wouldn't be too complex, and the resulting behavior would be nicer. But it would be a visible behavior change (after an except Exception as exc: clause the prior value of exc, if any, would be visible again, rather than becoming unbound again), and I'm not sure if that would rise to the level of requiring another PEP.

@brandtbucher
Copy link
Member

brandtbucher commented May 10, 2023

Eh, if anything that would be better than what we currently do. It is a bit weird, though.

It personally feels like a Discuss thread more than a PEP (unless it's super controversial)...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants