Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-104635: Expand optimization for dead store elimination #106571

Closed
wants to merge 9 commits into from

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Jul 9, 2023

Explain

We might also want to expand this so it can look more than one instruction ahead for another write to the same location, skipping over an allowlist of instructions that we know don't read that same local, don't execute arbitrary code, and can't raise an exception. E.g. LOAD_FAST of a different local, POP_TOP, NOP, probably others.

This PR is motivated by @carljm 's comment from #105040 (comment)
The limitation of #105320 is that the optimization only covers the next redundant opcode, but with this patch, the optimization can be applied to more than one instruction ahead for another write to the same location.

Benchmark

import pyperf

runner = pyperf.Runner()
runner.timeit(name="bench dead_store",
              stmt="""
_, a, b, c, _ = 1, 2, 3, 4, 5
""")
Mean +- std dev: [base] 13.1 ns +- 0.2 ns -> [opt] 12.6 ns +- 0.2 ns: 1.04x faster

TODO (Done)

  • Fix test_dis.py
  • Add testcase for this optimization.

@corona10 corona10 changed the title gh-104635: Extend optimization for dead store elimination [WIP] gh-104635: Extend optimization for dead store elimination Jul 9, 2023
@corona10
Copy link
Member Author

corona10 commented Jul 9, 2023


./_bootstrap_python ./Programs/_freeze_module.py runpy ./Lib/runpy.py Python/frozen_modules/runpy.h
Assertion failed: (OPCODE_HAS_ARG(opcode) || HAS_TARGET(opcode) || oparg == 0), function instr_sequence_addop, file compile.c, line 243.
make: *** [Python/frozen_modules/runpy.h] Abort trap: 6
make: *** Waiting for unfinished jobs....

Interesting crash; this crash was not triggered at my local machine. I will take a look.

-> Done

@corona10 corona10 changed the title [WIP] gh-104635: Extend optimization for dead store elimination gh-104635: Extend optimization for dead store elimination Jul 9, 2023
@corona10 corona10 marked this pull request as ready for review July 9, 2023 16:06
@corona10 corona10 marked this pull request as draft July 9, 2023 16:07
@corona10 corona10 marked this pull request as ready for review July 9, 2023 17:56
@corona10
Copy link
Member Author

corona10 commented Jul 9, 2023

@carljm Please take a look :)

@corona10 corona10 changed the title gh-104635: Extend optimization for dead store elimination gh-104635: Expand optimization for dead store elimination Jul 9, 2023
@corona10
Copy link
Member Author

corona10 commented Jul 10, 2023

Umm, this patch is missing the following case. (same line, multi variable multi target case)
I will improve for such case, do not merge yet :)


a, a, b, b = 1, 2, 3, 4

@corona10
Copy link
Member Author

corona10 commented Jul 10, 2023

@carljm Ready to review, I wanted the linear time complexity for the first time, but it will require more.

@corona10
Copy link
Member Author

>>> def f():
...     a = 42; b = a + 54; a = 54
...     return a, b
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
UnboundLocalError: cannot access local variable 'a' where it is not associated with a value

Found more edge case, I will close the PR. We may need some kinds of use-def analysis

@markshannon
Copy link
Member

I'm still unconvinced that this can do anything that a superoptimizer could not do.

_, a, b, c, _ = 1, 2, 3, 4, 5 is obviously equivalent to a, b, c, _ = 2, 3, 4, 5. A superoptimizer would produce an optimal sequence of code for this.

If you are only looking at linear sequences of instructions that are on a single line then, almost by definition, a superoptimizer will do at least as well.

@corona10
Copy link
Member Author

If you are only looking at linear sequences of instructions that are on a single line then, almost by definition, a superoptimizer will do at least as well.

Yeah, I agree if the super-optimizer is added, I expect that most of the cases will be covered, but I just tried it because the super-optimizer infra does not exist at this moment.

More digging will be more expensive than I expect, so I am not going to cover those cases at this moment.

(At least, we may need to add use-def analysis, but it should be compared to adding a super-optimizer process)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants