[Proposal] Abstract interpretation algorithm for optimization of tier 2 uops #615

Fidget-Spinner · 2023-08-18T16:36:40Z

Sub-issue of #559

Please see our proposal for optimizations to tier 2 uops https://docs.google.com/presentation/d/1jyqyO-OxF53NzPYZqkTzfCfCbzX2BC4Vo-BP5Da3zdk/

Proposed optimizations to be performed:

Partial evaluation
Dead store elimination
Common subexpression elimination
Global value numbering
Constant propagation
Type propagation
Guard elimination

gvanrossum · 2023-08-18T22:33:35Z

I see that you are proposing a DSL change like this:

op(_BINARY_OP_MULTIPLY_INT,
           (unused/1, left: &PyLong_Type, right: &PyLong_Type -- res: &PyLong_Type),
               flags=[
                   pure,
               ],
               symbolic=ADD_INT(left, right)
           )

I'm not sure what syntax you are proposing for types that allows &PyLong_Type, but assuming this just means it's a Python int, I propose that you use PyLongObject* instead, because that maps directly to what the code generator would/could generate for Tier 1. In fact we could write this today (I added the postfix * operator recently):

        op(_GUARD_BOTH_INT, (left: PyLongObject*, right: PyLongObject* -- left: PyLongObject*, right: PyLongObject*)) {
            ...
        }
        op(_BINARY_OP_ADD_INT, (unused/1, left: PyLongObject*, right: PyLongObject* -- res)) {
            ...
        }

(Cross-ref: #611)

gvanrossum · 2023-08-18T22:35:37Z

Continuing on that DSL proposal, why flags=[pure] and not pure=true?

And where is ADD_INT(left, right) supposed to be defined? Won't moving the definition there from bytecodes.c just move code without adding any real power?

iritkatriel · 2023-08-18T22:52:55Z

Continuing on that DSL proposal, why flags=[pure] and not pure=true?

It seem like a good idea to have syntax for hard-coding values of metadata flags. We will probably have more use cases for this, like irregular bytecodes where the derivation logic isn't working well.

gvanrossum · 2023-08-18T22:58:58Z

In your GDoc about reading the paper you write

Instead of updating the symbolic expressions by doing data-flow analysis through the loops, instead express the symbolic relations as recursive definitions (see figure on the right). Xl2 is defined recursively. These are called cyclic term graphs (symbolic terms that can refer to themselves)

I do not understand what is happening here. I saw this in the YouTube video and didn't get it there either. How to they get from

(l2) x +-> x[l2]
  x = x + 1
(l3) x +-> x[l2] + 1

to recognizing the fixpoint?

gvanrossum · 2023-08-19T01:21:25Z

Continuing on that DSL proposal, why flags=[pure] and not pure=true?

It seem like a good idea to have syntax for hard-coding values of metadata flags. We will probably have more use cases for this, like irregular bytecodes where the derivation logic isn't working well.

That's another argument for pure=true -- we might very well want to say "set HAS_ARG to false" and the flags=[flag1,flag2,...] form doesn't easily extend to that, whereas has_arg=false comes out very naturally.

Fidget-Spinner · 2023-08-19T08:24:40Z

In your GDoc about reading the paper you write

Instead of updating the symbolic expressions by doing data-flow analysis through the loops, instead express the symbolic relations as recursive definitions (see figure on the right). Xl2 is defined recursively. These are called cyclic term graphs (symbolic terms that can refer to themselves)

I do not understand what is happening here. I saw this in the YouTube video and didn't get it there either. How to they get from
(l2) x +-> x[l2]
  x = x + 1
(l3) x +-> x[l2] + 1
to recognizing the fixpoint?

I cannot claim to understand this part fully as well (I did not pay too much attention to it as we won't need to do this for CPython unless we plan to do loop invariant code motion as well). However, here's what I understand,

At location $l2$, it realizes it has visited this location before, it then looks at all incoming edges and creates the $\phi$ node of the incoming edges. In that case, the incoming edge (from before the loop) ,is xl = xl2 = 0, and the other incoming edge, from the backward edge, is x = xl2 + 1 The phi of both is $\phi(0, xl2 + 1)$, which is the recursive definition.

Continuing on that DSL proposal, why flags=[pure] and not pure=true?

It seem like a good idea to have syntax for hard-coding values of metadata flags. We will probably have more use cases for this, like irregular bytecodes where the derivation logic isn't working well.

That's another argument for pure=true -- we might very well want to say "set HAS_ARG to false" and the flags=[flag1,flag2,...] form doesn't easily extend to that, whereas has_arg=false comes out very naturally.

Hmm this does look cleaner. I don't mind doing this first, then if we have too many flags, use the list method.

And where is ADD_INT(left, right) supposed to be defined? Won't moving the definition there from bytecodes.c just move code without adding any real power?

I plan to use the abstract interpreter to create the symbolic expression nodes. The symbolic expressions will use tagged unions (just a field set in a struct indicating its real type). So that can be read as "construct a tagged union of type ~~ADD_INT~~ (oops wrong one) _BINARY_OP_MULTIPLY_INT with the symbolic operands left and right on the abstract stack). I will add that to the slides.

Fidget-Spinner · 2023-08-19T12:46:10Z

I'm not sure what syntax you are proposing for types that allows &PyLong_Type, but assuming this just means it's a Python int, I propose that you use PyLongObject* instead, because that maps directly to what the code generator would/could generate for Tier 1. In fact we could write this today (I added the postfix * operator recently):

This is for convenience during the analysis phase, where we want to check Py_TYPE(x) == &PyLong_Type for example.

gvanrossum · 2023-08-21T00:04:24Z

This is for convenience during the analysis phase, where we want to check Py_TYPE(x) == &PyLong_Type for example.

Oh, interesting. It seems we have somewhat conflicting goals for the type attribute of stack effects. Maybe we can introduce a convention where if the type is PyFooObject * the type field is assumed to be &PyFoo_Type?

gvanrossum · 2023-08-21T03:31:05Z

Regarding my question about loops and your answer, I think I get it now.

At the point (l2) where the loop goes back to previous control flow, at the start of the loop we just have x = 0. After the first iteration we have x = phi(0, 0+1). But because this is a phi node, it gets a new name, and we write x = xl2 (understanding that xl2 stands for phi(0, 0+1)). Then we continue with the next iteration; at the bottom of the loop (at l3) we get x = xl2+1. (Not x = xl3, because there's no new phi here.) But then we loop around to l2, where we do have a merge point, so we need a phi, and we write x = xl2, but now xl2 = phi(0, xl2+1). Because x = xl2 is something we saw before at this point, the algorithm realizes it has reached a fixpoint and it stops.

The crux here is that we don't write out the phi functions in the symbolic store; instead of x = phi(0, 0+1) we write x = xl2 in the symbolic store, whereas the term graph holds the definition of xl2 etc., in this case {xl2 = phi(0, xl2+1)}. This distinction between abstract store and term graph is introduced a bit earlier in the talk, with the goal of naming terms (initially introducing x0, x1, ... instead of xl0, xl1, ... -- the latter refinement is needed for loops).

Watching a bit more of the video, what strikes me is that the abstract interpretation (if there are loops) may actually iterate an indefinite number of times, until the fixpoints are all found. I think in our case things will be simpler, because (at least initially) we only attempt to optimize one superblock at a time. If this superblock contains JUMP_TO_TOP uops there is a phi node at the very start (with as many terms as there are JUMP_TO_TOP uops, plus one), but we can just assume that at the top all variables have arbitrary contents, so we don't even need that phi, and everything becomes much simpler. Right? At least for the current semester's work.

It does concern me a bit that the talk has an Open Question on whether their approach can outperform traditional approaches. Although in our case we only need to be faster than PEP 659. :-)

Trivia question: what does the "square U" symbol mean? It is used in some slide titles.

gvanrossum · 2023-08-21T05:20:50Z

@Fidget-Spinner Another question: How do your current plans compare to the plans that Mark wrote up back in March or so, gathered under the label epic-tier2-optimizer?

Fidget-Spinner · 2023-08-21T06:44:52Z

Trivia question: what does the "square U" symbol mean? It is used in some slide titles.

From the paper: "the join operation ⊔ℓ, which merges abstract stores at control-flow join points such as ℓ3".

@Fidget-Spinner Another question: How do your current plans compare to the plans that Mark wrote up back in March or so, gathered under the label epic-tier2-optimizer?

They are complementary. Our proposal concerns only the "superblock optimization:" partial evaluation and specialization passes. It will generate the same code, if not better code than just traditional partial evaluation. The other parts about superblock management, creation, etc. are unaffected.

mlemerre · 2023-08-24T23:25:25Z

Hi, @Fidget-Spinner told me that you were discussing ideas from my paper here. I am very happy that these ideas could find a widespread use! I took the liberty of jumping in the conversation to introduce myself and add some comments; don't hesitate to ask me if you have further questions about the paper!

Regarding my question about loops and your answer, I think I get it now. [...]

Your understanding is correct.

It does concern me a bit that the talk has an Open Question on whether their approach can outperform traditional approaches. Although in our case we only need to be faster than PEP 659. :-)

The main potential performance problem in my opinion is the worst-case complexity when you have to do a fixpoint iteration. The traditional approach is more "pessimistic" and thus converges faster (but would be less precise in that it would initially introduce more phi nodes). It seems to me that your loop-free "superblocks" should not suffer from this potential performance issue.

Fidget-Spinner · 2024-09-01T14:17:20Z

Done. We have a abstract-interpretation based optimizer on stack bytecode now.

Fidget-Spinner added the epic-tier2-optimizer Linear code region optimizer for 3.13 and beyond. label Aug 18, 2023

gvanrossum changed the title ~~[Proposal] Abstract interepretation algorithm for optimization of tier 2 uops~~ [Proposal] Abstract interpretation algorithm for optimization of tier 2 uops Aug 21, 2023

Fidget-Spinner closed this as completed Sep 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Proposal] Abstract interpretation algorithm for optimization of tier 2 uops #615

[Proposal] Abstract interpretation algorithm for optimization of tier 2 uops #615

Fidget-Spinner commented Aug 18, 2023 •

edited

Loading

gvanrossum commented Aug 18, 2023

gvanrossum commented Aug 18, 2023

iritkatriel commented Aug 18, 2023

gvanrossum commented Aug 18, 2023

gvanrossum commented Aug 19, 2023

Fidget-Spinner commented Aug 19, 2023 •

edited

Loading

Fidget-Spinner commented Aug 19, 2023

gvanrossum commented Aug 21, 2023

gvanrossum commented Aug 21, 2023

gvanrossum commented Aug 21, 2023

Fidget-Spinner commented Aug 21, 2023

mlemerre commented Aug 24, 2023

Fidget-Spinner commented Sep 1, 2024

[Proposal] Abstract interpretation algorithm for optimization of tier 2 uops #615

[Proposal] Abstract interpretation algorithm for optimization of tier 2 uops #615

Comments

Fidget-Spinner commented Aug 18, 2023 • edited Loading

gvanrossum commented Aug 18, 2023

gvanrossum commented Aug 18, 2023

iritkatriel commented Aug 18, 2023

gvanrossum commented Aug 18, 2023

gvanrossum commented Aug 19, 2023

Fidget-Spinner commented Aug 19, 2023 • edited Loading

Fidget-Spinner commented Aug 19, 2023

gvanrossum commented Aug 21, 2023

gvanrossum commented Aug 21, 2023

gvanrossum commented Aug 21, 2023

Fidget-Spinner commented Aug 21, 2023

mlemerre commented Aug 24, 2023

Fidget-Spinner commented Sep 1, 2024

Fidget-Spinner commented Aug 18, 2023 •

edited

Loading

Fidget-Spinner commented Aug 19, 2023 •

edited

Loading