Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single instruction "superinstructions" #585

Closed
markshannon opened this issue May 19, 2023 · 15 comments
Closed

Single instruction "superinstructions" #585

markshannon opened this issue May 19, 2023 · 15 comments

Comments

@markshannon
Copy link
Member

markshannon commented May 19, 2023

Superinstructions can be problematic, as they break the one-instruction-at-a-time model that instrumentation relies on, and are likely to cause problems in optimizers that make the same one-instruction-at-a-time assumption.

Instrumentation rewrites all superinstructions back to the simpler form.
Super instructions also prevent specialization, so superinstructions cannot include any specializable instruction.
#584 proposes a specialization of LOAD_CONST, so that leaves only LOAD_FAST and STORE_FAST to be combined.

If we remove LOAD_CONST we have only:

super(LOAD_FAST__LOAD_FAST) = LOAD_FAST + LOAD_FAST;
super(STORE_FAST__LOAD_FAST)  = STORE_FAST + LOAD_FAST;
super(STORE_FAST__STORE_FAST) = STORE_FAST + STORE_FAST;

That is still quite a lot of instructions, dynamically, so we don't want to just remove them.
Instead of removing them, we can combine them into a single instruction.
Given that most locals will have a index in range(16) we combine the operations into a single instruction.

inst(LOAD_FAST_LOAD_FAST_COMPACT, ( -- val1, val2)) {
    val1 = GETLOCAL(oparg&15);
    val1 = GETLOCAL(oparg>>4);
    Py_INCREF(val1);
    Py_INCREF(val2);
}

There will be fewer LOAD_FAST_LOAD_FAST_COMPACT than LOAD_FAST__LOAD_FAST, but they should be a little faster (fewer memory reads). I would expect the performance to be about the same.

@brandtbucher
Copy link
Member

Just a note from our previous discussions for anyone else reading: this is only legal if both instructions are on the same line. When I tried a similar idea out before, this caused almost all of the STORE_FAST__LOAD_FAST superinstructions to disappear (since they basically always span more than one line).

@markshannon
Copy link
Member Author

Instructions don't really have lines, effects do. Since LOAD_FAST has no observable side effects, it can be moved from line to line.
Consider

a = 1
return b

which compiles to:

1     LOAD_CONST               1 (1)
      STORE_FAST               0 (a)

2     LOAD_FAST                1 (b)
      RETURN_VALUE

Which implies that we cannot merge the STORE_FAST and LOAD_FAST. But as LOAD_FAST has no observable side effects, this is a legal translation:

1     LOAD_CONST               1 (1)
      STORE_FAST               0 (a)

      LOAD_FAST                1 (b)
2     RETURN_VALUE

Which can be transformed to:

1     LOAD_CONST               1 (1)
      STORE_FAST_LOAD_FAST_COMPACT  0 (a), 1 (b)

2     RETURN_VALUE

@markshannon
Copy link
Member Author

Unfortunately, having superinstructions straddling lines prevents jumping to the line in question in a debugger, as the stack is inconsistent, which rules out the transformation above.

However, it appears that the reduction in code size more than compensates.

@carljm
Copy link

carljm commented Jun 2, 2023

which rules out the transformation above.

Does this mean that in general we have to consider the state of the value stack as an "observable side effect" when making optimizations in the compiler? That seems new (relative to my previous understanding of what was safe to optimize), and unfortunate.

@brandtbucher
Copy link
Member

brandtbucher commented Jun 2, 2023

Well, I would assume that jumping to a line would jump just prior to execution of every instruction on that line. It would be weird, I think, if a local variable load or store didn't happen after jumping to a line and and stepping through it.

@brandtbucher
Copy link
Member

In the example above, I would expect that if I had two lines foo = 42 and return foo, jumping over the first line would return the prior value of foo (or None if unbound). You think that frame_setlineno should be allowed to refuse the jump?

@carljm
Copy link

carljm commented Jun 2, 2023

I'm not proposing anything in particular; frame_setlineno is code I hadn't looked at before and didn't realize existed :) I've never used the capability to jump directly to a line in a debugger. So I'm just newly wrapping my head around the implications for optimization (and thinking through whether I know of any existing compiler optimizations that could already break this.)

It seems this does mean that LOAD_FAST (or anything that pushes to stack) isn't generally safe to shift to a different line. So for example this seems to invalidate the proposed superoptimizer design in python/cpython#102869

@brandtbucher
Copy link
Member

frame_setlineno is code I hadn't looked at before and didn't realize existed :)

Lucky! ;)

@iritkatriel
Copy link
Collaborator

Do we know anyone who uses this evil function?

@JelleZijlstra
Copy link
Contributor

Here's a third-party debugger that uses it: https://github.com/web2py/web2py/blob/7685d373474378e93132f8916145fb11f84cec71/gluon/contrib/dbg.py#L307. pdb also supports it (https://github.com/python/cpython/blob/e01b04c9075c6468ed57bc883693ec2a06a6dd8e/Lib/pdb.py#L1202).

Seems like we should seriously consider deprecating support for this feature though; it feels very rarely useful and very hard to support.

@brandtbucher
Copy link
Member

brandtbucher commented Jun 2, 2023

I mean, I'm pretty sure every Python debugger has a jump command, so we'd probably hear from them first if we removed it.

If we convinced them to drop the functionality too, then maybe we'd find out how many users are really relying on it. There are at least a handful that have filed various bug reports for CPython over the years.

I'm stuck somewhere between "I can see how it could be useful" and "it's too much pain to maintain and doesn't always work"... though, I have admittedly used jump commands in pdb a few times since learning about them (mostly if I mess up my debugging session somehow and don't want to re-run everything, but also sometimes to take a different branch or something).

@iritkatriel
Copy link
Collaborator

I know debuggers support it. But do we know of any people who use it?

@brandtbucher
Copy link
Member

Not counting people who use the feature via debuggers?

I mean, you can only do it from within a trace function, so probably just a handful of people writing toys like this:

import sys

def goto(line: int) -> None:
    def jumper(frame, event, arg) -> None:
        if event == "line":
            frame.f_lineno = line
            frame.f_trace = None
    frame = sys._getframe(1)
    frame.f_trace = jumper
    sys.settrace(jumper)

# Prints "B" in an infinite loop:
goto(15)    # 13
print("A")  # 14
print("B")  # 15
goto(13)    # 16
print("C")  # 17

@gvanrossum
Copy link
Collaborator

There are already many complicated prohibitions in the f_lineno setattr -- there is no guarantee that it will work. So I don't believe it ought to be used as a reason to skip line-fusing optimizations.

Additionally, stepping through code already frequently jumps back and forth in ways that are hard to grasp intuitively. Again, if optimizations cause the sequence of observed "line events" to change, that feels like a minor inconvenience at best. (And even though PEP 626 claims otherwise, its motivation feels arbitrary to me and its specification is rather vague: "Line events and the f_lineno attribute should act as an experienced Python user would expect.")

@markshannon
Copy link
Member Author

Let's not remove jumping in debuggers, or other features, that make our work more difficult interesting.

Many features that prevent optimizations in the bytecode compiler can be handled in the tier 1 optimizer.
Many features that prevent optimizations in the tier 1 optimizer, can be handled in the tier 2 optimizer.
In the tier 2 optimizer any call to a builtin function is a potential deoptimization. A single call to PyObject_GetAttr can do anything, so it makes almost no difference what features are available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants