Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-105481: Generate the opcode lists in dis from data extracted from bytecodes.c #106758

Merged
merged 30 commits into from Jul 18, 2023

Conversation

iritkatriel
Copy link
Member

@iritkatriel iritkatriel commented Jul 14, 2023

The generated lists are more correct than the (old) hard-coded ones, which were not maintained properly.

For example:

Old:

>>> [opcode.opname[op] for op in opcode.haslocal]
['LOAD_FAST', 'STORE_FAST', 'DELETE_FAST', 'LOAD_FAST_CHECK', 'LOAD_FAST_AND_CLEAR', 'STORE_FAST_MAYBE_NULL', 'LOAD_CLOSURE']

New:

>>> [opcode.opname[op] for op in opcode.haslocal]
['LOAD_FAST', 'STORE_FAST', 'DELETE_FAST', 'LOAD_FAST_CHECK', 'MAKE_CELL', 'LOAD_DEREF', 'STORE_DEREF', 'DELETE_DEREF', 'LOAD_FAST_AND_CLEAR', 'LOAD_FAST_LOAD_FAST', 'STORE_FAST_LOAD_FAST', 'STORE_FAST_STORE_FAST', 'LOAD_FROM_DICT_OR_DEREF', 'STORE_FAST_MAYBE_NULL', 'LOAD_CLOSURE']

@iritkatriel
Copy link
Member Author

The generated lists are more correct than the (old) hard-coded ones, which were not maintained properly.

For example:

Old:

>>> [opcode.opname[op] for op in opcode.haslocal]
['LOAD_FAST', 'STORE_FAST', 'DELETE_FAST', 'LOAD_FAST_CHECK', 'LOAD_FAST_AND_CLEAR', 'STORE_FAST_MAYBE_NULL', 'LOAD_CLOSURE']

New:

>>> [opcode.opname[op] for op in opcode.haslocal]
['LOAD_FAST', 'STORE_FAST', 'DELETE_FAST', 'LOAD_FAST_CHECK', 'MAKE_CELL', 'LOAD_DEREF', 'STORE_DEREF', 'DELETE_DEREF', 'LOAD_FAST_AND_CLEAR', 'LOAD_FAST_LOAD_FAST', 'STORE_FAST_LOAD_FAST', 'STORE_FAST_STORE_FAST', 'LOAD_FROM_DICT_OR_DEREF', 'STORE_FAST_MAYBE_NULL', 'LOAD_CLOSURE']

Maybe in this case actually the new list is incorrect - all the opcodes with DEREF seem to belong to hasfree rather than haslocal:

[opcode.opname[op] for op in opcode.hasfree]
['MAKE_CELL', 'LOAD_DEREF', 'STORE_DEREF', 'DELETE_DEREF', '<148>', 'LOAD_FROM_DICT_OR_DEREF']

Lib/opcode.py Outdated Show resolved Hide resolved
Lib/opcode.py Outdated

opname = ['<%r>' % (op,) for op in range(MAX_PSEUDO_OPCODE + 1)]
for op, i in opmap.items():
opname[i] = op

# _opcode may not be ready during early stages of the build
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that separate from Larry's limerick at the top?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same thing. I'm planning to try and get rid of this issue - figure out what is needed in the build and separate it out somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot we've gotta keep a limerick in. :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it in the stable ABI?

Tools/cases_generator/generate_cases.py Outdated Show resolved Hide resolved
@gvanrossum
Copy link
Member

The generated lists are more correct than the (old) hard-coded ones, which were not maintained properly.
For example:
Old:

>>> [opcode.opname[op] for op in opcode.haslocal]
['LOAD_FAST', 'STORE_FAST', 'DELETE_FAST', 'LOAD_FAST_CHECK', 'LOAD_FAST_AND_CLEAR', 'STORE_FAST_MAYBE_NULL', 'LOAD_CLOSURE']

New:

>>> [opcode.opname[op] for op in opcode.haslocal]
['LOAD_FAST', 'STORE_FAST', 'DELETE_FAST', 'LOAD_FAST_CHECK', 'MAKE_CELL', 'LOAD_DEREF', 'STORE_DEREF', 'DELETE_DEREF', 'LOAD_FAST_AND_CLEAR', 'LOAD_FAST_LOAD_FAST', 'STORE_FAST_LOAD_FAST', 'STORE_FAST_STORE_FAST', 'LOAD_FROM_DICT_OR_DEREF', 'STORE_FAST_MAYBE_NULL', 'LOAD_CLOSURE']

Maybe in this case actually the new list is incorrect - all the opcodes with DEREF seem to belong to hasfree rather than haslocal:

[opcode.opname[op] for op in opcode.hasfree]
['MAKE_CELL', 'LOAD_DEREF', 'STORE_DEREF', 'DELETE_DEREF', '<148>', 'LOAD_FROM_DICT_OR_DEREF']

Ugh. Worse, now LOAD_CLOSURE is gone, it’s ambiguous. Who uses this? How? (Only dis.py?)

@iritkatriel
Copy link
Member Author

Ugh. Worse, now LOAD_CLOSURE is gone, it’s ambiguous. Who uses this? How? (Only dis.py?)

LOAD_CLOSURE is just an alias (pseudo op) for LOAD_FAST. It's in the haslocal list in the old and new version, no change there.

@iritkatriel iritkatriel marked this pull request as draft July 14, 2023 21:34
@iritkatriel iritkatriel changed the title gh-105481: add haslocal to _opcode.py. Generate most oplists in opcode.py with the functions in _opcode.py gh-105481: add haslocal/hasfree to _opcode.py. Generate most oplists in opcode.py with the functions in _opcode.py Jul 14, 2023
@iritkatriel
Copy link
Member Author

Now it's like this:

Old:

>>> sorted([opcode.opname[op] for op in opcode.hasfree])
['<148>', 'DELETE_DEREF', 'LOAD_DEREF', 'LOAD_FROM_DICT_OR_DEREF', 'MAKE_CELL', 'STORE_DEREF']
>>> sorted([opcode.opname[op] for op in opcode.haslocal])
['DELETE_FAST', 'LOAD_CLOSURE', 'LOAD_FAST', 'LOAD_FAST_AND_CLEAR', 'LOAD_FAST_CHECK', 'STORE_FAST', 'STORE_FAST_MAYBE_NULL']

New:

>>> sorted([opcode.opname[op] for op in opcode.hasfree])
['DELETE_DEREF', 'LOAD_DEREF', 'LOAD_FROM_DICT_OR_DEREF', 'MAKE_CELL', 'STORE_DEREF']
>>> sorted([opcode.opname[op] for op in opcode.haslocal])
['DELETE_FAST', 'LOAD_CLOSURE', 'LOAD_FAST', 'LOAD_FAST_AND_CLEAR', 'LOAD_FAST_CHECK', 'LOAD_FAST_LOAD_FAST', 'STORE_FAST', 'STORE_FAST_LOAD_FAST', 'STORE_FAST_MAYBE_NULL', 'STORE_FAST_STORE_FAST']

The "<148>" is because in opcode.py in main we have hasfree.append(148) even though there is no such opcode.

In the haslocal list, we see that the old list is missing the super instructions.

@iritkatriel
Copy link
Member Author

For the other lists:

Old:

>>> sorted([opcode.opname[op] for op in opcode.hasconst])
['KW_NAMES', 'LOAD_CONST', 'RETURN_CONST']

New:

>>> sorted([opcode.opname[op] for op in opcode.hasconst])
['INSTRUMENTED_RETURN_CONST', 'KW_NAMES', 'LOAD_CONST', 'RETURN_CONST']

old is missing the INSTRUMENTED_*.

Old:

>>> sorted([opcode.opname[op] for op in opcode.hasname])
['DELETE_ATTR', 'DELETE_GLOBAL', 'DELETE_NAME', 'IMPORT_FROM', 'IMPORT_NAME', 'LOAD_ATTR', 'LOAD_FROM_DICT_OR_GLOBALS', 'LOAD_GLOBAL', 'LOAD_METHOD', 'LOAD_NAME', 'LOAD_SUPER_ATTR', 'LOAD_SUPER_METHOD', 'LOAD_ZERO_SUPER_ATTR', 'LOAD_ZERO_SUPER_METHOD', 'STORE_ATTR', 'STORE_GLOBAL', 'STORE_NAME']

New:

>>> sorted([opcode.opname[op] for op in opcode.hasname])
['DELETE_ATTR', 'DELETE_GLOBAL', 'DELETE_NAME', 'IMPORT_FROM', 'IMPORT_NAME', 'LOAD_ATTR', 'LOAD_FROM_DICT_OR_GLOBALS', 'LOAD_GLOBAL', 'LOAD_METHOD', 'LOAD_NAME', 'LOAD_SUPER_ATTR', 'LOAD_SUPER_METHOD', 'LOAD_ZERO_SUPER_ATTR', 'LOAD_ZERO_SUPER_METHOD', 'STORE_ATTR', 'STORE_GLOBAL', 'STORE_NAME']

Old is missing LOAD_ZERO_SUPER_ATTR and LOAD_ZERO_SUPER_METHOD.

Old:

>>> sorted([opcode.opname[op] for op in opcode.hasjrel])
['FOR_ITER', 'JUMP', 'JUMP_BACKWARD', 'JUMP_BACKWARD_NO_INTERRUPT', 'JUMP_FORWARD', 'JUMP_NO_INTERRUPT', 'POP_JUMP_IF_FALSE', 'POP_JUMP_IF_NONE', 'POP_JUMP_IF_NOT_NONE', 'POP_JUMP_IF_TRUE', 'SEND']

New:

>>> sorted([opcode.opname[op] for op in opcode.hasjrel])
['ENTER_EXECUTOR', 'FOR_ITER', 'JUMP', 'JUMP_BACKWARD', 'JUMP_BACKWARD_NO_INTERRUPT', 'JUMP_FORWARD', 'JUMP_NO_INTERRUPT', 'POP_JUMP_IF_FALSE', 'POP_JUMP_IF_NONE', 'POP_JUMP_IF_NOT_NONE', 'POP_JUMP_IF_TRUE', 'SEND']

New has ENTER_EXECUTOR as a jump.

For hasarg, the old list contained a few that the new list does not:

{'INSTRUMENTED_CALL_FUNCTION_EX', 'INSTRUMENTED_RETURN_VALUE', 'INSTRUMENTED_LINE', 'INSTRUMENTED_END_SEND', 'INSTRUMENTED_INSTRUCTION', 'INSTRUMENTED_END_FOR'}

@iritkatriel
Copy link
Member Author

It mostly looks right, except for INSTRUMENTED_CALL_FUNCTION_EX, which delegates the work to CALL_FUNCTION_EX which does have an arg.

@gvanrossum
Copy link
Member

Ugh. Worse, now LOAD_CLOSURE is gone, it’s ambiguous. Who uses this? How? (Only dis.py?)

LOAD_CLOSURE is just an alias (pseudo op) for LOAD_FAST. It's in the haslocal list in the old and new version, no change there.

My point is that LOAD_CLOSURE used to be separate because it was in the hasfree list. Now some LOAD_FAST instances refer to a local and others to a free variable. Example:

>>> def foo(a):
...   print(a)
...   def bar(): return a
... 
>>> dis.dis(foo)
              0 MAKE_CELL                0 (a)

  1           2 RESUME                   0

  2           4 LOAD_GLOBAL              1 (NULL + print)
             14 LOAD_DEREF               0 (a)
             16 CALL                     1
             24 POP_TOP

  3          26 LOAD_FAST                0 (a)
             28 BUILD_TUPLE              1
             30 LOAD_CONST               1 (<code object bar at 0x101d18400, file "<stdin>", line 3>)
             32 MAKE_FUNCTION
             34 SET_FUNCTION_ATTRIBUTE   8 (closure)
             36 STORE_FAST               1 (bar)
             38 RETURN_CONST             0 (None)

Disassembly of <code object bar at 0x101d18400, file "<stdin>", line 3>:
              0 COPY_FREE_VARS           1

  3           2 RESUME                   0
              4 LOAD_DEREF               0 (a)
              6 RETURN_VALUE
>>> 

The LOAD_FAST (a) at offset 26 used to be a LOAD_CLOSURE in 3.12 and before (and in 3.13 until LOAD_CLOSURE was made a pseudo-op). It's harmless, because we have co_varnames indexed by both locals and cells. So maybe hasfree is no longer useful?

I don't think

@gvanrossum
Copy link
Member

The "<148>" is because in opcode.py in main we have hasfree.append(148) even though there is no such opcode.

Looks like a relic. In 3.11, 148 was LOAD_CLASSDEREF. That opcode no longer exists in 3.12 (it was removed in gh-103764 for PEP 695). So let's get rid of it.

@iritkatriel
Copy link
Member Author

The LOAD_FAST (a) at offset 26 used to be a LOAD_CLOSURE in 3.12 and before (and in 3.13 until LOAD_CLOSURE was made a pseudo-op). It's harmless, because we have co_varnames indexed by both locals and cells. So maybe hasfree is no longer useful?

So we merge hasfree into haslocal and remove hasfree? Or leave it empty and soft deprecate it?

@iritkatriel iritkatriel changed the title gh-105481: add haslocal/hasfree to _opcode.py. Generate most oplists in opcode.py with the functions in _opcode.py gh-105481: add has_local/has_free/etc to _opcode.py. Generate oplists in opcode.py with the functions in _opcode.py Jul 15, 2023
@gvanrossum
Copy link
Member

So we merge hasfree into haslocal and remove hasfree? Or leave it empty and soft deprecate it?

Hm, I was just writing up a rationale for this when I realized that we should just keep hasfree and populate it correctly, and explain the special situation where LOAD_FAST can be used to load a cell (previously LOAD_CLOSURE) separately.

@iritkatriel
Copy link
Member Author

iritkatriel commented Jul 15, 2023

Ok, then this PR is probably fine w.r.t. hasfree and haslocal.

The only list in oplists now is hascompare, which contains only COMPARE_OP:

>>> [opcode.opname[op] for op in opcode.hascompare]
['COMPARE_OP']

It is used by dis to identify that its op (>=, ==, etc) needs to be displayed.

Modules/_opcode.c Outdated Show resolved Hide resolved
@gvanrossum
Copy link
Member

It is used by dis to identify that its op (>=, ==, etc) needs to be displayed.

Since there's only one opcode like this, maybe dis can just check whether it's that opcode? Or do you anticipate that in the future there will be multiple ones? (Were there ever?)

@iritkatriel
Copy link
Member Author

It is used by dis to identify that its op (>=, ==, etc) needs to be displayed.

Since there's only one opcode like this, maybe dis can just check whether it's that opcode? Or do you anticipate that in the future there will be multiple ones? (Were there ever?)

It's just this one all the way back to 3.8.

dis can stop using the list, but it's documented in the dis module so can we remove it?

@gvanrossum
Copy link
Member

dis can stop using the list, but it's documented in the dis module so can we remove it?

I guess deprecate it like hasjabs/hasjrel.

@iritkatriel iritkatriel changed the title gh-105481: add has_local/has_free/etc to _opcode.py. Generate oplists in opcode.py with the functions in _opcode.py gh-105481: Generate the opcode lists in dis from data extracted from bytecodes.c Jul 17, 2023
Lib/opcode.py Outdated Show resolved Hide resolved
Lib/opcode.py Show resolved Hide resolved
Lib/opcode.py Outdated Show resolved Hide resolved
Lib/opcode.py Outdated Show resolved Hide resolved
Lib/opcode.py Show resolved Hide resolved

opname = ['<%r>' % (op,) for op in range(MAX_PSEUDO_OPCODE + 1)]
for op, i in opmap.items():
opname[i] = op

# The build uses older versions of Python which do not have _opcode.has_* functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite mysterious. IIRC Brandt (?) wrote some hack that reads opcode.py and then execs it? But I can't find it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in Tools/build/generate_opcode_h.py. But when it execs opcode.py that tries to import _opcode and call its has_arg etc to construct the oplists (which are not used during the build).

Ideally we should just get rid of generate_opcode_h.py if we can.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to import _opcode into opcode at all. We can just import it directly into dis (which is not used by any build scripts). I'll try that.

self.assertIsInstance(func(op), bool)
self.assertEqual(func(op), expected)

EXPECTED_OPLISTS = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that each time we add or remove an opcode we need to update this test? That feels tedious. (Even though it looks like it was always done like this and you're just refactoring that code a bit.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past the list was built in opcode.py so we would need to make sure we added i there. Now the lists are derived automatically from bytecodes.c via heuristics, and you will need to update the test (which you are less likely to be able to do incorrectly). I'm not sure it's safe to not have these full tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. So the point of the heuristics is that we don't have to manually maintain it, but the tests are manually maintained to check that the heuristics are still working. If they are, and if an instruction is added (or deleted) that changes one of these lists, we'll get a bogus test failure, and hopefully the contributor who gets to debug the test failure understands they have to fix the test. I think this warrants a comment right there where we expect the test to fail (I think on line 102, self.assertEqual(res, op in expected)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manually maintained lists were not tested and had some errors. We could remove the tests and trust people to examine the diff in the metadata file when they make changes. If we do that we should remove it from the autogenerated list so that it's not collapsed in a PR diff.

@iritkatriel
Copy link
Member Author

I think I need to add in whatsnew that the dis module's opcode lists are no longer also defined in the opcode module. The opcode module is not documented, but if you look on GitHub you can see some places where people import it.

@iritkatriel
Copy link
Member Author

I think I need to add in whatsnew that the dis module's opcode lists are no longer also defined in the opcode module. The opcode module is not documented, but if you look on GitHub you can see some places where people import it.

Or maybe this is a breaking change that I can't make without deprecation? I'm not sure what the status of the opcode module is.

I can put the lists back in opcode, and instead work to remove the dependency of the build on opcode.

@gvanrossum
Copy link
Member

I have a feeling that opcode is actually a better place than dis for the official public API. And that's probably why people have been importing it despite its undocumented status.

…ta instead of opcode for _specialized_instructions"

This reverts commit 5e1c4a4.
…rom _opcode_metadata rather than via opcode"

This reverts commit d525c53.
@iritkatriel
Copy link
Member Author

OK, I reverted the last few commits so we leave everything in opcode. We will rid the build of opcode in a future PR once the opcode.h file is generated from bytecodes.c.

@iritkatriel iritkatriel merged commit 40f3f11 into python:main Jul 18, 2023
22 checks passed
@iritkatriel iritkatriel deleted the opcode_py branch July 25, 2023 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants