Generate Lib/opcode.py from Python/bytecodes.c #102674

Closed
gvanrossum opened this issue Mar 14, 2023 · 3 comments
Labels
type-feature A feature request or enhancement

gvanrossum commented Mar 14, 2023

This could also auto-generate Include/opcode.h, Include/internal/pycore_opcode.h and Python/opcode_targets.h, subsuming both Tools/build/generate_opcode_h.py and Python/makeopcodetargets.py -- although the simplest approach would probably be to just keep those tools and only ensure that they are re-run after opcode.py is updated.

Variables to generate

The auto-generatable contents of opcode.py are as follows:

  • opmap -- make up opcode numbers in a pass after reading all input; special-case CACHE = 0
  • HAVE_ARGUMENT -- done while making up opcode numbers (group argument-less ones in lower group)
  • ENABLE_SPECIALIZATION = True
  • EXTENDED_ARG -- set from opcode for EXTENDED_ARG inst, if there is one
  • opname -- invert opmap
  • pseudo opcode definitions, can add DSL syntax pseudo(NAME) = { name1, name2, ... };
  • hasarg -- can be derived from instr_format metadata
  • hasconst -- could hardcode to LOAD_CONST
  • hasname -- may have to check for occurrences of co_names in code?
  • hasjrel -- check for JUMPBY with arg that doesn't start with INLINE_CACHE_ENTRIES_
  • hasjabs -- no longer used, set to [] for backwards compatibility
  • haslocal -- opcode name contains '_FAST'
  • hascompare -- opcode name starts with COMPARE_
  • hasfree -- opcode name ends in DEREF or CELL or CLOSURE
  • hasexc -- pseudo opcode, name starts with SETUP_
  • MIN_PSEUDO_OPCODE = 256
  • MAX_PSEUDO_OPCODE -- derive from pseudo opcodes
  • __all__ -- hardcode
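
A minimal sketch of the numbering pass and a few of the name-based derivations listed above. The `INSTRUCTIONS` list and the `assign_opcodes` helper are hypothetical stand-ins for whatever a parser of bytecodes.c would yield; the numbers produced here are illustrative and are not CPython's real opcode values.

```python
# Hypothetical parser output: (instruction name, uses oparg) pairs.
INSTRUCTIONS = [
    ("CACHE", False),
    ("NOP", False),
    ("POP_TOP", False),
    ("COMPARE_OP", True),
    ("EXTENDED_ARG", True),
    ("LOAD_DEREF", True),
    ("LOAD_FAST", True),
]


def assign_opcodes(instructions):
    """Number opcodes so that argument-less ones form the lower group."""
    opmap = {"CACHE": 0}  # special case named in the issue
    without_arg = sorted(n for n, has_arg in instructions
                         if not has_arg and n != "CACHE")
    with_arg = sorted(n for n, has_arg in instructions if has_arg)
    for name in without_arg:
        opmap[name] = len(opmap)
    have_argument = len(opmap)  # first opcode number that takes an argument
    for name in with_arg:
        opmap[name] = len(opmap)
    return opmap, have_argument


opmap, HAVE_ARGUMENT = assign_opcodes(INSTRUCTIONS)
EXTENDED_ARG = opmap.get("EXTENDED_ARG")  # None if there is no such inst

# opname inverts opmap; unused slots get a placeholder string.
opname = [f"<{op}>" for op in range(256)]
for name, op in opmap.items():
    opname[op] = name

# Name-based heuristics from the list above.
haslocal = [op for name, op in opmap.items() if "_FAST" in name]
hascompare = [op for name, op in opmap.items() if name.startswith("COMPARE_")]
hasfree = [op for name, op in opmap.items()
           if name.endswith(("DEREF", "CELL", "CLOSURE"))]
hasjabs = []  # kept only for backwards compatibility
```

Sorting the names before numbering keeps the output stable across runs, which is a design choice of this sketch rather than something the issue prescribes.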

The following are not public but imported by dis.py so they are still needed:

  • _nb_ops -- just hardcode?
  • _specializations -- derive from families (with some adjustments)
  • _specialized_instructions -- compute from _specializations
  • _specialization_stats -- only used by test__opcode.py, move into there?
  • _cache_format -- compute from cache effects? (how to make up names?)
  • _inline_cache_entries -- compute from _cache_format

The hardcoded stuff can go in prologue and epilogue sections that are updated manually.
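
One way the prologue/epilogue split could work, as a sketch: the generator rewrites only the region between two marker comments and leaves everything before and after untouched. The marker strings and the `regenerate` function are assumptions made for illustration, not an existing convention in opcode.py.

```python
BEGIN = "# BEGIN GENERATED CODE"  # hypothetical marker text
END = "# END GENERATED CODE"      # hypothetical marker text


def regenerate(source, generated):
    """Replace the text between the markers; keep prologue and epilogue."""
    head, begin, rest = source.partition(BEGIN)
    _, end, tail = rest.partition(END)
    if not begin or not end:
        raise ValueError("generated-section markers not found")
    return f"{head}{BEGIN}\n{generated}\n{END}{tail}"


old = '''__all__ = ["opmap"]  # hand-written prologue
# BEGIN GENERATED CODE
opmap = {}
# END GENERATED CODE
hasjabs = []  # hand-written epilogue
'''
new = regenerate(old, 'opmap = {"CACHE": 0, "NOP": 1}')
```

Running `regenerate` again on its own output with the same generated text leaves the file unchanged, so a build step could re-run it safely.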

This project (if we decide to do it) might be a good reason to refactor generate_cases.py into a library (the code for this shouldn't be shoved into the main file).

Benefits

We can wait on this project until we are sure we need at least one of the following benefits:

  • Once we are generating the numeric opcode values it will be easier to also generate numeric values for micro-opcodes.
  • Avoid having to keep multiple definitions in sync (e.g. families, cache formats).
  • Easier to understand. E.g. where are the numeric opcode values for specialized instructions defined? (Would require also subsuming generate_opcode_h.py.)


@corona10

https://github.com/python/cpython/blob/main/Tools/unicode/gencjkcodecs.py can be a good reference for this task.
(Template based output)
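
As a rough illustration of template-based output, here is a sketch using the standard library's `string.Template`. The template text and the `render` function are hypothetical and are not taken from any existing generator script.

```python
from string import Template

# Hypothetical module template; $-placeholders are filled in by render().
MODULE_TEMPLATE = Template('''\
# Auto-generated; do not edit.
HAVE_ARGUMENT = $have_argument
opmap = $opmap
''')


def render(opmap, have_argument):
    """Return module source text with the computed values substituted in."""
    return MODULE_TEMPLATE.substitute(
        opmap=repr(opmap),
        have_argument=have_argument,
    )


text = render({"CACHE": 0, "POP_TOP": 1}, 2)
```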

gvanrossum pushed a commit that referenced this issue Mar 14, 2023
It's not used except in a test, so move it there instead.
carljm added a commit to carljm/cpython that referenced this issue Mar 14, 2023
* main: (50 commits)
  pythongh-102674: Remove _specialization_stats from Lib/opcode.py (python#102685)
  pythongh-102660: Handle m_copy Specially for the sys and builtins Modules (pythongh-102661)
  pythongh-102354: change python3 to python in docs examples (python#102696)
  pythongh-81057: Add a CI Check for New Unsupported C Global Variables (pythongh-102506)
  pythonGH-94851: check unicode consistency of static strings in debug mode (python#102684)
  pythongh-100315: clarification to `__slots__` docs. (python#102621)
  pythonGH-100227: cleanup initialization of global interned dict (python#102682)
  doc: Remove a duplicate 'versionchanged' in library/asyncio-task (pythongh-102677)
  pythongh-102013: Add PyUnstable_GC_VisitObjects (python#102014)
  pythonGH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (pythonGH-102649)
  pythongh-102627: Replace address pointing toward malicious web page (python#102630)
  pythongh-98831: Use DECREF_INPUTS() more (python#102409)
  pythongh-101659: Avoid Allocation for Shared Exceptions in the _xxsubinterpreters Module (pythongh-102659)
  pythongh-101524: Fix the ChannelID tp_name (pythongh-102655)
  pythongh-102069: Fix `__weakref__` descriptor generation for custom dataclasses (python#102075)
  pythongh-98169 dataclasses.astuple support DefaultDict (python#98170)
  pythongh-102650: Remove duplicate include directives from multiple source files (python#102651)
  pythonGH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640)
  pythongh-87092: refactor assemble() to a number of separate functions, which do not need the compiler struct (python#102562)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102631)
  ...
Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this issue Mar 27, 2023
…hon#102685)

It's not used except in a test, so move it there instead.
iritkatriel added the type-feature label (a feature request or enhancement) and removed the type-bug label (an unexpected behavior, bug, or error) on Apr 4, 2023
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
…hon#102685)

It's not used except in a test, so move it there instead.

iritkatriel commented Nov 29, 2023

I wasn't aware of this issue and did most of this under #105481.

I think it's all done except for _cache_format.

@iritkatriel

I'll go ahead and close this. We left the cache format out because it cannot currently be generated from bytecodes.c: the cache entries are not specified in a consistent way (with the same names) across families. There isn't currently a good reason to resolve that.
