Refactor the dis module to provide better building blocks for bytecode analysis #56025
Comments
As discussed in bpo-11549 a couple of tests need to inspect the disassembly of some code. Currently they have to override sys.stdout, run dis and then restore stdout. It would be much nicer if the dis module provided functions that return the disassembly as a string. Provided is a patch that adds a file argument to most dis functions, defaulting to sys.stdout. On top of that there are 2 new functions: dis_to_str and disassembly_to_str, which return the disassembly as a string instead of writing it to a file. |
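For comparison, here is a minimal sketch of the two ways of capturing disassembly described above. It assumes the keyword-only `file` parameter as it exists on `dis.dis()` in Python 3.4 and later; the helper names `disassembly_via_redirect` and `disassembly_via_file` are hypothetical and are not the `dis_to_str` functions from the patch.

```python
import dis
import io
import sys


def sample(a, b):
    return a + b


def disassembly_via_redirect(func):
    """Capture dis output by temporarily replacing sys.stdout."""
    saved = sys.stdout
    sys.stdout = buf = io.StringIO()
    try:
        dis.dis(func)
    finally:
        # Always restore the real stdout, even if dis raises.
        sys.stdout = saved
    return buf.getvalue()


def disassembly_via_file(func):
    """Capture dis output by passing a file-like object directly."""
    buf = io.StringIO()
    dis.dis(func, file=buf)
    return buf.getvalue()


text = disassembly_via_file(sample)
```

Both helpers should produce the same text; the second avoids touching global state.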
Inspecting the text disassembly is a bit fragile for testing. It would be better to scan a list of (opcode, oparg) pairs for given pattern (i.e. (LOAD_CONST, 3) where consts[3] --> some target value). |
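A rough sketch of that kind of pattern scan, written against `dis.get_instructions()` (the API this issue eventually produced, available in Python 3.4+). The helper `find_instructions` and the `_ANY` sentinel are hypothetical names introduced for illustration. Matching on `argval` rather than on a raw `oparg` index avoids having to look up `consts[i]` by hand.

```python
import dis

_ANY = object()  # sentinel meaning "do not filter on this field"


def uses_constant():
    label = "spam and eggs"
    return label


def find_instructions(func, opname=_ANY, argval=_ANY):
    """Return the instructions of func matching the given opname/argval."""
    matches = []
    for instr in dis.get_instructions(func):
        if opname is not _ANY and instr.opname != opname:
            continue
        if argval is not _ANY and instr.argval != argval:
            continue
        matches.append(instr)
    return matches


hits = find_instructions(uses_constant, argval="spam and eggs")
misses = find_instructions(uses_constant, argval="not present")
```

Exact opcode names vary between CPython versions, so tests that filter on `opname` are tied to the interpreter version they were written for.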
Agreed, but that would require rewriting of all tests in test_peepholer. |
Yep! |
I really like the idea of adding some lower level infrastructure to dis to make it generator based, making the disassembly more amenable to programmatic manipulation. Consider if, for each line disassemble() currently prints, we had an underlying iterator that yielded a named tuple consisting of (index, opcode, oparg, linestart, details). I've created a proof-of-concept for that in my sandbox (http://hg.python.org/sandbox/ncoghlan/file/get_opinfo/Lib/dis.py) which adds a get_opinfo() function that does exactly that. With disassemble() rewritten to use that, test_dis and test_peepholer still pass as currently written. Near-term, test_peepholer could easily continue to do what it does now (i.e. use the higher level dis() function and redirect sys.stdout). Longer term, it could be written to analyse the opcode stream instead of doing string comparisons. |
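To illustrate the shape of such an iterator, here is a sketch that rebuilds a `get_opinfo()`-style generator on top of today's `dis.get_instructions()`. It is not the proof-of-concept code. The field names follow the seven-field `OpInfo` layout used in the proof-of-concept (`opindex opcode opname oparg details starts_line is_jump_target`), and the mapping from `Instruction` attributes (`offset` to `opindex`, `argrepr` to `details`) is an assumption made for this example.

```python
import collections
import dis

OpInfo = collections.namedtuple(
    "OpInfo",
    "opindex opcode opname oparg details starts_line is_jump_target",
)


def get_opinfo(thing):
    """Yield one OpInfo tuple per bytecode instruction in thing."""
    for instr in dis.get_instructions(thing):
        yield OpInfo(
            opindex=instr.offset,        # assumed equivalent of 'opindex'
            opcode=instr.opcode,
            opname=instr.opname,
            oparg=instr.arg,
            details=instr.argrepr,       # assumed equivalent of 'details'
            starts_line=instr.starts_line,
            is_jump_target=instr.is_jump_target,
        )


def example(x):
    return x


ops = list(get_opinfo(example))
```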
Changed issue title to cover ideas like get_opinfo(). |
Oops, I forgot to edit my comment to match the OpInfo definition I used in the proof-of-concept:
    OpInfo = collections.namedtuple("OpInfo",
        "opindex opcode opname oparg details starts_line is_jump_target") |
So in the near term, dis-based tests should continue to copy/paste sys.stdout redirection code? |
If we decide our long term goal is the use of the opcode stream for programmatic access, then yes. |
FWIW in PyPy we have https://bitbucket.org/pypy/pypy/src/default/lib_pypy/disassembler.py which we use for some of our tools. |
Do not forget to update docs too. |
Nick, I still want to work on this one. |
The diff generator didn't work - I've uploaded the current patch manually to make it easier to review than it is in my bitbucket repo. I just noticed there's a missing element in the docs patch at the moment - to make testing easier, Ryan added a 'file' argument to the various print-based dis functions so the output can easily be captured in a StringIO object. The docs updates don't currently reflect that, they only cover the OpInfo and get_opinfo additions (along with a clarification of the dis module's slightly odd use of the term 'free'). Aside from that, the core concept of the patch is pretty simple:
One potential criticism is the complexity of the 'expected output' for the new OpInfoTestCase, but it seemed worth it to vet the way the new code handles several cases. The programmatic nature makes the opcode sequences much easier to read and maintain than the corresponding formatted output tests would have been. These new tests also cover an error that the previous incarnation of the test suite missed completely (I had a bug at one point where I had incorrectly omitted the second half of the list of cell names - there was no test to check that the disassembler handled references to such names correctly) |
Regenerated the get_opinfo patch against current 3.3 tip. Still haven't fixed the missing doc updates mentioned in my last message, though. |
Attached patch should now be complete, including the documentation for the new keyword-only 'file' parameter on various dis module functions. |
I took a quick look over the final patch (I will do a more thorough 'OpInfo' makes it sound like information concerning only the opcode, but
    for opinfo in dis.get_opinfo(thing):
        process(opinfo)
which seems vague. The following seems clearer to me:
    for instr in dis.bytecode_instructions(thing):
        process(instr)
And instead of 'OpInfo' perhaps 'ByteCodeInstruction'. Even the current |
'Op' is just an abbreviation of 'operation'. So 'operation code' becomes 'opcode' and 'operation information' becomes 'opinfo'. The fact that it comes from the 'dis' module gives the context that the *kind* of operation we're talking about is a Python byte code instruction. When people are hacking on bytecode in the future, they'll likely end up using get_opinfo() a fair bit, so swapping the succinct 'opinfo' for the verbose 'bytecode_instruction' strikes me as a poor trade-off. |
I agree that 'bytecode_instructions' is long-winded. FWIW, I Here are a few examples:
|
Bitbucket repo and attached patch updated relative to current tip. |
Meador's suggested name change has grown on me, so I plan to switch the name of the new API to "get_instructions()" and the new class to "Instruction". |
Grr, "Create Patch" insists on trying to produce a patch based on https://bitbucket.org/ncoghlan/cpython_sandbox/changesets/9512712044a6. That checkin is from *September* and ignores all my recent changes :P Relevant meta-tracker issue: http://psf.upfronthosting.co.za/roundup/meta/issue429 Manual patch upload coming shortly... |
OK, manual up-to-date patch attached. |
@ron: Now that it has a reasonably clear signature, I could see my way clear to making the Instruction._disassemble() method public, which makes it easy for people to compose their own disassembly output. For all the other display methods, I prefer Ryan Kelly's suggestion of supporting a "file" argument which is then passed through to the underlying "print()" calls. This is inconsistent with what I originally did for the code_info() APIs (where I made a separate "give me the string" function), but it's the lowest impact change that avoids the need to capture stdout. |
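As a sketch of what composing your own disassembly output could look like, the example below formats each instruction from its public attributes instead of calling the private `Instruction._disassemble()` method. The layout and the helper names `format_instruction` and `custom_disassembly` are invented for this example and do not match the stock `dis` output.

```python
import dis


def format_instruction(instr):
    """Render one instruction as a single line of text."""
    marker = ">>" if instr.is_jump_target else "  "
    arg = "" if instr.arg is None else str(instr.arg)
    detail = f"({instr.argrepr})" if instr.argrepr else ""
    line = f"{marker} {instr.offset:>4} {instr.opname:<24} {arg:>5} {detail}"
    return line.rstrip()


def custom_disassembly(thing):
    """Return a custom listing with one line per instruction."""
    return "\n".join(
        format_instruction(instr) for instr in dis.get_instructions(thing)
    )


def countdown(n):
    while n:
        n -= 1
    return n


listing = custom_disassembly(countdown)
```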
MvL pointed out I hadn't updated the Hg repo reference when I moved my sandbox over to BitBucket - the diff it was generating was from the last time I updated my pydotorg sandbox in order to try something on the buildbots. |
Given the imminent 3.3 beta 1 feature freeze and the fact I would like to explore Ron's suggestion of a higher level ByteCode object to encapsulate a sequence of instructions (along with additional information from the code object), postponing this one. |
To clarify the vague allusion in my last comment, Ron's suggestion was along the lines of creating a dis.Bytecode object that encapsulated everything the dis module can figure out about a piece of compiled code. That would mean exposing the kind of info reported in a string by dis.code_info() as attributes/properties, and have the proposed "get_opinfo()" be the __iter__ method on the disassembled Bytecode objects. |
I've updated Nick's patch so that test_dis and test_peepholer pass again, and added a prototype ByteCode class (without any docs or tests for now, to allow for API discussion). The prototype ByteCode is instantiated with any of the objects that get_instructions already accepts (functions, methods, code strings & code objects). Iterating over it yields Instruction objects. It has info(), show_info() and display_code() methods, which correspond to the code_info(), show_code() and disassemble() functions. I've tried to go for names that make sense, rather than names that fit the existing pattern, because the existing pattern feels a bit messy. E.g. the show_code() function doesn't actually show the code, so I've called its method equivalent show_info(). |
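For orientation, here is a small sketch using `dis.Bytecode` as it exists in released Python (3.4+). Note that this is the released spelling, not the prototype's: the class is `Bytecode`, and the methods are `info()` and `dis()`. The `show_info()` and `display_code()` names from the prototype above are not used here.

```python
import dis


def greet(name):
    return "Hello, " + name


bytecode = dis.Bytecode(greet)

# Iterating over the wrapper yields Instruction objects.
opnames = [instr.opname for instr in bytecode]

# info() returns the same text as dis.code_info(); dis() returns the
# formatted disassembly as a string instead of printing it.
summary = bytecode.info()
listing = bytecode.dis()
```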
Thanks Thomas! It's a promising start - a few more detailed comments in the patch review. I like the idea of creating the initial version as an object-oriented wrapper around the existing APIs, rather than completely refactoring the module to make everything else a functional wrapper around an underlying object-oriented implementation. |
Updated version of the patch. Changed from review:
Still to do:
|
I've added docs and tests, and split the changes to test_peepholer into a separate patch. I haven't re-exposed details of the code object as attributes of Bytecode instances, because they're already available as e.g. bytecode.codeobj.co_names . I think it would be more confusing than useful to offer the same values in two places, though I'm open to discussion on this. I've re-organised the dis module docs a bit. I've put Bytecode at the top, as I think it's a more intuitive API than the functions, which have somewhat counter-intuitive names due to the module's history. |
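A brief sketch of reaching code object details through the `codeobj` attribute mentioned above, rather than through duplicated attributes on the `Bytecode` instance. The function `shout` is a made-up example.

```python
import dis


def shout(text):
    return text.upper()


bytecode = dis.Bytecode(shout)
code = bytecode.codeobj  # the underlying code object

names = code.co_names        # global and attribute names used by the code
varnames = code.co_varnames  # argument and local variable names
```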
Ping - the latest patches (dis_api3 & test_peepholer) are ready for review when someone's got a moment. Thanks! |
I created bpo-17916 after realising that the new OO API doesn't yet provide an equivalent to dis.distb that returns an appropriate Bytecode object. (I don't think it makes sense to hold up this patch for that change) |
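The equivalent that exists in released Python is the `dis.Bytecode.from_traceback()` classmethod. A minimal sketch, in which the functions `failing` and `bytecode_for_failure` are hypothetical helpers:

```python
import dis
import sys


def failing(divisor=0):
    return 1 / divisor


def bytecode_for_failure():
    """Return (Bytecode, traceback) for the frame where failing() raised."""
    try:
        failing()
    except ZeroDivisionError:
        tb = sys.exc_info()[2]
        return dis.Bytecode.from_traceback(tb), tb


bc, tb = bytecode_for_failure()

# from_traceback() follows tb_next to the innermost frame, so do the
# same here to find the traceback entry it actually used.
innermost = tb
while innermost.tb_next is not None:
    innermost = innermost.tb_next
```

The resulting object records the offset of the failing instruction in `current_offset`, which `bc.dis()` uses to mark that instruction in its output.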
Good thing test_peepholer was moved out to a separate patch - a failure of that picked up a bug in the new disassembly output (unifying the handling of name and constant dereferences had changed the way constant strings were reported in the disassembly, and the error was consistent in both the new implementation and in the new tests due to the way the expected test results had been generated) |
New changeset f65b867ce817 by Nick Coghlan in branch 'default': |
New changeset d3fee4c64654 by Nick Coghlan in branch 'default': |
And two-and-a-bit years later, we're done - thanks all, any further feedback or problems can be filed as a new issue :) |
test_dis is failing on some buildbots: http://buildbot.python.org/all/builders/AMD64 Ubuntu LTS 3.x/builds/1674/steps/test/logs/stdio
Re-running test 'test_dis' in verbose mode
test test_dis crashed -- Traceback (most recent call last):
  File "/opt/python/3.x.langa-ubuntu/build/Lib/test/regrtest.py", line 1294, in runtest_inner
    the_module = importlib.import_module(abstest)
  File "/opt/python/3.x.langa-ubuntu/build/Lib/importlib/__init__.py", line 92, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1603, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1584, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1551, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 591, in _check_name_wrapper
  File "<frozen importlib._bootstrap>", line 1053, in load_module
  File "<frozen importlib._bootstrap>", line 1034, in load_module
  File "<frozen importlib._bootstrap>", line 567, in module_for_loader_wrapper
  File "<frozen importlib._bootstrap>", line 901, in _load_module
  File "<frozen importlib._bootstrap>", line 297, in _call_with_frames_removed
  File "/opt/python/3.x.langa-ubuntu/build/Lib/test/test_dis.py", line 4, in <module>
    from test.bytecode_helper import BytecodeTestCase
ImportError: No module named 'test.bytecode_helper' |
Yes, this is bytecode_helper hasn't been added to the repository. |
(this is *because*, sorry) |
Ping! The test is still failing. |
bytecode_helper is there in dis_api3.diff - anyone with commit rights should be able to add it to the repository. |
New changeset 84d1a0e32d3b by Nick Coghlan in branch 'default': |
I checked in the missing file after I woke up this morning. Maybe I'll learn to use hg import instead of patch some day... Sorry for the noise. |
New changeset bf997b22df06 by Victor Stinner in branch '3.5': |