-
-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and fix invalid variable names #59
base: master
Are you sure you want to change the base?
Conversation
fixed_name = valid_names.add(fix_variable_name(co_name)) | ||
fixed_names.append(fixed_name) | ||
|
||
args = [code.co_argcount] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole section is ugly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment about that. Ideal is to add "replace" methods to the xdis code type and and a method to convert that code type to the native code type.
It is more work that we originally bargained for but after its done things will be much nicer I think.
@@ -253,3 +253,25 @@ def show_code(co, version, file=None, is_pypy=False): | |||
print(code_info(co, version, is_pypy=is_pypy)) | |||
else: | |||
file.write(code_info(co, version) + "\n") | |||
|
|||
class UniqueSuffixSet: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not come up with a better name for this class
args = [code.co_argcount] | ||
if PYTHON3: | ||
args.append(code.co_kwonlyargcount) | ||
args += [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently I've come to learn that Python 3.8 there is now a "replace" method on Code. See https://docs.python.org/3/library/types.html#types.CodeType
So xdis in its code type should add this method (for all types of CodeType) . So the way ths might work to create an xdis code type and then convert that to the local code type.
I realize this is a digression from the main topic here, so it can be deferred. I do want to note it becuase this kind of code is pervasive and ugly and error prone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good. I'm not super familiar with code objects especially not accross different Python versions and I'm also not familiar with this codebase so I don't know what Python versions xdis is aming to be compatible with but I'll gladly change this part of the code anytime as it is by far the worst part of the whole PR.
def add(self, value_candidate): | ||
"""Add a new value to the set and return the actual value, including suffix, added""" | ||
if value_candidate in self.values: | ||
for suffix_number in range(len(self.values)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the first name is already in the set, it's enough to try len(self.values)
additional names by the pigeonhole principle.
|
I want to go over this carefully, so it may take a day or so. Looks interesting so far. Thanks for undertaking. |
$ python3 -m pytest
pytest/test_disasm.py::test_funcoutput[test_tuple2-disassemble_file] FAILED
...
E # Stack size: 1
E - # Flags: 0x00000040 (NOFREE)
E ? - ^^^^^^
E + # Flags: 0x00000000 (0x0)
E ? + ^^^
E # First Line: 4
E # Constants: Yeah, I'm done for today so feel free to look through everything in your own pace and get back to me with comments. |
I just had a chance to look try this purely from a user level. Here are some observations.. The thing that was the biggest surprise is that without any options when run on When the Having now tried this as a user, it feels to me that --warn-invalid-vars / --nowarn-invalid-vars
warn about invalid variable names
--fix-invalid-vars / --nofix-invalid-vars
fix the names for variables with invalid
names should be choice options, not boolean flags. Something like
And although this might not be of concern for obfuscated code, there are a number of other checks that might be done on the sanity of the bytecode in the future. In a really secure environment, it might be nice to know whether bytecode is well formed. For example, if you get the stacksize wrong and it is too small, Python will SEGV with no information whatsoever. Although it might be hard to do, it is within the realm of possibility that malware might be able to craft bytecode in such a was so as to take advantage of this. I would be interested if someone has tried to do something like this. As for your specific questions...
All of them are suitable for repairing. I'll try to create some bytecode with all of the kinds of flaws mentioned, to make it easier to understand what I'm talking about.
Here's a suggestion. I've added this as a branch of xdis currently. Let's make branches off of this or fix directly on the branch as the need demands. I can make PR's against this branch in your repository. For example, expect one which contains additional invalid bytecode for testing. When everything is settled I'll merge this branch into master. While I do appreciate the great work on this so far, based on having tried this, I don't think it's ready to go into master just yet. I think things are a little more complicated and involved than either of us had imagined initially. That said, I do think we'll get there. How does this sound? If this doesn't work for you, then make a suggestion.
Should be answered above. If this is not clear or you think otherwise, let me know.
I see the CI failure with 3.8. I'll address this later. |
I think the test failure I am seeing is related to the fact that code types change between 3.7 and 3.8 yet again.
The better remedy is to use a more portable code type, and then "#replace" isn't needed. (However I'll add that so as to be more compatible with 3.8+ conventions). |
This is to give you a head's as to what's going on in xdis and how this fits in. A long overdue overhaul of xdis's portable code types is underway in branch feature/code-replace. Independent of this, I realize that what should have been done in the PR is not use Python's The disassemblers and deparsers all can use the portable version just as easily as the native version. When the refactoring of the code type is done, then this PR will have to be redone, although the changes caused by refactoring will be slight. |
Great. Do you have any rough estimate on when you will do this? I think that if it is in any reasonable near future I might as well just hold off on this PR completely and then just update it using the updated portable type instead. |
I'll probably be done by the end of the week. As part of this I will be adding some small kinds of validity checks, such as for the types of various parameters. After that there is some opcode stack manipulation tracking information that I want to add. And then a new release which will be a major version bump release. So my thought is that if this can hold off, I do the major release with some version checking. Wait for that to shake out a little, and then add this. Your thoughts? |
Yeah that sounds good. As soon as the interface is set I can update this and include the other changes you have mentioned above and then you can just hold off release to whenver it's suitable. |
218e4a6
to
05198ff
Compare
3398d64
to
22798af
Compare
Hi again @rocky ! I thought about giving this another look. What is the status of the code object interface at the moment? Do you think the base is there for me to finish this work? |
Sure, I think so. |
0786cab
to
c478ee5
Compare
018bfbf
to
fd64c6d
Compare
This is my attempt at implementing the fix to the issue discussed in #58
It does seem to work but there are four things I would like to discuss:
code.co_names
list here. I'm not extremely familiar with Python code objects. Can or should I look at any of the other lists of names as well?y = "Hello" #
(code),for
(identifier) and0x
(leading digit) respectively but should I add them to the test suite somehow?