Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and fix invalid variable names #59

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ZetaTwo
Copy link

@ZetaTwo ZetaTwo commented Apr 11, 2020

This is my attempt at implementing the fix to the issue discussed in #58

It does seem to work but there are four things I would like to discuss:

  1. I'm only considering the code.co_names list here. I'm not extremely familiar with Python code objects. Can or should I look at any of the other lists of names as well?
  2. Is there a better way to modify the code object than the reconstruction I'm doing in code.py:683-700?
  3. What should the default for warning and fixing variable names be? Here I went with True/True
  4. Can/should I add tests for this. I have added three files 02_invalid_variable_name{1,2,3}.pyc containing variables named y = "Hello" # (code), for (identifier) and 0x (leading digit) respectively but should I add them to the test suite somehow?

xdis/code.py Outdated Show resolved Hide resolved
fixed_name = valid_names.add(fix_variable_name(co_name))
fixed_names.append(fixed_name)

args = [code.co_argcount]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section is ugly

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment about that. Ideal is to add "replace" methods to the xdis code type and and a method to convert that code type to the native code type.

It is more work that we originally bargained for but after its done things will be much nicer I think.

@@ -253,3 +253,25 @@ def show_code(co, version, file=None, is_pypy=False):
print(code_info(co, version, is_pypy=is_pypy))
else:
file.write(code_info(co, version) + "\n")

class UniqueSuffixSet:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not come up with a better name for this class

args = [code.co_argcount]
if PYTHON3:
args.append(code.co_kwonlyargcount)
args += [
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently I've come to learn that Python 3.8 there is now a "replace" method on Code. See https://docs.python.org/3/library/types.html#types.CodeType

So xdis in its code type should add this method (for all types of CodeType) . So the way ths might work to create an xdis code type and then convert that to the local code type.

I realize this is a digression from the main topic here, so it can be deferred. I do want to note it becuase this kind of code is pervasive and ugly and error prone.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I'm not super familiar with code objects especially not accross different Python versions and I'm also not familiar with this codebase so I don't know what Python versions xdis is aming to be compatible with but I'll gladly change this part of the code anytime as it is by far the worst part of the whole PR.

def add(self, value_candidate):
"""Add a new value to the set and return the actual value, including suffix, added"""
if value_candidate in self.values:
for suffix_number in range(len(self.values)):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the first name is already in the set, it's enough to try len(self.values) additional names by the pigeonhole principle.

@rocky
Copy link
Owner

rocky commented Apr 11, 2020

  1. Definitely should have co_cellvars, co_freevars, co_names, co_varnames. But to be extra careful and safe I'd add co_name which uncompyle6 and others might use that in reconstructing source code.
  2. Was answered elsewhere
  3. For defaults, I don't like autofix, so warn by default, but don't fix by default. The disassemblers are free to change their defaults, but shouldn't do that here because this works with bytecode.
  4. Yes, absolutely tests are great! Thanks! pytest testing is in the pytest directory.

@rocky
Copy link
Owner

rocky commented Apr 11, 2020

I want to go over this carefully, so it may take a day or so. Looks interesting so far. Thanks for undertaking.

@rocky rocky marked this pull request as draft April 11, 2020 23:59
@ZetaTwo
Copy link
Author

ZetaTwo commented Apr 12, 2020

  1. Great. Are all of them also suitable for repairing? Or are could they somehow be referencing external things?
  2. How should I proceed with this point? Make something that works for now and then make a separate PR and go back and fix this or expand the scope of this PR to include that?
  3. Makes sense, I'll set default for warning=True and fixing=False. I left it to True/True in the command line tool. Should I change it to True/False there as well?
  4. I added tests which seem to work but it also seems they exposed another issue with point 2 as now the following test fails due to the co_flags value being changed for some reason. I don't really understand what's gonig on though.
$ python3 -m pytest
pytest/test_disasm.py::test_funcoutput[test_tuple2-disassemble_file] FAILED
...
E           # Stack size:        1
E         - # Flags:             0x00000040 (NOFREE)
E         ?                              -   ^^^^^^
E         + # Flags:             0x00000000 (0x0)
E         ?                               +  ^^^
E           # First Line:        4
E           # Constants:

Yeah, I'm done for today so feel free to look through everything in your own pace and get back to me with comments.

@rocky
Copy link
Owner

rocky commented Apr 12, 2020

I just had a chance to look try this purely from a user level. Here are some observations..

The thing that was the biggest surprise is that without any options when run on test/bytecode_3.7/02_invalid_variable_name1.pyc there was no warning but the invalid name in slot 2 of names was changed anyway. I would have expected that it would not change, or better, by default warn but not change the name. I see that you refer to this as item 3 above.

When the --warn-invalid-vars option is given, then there is a warning, but we don't capture which variable(s) were detected and what they were changed from.

Having now tried this as a user, it feels to me that

  --warn-invalid-vars / --nowarn-invalid-vars
                                  warn about invalid variable names
  --fix-invalid-vars / --nofix-invalid-vars
                                  fix the names for variables with invalid
                                  names

should be choice options, not boolean flags. Something like

--check = {none|warn|fix...}

And although this might not be of concern for obfuscated code, there are a number of other checks that might be done on the sanity of the bytecode in the future. In a really secure environment, it might be nice to know whether bytecode is well formed. For example, if you get the stacksize wrong and it is too small, Python will SEGV with no information whatsoever.

Although it might be hard to do, it is within the realm of possibility that malware might be able to craft bytecode in such a was so as to take advantage of this. I would be interested if someone has tried to do something like this.

As for your specific questions...

  1. Are all of them also suitable for repairing? Or are could they somehow be referencing external things?

All of them are suitable for repairing. I'll try to create some bytecode with all of the kinds of flaws mentioned, to make it easier to understand what I'm talking about.

  1. How should I proceed with this point? Make something that works for now and then make a separate PR and go back and fix this or expand the scope of this PR to include that?

Here's a suggestion. I've added this as a branch of xdis currently. Let's make branches off of this or fix directly on the branch as the need demands. I can make PR's against this branch in your repository. For example, expect one which contains additional invalid bytecode for testing.

When everything is settled I'll merge this branch into master. While I do appreciate the great work on this so far, based on having tried this, I don't think it's ready to go into master just yet. I think things are a little more complicated and involved than either of us had imagined initially. That said, I do think we'll get there.

How does this sound? If this doesn't work for you, then make a suggestion.

  1. Makes sense, I'll set default for warning=True and fixing=False. I left it to True/True in the command line tool. Should I change it to True/False there as well?

Should be answered above. If this is not clear or you think otherwise, let me know.

  1. ... but it also seems they exposed another issue with point 2 as now the following test fails due to the co_flags value being changed for some reason. I don't really understand what's gonig on though.

I am not able to reproduce this problem when I run pytest. But i know that Flags: 0x00000040 (NOFREE) is the right thing and the other while not wrong is not optimal.

I see the CI failure with 3.8. I'll address this later.

@rocky
Copy link
Owner

rocky commented Apr 13, 2020

I think the test failure I am seeing is related to the fact that code types change between 3.7 and 3.8 yet again.

The remedy I'm going to work with is to do add the replace() method and the convert to local-type method (the latter being the more relevant to this particular failure) in order to fix.

The better remedy is to use a more portable code type, and then "#replace" isn't needed. (However I'll add that so as to be more compatible with 3.8+ conventions).

@rocky
Copy link
Owner

rocky commented Apr 14, 2020

This is to give you a head's as to what's going on in xdis and how this fits in.

A long overdue overhaul of xdis's portable code types is underway in branch feature/code-replace.

Independent of this, I realize that what should have been done in the PR is not use Python's type.CodeType but one of the portable variants that I am currently cleaning up.

The disassemblers and deparsers all can use the portable version just as easily as the native version.

When the refactoring of the code type is done, then this PR will have to be redone, although the changes caused by refactoring will be slight.

@ZetaTwo
Copy link
Author

ZetaTwo commented Apr 15, 2020

When the refactoring of the code type is done, then this PR will have to be redone, although the changes caused by refactoring will be slight.

Great. Do you have any rough estimate on when you will do this? I think that if it is in any reasonable near future I might as well just hold off on this PR completely and then just update it using the updated portable type instead.

@rocky
Copy link
Owner

rocky commented Apr 15, 2020

I'll probably be done by the end of the week. As part of this I will be adding some small kinds of validity checks, such as for the types of various parameters.

After that there is some opcode stack manipulation tracking information that I want to add.

And then a new release which will be a major version bump release.

So my thought is that if this can hold off, I do the major release with some version checking. Wait for that to shake out a little, and then add this.

Your thoughts?

@ZetaTwo
Copy link
Author

ZetaTwo commented Apr 15, 2020

Yeah that sounds good. As soon as the interface is set I can update this and include the other changes you have mentioned above and then you can just hold off release to whenver it's suitable.

@rocky rocky force-pushed the master branch 2 times, most recently from 3398d64 to 22798af Compare October 30, 2021 10:18
@ZetaTwo
Copy link
Author

ZetaTwo commented Oct 31, 2021

Hi again @rocky ! I thought about giving this another look. What is the status of the code object interface at the moment? Do you think the base is there for me to finish this work?

@rocky
Copy link
Owner

rocky commented Nov 1, 2021

Hi again @rocky ! I thought about giving this another look. What is the status of the code object interface at the moment? Do you think the base is there for me to finish this work?

Sure, I think so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants