Description
Calling uncompyle6.decompile_file() multiple times on multiple (unrelated) files can give different results depending on which files were decompiled and in what order.
Context
I do file analysis in an external Python script that may come across a Python bytecode file, which it then tries to decompile using uncompyle6. The current code calls uncompyle6.main.main() and should probably call uncompyle6.decompile_file() instead, but I doubt this has any impact on this issue. The main point is that a single Python script may call uncompyle6's code multiple times for multiple (unrelated) files.
If calling uncompyle6.decompile_file() multiple times with unrelated files is not the best practice for a use case like this, I would be interested to know, as I may simply be using the wrong entry point or approach, in which case the rest of this issue may be invalid.
How to reproduce
I tried to keep the code as simple as possible and ended up with the following (explanation after the Python code block!):
>>> from io import StringIO
>>> import os
>>> import uncompyle6
>>>
>>> sample = os.path.join(os.getcwd(), "f2330efbac26d181662a98999dbc05645a89d83f4136d345d23f8e90fb631ffd.pyc")
>>> out = StringIO()
>>> uncompyle6.decompile_file(filename=sample, outstream=out)
[<uncompyle6.semantics.pysource.SourceWalker object at 0x7fa4f6d0a950>]
>>> print(out.getvalue().splitlines()[15])
parser.add_option('-r', '--wredir', action='store_true', help='Enable answers for netbios wredir suffix queries. Answering to wredir will likely break stuff on the network. Default: False', dest='Wredirect', default=False)
>>>
>>> sample = os.path.join(os.getcwd(), "deb9bf8bb61e9942014d104019432d2a6efaff31c2ff84ec0ad418dc2cca41e7.pyc")
>>> out = StringIO()
>>> uncompyle6.decompile_file(filename=sample, outstream=out)
[<uncompyle6.semantics.pysource.SourceWalker object at 0x7fa4f6c95f90>]
>>>
>>> sample = os.path.join(os.getcwd(), "f2330efbac26d181662a98999dbc05645a89d83f4136d345d23f8e90fb631ffd.pyc")
>>> out = StringIO()
>>> uncompyle6.decompile_file(filename=sample, outstream=out)
[<uncompyle6.semantics.pysource.SourceWalker object at 0x7fa4f68ad410>]
>>> print(out.getvalue().splitlines()[15])
parser.add_option('-r', '--wredir', 12='store_true', 14='Enable answers for netbios wredir suffix queries. Answering to wredir will likely break stuff on the network. Default: False', 16='Wredirect', 18=False)
The samples are probably not that important, but I extracted the f2330 file out of this executable and the deb9b file is already on VT.
When decompiling the f2330 file the first time, we get the correct output (the same as when running uncompyle6 on the command line), but after decompiling the deb9b file and then re-decompiling the first file, the result is different. The line at index 15 (the sixteenth line) is taken as an example: in the second decompilation, the keyword names of the function's arguments (action=, help=, dest=, default=) are replaced by numbers (12=, 14=, 16=, 18=).
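To check more than one hand-picked line, the two outputs can be compared line by line. The following is a minimal sketch using only the standard library. The helper name differing_lines and the two short stand-in strings are hypothetical. In practice the inputs would be the out.getvalue() results of the first and second decompilation.

```python
from itertools import zip_longest


def differing_lines(first: str, second: str):
    """Return (index, first_line, second_line) for every line that differs.

    If one text has more lines than the other, the missing side is None.
    """
    pairs = zip_longest(first.splitlines(), second.splitlines(), fillvalue=None)
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


# Stand-in outputs (hypothetical, shortened); real input would be out.getvalue().
first_run = "import optparse\nparser.add_option('-r', action='store_true')\n"
second_run = "import optparse\nparser.add_option('-r', 12='store_true')\n"

for index, before, after in differing_lines(first_run, second_run):
    print(index, before, after, sep="\n  ")
```

An empty list means both runs produced identical text, which is what one would expect when the same file is decompiled twice.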
Analysis
I tried to debug as much as I could by looking at what decompile_file() is doing. It ends up calling decompile() -> code_deparse() -> walker(). From my experiments, the walker() call is the line that must be executed for the re-decompilation of the f2330 file to give a different result. Since the walker variable is a SourceWalker by default, it calls semantics.customize.customize_for_version(), which appears to modify the values found in uncompyle6.semantics.consts, most notably TABLE_DIRECT and TABLE_R.
For reference, the f2330 file is a Python 2.7 file and the deb9b file is a Python 3.7.0 file:
>>> from xdis.load import load_module
>>> import os
>>> sample = os.path.join(os.getcwd(), "f2330efbac26d181662a98999dbc05645a89d83f4136d345d23f8e90fb631ffd.pyc")
>>> load_module(sample)
((2, 7), 0, 62211, <Code2 code object <module> at 0x7f98d9e99a50, file Responder.py>, line 17, False, None, None)
>>> sample = os.path.join(os.getcwd(), "deb9bf8bb61e9942014d104019432d2a6efaff31c2ff84ec0ad418dc2cca41e7.pyc")
>>> load_module(sample)
((3, 7, 0), 1678028857, 3394, <Code3 code object <module> at 0x7f98d9cb8450, file coronausb.py>, line 1, False, 6967, None)
I believe that those updates to TABLE_DIRECT and TABLE_R are the root of the issue, as they persist when analysing multiple files in a row. The length of TABLE_DIRECT goes from 177 after the initial import to 190 after decompiling the f2330 file and 248 after decompiling the deb9b file, and it does not change when the first file is decompiled again. The length of TABLE_R goes from 2 to 21 to 29.
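To illustrate the suspected mechanism in isolation, here is a minimal sketch that does not use uncompyle6 at all. TABLE, customize_for_version() and decompile() below are hypothetical stand-ins with made-up rule values. They only mimic the pattern described above, where a version-specific customization step mutates a module-level table in place.

```python
# Stand-in for a module-level table such as TABLE_DIRECT (hypothetical content).
TABLE = {"kwarg": "2.x rule"}


def customize_for_version(version):
    """Hypothetical customizer: mutates the shared table in place."""
    if version >= (3, 7):
        TABLE["kwarg"] = "3.7 rule"      # overrides an existing entry
        TABLE["extra"] = "3.7-only rule"  # adds a new entry


def decompile(version):
    """Hypothetical decompile step: customize, then report the rules in use."""
    customize_for_version(version)
    return dict(TABLE)


first = decompile((2, 7))   # uses the original table
decompile((3, 7))           # leaves 3.7 entries behind in TABLE
second = decompile((2, 7))  # same input as `first`, different rules

print(first == second)          # False
print(len(first), len(second))  # 1 2
```

The second call with (2, 7) sees the entries left behind by the (3, 7) call, which matches the observation that the table lengths only ever grow.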
Workaround
I was able to modify my test script to make a deep copy of the two tables at initialization time and restore them after every analysis. This gave the right result when analyzing the f2330 file the second time.
>>> from io import StringIO
>>> import os
>>> import uncompyle6
>>> from copy import deepcopy
>>> local_table_direct = deepcopy(uncompyle6.semantics.consts.TABLE_DIRECT)
>>> local_table_r = deepcopy(uncompyle6.semantics.consts.TABLE_R)
>>>
>>> sample = os.path.join(os.getcwd(), "f2330efbac26d181662a98999dbc05645a89d83f4136d345d23f8e90fb631ffd.pyc")
>>> out = StringIO()
>>> uncompyle6.decompile_file(filename=sample, outstream=out)
[<uncompyle6.semantics.pysource.SourceWalker object at 0x7f5c8a21a110>]
>>> print(out.getvalue().splitlines()[15])
parser.add_option('-r', '--wredir', action='store_true', help='Enable answers for netbios wredir suffix queries. Answering to wredir will likely break stuff on the network. Default: False', dest='Wredirect', default=False)
>>>
>>> uncompyle6.semantics.consts.TABLE_DIRECT = local_table_direct
>>> uncompyle6.semantics.consts.TABLE_R = local_table_r
>>>
>>> sample = os.path.join(os.getcwd(), "deb9bf8bb61e9942014d104019432d2a6efaff31c2ff84ec0ad418dc2cca41e7.pyc")
>>> out = StringIO()
>>> uncompyle6.decompile_file(filename=sample, outstream=out)
[<uncompyle6.semantics.pysource.SourceWalker object at 0x7f5c8a189950>]
>>>
>>> uncompyle6.semantics.consts.TABLE_DIRECT = local_table_direct
>>> uncompyle6.semantics.consts.TABLE_R = local_table_r
>>>
>>> sample = os.path.join(os.getcwd(), "f2330efbac26d181662a98999dbc05645a89d83f4136d345d23f8e90fb631ffd.pyc")
>>> out = StringIO()
>>> uncompyle6.decompile_file(filename=sample, outstream=out)
[<uncompyle6.semantics.pysource.SourceWalker object at 0x7f5c89cbd410>]
>>> print(out.getvalue().splitlines()[15])
parser.add_option('-r', '--wredir', action='store_true', help='Enable answers for netbios wredir suffix queries. Answering to wredir will likely break stuff on the network. Default: False', dest='Wredirect', default=False)
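The same idea can be wrapped in a context manager so the reset cannot be forgotten. This is a minimal sketch under stated assumptions: preserved() is a hypothetical helper, and table_direct and table_r are stand-in dictionaries rather than the real uncompyle6 tables. It has not been tested against uncompyle6 itself. Unlike the transcript above, it restores each dictionary in place instead of rebinding the module attribute, and it takes a fresh snapshot on every use, so the saved baseline cannot drift if a later run mutates the restored object.

```python
from contextlib import contextmanager
from copy import deepcopy


@contextmanager
def preserved(*tables):
    """Snapshot each dict on entry and restore its contents in place on exit."""
    snapshots = [deepcopy(table) for table in tables]
    try:
        yield
    finally:
        for table, snapshot in zip(tables, snapshots):
            # Mutate the existing object so every reference to it sees the reset.
            table.clear()
            table.update(snapshot)


# Stand-in tables (hypothetical content).
table_direct = {"a": 1}
table_r = {"x": [1, 2]}

with preserved(table_direct, table_r):
    # Simulates a decompilation run that customizes the tables.
    table_direct["b"] = 2
    table_r["x"].append(3)

print(table_direct, table_r)
```

With the real library, the call would presumably look like `with preserved(consts.TABLE_DIRECT, consts.TABLE_R): uncompyle6.decompile_file(...)`, assuming those two tables are the only shared state that gets modified.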
Help for debugging
I can upload the two specific example files to this issue as a password-protected zip if you want them for debugging, but I wanted to point out that they contain malicious payloads.
I will also do my best to answer any questions, and I can test things locally if you want.
If you would prefer, I can also try to generate two simple, benign bytecode files next week; I just wanted to make sure I documented everything this week. 🙂