New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial commit for backend-smda #355
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a really great piece of work, nice job.
I can tell from the changes you made that you've gotten this working quite successfully. I also appreciate that the style is consistent, both within the PR and our existing code base - that really helps us integrate the changes a lot easier.
The biggest thing I'd like to see before merging is the addition of feature unit tests against the SMDA backend. there are 100 or so unit tests that ensure that each backend behaves essentially the same. fortunately, registering the SMDA backend should be pretty easy (if not, its a bug, and I'd like to work to make it easy for you).
would you duplicate test_viv_features.py as test_smda_features.py
and update it appropriately?
# https://stackoverflow.com/a/22947334/ offers a solution and decoding using getfilesystemencoding works | ||
# in our testing, however other sources suggest `sys.stdin.encoding` (https://stackoverflow.com/q/4012571/) | ||
"sample", | ||
type=str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a fix for #354, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great
@@ -550,7 +573,7 @@ def main(argv=None): | |||
# during the load of the RuleSet, we extract subscope statements into their own rules | |||
# that are subsequently `match`ed upon. this inflates the total rule count. | |||
# so, filter out the subscope rules when reporting total number of loaded rules. | |||
len(filter(lambda r: "capa/subscope-rule" not in r.meta, rules.rules.values())), | |||
len([i for i in filter(lambda r: "capa/subscope-rule" not in r.meta, rules.rules.values())]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
also, i broke master by merging a rule but not syncing the testfiles submodule. thats since fixed. if you can sync this branch from master then we should see the test results show up in CI. |
40 failed, 73 passed.
Check if memory referenced is a pointer to a string. Fixes mimikatz string test.
Found derefs in viv/insn.py, does exactly what we need!
Alright, here we go! We found that the only failing test (x64 nested thunks) does not succeed because disassemblers disagree where to define function borders. It seems vivisect wants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool, this looks great, I think the following items are left to do:
- find a new nested thunk test case that works across the board (explain
insn/api: x64 nested thunk
test case #356) - decide how to use/specify the analysis backend
- specify command-line argument(s)
- default to vivisect for Py2 and SMDA for Py3?
# https://stackoverflow.com/a/22947334/ offers a solution and decoding using getfilesystemencoding works | ||
# in our testing, however other sources suggest `sys.stdin.encoding` (https://stackoverflow.com/q/4012571/) | ||
"sample", | ||
type=str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@mr-tz @mike-hunhoff will one of you take a second look and then do the merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you for all of your hard work on this implementation! It's really cool to see capa running in Python 3 and your approach to implementing a new extractor for capa
. I left a few comments to consider but overall this looks great!
I tested locally and encountered errors while running the smda
tests on Windows with Python v3.7.8 (see below):
> C:\Exclusions\capa>python -m pytest tests\test_smda_features.py
...
=========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_smda_features.py::test_smda_features[kernel32-64-function=0x180001010-api(RtlVirtualUnwind)-True0] - struct.error: unpack requires a buffer of 4 bytes
FAILED tests/test_smda_features.py::test_smda_features[kernel32-64-function=0x180001010-api(RtlVirtualUnwind)-True1] - struct.error: unpack requires a buffer of 4 bytes
FAILED tests/test_smda_features.py::test_smda_features[kernel32-64-function=0x1800202B0-api(RtlCaptureContext)-True0] - struct.error: unpack requires a buffer of 4 bytes
FAILED tests/test_smda_features.py::test_smda_features[kernel32-64-function=0x1800202B0-api(RtlCaptureContext)-True1] - struct.error: unpack requires a buffer of 4 bytes
FAILED tests/test_smda_features.py::test_smda_features[al-khaser x64-function=0x14004B4F0-api(__vcrt_GetModuleHandle)-True] - struct.error: unpack requires a buffer of 4 bytes
FAILED tests/test_smda_features.py::test_smda_features[kernel32-64-function=0x1800017D0-characteristic(peb access)-True] - struct.error: unpack requires a buffer of 4 bytes
FAILED tests/test_smda_features.py::test_smda_features[kernel32-64-function=0x180001068-characteristic(gs access)-True] - struct.error: unpack requires a buffer of 4 bytes
FAILED tests/test_smda_features.py::test_smda_features[a1982...-function=0x4014D0-characteristic(cross section flow)-True] - struct.error: unpack requires a buffer of 4 bytes
FAILED tests/test_smda_features.py::test_smda_features[kernel32-64-function=0x180001068-characteristic(cross section flow)-False] - struct.error: unpack requires a buffer of 4 bytes
Results (47.59s):
104 passed
9 failed
- tests/test_smda_features.py:13 test_smda_features[kernel32-64-function=0x180001010-api(RtlVirtualUnwind)-True0]
- tests/test_smda_features.py:13 test_smda_features[kernel32-64-function=0x180001010-api(RtlVirtualUnwind)-True1]
- tests/test_smda_features.py:13 test_smda_features[kernel32-64-function=0x1800202B0-api(RtlCaptureContext)-True0]
- tests/test_smda_features.py:13 test_smda_features[kernel32-64-function=0x1800202B0-api(RtlCaptureContext)-True1]
- tests/test_smda_features.py:13 test_smda_features[al-khaser x64-function=0x14004B4F0-api(__vcrt_GetModuleHandle)-True]
- tests/test_smda_features.py:13 test_smda_features[kernel32-64-function=0x1800017D0-characteristic(peb access)-True]
- tests/test_smda_features.py:13 test_smda_features[kernel32-64-function=0x180001068-characteristic(gs access)-True]
- tests/test_smda_features.py:13 test_smda_features[a1982...-function=0x4014D0-characteristic(cross section flow)-True]
- tests/test_smda_features.py:13 test_smda_features[kernel32-64-function=0x180001068-characteristic(cross section flow)-False]
The failed tests appear to be the result of the following error:
C:\Exclusions\capa>python -m capa.main tests\data\al-khaser_x64.exe_
WARNING:capa:skipping non-.yml file: 11736.py
loading : 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 422/422 [00:00<00:00, 1227.46 rules/s]
Traceback (most recent call last):
File "C:\Program Files\Python37\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Program Files\Python37\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Exclusions\capa\capa\main.py", line 713, in <module>
sys.exit(main())
File "C:\Exclusions\capa\capa\main.py", line 595, in main
extractor = get_extractor(args.sample, args.format, disable_progress=args.quiet)
File "C:\Exclusions\capa\capa\main.py", line 319, in get_extractor
return get_extractor_py3(path, format, disable_progress=disable_progress)
File "C:\Exclusions\capa\capa\main.py", line 308, in get_extractor_py3
smda_report = smda_disasm.disassembleFile(path)
File "smda\Disassembler.py", line 37, in disassembleFile
loader = FileLoader(file_path, map_file=True)
File "smda\utility\FileLoader.py", line 17, in __init__
self._loadFile()
File "smda\utility\FileLoader.py", line 32, in _loadFile
self._base_addr = loader.getBaseAddress(self._raw_data)
File "smda\utility\PeFileLoader.py", line 94, in getBaseAddress
base_addr = struct.unpack("L", binary[pe_offset + 0x30:pe_offset + 0x38])[0]
struct.error: unpack requires a buffer of 4 bytes
Let's address this error and then we should be ready for a merge 🚀
Hehe, seems like I fell into the struct format char size trap - I now changed this to What's still open for discussion:
|
Before we merge, can you please:
|
…Python version < 3.0
|
Great, this looks good to me then. Well done and many thanks again! |
lets do it! |
Hey all!
We propose to integrate SMDA as a disassembler backend into capa to enable full Python 3 support.
SMDA is a lightweight and fast recursive disassembler built on top of capstone, covering PE and ELF files in 32/64bit. It is compatible with both Python 2 and Python 3.
The pull request contains an implementation structurally oriented along the vivisect implementation and provides all capa-features covered by vivisect. We also made sure that the code is in line with the style requirements and has full test coverage using the
tests/fixtures.get_extractor()
method:This initial PR sets SMDA as a default backend for Python 3.
This choice was made to allow you an easier evaluation of the contribution.
As discussed out of band, we don't necessarily ask to make SMDA the default backend for Python 3 but instead propose to allow users choosing their preferred backend through a CLI parameter, which could also be used to pass additional parameters to the disassembler backend (e.g. SMDA can process shellcode / memory dumps and make use of ApiScout for reconstruction of dynamic imports - increasing the potential matches of rules).
This PR was a collaborative effort of @jcrussell and @danielplohmann during GeekWeek 7.