Commit 986e2e6

Merge pull request #1 from mandiant/binja-ci
2 parents 64323b3 + 0d4a92a commit 986e2e6

61 files changed: +327 -304 lines changed

.github/workflows/tests.yml

Lines changed: 34 additions & 0 deletions
@@ -90,3 +90,37 @@ jobs:
       run: pip install -e .[dev]
     - name: Run tests
       run: pytest -v tests/
+
+  binja-tests:
+    name: Binary Ninja tests for ${{ matrix.python-version }} on ${{ matrix.os }}
+    runs-on: ubuntu-20.04
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.7", "3.11"]
+    steps:
+    - name: Checkout capa with submodules
+      uses: actions/checkout@ac593985615ec2ede58e132d2e21d2b1cbd6127c # v3.3.0
+      with:
+        submodules: recursive
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@d27e3f3d7c64b4bbf8e4abfb9b63b83e846e0435 # v4.5.0
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install pyyaml
+      run: sudo apt-get install -y libyaml-dev
+    - name: Install capa
+      run: pip install -e .[dev]
+    - name: install Binary Ninja
+      env:
+        BN_SERIAL: ${{ secrets.BN_SERIAL }}
+      run: |
+        mkdir ./.github/binja
+        curl "https://raw.githubusercontent.com/Vector35/binaryninja-api/6812c97/scripts/download_headless.py" -o ./.github/binja/download_headless.py
+        python ./.github/binja/download_headless.py --serial $BN_SERIAL --output .github/binja/BinaryNinja-headless.zip
+        unzip .github/binja/BinaryNinja-headless.zip -d .github/binja/
+        python .github/binja/binaryninja/scripts/install_api.py --install-on-root --silent
+    - name: Run tests
+      env:
+        BN_LICENSE: ${{ secrets.BN_LICENSE }}
+      run: pytest -v tests/test_binja_features.py # explicitly refer to the binja tests for performance. other tests run above.

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -125,3 +125,4 @@ scripts/perf/*.zip
 Pipfile
 Pipfile.lock
 /cache/
+.github/binja/binaryninja

CHANGELOG.md

Lines changed: 11 additions & 1 deletion
@@ -8,7 +8,7 @@
 
 ### Breaking Changes
 
-### New Rules (12)
+### New Rules (20)
 
 - persistence/scheduled-tasks/schedule-task-via-at joren485
 - data-manipulation/prng/generate-random-numbers-via-rtlgenrandom william.ballenthin@mandiant.com
@@ -22,6 +22,14 @@
 - nursery/get-http-request-uri william.ballenthin@mandiant.com
 - nursery/create-zip-archive-in-dotnet michael.hunhoff@mandiant.com
 - nursery/extract-zip-archive-in-dotnet anushka.virgaonkar@mandiant.com michael.hunhoff@mandiant.com
+- data-manipulation/encryption/tea/decrypt-data-using-tea william.ballenthin@mandiant.com raymond.leong@mandiant.com
+- data-manipulation/encryption/tea/encrypt-data-using-tea william.ballenthin@mandiant.com raymond.leong@mandiant.com
+- data-manipulation/encryption/xtea/encrypt-data-using-xtea raymond.leong@mandiant.com
+- data-manipulation/encryption/xxtea/encrypt-data-using-xxtea raymond.leong@mandiant.com
+- nursery/hash-data-using-ripemd128 raymond.leong@mandiant.com
+- nursery/hash-data-using-ripemd256 raymond.leong@mandiant.com
+- nursery/hash-data-using-ripemd320 raymond.leong@mandiant.com
+- nursery/set-web-proxy-in-dotnet michael.hunhoff@mandiant.com
 -
 
 ### Bug Fixes
@@ -30,6 +38,8 @@
 - extractor: fix IDA and vivisect string and bytes features overlap and tests #1327 #1336 @xusheng6
 
 ### capa explorer IDA Pro plugin
+- fix exception when plugin loaded in IDA hosted under idat #1341 @mike-hunhoff
+- improve embedded PE detection performance and reduce FP potential #1344 @mike-hunhoff
 
 ### Development
 
README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/flare-capa)](https://pypi.org/project/flare-capa)
 [![Last release](https://img.shields.io/github/v/release/mandiant/capa)](https://github.com/mandiant/capa/releases)
-[![Number of rules](https://img.shields.io/badge/rules-781-blue.svg)](https://github.com/mandiant/capa-rules)
+[![Number of rules](https://img.shields.io/badge/rules-787-blue.svg)](https://github.com/mandiant/capa-rules)
 [![CI status](https://github.com/mandiant/capa/workflows/CI/badge.svg)](https://github.com/mandiant/capa/actions?query=workflow%3ACI+event%3Apush+branch%3Amaster)
 [![Downloads](https://img.shields.io/github/downloads/mandiant/capa/total)](https://github.com/mandiant/capa/releases)
 [![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE.txt)

capa/engine.py

Lines changed: 6 additions & 4 deletions
@@ -43,10 +43,12 @@ def __init__(self, description=None):
         self.description = description
 
     def __str__(self):
+        name = self.name.lower()
+        children = ",".join(map(str, self.get_children()))
         if self.description:
-            return "%s(%s = %s)" % (self.name.lower(), ",".join(map(str, self.get_children())), self.description)
+            return f"{name}({children} = {self.description})"
         else:
-            return "%s(%s)" % (self.name.lower(), ",".join(map(str, self.get_children())))
+            return f"{name}({children})"
 
     def __repr__(self):
         return str(self)
@@ -232,9 +234,9 @@ def evaluate(self, ctx, **kwargs):
 
     def __str__(self):
         if self.max == (1 << 64 - 1):
-            return "range(%s, min=%d, max=infinity)" % (str(self.child), self.min)
+            return f"range({str(self.child)}, min={self.min}, max=infinity)"
         else:
-            return "range(%s, min=%d, max=%d)" % (str(self.child), self.min, self.max)
+            return f"range({str(self.child)}, min={self.min}, max={self.max})"
 
 
 class Subscope(Statement):
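
The `__str__` changes in `capa/engine.py` appear intended to keep the rendered text identical while switching from `%`-formatting to f-strings. A minimal sketch of that equivalence follows, assuming hypothetical free functions `render_new` and `render_old` whose `name`, `children`, and `description` parameters stand in for the statement attributes used in the diff:

```python
def render_old(name, children, description=None):
    """%-formatting, mirroring the removed lines (hypothetical stand-in)."""
    joined = ",".join(map(str, children))
    if description:
        return "%s(%s = %s)" % (name.lower(), joined, description)
    return "%s(%s)" % (name.lower(), joined)


def render_new(name, children, description=None):
    """f-string formatting, mirroring the added lines (hypothetical stand-in)."""
    name = name.lower()
    joined = ",".join(map(str, children))
    if description:
        return f"{name}({joined} = {description})"
    return f"{name}({joined})"


if __name__ == "__main__":
    print(render_new("And", ["a", "b"]))
    print(render_new("Or", ["x"], description="either"))
```

Computing `name` and `children` once before the branch, as the commit does, also removes the duplicated expressions from both return statements.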

capa/features/common.py

Lines changed: 10 additions & 14 deletions
@@ -149,11 +149,11 @@ def get_value_str(self) -> str:
     def __str__(self):
         if self.value is not None:
             if self.description:
-                return "%s(%s = %s)" % (self.get_name_str(), self.get_value_str(), self.description)
+                return f"{self.get_name_str()}({self.get_value_str()} = {self.description})"
             else:
-                return "%s(%s)" % (self.get_name_str(), self.get_value_str())
+                return f"{self.get_name_str()}({self.get_value_str()})"
         else:
-            return "%s" % self.get_name_str()
+            return f"{self.get_name_str()}"
 
     def __repr__(self):
         return str(self)
@@ -242,7 +242,7 @@ def get_value_str(self) -> str:
 
     def __str__(self):
         assert isinstance(self.value, str)
-        return "substring(%s)" % escape_string(self.value)
+        return f"substring({escape_string(self.value)})"
 
 
 class _MatchedSubstring(Substring):
@@ -267,11 +267,9 @@ def __init__(self, substring: Substring, matches: Dict[str, Set[Address]]):
         self.matches = matches
 
     def __str__(self):
+        matches = ", ".join(map(lambda s: '"' + s + '"', (self.matches or {}).keys()))
         assert isinstance(self.value, str)
-        return 'substring("%s", matches = %s)' % (
-            self.value,
-            ", ".join(map(lambda s: '"' + s + '"', (self.matches or {}).keys())),
-        )
+        return f'substring("{self.value}", matches = {matches})'
 
 
 class Regex(String):
@@ -290,7 +288,7 @@ def __init__(self, value: str, description=None):
             if value.endswith("/i"):
                 value = value[: -len("i")]
             raise ValueError(
-                "invalid regular expression: %s it should use Python syntax, try it at https://pythex.org" % value
+                f"invalid regular expression: {value} it should use Python syntax, try it at https://pythex.org"
             ) from exc
 
     def evaluate(self, ctx, short_circuit=True):
@@ -336,7 +334,7 @@ def evaluate(self, ctx, short_circuit=True):
 
     def __str__(self):
         assert isinstance(self.value, str)
-        return "regex(string =~ %s)" % self.value
+        return f"regex(string =~ {self.value})"
 
 
 class _MatchedRegex(Regex):
@@ -361,11 +359,9 @@ def __init__(self, regex: Regex, matches: Dict[str, Set[Address]]):
         self.matches = matches
 
     def __str__(self):
+        matches = ", ".join(map(lambda s: '"' + s + '"', (self.matches or {}).keys()))
         assert isinstance(self.value, str)
-        return "regex(string =~ %s, matches = %s)" % (
-            self.value,
-            ", ".join(map(lambda s: '"' + s + '"', (self.matches or {}).keys())),
-        )
+        return f"regex(string =~ {self.value}, matches = {matches})"
 
 
 class StringFactory:
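
In `_MatchedSubstring.__str__` and `_MatchedRegex.__str__`, the quoted list of matched strings is now built in a local `matches` variable before being interpolated. The sketch below illustrates the resulting text under stated assumptions: `render_matched_substring` is a hypothetical free function, and the plain integers in the example dictionary are placeholders for capa's `Address` objects.

```python
from typing import Dict, Optional, Set


def render_matched_substring(value: str, matches: Optional[Dict[str, Set[int]]]) -> str:
    """Hypothetical stand-in for _MatchedSubstring.__str__ from the diff."""
    # Quote every matched string; an empty or missing dict yields an empty list.
    quoted = ", ".join('"' + s + '"' for s in (matches or {}).keys())
    return f'substring("{value}", matches = {quoted})'


if __name__ == "__main__":
    # Dict keys keep insertion order, so the output order is deterministic.
    print(render_matched_substring("abc", {"xabcx": {0x401000}, "abcd": {0x402000}}))
```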

capa/features/extractors/elf.py

Lines changed: 2 additions & 2 deletions
@@ -121,14 +121,14 @@ def _parse(self):
         elif ei_class == 2:
             self.bitness = 64
         else:
-            raise CorruptElfFile("invalid ei_class: 0x%02x" % ei_class)
+            raise CorruptElfFile(f"invalid ei_class: 0x{ei_class:02x}")
 
         if ei_data == 1:
             self.endian = "<"
         elif ei_data == 2:
             self.endian = ">"
         else:
-            raise CorruptElfFile("not an ELF file: invalid ei_data: 0x%02x" % ei_data)
+            raise CorruptElfFile(f"not an ELF file: invalid ei_data: 0x{ei_data:02x}")
 
         if self.bitness == 32:
             e_phoff, e_shoff = struct.unpack_from(self.endian + "II", self.file_header, 0x1C)
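
The `0x%02x` conversions above map onto the f-string format spec `:02x`. The following sketch, which assumes a hypothetical `parse_ident` function and a local `CorruptElfFile` stand-in for capa's exception (the `ei_class == 1` branch is not visible in the hunk and is filled in from the ELF specification), shows how the two identification bytes translate into bitness and a `struct` byte-order prefix:

```python
from typing import Tuple


class CorruptElfFile(ValueError):
    """Local stand-in for the exception raised in the diff."""


def parse_ident(ei_class: int, ei_data: int) -> Tuple[int, str]:
    """Map ELF ident bytes to (bitness, struct endianness prefix)."""
    if ei_class == 1:  # ELFCLASS32, assumed from the ELF spec
        bitness = 32
    elif ei_class == 2:  # ELFCLASS64
        bitness = 64
    else:
        raise CorruptElfFile(f"invalid ei_class: 0x{ei_class:02x}")

    if ei_data == 1:  # little endian
        endian = "<"
    elif ei_data == 2:  # big endian
        endian = ">"
    else:
        raise CorruptElfFile(f"not an ELF file: invalid ei_data: 0x{ei_data:02x}")

    return bitness, endian


if __name__ == "__main__":
    print(parse_ident(2, 1))
    # Both formatting styles pad to two lower-case hex digits.
    for value in (0, 7, 0xFF):
        assert "0x%02x" % value == f"0x{value:02x}"
```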

capa/features/extractors/helpers.py

Lines changed: 2 additions & 2 deletions
@@ -55,15 +55,15 @@ def generate_symbols(dll: str, symbol: str) -> Iterator[str]:
     dll = dll.lower()
 
     # kernel32.CreateFileA
-    yield "%s.%s" % (dll, symbol)
+    yield f"{dll}.{symbol}"
 
     if not is_ordinal(symbol):
         # CreateFileA
         yield symbol
 
     if is_aw_function(symbol):
         # kernel32.CreateFile
-        yield "%s.%s" % (dll, symbol[:-1])
+        yield f"{dll}.{symbol[:-1]}"
 
         if not is_ordinal(symbol):
             # CreateFile
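
To make the variants produced by `generate_symbols` concrete, here is a self-contained sketch. The helpers `is_ordinal` and `is_aw_function` are simplified assumptions (their real bodies are not part of this diff), and the final `yield symbol[:-1]` is inferred from the trailing `# CreateFile` comment rather than shown in the hunk.

```python
from typing import Iterator


def is_ordinal(symbol: str) -> bool:
    """Assumed helper: ordinal imports are written like '#10'."""
    return symbol.startswith("#")


def is_aw_function(symbol: str) -> bool:
    """Assumed helper: Win32 ANSI/wide variants end in 'A' or 'W'."""
    return len(symbol) >= 2 and symbol[-1] in ("A", "W")


def generate_symbols(dll: str, symbol: str) -> Iterator[str]:
    dll = dll.lower()

    # kernel32.CreateFileA
    yield f"{dll}.{symbol}"

    if not is_ordinal(symbol):
        # CreateFileA
        yield symbol

    if is_aw_function(symbol):
        # kernel32.CreateFile
        yield f"{dll}.{symbol[:-1]}"

        if not is_ordinal(symbol):
            # CreateFile (inferred from the comment in the diff)
            yield symbol[:-1]


if __name__ == "__main__":
    print(list(generate_symbols("KERNEL32", "CreateFileA")))
    print(list(generate_symbols("ws2_32", "#10")))
```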

capa/features/extractors/ida/basicblock.py

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ def get_printable_len(op: idaapi.op_t) -> int:
     elif op.dtype == idaapi.dt_qword:
         chars = struct.pack("<Q", op_val)
     else:
-        raise ValueError("Unhandled operand data type 0x%x." % op.dtype)
+        raise ValueError(f"Unhandled operand data type 0x{op.dtype:x}.")
 
     def is_printable_ascii(chars_: bytes):
         return all(c < 127 and chr(c) in string.printable for c in chars_)

capa/features/extractors/ida/file.py

Lines changed: 12 additions & 8 deletions
@@ -21,12 +21,14 @@
 from capa.features.common import FORMAT_PE, FORMAT_ELF, Format, String, Feature, Characteristic
 from capa.features.address import NO_ADDRESS, Address, FileOffsetAddress, AbsoluteVirtualAddress
 
+MAX_OFFSET_PE_AFTER_MZ = 0x200
+
 
 def check_segment_for_pe(seg: idaapi.segment_t) -> Iterator[Tuple[int, int]]:
     """check segment for embedded PE
 
     adapted for IDA from:
-    https://github.com/vivisect/vivisect/blob/7be4037b1cecc4551b397f840405a1fc606f9b53/PE/carve.py#L19
+    https://github.com/vivisect/vivisect/blob/91e8419a861f49779f18316f155311967e696836/PE/carve.py#L25
     """
     seg_max = seg.end_ea
     mz_xor = [
@@ -40,30 +42,32 @@ def check_segment_for_pe(seg: idaapi.segment_t) -> Iterator[Tuple[int, int]]:
 
     todo = []
     for mzx, pex, i in mz_xor:
+        # find all segment offsets containing XOR'd "MZ" bytes
         for off in capa.features.extractors.ida.helpers.find_byte_sequence(seg.start_ea, seg.end_ea, mzx):
             todo.append((off, mzx, pex, i))
 
     while len(todo):
         off, mzx, pex, i = todo.pop()
 
-        # The MZ header has one field we will check e_lfanew is at 0x3c
+        # MZ header has one field we will check e_lfanew is at 0x3c
         e_lfanew = off + 0x3C
 
         if seg_max < (e_lfanew + 4):
             continue
 
         newoff = struct.unpack("<I", capa.features.extractors.helpers.xor_static(idc.get_bytes(e_lfanew, 4), i))[0]
 
+        # assume XOR'd "PE" bytes exist within threshold
+        if newoff > MAX_OFFSET_PE_AFTER_MZ:
+            continue
+
         peoff = off + newoff
         if seg_max < (peoff + 2):
             continue
 
         if idc.get_bytes(peoff, 2) == pex:
             yield off, i
 
-        for nextres in capa.features.extractors.ida.helpers.find_byte_sequence(off + 1, seg.end_ea, mzx):
-            todo.append((nextres, mzx, pex, i))
-
 
 def extract_file_embedded_pe() -> Iterator[Tuple[Feature, Address]]:
     """extract embedded PE features
@@ -102,13 +106,13 @@ def extract_file_import_names() -> Iterator[Tuple[Feature, Address]]:
             for name in capa.features.extractors.helpers.generate_symbols(info[0], info[1]):
                 yield Import(name), addr
             dll = info[0]
-            symbol = "#%d" % (info[2])
+            symbol = f"#{info[2]}"
         elif info[1]:
             dll = info[0]
             symbol = info[1]
         elif info[2]:
             dll = info[0]
-            symbol = "#%d" % (info[2])
+            symbol = f"#{info[2]}"
         else:
             continue
 
@@ -176,7 +180,7 @@ def extract_file_format() -> Iterator[Tuple[Feature, Address]]:
         # no file type to return when processing a binary file, but we want to continue processing
         return
     else:
-        raise NotImplementedError("unexpected file format: %d" % file_info.filetype)
+        raise NotImplementedError(f"unexpected file format: {file_info.filetype}")
 
 
 def extract_features() -> Iterator[Tuple[Feature, Address]]:
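
The `check_segment_for_pe` change collects every XOR'd "MZ" candidate up front, rejects candidates whose decoded `e_lfanew` exceeds `MAX_OFFSET_PE_AFTER_MZ`, and drops the per-candidate re-search. The sketch below reproduces that flow over a plain `bytes` buffer instead of an IDA segment. It is an illustration under assumptions: `carve_embedded_pe`, `find_all`, and `make_fake_pe` are hypothetical names, `xor_static` is a local reimplementation, and trying all 256 single-byte keys is assumed because the contents of `mz_xor` are not shown in the hunk.

```python
import struct
from typing import Iterator, Tuple

MAX_OFFSET_PE_AFTER_MZ = 0x200


def xor_static(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (local reimplementation)."""
    return bytes(b ^ key for b in data)


def find_all(buf: bytes, seq: bytes) -> Iterator[int]:
    """Yield every offset at which seq occurs in buf."""
    start = 0
    while True:
        off = buf.find(seq, start)
        if off == -1:
            return
        yield off
        start = off + 1


def carve_embedded_pe(buf: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, xor_key) for each plausible embedded PE header."""
    for key in range(256):  # assumption: single-byte XOR keys 0..255
        mzx = xor_static(b"MZ", key)
        pex = xor_static(b"PE", key)
        for off in find_all(buf, mzx):
            e_lfanew = off + 0x3C  # field holding the offset to "PE"
            if len(buf) < e_lfanew + 4:
                continue
            raw = xor_static(buf[e_lfanew : e_lfanew + 4], key)
            newoff = struct.unpack("<I", raw)[0]
            # assume the "PE" signature sits close to the "MZ" header
            if newoff > MAX_OFFSET_PE_AFTER_MZ:
                continue
            peoff = off + newoff
            if len(buf) < peoff + 2:
                continue
            if buf[peoff : peoff + 2] == pex:
                yield off, key


def make_fake_pe(key: int, e_lfanew: int = 0x80, size: int = 0x100) -> bytes:
    """Build a minimal MZ/PE skeleton and XOR-encode it (demo helper)."""
    img = bytearray(size)
    img[0:2] = b"MZ"
    img[0x3C:0x40] = struct.pack("<I", e_lfanew)
    img[e_lfanew : e_lfanew + 2] = b"PE"
    return xor_static(bytes(img), key)


if __name__ == "__main__":
    blob = b"\xCC" * 16 + make_fake_pe(0x41)
    print(list(carve_embedded_pe(blob)))
```

The threshold check is cheap and runs before the second signature comparison, which fits the changelog's note about reducing false positives: a stray "MZ" pair followed by a large garbage `e_lfanew` is skipped immediately.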

capa/features/extractors/ida/helpers.py

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ def find_byte_sequence(start: int, end: int, seq: bytes) -> Iterator[int]:
         end: max virtual address
         seq: bytes to search e.g. b"\x01\x03"
     """
-    seqstr = " ".join(["%02x" % b for b in seq])
+    seqstr = " ".join([f"{b:02x}" for b in seq])
     while True:
         # TODO find_binary: Deprecated. Please use ida_bytes.bin_search() instead.
         ea = idaapi.find_binary(start, end, seqstr, 0, idaapi.SEARCH_DOWN)
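
The `seqstr` line turns raw bytes into the space-separated hex string passed to the search call. A small sketch, using the hypothetical function name `to_search_pattern` and no IDA dependency, shows the string that results:

```python
def to_search_pattern(seq: bytes) -> str:
    """Render bytes as space-separated, zero-padded lower-case hex."""
    return " ".join(f"{b:02x}" for b in seq)


if __name__ == "__main__":
    print(to_search_pattern(b"\x01\x03"))
    print(to_search_pattern(b"MZ"))
```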

capa/features/extractors/pefile.py

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ def extract_file_import_names(pe, **kwargs):
 
             for imp in dll.imports:
                 if imp.import_by_ordinal:
-                    impname = "#%s" % imp.ordinal
+                    impname = f"#{imp.ordinal}"
                 else:
                     try:
                         impname = imp.name.partition(b"\x00")[0].decode("ascii")

capa/features/extractors/viv/basicblock.py

Lines changed: 1 addition & 1 deletion
@@ -121,7 +121,7 @@ def get_printable_len(oper: envi.archs.i386.disasm.i386ImmOper) -> int:
     elif oper.tsize == 8:
         chars = struct.pack("<Q", oper.imm)
     else:
-        raise ValueError("unexpected oper.tsize: %d" % (oper.tsize))
+        raise ValueError(f"unexpected oper.tsize: {oper.tsize}")
 
     if is_printable_ascii(chars):
         return oper.tsize

capa/features/extractors/viv/file.py

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ def extract_file_import_names(vw, **kwargs) -> Iterator[Tuple[Feature, Address]]
         modname, impname = tinfo.split(".", 1)
         if is_viv_ord_impname(impname):
             # replace ord prefix with #
-            impname = "#%s" % impname[len("ord") :]
+            impname = "#" + impname[len("ord") :]
 
         addr = AbsoluteVirtualAddress(va)
         for name in capa.features.extractors.helpers.generate_symbols(modname, impname):

capa/features/freeze/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -329,7 +329,7 @@ def loads(s: str) -> capa.features.extractors.base_extractor.FeatureExtractor:
 
     freeze = Freeze.parse_raw(s)
     if freeze.version != 2:
-        raise ValueError("unsupported freeze format version: %d", freeze.version)
+        raise ValueError(f"unsupported freeze format version: {freeze.version}")
 
     return null.NullFeatureExtractor(
         base_address=freeze.base_address.to_capa(),
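
Unlike the other conversions in this commit, the `freeze` change also alters behavior: the removed line passed the version as a second exception argument (logging style) instead of interpolating it, so the message was never formatted. A short sketch with the hypothetical helpers `old_error` and `new_error` shows the difference in the resulting message:

```python
def old_error(version: int) -> ValueError:
    """Mirrors the removed line: two positional args, no interpolation."""
    return ValueError("unsupported freeze format version: %d", version)


def new_error(version: int) -> ValueError:
    """Mirrors the added line: the version is interpolated by the f-string."""
    return ValueError(f"unsupported freeze format version: {version}")


if __name__ == "__main__":
    print(str(old_error(3)))  # renders the args tuple, "%d" left unformatted
    print(str(new_error(3)))
```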

capa/features/insn.py

Lines changed: 5 additions & 5 deletions
@@ -15,9 +15,9 @@
 def hex(n: int) -> str:
     """render the given number using upper case hex, like: 0x123ABC"""
     if n < 0:
-        return "-0x%X" % (-n)
+        return f"-0x{(-n):X}"
     else:
-        return "0x%X" % n
+        return f"0x{(n):X}"
 
 
 class API(Feature):
@@ -31,7 +31,7 @@ def __init__(self, value: str, access: Optional[str] = None, description: Option
         super().__init__(value, description=description)
         if access is not None:
             if access not in VALID_FEATURE_ACCESS:
-                raise ValueError("%s access type %s not valid" % (self.name, access))
+                raise ValueError(f"{self.name} access type {access} not valid")
         self.access = access
 
     def __hash__(self):
@@ -105,7 +105,7 @@ def __eq__(self, other):
 
 class OperandNumber(_Operand):
     # cached names so we don't do extra string formatting every ctor
-    NAMES = ["operand[%d].number" % i for i in range(MAX_OPERAND_COUNT)]
+    NAMES = [f"operand[{i}].number" for i in range(MAX_OPERAND_COUNT)]
 
     # operand[i].number: 0x12
     def __init__(self, index: int, value: int, description=None):
@@ -119,7 +119,7 @@ def get_value_str(self) -> str:
 
 class OperandOffset(_Operand):
     # cached names so we don't do extra string formatting every ctor
-    NAMES = ["operand[%d].offset" % i for i in range(MAX_OPERAND_COUNT)]
+    NAMES = [f"operand[{i}].offset" for i in range(MAX_OPERAND_COUNT)]
 
     # operand[i].offset: 0x12
     def __init__(self, index: int, value: int, description=None):
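
The `hex` helper in `capa/features/insn.py` renders upper-case digits and places the sign before the `0x` prefix. The sketch below copies that logic under the hypothetical name `upper_hex`, chosen here only to avoid shadowing Python's built-in `hex` in the example:

```python
def upper_hex(n: int) -> str:
    """Render n using upper-case hex digits, like 0x123ABC."""
    if n < 0:
        # Format the magnitude, then put the sign in front of the prefix.
        return f"-0x{(-n):X}"
    return f"0x{n:X}"


if __name__ == "__main__":
    print(upper_hex(0x123ABC))
    print(upper_hex(-255))
```

For comparison, the built-in `hex(-255)` returns `-0xff`, so the sign placement matches and only the digit case differs.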
