optimize rule matching by better indexing rule by features #2125

williballenthin · 2024-06-06T08:20:40Z

(continuation of #2080 rebased against master)

Implement the "tighten rule pre-selection" algorithm described here: #2063 (comment)

In summary:

Rather than indexing all features from all rules,
we should pick and index the minimal set (ideally, one) of
features from each rule that must be present for the rule to match.
When we have multiple candidates, pick the feature that is
probably most uncommon and therefore "selective".
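The idea in the summary above can be sketched roughly as follows. This is an illustrative toy, not capa's actual implementation: the tuple-based rule representation, the `RARITY` table, and the helpers `pick`, `build_index`, and `candidates` are all hypothetical names invented for this example. It assumes that for an `and` node any single child is a necessary condition (so we keep only the most selective one), while for an `or` node every child must be indexed. Negation, optional branches, and rules with no guaranteed feature are deliberately left out.

```python
from collections import defaultdict

# Hypothetical rarity scores per feature kind (higher = assumed rarer).
# These numbers are made up for illustration only.
RARITY = {"api": 3, "string": 3, "number": 2, "mnemonic": 1}


def rarity(feature):
    # Features are plain strings like "api:CreateFileA" in this sketch.
    kind = feature.split(":", 1)[0]
    return RARITY.get(kind, 0)


def set_score(features):
    # Prefer smaller sets first, then sets whose weakest member is rarer.
    return (-len(features), min(rarity(f) for f in features))


def pick(node):
    """Return a set of features, at least one of which must be present
    for the rule rooted at `node` to match."""
    if isinstance(node, str):
        return {node}
    op, children = node
    picked = [pick(child) for child in children]
    if op == "and":
        # Every child is required, so indexing the best one is enough.
        return max(picked, key=set_score)
    if op == "or":
        # Any child may satisfy the rule, so all of them must be indexed.
        return set().union(*picked)
    raise ValueError(f"unsupported operator: {op}")


def build_index(rules):
    # Map each picked feature to the names of rules it can trigger.
    index = defaultdict(set)
    for name, expr in rules.items():
        for feature in pick(expr):
            index[feature].add(name)
    return index


def candidates(index, seen_features):
    # Only rules reachable from an observed feature need full evaluation.
    found = set()
    for feature in seen_features:
        found |= index.get(feature, set())
    return found


rules = {
    "create file": ("and", ["api:CreateFileA", "mnemonic:mov"]),
    "network": ("or", ["api:connect", "string:http://"]),
}
index = build_index(rules)
```

With this index, a function that only contains `mnemonic:mov` yields no candidate rules at all, whereas indexing every feature of every rule would have forced a full evaluation of "create file". That gap is the kind of saving the evaluation counts reported in this PR reflect.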

This seems to work pretty well. Total evaluations when running against mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around 3x more functions per second (wow wow).

In large-scale runs, capa is about 25% faster when using the vivisect backend (analysis heavy) or 3x faster when using the upcoming BinExport2 backend (minimal analysis).

closes #2074

mr-tz

🤟

mike-hunhoff

🚀

fariss

Thank you!

williballenthin · 2024-06-07T03:55:36Z

thanks for the detailed and constructive reviews along the way @mike-hunhoff @mr-tz @s-ff !

williballenthin added the enhancement and performance labels Jun 6, 2024

williballenthin requested review from mr-tz, fariss and mike-hunhoff June 6, 2024 08:20

This was referenced Jun 6, 2024

tighten rule pre-selection #2080

Closed

investigate optimization of rule matching (May, 2024) #2063

Closed

williballenthin added 2 commits June 6, 2024 10:33

rules: add references to existing issues

8e3e225

test_scripts: avoid unsupported logic combinations

0f94a3b

mr-tz approved these changes Jun 6, 2024

mike-hunhoff approved these changes Jun 6, 2024

fariss approved these changes Jun 6, 2024

williballenthin merged commit 76a4a58 into master Jun 7, 2024
27 checks passed

williballenthin deleted the rebase-2080 branch June 7, 2024 03:54

williballenthin mentioned this pull request Jun 7, 2024

plan and release v7.1 #2131

Closed
