optimize rule matching by better indexing rule by features #2125

Merged 3 commits on Jun 7, 2024

williballenthin
Collaborator

(continuation of #2080 rebased against master)

Implement the "tighten rule pre-selection" algorithm described here: #2063 (comment)

In summary:

> Rather than indexing all features from all rules,
> we should pick and index the minimal set (ideally, one) of
> features from each rule that must be present for the rule to match.
> When we have multiple candidates, pick the feature that is
> probably most uncommon and therefore "selective".

This seems to work pretty well. Total evaluations when running against mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around 3x more functions per second (wow wow).

When doing large scale runs, capa is about 25% faster when using the vivisect backend (analysis heavy) or 3x faster when using the upcoming BinExport2 backend (minimal analysis).
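
To illustrate the pre-selection idea summarized above, here is a minimal sketch, not the code from this PR. It assumes rules are plain and/or trees over features; the `Feature`, `And`, and `Or` classes, the `COMMONNESS` table, and every function name are hypothetical. It also ignores `not`, `optional`, counts, and scopes, which a real rule engine has to handle.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str  # e.g. "api:CreateFileA"; the "kind:value" format is an assumption


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


# Hypothetical estimates of how common each feature kind is (higher = more common).
COMMONNESS = {"mnemonic": 9, "number": 7, "api": 3, "string": 2}


def set_cost(features):
    # Prefer sets that are rarely seen overall, then sets with fewer features.
    total = sum(COMMONNESS.get(f.name.split(":", 1)[0], 5) for f in features)
    return (total, len(features))


def pick_index_features(node):
    """Return features such that at least one must be present for node to match."""
    if isinstance(node, Feature):
        return frozenset([node])
    if isinstance(node, And):
        # Every child must match, so indexing one child suffices: take the most selective.
        return min((pick_index_features(c) for c in node.children), key=set_cost)
    if isinstance(node, Or):
        # Any child may match, so all children's picks must be indexed.
        return frozenset().union(*(pick_index_features(c) for c in node.children))
    raise TypeError(f"unsupported node: {node!r}")


def build_index(rules):
    index = defaultdict(list)
    for name, rule in rules.items():
        for feature in pick_index_features(rule):
            index[feature].append(name)
    return dict(index)


def evaluate(node, features):
    if isinstance(node, Feature):
        return node in features
    if isinstance(node, And):
        return all(evaluate(c, features) for c in node.children)
    return any(evaluate(c, features) for c in node.children)


def match_function(rules, index, features):
    """Fully evaluate only pre-selected rules; return (matched names, evaluation count)."""
    candidates = {name for f in features for name in index.get(f, ())}
    matches = [n for n in sorted(candidates) if evaluate(rules[n], features)]
    return matches, len(candidates)


RULES = {
    "create file": And((Feature("api:CreateFileA"), Feature("mnemonic:mov"))),
    "xor loop": Or((Feature("mnemonic:xor"), Feature("number:0x5A"))),
    "http": And((Feature("string:http"),
                 Or((Feature("api:send"), Feature("api:recv"))))),
}
INDEX = build_index(RULES)
```

In this toy setup, a function that contains only `mnemonic:mov` triggers no rule evaluations at all, because the "create file" rule is indexed under its rarer `api:CreateFileA` feature rather than under the common mnemonic. That is the kind of saving the evaluation counts above reflect, although the real numbers depend on the actual rule set and the selectivity heuristic.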

closes #2074

@williballenthin added the enhancement (New feature or request) and performance (Related to capa's performance) labels on Jun 6, 2024
Collaborator

@mr-tz mr-tz left a comment

🤟

Collaborator

@mike-hunhoff mike-hunhoff left a comment

🚀

Collaborator

@fariss fariss left a comment

Thank you!

@williballenthin williballenthin merged commit 76a4a58 into master Jun 7, 2024
27 checks passed
@williballenthin williballenthin deleted the rebase-2080 branch June 7, 2024 03:54
@williballenthin
Collaborator Author

thanks for the detailed and constructive reviews along the way, @mike-hunhoff @mr-tz @s-ff!

Development

Successfully merging this pull request may close these issues.

format is a global feature
4 participants