hash lookup common bytes length prefixes #2128

Open
williballenthin opened this issue Jun 6, 2024 · 0 comments
Labels
performance Related to capa's performance

Today, we match bytes by doing a prefix search against the encountered bytes (up to 0x100 long). Since many of the byte sequences we search for have some structure (well, a common length), like a GUID or a cryptographic S-Box, we can optimize some of these searches by indexing the encountered bytes by their prefixes at common lengths, like 8, 16, 32, and 64 bytes. Then, when the wanted bytes feature has one of these lengths, we can do `if feature in features` rather than `for bytes in features: if bytes.startswith(feature)`.

This can also help the rule logic planner, since it can pre-filter more rules when the hashable features are known.

The tradeoff is that we generate N (probably 4-5) more features per bytes feature.
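A minimal sketch of the idea, assuming encountered byte sequences are plain `bytes` objects. The names `INDEXED_LENGTHS`, `index_prefixes`, `build_index`, and `match_bytes` are hypothetical and are not part of capa's API; the set of indexed lengths is also just one possible choice.

```python
from typing import Iterable, List, Set

# Hypothetical choice of lengths to index; the issue suggests common sizes like these.
INDEXED_LENGTHS = (8, 16, 32, 64)


def index_prefixes(encountered: bytes) -> Set[bytes]:
    """Return the fixed-length prefixes of one encountered byte sequence.

    This is the tradeoff: up to len(INDEXED_LENGTHS) extra features per sequence.
    """
    return {encountered[:n] for n in INDEXED_LENGTHS if len(encountered) >= n}


def build_index(all_encountered: Iterable[bytes]) -> Set[bytes]:
    """Collect the indexed prefixes of every encountered byte sequence."""
    index: Set[bytes] = set()
    for buf in all_encountered:
        index |= index_prefixes(buf)
    return index


def match_bytes(wanted: bytes, all_encountered: List[bytes], index: Set[bytes]) -> bool:
    """Match a wanted bytes feature against the encountered bytes."""
    if len(wanted) in INDEXED_LENGTHS:
        # Fast path: a single hash lookup replaces the linear scan.
        return wanted in index
    # Slow path: the existing prefix search over everything encountered.
    return any(buf.startswith(wanted) for buf in all_encountered)


encountered = [bytes(range(40)), b"\xaa" * 20]
index = build_index(encountered)
found_guid_sized = match_bytes(bytes(range(16)), encountered, index)  # hash lookup
found_odd_sized = match_bytes(bytes(range(9)), encountered, index)    # linear scan
```

Both paths give the same answer for a given wanted sequence. The indexed path only applies when the wanted length is one of the indexed lengths, which is why the choice of lengths matters.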


Definitely do 16 (the size of a GUID).

8, 256, and 64 also look nice and round (and probably not-domain-specific), so consider those. 9 comes from OpenSSL SHA constants. 171 comes from Tiger S-Boxes.


Against mimikatz with the changes in #2080, we have the following evaluation counts by Bytes feature size:

| feature class | evaluation count |
|---|---|
| evaluate.feature.bytes | 261,464 |
| evaluate.feature.bytes.171 | 71,400 |
| evaluate.feature.bytes.64 | 35,794 |
| evaluate.feature.bytes.256 | 34,002 |
| evaluate.feature.bytes.16 | 24,226 |
| evaluate.feature.bytes.9 | 18,837 |
| evaluate.feature.bytes.128 | 17,002 |
| evaluate.feature.bytes.8 | 10,576 |
| evaluate.feature.bytes.56 | 10,200 |
| evaluate.feature.bytes.28 | 7,176 |
| evaluate.feature.bytes.48 | 6,800 |
| evaluate.feature.bytes.32 | 6,091 |
| evaluate.feature.bytes.7 | 3,588 |
| evaluate.feature.bytes.5 | 3,588 |
| evaluate.feature.bytes.20 | 3,400 |
| evaluate.feature.bytes.72 | 3,400 |
| evaluate.feature.bytes.121 | 1,794 |
| evaluate.feature.bytes.40 | 897 |
| evaluate.feature.bytes.6 | 897 |
| evaluate.feature.bytes.4 | 897 |
| evaluate.feature.bytes.12 | 897 |
| evaluate.feature.bytes.232 | 2 |

Indexing the power-of-2 lengths would save about 49% of the scanning evaluations. I'm not sure what this costs in runtime. Will investigate before going deeper.
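For reference, the 49% figure can be recomputed from the per-length counts reported in this issue. This is just the arithmetic; the variable names are illustrative.

```python
# Evaluation counts by bytes feature length, copied from the mimikatz run above.
counts = {
    171: 71400, 64: 35794, 256: 34002, 16: 24226, 9: 18837, 128: 17002,
    8: 10576, 56: 10200, 28: 7176, 48: 6800, 32: 6091, 7: 3588, 5: 3588,
    20: 3400, 72: 3400, 121: 1794, 40: 897, 6: 897, 4: 897, 12: 897, 232: 2,
}

total = sum(counts.values())

# A positive integer n is a power of two exactly when n & (n - 1) == 0.
power_of_two = sum(c for n, c in counts.items() if n & (n - 1) == 0)

share = power_of_two / total
print(f"{power_of_two:,} of {total:,} evaluations ({share:.1%})")
```

The power-of-2 lengths present here are 4, 8, 16, 32, 64, 128, and 256, which together account for 128,588 of the 261,464 evaluations.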
