hash lookup common bytes length prefixes #2128

williballenthin · 2024-06-06T08:27:00Z

Today, we match bytes by doing a prefix search against encountered bytes (up to 0x100 long). Many of the byte sequences we search for share some structure, or at least a common length, such as a GUID or a cryptographic S-Box. We can therefore optimize some of these searches by indexing the encountered bytes by their prefixes at common lengths, like 8, 16, 32, and 64 bytes. Then, when the wanted bytes feature has one of these lengths, we can do `if feature in features` rather than `for bytes in features: if bytes.startswith(feature)`.
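A minimal sketch of the idea, not capa's actual implementation: the names `INDEXED_LENGTHS`, `index_prefixes`, and `match_bytes` are hypothetical, and the set of lengths is just the one suggested in this issue.

```python
from typing import Iterable, List, Set

# Assumed set of "common" lengths to index; this issue suggests 8, 16, 32, and 64.
INDEXED_LENGTHS = (8, 16, 32, 64)


def index_prefixes(encountered: Iterable[bytes]) -> Set[bytes]:
    """Collect the prefix of each encountered byte sequence at every indexed length."""
    index: Set[bytes] = set()
    for buf in encountered:
        for n in INDEXED_LENGTHS:
            if len(buf) >= n:
                index.add(buf[:n])
    return index


def match_bytes(wanted: bytes, encountered: List[bytes], index: Set[bytes]) -> bool:
    """Use a hash lookup when the wanted length is indexed, else fall back to scanning."""
    if len(wanted) in INDEXED_LENGTHS:
        return wanted in index
    return any(buf.startswith(wanted) for buf in encountered)
```

For an indexed length `n`, `buf.startswith(wanted)` holds exactly when `len(buf) >= n` and `buf[:n] == wanted`, so the set lookup gives the same answer as the scan. Each encountered buffer adds at most `len(INDEXED_LENGTHS)` entries to the index.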

This can also help the rule logic planner, since it can pre-filter more rules when the hashable features are known.
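A rough sketch of what such pre-filtering could look like, under the simplifying assumption that each rule requires exactly one bytes feature. `candidate_rules` and the `rule_bytes` mapping are hypothetical and do not reflect how capa's planner is structured.

```python
from typing import Dict, List, Set

# Assumed indexed lengths, as suggested in this issue.
INDEXED_LENGTHS = (8, 16, 32, 64)


def candidate_rules(rule_bytes: Dict[str, bytes], index: Set[bytes]) -> List[str]:
    """Return the names of rules that may still match.

    A rule is dropped only when its wanted bytes have an indexed length
    and are absent from the prefix index; all other rules still need
    the regular prefix scan.
    """
    keep = []
    for name, wanted in rule_bytes.items():
        if len(wanted) in INDEXED_LENGTHS and wanted not in index:
            continue  # cheap, definite miss
        keep.append(name)
    return keep
```

Rules whose bytes feature has a non-indexed length cannot be ruled out this way, so they are kept.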

The tradeoff is that we generate N (probably 4-5) more features per bytes feature.

Definitely do 16 (the size of a GUID).

8, 256, and 64 also look nice and round (and probably not-domain-specific), so consider those. 9 comes from OpenSSL SHA constants. 171 comes from Tiger S-Boxes.

Against mimikatz with the changes in #2080, we have the following evaluation counts by Bytes feature size:

| feature class | evaluation count |
| --- | ---: |
| evaluate.feature.bytes | 261,464 |
| evaluate.feature.bytes.171 | 71,400 |
| evaluate.feature.bytes.64 | 35,794 |
| evaluate.feature.bytes.256 | 34,002 |
| evaluate.feature.bytes.16 | 24,226 |
| evaluate.feature.bytes.9 | 18,837 |
| evaluate.feature.bytes.128 | 17,002 |
| evaluate.feature.bytes.8 | 10,576 |
| evaluate.feature.bytes.56 | 10,200 |
| evaluate.feature.bytes.28 | 7,176 |
| evaluate.feature.bytes.48 | 6,800 |
| evaluate.feature.bytes.32 | 6,091 |
| evaluate.feature.bytes.7 | 3,588 |
| evaluate.feature.bytes.5 | 3,588 |
| evaluate.feature.bytes.20 | 3,400 |
| evaluate.feature.bytes.72 | 3,400 |
| evaluate.feature.bytes.121 | 1,794 |
| evaluate.feature.bytes.40 | 897 |
| evaluate.feature.bytes.6 | 897 |
| evaluate.feature.bytes.4 | 897 |
| evaluate.feature.bytes.12 | 897 |
| evaluate.feature.bytes.232 | 2 |

Indexing the power-of-2 lengths would save about 49% of the scanning evaluations. I'm not sure what this costs in runtime. Will investigate before going deeper.
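The 49% figure can be reproduced from the counts listed above. This is only a quick arithmetic check; it assumes "power-of-2 lengths" means every listed length that is a power of two (4, 8, 16, 32, 64, 128, 256).

```python
# Evaluation counts from this issue, keyed by bytes feature length.
counts = {
    171: 71400, 64: 35794, 256: 34002, 16: 24226, 9: 18837, 128: 17002,
    8: 10576, 56: 10200, 28: 7176, 48: 6800, 32: 6091, 7: 3588, 5: 3588,
    20: 3400, 72: 3400, 121: 1794, 40: 897, 6: 897, 4: 897, 12: 897, 232: 2,
}


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


total = sum(counts.values())
indexed = sum(c for n, c in counts.items() if is_power_of_two(n))
print(total, indexed, f"{indexed / total:.1%}")  # prints: 261464 128588 49.2%
```

The per-length counts sum to the 261,464 reported for `evaluate.feature.bytes`, so the ratio covers all bytes evaluations in this run.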


williballenthin mentioned this issue Jun 6, 2024

investigate optimization of rule matching (May, 2024) #2063

Closed

williballenthin added the performance Related to capa's performance label Jun 6, 2024
