loading rules takes a while #306

Closed
williballenthin opened this issue Sep 11, 2020 · 7 comments · Fixed by #323
Labels: enhancement (New feature or request)

Comments

@williballenthin
Collaborator

loading the ruleset of 300+ rules from their yaml files takes around 5s on my laptop. it's mildly annoying when I'm doing analysis, and it also prevents me from running capa iteratively against a large corpus of samples.

we should profile the rule loading logic and see if we can make this much faster. running capa probably involves reading and parsing 300+ python files, but this happens within a few ms. so why is rule loading so much slower?
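one way to start on the profiling, as a minimal sketch using the standard library's `cProfile`: wrap the loading call and print the most expensive functions by cumulative time. `fake_load_rules` below is a hypothetical stand-in, not capa's actual loader; swap in the real entry point.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the top 10 entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()


def fake_load_rules(n):
    # hypothetical stand-in for the real rule loader
    return [str(i) for i in range(n)]


rules, report = profile_call(fake_load_rules, 300)
print(report)
```

if YAML parsing dominates, the report should show the parser's functions near the top of the cumulative column.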

@williballenthin williballenthin added the enhancement New feature or request label Sep 11, 2020
@williballenthin
Collaborator Author

williballenthin commented Sep 11, 2020

looks like yaml parsing is known to be slow:

https://stackoverflow.com/questions/27743711/can-i-speedup-yaml
https://stackoverflow.com/questions/47715566/cannot-load-cloader-with-pyyaml/62543781#62543781

definitely looks like we want to use the CLoader when possible. probably want to avoid using the ruamel parser except when doing linting/formatting. hopefully we can embed the shared object in the pyinstaller distribution.
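a minimal sketch of the "CLoader when possible" idea: PyYAML only exposes `yaml.CLoader` when it was built against libyaml, so the import can fall back to the pure-Python `yaml.Loader`. the helper name `load_yaml` is made up for illustration.

```python
import yaml

try:
    # only available when PyYAML was built with the libyaml bindings
    from yaml import CLoader as Loader
except ImportError:
    # pure-Python fallback: slower, but always present
    from yaml import Loader


def load_yaml(text):
    """Parse a YAML document with the fastest loader available."""
    return yaml.load(text, Loader=Loader)


doc = load_yaml("rule:\n  meta:\n    name: example\n")
print(doc)
```

for untrusted input, `yaml.CSafeLoader` and `yaml.SafeLoader` can be substituted using the same pattern.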

@williballenthin
Collaborator Author

image
7 seconds just to load the rules!

@williballenthin
Collaborator Author

williballenthin commented Sep 12, 2020

flamegraph generated using https://github.com/benfred/py-spy

profile.zip

| % of runtime | phase |
|---|---|
| 51% | loading rules |
| 9% | indexing rules |
| 10% | analyzing binary |
| 26% | matching rules |

image

@williballenthin
Collaborator Author

williballenthin commented Sep 12, 2020

sudo apt install libyaml-dev
pip --no-cache-dir install --verbose --force-reinstall -I pyyaml
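after reinstalling, it may be worth checking that the C bindings were actually picked up. a small sketch, relying on the `__with_libyaml__` flag that PyYAML sets at import time (the helper name `has_libyaml` is made up):

```python
import yaml


def has_libyaml():
    """Return True if PyYAML was built with the libyaml C bindings."""
    return bool(getattr(yaml, "__with_libyaml__", False))


print("libyaml bindings available:", has_libyaml())
```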

@williballenthin
Collaborator Author

williballenthin commented Sep 12, 2020

PyYAML CLoader doesn't like our description syntax:

image
image

...and as i look at this, i'm not sure i quite understand how this should be parsed, either. hmm...

@williballenthin
Collaborator Author

wow, if i strip the descriptions out, then PyYAML's CLoader loads the rules in around 0.3 seconds:

image
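a rough sketch for reproducing this kind of comparison on any machine: time the pure-Python `yaml.Loader` against `yaml.CLoader` (when available) on the same text. `SAMPLE` is a made-up document for illustration, not a real capa rule, and the numbers will vary by machine.

```python
import time

import yaml

# made-up YAML document for illustration; not a real rule
SAMPLE = "rule:\n  meta:\n    name: example\n  features:\n    - a\n    - b\n"


def best_time(loader_cls, text, repeat=3, count=100):
    """Return the best wall-clock time (seconds) to parse text `count` times."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(count):
            yaml.load(text, Loader=loader_cls)
        best = min(best, time.perf_counter() - start)
    return best


loaders = {"Loader": yaml.Loader}
if hasattr(yaml, "CLoader"):
    # only present when PyYAML was built against libyaml
    loaders["CLoader"] = yaml.CLoader

timings = {name: best_time(cls, SAMPLE) for name, cls in loaders.items()}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```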

@williballenthin
Collaborator Author

need to figure out #312 before we continue here.

should prefer to use PyYAML.CLoader when possible. hopefully we can embed this within pyinstaller binaries.
