This repository has been archived by the owner on Feb 11, 2024. It is now read-only.

40 times speed optimization #45

Closed
gwillem opened this issue Jan 17, 2017 · 4 comments

Comments

gwillem commented Jan 17, 2017

Hi NBS, thanks for your great work!

I found a huge optimization: moving the whitelist hashing out of Yara. My client implementation is about 40x faster on a standard Magento 2.0.6 source tree while scanning the same files:

# time ./phpmalwarefinder -l php /data/all-magento/magento-2.0.6
[...]

real	9m59.357s
user	9m46.948s
sys	0m4.432s

vs

# time mwscan --ruleset nbs /data/all-magento/magento-2.0.6 --deep
Tue Jan 17 15:11:33 2017 Using NBS rules.
Tue Jan 17 15:11:33 2017 Fetching php.yar
Tue Jan 17 15:11:33 2017 Fetching whitelist.yar
Tue Jan 17 15:11:34 2017 Fetching whitelists/drupal.yar
Tue Jan 17 15:11:34 2017 Fetching whitelists/wordpress.yar
Tue Jan 17 15:11:34 2017 Fetching whitelists/symfony.yar
Tue Jan 17 15:11:34 2017 Fetching whitelists/phpmyadmin.yar
Tue Jan 17 15:11:34 2017 Fetching whitelists/magento2.yar
Tue Jan 17 15:11:34 2017 Fetching whitelists/prestashop.yar
Tue Jan 17 15:11:34 2017 Fetching whitelists/custom.yar
Tue Jan 17 15:11:34 2017 Fetching common.yar
Tue Jan 17 15:11:34 2017 Loaded 15 yara rules and 1279 whitelist entries
Tue Jan 17 15:11:48 2017 Finished scanning 41514 files: 76 malware and 25 whitelisted.

real	0m14.652s
user	0m10.116s
sys	0m1.512s

The gain comes from how inefficiently Yara handles hashing, which you already mention in the source. They have recently improved things a bit in the master branch, but it will take a while before that version ends up in the various Linux distributions.
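A minimal sketch of the idea, not mwscan's actual code: hash each file once in Python, look the digest up in a whitelist set, and only pass non-whitelisted files on to the rule matcher. SHA-1 is assumed as the whitelist hash, and `scan_tree` and the `scan` callback (a stand-in for the Yara rule match) are hypothetical names for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Set


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(root: Path, whitelist: Set[str],
              scan: Callable[[Path], bool]) -> Dict[str, int]:
    """Walk `root`, skip whitelisted files, and run `scan` on the rest.

    `scan` is a hypothetical callback standing in for the rule matcher;
    it returns True when a file matches a malware rule.
    """
    counts = {"files": 0, "whitelisted": 0, "malware": 0}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        counts["files"] += 1
        # One hash per file, one set lookup: the matcher never sees known-good files.
        if sha1_of_file(path) in whitelist:
            counts["whitelisted"] += 1
            continue
        if scan(path):
            counts["malware"] += 1
    return counts
```

With this layout each file is hashed exactly once, regardless of how many whitelist entries exist.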

To test mwscan on Ubuntu:

sudo apt install -qy python-pip python-dev gcc
sudo pip install --no-cache-dir --upgrade mwscan
mwscan --help
mwscan --ruleset nbs <path> 

Or CentOS:

wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
sudo rpm -ivh epel-release-latest-6.noarch.rpm

sudo yum -y install python-pip python-devel gcc
sudo pip install --no-cache-dir --upgrade mwscan

Cheers!
Willem

(update: I've published mwscan as a package, so you can just run pip install mwscan now)

Author

gwillem commented Jan 17, 2017

An extra benefit: if you move the whitelist out of Yara, you can (almost for free) whitelist entire Magento/WordPress/whatever releases (millions of files). That is really necessary, because php-malware-finder produces 76 false positives on a standard Magento install (see above).
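As a rough illustration of whitelisting whole releases, the sketch below hashes every file under one or more unpacked release directories into a single set. It is an assumption-laden example rather than how mwscan builds its whitelists: `build_whitelist` and `is_known` are hypothetical names, SHA-1 is assumed, and raw 20-byte digests are stored instead of hex strings to keep the set compact.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Set


def build_whitelist(release_roots: Iterable[Path]) -> Set[bytes]:
    """Collect the SHA-1 digest of every file under the given release trees."""
    known: Set[bytes] = set()
    for root in release_roots:
        for path in root.rglob("*"):
            if path.is_file():
                known.add(hashlib.sha1(path.read_bytes()).digest())
    return known


def is_known(path: Path, known: Set[bytes]) -> bool:
    """Return True when the file's content matches a file from a known release."""
    return hashlib.sha1(path.read_bytes()).digest() in known
```

Because membership is checked by content hash, a file is recognized even if it was moved or renamed, while any modification makes it fall out of the whitelist.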

Owner

jvoisin commented Jan 18, 2017

I'd like to see benchmarks for the hash caching in yara.

Author

gwillem commented Jan 18, 2017

Yes, me too. Another potential optimization is using indexed lookups for hashes, which my Python implementation does using set() (currently it seems Yara only supports sequential matching).
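To make the difference concrete, here is a small sketch comparing a sequential scan over whitelist entries with a set() lookup. The function names are hypothetical and the sequential version is only a stand-in for "compare against every entry in turn", not a model of Yara internals; the 1279 entries mirror the count in the log above.

```python
import hashlib
import timeit
from typing import List, Set


def make_hashes(count: int) -> List[str]:
    """Generate `count` distinct SHA-1 hex digests to act as whitelist entries."""
    return [hashlib.sha1(str(i).encode()).hexdigest() for i in range(count)]


def sequential_match(candidate: str, entries: List[str]) -> bool:
    """Compare against every entry in turn: cost grows with the whitelist size."""
    for entry in entries:
        if entry == candidate:
            return True
    return False


def indexed_match(candidate: str, entries: Set[str]) -> bool:
    """Hash-table lookup: cost stays roughly constant as the whitelist grows."""
    return candidate in entries


if __name__ == "__main__":
    entries_list = make_hashes(1279)
    entries_set = set(entries_list)
    # Worst case for the sequential scan: a digest that is not whitelisted.
    missing = hashlib.sha1(b"not whitelisted").hexdigest()
    slow = timeit.timeit(lambda: sequential_match(missing, entries_list), number=1000)
    fast = timeit.timeit(lambda: indexed_match(missing, entries_set), number=1000)
    print(f"sequential: {slow:.4f}s  indexed: {fast:.4f}s")
```

Most files in a scan are not whitelisted, so the miss case measured here is the common one.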

Collaborator

mdeous commented Mar 25, 2017

This has now been upstreamed in YARA, thank you very much for this. If you come near Paris, ping us so we can offer you a beer!
