Two malware datasets are used; see the links below.
We explore three TDA techniques: persistence diagrams, Mapper, and ToMATo. In the unsupervised setting, the clusters produced by these techniques make it possible to pinpoint groups of malware samples and even to identify zero-day malware behaviors.
For supervised learning, the TDA techniques are used as feature extractors, and the extracted features are fed to ML models such as XGBoost, LightGBM, random forest, and decision tree.
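
As a rough illustration of the feature-extraction step, the sketch below computes a 0-dimensional persistence diagram of a point cloud (Vietoris-Rips filtration) using only the Python standard library, then summarizes it as a fixed-length vector that a classifier could consume. The function names, the choice of summary statistics, and the toy point cloud are assumptions made for illustration, not the pipeline used in this project; a dedicated TDA library and higher-dimensional homology would normally be used.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death times of 0-dimensional classes in a Vietoris-Rips filtration.

    Every point is born at scale 0. A connected component dies when an edge
    merges it into another one (Kruskal's algorithm with union-find). The one
    component that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            deaths.append(length)
    return deaths


def diagram_features(deaths):
    """Hypothetical summary vector: count, total, max, and mean persistence."""
    if not deaths:
        return [0, 0.0, 0.0, 0.0]
    total = sum(deaths)
    return [len(deaths), total, max(deaths), total / len(deaths)]


# Toy point cloud (illustrative only): two well-separated pairs of points.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
features = diagram_features(h0_persistence(cloud))
```

Vectors like `features`, computed once per sample, would form the rows of the matrix passed to a classifier such as XGBoost or a random forest.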
We record various metrics:
(1) Detection quality: false positive rate, detection rate, accuracy, precision, F-score, etc.
(2) Resource cost: training time, inference time, training memory usage, inference memory usage, etc.
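
The two groups of metrics can be sketched as follows, assuming binary labels where 1 means malware and 0 means benign. `detection_metrics` and `measure` are hypothetical helper names, and the labels are toy values rather than results from this project. Note that `tracemalloc` only counts memory allocated through the Python allocator, so memory used by native code inside a library may not be reflected.

```python
import time
import tracemalloc


def detection_metrics(y_true, y_pred):
    """Binary detection metrics with label 1 = malware, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # detection rate (true positive rate)
    return {
        "false_positive_rate": ratio(fp, fp + tn),
        "detection_rate": recall,
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def measure(fn, *args):
    """Return (result, elapsed seconds, peak traced bytes) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics, seconds, peak_bytes = measure(detection_metrics, y_true, y_pred)
```

The same `measure` wrapper could be applied separately to a model's training call and its prediction call to obtain the training and inference figures.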