Reproducibility artifacts for the paper "Evading Provenance-Based ML Detectors with Adversarial System Actions".
| Folder | Description |
|---|---|
| gadget-finder | Code and data for running the gadget-finder algorithms. |
| intrusion-detection-system | Code and data files for running the IDS. |
We use conda as the Python environment manager. Install the project dependencies from provng.yml with this command:
conda env update --name provng --file provng.yml
Activate the conda environment before running the experiments:
conda activate provng
Running the gadget finder script:
python gadget-finder.py -i input.csv -p FrequencyDB/SAMPLE_WINDOWS_FREQUENCY_DB.csv -o output/gadgets.txt
SIGL[1]
- sigl
- Driver script for SIGL, an autoencoder-based IDS that detects anomalous paths.
- Sample causal paragraphs and feature vectors for Enterprise APT are available in the sample-enterprise-data directory.
Running the SIGL script:
python sigl.py
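An autoencoder-based detector of this kind typically scores each input by how poorly the model reconstructs it, then flags scores above a threshold learned from benign data. The sketch below illustrates only that final thresholding step in plain Python. The function names, the mean-plus-k-standard-deviations rule, and all numbers are illustrative assumptions and are not taken from sigl.py.

```python
from statistics import mean, pstdev


def reconstruction_error(original, reconstructed):
    """Mean squared error between a feature vector and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def fit_threshold(benign_errors, k=3.0):
    """Hypothetical rule: threshold = mean + k * std of the benign errors."""
    return mean(benign_errors) + k * pstdev(benign_errors)


def is_anomalous(error, threshold):
    """Flag an input whose reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up errors standing in for an autoencoder's output on benign paths.
benign_errors = [0.010, 0.012, 0.009, 0.011, 0.013]
threshold = fit_threshold(benign_errors)

# A closely reconstructed vector versus a poorly reconstructed one.
good = reconstruction_error([0.2, 0.4, 0.6], [0.21, 0.39, 0.61])
bad = reconstruction_error([0.2, 0.4, 0.6], [0.9, 0.0, 0.1])

print(is_anomalous(good, threshold), is_anomalous(bad, threshold))
```

In practice the threshold rule is a tuning choice; a percentile of the benign errors is a common alternative to the standard-deviation rule assumed here.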
ProvDetector[2]
- provdetector
- Driver script for ProvDetector, an LOF-based IDS that detects anomalous paths.
- Sample causal paragraphs and feature vectors for Enterprise APT are available in the sample-enterprise-data directory.
Running the ProvDetector script:
python provdetector.py
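Local Outlier Factor (LOF) compares the local density around a point with the density around its nearest neighbours; scores well above 1 suggest an outlier. The following is a minimal pure-Python sketch of the standard LOF score on made-up 2-D points. The function names, the choice of k, and the data are illustrative assumptions, and this is not the code in provdetector.py, which may rely on a library implementation instead.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_outlier_factors(points, k=2):
    """Return the LOF score of each point (ties beyond the k-th neighbour are cut off)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # k nearest neighbours of every point as (distance, index), nearest first.
    neighbours = []
    for i in range(n):
        dists = sorted(
            (euclidean(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours.append(dists[:k])

    # k-distance: distance from a point to its k-th nearest neighbour.
    k_distance = [nbrs[-1][0] for nbrs in neighbours]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        mean_reach = sum(reach) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


# Four tightly grouped points and one far-away point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = local_outlier_factors(points, k=2)
print([round(s, 2) for s in scores])
```

The four clustered points score 1, while the isolated point scores far higher. This sketch does not handle duplicate points, whose reachability density is infinite.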
- S-GAT
- Driver script for S-GAT, a GNN-based IDS that detects anomalous graphs using graph structure and attributes, e.g., node/edge types.
- Run download_sample_supply_chain_data.sh to download and unzip the sample Supply-Chain APT data from Google Drive.
- The weighted average F1 score on the provided data with the provided model should be 0.88.
Running the S-GAT script:
python gnnDriver.py gat -if 5 -hf 10 -lr 0.001 -e 20 -n 5 -bs 128 -bi -s
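For readers comparing their own runs against the weighted average F1 score quoted above, the sketch below shows how that metric is commonly defined: per-class F1 scores averaged with weights proportional to each class's support. This matches the definition used by scikit-learn's `f1_score` with `average="weighted"`; whether gnnDriver.py computes it the same way is an assumption, and the labels below are made up.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equally long")
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * count
    return total / len(y_true)


# Made-up labels: 6 benign graphs and 4 anomalous graphs.
y_true = ["benign"] * 6 + ["anomaly"] * 4
y_pred = ["benign"] * 5 + ["anomaly"] + ["anomaly"] * 3 + ["benign"]

score = weighted_f1(y_true, y_pred)
print(round(score, 2))
```

Here the benign class has F1 = 10/12 and the anomaly class has F1 = 6/8, so the support-weighted average is (6 × 10/12 + 4 × 6/8) / 10 = 0.8.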
- Prov-GAT
- Driver script for Prov-GAT, a GNN-based IDS that detects anomalous graphs using node and edge attributes in addition to the features used by S-GAT.
- Run download_sample_supply_chain_data.sh to download and unzip the sample Supply-Chain APT data from Google Drive.
- The weighted average F1 score on the provided data with the provided model should be 0.95.
Running the Prov-GAT script:
python gnnDriver.py gat -if 768 -hf 10 -lr 0.001 -e 20 -n 5 -bs 128 -bi
- ProvNinja-Graph
- Driver script for ProvNinja-Graph, which is an adversarial example generator.
- Run download_sample_supply_chain_data.sh to download and unzip the sample Supply-Chain APT data from Google Drive.
- Output is written to the adversarial_examples directory.
- The evasion rate should be approximately 168 of 198 true positives (about 85%) for the provided data with the provided models.
Running the ProvNinja-Graph script:
python provninjaGraph.py
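To compare a run against the expected result, the evasion rate can be expressed as a fraction. The sketch below assumes the rate is the number of adversarial examples that evade detection divided by the number of true positives the detector originally flagged; the function name is hypothetical and is not part of provninjaGraph.py.

```python
def evasion_rate(evaded, true_positives):
    """Fraction of originally detected samples that evade after modification."""
    if true_positives <= 0:
        raise ValueError("true_positives must be positive")
    if not 0 <= evaded <= true_positives:
        raise ValueError("evaded must be between 0 and true_positives")
    return evaded / true_positives


# Counts quoted in this README for the provided data and models.
rate = evasion_rate(168, 198)
print(f"{rate:.1%}")  # prints 84.8%
```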
@inproceedings{mukherjee2023sec,
title = {Evading Provenance-Based ML Detectors with Adversarial System Actions},
author = {Kunal Mukherjee and Josh Wiedemeier and Tianhao Wang and James Wei and Feng Chen and Muhyun Kim and Murat Kantarcioglu and Kangkook Jee},
year = 2023,
booktitle = {Proceedings of USENIX Security Symposium (SEC)},
series = {USENIX '23}
}
[1] X. Han, X. Yu, T. Pasquier, et al., “SIGL: Securing Software Installations Through Deep Graph Learning,” in USENIX Security Symposium (SEC), 2021.
[2] Q. Wang, W. U. Hassan, D. Li, et al., “You Are What You Do: Hunting Stealthy Malware via Data Provenance Analysis,” in Network and Distributed System Security Symposium (NDSS), Feb. 2020.