The repo contains the following notebooks, which are up to date:
- pagerank on sendmail
- gnn on lid_DS
  - on scenario Bruteforce_CWE-307
  - on scenario CVE-2014-0160
src/intrusion_detection.
- model=gnn: graph classification with PyTorch Geometric.
- model=pagerank: PageRank-based anomaly detector built from normal syscall patterns.
- model=gnn_gcn: weighted GCN baseline.
- model=gnn_graphsage: GraphSAGE with mean aggregation.
- model=gnn_gat: GAT with attention over directed weighted syscall transitions.
- model=gnn with model.architecture=...: direct override path if you want to tweak a preset.
- dataset=adfa_ld: expects data/ADFA-LD/
- dataset=lid_ds: expects data/LIS-DS/ in the current repo layout
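To make the idea behind model=pagerank more concrete, here is a minimal sketch of a PageRank-based anomaly score computed from normal syscall traces. It is not the code in this repo: the function names, the damping factor, the scoring rule (mean negative log-rank) and the floor value for unseen syscalls are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_transition_graph(traces):
    """Count directed syscall transitions (a -> b) over all traces."""
    weights = defaultdict(Counter)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            weights[src][dst] += 1
    return weights


def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration (damping value is an assumption)."""
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            out = weights.get(node)
            total = sum(out.values()) if out else 0
            if total == 0:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
            else:
                for dst, weight in out.items():
                    new_rank[dst] += damping * rank[node] * weight / total
        rank = new_rank
    return rank


def anomaly_score(trace, rank, floor=1e-6):
    """Mean negative log-rank; syscalls never seen in normal data get `floor`."""
    return sum(-math.log(rank.get(call, floor)) for call in trace) / len(trace)


# Hypothetical toy traces, only to show the call sequence.
normal_traces = [["open", "read", "close"], ["open", "read", "read", "close"]]
rank = pagerank(build_transition_graph(normal_traces))
normal_score = anomaly_score(["open", "read", "close"], rank)
attack_score = anomaly_score(["open", "execve", "ptrace"], rank)
```

In this sketch a trace made of syscalls that are rare or absent in normal behaviour receives a higher score, so a threshold on the score would separate normal from suspicious traces.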
Here are some useful commands to reproduce our results:
uv run python inspect_graphs.py
uv run python main.py model=gnn_gcn
uv run python main.py model=gnn_graphsage
uv run python main.py model=gnn_gat
uv run python main.py model=pagerank
uv run python main.py model=gnn_gcn report.enabled=true report.experiment_name=adfa_ld_gcn
uv run python main.py model=gnn_graphsage report.enabled=true report.experiment_name=adfa_ld_graphsage
uv run python main.py model=gnn_gat report.enabled=true report.experiment_name=adfa_ld_gat
uv run python main.py dataset=lid_ds model=gnn
uv run python main.py dataset=lid_ds dataset.scenario=CWE-89-SQL-injection model=gnn
uv run python main.py model=gnn model.architecture=gat
uv run python main.py report.enabled=true report.experiment_name=adfa_ld_gnn
uv run python main.py model=pagerank report.enabled=true report.experiment_name=adfa_ld_pagerank
uv run python main.py train.epochs=5

Configs live in configs/:
- configs/dataset/*.yaml
- configs/model/*.yaml
- configs/config.yaml
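Command-line arguments such as train.epochs=5 or report.enabled=true override values from these config files. The sketch below illustrates the idea with plain dictionaries. It is a simplified stand-in, not Hydra's actual parser: the helper names and the base values are hypothetical, and group selections such as model=gnn_gcn (which presumably pick a file under configs/model/) are not covered.

```python
import copy


def parse_value(raw):
    """Turn a raw override string into bool, int, float, or leave it as str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config, overrides):
    """Apply 'a.b=value' overrides to a nested dict and return a new dict."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return result


# Hypothetical defaults; the real values live in configs/config.yaml.
base = {"train": {"epochs": 50}, "report": {"enabled": False}}
cfg = apply_overrides(
    base,
    ["train.epochs=5", "report.enabled=true", "report.experiment_name=adfa_ld_gcn"],
)
```

The base dictionary is left untouched, which mirrors the fact that overrides apply to a single run and do not modify the YAML files.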
Hydra run outputs are written under outputs/.
The pipeline is intentionally local-only: metrics, reports, and checkpoints are stored in Hydra run directories without any external experiment tracker.
- The lid_ds loader supports both extracted CSV-like traces and the .sc traces stored inside the per-sample .zip archives currently present under data/LIS-DS/.
- For dataset=lid_ds, choose a scenario at train time with dataset.scenario=Bruteforce_CWE-307, dataset.scenario=CVE-2014-0160, or dataset.scenario=CWE-89-SQL-injection. The default dataset.scenario=all loads all available scenarios.
- The default lid_ds split policy follows the notebook: it keeps the original test traces and re-splits them stratified into train/validation/test. Override with dataset.split_strategy=predefined if you want to use the dataset folders as-is.
- The current graph statistics suggest using message-passing models that work well on small-to-medium directed attributed graphs: GCN as a weighted baseline, GraphSAGE for more robust neighborhood aggregation, and GAT when transition importance may be heterogeneous across edges.
- uv run python inspect_graphs.py generates report/generated_graph_inspection.tex, report/generated_graph_inspection.json, and companion PDF figures for the report.
- GNN training logs are printed epoch by epoch in the console.
- With report.enabled=true, each run updates report/generated_results.tex and report/generated_results.json, and GNN runs also export paper-ready PDF curves under report/figures/.
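The stratified re-split mentioned in the notes above can be sketched as follows. This is an illustration, not the loader's actual code: the function name, the 70/15/15 ratios and the seed are assumptions, and the real split fractions should be read from the dataset config.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(ratios[0] * len(indices))
        n_val = round(ratios[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        # Whatever remains after train and validation goes to test.
        test += indices[n_train + n_val:]
    return train, val, test


# Hypothetical labels: 0 = normal trace, 1 = attack trace.
labels = [0] * 80 + [1] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the normal/attack ratio roughly equal across the three subsets, which matters when attack traces are much rarer than normal ones.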