LLM-powered vulnerability detection using Code Property Graphs and multi-agent reasoning.
Aegis is a vulnerability detection system that analyzes C/C++ source code through a four-stage pipeline:
- Clue Discovery — LLM scans a target function and identifies suspicious code lines
- Context Augmentation — For each clue, traces dataflow via Joern CPG and expands cross-file function calls guided by LLM decisions
- Verification — Adversarial Red Team / Blue Team analysis determines whether each clue is a real vulnerability
- Audit — A second LLM reviews VULNERABLE verdicts to reduce false positives
```
Target Function
       │
       ▼
┌─────────────┐    ┌──────────────────────┐    ┌────────────┐    ┌───────┐
│    Clue     │───▶│       Context        │───▶│  Verifier  │───▶│ Audit │
│  Discovery  │    │  Augmentation (CPG)  │    │ (Red/Blue) │    │       │
└─────────────┘    └──────────────────────┘    └────────────┘    └───────┘
  clues.json        clue_{line}.json          clue_{line}.json   clue_{line}.json
```
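The stage handoff above can be sketched as plain functions passing JSON-serializable records. This is an illustrative stub, not Aegis's API: the real agents are LLM-backed (see `aegis/agents/`), and the field names here are assumptions.

```python
# Hypothetical sketch of the four-stage handoff; the real agents call an LLM
# (and Joern, for stage 2). These stubs only show the shape of the data flow.
def discover_clues(func_code: str) -> list[dict]:
    # Stage 1: flag suspicious lines (toy heuristic in place of the LLM scan)
    return [{"line_number": i + 1, "reason": "unbounded copy"}
            for i, line in enumerate(func_code.splitlines()) if "strcpy" in line]

def augment(clue: dict) -> dict:
    # Stage 2: attach dataflow context (real pipeline queries a Joern CPG)
    return {**clue, "evidence": ["dataflow trace..."]}

def verify(clue: dict) -> dict:
    # Stage 3: Red/Blue adversarial analysis yields a verdict
    return {**clue, "verdict": "VULNERABLE"}

def audit(clue: dict) -> dict:
    # Stage 4: second review of VULNERABLE verdicts only
    if clue["verdict"] == "VULNERABLE":
        clue = {**clue, "final_verdict": "VULNERABLE"}
    return clue

def run_pipeline(func_code: str) -> list[dict]:
    return [audit(verify(augment(c))) for c in discover_clues(func_code)]
```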
- Python 3.11+
- uv for package management
- Joern for CPG generation (see below)
- An LLM API key (OpenAI, Anthropic, or TensorBlock)
```bash
cd Aegis
uv sync
source .venv/bin/activate
```

Aegis uses Joern for static code analysis and CPG (Code Property Graph) generation. Install Joern to `$HOME/joern/`:
```bash
# Download and install Joern
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod +x joern-install.sh
./joern-install.sh --install-dir=$HOME/joern

# Add to PATH (add to ~/.bashrc or ~/.zshrc for persistence)
export JOERN_HOME="$HOME/joern"
export PATH="$JOERN_HOME:$PATH"

# Verify installation
which joern-parse
```

JoernClient auto-discovers Joern in the following locations (in order): `~/joern`, `~/.local/joern`, `~/bin/joern`, `/opt/joern`, `/usr/local/joern`, or `joern-parse` in the system PATH.
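The discovery order can be sketched as a simple first-match search. This is a hypothetical re-implementation of the behavior described above (the function name and signature are assumptions, not JoernClient's actual code):

```python
import shutil
from pathlib import Path

# Candidate install dirs, in the documented discovery order.
CANDIDATE_DIRS = ["~/joern", "~/.local/joern", "~/bin/joern",
                  "/opt/joern", "/usr/local/joern"]

def find_joern(candidates=CANDIDATE_DIRS):
    # Check the well-known install dirs first, in order
    for d in candidates:
        parse = Path(d).expanduser() / "joern-parse"
        if parse.is_file():
            return parse
    # Fall back to whatever joern-parse is on the system PATH
    on_path = shutil.which("joern-parse")
    return Path(on_path) if on_path else None
```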
```bash
cp env.example .env
# Edit .env with your API key and model choice
```

Example `.env`:

```ini
OPENAI_API_KEY=sk-...
LLM_PROVIDER=tensorblock
LLM_MODEL=tensorblock/deepseek-v3.1
GITHUB_TOKEN=ghp_...
```

Supported providers: `openai`, `anthropic`, `tensorblock`
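A minimal sketch of how the `.env` keys above could map to a settings object. The class and function names are illustrative assumptions; Aegis's actual `Settings` class lives in `aegis/config/settings.py` and may differ:

```python
import os
from dataclasses import dataclass

SUPPORTED_PROVIDERS = {"openai", "anthropic", "tensorblock"}

@dataclass
class LLMSettings:
    provider: str
    model: str

def load_llm_settings(env=os.environ) -> LLMSettings:
    # Read provider/model from the environment, rejecting unknown providers
    provider = env.get("LLM_PROVIDER", "openai")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    return LLMSettings(provider=provider, model=env.get("LLM_MODEL", ""))
```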
```bash
aegis detect /path/to/sample/folder
aegis detect /path/to/sample/folder --verbose
```

The sample folder must contain `info.json` (with `func`, `file_name`, and `project_url` fields) and a cloned repo directory. Results are saved into `phase1_clue_discovery/`, `phase2_context_augmentation/`, `phase3_verification/`, and `phase4_audit/` subdirectories.
Run each pipeline stage independently on a directory of samples:
```bash
# Stage 1: Clue discovery
python scripts/run_clue_discovery.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 2: Context augmentation (requires Joern)
python scripts/run_context_augmentation.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 3: Verification
python scripts/run_verifier.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 4: Audit
python scripts/run_audit.py -m tensorblock/deepseek-v3.1 --samples-dir samples/
```

Common options:

- `-m MODEL`: model identifier (required)
- `-p PROVIDER`: LLM provider (`openai`, `anthropic`, `tensorblock`)
- `-n LIMIT`: process only the first N samples
- `-k TOP_K`: only process the top-k clues by confidence
- `-mt`: enable multithreading
- `--samples-dir DIR`: base samples directory (default: `samples/`)
After running the pipeline (or downloading the pre-computed results linked below), compute paper metrics:
```bash
python scripts/calculate_metrics.py
python scripts/calculate_metrics.py --samples-dir samples/ --max-k 5
```

This reports, for each top-k clue budget (k=1..10):
- RQ1: Pair detection outcomes (P-V/P-C/P-R/P-B), accuracy, precision, recall, F1, FPR
- RQ2: Context recall (clue recall rate, augmented context recall rate) and per-phase token usage
- RQ3: Audit veto correctness and the change in P-C count after audit
Requires `scripts/pair_vul2clean.json` (vul-to-clean pair mapping) and `scripts/changed_lines.json` (ground-truth changed lines per sample).
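The pairwise outcomes and function-level metrics can be sketched as follows, assuming PrimeVul-style (vuln, clean) pairs where each flag records whether the detector called that function vulnerable. The outcome definitions and names here are illustrative, not `calculate_metrics.py`'s actual code:

```python
def pair_outcome(vuln_flagged: bool, clean_flagged: bool) -> str:
    # One outcome label per (vuln, clean) pair
    if vuln_flagged and not clean_flagged:
        return "P-C"   # pair correct: vuln caught, clean passed
    if vuln_flagged and clean_flagged:
        return "P-V"   # both flagged vulnerable
    if not vuln_flagged and not clean_flagged:
        return "P-B"   # both flagged benign
    return "P-R"       # reversed: clean flagged, vuln missed

def function_metrics(pairs: list) -> dict:
    # Function-level confusion counts: vuln functions are positives,
    # their clean counterparts are negatives.
    tp = sum(1 for v, _ in pairs if v)
    fn = len(pairs) - tp
    fp = sum(1 for _, c in pairs if c)
    tn = len(pairs) - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}
```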
```python
from aegis.core.orchestrator import Orchestrator
from aegis.config.settings import PipelineConfig

config = PipelineConfig(max_clues=2, max_slice_depth=10, max_expansions=50)
orchestrator = Orchestrator(config=config)

results = orchestrator.analyze(
    func_code="void foo(char *input) { strcpy(buf, input); }",
    file_name="src/foo.c",
    repo_url="https://github.com/example/repo",
    local_repo_path="/path/to/cloned/repo",
    sample_path="/path/to/output/folder",  # saves intermediate results
)

for clue, evidence, verification, audit in results:
    final = audit.final_verdict if audit else verification.verdict
    print(f"Line {clue.line_number}: {final}")
```

This repository does not include the `samples/` directory. To reproduce results:
1. Download pre-computed results from Google Drive: https://drive.google.com/drive/folders/13AIff2GXRu8dv9QT28RoCGa6xv4JuDMk?usp=share_link
2. Download source repositories from the PrimeVul paired test set. Each sample folder should contain the cloned repository.
```
samples/
├── 187732/
│   ├── info.json                     # metadata: func, file_name, project_url, etc.
│   ├── qemu/                         # cloned repository
│   ├── phase1_clue_discovery/
│   │   └── clues.json                # discovered clues with line numbers
│   ├── phase2_context_augmentation/
│   │   ├── clue_2120.json            # evidence trace per clue
│   │   └── clue_2120_metadata.json
│   ├── phase3_verification/
│   │   └── clue_2120.json            # verification verdict per clue
│   └── phase4_audit/
│       └── clue_2120.json            # audit result (VULNERABLE verdicts only)
├── 194963/
│   └── ...
```
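A small walker over this layout can recover the final verdict per clue, letting a `phase4_audit/` result override the `phase3_verification/` verdict when one exists. The function name and the JSON field names (`verdict`, `final_verdict`) are assumptions for illustration:

```python
import json
from pathlib import Path

# Hypothetical sketch: for each clue_{line}.json in phase3_verification/,
# prefer the matching phase4_audit/ file if the clue was audited.
def final_verdicts(sample_dir: str) -> dict:
    sample = Path(sample_dir)
    verdicts = {}
    for vf in sorted((sample / "phase3_verification").glob("clue_*.json")):
        line = int(vf.stem.split("_")[1])          # "clue_2120" -> 2120
        verdict = json.loads(vf.read_text())["verdict"]
        audit_file = sample / "phase4_audit" / vf.name
        if audit_file.exists():                    # audit veto wins
            verdict = json.loads(audit_file.read_text())["final_verdict"]
        verdicts[line] = verdict
    return verdicts
```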
```
aegis/
├── agents/                 # LLM-powered pipeline agents
│   ├── clue_discovery.py
│   ├── context_augmentation.py
│   ├── verifier.py
│   └── audit.py
├── core/
│   └── orchestrator.py     # Pipeline orchestrator
├── config/
│   └── settings.py         # PipelineConfig, Settings
├── models/                 # Pydantic data models (Clue, EvidenceTrace, etc.)
├── tools/                  # Joern CPG builder, repo cloner, parsers
└── utils/                  # LLM client, formatters, logging
scripts/                    # Batch processing scripts
```
MIT