LLM-powered vulnerability detection using Code Property Graphs and multi-agent reasoning.
Aegis is a vulnerability detection system that analyzes C/C++ source code through a four-stage pipeline:
- Clue Discovery — LLM scans a target function and identifies suspicious code lines
- Context Augmentation — For each clue, traces dataflow via Joern CPG and expands cross-file function calls guided by LLM decisions
- Verification — Adversarial Red Team / Blue Team analysis determines whether each clue is a real vulnerability
- Audit — A second LLM reviews VULNERABLE verdicts to reduce false positives
```
Target Function
       │
       ▼
┌─────────────┐    ┌──────────────────────┐    ┌────────────┐    ┌───────┐
│    Clue     │───▶│       Context        │───▶│  Verifier  │───▶│ Audit │
│  Discovery  │    │  Augmentation (CPG)  │    │ (Red/Blue) │    │       │
└─────────────┘    └──────────────────────┘    └────────────┘    └───────┘
  clues.json        clue_{line}.json          clue_{line}.json   clue_{line}.json
```
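The stage handoff above can be sketched as plain functions passing JSON-serializable records. This is an illustrative stub, not Aegis's API: the real agents are LLM-backed (see `aegis/agents/`), and the field names here are assumptions.

```python
# Hypothetical sketch of the four-stage handoff; the real agents call an LLM
# (and Joern, for stage 2). These stubs only show the shape of the data flow.
def discover_clues(func_code: str) -> list[dict]:
    # Stage 1: flag suspicious lines (toy heuristic in place of the LLM scan)
    return [{"line_number": i + 1, "reason": "unbounded copy"}
            for i, line in enumerate(func_code.splitlines()) if "strcpy" in line]

def augment(clue: dict) -> dict:
    # Stage 2: attach dataflow context (real pipeline queries a Joern CPG)
    return {**clue, "evidence": ["dataflow trace..."]}

def verify(clue: dict) -> dict:
    # Stage 3: Red/Blue adversarial analysis yields a verdict
    return {**clue, "verdict": "VULNERABLE"}

def audit(clue: dict) -> dict:
    # Stage 4: second review of VULNERABLE verdicts only
    if clue["verdict"] == "VULNERABLE":
        clue = {**clue, "final_verdict": "VULNERABLE"}
    return clue

def run_pipeline(func_code: str) -> list[dict]:
    return [audit(verify(augment(c))) for c in discover_clues(func_code)]
```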
- Python 3.11+
- uv for package management
- Joern for CPG generation (see below)
- An LLM API key (OpenAI, Anthropic, or TensorBlock)
```bash
cd Aegis
uv sync
source .venv/bin/activate
```

Aegis uses Joern for static code analysis and CPG (Code Property Graph) generation. Install Joern to `$HOME/joern/`:
```bash
# Download and install Joern
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod +x joern-install.sh
./joern-install.sh --install-dir=$HOME/joern

# Add to PATH (add to ~/.bashrc or ~/.zshrc for persistence)
export JOERN_HOME="$HOME/joern"
export PATH="$JOERN_HOME:$PATH"

# Verify installation
which joern-parse
```

JoernClient auto-discovers Joern in the following locations (in order): `~/joern`, `~/.local/joern`, `~/bin/joern`, `/opt/joern`, `/usr/local/joern`, or `joern-parse` in the system PATH.
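The discovery order can be sketched as a simple first-match search. This is a hypothetical re-implementation of the behavior described above (the function name and signature are assumptions, not JoernClient's actual code):

```python
import shutil
from pathlib import Path

# Candidate install dirs, in the documented discovery order.
CANDIDATE_DIRS = ["~/joern", "~/.local/joern", "~/bin/joern",
                  "/opt/joern", "/usr/local/joern"]

def find_joern(candidates=CANDIDATE_DIRS):
    # Check the well-known install dirs first, in order
    for d in candidates:
        parse = Path(d).expanduser() / "joern-parse"
        if parse.is_file():
            return parse
    # Fall back to whatever joern-parse is on the system PATH
    on_path = shutil.which("joern-parse")
    return Path(on_path) if on_path else None
```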
```bash
cp env.example .env
# Edit .env with your API key and model choice
```

Example `.env`:

```ini
OPENAI_API_KEY=sk-...
LLM_PROVIDER=tensorblock
LLM_MODEL=tensorblock/deepseek-v3.1
GITHUB_TOKEN=ghp_...
```

Supported providers: `openai`, `anthropic`, `tensorblock`
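A minimal sketch of how the `.env` keys above could map to a settings object. The class and function names are illustrative assumptions; Aegis's actual `Settings` class lives in `aegis/config/settings.py` and may differ:

```python
import os
from dataclasses import dataclass

SUPPORTED_PROVIDERS = {"openai", "anthropic", "tensorblock"}

@dataclass
class LLMSettings:
    provider: str
    model: str

def load_llm_settings(env=os.environ) -> LLMSettings:
    # Read provider/model from the environment, rejecting unknown providers
    provider = env.get("LLM_PROVIDER", "openai")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    return LLMSettings(provider=provider, model=env.get("LLM_MODEL", ""))
```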
```bash
aegis detect /path/to/sample/folder
aegis detect /path/to/sample/folder --verbose
```

The sample folder must contain `info.json` (with `func`, `file_name`, and `project_url` fields) and a cloned repo directory. Results are saved into `phase1_clue_discovery/`, `phase2_context_augmentation/`, `phase3_verification/`, and `phase4_audit/` subdirectories.
Run each pipeline stage independently on a directory of samples:
```bash
# Stage 1: Clue discovery
python scripts/run_clue_discovery.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 2: Context augmentation (requires Joern)
python scripts/run_context_augmentation.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 3: Verification
python scripts/run_verifier.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 4: Audit
python scripts/run_audit.py -m tensorblock/deepseek-v3.1 --samples-dir samples/
```

Common options:

- `-m MODEL`: model identifier (required)
- `-p PROVIDER`: LLM provider (`openai`, `anthropic`, `tensorblock`)
- `-n LIMIT`: process only the first N samples
- `-k TOP_K`: only process the top-k clues by confidence
- `-mt`: enable multithreading
- `--samples-dir DIR`: base samples directory (default: `samples/`)
After running the pipeline (or downloading the pre-computed results linked below), compute paper metrics:
```bash
python scripts/calculate_metrics.py
python scripts/calculate_metrics.py --samples-dir samples/ --max-k 5
```

This reports, for each top-k clue budget (k=1..10):
- RQ1: Pair detection outcomes (P-V/P-C/P-R/P-B), accuracy, precision, recall, F1, FPR
- RQ2: Context recall (clue recall rate, augmented context recall rate) and per-phase token usage
- RQ3: Audit veto correctness and the change in P-C count after audit
Requires `scripts/pair_vul2clean.json` (vul-to-clean pair mapping) and `scripts/changed_lines.json` (ground-truth changed lines per sample).
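The pairwise outcomes and function-level metrics can be sketched as follows, assuming PrimeVul-style (vuln, clean) pairs where each flag records whether the detector called that function vulnerable. The outcome definitions and names here are illustrative, not `calculate_metrics.py`'s actual code:

```python
def pair_outcome(vuln_flagged: bool, clean_flagged: bool) -> str:
    # One outcome label per (vuln, clean) pair
    if vuln_flagged and not clean_flagged:
        return "P-C"   # pair correct: vuln caught, clean passed
    if vuln_flagged and clean_flagged:
        return "P-V"   # both flagged vulnerable
    if not vuln_flagged and not clean_flagged:
        return "P-B"   # both flagged benign
    return "P-R"       # reversed: clean flagged, vuln missed

def function_metrics(pairs: list) -> dict:
    # Function-level confusion counts: vuln functions are positives,
    # their clean counterparts are negatives.
    tp = sum(1 for v, _ in pairs if v)
    fn = len(pairs) - tp
    fp = sum(1 for _, c in pairs if c)
    tn = len(pairs) - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}
```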
```python
from aegis.core.orchestrator import Orchestrator
from aegis.config.settings import PipelineConfig

config = PipelineConfig(max_clues=2, max_slice_depth=10, max_expansions=50)
orchestrator = Orchestrator(config=config)

results = orchestrator.analyze(
    func_code="void foo(char *input) { strcpy(buf, input); }",
    file_name="src/foo.c",
    repo_url="https://github.com/example/repo",
    local_repo_path="/path/to/cloned/repo",
    sample_path="/path/to/output/folder",  # saves intermediate results
)

for clue, evidence, verification, audit in results:
    final = audit.final_verdict if audit else verification.verdict
    print(f"Line {clue.line_number}: {final}")
```

This repository does not include the `samples/` directory. To reproduce results:
1. Download pre-computed results from Google Drive: https://drive.google.com/drive/folders/13AIff2GXRu8dv9QT28RoCGa6xv4JuDMk?usp=share_link
2. Download source repositories from the PrimeVul paired test set. Each sample folder should contain the cloned repository.
```
samples/
├── 187732/
│   ├── info.json                     # metadata: func, file_name, project_url, etc.
│   ├── qemu/                         # cloned repository
│   ├── phase1_clue_discovery/
│   │   └── clues.json                # discovered clues with line numbers
│   ├── phase2_context_augmentation/
│   │   ├── clue_2120.json            # evidence trace per clue
│   │   └── clue_2120_metadata.json
│   ├── phase3_verification/
│   │   └── clue_2120.json            # verification verdict per clue
│   └── phase4_audit/
│       └── clue_2120.json            # audit result (VULNERABLE verdicts only)
├── 194963/
│   └── ...
```
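A small walker over this layout can recover the final verdict per clue, letting a `phase4_audit/` result override the `phase3_verification/` verdict when one exists. The function name and the JSON field names (`verdict`, `final_verdict`) are assumptions for illustration:

```python
import json
from pathlib import Path

# Hypothetical sketch: for each clue_{line}.json in phase3_verification/,
# prefer the matching phase4_audit/ file if the clue was audited.
def final_verdicts(sample_dir: str) -> dict:
    sample = Path(sample_dir)
    verdicts = {}
    for vf in sorted((sample / "phase3_verification").glob("clue_*.json")):
        line = int(vf.stem.split("_")[1])          # "clue_2120" -> 2120
        verdict = json.loads(vf.read_text())["verdict"]
        audit_file = sample / "phase4_audit" / vf.name
        if audit_file.exists():                    # audit veto wins
            verdict = json.loads(audit_file.read_text())["final_verdict"]
        verdicts[line] = verdict
    return verdicts
```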
```
aegis/
├── agents/                 # LLM-powered pipeline agents
│   ├── clue_discovery.py
│   ├── context_augmentation.py
│   ├── verifier.py
│   └── audit.py
├── core/
│   └── orchestrator.py     # Pipeline orchestrator
├── config/
│   └── settings.py         # PipelineConfig, Settings
├── models/                 # Pydantic data models (Clue, EvidenceTrace, etc.)
├── tools/                  # Joern CPG builder, repo cloner, parsers
└── utils/                  # LLM client, formatters, logging
scripts/                    # Batch processing scripts
```
MIT