Aegis

LLM-powered vulnerability detection using Code Property Graphs and multi-agent reasoning.

Overview

Aegis is a vulnerability detection system that analyzes C/C++ source code through a four-stage pipeline:

  1. Clue Discovery — LLM scans a target function and identifies suspicious code lines
  2. Context Augmentation — For each clue, traces dataflow via Joern CPG and expands cross-file function calls guided by LLM decisions
  3. Verification — Adversarial Red Team / Blue Team analysis determines whether each clue is a real vulnerability
  4. Audit — A second LLM reviews VULNERABLE verdicts to reduce false positives
Target Function
      │
      ▼
┌─────────────┐    ┌──────────────────────┐    ┌────────────┐    ┌───────┐
│ Clue        │───▶│ Context              │───▶│ Verifier   │───▶│ Audit │
│ Discovery   │    │ Augmentation (CPG)   │    │ (Red/Blue) │    │       │
└─────────────┘    └──────────────────────┘    └────────────┘    └───────┘
   clues.json       clue_{line}.json           clue_{line}.json  clue_{line}.json
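The four stages above can be sketched as a simple sequential driver. The stage functions here are illustrative placeholders, not the actual Aegis API:

```python
# Illustrative sketch of the four-stage pipeline. The stage callables are
# hypothetical stand-ins, not the real Aegis classes.
def run_pipeline(func_code, discover, augment, verify, audit):
    results = []
    for clue in discover(func_code):           # Stage 1: clue discovery
        evidence = augment(clue)               # Stage 2: context augmentation
        verdict = verify(clue, evidence)       # Stage 3: red/blue verification
        # Stage 4: the audit only reviews VULNERABLE verdicts
        final = audit(clue, evidence) if verdict == "VULNERABLE" else verdict
        results.append((clue, final))
    return results
```

In the real system each stage is an LLM-backed agent and the intermediate results are persisted as the JSON files shown in the diagram.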

Setup

Prerequisites

  • Python 3.11+
  • uv for package management
  • Joern for CPG generation (see below)
  • An LLM API key (OpenAI, Anthropic, or TensorBlock)

Installation

cd Aegis
uv sync
source .venv/bin/activate

Joern Installation

Aegis uses Joern for static code analysis and CPG (Code Property Graph) generation. Install Joern to $HOME/joern/:

# Download and install Joern
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
chmod +x joern-install.sh
./joern-install.sh --install-dir=$HOME/joern

# Add to PATH (add to ~/.bashrc or ~/.zshrc for persistence)
export JOERN_HOME="$HOME/joern"
export PATH="$JOERN_HOME:$PATH"

# Verify installation
which joern-parse

JoernClient auto-discovers Joern in the following locations (in order): ~/joern, ~/.local/joern, ~/bin/joern, /opt/joern, /usr/local/joern, or joern-parse in system PATH.
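The documented search order can be sketched as follows; this is an illustration of the discovery behavior, and JoernClient's actual implementation may differ:

```python
import shutil
from pathlib import Path

# Search order as documented above; illustrative, not the real JoernClient code.
CANDIDATE_DIRS = [
    "~/joern", "~/.local/joern", "~/bin/joern", "/opt/joern", "/usr/local/joern",
]

def find_joern():
    """Return the first candidate directory containing joern-parse,
    falling back to the system PATH; None if Joern is not found."""
    for d in CANDIDATE_DIRS:
        path = Path(d).expanduser()
        if (path / "joern-parse").exists():
            return path
    exe = shutil.which("joern-parse")  # last resort: system PATH
    return Path(exe).parent if exe else None
```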

Configuration

cp env.example .env
# Edit .env with your API key and model choice

Example .env:

OPENAI_API_KEY=sk-...
LLM_PROVIDER=tensorblock
LLM_MODEL=tensorblock/deepseek-v3.1
GITHUB_TOKEN=ghp_...

Supported providers: openai, anthropic, tensorblock

Usage

Single Sample Detection (CLI)

aegis detect /path/to/sample/folder
aegis detect /path/to/sample/folder --verbose

The sample folder must contain info.json (with func, file_name, project_url fields) and a cloned repo directory. Results are saved into phase1_clue_discovery/, phase2_context_augmentation/, phase3_verification/, and phase4_audit/ subdirectories.
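A minimal info.json with the three required fields might look like the sketch below (field values are illustrative, and the real metadata files carry additional fields):

```python
import json
from pathlib import Path

# Illustrative info.json contents; only func, file_name, and project_url
# are documented as required.
INFO = {
    "func": "static void foo(char *input) { strcpy(buf, input); }",
    "file_name": "src/foo.c",
    "project_url": "https://github.com/example/repo",
}

def load_info(sample_dir):
    """Read a sample's info.json and verify the required fields are present."""
    info = json.loads((Path(sample_dir) / "info.json").read_text())
    missing = {"func", "file_name", "project_url"} - info.keys()
    if missing:
        raise ValueError(f"info.json missing fields: {missing}")
    return info
```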

Batch Processing (Scripts)

Run each pipeline stage independently on a directory of samples:

# Stage 1: Clue discovery
python scripts/run_clue_discovery.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 2: Context augmentation (requires Joern)
python scripts/run_context_augmentation.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 3: Verification
python scripts/run_verifier.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

# Stage 4: Audit
python scripts/run_audit.py -m tensorblock/deepseek-v3.1 --samples-dir samples/

Common options:

  • -m MODEL — model identifier (required)
  • -p PROVIDER — LLM provider (openai, anthropic, tensorblock)
  • -n LIMIT — process only the first N samples
  • -k TOP_K — only process the top-k clues by confidence
  • -mt — enable multithreading
  • --samples-dir DIR — base samples directory (default: samples/)

Metrics Calculation

After running the pipeline (or using the pre-computed results linked under Data & Samples), compute paper metrics:

python scripts/calculate_metrics.py
python scripts/calculate_metrics.py --samples-dir samples/ --max-k 5

This reports for each top-k clue budget (k=1..10):

  • RQ1: Pair detection outcomes (P-V/P-C/P-R/P-B), accuracy, precision, recall, F1, FPR
  • RQ2: Context recall (clue recall rate, augmented context recall rate) and per-phase token usage
  • RQ3: Audit veto correctness and the change in P-C count after audit

Requires scripts/pair_vul2clean.json (vul-to-clean pair mapping) and scripts/changed_lines.json (ground-truth changed lines per sample).
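For reference, the generic confusion-matrix metrics reported in RQ1 reduce to a few lines. This sketch is independent of Aegis's scripts and of the P-V/P-C/P-R/P-B pair labels:

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics: precision, recall, F1, accuracy, FPR.
    Undefined ratios (zero denominators) are reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "fpr": fpr}
```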

Python API

from aegis.core.orchestrator import Orchestrator
from aegis.config.settings import PipelineConfig

config = PipelineConfig(max_clues=2, max_slice_depth=10, max_expansions=50)
orchestrator = Orchestrator(config=config)

results = orchestrator.analyze(
    func_code="void foo(char *input) { strcpy(buf, input); }",
    file_name="src/foo.c",
    repo_url="https://github.com/example/repo",
    local_repo_path="/path/to/cloned/repo",
    sample_path="/path/to/output/folder",  # saves intermediate results
)

for clue, evidence, verification, audit in results:
    final = audit.final_verdict if audit else verification.verdict
    print(f"Line {clue.line_number}: {final}")

Data & Samples

This repository does not include the samples/ directory. To reproduce results:

  1. Download pre-computed results from Google Drive: https://drive.google.com/drive/folders/13AIff2GXRu8dv9QT28RoCGa6xv4JuDMk?usp=share_link

  2. Download source repositories from the PrimeVul paired test set. Each sample folder should contain the cloned repository.

Sample Folder Structure

samples/
├── 187732/
│   ├── info.json                          # metadata: func, file_name, project_url, etc.
│   ├── qemu/                              # cloned repository
│   ├── phase1_clue_discovery/
│   │   └── clues.json                     # discovered clues with line numbers
│   ├── phase2_context_augmentation/
│   │   ├── clue_2120.json                 # evidence trace per clue
│   │   └── clue_2120_metadata.json
│   ├── phase3_verification/
│   │   └── clue_2120.json                 # verification verdict per clue
│   └── phase4_audit/
│       └── clue_2120.json                 # audit result (VULNERABLE verdicts only)
├── 194963/
│   └── ...

Project Structure

aegis/
├── agents/              # LLM-powered pipeline agents
│   ├── clue_discovery.py
│   ├── context_augmentation.py
│   ├── verifier.py
│   └── audit.py
├── core/
│   └── orchestrator.py  # Pipeline orchestrator
├── config/
│   └── settings.py      # PipelineConfig, Settings
├── models/              # Pydantic data models (Clue, EvidenceTrace, etc.)
├── tools/               # Joern CPG builder, repo cloner, parsers
└── utils/               # LLM client, formatters, logging
scripts/                 # Batch processing scripts

License

MIT
