Skip to content

mysteriousbug/AuditLLM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🛡️ AuditLLM — Automated LLM Safety Evaluation & Red-Team Framework

Team: GHOST//SHELL

Systematic. Standards-Mapped. Audit-Grade.


What is AuditLLM?

AuditLLM is a modular red-teaming framework that systematically evaluates LLM safety across the OWASP LLM Top 10 (2025) risk categories, with findings mapped to NIST AI RMF and MITRE ATLAS. It produces audit-grade evidence reports suitable for enterprise security teams, regulators, and AI governance boards.

The Problem

LLMs are being deployed across Indian BFSI, governance, and healthcare without structured safety evaluation. Current red-teaming is ad hoc, inconsistent, and produces no standardised evidence trail.

The Solution

AuditLLM provides three layers:

  1. Attack Library — 30+ adversarial test cases across 10 OWASP categories
  2. Evaluation Engine — Automated execution + LLM-as-judge scoring
  3. Audit Dashboard — Risk heatmaps, model comparisons, NIST mapping, exportable reports

Quick Start

# 1. Clone and install
cd auditllm
pip install -r requirements.txt

# 2. Set API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."

# 3. Dry run — list all attacks
python core/harness.py --dry-run

# 4. Run full scan
python core/harness.py

# 5. Launch dashboard
streamlit run dashboard.py

Run Options

# Scan specific model
python core/harness.py --model gpt-4o-mini

# Scan specific OWASP category
python core/harness.py --category LLM01

# Filter by severity
python core/harness.py --severity critical

# Custom output path
python core/harness.py --output reports/my_scan.json

Project Structure

auditllm/
├── attacks/
│   └── taxonomy.yaml        # Full attack taxonomy (OWASP-mapped)
├── core/
│   └── harness.py           # Red team execution engine
├── data/
│   └── scan_results.json    # Scan output (generated)
├── reports/                  # Exported reports
├── config.yaml              # Target models & settings
├── dashboard.py             # Streamlit visualisation
├── requirements.txt
└── README.md

Standards Mapping

Framework Coverage
OWASP LLM Top 10 LLM01–LLM10 (all categories)
NIST AI RMF GOVERN, MAP, MEASURE, MANAGE functions
MITRE ATLAS Adversarial ML threat techniques

Key Features

  • Multi-model: Test OpenAI, Anthropic, Groq, or any OpenAI-compatible endpoint
  • LLM-as-Judge: Automated pass/fail scoring with reasoning
  • Multi-turn attacks: Escalation and context manipulation tests
  • Indirect injection: Document-embedded and tool-result injection
  • Audit-grade output: Timestamped evidence, OWASP/NIST mapping, exportable JSON
  • Interactive dashboard: Risk heatmaps, model comparison, severity breakdown

Author

Ananya Aithal Graduate Programme Associate — Information & Cybersecurity Audit Standard Chartered GBS | Quantum ML Researcher, ICTS-TIFR


License

MIT — Built for the ISB Cybersecurity & AI Safety Hackathon 2026

About

A modular, open-source framework that systematically red-teams any LLM endpoint across OWASP LLM Top 10 risk categories and produces audit-grade evidence reports.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages