AuditLens


Compliance-first audit trail for AI/LLM systems. Record decision chains, auto-redact PII, generate EU AI Act / GDPR / SOC2 reports — in under 10 lines of code.


Why AuditLens?

The compliance gap in the AI tooling ecosystem:

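The EU AI Act's obligations for high-risk AI systems begin to apply in August 2026. Those systems must keep automatic logs of events such as decisions, inputs, and outputs (Art. 12). GDPR Article 22 restricts solely automated decisions, and data subjects are entitled to meaningful information about the logic involved. SOC2 audits expect tamper-evident audit trails. Yet: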

  • Langfuse / LangSmith are observability tools — built for debugging, not compliance. No PII redaction, no regulatory report templates.
  • LLM Guard is a security gateway — it filters inputs/outputs but keeps no audit logs.
  • Agent Compliance Layer is the only dedicated compliance tool — but it's closed-source SaaS with no self-hosting option.

AuditLens is the first open-source project that combines LLM decision-chain recording, automatic PII redaction, and compliance report generation into a single Python SDK.

Comparison

Feature AuditLens Langfuse LangSmith LLM Guard
Decision chain recording
PII auto-redaction
EU AI Act report (Art. 12/19)
GDPR Art. 22 report
GDPR Art. 30 report (RoPA)
SOC2 audit trail
Data lineage (DSAR support)
Self-hosted / zero-knowledge
Framework agnostic
Python native
Open source

Quick Start

pip install auditlens

from auditlens import AuditEngine, audit_context

# One-time setup — defaults to SQLite at ./audit.db
engine = AuditEngine()

# Option 1: Decorator — wrap any LLM-calling function
@engine.trace(provider="openai", model="gpt-4o")
def ask_llm(prompt: str) -> str:
    return my_llm_client.complete(prompt)

# Option 2: Context manager — group calls into a session
with audit_context(engine, session_id="user-123", purpose="customer_support") as ctx:
    answer = ask_llm("How do I reset my password?")
    ctx.annotate(decision_type="assisted", confidence_score=0.95)

# Generate a compliance report
from auditlens.reports import EUAIActReportGenerator
from auditlens.storage import create_storage

storage = create_storage("audit.db")
print(EUAIActReportGenerator(storage).to_json(system_name="My AI System"))

Features

🔍 Decision Chain Recording

Every LLM call is recorded with SHA-256 hashes of inputs and outputs, creating a tamper-evident audit trail. Multi-step pipelines are linked under a shared chain_id.
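
To make the idea of a tamper-evident trail concrete, here is a minimal sketch of a hash chain built with the standard library. It is not AuditLens's internal implementation: the field names (`prev_hash`, `record_hash`), the all-zero genesis value, and the helper functions are assumptions chosen for illustration.

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_record(prompt: str, output: str, chain_id: str, prev_hash: str) -> dict:
    """Build one record whose hash covers its content and its predecessor."""
    record = {
        "chain_id": chain_id,
        "input_hash": sha256_hex(prompt),
        "output_hash": sha256_hex(output),
        "prev_hash": prev_hash,
    }
    # Sorted keys give a canonical form, so equal records hash identically.
    record["record_hash"] = sha256_hex(json.dumps(record, sort_keys=True))
    return record


def verify_chain(records: list) -> bool:
    """Recompute every hash; an edited or reordered record breaks the chain."""
    prev_hash = "0" * 64  # hypothetical genesis value
    for record in records:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record["prev_hash"] != prev_hash:
            return False
        if sha256_hex(json.dumps(body, sort_keys=True)) != record["record_hash"]:
            return False
        prev_hash = record["record_hash"]
    return True


first = make_record("extract features", "features: ...", "chain-1", "0" * 64)
second = make_record("assess risk", "risk: low", "chain-1", first["record_hash"])
chain_ok = verify_chain([first, second])

# Changing a stored hash after the fact is detected on verification.
tampered = dict(second, output_hash=sha256_hex("risk: high"))
tamper_detected = not verify_chain([first, tampered])
```

Because each record embeds the hash of the one before it, altering any earlier entry invalidates every later one, which is what lets an auditor check integrity without trusting the writer.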

🛡️ PII Detection & Redaction

Built-in regex engine detects emails, phone numbers, SSNs, credit cards, IPs, Chinese ID cards, IBANs, AWS keys, and more. Three redaction strategies:

  • replace: [EMAIL]
  • hash: [SHA:ab12...]
  • mask: j***@example.com
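
The three strategies can be sketched for a single pattern as follows. This is an illustration only, not the library's redactor: the email regex, the `redact_email` helper, and the 8-character hash prefix are assumptions.

```python
import hashlib
import re

# Simplified email pattern, assumed here for illustration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_email(text: str, method: str = "replace") -> str:
    """Redact every email address in text using one of three strategies."""

    def _sub(match):
        value = match.group(0)
        if method == "replace":
            # Swap the value for a fixed type label.
            return "[EMAIL]"
        if method == "hash":
            # Keep a short digest so equal values remain correlatable.
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            return f"[SHA:{digest[:8]}]"
        if method == "mask":
            # Keep the first character and the domain, hide the rest.
            local, _, domain = value.partition("@")
            return f"{local[0]}***@{domain}"
        raise ValueError(f"unknown method: {method}")

    return EMAIL_RE.sub(_sub, text)


message = "Contact john@example.com now"
replaced = redact_email(message, "replace")
hashed = redact_email(message, "hash")
masked = redact_email(message, "mask")
```

The trade-off is between privacy and utility: replace discards the value entirely, hash allows joining records on the same value without revealing it, and mask stays human-readable while leaking part of the original.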

📊 Compliance Reports

Four report templates mapped directly to regulatory articles:

  • EU AI Act Art. 12/19 — usage period, input references, risk events, retention compliance
  • GDPR Art. 22 — automated decision records with algorithm logic, confidence scores, right-to-contest
  • GDPR Art. 30 — Records of Processing Activities (RoPA): purposes, data categories, retention periods
  • SOC2 — tamper-evident hash chain, model change detection, access logs

🔗 Data Lineage Tracking

Answer GDPR Art. 15 Data Subject Access Requests (DSARs): which LLM calls processed a given user's data? Full lineage exported per data_subject_id.

💾 Pluggable Storage

  • SQLite (default, zero config) — indexed queries, suitable for production at moderate scale
  • JSONL — log-pipeline friendly (Fluentd, Logstash, S3)
  • PostgreSQL — planned for v0.2
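
The JSONL option is friendly to log pipelines because each event is one self-contained JSON object per line. A minimal sketch of that format, using hypothetical `append_event` and `read_events` helpers rather than the library's storage backend:

```python
import json
import os
import tempfile


def append_event(path: str, event: dict) -> None:
    """Append one event as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, ensure_ascii=False) + "\n")


def read_events(path: str) -> list:
    """Read all events back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Write to a temporary directory so the sketch leaves no files behind in cwd.
log_path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
append_event(log_path, {"provider": "openai", "model": "gpt-4o", "chain_id": "c1"})
append_event(log_path, {"provider": "anthropic", "model": "claude-sonnet-4-6", "chain_id": "c1"})
events = read_events(log_path)
```

Append-only writes mean a shipper such as Fluentd or Logstash can tail the file, at the cost of the indexed queries that SQLite provides.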

⌨️ CLI Tools

auditlens stats
auditlens query
auditlens report
auditlens export
auditlens lineage

Installation

pip install auditlens
pip install "auditlens[dev]"   # with pytest, ruff, mypy (quotes keep shells like zsh from expanding the brackets)

Requirements: Python 3.9+. The only runtime dependency is click.


Usage

AuditEngine Configuration

from auditlens import AuditEngine

engine = AuditEngine(
    storage="sqlite:///audit.db",    # or "audit.jsonl"
    pii_enabled=True,
    pii_method="replace",            # replace | hash | mask
    store_raw_text=True,             # False = hash-only privacy mode
    environment="production",
    application_name="MyApp",
    application_version="1.0.0",
)

Decorator Usage

@engine.trace(provider="openai", model="gpt-4o")
def ask_llm(prompt: str) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

Async Support

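@engine.trace(provider="anthropic", model="claude-sonnet-4-6")
async def ask_llm_async(prompt: str) -> str:
    # messages.create requires max_tokens and returns a Message object,
    # so return the text of the first content block to match the -> str hint.
    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text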
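
For readers curious how one decorator can serve both sync and async functions, the sketch below shows a common pattern based on `inspect.iscoroutinefunction`. It is an assumption about the general technique, not AuditLens's actual `trace` implementation; `RECORDED` stands in for a storage backend and the stub functions replace real LLM clients.

```python
import asyncio
import functools
import inspect

RECORDED = []  # stand-in for a storage backend


def trace(provider: str, model: str):
    """Return a decorator that records each call's first argument and result."""

    def decorator(func):
        def _record(args, result):
            RECORDED.append({
                "provider": provider,
                "model": model,
                "function": func.__name__,
                "input": args[0] if args else None,
                "output": result,
            })

        if inspect.iscoroutinefunction(func):
            # Async functions need an async wrapper so the result is awaited.
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                result = await func(*args, **kwargs)
                _record(args, result)
                return result

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            _record(args, result)
            return result

        return sync_wrapper

    return decorator


@trace(provider="stub", model="echo")
def ask(prompt: str) -> str:
    return prompt.upper()


@trace(provider="stub", model="echo")
async def ask_async(prompt: str) -> str:
    await asyncio.sleep(0)  # placeholder for a real network call
    return prompt[::-1]


sync_answer = ask("hi")
async_answer = asyncio.run(ask_async("abc"))
```

Choosing the wrapper at decoration time keeps the decorated function's sync or async nature unchanged for callers.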

Context Manager — Multi-Step Chain

with audit_context(
    engine,
    session_id="sess-001",
    data_subject_id="user-42",
    purpose="loan_assessment",
    legal_basis="contract",
) as ctx:
    # All @engine.trace calls inside are linked under the same chain_id
    features = extract_features(application)
    decision = assess_risk(features)
    ctx.annotate(human_review_required=True)

Manual Recording

event = engine.record(
    input_text="Summarise this contract.",
    output_text="The contract covers...",
    provider="openai",
    model="gpt-4o",
    data_subject_id="user-42",
    processing_purpose="legal_review",
    legal_basis="legitimate_interest",
    decision_type="automated",
    confidence_score=0.97,
    retention_days=180,
)
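
The `retention_days` value implies a date after which a record is due for deletion or review. A small sketch of that arithmetic, assuming timezone-aware UTC timestamps; the helper names are hypothetical and how AuditLens itself enforces retention is not described here.

```python
from datetime import datetime, timedelta, timezone


def retention_deadline(recorded_at: datetime, retention_days: int) -> datetime:
    """Return the moment a record's retention period ends."""
    return recorded_at + timedelta(days=retention_days)


def is_expired(recorded_at: datetime, retention_days: int, now: datetime) -> bool:
    """True once the retention period has fully elapsed."""
    return now >= retention_deadline(recorded_at, retention_days)


recorded = datetime(2025, 1, 1, tzinfo=timezone.utc)
deadline = retention_deadline(recorded, 180)

still_kept = not is_expired(recorded, 180, datetime(2025, 6, 1, tzinfo=timezone.utc))
due_for_deletion = is_expired(recorded, 180, datetime(2025, 7, 1, tzinfo=timezone.utc))
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the helper, keeps the check deterministic and easy to test.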

Compliance Reports

from auditlens.reports import (
    EUAIActReportGenerator,
    GDPRArticle22ReportGenerator,
    GDPRArticle30ReportGenerator,
    SOC2ReportGenerator,
)
from auditlens.storage import create_storage

storage = create_storage("audit.db")

# EU AI Act Art. 12/19
print(EUAIActReportGenerator(storage).to_json(system_name="My AI"))

# GDPR Art. 22 — automated decision records
print(GDPRArticle22ReportGenerator(storage).to_json(
    controller_name="Acme Corp",
    dpo_contact="dpo@acme.com",
))

# GDPR Art. 30 — Records of Processing Activities
print(GDPRArticle30ReportGenerator(storage).to_csv())   # JSON and CSV supported

# SOC2
print(SOC2ReportGenerator(storage).to_json(organization="Acme Corp"))

CLI Reference

export AUDITLENS_DB=audit.db

# Summary statistics
auditlens stats
auditlens stats --format json

# Query audit events
auditlens query --provider openai --limit 50
auditlens query --session-id sess-123 --format json
auditlens query --start 2025-01-01 --end 2025-12-31

# Generate compliance reports
auditlens report --type eu-ai-act
auditlens report --type gdpr-art22 --controller "Acme Corp"
auditlens report --type gdpr-art30 --format csv --output ropa.csv
auditlens report --type soc2 --org "Acme Corp" --output soc2.json

# Export raw data
auditlens export --format jsonl --output events.jsonl
auditlens export --format csv --output events.csv

# Data lineage — answer DSARs
auditlens lineage --subject-id user-42
auditlens lineage --subject-id user-42 --format json
auditlens lineage --request-id <event-id>
auditlens lineage --chain-id <chain-id>

Data Lineage

from auditlens.lineage import LineageTracker

tracker = LineageTracker(storage)
summary = tracker.get_subject_summary("user-42")
# {
#   "subject_id": "user-42",
#   "total_llm_calls": 47,
#   "providers_used": ["openai", "anthropic"],
#   "processing_purposes": ["support", "analytics"],
#   "data_categories": ["name", "email"],
#   ...
# }
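
Conceptually, a subject summary is an aggregation over all events that carry the same data_subject_id. The sketch below reproduces a few of the summary fields over a plain list of dicts; it is not the `LineageTracker` implementation, and `summarize_subject` and the sample events are made up for illustration.

```python
def summarize_subject(events: list, subject_id: str) -> dict:
    """Aggregate the events that processed one data subject's data."""
    matching = [e for e in events if e.get("data_subject_id") == subject_id]
    return {
        "subject_id": subject_id,
        "total_llm_calls": len(matching),
        # Sets remove duplicates; sorting makes the output stable.
        "providers_used": sorted({e["provider"] for e in matching}),
        "processing_purposes": sorted({e["processing_purpose"] for e in matching}),
    }


sample_events = [
    {"data_subject_id": "user-42", "provider": "openai", "processing_purpose": "support"},
    {"data_subject_id": "user-42", "provider": "anthropic", "processing_purpose": "analytics"},
    {"data_subject_id": "user-42", "provider": "openai", "processing_purpose": "support"},
    {"data_subject_id": "user-7", "provider": "openai", "processing_purpose": "support"},
]

subject_summary = summarize_subject(sample_events, "user-42")
```

This only works if every call is tagged with a data_subject_id at record time, which is why the context manager and `engine.record` both accept that field.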

Supported Regulations

| Regulation | Articles covered | Report type |
| --- | --- | --- |
| EU AI Act | Art. 12 (record-keeping), Art. 19 (automatically generated logs) | eu-ai-act |
| GDPR | Art. 22 (automated decisions), Art. 30 (processing records) | gdpr-art22, gdpr-art30 |
| SOC 2 | CC7 (tamper-evident logs, access audit) | soc2 |
| NIST AI RMF | GOVERN 1.7, MAP 1.5 (traceability & accountability) | lineage + chain logs |
| ISO 42001 | Clause 8.4 (AI system operation records) | lineage + chain logs |

EU AI Act timeline: Enforcement begins August 2026. Report format follows pre-enforcement technical guidance; minor updates may be needed when implementing acts are published. Early adoption gives you a head start.


PII Detection — Limitations & Scope

AuditLens uses a regex-based pattern-matching engine for PII detection. This is intentional — it keeps the library dependency-free and fast — but comes with well-defined trade-offs.

What the engine covers well

| Pattern | Example |
| --- | --- |
| Email addresses | user@example.com |
| US/international phone numbers | +1-800-555-0100, +44 7911 123456 |
| US Social Security Numbers | 123-45-6789 |
| Credit card numbers (Visa/MC/Amex/Discover) | 4111 1111 1111 1111 |
| IPv4 / IPv6 addresses | 192.168.1.1 |
| Chinese ID cards (18-digit) | 110101199003077777 |
| UK NIN | AB123456C |
| IBAN | GB33BUKB20201555555555 |
| AWS access keys | AKIAIOSFODNN7EXAMPLE |
| API key / secret heuristic | api_key=abc123... |

Known false positives

| Pattern | False-positive scenario |
| --- | --- |
| PASSPORT ([A-Z]{1,2}\d{6,9}) | Software build IDs, license keys |
| IBAN | EU regulation codes with similar structure |
| IP_ADDRESS | Version strings in dotted-quad notation |
| PHONE | Long numeric sequences (order IDs, reference numbers) |
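
The PASSPORT false positive is easy to reproduce with the pattern quoted above. The sketch below also tries a variant anchored with word boundaries, which is an assumption for illustration and not a pattern the library is known to ship; the sample strings are made up.

```python
import re

# Pattern as quoted in the false-positive list above.
PASSPORT_RE = re.compile(r"[A-Z]{1,2}\d{6,9}")
# Hypothetical stricter variant: the match must be a standalone token.
BOUNDED_RE = re.compile(r"\b[A-Z]{1,2}\d{6,9}\b")

passport_text = "Passport no. AB1234567"
build_text = "Deployed build RELEASEQA20240117"

# The unanchored pattern matches inside the longer build identifier.
loose_hit = PASSPORT_RE.search(build_text)

# With word boundaries the embedded match disappears, and the real one stays.
bounded_build_hit = BOUNDED_RE.search(build_text)
bounded_passport_hit = BOUNDED_RE.search(passport_text)
```

Word boundaries only remove matches embedded in longer tokens. A standalone build ID shaped like AB1234567 would still match, so structural patterns of this kind remain inherently ambiguous.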

Known false negatives

The regex engine cannot detect:

  • Person names ("John Smith", "张伟")
  • Physical addresses in free text
  • Dates of birth in natural language
  • Implicit identifiers (account nicknames, usernames)

Recommended use

| Scenario | Recommendation |
| --- | --- |
| Dev / staging audit log review | ✅ Built-in engine is sufficient |
| Catching structured PII in LLM I/O | ✅ Works well with pii_method="replace" |
| Production compliance gateway (standalone) | ⚠️ Supplement with Microsoft Presidio |
| GDPR Article 17 erasure completeness proof | ⚠️ Use data_subject_id lineage tracking |

Roadmap: v0.2 will add optional Presidio integration (pip install auditlens[presidio]) for NLP-backed entity recognition.


Architecture

auditlens/
├── core/
│   ├── engine.py        # AuditEngine — central coordinator
│   ├── interceptor.py   # @engine.trace decorator + audit_context() manager
│   ├── models.py        # AuditEvent, DecisionChain, DataLineage
│   └── config.py        # AuditConfig, PIIConfig, StorageConfig
├── pii/
│   ├── detector.py      # PIIDetector — regex scanning
│   ├── redactor.py      # PIIRedactor — replace / hash / mask
│   └── patterns.py      # Built-in PII patterns
├── lineage/
│   └── tracker.py       # LineageTracker — DSAR support
├── storage/
│   ├── base.py          # StorageBackend ABC
│   ├── sqlite.py        # SQLite (default)
│   └── jsonl.py         # JSONL file
├── reports/
│   ├── eu_ai_act.py     # EU AI Act Art. 12/19 report
│   ├── gdpr.py          # GDPR Art. 22 + Art. 30 reports
│   └── soc2.py          # SOC2 audit report
└── cli/
    └── main.py          # Click-based CLI

Development

git clone https://github.com/hidearmoon/auditlens.git
cd auditlens
pip install -e ".[dev]"

pytest
ruff check .
ruff format --check .
mypy auditlens/

Contributing

See CONTRIBUTING.md. All public APIs must have docstrings; new features must include tests (maintain ≥80% coverage). Keep the dependency footprint minimal.


License

Apache 2.0 — see LICENSE.

Built by OpenForge AI — open-source tools for AI safety, observability, and compliance.
