
ThreatPrism

AI-Assisted SOC Analysis with Deterministic Guardrails

ThreatPrism is an AI-assisted SOC analysis pipeline with deterministic guardrails, evidence-first reporting, and multi-source security log ingestion.

What It Is

ThreatPrism is a CLI-first SOC analysis system that ingests Windows EVTX-derived JSONL, AWS CloudTrail logs, and GCP Audit Logs. It normalizes events into a common envelope, applies prompt-injection guardrails, uses an LLM only for structured extraction, validates the output, renders deterministic reports, and persists analysis records to SQLite.

ThreatPrism assists SOC analysts; it does not execute response actions.

Why It Exists

Security analysts often need to triage noisy host and cloud telemetry without losing provenance or over-trusting generated text. ThreatPrism keeps deterministic processing around the model boundary so AI can help summarize and structure evidence while the system preserves traceability, validation, and analyst control.

The tool is designed for analyst augmentation, not autonomous remediation.

Architecture

ThreatPrism uses a linear pipeline:

Source -> Ingest -> Normalize -> Sanitize -> Enrich -> LLM Analyze -> Validate Output -> Report -> Persist

See docs/ARCHITECTURE.md for the full architecture guide.
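The linear pipeline above can be sketched as a chain of stage functions. This is an illustrative sketch only; the stage names and signatures below are assumptions, not ThreatPrism's actual module API.

```python
from typing import Callable, Iterable

# A stage takes a list of event dicts and returns a transformed list.
Stage = Callable[[list[dict]], list[dict]]

def run_pipeline(records: list[dict], stages: Iterable[Stage]) -> list[dict]:
    """Apply each stage in order; every stage is deterministic except the
    LLM-analyze step, which is skipped entirely under --dry-run."""
    for stage in stages:
        records = stage(records)
    return records

def sanitize(records: list[dict]) -> list[dict]:
    """Toy sanitize stage: redact one instruction-like marker string."""
    return [dict(r, message=r.get("message", "").replace("IGNORE PREVIOUS", "[REDACTED]"))
            for r in records]

events = run_pipeline([{"message": "user login ok"}], [sanitize])
```

Keeping every stage a plain function over event dicts is what makes the deterministic stages easy to test in isolation.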

Supported Inputs

  • Windows event logs pre-parsed from EVTX to JSONL
  • AWS CloudTrail JSON or JSONL
  • AWS CloudTrail CSV after conversion with scripts/aws_csv_to_jsonl.py
  • GCP Audit Logs JSON or JSONL
  • Prompt-injection lab datasets under data/redteam/

ThreatPrism processes one source type per run. Mixed-source correlation is a future enhancement.

Processing Pipeline

  1. Detect the source type from CLI flags, file extensions, and schema markers.
  2. Ingest records from the selected source.
  3. Normalize records into a common event envelope with source_file, record_index, and optional event_id.
  4. Sanitize instruction-like content before model analysis.
  5. Enrich cloud logs with deterministic context such as plane tags and identity hints.
  6. Send constrained batches to the selected LLM provider unless --dry-run is used.
  7. Validate model output against schemas, policy rules, and optional semantic checks.
  8. Render the structured findings into a deterministic report with Python.
  9. Persist run metadata, findings, hypotheses, IOCs, and report text to SQLite.
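The common event envelope from step 3 might look roughly like this; the field names beyond source_file, record_index, and event_id (and the raw-record keys) are illustrative assumptions.

```python
def normalize(raw: dict, source_file: str, record_index: int) -> dict:
    """Wrap a raw record in a common envelope. event_id is optional
    (e.g. GCP insertId) and may be absent for some sources."""
    return {
        "source_file": source_file,
        "record_index": record_index,
        "event_id": raw.get("insertId") or raw.get("eventID"),
        "raw": raw,
    }

env = normalize({"insertId": "abc123", "methodName": "SetIamPolicy"},
                "gcp_audit.jsonl", 7)
```

The envelope carries provenance from ingestion all the way to the report, so every downstream finding can cite its source record.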

Guardrails

  • LLM inputs and outputs are treated as untrusted.
  • LLM output is constrained, validated, and policy-checked before reporting.
  • Pydantic schemas define the structured output contract.
  • Policy checks block unsafe authority claims and completed-action language.
  • Prompt firewall logic redacts or quarantines instruction-like strings before LLM analysis.
  • --dry-run validates ingestion and guardrail behavior without external LLM calls.
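A policy check that blocks completed-action language could work roughly as follows. The patterns here are illustrative examples, not ThreatPrism's actual rule set.

```python
import re

# Illustrative patterns for unsafe authority claims and completed-action
# language; the real rule set lives in the project's policy checks.
BLOCKED_PATTERNS = [
    r"\b(host|account) (was|has been) (isolated|disabled)\b",
    r"\bI (blocked|quarantined|remediated)\b",
    r"\bconfirmed malicious\b",
]

def policy_violations(finding_text: str) -> list[str]:
    """Return the patterns a finding violates; an empty list means it passes."""
    return [p for p in BLOCKED_PATTERNS
            if re.search(p, finding_text, re.IGNORECASE)]

ok = policy_violations("Suspicious ConsoleLogin from new ASN")
bad = policy_violations("The host was isolated and remediated.")
```

Findings that trigger any pattern are rejected before they can reach the report, keeping the output within the assistant's advisory scope.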

Evidence-First Reporting

Reports are deterministic and evidence-first. The LLM does not write the narrative report; Python renders structured findings into a repeatable text format.

Every finding must cite provenance:

  • source_file
  • record_index
  • event_id when available, such as GCP insertId
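Deterministic rendering of a finding with its provenance citation might look like this; the output format and field names are illustrative, not ThreatPrism's actual report layout.

```python
def render_finding(f: dict) -> str:
    """Render one finding with its required provenance citation."""
    cite = f"{f['source_file']}#{f['record_index']}"
    if f.get("event_id"):
        cite += f" (event_id={f['event_id']})"
    return f"- {f['summary']} [{cite}]"

line = render_finding({
    "summary": "SetIamPolicy granted roles/owner to external principal",
    "source_file": "gcp_audit.jsonl",
    "record_index": 42,
    "event_id": "abc123",
})
```

Because rendering is plain string formatting over validated fields, the same findings always produce the same report text.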

Observability / AIOps Artifacts

Each run writes operational artifacts under runs/<run_id>/:

  • run_log.jsonl
  • metrics.json
  • what_broke.md on failure
  • evidence.txt when generated with the evidence helper

Useful helpers:

python scripts/evidence_artifact.py --run-id <run_id>
python scripts/failure_drill_1.py

See RUNBOOK.md and docs/THREATPRISM_AIOPS_CODEX_SPEC.md.
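Since run_log.jsonl is line-delimited JSON, a quick triage pass over a run's log can be a few lines of Python. This sketch assumes each log line is a JSON object with a level field, which may not match the actual log schema.

```python
import io
import json

def error_lines(fp) -> list[dict]:
    """Collect records at level ERROR from a run_log.jsonl stream.
    Assumes each line is a JSON object with a 'level' field."""
    return [rec for rec in (json.loads(line) for line in fp if line.strip())
            if rec.get("level") == "ERROR"]

# In practice fp would be open("runs/<run_id>/run_log.jsonl").
sample = io.StringIO(
    '{"level": "INFO", "msg": "ingest ok"}\n'
    '{"level": "ERROR", "msg": "schema validation failed"}\n'
)
errs = error_lines(sample)
```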

Example Workflow

Validate the bundled Windows sample without calling an LLM:

python -m src.main --input data/evtx_sample --dry-run

Convert the bundled AWS sample CSV and validate ingestion:

python scripts/aws_csv_to_jsonl.py data/sample_cloudtrail.csv data/sample_aws.jsonl
python -m src.main --input data/sample_aws.jsonl --source aws --dry-run

Validate the bundled GCP synthetic mini-lab:

python -m src.main --input data/gcp_synthetic_minilab.jsonl --source gcp --dry-run

Run model-backed analysis only after configuring a real provider key:

python -m src.main --input data/evtx_sample --provider gemini --model gemini-flash-latest
python -m src.main --input data/evtx_sample --provider openai --model gpt-4o

Security Limitations

  • ThreatPrism does not determine that activity is definitively malicious.
  • ThreatPrism does not isolate hosts, disable accounts, block network traffic, or modify cloud resources.
  • LLM extraction can be incomplete or incorrect, so analyst review is required.
  • AWS and GCP plane tagging is heuristic and conservative.
  • Binary EVTX parsing is out of scope; EVTX must be converted to JSONL first.
  • Raw CloudTrail records and sensitive request/response payloads are intentionally not stored in SQLite.
  • External LLM calls require valid provider credentials and should not be used for local validation unless explicitly intended.

Run Locally

Clone the repository (the planned public URL):

git clone https://github.com/mwill20/threatprism.git
cd threatprism

Create and activate a virtual environment:

python -m venv .venv

PowerShell:

.\.venv\Scripts\Activate.ps1

bash:

source .venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Optional provider setup for non-dry-run analysis:

cp .env.example .env

Then add GEMINI_API_KEY or OPENAI_API_KEY to .env.

Safe local validation commands:

python -m src.main --input data/evtx_sample --dry-run
python -m pytest
python -m compileall .

Project Status

ThreatPrism is a local CLI project for AI-assisted SOC analysis experiments and demonstrations. Current functionality includes Windows, AWS, and GCP ingestion paths; deterministic reporting; SQLite persistence; prompt-injection guardrails; and run-level observability artifacts.

Planned improvements include a richer analyst UI, multi-source correlation, stronger offline evaluation workflows, and broader detection coverage.

License and Attribution
