An AI SOC triage agent for Microsoft Sentinel and Defender XDR, built on the Claude Agent SDK. It does the work of a Tier-1 analyst — pulls the incident, enriches the entities, runs hunting queries, and produces a verdict with a full evidence trail — then hands off to a human analyst with a proposed deep-dive plan.
The agent never closes incidents or takes response actions on its own. It produces verdicts; analysts decide.
Given a Sentinel or Defender XDR incident ID, the agent:
- Pulls the incident envelope, alerts, and entities (Microsoft Graph Security API).
- Enriches entities — IPs/hashes/domains through VirusTotal, GreyNoise, AbuseIPDB, MS Threat Intel; users through Entra ID risky-user score + MFA state; devices through MDE timeline.
- Runs targeted KQL hunts in both Sentinel Log Analytics and Defender XDR Advanced Hunting.
- Returns a strictly typed verdict (zod-validated): classification, confidence, evidence chain, deep-dive plan, recommended actions.
- Streams every step to the UI so analysts can watch and intervene.
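The verdict contract can be sketched in plain TypeScript. The repo uses zod for this; the hand-rolled validator below is a dependency-free stand-in, and the field names (`classification`, `confidence`, `claims`, `evidence`, `deepDive`) and the classification labels are assumptions, not the repo's actual `Verdict` schema. What it illustrates is the rule that a claim without evidence fails validation, so an unsupported verdict cannot be submitted.

```typescript
// Hypothetical verdict shape; the real schema lives in packages/shared (zod).
type Classification = "TruePositive" | "FalsePositive" | "Benign";

interface Evidence {
  source: string; // where the fact came from, e.g. a KQL hunt or an enrichment call
  detail: string; // the concrete fact, with numbers and timestamps
}

interface Claim {
  statement: string;
  evidence: Evidence[];
}

interface Verdict {
  classification: Classification;
  confidence: number; // 0..1
  claims: Claim[];
  deepDive: boolean;
  recommendedActions: string[];
}

const CLASSIFICATIONS: readonly string[] = ["TruePositive", "FalsePositive", "Benign"];

// Returns validation errors; an empty list means the verdict is accepted.
function validateVerdict(v: Verdict): string[] {
  const errors: string[] = [];
  if (!CLASSIFICATIONS.includes(v.classification)) {
    errors.push(`unknown classification: ${v.classification}`);
  }
  if (!(v.confidence >= 0 && v.confidence <= 1)) {
    errors.push(`confidence out of range: ${v.confidence}`);
  }
  if (v.claims.length === 0) {
    errors.push("verdict has no claims");
  }
  v.claims.forEach((c, i) => {
    // Evidence-with-every-claim: a claim with no supporting evidence is rejected.
    if (c.evidence.length === 0) {
      errors.push(`claim ${i} has no evidence: "${c.statement}"`);
    }
  });
  return errors;
}
```

Returning a list of errors rather than a boolean mirrors how a schema library reports issues, which gives the agent something concrete to fix on a retry.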
Three example incidents shipped with the repo demonstrate the spectrum:
| Fixture | Truth | Agent verdict | What it shows |
|---|---|---|---|
| `inc-001-failed-signins-fp` | False positive | FalsePositive 95% | Failed sign-ins resolved by a self-service password reset. The agent spots the AuditLog row between the failures and the success and clears it. |
| `inc-002-brute-force-investigate` | True positive | TruePositive 97% · deep dive | Password spray from a known-bad IP succeeds against a no-MFA service account, followed by SharePoint downloads. The agent independently flags the CA-policy gap and the PII-breach angle. |
| `inc-003-malware-tp` | True positive | TruePositive 98% · deep dive | Wacatac trojan with C2 callout. The agent notes Defender's remediation was partial and proposes memory forensics + tenant-wide IOC sweep. |
```text
┌─────────────────────┐    ┌──────────────────┐    ┌────────────────────┐
│ Sentinel automation │──▶ │ /api/ingest/     │──▶ │ Redis queue        │
│ → Logic App         │    │ sentinel         │    │ (BullMQ)           │
└─────────────────────┘    └──────────────────┘    └─────────┬──────────┘
                                                             │
                                                             ▼
                  ┌─────────────────────────────────────────────────────┐
                  │ Worker service (Claude Agent SDK)                   │
                  │ • Triage Agent (Sonnet 4.6, read-only tools)        │
                  │ • SOC MCP tools → Microsoft Graph + Log Analytics   │
                  │ • submit_verdict (zod-validated)                    │
                  │ • Streams every step to Redis pub/sub + Postgres    │
                  └────────────────────────┬────────────────────────────┘
                                           │
                                           ▼
┌────────────────────┐    ┌──────────────────────────────────────────┐
│ Postgres           │◀── │ Next.js UI                               │
│ • incidents        │    │ • Queue (verdict pills, status)          │
│ • triage_runs      │───▶│ • Incident detail (verdict + evidence +  │
│ • tool_calls       │    │   deep-dive plan + decision panel)       │
│ • agent_thoughts   │    │ • SSE stream of live agent reasoning     │
│ • decisions        │    │ • Eval report (accuracy, FN rate)        │
└────────────────────┘    └──────────────────────────────────────────┘
```
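The last hop in the diagram, from Redis pub/sub to the browser, is a server-sent events (SSE) stream. The sketch below turns one agent step into an SSE frame. The `AgentEvent` shape and its `kind` values are hypothetical; the wire format (`id:`, `event:` and `data:` lines terminated by a blank line) is the standard SSE format.

```typescript
// Hypothetical event published by the worker for each agent step.
interface AgentEvent {
  runId: string;
  seq: number; // monotonically increasing per run, reused as the SSE id
  kind: "thought" | "tool_call" | "tool_result" | "verdict";
  payload: unknown;
}

// Encode one event as an SSE frame. JSON.stringify without indentation escapes
// newlines inside strings, so a single `data:` line is always enough.
function toSseFrame(e: AgentEvent): string {
  return `id: ${e.seq}\nevent: ${e.kind}\ndata: ${JSON.stringify(e)}\n\n`;
}
```

A route handler would write these frames to a response served with `content-type: text/event-stream`. Using `seq` as the SSE id lets a reconnecting browser send `Last-Event-ID` so the server can resume where the stream dropped.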
This is the fastest way to see the UI. No Postgres, no Redis, no Azure access: the UI reads pre-computed verdicts from `replay-output/`.

```bash
git clone https://github.com/rod-trent/SIEMTriage.git
cd SIEMTriage
npm install
cp .env.example .env                         # add ANTHROPIC_API_KEY for the replay step
npm run replay -- --all                      # generates replay-output/*.json
echo "TRIAGE_DEMO_MODE=1" > apps/web/.env.local
npm run dev:web                              # http://localhost:3000
```

Pages:
- `/`: incident queue
- `/incidents/<id>`: verdict, evidence, deep-dive plan, decision panel, live-streaming agent reasoning
- `/eval`: accuracy, false-negative rate, confusion matrix, per-case results
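Demo mode amounts to choosing a verdict source based on `TRIAGE_DEMO_MODE`. Below is a sketch of that switch with the file reader and the database query injected, so it runs without touching disk and stays synchronous for brevity. The function names and the `replay-output/<id>.json` naming are assumptions inferred from the `replay-output/*.json` glob above.

```typescript
// Reads a file's text, or returns null when it does not exist. Injected so the
// sketch does not depend on a particular filesystem layout.
type ReadText = (path: string) => string | null;

interface StoredVerdict {
  incidentId: string;
  classification: string;
  confidence: number;
}

// Hypothetical loader: with TRIAGE_DEMO_MODE=1 the UI serves the replay
// output; otherwise it falls back to the database query passed in.
function loadVerdict(
  incidentId: string,
  env: Record<string, string | undefined>,
  readText: ReadText,
  fromDatabase: (id: string) => StoredVerdict | null,
): StoredVerdict | null {
  if (env.TRIAGE_DEMO_MODE === "1") {
    // Only plain incident ids map to a file; this rejects "../" style ids.
    if (!/^[A-Za-z0-9_-]+$/.test(incidentId)) return null;
    const raw = readText(`replay-output/${incidentId}.json`);
    return raw === null ? null : (JSON.parse(raw) as StoredVerdict);
  }
  return fromDatabase(incidentId);
}
```

In Node, `readText` could be `(p) => (existsSync(p) ? readFileSync(p, "utf8") : null)` from `node:fs`. The id check matters because the id arrives from the URL path.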
For end-to-end with persistence, ingestion, and analyst decisions:

```bash
# 1. Bring up Postgres + Redis
docker compose up -d

# 2. Env
cp .env.example .env
# Fill in ANTHROPIC_API_KEY, leave TRIAGE_BACKEND=fixture for now

# 3. Apply schema
cd packages/db && npx drizzle-kit push && cd ../..

# 4. Run the worker in one terminal
npm run worker

# 5. Run the web app in another
npm run dev:web

# 6. Enqueue a fixture incident
curl -X POST http://localhost:3000/api/ingest/sentinel \
  -H "content-type: application/json" \
  -d '{"id":"inc-001-failed-signins-fp","title":"Failed sign-ins","severity":"Medium","source":"Sentinel","createdTimeUtc":"2026-05-10T14:32:00Z"}'
```

The worker picks it up, runs the agent, streams every step to the incident page in real time, and stops at `awaiting_decision` for you to click Close or Proceed.
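The run lifecycle that ends at `awaiting_decision` can be sketched as a small state machine. Only `awaiting_decision`, the Close/Proceed choice, and the `deep_dive` job come from this README. The other status names, the event names, and the transition table are assumptions for illustration.

```typescript
// Hypothetical run statuses; only "awaiting_decision" is named in the README.
type RunStatus =
  | "queued"
  | "running"
  | "awaiting_decision"
  | "closed"
  | "deep_dive_queued"
  | "failed";

type RunEvent = "start" | "verdict_submitted" | "error" | "analyst_close" | "analyst_proceed";

// Allowed transitions. Agent-driven events can only move a run as far as
// awaiting_decision; only analyst events leave that state.
const TRANSITIONS: Record<RunStatus, Partial<Record<RunEvent, RunStatus>>> = {
  queued: { start: "running" },
  running: { verdict_submitted: "awaiting_decision", error: "failed" },
  awaiting_decision: { analyst_close: "closed", analyst_proceed: "deep_dive_queued" },
  closed: {},
  deep_dive_queued: {},
  failed: {},
};

function transition(from: RunStatus, event: RunEvent): RunStatus {
  const to = TRANSITIONS[from][event];
  if (to === undefined) {
    throw new Error(`illegal transition: ${from} --${event}-->`);
  }
  return to;
}
```

Making illegal transitions throw, rather than silently ignoring them, is what keeps the "agent never closes incidents" guarantee checkable in code.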
```bash
# In .env
TRIAGE_BACKEND=azure
AZURE_TENANT_ID=<your-tenant>
AZURE_LOG_ANALYTICS_WORKSPACE_ID=<workspace-guid>

# Optional threat-intel keys: missing keys mean the source is silently skipped
VIRUSTOTAL_API_KEY=
GREYNOISE_API_KEY=
ABUSEIPDB_API_KEY=
```

Auth: uses `DefaultAzureCredential`. Locally, run `az login`; in Azure, use a managed identity granted:
- Microsoft Graph: `SecurityIncident.Read.All`, `SecurityAlert.Read.All`, `ThreatHunting.Read.All`, `User.Read.All`, `IdentityRiskyUser.Read.All`, `UserAuthenticationMethod.Read.All`
- Log Analytics workspace: `Log Analytics Reader`
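The "missing keys mean the source is silently skipped" rule can be sketched as a pure function over the environment. The variable names are the ones in the `.env` snippet above. The `loadConfig` helper, the default to the fixture backend, and treating an empty value as missing are assumptions; the last one seems natural because `.env.example` lists the keys with empty values.

```typescript
type Backend = "fixture" | "azure";

interface TriageConfig {
  backend: Backend;
  intelSources: string[]; // optional third-party TI sources that have a key
}

// Env var name -> threat-intel source, for the optional keys in .env.
const INTEL_KEYS: Record<string, string> = {
  VIRUSTOTAL_API_KEY: "VirusTotal",
  GREYNOISE_API_KEY: "GreyNoise",
  ABUSEIPDB_API_KEY: "AbuseIPDB",
};

// Hypothetical loader. Defaults to the fixture backend, requires the Azure
// identifiers only for the azure backend, and drops any TI source whose key is
// unset or blank instead of failing the run.
function loadConfig(env: Record<string, string | undefined>): TriageConfig {
  const raw = env.TRIAGE_BACKEND ?? "fixture";
  if (raw !== "fixture" && raw !== "azure") {
    throw new Error(`TRIAGE_BACKEND must be "fixture" or "azure", got "${raw}"`);
  }
  const backend: Backend = raw;
  if (backend === "azure") {
    for (const name of ["AZURE_TENANT_ID", "AZURE_LOG_ANALYTICS_WORKSPACE_ID"]) {
      if (!env[name]?.trim()) throw new Error(`${name} is required when TRIAGE_BACKEND=azure`);
    }
  }
  const intelSources = Object.entries(INTEL_KEYS)
    .filter(([envName]) => (env[envName] ?? "").trim() !== "")
    .map(([, source]) => source);
  return { backend, intelSources };
}
```

Failing fast on the required Azure identifiers while tolerating missing TI keys matches the split in the snippet above: the workspace is essential, while each enrichment source is a nice-to-have.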
Sentinel ingestion: in the Sentinel portal, create an automation rule that triggers a Logic App on incident creation. The Logic App posts the incident envelope to `https://<your-host>/api/ingest/sentinel` with the header `X-Ingest-Secret: <INGEST_SECRET>`.
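On the receiving side, `/api/ingest/sentinel` has to reject requests without the right `X-Ingest-Secret`. Below is a minimal sketch of a constant-time check using Node's `crypto.timingSafeEqual`. How the actual route compares the header is not described here, so treat the helper name and its fail-closed behavior as assumptions. Hashing both values first produces equal-length buffers, which `timingSafeEqual` requires.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical helper for the ingest route. Returns false when the header is
// missing or the secret is not configured, so an unset INGEST_SECRET never
// means "accept everything".
function isValidIngestSecret(
  headerValue: string | null | undefined,
  expected: string | undefined,
): boolean {
  if (!headerValue || !expected) return false;
  // SHA-256 both sides so the buffers have equal length; timingSafeEqual throws otherwise.
  const a = createHash("sha256").update(headerValue).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```

The local `curl` example sends no such header. This README does not say whether the real route relaxes the check in development.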
Before letting the agent auto-close anything, replay it against your historical incidents and measure:

```bash
TRIAGE_BACKEND=azure npm run eval -- --since 2026-02-01 --until 2026-05-01 --max 200
```

Output:

```text
Total incidents: 187
Overall accuracy: 164/187 (87.7%)
False negatives: 2/64 real TPs (3.1%) ← the number that gates autonomy
No verdict: 1/187 (0.5%)
Mean tool calls: 10.3
Mean duration: 86.7s
=== Auto-close gate ===
Cases meeting auto-close criteria (FP/Benign + !deepDive + conf >= 0.9): 47
…of which were actually TPs: 0 ← MUST be zero before enabling auto-close
```
The dangerous auto-close count must be zero before any production auto-closure of incidents. The eval CLI exits non-zero when it isn't.
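The headline numbers above are simple ratios over per-case results. The sketch below computes them along with the auto-close gate. It assumes a per-case record holding the analyst's truth label, the agent's verdict (or none), confidence, and the deep-dive flag. The record shape, the `Benign` label spelling, counting a false negative as a real TP labelled FalsePositive or Benign, and the `evalExitCode` helper are all hypothetical. The gate criteria and the non-zero exit come from the output above.

```typescript
// Hypothetical per-case record produced by replaying one closed incident.
type Label = "TruePositive" | "FalsePositive" | "Benign";

interface EvalCase {
  truth: Label;          // the analyst's final classification
  verdict: Label | null; // null = the agent never called submit_verdict
  confidence: number;
  deepDive: boolean;
}

interface EvalSummary {
  total: number;
  correct: number;
  realTps: number;
  falseNegatives: number;      // real TPs the agent labelled FalsePositive or Benign
  noVerdict: number;
  autoCloseEligible: number;
  dangerousAutoCloses: number; // auto-close candidates that were actually TPs
}

// The gate criteria printed by the eval CLI: FP/Benign + !deepDive + conf >= 0.9.
function meetsAutoClose(c: EvalCase): boolean {
  const benignVerdict = c.verdict === "FalsePositive" || c.verdict === "Benign";
  return benignVerdict && !c.deepDive && c.confidence >= 0.9;
}

function summarize(cases: EvalCase[]): EvalSummary {
  const s: EvalSummary = {
    total: cases.length, correct: 0, realTps: 0, falseNegatives: 0,
    noVerdict: 0, autoCloseEligible: 0, dangerousAutoCloses: 0,
  };
  for (const c of cases) {
    if (c.verdict === null) s.noVerdict++;
    else if (c.verdict === c.truth) s.correct++;
    if (c.truth === "TruePositive") {
      s.realTps++;
      if (c.verdict === "FalsePositive" || c.verdict === "Benign") s.falseNegatives++;
    }
    if (meetsAutoClose(c)) {
      s.autoCloseEligible++;
      if (c.truth === "TruePositive") s.dangerousAutoCloses++;
    }
  }
  return s;
}

// Percentage with one decimal, as in "164/187 (87.7%)".
function pct(n: number, d: number): string {
  return d === 0 ? "0.0" : ((100 * n) / d).toFixed(1);
}

// Non-zero exit code whenever a single auto-close candidate was really a TP.
function evalExitCode(s: EvalSummary): number {
  return s.dangerousAutoCloses === 0 ? 0 : 1;
}
```

A CLI built on this would set `process.exitCode = evalExitCode(summary)`, so a CI job running the eval fails the moment one real TP slips into the auto-close bucket.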
```text
SIEMTriage/
├── packages/
│   ├── shared/              Zod schemas: Incident, Verdict, Evidence, KQL, Enrichment
│   ├── mcp-soc-tools/       SDK MCP server. Backend interface with two impls:
│   │   ├── backend/fixture  Reads from packages/mcp-soc-tools/fixtures/
│   │   └── backend/azure    Microsoft Graph + Log Analytics + XDR + Entra + MDE
│   ├── agent-core/          Triage Agent (system prompt + submit_verdict gate)
│   ├── db/                  Drizzle ORM schema for incidents/runs/tools/decisions
│   └── queue/               BullMQ + Redis pub/sub for SSE event streaming
├── apps/
│   ├── replay/              CLI: run the agent against fixture incidents
│   ├── eval/                CLI: replay against closed incidents + metrics
│   ├── worker/              Long-running service that consumes the queue
│   └── web/                 Next.js 15 UI (queue, incident detail, eval)
├── docker-compose.yml       Local Postgres + Redis
└── .env.example
```
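The `mcp-soc-tools` package describes a backend interface with a fixture and an Azure implementation. The sketch below shows that pattern with hypothetical method names. The fixture backend holds its data in memory instead of reading `packages/mcp-soc-tools/fixtures/`. The Azure implementation is omitted because this README does not show its Graph and Log Analytics calls.

```typescript
interface IncidentEnvelope {
  id: string;
  title: string;
  severity: string; // e.g. "Medium", as in the ingest example
}

type KqlRow = Record<string, unknown>;

// Hypothetical backend contract. MCP tool handlers depend only on this
// interface, so the same handler runs against fixtures or against Azure.
interface SocBackend {
  getIncident(id: string): Promise<IncidentEnvelope | null>;
  runKql(target: "sentinel" | "xdr", query: string): Promise<KqlRow[]>;
}

// Fixture backend: canned incidents and canned query results held in memory.
class FixtureBackend implements SocBackend {
  private readonly incidents: Map<string, IncidentEnvelope>;
  private readonly kqlResults: Map<string, KqlRow[]>;

  constructor(incidents: Map<string, IncidentEnvelope>, kqlResults: Map<string, KqlRow[]>) {
    this.incidents = incidents;
    this.kqlResults = kqlResults;
  }

  async getIncident(id: string): Promise<IncidentEnvelope | null> {
    return this.incidents.get(id) ?? null;
  }

  async runKql(target: "sentinel" | "xdr", query: string): Promise<KqlRow[]> {
    // Keyed by target + exact query text; an unknown query returns no rows.
    return this.kqlResults.get(`${target}:${query}`) ?? [];
  }
}

// A backend-agnostic tool handler: identical code path for fixture and Azure runs.
async function describeIncident(backend: SocBackend, id: string): Promise<string> {
  const inc = await backend.getIncident(id);
  return inc === null ? `incident ${id} not found` : `${inc.id} [${inc.severity}] ${inc.title}`;
}
```

Because tool handlers see only `SocBackend`, the replay and eval CLIs can exercise the exact agent code path that production uses, with only the data source swapped.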
- Two-agent split (Triage + Investigation): the Triage Agent runs fast and cheap on every incident; the Investigation Agent is designed to run only when an analyst clicks "Proceed deep dive." This keeps per-incident cost down when 80% of incidents close at triage, and means the analyst is approving "go deeper," not "do whatever."
- `submit_verdict` as the completion signal: the agent doesn't free-text its answer; it calls a tool whose zod schema is the verdict. Schema validation enforces evidence-with-every-claim by construction.
- `tools: []` in the agent config strips all built-in tools. The agent can only call SOC MCP tools and `submit_verdict`: no file system, no shell, no surprises.
- Read-only at the agent layer: response actions (isolate host, disable user, revoke sessions) are recommended in the verdict but only executed by an analyst clicking through the UI's decision panel. The audit trail is the analyst's, not the agent's.
- Cite numbers, not adjectives: the system prompt drills this in. "8 failed sign-ins followed by a successful auth at 14:30:08 UTC" beats "had some failures then succeeded." Tier-2 needs to reproduce the evidence.
- Eval gates autonomy: the `dangerousAutoCloses` metric in the eval harness must be zero before enabling any auto-closure path in production.
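The read-only boundary itself comes from the agent config (`tools: []` plus the SOC MCP server), not from application code. As an illustration of the same idea, here is a hypothetical after-the-fact audit over a run's recorded tool calls, such as rows in the `tool_calls` table. It flags anything outside an explicit allowlist and checks that the run ended with exactly one `submit_verdict`. The tool names follow the `mcp__<server>__<tool>` naming the Claude Agent SDK uses for MCP tools, but the server name `soc` and every individual tool name are assumptions.

```typescript
interface RunAudit {
  outOfPolicy: string[]; // recorded tool calls that are not on the allowlist
  completed: boolean;    // true when the run ended with exactly one submit call
}

// Hypothetical audit: the SDK config is the real enforcement; this only
// re-checks what a finished run actually did.
function auditRun(calls: string[], allowed: ReadonlySet<string>, submitTool: string): RunAudit {
  const outOfPolicy = calls.filter((name) => !allowed.has(name));
  const submits = calls.filter((name) => name === submitTool).length;
  const completed = submits === 1 && calls[calls.length - 1] === submitTool;
  return { outOfPolicy, completed };
}

// Hypothetical tool names on a SOC MCP server registered as "soc".
const ALLOWED_TOOLS: ReadonlySet<string> = new Set([
  "mcp__soc__get_incident",
  "mcp__soc__run_kql",
  "mcp__soc__enrich_ip",
  "mcp__soc__submit_verdict",
]);
```

A run that never reaches `submit_verdict` shows up as `completed: false`, which lines up with the "No verdict" row in the eval output.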
- Entra SSO: `apps/web/src/lib/auth.ts` returns a dev user from a cookie. Replace it with NextAuth + the Azure AD provider for production. Every UI/API surface reads through `getCurrentUser()`, so it's literally one file.
- Investigation Agent: the second-phase agent (broader tools, runs after an analyst clicks "Proceed deep dive") isn't implemented yet. The decision panel already calls `getTriageQueue().add("deep_dive", ...)`, so the wiring is there.
- Production deployment: no infra-as-code yet. The intended target is Azure Container Apps for both worker and web, Azure Database for PostgreSQL, Azure Cache for Redis, and managed identity for auth.
MIT