Releases: nuclide-research/ai-llm-redteam-operator
Releases · nuclide-research/ai-llm-redteam-operator
v0.2.0 - Agentic execution layer
First functional release as a two-stage agentic workflow: plan writes a Scenario Packet (the policy), run executes it against one authorized target as a sense-plan-act loop and returns an evidence-ledger Run Report.
Highlights
- New
runsubcommand andagent.py-RedTeamAgentruns the packet: recon pre-pass, signal evaluation, attack-chain walking that advances a step only on confirmation, and a Run Report (findings + chain outcomes + evidence ledger).planis unchanged and remains the default, so the barecategory Xform still works. - Four default-safe gates, each lifted only by an explicit flag:
- Authorization: no
--authorizereference, no send. - Dry-run by default: plans every request, sends nothing. A dry-run can never produce a confirmed finding.
- Single-host scope: every request is the target's scheme and host with a packet path appended. Redirects are captured, not followed, so a probe cannot walk the agent off-target.
- Two independent probe gates: a noise cap (
--max-aggressiveness) for reads and a separate mutation gate (--allow-writes) that blocks every write method until set.
- Authorization: no
- Evidence-backed findings only - a hypothesis is confirmed solely when a sent observation carries a matching status, header, or body token. Restraint is enforced in the loop: one proof artifact per step, a byte cap on every response sample, a global request budget.
- Optional LLM strategist (OpenAI-compatible endpoint via urllib, no SDK) ranks which chain to pursue first. Off unless an endpoint is supplied; the report records the data egress and warns on remote or plaintext endpoints.
- Standard library only, including the LLM path.
Security hardening
This release shipped after an adversarial multi-lens review (19 verified findings, 0 uncertain). Notable fixes:
- Critical: urllib's default opener auto-followed 3xx redirects off the target host, defeating the single-host scope lock and corrupting the evidence ledger. Fixed with a no-follow opener plus a final-URL assertion; scheme-relative and absolute packet paths are now rejected.
- Signal evaluator: single-direction path scoping, prefer-2xx evidence selection, confirmation gated on 2xx-with-body or a token on a 2xx response, punctuation-safe path extraction.
- Findings are evaluated for every test case independent of chain walking, so a confirmable exposure is never dropped when a chain stalls early.
- Mutation gate made independent of the noise cap (write probes rated
mediumwould otherwise have fired on the cap alone);_agg_rankfails safe on unknown labels.
Install
pip install -e .
ai-llm-redteam-operator plan platform LiteLLM
ai-llm-redteam-operator run platform LiteLLM --target https://10.0.0.5:4000 --authorize ENG-2026-014 # dry-runAuthorized assessment tooling. The agent performs network activity by design and is gated accordingly. Every scenario assumes explicit, written authorization for the target in scope.