The PRD review tool for product managers who don't have time to wait for a senior PM to read their doc.
Paste a PRD. Get a structured 100-point score across 10 dimensions, the single biggest gap, three specific rewrite suggestions, and the questions a reviewer would actually ask. Profile-aware: a startup PRD, a TCS engagement doc, and a Big Tech design doc are scored differently.
This is the production product. The repository also contains experimental modules (task breakdown, assignment, sprint tracking) — see Roadmap for current status.
Two modes for the analyzer:
| Mode | Latency | Cost | What it tells you |
|---|---|---|---|
scan |
<100ms | $0 | What sections exist, what's missing, structural completeness % |
analyze |
~3s | ~$0.03 | Full 100-point score, dimension breakdown, rewrite suggestions, reviewer questions |
Profile-aware scoring for:
startup— falsifiability, lean PRD conventionsit_services— SOW alignment, client-side acceptance, KT planbig_tech— design-doc conventions, alternatives mandatory, OKRsfinancial_services— regulatory considerations (see Compliance Caveat)
Three modes, in order of friction. Pick whichever fits.
pip install -r requirements.txt
python -m eval.runner --mode rules_only --max 3 # see it working in 1 secondScore any PRD with the deterministic regex + heuristic engine (82% band-pass on our 50-fixture eval — see LEADERBOARD.md). Free, offline, private. Useful as a CI gate or a fast pre-check.
from engine.module3_analyzer.prd_analyzer import PRDAnalyzer
analyzer = PRDAnalyzer() # no LLM client
result = analyzer.analyze(open("my_prd.md").read(), mode="rules_only")
print(f"{result['total_score']}/100 — {result['rating']}")# Install Ollama from https://ollama.com, then:
ollama pull llama3.1:8b
export AEGIS_LOCAL_BASE_URL=http://localhost:11434/v1
export AEGIS_LOCAL_MODEL=llama3.1:8b
python -m engine.verify --live # confirms local model respondsSame JSON output shape as the cloud path, but runs entirely on your hardware. Slower (~30-60s per PRD on CPU) but unlimited and fully private.
Pick either of these free providers — no credit card:
# Option A: Groq (recommended — fast, generous daily limit)
echo "GROQ_API_KEY=gsk_xxxxxxxxxxxx" >> .env # get one at console.groq.com
# Option B: OpenRouter (free models like deepseek-chat-v3:free)
echo "OPENROUTER_API_KEY=sk-or-xxxxxxxxxxxx" >> .env # openrouter.ai
python -m engine.verify --liveAegis auto-detects whichever key is present. Set AEGIS_REDACT=1 if you handle
PII or regulated data — see SECURITY.md. For detailed signup
steps and per-provider rate limits, see BUDGET.md.
from engine.ai_engine import AegisEngine
eng = AegisEngine()
# Free, instant: structural scan (no LLM)
scan = eng.scan_prd(open("my_prd.md").read())
print(f"Completeness: {scan['completeness']}%")
print(f"Missing: {scan['missing_sections']}")
# Full analysis (uses whatever's configured: rules, local, or cloud)
result = eng.analyze_prd(open("my_prd.md").read())
print(f"Score: {result['total_score']}/100 — {result['rating']}")
print(f"Critical gap: {result['critical_gap']}")
for tip in result['top_improvements']:
print(f" - {tip}") ┌──────────────────┐
│ PRD text input │
└────────┬─────────┘
│
┌──────────┴──────────┐
│ │
▼ ▼
┌──────────┐ ┌──────────────┐
│ Layer 1 │ │ Layer 1+2+3 │
│ scan() │ │ analyze() │
│ regex │ │ regex → │
│ <100ms │ │ LLM → │
│ free │ │ RAG ground │
└──────────┘ │ ~3s, $0.03 │
└──────┬───────┘
│
▼
┌──────────────┐
│ 100-pt score │
│ 10 dims │
│ rewrites │
│ Q&A │
└──────────────┘
Underneath:
- Module 1 — RAG Knowledge Base. Hybrid BM25 + dense vector + Reciprocal Rank Fusion + cross-encoder re-ranking, over a 99K-word curated PM corpus.
- Module 3 — Analyzer. 10-dimension rubric (problem statement, target user, goals, metrics, solution, risks, stakeholders, open questions, launch, writing). Profile-specific scoring overlays.
- LLMClient. Provider-agnostic (OpenAI / Anthropic / Azure / local OpenAI-compatible). Built-in redaction, audit logging, and local-model fallback. See SECURITY.md.
Every prompt change is gated by an eval set:
# Structural eval (no API key, runs in CI)
pytest eval/test_eval.py -v
# Full LLM eval (10 anchors × 5 perturbations = 50 fixtures)
# Costs $0.00 if you use Groq / OpenRouter / Gemini free tier — see BUDGET.md
python -m eval.runnerSee eval/README.md for acceptance thresholds.
| Module | Status | Notes |
|---|---|---|
| 1 — RAG knowledge base | ✅ Production | 99K-word corpus, hybrid retrieval |
| 2 — PRD generator | ✅ Production | Profile + template aware |
| 3 — PRD analyzer | ✅ Production — the wedge | 100-pt rubric, 50-fixture eval |
| 4 — Task breakdown | 🧪 Experimental | See module4_tasks/EXPERIMENTAL.md |
| 5 — Task assignment | 🧪 Experimental | Greedy bin-packing, naive skill match |
| 6 — Resource planner | 🧪 Experimental | Hardcoded focus factor, magic-number defaults |
| 7 — Sprint tracker | 🧪 Experimental | In-memory state only, no persistence |
Experimental modules are gated behind the env var AEGIS_ENABLE_EXPERIMENTAL=1.
Do not deploy experimental modules to production.
The financial_services profile enriches scoring with finserv-specific
heuristics — it does not make this system finserv-compliant.
In particular, the system as shipped:
- Sends PRD content to a third-party LLM unless redaction is enabled.
- Has no guaranteed data residency.
- Does not produce a regulator-grade audit trail.
- Has not been independently security-reviewed or SOC2-attested.
If you operate under SEC, FINRA, MiFID, GDPR-with-finserv-overlay, or
similar regimes, do not use the LLM-backed analyze path on
material non-public information without:
- Setting
AEGIS_REDACT=1(see SECURITY.md). - Configuring a local model fallback (
AEGIS_LOCAL_BASE_URL). - Independent legal sign-off on the data flow.
For a one-page customer-facing summary aimed at compliance officers, see TRUST.md. For the full technical detail (redaction patterns, audit-log schema, env vars, every guarantee + every non-guarantee), see SECURITY.md.
# Run all tests (no API key required for unit tests)
pytest tests/ eval/test_eval.py -v
# Eval the analyzer end-to-end
python -m eval.runner --scan-only # CI mode, no API
python -m eval.runner # Full mode, costs ~$0.15 with gpt-4o-miniWhen you change a prompt:
- Run the full eval.
- Commit the resulting
eval/results.json. - Diff
band_pass_rateanddim_*counters against the previous baseline. - Don't merge a regression without an explicit reason in the PR.