Currently we can only diff versions of the same prompt.
This feature would let you compare two completely different prompts against each other.
What to build:
- Run both prompts against the same test cases
- Show a side-by-side score table: clarity, specificity, eval score, latency
- Declare a winner, with reasoning from the LLM judge
- Handle prompts with no test cases by falling back to analysis scores only
Files to touch: src/commands/ (new compareCmd.ts)
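Below is one possible shape for the core of compareCmd.ts, written as a minimal sketch. Everything in it is hypothetical and not existing project code: the types `TestCase`, `PromptScores`, `Verdict`, and `CompareDeps`, the functions `scorePrompt`, `renderTable`, `buildJudgePrompt`, and `compare`, and the judge prompt wording. The LLM-backed steps (analysis, running one test case, the judge) are injected as dependencies. They are synchronous only to keep the sketch short; real LLM calls would return Promises. When no test cases are given, the eval score and latency come back `undefined`, render as `n/a`, and the judge is told to decide on clarity and specificity alone.

```typescript
// Hypothetical types for a prompt-vs-prompt comparison; not existing project code.
interface TestCase {
  input: string;
}

interface PromptScores {
  clarity: number; // analysis score (scale assumed)
  specificity: number; // analysis score (scale assumed)
  evalScore?: number; // mean eval score over test cases; undefined if none were run
  latencyMs?: number; // mean latency per test case; undefined if none were run
}

interface Verdict {
  winner: "A" | "B" | "tie";
  reasoning: string;
}

// LLM-backed steps are injected (and kept synchronous here for brevity).
interface CompareDeps {
  analyze(prompt: string): { clarity: number; specificity: number };
  runCase(prompt: string, testCase: TestCase): { score: number; latencyMs: number };
  judge(judgePrompt: string): Verdict;
}

function mean(xs: number[]): number | undefined {
  return xs.length === 0 ? undefined : xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function scorePrompt(prompt: string, cases: TestCase[], deps: CompareDeps): PromptScores {
  const { clarity, specificity } = deps.analyze(prompt);
  // With an empty `cases` array, both means are undefined (analysis-only fallback).
  const runs = cases.map((c) => deps.runCase(prompt, c));
  return {
    clarity,
    specificity,
    evalScore: mean(runs.map((r) => r.score)),
    latencyMs: mean(runs.map((r) => r.latencyMs)),
  };
}

function fmt(x: number | undefined): string {
  return x === undefined ? "n/a" : x.toFixed(2);
}

// Right-pad a cell with spaces to a fixed width.
function pad(s: string, width: number): string {
  while (s.length < width) s += " ";
  return s;
}

function renderTable(a: PromptScores, b: PromptScores): string {
  const rows: Array<[string, number | undefined, number | undefined]> = [
    ["clarity", a.clarity, b.clarity],
    ["specificity", a.specificity, b.specificity],
    ["eval score", a.evalScore, b.evalScore],
    ["latency (ms)", a.latencyMs, b.latencyMs],
  ];
  const lines = [pad("metric", 14) + "| " + pad("prompt A", 10) + "| prompt B"];
  for (const [name, x, y] of rows) {
    lines.push(pad(name, 14) + "| " + pad(fmt(x), 10) + "| " + fmt(y));
  }
  return lines.join("\n");
}

function buildJudgePrompt(promptA: string, promptB: string, a: PromptScores, b: PromptScores): string {
  const hasEval = a.evalScore !== undefined && b.evalScore !== undefined;
  return [
    "Compare prompt A and prompt B. Answer with a winner (A, B, or tie) and your reasoning.",
    "Prompt A:\n" + promptA,
    "Prompt B:\n" + promptB,
    renderTable(a, b),
    hasEval
      ? "Both prompts were run against the same test cases; use all scores above."
      : "No test cases were run; decide on clarity and specificity only.",
  ].join("\n\n");
}

function compare(
  promptA: string,
  promptB: string,
  cases: TestCase[],
  deps: CompareDeps
): { table: string; verdict: Verdict } {
  // Same `cases` for both prompts so the scores are directly comparable.
  const a = scorePrompt(promptA, cases, deps);
  const b = scorePrompt(promptB, cases, deps);
  return {
    table: renderTable(a, b),
    verdict: deps.judge(buildJudgePrompt(promptA, promptB, a, b)),
  };
}
```

Injecting the runner and judge keeps the comparison logic testable with stubs, and parsing the judge's raw LLM reply into a `Verdict` is left to whatever implements `judge`.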