🎯 Waza Platform Roadmap - Tracking Issue #8

> _Migrated from [spboyer/waza#66](https://github.com/spboyer/waza/issues/66)_

# Waza Skills Development Platform

This issue tracks the Waza platform implementation, based on the PRD and the [Squad Proposal](https://github.com/spboyer/azure-mcp-v-skills/blob/main/squad-proposal.md).

## Primary Phase (Core Features)

### E1: Go CLI Foundation (P0)
- [x] #24 - `waza run` command
- [x] #25 - `waza init` command
- [x] #26 - `waza generate` command
- [x] #27 - `waza compare` command
- [x] #28 - All 8 grader types
- [x] #29 - Copilot SDK executor
- [x] #30 - Verbose mode
- [x] #31 - Transcript logging
- [x] #16 - JSON-RPC server for IDE integration
- [x] #21 - Session event logging and viewer

### E2: Sensei Engine (P0)
- [x] #32 - `waza dev` command (Sensei loop)
- [x] #33 - Compliance scoring system
- [x] #34 - Improvement suggestions engine
- [x] #35 - Target score option
- [x] #36 - Trigger accuracy tests
- [x] #37 - `--skip-integration` flag
- [x] #38 - `--fast` flag

### E3: Evaluation Framework (P0)
- [x] #39 - Multiple model execution
- [x] #40 - Task completion metrics
- [x] #41 - Trigger accuracy metrics
- [x] #42 - Behavior quality metrics
- [x] #43 - Trials for statistical confidence
- [x] #44 - LLM-powered suggestions
- [x] #45 - Parallel task execution
- [x] #46 - Task filtering

### E4: Token Management (P1)
- [x] #47 - `waza tokens count`
- [x] #48 - `waza tokens check`
- [x] #49 - `--strict` mode
- [x] #50 - `waza tokens suggest`
- [x] #51 - `waza tokens compare`
- [x] #80 - BPE token counter

### E5: Waza Skill (P1)
- [x] #52 - SKILL.md for microsoft/skills
- [x] #53 - Guided requirements gathering
- [x] #54 - Conversational readiness check
- [x] #55 - Result interpretation
- [x] #56 - CLI command invocation

## Secondary Phase (Integration & Extensions)

### E6: CI/CD Integration (P1)
- [x] #57 - GitHub Actions workflow template
- [x] #58 - CI exit codes
- [x] #59 - GitHub PR comment reporter
- [x] #60 - microsoft/skills CI compatibility
- [x] #61 - Evaluation result caching

### E7: AZD Extension (P2)
- [x] #62 - Package as AZD extension
- [x] #63 - `azd waza` commands
- [x] #64 - IntelliSense metadata
- [x] #65 - azure.yaml integration (closed — shipped via extension.yaml)

### E8: Getting Started Experience (P1)
- [x] #10 - Getting Started Experience umbrella
- [x] #168 - Getting started documentation
- [x] #169 - Redesign `waza init`
- [x] #170 - `waza new` skill scaffolding
- [x] #171 - Retrofit CLI commands for workspace awareness
- [x] #172 - `internal/workspace` package

---

### E3: Evaluation Framework — Azure ML Evaluator Integration
- [x] #104 - Implement `prompt` (LLM-as-judge) grader
- [x] #105 - Implement `action_sequence` grader
- [x] #106 - Port Azure ML tool_call evaluation rubrics
- [x] #107 - Port Azure ML task evaluation rubrics
- [x] #108 - Create example eval YAMLs using new graders
- [x] #109 - Document `prompt` and `action_sequence` grader types
- [x] #138 - Multi-model evaluation with recommendation engine

### E3: Multi-Skill Evaluation Support
- [x] #142 - Wire `skill_directories` from eval YAML to Copilot SDK
- [x] #143 - Add `required_skills` preflight validation
- [x] #144 - Add `skill_invocation` grader for asserting dependent skill usage

### E3: A/B Skill Impact Measurement
- [x] #194 - `--baseline` flag for A/B skill impact comparison

### E3: Extended Evaluation Features
- [x] #98 - Behavior grader
- [x] #99 - Diff grader for workspace changes
- [x] #184 - Retry/attempts mechanism
- [x] #185 - Lifecycle hooks
- [x] #186 - Template variable support
- [x] #187 - CSV dataset support
- [x] #188 - Result groupBy/categorization
- [x] #189 - Custom inputs

### E9: Competitive Positioning (P1)
- [ ] #195 - Multi-agent engine support (assigned: richardpark-msft)

### E10: Web UI (P2)
- [x] #14 - Web UI + Dashboard ([competitive analysis](docs/research/web-ui-competitive-analysis.md))
- [x] #201 - Scaffold React 19 + Vite + Tailwind CSS v4 (PR #212)
- [x] #202 - Dashboard shell layout with DevEx-style dark theme (PR #215)
- [x] #203 - HTTP web server for `waza serve` (PR #211)
- [x] #204 - Phase 1 REST API endpoints (PR #210)
- [x] #205 - KPI summary cards component (PR #214)
- [x] #206 - Recent Runs sortable table (PR #216)
- [x] #207 - Run Detail drill-down view (PR #217)
- [x] #208 - Playwright E2E test infrastructure (PR #218)

## v0.8.0: Advanced Features & MCP Integration (Shipped)

### E11: MCP Server Integration (P0)
- [x] #286 - Always-on `waza serve` with MCP transport
- [x] #316 - MCP scoring validators integration
- [x] #289 - 10 MCP tools (run, init, generate, compare, dev, tokens, serve, new, init-task, help)

### E12: LLM-Powered Intelligence (P0)
- [x] #287 - `waza suggest` command for AI-powered eval recommendations
- [x] #309 - `--judge-model` flag for separate judge LLM configuration

### E13: Advanced Skill Development (P1)
- [x] #288 - Interactive skill for workflow orchestration
- [x] #319 - Auto-generate trigger tests from skill triggers
- [x] #311 - Skill profile with static token analysis

### E14: Evaluation Enhancements (P1)
- [x] #299 - Grader weighting for weighted composite scores
- [x] #308 - Statistical confidence intervals via bootstrap
- [x] #317 - Batch processing with `waza dev` multi-skill support
- [x] #318 - Token budget enforcement with strict comparison mode
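
For readers unfamiliar with the two scoring ideas named in #299 and #308, here is a minimal sketch of a weighted composite score and a percentile bootstrap confidence interval. It is not waza's implementation: the function names, the dict-based inputs, the resample count, and the fixed seed are all assumptions made for illustration.

```python
import random
from statistics import mean


def weighted_composite(scores, weights):
    """Weighted mean of per-grader scores; both dicts are keyed by grader name (hypothetical shape)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def bootstrap_ci(samples, confidence=0.95, resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    if not samples:
        raise ValueError("need at least one sample")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(samples)
    # Resample with replacement, record each resample's mean, then sort the means.
    means = sorted(mean(rng.choices(samples, k=n)) for _ in range(resamples))
    alpha = (1.0 - confidence) / 2.0
    lower = means[int(alpha * resamples)]
    upper = means[min(resamples - 1, int((1.0 - alpha) * resamples))]
    return lower, upper
```

With per-trial pass/fail results encoded as 1 and 0, `bootstrap_ci(trial_results)` gives a rough interval around the pass rate; more trials narrow it.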

### E15: Compliance & Validation (P2)
- [x] #314 - agentskills.io specification compliance checks
- [x] #315 - SkillsBench 5 advisory checks
- [x] #312 - JUnit XML reporter for CI pipeline integration


## v0.9.0: A/B Testing, Discovery & Competitive Features

### E16: A/B Testing & Comparative Evaluation (P0)
- [x] #307 - A/B baseline testing (`--baseline` flag)
- [x] #310 - Pairwise LLM judging with bias mitigation
- [x] #391 - Tool constraint assertions (`expect_tools` / `reject_tools`)
- [x] #392 - Auto skill discovery (`--discover` flag)

### E17: Documentation & Site (P1)
- [x] #383 - Releases page on GitHub Pages site
- [x] #381 - Convert ASCII diagrams to Mermaid

### E18: Eval & Grader Registry (P2) — Design Complete
- [ ] #385 - Eval & Grader Registry design doc (parent epic)
- [ ] #386 - Map OpenAI Evals YAML format to waza graders
- [ ] #387 - Go-module-style grader/eval references
- [ ] #388 - Registry backend evaluation
- [ ] #389 - Composable eval construction
- [ ] #390 - Grader plugin extensibility

## Summary

| Epic | Total | Done | Open | Priority |
|------|-------|------|------|----------|
| E1: Go CLI Foundation | 10 | 10 | 0 | P0 |
| E2: Sensei Engine | 7 | 7 | 0 | P0 |
| E3: Evaluation Framework | 27 | 27 | 0 | P0 |
| E4: Token Management | 6 | 6 | 0 | P1 |
| E5: Waza Skill | 5 | 5 | 0 | P1 |
| E6: CI/CD Integration | 5 | 5 | 0 | P1 |
| E7: AZD Extension | 4 | 4 | 0 | P2 |
| E8: Getting Started | 6 | 6 | 0 | P1 |
| E9: Competitive Positioning | 1 | 0 | 1 | P1 |
| E10: Web UI | 9 | 9 | 0 | P2 |
| E11: MCP Server | 3 | 3 | 0 | P0 |
| E12: LLM Intelligence | 2 | 2 | 0 | P0 |
| E13: Skill Development | 3 | 3 | 0 | P1 |
| E14: Evaluation Enhancements | 4 | 4 | 0 | P1 |
| E15: Compliance & Validation | 3 | 3 | 0 | P2 |
| E16: A/B Testing | 4 | 4 | 0 | P0 |
| E17: Documentation & Site | 2 | 2 | 0 | P1 |
| E18: Eval Registry | 6 | 0 | 6 | P2 |
| **Total** | **107** | **100** | **7** | |
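
The per-epic counts can be recomputed from the checklists rather than maintained by hand. The sketch below assumes the conventions this issue already uses: epic headings of the form `### E<n>: ...` and GitHub task-list items (`- [x]` / `- [ ]`). `tally` is a hypothetical helper, not part of waza.

```python
import re

HEADING = re.compile(r"^###\s+(E\d+)\b")   # e.g. "### E3: Evaluation Framework (P0)"
TASK = re.compile(r"^- \[( |x|X)\] ")      # GitHub task-list item, checked or unchecked


def tally(markdown):
    """Return {epic_id: (total, done, open)} for task-list items under each epic heading."""
    counts = {}
    current = None
    for line in markdown.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Repeated headings for the same epic (e.g. several "### E3: ..." sections) accumulate.
            current = heading.group(1)
            counts.setdefault(current, [0, 0])
            continue
        if line.startswith("#"):
            current = None  # any other heading ends the current epic
            continue
        task = TASK.match(line)
        if task and current is not None:
            counts[current][0] += 1
            if task.group(1) in ("x", "X"):
                counts[current][1] += 1
    return {epic: (total, done, total - done) for epic, (total, done) in counts.items()}
```

Running `tally` over the body of this issue and comparing the result with the table is a quick way to spot a stale row.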

## Related Documents
- PRD
- [Squad Proposal](https://github.com/spboyer/azure-mcp-v-skills/blob/main/squad-proposal.md)
- [Web UI Competitive Analysis](docs/research/web-ui-competitive-analysis.md)
- [SkillsBench Competitive Analysis](docs/research/skillsbench-competitive-analysis.md)
- [Positioning Strategy](docs/research/positioning-strategy.md)
