🎯 Waza Platform Roadmap - Tracking Issue #8

@spboyer

Migrated from spboyer/waza#66

Waza Skills Development Platform

This is the tracking issue for the waza platform implementation based on the PRD and Squad Proposal.

Primary Phase (Core Features)

E1: Go CLI Foundation (P0)

E2: Sensei Engine (P0)

E3: Evaluation Framework (P0)

E4: Token Management (P1)

E5: Waza Skill (P1)

Secondary Phase (Integration & Extensions)

E6: CI/CD Integration (P1)

E7: AZD Extension (P2)

E8: Getting Started Experience (P1)


E3: Evaluation Framework — Azure ML Evaluator Integration

  • #104 - Implement prompt (LLM-as-judge) grader
  • #105 - Implement action_sequence grader
  • #106 - Port Azure ML tool_call evaluation rubrics
  • #107 - Port Azure ML task evaluation rubrics
  • #108 - Create example eval YAMLs using new graders
  • #109 - Document prompt and action_sequence grader types
  • #138 - Multi-model evaluation with recommendation engine

E3: Multi-Skill Evaluation Support

  • #142 - Wire skill_directories from eval YAML to Copilot SDK
  • #143 - Add required_skills preflight validation
  • #144 - Add skill_invocation grader for asserting dependent skill usage

E3: A/B Skill Impact Measurement

  • #194 - --baseline flag for A/B skill impact comparison

E3: Extended Evaluation Features

E9: Competitive Positioning (P1)

  • #195 - Multi-agent engine support (assigned: richardpark-msft)

E10: Web UI (P2)

  • #14 - Web UI + Dashboard (competitive analysis)
  • #201 - Scaffold React 19 + Vite + Tailwind CSS v4 (PR #212)
  • #202 - Dashboard shell layout with DevEx-style dark theme (PR #215)
  • #203 - HTTP web server for waza serve (PR #211)
  • #204 - Phase 1 REST API endpoints (PR #210)
  • #205 - KPI summary cards component (PR #214)
  • #206 - Recent Runs sortable table (PR #216)
  • #207 - Run Detail drill-down view (PR #217)
  • #208 - Playwright E2E test infrastructure (PR #218)

v0.8.0: Advanced Features & MCP Integration (Shipped)

E11: MCP Server Integration (P0)

  • #286 - Always-on waza serve with MCP transport
  • #316 - MCP scoring validators integration
  • #289 - 10 MCP tools (run, init, generate, compare, dev, tokens, serve, new, init-task, help)

E12: LLM-Powered Intelligence (P0)

  • #287 - waza suggest command for AI-powered eval recommendations
  • #309 - --judge-model flag for separate judge LLM configuration

E13: Advanced Skill Development (P1)

  • #288 - Interactive skill for workflow orchestration
  • #319 - Auto-generate trigger tests from skill triggers
  • #311 - Skill profile with static token analysis

E14: Evaluation Enhancements (P1)

  • #299 - Grader weighting for weighted composite scores
  • #308 - Statistical confidence intervals via bootstrap
  • #317 - Batch processing with waza dev multi-skill support
  • #318 - Token budget enforcement with strict comparison mode
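As a rough illustration of two items in this epic, weighted composite scores (#299) and bootstrap confidence intervals (#308), here is a minimal Go sketch. It is not waza's implementation: the `GraderResult` type, the `CompositeScore` and `BootstrapMeanCI` functions, and the choice of a percentile bootstrap over the mean are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sort"
)

// GraderResult is a hypothetical type: one grader's score in [0, 1] and its weight.
type GraderResult struct {
	Name   string
	Score  float64
	Weight float64
}

// CompositeScore returns the weight-normalized mean of the grader scores.
func CompositeScore(results []GraderResult) (float64, error) {
	var sum, total float64
	for _, r := range results {
		if r.Weight < 0 {
			return 0, fmt.Errorf("grader %q has a negative weight", r.Name)
		}
		sum += r.Score * r.Weight
		total += r.Weight
	}
	if total == 0 {
		return 0, errors.New("total weight is zero")
	}
	return sum / total, nil
}

// BootstrapMeanCI estimates a percentile confidence interval for the mean
// by resampling the scores with replacement. The seed makes runs repeatable.
func BootstrapMeanCI(scores []float64, resamples int, confidence float64, seed int64) (lo, hi float64, err error) {
	if len(scores) == 0 || resamples < 1 || confidence <= 0 || confidence >= 1 {
		return 0, 0, errors.New("need scores, resamples >= 1, and 0 < confidence < 1")
	}
	rng := rand.New(rand.NewSource(seed))
	means := make([]float64, resamples)
	for i := range means {
		var sum float64
		for range scores {
			sum += scores[rng.Intn(len(scores))]
		}
		means[i] = sum / float64(len(scores))
	}
	sort.Float64s(means)
	alpha := 1 - confidence
	loIdx := int(alpha / 2 * float64(resamples))
	hiIdx := int((1 - alpha/2) * float64(resamples))
	if hiIdx >= resamples {
		hiIdx = resamples - 1
	}
	return means[loIdx], means[hiIdx], nil
}

func main() {
	// Hypothetical grader results for three runs of the same task.
	runs := [][]GraderResult{
		{{Name: "prompt", Score: 1.0, Weight: 3}, {Name: "action_sequence", Score: 0.5, Weight: 1}},
		{{Name: "prompt", Score: 0.8, Weight: 3}, {Name: "action_sequence", Score: 1.0, Weight: 1}},
		{{Name: "prompt", Score: 0.6, Weight: 3}, {Name: "action_sequence", Score: 0.0, Weight: 1}},
	}
	var scores []float64
	for _, run := range runs {
		s, err := CompositeScore(run)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		scores = append(scores, s)
	}
	lo, hi, err := BootstrapMeanCI(scores, 2000, 0.95, 42)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("scores=%v 95%% CI=[%.3f, %.3f]\n", scores, lo, hi)
}
```

Normalizing by the total weight keeps the composite in the same range as the individual scores, so weights can be given on any scale.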

E15: Compliance & Validation (P2)

  • #314 - agentskills.io specification compliance checks
  • #315 - SkillsBench 5 advisory checks
  • #312 - JUnit XML reporter for CI pipeline integration
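For readers unfamiliar with the JUnit XML reporter item (#312), the sketch below shows one way eval results could be serialized into the widely used `testsuite`/`testcase` layout with Go's standard `encoding/xml` package. The `TaskResult` type, the `BuildSuite` function, and the sample task names are hypothetical; waza's actual reporter and the attributes it emits may differ.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Failure, TestCase, and TestSuite mirror a small subset of the common JUnit XML layout.
type Failure struct {
	Message string `xml:"message,attr"`
}

type TestCase struct {
	Name    string   `xml:"name,attr"`
	Failure *Failure `xml:"failure,omitempty"` // nil for passing cases, so no element is written
}

type TestSuite struct {
	XMLName  xml.Name   `xml:"testsuite"`
	Name     string     `xml:"name,attr"`
	Tests    int        `xml:"tests,attr"`
	Failures int        `xml:"failures,attr"`
	Cases    []TestCase `xml:"testcase"`
}

// TaskResult is a hypothetical stand-in for one evaluated task.
type TaskResult struct {
	Name    string
	Passed  bool
	Message string
}

// BuildSuite converts task results into a TestSuite and counts the failures.
func BuildSuite(name string, results []TaskResult) TestSuite {
	suite := TestSuite{Name: name, Tests: len(results)}
	for _, r := range results {
		tc := TestCase{Name: r.Name}
		if !r.Passed {
			tc.Failure = &Failure{Message: r.Message}
			suite.Failures++
		}
		suite.Cases = append(suite.Cases, tc)
	}
	return suite
}

func main() {
	suite := BuildSuite("example-eval", []TaskResult{
		{Name: "task-one", Passed: true},
		{Name: "task-two", Passed: false, Message: "score below threshold"},
	})
	out, err := xml.MarshalIndent(suite, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(xml.Header + string(out))
}
```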

v0.9.0: A/B Testing, Discovery & Competitive Features

E16: A/B Testing & Comparative Evaluation (P0)

  • #307 - A/B baseline testing (--baseline flag)
  • #310 - Pairwise LLM judging with bias mitigation
  • #391 - Tool constraint assertions (expect_tools / reject_tools)
  • #392 - Auto skill discovery (--discover flag)
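The tool constraint item (#391) can be pictured as a set check over the tools an agent called during a run. The Go sketch below assumes that every `expect_tools` entry must appear at least once and that no `reject_tools` entry may appear at all. That reading, the `CheckToolConstraints` function, and the tool names are assumptions for illustration, not waza's confirmed semantics.

```go
package main

import "fmt"

// CheckToolConstraints returns one message per violated constraint:
// expected tools that were never called, then rejected tools that were called.
func CheckToolConstraints(called, expect, reject []string) []string {
	seen := make(map[string]bool, len(called))
	for _, t := range called {
		seen[t] = true
	}
	var violations []string
	for _, t := range expect {
		if !seen[t] {
			violations = append(violations, fmt.Sprintf("expected tool %q was not called", t))
		}
	}
	for _, t := range reject {
		if seen[t] {
			violations = append(violations, fmt.Sprintf("rejected tool %q was called", t))
		}
	}
	return violations
}

func main() {
	// Hypothetical tool names recorded from one run.
	called := []string{"read_file", "bash"}
	violations := CheckToolConstraints(called, []string{"read_file", "grep"}, []string{"bash"})
	for _, v := range violations {
		fmt.Println(v)
	}
}
```

Returning every violation rather than stopping at the first one lets a report show all broken constraints for a run at once.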

E17: Documentation & Site (P1)

  • #383 - Releases page on GitHub Pages site
  • #381 - Convert ASCII diagrams to Mermaid

E18: Eval & Grader Registry (P2) — Design Complete

  • #385 - Eval & Grader Registry design doc (parent epic)
  • #386 - Map OpenAI Evals YAML format to waza graders
  • #387 - Go-module-style grader/eval references
  • #388 - Registry backend evaluation
  • #389 - Composable eval construction
  • #390 - Grader plugin extensibility

Summary

| Epic | Total | Done | Open | Priority |
|---|---|---|---|---|
| E1: Go CLI Foundation | 10 | 9 | 1 | P0 |
| E2: Sensei Engine | 7 | 5 | 2 | P0 |
| E3: Evaluation Framework | 23 | 23 | 0 | P0 |
| E4: Token Management | 6 | 6 | 0 | P1 |
| E5: Waza Skill | 5 | 5 | 0 | P1 |
| E6: CI/CD Integration | 5 | 5 | 0 | P1 |
| E7: AZD Extension | 4 | 4 | 0 | P2 |
| E8: Getting Started | 6 | 6 | 0 | P1 |
| E9: Competitive Positioning | 1 | 0 | 1 | P1 |
| E10: Web UI | 9 | 9 | 0 | P2 |
| E11: MCP Server | 3 | 3 | 0 | P0 |
| E12: LLM Intelligence | 2 | 2 | 0 | P0 |
| E13: Skill Development | 3 | 3 | 0 | P1 |
| E14: Evaluation Enhancements | 4 | 4 | 0 | P1 |
| E15: Compliance & Validation | 3 | 3 | 0 | P2 |
| E16: A/B Testing | 4 | 4 | 0 | P0 |
| E17: Documentation & Site | 2 | 2 | 0 | P1 |
| E18: Eval Registry | 6 | 0 | 6 | P2 |
| Total | 103 | 93 | 10 | |

Related Documents
