SPML is an addon plugin for Superpowers that extends it with ML experiment workflows: Validation Pyramid, experiment-driven development, and Watchdog-based training monitoring.
Superpowers provides the foundation — TDD, code review, subagent architecture, verification. SPML adds the ML domain knowledge on top: what to validate, how to monitor training, and how to draw evidence-based conclusions.
In traditional software, code that runs is usually code that works. In ML, code that runs without errors does NOT mean the result is correct.
"Not working" is a legitimate outcome in ML — but only if the process was correct. If an implementation bug causes poor results, you may wrongly conclude that your experimental strategy is ineffective and abandon an entire research direction.
SPML addresses this with:
- Validation Pyramid — 3-level verification (static analysis, runtime metrics, e2e pipeline) that separates "implementation bug" from "strategy doesn't work"
- Watchdog Agent — active monitoring of long-running training with auto-restart, parameter fixing, and sub-agent spawning for complex issues
- Experiment-driven workflow — hypothesis, independent/dependent/control variables, conclusion recording with metric evidence
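The experiment-driven workflow can be pictured as a record that travels with each experiment. This is a minimal sketch only — `ExperimentRecord`, its fields, and the toy decision rule are illustrative assumptions, not SPML's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentRecord:
    """Hypothetical record of one experiment (names are illustrative)."""
    hypothesis: str            # what we expect and why
    independent_vars: dict     # what we change, e.g. {"lr": 3e-4}
    dependent_vars: list       # what we measure, e.g. ["val_loss"]
    control_vars: dict         # what we hold fixed, e.g. {"seed": 0}
    metrics: dict = field(default_factory=dict)  # measured evidence
    conclusion: str = "inconclusive"

    def conclude(self, threshold: float, baseline: float, key: str) -> str:
        """Toy rule: 'effective' if the measured metric beats the
        baseline by at least `threshold`; no evidence -> inconclusive."""
        measured = self.metrics.get(key)
        if measured is None:
            self.conclusion = "inconclusive"
        elif baseline - measured >= threshold:
            self.conclusion = "effective"
        else:
            self.conclusion = "ineffective"
        return self.conclusion
```

The point of the structure is that the conclusion is always backed by a recorded metric, never by "the code ran".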
Install Superpowers first. SPML depends on Superpowers for general development skills (TDD, code review, debugging, etc.).
In Claude Code:

```
/plugin marketplace add qqhard/superpowers-ML
/plugin install spml
```
SPML also works with Codex through native skill discovery.
Install Superpowers first, then install SPML:

```shell
git clone https://github.com/obra/superpowers.git ~/.codex/superpowers
git clone https://github.com/qqhard/superpowers-ML.git ~/.codex/spml
mkdir -p ~/.agents/skills
ln -s ~/.codex/superpowers/skills ~/.agents/skills/superpowers
ln -s ~/.codex/spml/skills ~/.agents/skills/spml
```

See docs/README.codex.md for the full Codex guide.
Start a new session and check that both skill sets are available.
Claude Code:
- `/superpowers:brainstorm` → general software brainstorming
- `/spml:brainstorm` → ML experiment brainstorming

Codex:
Ask Codex to use `superpowers:brainstorming` for general software work or `spml:ml-brainstorming` for ML experiment work.
General software development:
`/superpowers:brainstorm` → `superpowers:writing-plans` → `superpowers:subagent-driven-development`
All skills come from Superpowers; SPML is not involved.

ML experiments:
`/spml:brainstorm` → `spml:experiment-planning` → `spml:ml-subagent-dev`
The ML workflow comes from SPML; general discipline (TDD, code review) comes from Superpowers.
SPML skills reference Superpowers skills where needed (e.g., superpowers:finishing-a-development-branch, superpowers:using-git-worktrees). Cross-plugin skill invocation works transparently.
```
brainstorming
  Refine hypothesis, collect context, confirm validation scope
      |
experiment-planning
  Break into atomic subtasks with validation criteria
      |
ml-subagent-dev
  Execute each subtask: unit test → implement → Validation Pyramid
      |
training-handoff
  Generate training script + Watchdog prompt + experiment context
      |
watchdog (independent session)
  Active monitoring: auto-restart, parameter fixing, anomaly diagnosis
      |
training-resume (independent session)
  Analyze results or diagnose issues, decide next step
      |
verification
  Evidence-based conclusion: effective / ineffective / inconclusive
```
Each subtask passes through 3 levels of validation before claiming correctness:
| Level | What it checks | Time |
|---|---|---|
| L0: Static Analysis | Device consistency, precision config, FlashAttention, optimizer, DataLoader, logging & observability + 15 advisory checks | Seconds |
| L1: Runtime Validation | MFU, TCA, throughput, gradient health, loss trend, architecture-specific metrics | ~5 min |
| L2: E2E Pipeline | Full flow on tiny data: data → train → checkpoint → infer → evaluate (1-5 steps per stage) | ~2 min |
L0 runs as a subagent (code review style). L1 and L2 run as skills invoked by the orchestrator. Each level must pass before proceeding to the next.
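To illustrate the kind of metric L1 computes, MFU (Model FLOPs Utilization) can be approximated with the widely used 6N-FLOPs-per-token rule for dense transformers (forward + backward). This is a sketch only — the function name and the A100 peak figure are example assumptions, and SPML's actual `mfu_calculator.py` may account for FLOPs differently:

```python
def mfu(n_params: float, tokens_per_sec: float, peak_flops: float) -> float:
    """Approximate Model FLOPs Utilization for a dense transformer.

    Uses the common estimate of ~6 * n_params FLOPs per token for a
    combined forward + backward pass, divided by hardware peak FLOP/s.
    """
    achieved_flops_per_sec = 6.0 * n_params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops

# Example: 7B-parameter model at 3,000 tokens/s on one A100
# (bf16 peak ~312 TFLOP/s) -> prints 0.404
print(round(mfu(7e9, 3000, 312e12), 3))
```

A value far below typical hardware efficiency (e.g. under ~0.2 on a dense transformer) is a signal to look for data-loading or communication bottlenecks rather than to blame the model.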
Long-running training is monitored by an independent agent session with three operating modes:
- Monitor — report only, no intervention
- Guardian (default) — auto-restart on environment failures, auto-fix simple parameter problems, report complex issues
- Autonomous — handle everything including complex issues via sub-agent spawning
Problems are classified into 3 tiers: environment problems (restart), simple parameter problems (fix + restart), and complex problems (sub-agent or report). The watchdog produces a recovery or completion prompt for the next session.
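The three-tier classification can be sketched as a pattern-matching step. The patterns and actions below are illustrative assumptions for the sketch, not SPML's actual rules:

```python
import re

# Hypothetical tier rules: (tier, failure pattern, action).
# Example patterns only -- a real watchdog would use richer signals.
TIER_RULES = [
    ("environment", re.compile(r"NCCL|node unreachable|disk quota", re.I),
     "restart"),
    ("parameter", re.compile(r"loss.*\b(nan|inf)\b|gradient overflow", re.I),
     "fix + restart"),
]

def classify(log_line: str) -> tuple[str, str]:
    """Map a failure log line to (tier, action); anything that matches
    no known pattern is treated as complex and escalated."""
    for tier, pattern, action in TIER_RULES:
        if pattern.search(log_line):
            return tier, action
    return "complex", "sub-agent or report"
```

Under this sketch, `classify("NCCL timeout during all-reduce")` routes to an environment restart, while an unrecognized anomaly falls through to the complex tier for a sub-agent or a report.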
| Skill | Purpose |
|---|---|
| brainstorming | Experiment design, context collection, validation scope confirmation |
| experiment-planning | Subtask decomposition with validation criteria |
| data-preparation | TDD-first dataset processing: validate on small-scale, then full-scale |
| ml-subagent-dev | Execute subtasks with VP integration and experiment-aware review |
| diagnostics | Systematic diagnosis: why not converging, early anomalies, efficiency bottlenecks |
| verification | Evidence-based conclusion with experiment summary |
| training-handoff | Generate training script + Watchdog prompt + experiment context |
| watchdog | Active monitoring of long-running tasks with 3 operating modes |
| training-resume | Recovery or completion entry point after long-running tasks |
| Skill | Checks |
|---|---|
| validation-pyramid | 3-level validation orchestration integrated into ml-subagent-dev workflow |
| ml-static-checks | L0: Static analysis — device consistency, precision, FA, optimizer, DataLoader, logging & observability + 15 advisory checks |
| ml-runtime-validator | L1: Minutes-level runtime — MFU, TCA, throughput, gradient health, loss trend, arch-specific metrics |
| ml-e2e-validator | L2: End-to-end pipeline — data → train → checkpoint → infer → evaluate (1-5 steps per stage) |
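The L2 check's shape — run every stage in order on tiny data and stop at the first failure — can be sketched as follows. This is an illustrative harness, not the `ml-e2e-validator` skill's actual implementation:

```python
def run_e2e_smoke(stages: dict, max_steps: int = 2) -> dict:
    """Run each pipeline stage on tiny data, in order, recording pass/fail.

    `stages` maps stage name -> callable(steps) that raises on failure.
    Stops at the first failing stage, since downstream stages depend on it.
    """
    report = {}
    for name in ("data", "train", "checkpoint", "infer", "evaluate"):
        try:
            stages[name](max_steps)
            report[name] = "pass"
        except Exception as exc:
            report[name] = f"fail: {exc}"
            break
    return report
```

Because each stage runs only 1-5 steps, a broken checkpoint-to-inference handoff surfaces in minutes instead of after a full training run.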
| Skill | Why modified |
|---|---|
| executing-plans | Routes to spml:experiment-planning instead of superpowers:writing-plans |
TDD, systematic-debugging, brainstorming, writing-plans, dispatching-parallel-agents, using-git-worktrees, requesting/receiving-code-review, finishing-a-development-branch, verification-before-completion, writing-skills — all provided by Superpowers.
Profiling tools that agents struggle to write correctly from scratch:
| Tool | Purpose |
|---|---|
| toolkit/profiling/l0_runner.py | L1 runtime validation entry point — orchestrates metric collection |
| toolkit/profiling/mfu_calculator.py | Theoretical FLOPS + MFU/TCA calculation |
| toolkit/profiling/dcgm_profiler.py | NVIDIA DCGM field 1004 profiling for TCA measurement |
| toolkit/profiling/gap_analyzer.py | Hierarchical bottleneck decomposition |
| toolkit/profiling/layer_profiler.py | Per-layer forward/backward timing |
| toolkit/profiling/memory_profiler.py | Memory analysis and fragmentation |
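The idea behind bottleneck decomposition is simple to state: attribute a step's wall time to measured components and flag whatever is left unexplained. A toy sketch (illustrative of what gap analysis produces, not the actual gap_analyzer.py API):

```python
def decompose_gap(step_time: float, parts: dict) -> dict:
    """Attribute a training step's wall time to measured components.

    `parts` maps component name -> seconds (e.g. compute, dataloader,
    communication). The unexplained remainder is reported as 'other';
    a large 'other' fraction means the profiling is missing something.
    """
    accounted = sum(parts.values())
    breakdown = {name: seconds / step_time for name, seconds in parts.items()}
    breakdown["other"] = max(0.0, (step_time - accounted) / step_time)
    return breakdown
```

For example, a 1.0 s step with 0.6 s compute, 0.2 s data loading, and 0.1 s communication leaves a 10% unexplained gap to investigate.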
SPML builds on Superpowers by Jesse Vincent. The skill system architecture, workflow patterns, and multi-platform support are all from Superpowers. Read more: Superpowers for Claude Code.
MIT License — see LICENSE file for details.