Conversation
slabgorb added a commit that referenced this pull request on Mar 25, 2026
Genre pack models, loader, and validation delivered. 70+ model structs, unified loader, trope inheritance with cycle detection, two-phase validation. 60 tests passing. PR #4 merged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
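The commit doesn't show how the cycle detection works. A minimal sketch of one common approach (walking the parent chain with a visited set); every name here (`resolve_chain`, `CycleError`, the trope ids) is hypothetical, not the repo's actual API:

```python
# Hypothetical sketch of trope-inheritance resolution with cycle detection.
# Names are illustrative; the real loader's types and API may differ.

class CycleError(Exception):
    pass

def resolve_chain(trope_id: str, parents: dict[str, str | None]) -> list[str]:
    """Walk trope -> parent -> grandparent, failing loudly on a cycle.

    `parents` maps each trope id to its parent id (None at a root).
    Returns the inheritance chain from the trope up to its root.
    """
    chain: list[str] = []
    seen: set[str] = set()
    current: str | None = trope_id
    while current is not None:
        if current in seen:
            raise CycleError(
                f"trope inheritance cycle at {current!r}: {' -> '.join(chain + [current])}"
            )
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    return chain

# A two-node cycle is rejected instead of looping forever:
parents = {"gunslinger": "western_hero", "western_hero": "gunslinger"}
try:
    resolve_chain("gunslinger", parents)
except CycleError as e:
    print(e)
```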
slabgorb added a commit that referenced this pull request on Apr 21, 2026
…sition (#108)

* docs(spec): LoRA training-to-runtime pipeline design

  Spec for moving off Draw Things-trained LoRAs (which mflux silently ignores) to mlx-examples + custom key-remapper, with hybrid genre/world LoRA stacking, OTEL-backed verification, and a top-down rollout plan. Covers: dataset layout, trainer wrapper, key remapper (mlx-npz -> mflux-safetensors), storage conventions, visual_style.yaml multi-LoRA schema with extend+exclude inheritance, verification gate (key-match + SSIM + trigger-discrimination), daemon multi-LoRA runtime changes, and a 6-phase migration plan.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(plan): LoRA training-to-runtime pipeline implementation plan

  TDD-structured task breakdown for the spec at docs/superpowers/specs/2026-04-20-lora-pipeline-design.md. 6 phases, 23 tasks, 108 steps. Phase 0 captures mlx-examples output as prerequisite knowledge for Phase 1 (the remapper). Phase 2 is the top-down proof: one trained LoRA end-to-end before any infrastructure scaffolding. Phase 3 formalizes the verification gate; no second LoRA ships until it is live. Phase 4 widens the daemon protocol to multi-LoRA with OTEL instrumentation. Phase 5 proves the pipeline on the second world (the_real_mccoy) to validate extend+exclude+add inheritance.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(adr): ADR-083 multi-LoRA stacking and verification pipeline

  Captures four architectural decisions extending ADR-032:

  1. MLX-native training (mlx-examples/flux) + custom key-remapper as mandatory pipeline stage. Alternatives (Draw Things, Kohya, ai-toolkit) rejected with rationale.
  2. Hybrid genre + world LoRA stacking with extend/exclude/add merge semantics in visual_style.yaml. Zero-config for the common case; explicit for divergent worlds like the_real_mccoy.
  3. Two-layer verification: SSIM-based pre-promotion gate + runtime matched_key_count as an attribute on the existing flux_mlx.render OTEL span. Closes the silent-fallback trap from both ends.
  4. Protocol widening to lora_paths[]/lora_scales[]; clean cutover, no compat shim.

  Notes load-bearing constraints (mflux>=0.4,<0.5 pin, empirical SSIM thresholds calibrated in plan Phase 3) and consequences (keymap YAML becomes a versioned artifact, mflux private-API dependency). Proposed. Extends ADR-032. Supersedes nothing.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lora): Phase 1 Task 1.1 — scaffold scripts/lora + pytest infra

  Establishes the orchestrator-root Python project for the LoRA pipeline tooling (per the plan at docs/superpowers/plans/2026-04-20-lora-pipeline.md). Files:

  - pyproject.toml — minimal declaration (numpy, pyyaml; pytest in dev extras)
  - scripts/__init__.py + scripts/lora/__init__.py — package markers so `from scripts.lora.X import Y` resolves in tests
  - tests/__init__.py + tests/lora/__init__.py — test package markers
  - tests/lora/conftest.py — skeletal toy_npz_path + sample_keymap_path fixtures. Real MLX key patterns filled by Task 1.4 once Phase 0 captures them in docs/superpowers/notes/.

  Verification: `pytest tests/lora/ --collect-only` runs clean with configfile=pyproject.toml. Refs: ADR-083.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lora): Phase 1 Task 1.3 — remapper hard-fails on unknown MLX keys

  Minimal TDD increment: one test, one failure-mode path.

  scripts/lora/remap_mlx_to_mflux.py
  - RemapError exception
  - _load_keymap / _match_rule helpers (ready for the Task 1.4 happy path)
  - remap_npz_to_safetensors detects unmapped keys, raises RemapError with every offending key named. Does not write output.
  - Happy-path write raises NotImplementedError — Task 1.4 fills it in once Phase 0 keymap data lands.

  tests/lora/test_remap.py
  - test_unknown_mlx_key_hard_fails: writes a .npz with one bogus key, asserts RemapError names it, asserts no partial safetensors written.

  Per the no-silent-fallback rule: mapping gaps are loud, never dropped. Refs: ADR-083.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
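A minimal sketch of the collect-then-raise shape this commit describes, with the keymap reduced to a plain regex list; RemapError and the "name every key, write nothing" behavior follow the message, everything else is illustrative:

```python
# Sketch of the unknown-key hard-fail path, assuming a keymap reduced to
# a list of regex patterns. The real rule format and signatures may differ.
import re

class RemapError(Exception):
    pass

def check_keys(input_keys: list[str], keymap_patterns: list[str]) -> None:
    rules = [re.compile(p) for p in keymap_patterns]
    unmapped = [k for k in input_keys if not any(r.fullmatch(k) for r in rules)]
    if unmapped:
        # Name every offending key, and do it before any output is written,
        # so a failed remap never leaves a partial safetensors file behind.
        raise RemapError("unmapped MLX keys: " + ", ".join(sorted(unmapped)))

try:
    check_keys(
        ["flux.double_blocks.0.img_attn.qkv.lora_a", "flux.bogus.lora_a"],
        [r"flux\.double_blocks\.\d+\.img_attn\.qkv\.lora_[ab]"],
    )
except RemapError as e:
    print(e)  # names flux.bogus.lora_a
```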
* feat(lora): Phase 2 Task 2.1 — training-wrapper dataset preflight

  scripts/lora/train.py (preflight surface only)
  - PreflightError exception
  - preflight_dataset(dir) validates:
    * directory exists
    * every image has a same-stem .txt caption (hard-fail on unpaired)
    * at least MIN_IMAGES=150 pairs (matches /sq-lora Step 3 floor)
  - CLI + subprocess invocation deferred to Task 2.3 when the first real overnight training exercises them end-to-end.

  tests/lora/test_train.py
  - 4 tests cover missing dir, unpaired files, low volume, valid dataset.

  Full lora test suite: 5/5 GREEN. No regression on Task 1.3. Refs: ADR-083.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(lora): Phase 0 notes — mlx-examples Flux observations

  Closes Phase 0 of the LoRA pipeline plan. Captures:

  - Commit SHA 796f5b5 of ml-explore/mlx-examples
  - Trainer entrypoint: flux/dreambooth.py
  - Output format: CORRECTION — safetensors, NOT .npz as the plan assumed. Plan Tasks 1.3, 1.4, 1.5 need fixtures swapped from np.savez/np.load to safetensors.torch. Task 1.4 handles the correction.
  - Observed 22 distinct module-path patterns, 608 keys at rank 4 / --lora-blocks -1 on flux-dev (19 double + 38 single blocks)
  - Lowercase .lora_a / .lora_b (not lora_A/B, not lora_down/up)
  - Shape convention: lora_a = (input_dim, rank); lora_b = (rank, output_dim) ↔ Kohya reverses both axes — every keymap rule needs transpose: true
  - Complete MLX→Kohya name translation table (26 rows)
  - Dataset format: train.jsonl (not paired .jpg+.txt) — plan Task 2.1 needs a jsonl conversion step
  - Empirical: 48.789 GB peak RAM, ~4.4s/iter at rank 4 on M3 Max 128GB

  Toy artifact preserved at ~/mlx-toy-lora/final_adapters.safetensors for plan Task 1.5's end-to-end render proof.

  Refs: ADR-083, docs/superpowers/plans/2026-04-20-lora-pipeline.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(lora): Phase 1 Task 1.2 — mlx→mflux keymap YAML

  26 rules covering every MLX-examples Flux LoRA key pattern observed in Phase 0. 13 distinct module types × 2 directions (lora_a→lora_down, lora_b→lora_up):

  - Double blocks (10 modules): img_attn.proj, img_attn.qkv, img_mlp.layers.{0,2}, img_mod.lin, txt_attn.proj, txt_attn.qkv, txt_mlp.layers.{0,2}, txt_mod.lin.
  - Single blocks (3 modules): linear1, linear2, modulation.lin.

  Every rule has transpose: true — MLX stores lora_a as (input, rank) and lora_b as (rank, output); Kohya/mflux expect the reverse of both axes.

  Verification against the Phase 0 toy safetensors (~/mlx-toy-lora/): 608 keys, 0 unmatched. Per-rule counts: 19 matches for each double-block rule (flux-dev has 19 double blocks), 38 matches for each single-block rule (flux-dev has 38 single blocks). Perfect coverage.

  Refs: ADR-083, docs/superpowers/notes/2026-04-20-mlx-examples-flux-notes.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
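The shape convention above reduces to one transpose per tensor, and the LoRA delta survives it. A toy numpy check with made-up dimensions (only the axis convention is taken from the notes):

```python
# Toy check of the transpose rule: MLX lora_a is (input_dim, rank) and
# lora_b is (rank, output_dim); Kohya/mflux lora_down/lora_up reverse both.
import numpy as np

input_dim, rank, output_dim = 3072, 4, 3072  # invented sizes
lora_a = np.random.randn(input_dim, rank)    # MLX layout
lora_b = np.random.randn(rank, output_dim)   # MLX layout

lora_down = lora_a.T  # Kohya layout: (rank, input_dim), e.g. (4, 3072)
lora_up = lora_b.T    # Kohya layout: (output_dim, rank), e.g. (3072, 4)

assert lora_down.shape == (rank, input_dim)
assert lora_up.shape == (output_dim, rank)

# Either layout encodes the same LoRA delta matrix:
delta_mlx = lora_a @ lora_b            # (input_dim, output_dim)
delta_kohya = (lora_up @ lora_down).T  # transpose back to compare layouts
np.testing.assert_allclose(delta_mlx, delta_kohya, rtol=1e-8)
```

The (4, 3072) / (3072, 4) shapes the Task 1.4 smoke run reports below match this layout exactly.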
* feat(lora): Phase 1 Task 1.4 — remapper happy-path + safetensors correction

  Applies the Phase 0 correction: mlx-examples outputs safetensors, not .npz. The remapper function renames + swaps its I/O path accordingly.

  scripts/lora/remap_mlx_to_mflux.py
  - remap_npz_to_safetensors → remap_mlx_safetensors (rename reflects the corrected input type)
  - I/O: safetensors.safe_open / safetensors.torch.save_file (was numpy.load / numpy.savez)
  - Happy-path implementation: iterate input keys, translate via keymap, transpose axes [0, 1] per rule, accumulate into an output dict, write via safetensors.torch.save_file
  - Rank detection by min dim of any lora_down weight
  - Unknown-key hard-fail preserved (Task 1.3 behavior)

  tests/lora/conftest.py
  - Fixture rename: toy_npz_path → toy_safetensors_path with real MLX key patterns at small shapes (rank 4, hidden 32)
  - New fixture: real_keymap_path points at the production keymap so happy-path tests catch keymap drift
  - Fixture rename: sample_keymap_path → empty_keymap_path (always had zero rules; rename makes intent explicit)

  tests/lora/test_remap.py
  - Migrated test_unknown_mlx_key_hard_fails to safetensors I/O
  - NEW test_happy_path_translates_known_keys: 4 MLX keys translate to 4 correctly-named Kohya keys
  - NEW test_transpose_flips_axes: verifies (input, rank) → (rank, input) and value-preservation

  pyproject.toml: add safetensors>=0.4 and torch>=2.0 runtime deps.

  End-to-end smoke on the real Phase 0 toy artifact (77MB, 608 keys): remaps cleanly to 608 Kohya-named keys, rank=4 detected, first lora_down shape (4, 3072), first lora_up shape (3072, 4). Test suite: 7/7 GREEN.

  Refs: ADR-083, docs/superpowers/notes/2026-04-20-mlx-examples-flux-notes.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(render): Phase 4 Task 4.1 (pulled forward) — send_render protocol widening

  Widens scripts/render_common.send_render() from singleton (lora_path: str, lora_scale: float) to arrays (lora_paths: list[str] | None, lora_scales: list[float] | None). No compat shim — per ADR-083, the daemon is a sidecar with a small known consumer surface; a dual-protocol shim would outlast the cutover.

  Also updates the sole caller (render_batch → send_render) to promote any legacy flat `lora:` / `lora_scale:` visual_style.yaml keys into single-entry arrays. This keeps the existing generate_poi_images.py working throughout the migration; Task 4.4 replaces the transitional shim with compose_lora_stack() once the extend/exclude/add schema lands.

  Pulled forward from Phase 4 per Architect correction #4 — Task 1.5's end-to-end remapper proof needs this protocol to test against.

  Validation:
  - asymmetric length check: lora_scales count must match lora_paths
  - legacy params lora_path/lora_scale removed from signature

  Tests: 3 new tests in tests/test_render_common.py, 10/10 suite GREEN.

  Refs: ADR-083, plan docs/superpowers/plans/2026-04-20-lora-pipeline.md §Task 4.1.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
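A sketch of the widened call surface and its symmetric-length check. Parameter names follow the commit message; the `prompt` parameter and the function body are illustrative stand-ins, not the daemon client:

```python
# Illustrative stub of the widened protocol: arrays in, length check up
# front. Only the lora_paths/lora_scales names come from the commit.
def send_render(
    prompt: str,
    lora_paths: list[str] | None = None,
    lora_scales: list[float] | None = None,
) -> None:
    paths = lora_paths or []
    scales = lora_scales or []
    if len(paths) != len(scales):
        raise ValueError(
            f"lora_scales count ({len(scales)}) must match lora_paths ({len(paths)})"
        )
    ...  # build and send the daemon request (elided)

# Legacy flat keys promote to single-entry arrays at the call site:
style = {"lora": "lora/spaghetti_western/sw_landscape.safetensors", "lora_scale": 0.8}
send_render("a dusty street at noon", [style["lora"]], [style["lora_scale"]])
```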
* feat(lora): Phase 1 Task 1.5 — end-to-end remapper proof + fused-QKV fix

  The end-to-end roundtrip test caught a real silent fallback in the v1 keymap: mflux's BFL loader splits the lora_down rank-axis into N chunks (3 for double_blocks img/txt_attn_qkv, 4 for single_blocks linear1) and applies them per Q/K/V/MLP target. mlx-examples trains those layers with a SHARED rank-r down matrix, so the v1 keymap's transpose-only output gave mflux a rank-4 down that it sliced into rank-1 chunks — while the matching lora_up stayed rank-4. Result: a (1, seq, 1) @ (4, 3072) matmul fail at runtime, exactly the silent fallback the test was built to detect.

  Fix: the keymap gains a `replicate: N` field (default 1). The remapper tiles the transposed lora_down N times along axis 0 for fused layers, so each slice mflux extracts equals the original shared D — mathematically the same delta as the fused mlx form, because (x @ D) @ U_split equals split[(x @ D) @ U]. Non-fused layers (proj, mod, mlp_0/2, linear2, modulation_lin) keep replicate=1 — the old transpose path is unchanged. Source-rank inference moved off the output tensor (which is now tiled) and onto the source lora_a, where min(shape) is the true rank.

  Tests: two new unit tests verify the tile shape and per-chunk equality to the original transposed D; the roundtrip test renders the toy LoRA twice (with/without) and asserts visible pixel divergence — 169.83s wall clock against a warm daemon, 608/608 keys matched, 494 layers patched with no errors.

  Keymap version bumped 1 → 2.
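The identity the fix leans on checks out numerically. A toy verification with invented shapes and N=3 (as for a fused QKV layer); it assumes mflux pairs each rank-r down slice with the matching output-axis slice of lora_up, which the commit implies but does not spell out:

```python
# Toy check of the replicate fix: tiling the transposed shared down matrix
# N times makes per-chunk slicing reproduce the fused MLX delta.
import numpy as np

in_dim, r, out_dim, N = 16, 4, 8, 3
D = np.random.randn(in_dim, r)       # shared MLX lora_a (down)
U = np.random.randn(r, N * out_dim)  # fused MLX lora_b (up), one block per target

lora_down = np.tile(D.T, (N, 1))     # remapper output: (N*r, in_dim)
lora_up = U.T                        # remapper output: (N*out_dim, r)

x = np.random.randn(in_dim)
fused = (x @ D) @ U                  # fused MLX delta, length N*out_dim

for i in range(N):                   # model of mflux's per-target slicing
    down_i = lora_down[i * r:(i + 1) * r, :]         # rank-r slice == D.T
    up_i = lora_up[i * out_dim:(i + 1) * out_dim, :] # per-target up slice
    chunk = up_i @ (down_i @ x)                      # per-target delta
    np.testing.assert_allclose(chunk, fused[i * out_dim:(i + 1) * out_dim])

print("per-chunk deltas match the fused form")
```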
* feat(lora): Phase 4 Task 4.3 — compose_lora_stack pure resolver

  Adds the pure-function resolver that merges genre+world visual_style.yaml LoRA blocks into a flat per-render stack, filtered by tier. Per Architect correction #5, lives in scripts/render_common.py rather than a separate scripts/lora/compose.py — it's small, has no I/O, and the call sites (load_visual_style, render_batch) all live in render_common already.

  Schema (matches docs/superpowers/specs/2026-04-20-lora-pipeline-design.md):

    # genre — list of entries
    loras:
      - name: sw_landscape
        file: lora/spaghetti_western/sw_landscape.safetensors
        scale: 0.8
        applies_to: [landscape, scene]
        trigger: sw_landscape   # optional

    # world — dict with exclude/add
    loras:
      exclude: [sw_landscape]   # drop inherited entries by name
      add:                      # append world-only entries
        - name: ...

  Composition order: genre entries (after the exclude filter) come first, then world `add:` entries in declared order, then the whole flat list is filtered by `tier in entry["applies_to"]`.

  Hard-fails (per "no silent fallback") on:
  - missing required fields (name, file, scale, applies_to)
  - empty applies_to (an entry that fires on no tier is always misconfiguration — never useful, always a typo)
  - world.loras given as a list (a legacy v1-schema file that hasn't been migrated; refusing to silently inherit it forces explicit migration)
  - world.add reusing a genre name without first excluding it (ambiguous intent — the operator must say which one wins)

  Seven tests in tests/test_render_common.py cover: no-overrides inheritance, exclude-drops, add-appends, tier filtering, duplicate-name rejection, empty-applies_to rejection, and legacy list-form rejection. All green.

  Wiring lands in Task 4.4 (load_visual_style returns resolved_loras) and Task 4.6 (visual_style.yaml migration to the loras: schema). The transitional legacy promotion in render_batch (lines ~263-272) stays in place until 4.4 lands the wiring switchover.

* feat(render): Phase 4 Task 4.4 — wire compose_lora_stack into load_visual_style

  Adds an optional `tier=` keyword to load_visual_style. When supplied, the function calls compose_lora_stack(genre_style, world_style, tier) and exposes the result as `merged["resolved_loras"]`. Without `tier=`, the return shape is unchanged from before — the three existing callers (generate_portrait_images.py, generate_creature_images.py, generate_poi_images.py) keep working without modification until Task 4.6 migrates the YAML files and updates them to opt into the new path.

  Two semantic changes worth pinning:

  1. The `loras:` key is intentionally skipped during the field-by-field world overlay. Genre uses list-form, world uses dict-form `{exclude, add}`; a naive overlay would replace the genre list with the world dict and silently drop every genre LoRA. The merged dict keeps the *genre* list as `merged["loras"]`, and resolution happens through the dedicated `compose_lora_stack` path. A test pins this down so a future "let me clean this up" refactor doesn't reintroduce the trap.

  2. Backward compat over plan-literal: the original Task 4.4 plan called for changing the signature to `(*, genre_dir, world_dir, tier)` — keyword-only, world_dir as Path, tier required. That breaks all three existing callers in this commit. Per the spec authority hierarchy (spec > plan when they conflict), the design spec only mandates the schema and the resolved_loras key, not the function signature. The backward-compat keyword-only `tier=` matches the spec without forcing a coordinated multi-script update; the callers migrate in 4.6 alongside their YAML files.

  Four new tests in tests/test_render_common.py cover: tier-omitted backward compat, genre-only resolution, world-exclude end-to-end, and the loras-key-not-clobbered invariant. 23/23 fast tests pass.

  The transitional legacy `lora:` / `lora_scale:` flat-key promotion in render_batch (lines ~263-272) stays in place — it'll come off in 4.6 when the YAMLs are migrated to the loras: schema and the callers start passing tier=.
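A condensed model of the resolver's merge order, exclude-filter the genre list, append world `add:` entries, then tier-filter. Only two of the hard-fail checks are shown plus the legacy list-form rejection; the real function validates more, and field handling here is simplified:

```python
# Condensed sketch of compose_lora_stack's merge semantics; not the real
# resolver, which also validates required fields per entry.
def compose_lora_stack(genre: dict, world: dict, tier: str) -> list[dict]:
    genre_entries = genre.get("loras", [])
    overrides = world.get("loras", {})
    if isinstance(overrides, list):
        raise ValueError("world loras: must be {exclude, add}, not a legacy list")
    excluded = set(overrides.get("exclude", []))
    kept = [e for e in genre_entries if e["name"] not in excluded]
    added = overrides.get("add", [])
    dupes = {e["name"] for e in kept} & {e["name"] for e in added}
    if dupes:
        raise ValueError(f"world add: reuses genre names without exclude: {dupes}")
    stack = kept + added  # genre first, then world adds, in declared order
    for e in stack:
        if not e["applies_to"]:
            raise ValueError(f"{e['name']}: empty applies_to is always misconfiguration")
    return [e for e in stack if tier in e["applies_to"]]

genre = {"loras": [{"name": "sw_landscape", "file": "lora/sw.safetensors",
                    "scale": 0.8, "applies_to": ["landscape", "scene"]}]}
world = {"loras": {"exclude": ["sw_landscape"], "add": []}}
assert compose_lora_stack(genre, world, "landscape") == []  # exclude drops it
```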
* feat(render): Phase 4 wiring — resolve_lora_args helper + POI script tier

  The follow-on to Task 4.4. load_visual_style now produces `resolved_loras` when `tier=` is passed, but the consumer side (render_batch) was still reading the legacy flat `lora:` key inline. This commit lands the wiring switchover with a small, testable helper.

  scripts/render_common.py
  - New `resolve_lora_args(visual_style)` — pure function, returns the `(lora_paths, lora_scales)` tuple to hand to send_render. Three cases: resolved_loras present (preferred), legacy `lora:` flat key (transitional), neither (no LoRA). Hard-fails when both schemas are present in the same dict — the merged shape can't represent both, and silently picking one would mask a YAML mid-migration error.
  - render_batch now calls the helper instead of inlining the legacy promotion. The transitional comment in render_batch is gone — the transition lives in the helper's docstring where it's testable.

  scripts/generate_poi_images.py
  - load_visual_style call now passes tier="landscape" so that, once the spaghetti_western YAMLs migrate to the loras: schema in Task 4.6, POI renders pick up the resolved stack. Pre-migration YAMLs (no loras: block) get an empty resolved_loras list which naturally falls through to the legacy `lora:` path inside resolve_lora_args. No behavior change today, full plumbing for tomorrow.
  - Portrait and creature scripts will get the same tier= treatment when their YAMLs migrate; not touched here to keep the diff surface aligned with the one caller that's about to start benefiting (POI renders for the spaghetti_western LoRA work).

  tests/test_render_common.py
  - 6 tests for resolve_lora_args: no-LoRA, legacy-only, default-scale, resolved-only, both-present-raises, empty-resolved-falls-through.
  - Rounds the file out to 20 tests covering the full Task 4.3-4.4 surface end-to-end. 29/29 fast tests green.

  Task 4.5 (lora/ directory rename) and Task 4.6 (YAML schema migration) remain — both blocked: 4.5 conflicts with an in-flight uncommitted local change to dust_and_lead/visual_style.yaml in sidequest-content, and 4.6 needs the Phase 2 trained LoRA that 4.6's schema would reference. Both unblock together.

* chore(render): thread tier= through portrait + creature scripts

  Symmetric with the POI script change in 7340f0f. Each script knows its own tier (portrait, landscape, etc.) and passes it to load_visual_style so resolved_loras populates ahead of the per-render resolve_lora_args call. Pre-migration YAMLs produce an empty resolved_loras list which falls through to the legacy `lora:` path inside the helper — no behavior change today, full plumbing for once Phase 2's portrait LoRAs are trained and the YAMLs migrate to the loras: schema.

  29/29 fast tests still green. Smoke-tested against the elemental_harmony genre pack across all three tiers — no regressions.

* docs(skills): Phase 5 Task 5.1 — rewrite /sq-lora for the mlx pipeline

  Drops the Draw Things CLI workflow entirely and points the skill at the new pipeline that landed in Phases 1-4:

  - Step 0 — environment check (mlx-examples checkout + venv + mflux pin)
  - Step 1 — image collection (unchanged in spirit; snake_case dirs)
  - Step 2 — paired .txt captions (matches scripts/lora/train.py preflight)
  - Step 3 — preflight gate (scripts/lora/train.py::preflight_dataset)
  - Step 4 — train: convert paired .txt → train.jsonl, then run ~/Projects/mlx-examples/flux/dreambooth.py (rank 8, 1500 iter)
  - Step 5 — remap (scripts/lora/remap_mlx_to_mflux.py) + roundtrip test (tests/lora/test_remap_roundtrip.py — the silent-fallback detector) + OTEL render.lora.matched_keys check
  - Step 6 — wire into visual_style.yaml under the loras: schema (genre list-form + world {exclude, add} dict-form, matching scripts/render_common.py::compose_lora_stack)
  - Step 7 — re-render POIs and eyeball

  Also surfaces:
  - Hard "no Draw Things, no .ckpt for new work" — legacy .ckpts now archived under lora/{genre}/archive/legacy/ (Task 4.5 layout)
  - The matched-key OTEL counter (Task 4.2b) as the canary for keymap drift and silent no-op LoRAs
  - The Architect correction #5 file path (compose_lora_stack lives in render_common.py, not a separate compose.py module)
  - ADR-083 + the design spec + the Phase 0 mlx-examples notes as the authoritative references

  Reference table updated end-to-end: Draw Things and its model dir are gone, replaced by the actual files in this repo + the mlx-examples checkout path. The "Completed LoRAs" table now distinguishes legacy .ckpt artifacts from the new mlx-pipeline .safetensors slots.

  Skill description in frontmatter updated so /sq-lora's listing reads "collect, tag, train via mlx-examples, remap to mflux, verify on the daemon" — matches the new flow.
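The helper's decision table, preferred, transitional, neither, plus the both-present hard-fail, as a sketch. Field names follow the commit; the 1.0 default scale on the legacy path is an assumption, labeled as such:

```python
# Sketch of resolve_lora_args's three cases. The default scale of 1.0 on
# the legacy path is assumed, not confirmed by the commit message.
def resolve_lora_args(visual_style: dict) -> tuple[list[str], list[float]]:
    resolved = visual_style.get("resolved_loras") or []
    legacy = visual_style.get("lora")
    if resolved and legacy:
        # Both schemas at once can only mean a YAML caught mid-migration.
        raise ValueError("both resolved_loras and legacy lora: present; migrate the YAML")
    if resolved:
        return [e["file"] for e in resolved], [e["scale"] for e in resolved]
    if legacy:
        return [legacy], [visual_style.get("lora_scale", 1.0)]
    return [], []  # no LoRA configured

assert resolve_lora_args({}) == ([], [])
assert resolve_lora_args({"lora": "a.safetensors"}) == (["a.safetensors"], [1.0])
# An empty resolved_loras list falls through to the legacy path, matching
# the empty-resolved-falls-through test named above.
assert resolve_lora_args({"resolved_loras": [], "lora": "a.safetensors"})[0] == ["a.safetensors"]
```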
* fix(skills): /sq-lora — style LoRA captions are one tag, not many

  Caught during the first real run-through (genre=spaghetti_western, tier=landscape) by Keith. The Step 2 captioning section I wrote copied the subject-LoRA convention (trigger + style tags + per-image subject tags), which is exactly wrong for a style LoRA. Corrected per kohya-ss / ai-toolkit / civitai community consensus (verified via Perplexity citation chain — see commit conversation):

  - Style LoRA → every .txt is the bare trigger token, identical across the dataset. Genre-level uses the genre name (spaghetti_western); world-level uses the world name (the_real_mccoy). No descriptors, no subject tags.
  - Subject LoRA → detailed natural-language captions per image with trigger + descriptive tags. SideQuest hasn't trained any subject LoRAs yet; flagged in the skill as a separate flow if/when one is needed.

  Also fixed two downstream bugs:
  - The Step 6 visual_style.yaml example showed `trigger: {genre}_{tier}` (e.g., `spaghetti_western_landscape`); it should be the bare genre/world name to match what the model actually trained on. The `name` field still includes the tier suffix because it's the internal handle compose_lora_stack uses to address entries.
  - The Step 3 manual checklist said "every caption starts with the trigger word" — replaced with a caption-purity check that explicitly forbids any extra text past the trigger token, with a spot-check one-liner.

  The why-it-matters: per-image subject tags actively poison a style LoRA. The model starts attributing the trained register's effect to "saloon" or "film_grain" instead of to the bare trigger, and prompts using those tokens with no LoRA loaded will partially fire the style. The legacy `leone_style` LoRA's existing 72 captions show the prior generation of this mistake — 13 tags per image, all identical, mixing trigger + style descriptors. Salvageable as a starting point for the new run, but the captions need a full rewrite to a single token before training.

  Skill description in frontmatter unchanged; the listing already says "collect, tag, train via mlx-examples, remap to mflux, verify on the daemon", which still fits.

* fix(render): prefer visual_prompt, drop narrator prose from POI composition

  scripts/generate_poi_images.py now uses a POI's visual_prompt field verbatim when present (single-trigger + renderable-subject convention per ADR-083), falling back to "{name}: {description}" only when the art-director has not yet authored one. Chapter-level atmosphere and location are narrator-facing prose and no longer leak into Flux prompts — they were the source of 80-130 tokens of unrenderable literary metaphor on every POI.

  Dry-run against spaghetti_western/dust_and_lead now produces 87-97 token prompts (was 187-234) while preserving the art-director's compositional intent. Per handoff-2026-04-21-poi-prompt-composition-dev.md.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
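The spot-check one-liner for caption purity isn't reproduced in the message above. A hedged equivalent (the dataset path is hypothetical; the rule, every .txt equals the bare trigger token, comes from the commit):

```python
# Stand-in for the caption-purity spot check: every .txt in the dataset
# must contain exactly the bare trigger token and nothing else.
from pathlib import Path

def check_caption_purity(dataset_dir: str, trigger: str) -> list[Path]:
    """Return the captions that contain anything beyond the trigger token."""
    return [p for p in Path(dataset_dir).glob("*.txt")
            if p.read_text().strip() != trigger]

# Hypothetical paths and trigger, for illustration only:
impure = check_caption_purity("datasets/spaghetti_western_landscape", "spaghetti_western")
if impure:
    raise SystemExit(f"{len(impure)} captions are not the bare trigger: {impure[:5]}")
```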
slabgorb added a commit that referenced this pull request on Apr 21, 2026
…ps (#110): carries the same LoRA-pipeline commit series documented under #108 above, plus:

* feat(design): sidequest gameboard prototype + frontend-redesign mockups

  - design_handoff_sidequest_gameboard/ — React prototype (app.jsx, components.jsx, styles.css, mock data) plus a README documenting the gameboard concept handoff.
  - docs/design/frontend-redesign-mockups.md — design document.
  - docs/design/mockups/ — three HTML direction explorations (DM screen, folio, stage) plus an index.

  Design-only; no app wiring changes.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
slabgorb added a commit that referenced this pull request on Apr 21, 2026
…l widening (feat(render): Phase 4 Task 4.1 — send_render protocol widening; the full commit message appears under #108 above).
Story 1-9 (prompt framework) complete — merged in sidequest-api PR #5.