fix(render): prefer visual_prompt, drop narrator prose from POI composition#108

Merged
slabgorb merged 18 commits into main from feat/poi-prompt-composition
Apr 21, 2026
Conversation

@slabgorb
Owner

Summary

  • scripts/generate_poi_images.py now uses a POI's visual_prompt field verbatim when present, falling back to {name}: {description} when absent.
  • Chapter-level atmosphere and location (narrator-facing prose) no longer leak into Flux prompts — they were the source of 80-130 tokens of unrenderable literary metaphor on every POI.
  • Dry-run against spaghetti_western/dust_and_lead now produces 87-97 token prompts (was 187-234) while preserving the art-director's compositional intent.

Per .session/handoff-2026-04-21-poi-prompt-composition-dev.md. Related ADRs: 083, 084.
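The fallback rule in the summary can be sketched as a small pure function. This is illustrative only — the function name and POI dict shape are assumptions, not the actual API in `scripts/generate_poi_images.py`:

```python
# Sketch of the composition rule above (names are illustrative).
def compose_poi_prompt(poi: dict) -> str:
    """Prefer the art-director's visual_prompt verbatim; otherwise
    fall back to "{name}: {description}". Chapter-level atmosphere
    and location are deliberately never consulted."""
    visual = poi.get("visual_prompt")
    if visual:
        return visual
    return f"{poi['name']}: {poi['description']}"
```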

Test plan

slabgorb and others added 18 commits April 20, 2026 17:27
Spec for moving off Draw Things-trained LoRAs (which mflux silently
ignores) to mlx-examples + custom key-remapper, with hybrid genre/world
LoRA stacking, OTEL-backed verification, and a top-down rollout plan.

Covers: dataset layout, trainer wrapper, key remapper (mlx-npz ->
mflux-safetensors), storage conventions, visual_style.yaml multi-LoRA
schema with extend+exclude inheritance, verification gate (key-match +
SSIM + trigger-discrimination), daemon multi-LoRA runtime changes, and
a 6-phase migration plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TDD-structured task breakdown for the spec at
docs/superpowers/specs/2026-04-20-lora-pipeline-design.md.

6 phases, 23 tasks, 108 steps. Phase 0 captures mlx-examples output
as prerequisite knowledge for Phase 1 (the remapper). Phase 2 is the
top-down proof: one trained LoRA end-to-end before any infrastructure
scaffolding. Phase 3 formalizes the verification gate; no second LoRA
ships until it is live. Phase 4 widens daemon protocol to multi-LoRA
with OTEL instrumentation. Phase 5 proves the pipeline on the second
world (the_real_mccoy) to validate extend+exclude+add inheritance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures four architectural decisions extending ADR-032:

1. MLX-native training (mlx-examples/flux) + custom key-remapper as
   mandatory pipeline stage. Alternatives (Draw Things, Kohya,
   ai-toolkit) rejected with rationale.
2. Hybrid genre + world LoRA stacking with extend/exclude/add merge
   semantics in visual_style.yaml. Zero-config for common case;
   explicit for divergent worlds like the_real_mccoy.
3. Two-layer verification: SSIM-based pre-promotion gate + runtime
   matched_key_count as attribute on the existing flux_mlx.render
   OTEL span. Closes the silent-fallback trap from both ends.
4. Protocol widening to lora_paths[]/lora_scales[]; clean cutover,
   no compat shim.

Notes load-bearing constraints (mflux>=0.4,<0.5 pin, empirical SSIM
thresholds calibrated in plan Phase 3) and consequences (keymap YAML
becomes versioned artifact, mflux private-API dependency).

Proposed. Extends ADR-032. Supersedes nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes orchestrator-root Python project for the LoRA pipeline
tooling (per the plan at docs/superpowers/plans/2026-04-20-lora-pipeline.md).

Files:
- pyproject.toml — minimal declaration (numpy, pyyaml; pytest in dev extras)
- scripts/__init__.py + scripts/lora/__init__.py — package markers so
  `from scripts.lora.X import Y` resolves in tests
- tests/__init__.py + tests/lora/__init__.py — test package markers
- tests/lora/conftest.py — skeletal toy_npz_path + sample_keymap_path
  fixtures. Real MLX key patterns filled by Task 1.4 once Phase 0
  captures them in docs/superpowers/notes/.

Verification: `pytest tests/lora/ --collect-only` runs clean with
configfile=pyproject.toml.

Refs: ADR-083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minimal TDD increment: one test, one failure-mode path.

scripts/lora/remap_mlx_to_mflux.py
- RemapError exception
- _load_keymap / _match_rule helpers (ready for Task 1.4 happy path)
- remap_npz_to_safetensors detects unmapped keys, raises RemapError
  with every offending key named. Does not write output.
- Happy-path write raises NotImplementedError — Task 1.4 fills it in
  once Phase 0 keymap data lands.

tests/lora/test_remap.py
- test_unknown_mlx_key_hard_fails: writes a .npz with one bogus key,
  asserts RemapError names it, asserts no partial safetensors written.

Per the no-silent-fallback rule: mapping gaps are loud, never dropped.
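The loud-failure contract can be sketched as follows — a minimal illustration, not the real module; `RemapError` matches the commit, but the rule representation (predicates) is an assumption:

```python
# Sketch of the no-silent-fallback check: collect EVERY unmapped key
# and raise once, naming all of them, before any output is written.
class RemapError(Exception):
    pass

def check_keys(input_keys, rules):
    unmapped = [k for k in input_keys if not any(rule(k) for rule in rules)]
    if unmapped:
        raise RemapError(f"unmapped keys: {sorted(unmapped)}")
```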

Refs: ADR-083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/lora/train.py (preflight surface only)
- PreflightError exception
- preflight_dataset(dir) validates:
    * directory exists
    * every image has a same-stem .txt caption (hard-fail on unpaired)
    * at least MIN_IMAGES=150 pairs (matches /sq-lora Step 3 floor)
- CLI + subprocess invocation deferred to Task 2.3 when the first
  real overnight training exercises them end-to-end.
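The three checks can be sketched like this — an illustrative version, not the actual `scripts/lora/train.py` code; the accepted image extensions are an assumption:

```python
from pathlib import Path

MIN_IMAGES = 150  # the /sq-lora Step 3 floor cited above

class PreflightError(Exception):
    pass

def preflight_dataset(dataset_dir: str) -> int:
    """Sketch: directory exists, every image has a same-stem .txt
    caption, and at least MIN_IMAGES pairs are present."""
    root = Path(dataset_dir)
    if not root.is_dir():
        raise PreflightError(f"not a directory: {root}")
    images = [p for p in root.iterdir()
              if p.suffix.lower() in {".jpg", ".jpeg", ".png"}]
    unpaired = [p.name for p in images if not p.with_suffix(".txt").exists()]
    if unpaired:
        raise PreflightError(f"images without captions: {sorted(unpaired)}")
    if len(images) < MIN_IMAGES:
        raise PreflightError(f"{len(images)} pairs < {MIN_IMAGES} minimum")
    return len(images)
```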

tests/lora/test_train.py
- 4 tests cover missing dir, unpaired files, low volume, valid dataset.

Full lora test suite: 5/5 GREEN. No regression on Task 1.3.

Refs: ADR-083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Phase 0 of the LoRA pipeline plan. Captures:

- Commit SHA 796f5b5 of ml-explore/mlx-examples
- Trainer entrypoint: flux/dreambooth.py
- Output format: CORRECTION — safetensors, NOT .npz as plan assumed.
  Plan Tasks 1.3, 1.4, 1.5 need fixtures swapped from np.savez/np.load
  to safetensors.torch. Task 1.4 handles the correction.
- Observed 22 distinct module-path patterns, 608 keys at rank 4 /
  --lora-blocks -1 on flux-dev (19 double + 38 single blocks)
- Lowercase .lora_a / .lora_b (not lora_A/B, not lora_down/up)
- Shape convention: lora_a = (input_dim, rank); lora_b = (rank, output_dim)
  ↔ Kohya reverses both axes — every keymap rule needs transpose: true
- Complete MLX→Kohya name translation table (26 rows)
- Dataset format: train.jsonl (not paired .jpg+.txt) — plan Task 2.1
  needs a jsonl conversion step
- Empirical: 48.789 GB peak RAM, ~4.4s/iter at rank 4 on M3 Max 128GB

Toy artifact preserved at ~/mlx-toy-lora/final_adapters.safetensors
for plan Task 1.5's end-to-end render proof.
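The shape convention can be made concrete with NumPy (toy zero tensors, flux-dev's hidden size 3072 at rank 4, matching the smoke-test shapes reported later in this PR):

```python
import numpy as np

# MLX convention: lora_a = (input_dim, rank), lora_b = (rank, output_dim).
# Kohya/mflux expect both axes reversed, hence `transpose: true` on
# every keymap rule.
input_dim, rank, output_dim = 3072, 4, 3072
lora_a = np.zeros((input_dim, rank))   # MLX down-projection
lora_b = np.zeros((rank, output_dim))  # MLX up-projection

lora_down = lora_a.T  # Kohya down: (rank, input_dim) = (4, 3072)
lora_up = lora_b.T    # Kohya up: (output_dim, rank) = (3072, 4)
assert lora_down.shape == (4, 3072) and lora_up.shape == (3072, 4)
```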

Refs: ADR-083, docs/superpowers/plans/2026-04-20-lora-pipeline.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
26 rules covering every MLX-examples Flux LoRA key pattern observed in
Phase 0. 13 distinct module types × 2 directions (lora_a→lora_down,
lora_b→lora_up):

Double blocks (10 modules): img_attn.proj, img_attn.qkv,
img_mlp.layers.{0,2}, img_mod.lin, txt_attn.proj, txt_attn.qkv,
txt_mlp.layers.{0,2}, txt_mod.lin.

Single blocks (3 modules): linear1, linear2, modulation.lin.

Every rule has transpose: true — MLX stores lora_a as (input, rank)
and lora_b as (rank, output); Kohya/mflux expect the reverse of both
axes.

Verification against Phase 0 toy safetensors (~/mlx-toy-lora/):
608 keys, 0 unmatched. Per-rule counts: 19 matches for each
double-block rule (flux-dev has 19 double blocks), 38 matches for
each single-block rule (flux-dev has 38 single blocks). Perfect
coverage.

Refs: ADR-083, docs/superpowers/notes/2026-04-20-mlx-examples-flux-notes.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ection

Applies the Phase 0 correction: mlx-examples outputs safetensors, not
.npz. The remapper function renames + swaps its I/O path accordingly.

scripts/lora/remap_mlx_to_mflux.py
- remap_npz_to_safetensors → remap_mlx_safetensors (rename reflects
  corrected input type)
- I/O: safetensors.safe_open / safetensors.torch.save_file
  (was numpy.load / numpy.savez)
- Happy-path implementation: iterate input keys, translate via keymap,
  transpose axes [0, 1] per rule, accumulate into output dict, write
  via safetensors.torch.save_file
- Rank detection by min dim of any lora_down weight
- Unknown-key hard-fail preserved (Task 1.3 behavior)

tests/lora/conftest.py
- Fixture rename: toy_npz_path → toy_safetensors_path with real MLX
  key patterns at small shapes (rank 4, hidden 32)
- New fixture: real_keymap_path points at production keymap so
  happy-path tests catch keymap drift
- Fixture rename: sample_keymap_path → empty_keymap_path (always had
  zero rules; rename makes intent explicit)

tests/lora/test_remap.py
- Migrated test_unknown_mlx_key_hard_fails to safetensors I/O
- NEW test_happy_path_translates_known_keys: 4 MLX keys translate to
  4 correctly-named Kohya keys
- NEW test_transpose_flips_axes: verifies (input, rank) → (rank, input)
  and value-preservation

pyproject.toml: add safetensors>=0.4 and torch>=2.0 runtime deps.

End-to-end smoke on the real Phase 0 toy artifact (77MB, 608 keys):
remaps cleanly to 608 Kohya-named keys, rank=4 detected, first
lora_down shape (4, 3072), first lora_up shape (3072, 4).

Test suite: 7/7 GREEN.

Refs: ADR-083, docs/superpowers/notes/2026-04-20-mlx-examples-flux-notes.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l widening

Widens scripts/render_common.send_render() from singleton
(lora_path: str, lora_scale: float) to arrays (lora_paths: list[str] | None,
lora_scales: list[float] | None). No compat shim — per ADR-083, the
daemon is a sidecar with a small known consumer surface; a dual-protocol
shim would outlast the cutover.

Also updates the sole caller (render_batch → send_render) to promote
any legacy flat `lora:` / `lora_scale:` visual_style.yaml keys into
single-entry arrays. This keeps the existing generate_poi_images.py
working throughout the migration; Task 4.4 replaces the transitional
shim with compose_lora_stack() once the extend/exclude/add schema lands.

Pulled forward from Phase 4 per Architect correction #4 — Task 1.5's
end-to-end remapper proof needs this protocol to test against.

Validation:
- asymmetric length check: lora_scales count must match lora_paths
- legacy params lora_path/lora_scale removed from signature

Tests: 3 new tests in tests/test_render_common.py, 10/10 suite GREEN.

Refs: ADR-083, plan docs/superpowers/plans/2026-04-20-lora-pipeline.md §Task 4.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The end-to-end roundtrip test caught a real silent fallback in the v1
keymap: mflux's BFL loader splits the lora_down rank-axis into N chunks
(3 for double_blocks img/txt_attn_qkv, 4 for single_blocks linear1) and
applies them per Q/K/V/MLP target. mlx-examples trains those layers with
a SHARED rank-r down matrix, so the v1 keymap's transpose-only output
gave mflux a rank-4 down that it sliced into rank-1 chunks — while the
matching lora_up stayed rank-4. Result: a (1, seq, 1) @ (4, 3072) matmul
fail at runtime, exactly the silent-fallback the test was built to detect.

Fix: keymap gains a `replicate: N` field (default 1). The remapper tiles
the transposed lora_down N times along axis 0 for fused layers, so each
slice mflux extracts equals the original shared D — mathematically the
same delta as the fused mlx form, because (x @ D) @ U_split equals
split[(x @ D) @ U]. Non-fused layers (proj, mod, mlp_0/2, linear2,
modulation_lin) keep replicate=1 — the old transpose path is unchanged.

Source-rank inference moved off the output tensor (which is now tiled)
and onto the source lora_a, where min(shape) is the true rank.

Tests: two new unit tests verify the tile shape and per-chunk equality
to the original transposed D; the roundtrip test renders the toy LoRA
twice (with/without) and asserts visible pixel divergence — 169.83s
wall clock against a warm daemon, 608/608 keys matched, 494 layers
patched with no errors.

Keymap version bumped 1 → 2.
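The replicate fix can be checked numerically with NumPy (toy sizes; the real qkv case uses N=3): tiling the transposed shared down matrix N times along axis 0 means every rank-r chunk the consumer slices out equals the original shared D, so each per-target delta uses the same down-projection as the fused mlx form.

```python
import numpy as np

# replicate: N -> tile the transposed lora_down N times along axis 0.
rank, input_dim, n_chunks = 4, 32, 3
rng = np.random.default_rng(0)
d = rng.standard_normal((rank, input_dim))  # transposed shared down

tiled = np.tile(d, (n_chunks, 1))           # shape (N*rank, input_dim)
assert tiled.shape == (n_chunks * rank, input_dim)
for i in range(n_chunks):
    # every rank-r slice the loader extracts is the shared D
    assert np.array_equal(tiled[i * rank:(i + 1) * rank], d)
```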
Adds the pure-function resolver that merges genre+world visual_style.yaml
LoRA blocks into a flat per-render stack, filtered by tier. Per Architect
correction #5, lives in scripts/render_common.py rather than a separate
scripts/lora/compose.py — it's small, has no I/O, and the call sites
(load_visual_style, render_batch) all live in render_common already.

Schema (matches docs/superpowers/specs/2026-04-20-lora-pipeline-design.md):

  # genre — list of entries
  loras:
    - name: sw_landscape
      file: lora/spaghetti_western/sw_landscape.safetensors
      scale: 0.8
      applies_to: [landscape, scene]
      trigger: sw_landscape   # optional

  # world — dict with exclude/add
  loras:
    exclude: [sw_landscape]   # drop inherited entries by name
    add:                      # append world-only entries
      - name: ...

Composition order: genre entries (after exclude filter) come first, then
world `add:` entries in declared order, then the whole flat list is
filtered by `tier in entry["applies_to"]`.
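The composition order reduces to a few lines. A sketch only — the real resolver in `scripts/render_common.py` also enforces the hard-fails listed below, and the exact signature is an assumption here:

```python
# Sketch of the merge order: genre entries (after exclude) first,
# then world add: entries in declared order, then tier filter.
def compose_lora_stack(genre_loras, world_loras, tier):
    excluded = set(world_loras.get("exclude", []))
    stack = [e for e in genre_loras if e["name"] not in excluded]
    stack += world_loras.get("add", [])
    return [e for e in stack if tier in e["applies_to"]]
```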

Hard-fails (per "no silent fallback") on:
  - missing required fields (name, file, scale, applies_to)
  - empty applies_to (an entry that fires on no tier is always
    misconfiguration — never useful, always a typo)
  - world.loras given as a list (legacy v1-schema file that hasn't been
    migrated; refusing to silently inherit it forces explicit migration)
  - world.add reusing a genre name without first excluding it (ambiguous
    intent — operator must say which one wins)

Seven tests in tests/test_render_common.py cover: no-overrides inheritance,
exclude-drops, add-appends, tier filtering, duplicate-name rejection,
empty-applies_to rejection, and legacy list-form rejection. All green.

Wiring lands in Task 4.4 (load_visual_style returns resolved_loras) and
Task 4.6 (visual_style.yaml migration to the loras: schema). The
transitional legacy promotion in render_batch (lines ~263-272) stays in
place until 4.4 lands the wiring switchover.
…sual_style

Adds an optional `tier=` keyword to load_visual_style. When supplied, the
function calls compose_lora_stack(genre_style, world_style, tier) and
exposes the result as `merged["resolved_loras"]`. Without `tier=`, the
return shape is unchanged from before — the three existing callers
(generate_portrait_images.py, generate_creature_images.py,
generate_poi_images.py) keep working without modification until Task 4.6
migrates the YAML files and updates them to opt into the new path.

Two semantic changes worth pinning:

1. The `loras:` key is intentionally skipped during the field-by-field
   world overlay. Genre uses list-form, world uses dict-form
   `{exclude, add}`; a naive overlay would replace the genre list with
   the world dict and silently drop every genre LoRA. The merged dict
   keeps the *genre* list as `merged["loras"]`, and resolution happens
   through the dedicated `compose_lora_stack` path. A test pins this
   down so a future "let me clean this up" refactor doesn't reintroduce
   the trap.

2. Backward compat over plan-literal: the original Task 4.4 plan called
   for changing the signature to `(*, genre_dir, world_dir, tier)` —
   keyword-only, world_dir as Path, tier required. That breaks all
   three existing callers in this commit. Per spec authority hierarchy
   (spec > plan when they conflict), the design spec only mandates the
   schema and the resolved_loras key, not the function signature. The
   backward-compat keyword-only `tier=` matches the spec without
   forcing a coordinated multi-script update; the callers migrate in
   4.6 alongside their YAML files.

Four new tests in tests/test_render_common.py cover: tier-omitted
backward compat, genre-only resolution, world-exclude end-to-end, and
the loras-key-not-clobbered invariant. 23/23 fast tests pass.

The transitional legacy `lora:` / `lora_scale:` flat-key promotion in
render_batch (lines ~263-272) stays in place — it'll come off in 4.6
when the YAMLs are migrated to the loras: schema and the callers
start passing tier=.
…tier

The follow-on to Task 4.4. load_visual_style now produces
`resolved_loras` when `tier=` is passed, but the consumer side
(render_batch) was still reading the legacy flat `lora:` key inline.
This commit lands the wiring switchover with a small, testable helper.

scripts/render_common.py
  - New `resolve_lora_args(visual_style)` — pure function, returns the
    `(lora_paths, lora_scales)` tuple to hand to send_render. Three
    cases: resolved_loras present (preferred), legacy `lora:` flat
    key (transitional), neither (no LoRA). Hard-fails when both
    schemas are present in the same dict — the merged shape can't
    represent both, and silently picking one would mask a YAML
    mid-migration error.
  - render_batch now calls the helper instead of inlining the legacy
    promotion. The transitional comment in render_batch is gone — the
    transition lives in the helper's docstring where it's testable.
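The three-way dispatch can be sketched as below. Illustrative only: the entry-dict shape mirrors the `loras:` schema, and the 1.0 default scale for the legacy path is an assumption, not confirmed by the commit:

```python
# Sketch: resolved_loras preferred, legacy lora: transitional,
# neither means no LoRA; both present in one dict is a hard error.
def resolve_lora_args(visual_style: dict):
    resolved = visual_style.get("resolved_loras") or []
    legacy = visual_style.get("lora")
    if resolved and legacy:
        raise ValueError("both resolved_loras and legacy lora: present")
    if resolved:
        return ([e["file"] for e in resolved],
                [e["scale"] for e in resolved])
    if legacy:
        # assumed default scale when lora_scale: is omitted
        return ([legacy], [visual_style.get("lora_scale", 1.0)])
    return (None, None)
```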

scripts/generate_poi_images.py
  - load_visual_style call now passes tier="landscape" so that, once
    the spaghetti_western YAMLs migrate to the loras: schema in
    Task 4.6, POI renders pick up the resolved stack. Pre-migration
    YAMLs (no loras: block) get an empty resolved_loras list which
    naturally falls through to the legacy `lora:` path inside
    resolve_lora_args. No behavior change today, full plumbing for
    tomorrow.
  - Portrait and creature scripts will get the same tier= treatment
    when their YAMLs migrate; not touched here to keep the diff
    surface aligned with the one caller that's about to start
    benefiting (POI renders for the spaghetti_western LoRA work).

tests/test_render_common.py
  - 6 tests for resolve_lora_args: no-LoRA, legacy-only, default-scale,
    resolved-only, both-present-raises, empty-resolved-falls-through.
  - Rounds the file out to 20 tests covering the full Task 4.3-4.4
    surface end-to-end.

29/29 fast tests green. Task 4.5 (lora/ directory rename) and Task 4.6
(YAML schema migration) remain — both blocked: 4.5 conflicts with an
in-flight uncommitted local change to dust_and_lead/visual_style.yaml
in sidequest-content, and 4.6 needs the Phase 2 trained LoRA that
4.6's schema would reference. Both unblock together.
Symmetric with the POI script change in 7340f0f. Each script knows its
own tier (portrait, landscape, etc.) and passes it to load_visual_style
so resolved_loras populates ahead of the per-render resolve_lora_args
call. Pre-migration YAMLs produce an empty resolved_loras list which
falls through to the legacy `lora:` path inside the helper — no
behavior change today, full plumbing for once Phase 2's portrait LoRAs
are trained and the YAMLs migrate to the loras: schema.

29/29 fast tests still green. Smoke-tested against the elemental_harmony
genre pack across all three tiers — no regressions.
Drops the Draw Things CLI workflow entirely and points the skill at the
new pipeline that landed in Phases 1-4:

  Step 0  — environment check (mlx-examples checkout + venv + mflux pin)
  Step 1  — image collection (unchanged in spirit; snake_case dirs)
  Step 2  — paired .txt captions (matches scripts/lora/train.py preflight)
  Step 3  — preflight gate (scripts/lora/train.py::preflight_dataset)
  Step 4  — train: convert paired .txt → train.jsonl, then run
            ~/Projects/mlx-examples/flux/dreambooth.py (rank 8, 1500 iter)
  Step 5  — remap (scripts/lora/remap_mlx_to_mflux.py) + roundtrip test
            (tests/lora/test_remap_roundtrip.py — the silent-fallback
            detector) + OTEL render.lora.matched_keys check
  Step 6  — wire into visual_style.yaml under the loras: schema
            (genre list-form + world {exclude, add} dict-form, matching
            scripts/render_common.py::compose_lora_stack)
  Step 7  — re-render POIs and eyeball

Also surfaces:
  - Hard "no Draw Things, no .ckpt for new work" — legacy .ckpts now
    archived under lora/{genre}/archive/legacy/ (Task 4.5 layout)
  - The matched-key OTEL counter (Task 4.2b) as the canary for keymap
    drift and silent no-op LoRAs
  - The Architect correction #5 file path (compose_lora_stack lives in
    render_common.py, not a separate compose.py module)
  - ADR-083 + the design spec + the Phase 0 mlx-examples notes as the
    authoritative references

Reference table updated end-to-end: Draw Things and its model dir are
gone, replaced by the actual files in this repo + the mlx-examples
checkout path. The "Completed LoRAs" table now distinguishes legacy
.ckpt artifacts from the new mlx-pipeline .safetensors slots.

Skill description in frontmatter updated so /sq-lora's listing reads
"collect, tag, train via mlx-examples, remap to mflux, verify on the
daemon" — matches the new flow.
Caught during the first real run-through (genre=spaghetti_western,
tier=landscape) by Keith. The Step 2 captioning section I wrote
copied the subject-LoRA convention (trigger + style tags + per-image
subject tags), which is exactly wrong for a style LoRA.

Corrected per kohya-ss / ai-toolkit / civitai community consensus
(verified via Perplexity citation chain — see commit conversation):

  Style LoRA  → every .txt is the bare trigger token, identical
                across the dataset. Genre-level uses the genre name
                (spaghetti_western); world-level uses the world name
                (the_real_mccoy). No descriptors, no subject tags.

  Subject LoRA → detailed natural-language captions per image with
                trigger + descriptive tags. SideQuest hasn't trained
                any subject LoRAs yet; flagged in skill as a separate
                flow if/when one is needed.

Also fixed two downstream bugs:
  - Step 6 visual_style.yaml example showed `trigger: {genre}_{tier}`
    (e.g., `spaghetti_western_landscape`); should be the bare genre/
    world name to match what the model actually trained on. The
    `name` field still includes the tier suffix because it's the
    internal handle compose_lora_stack uses to address entries.
  - Step 3 manual checklist said "every caption starts with the
    trigger word" — replaced with caption-purity check that explicitly
    forbids any extra text past the trigger token, with a spot-check
    one-liner.
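In the spirit of that purity check, a hedged sketch (the function name is illustrative and only a trailing newline is tolerated past the trigger):

```python
from pathlib import Path

# Illustrative caption-purity check: every .txt must be exactly the
# bare trigger token, nothing more.
def impure_captions(dataset_dir: str, trigger: str) -> list:
    return [str(p) for p in sorted(Path(dataset_dir).glob("*.txt"))
            if p.read_text().rstrip("\n") != trigger]
```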

The why-it-matters: per-image subject tags actively poison a style
LoRA. The model starts attributing the trained register's effect to
"saloon" or "film_grain" instead of to the bare trigger, and prompts
using those tokens with no LoRA loaded will partially fire the
style. The legacy `leone_style` LoRA's existing 72 captions show the
prior generation of this mistake — 13 tags per image, all identical,
mixing trigger + style descriptors. Salvageable as a starting point
for the new run but the captions need a full rewrite to a single
token before training.

Skill description in frontmatter unchanged; the listing already says
"collect, tag, train via mlx-examples, remap to mflux, verify on the
daemon" which still fits.
…sition

scripts/generate_poi_images.py now uses a POI's visual_prompt field verbatim
when present (single-trigger + renderable-subject convention per ADR-083),
falling back to "{name}: {description}" only when the art-director has not
yet authored one. Chapter-level atmosphere and location are narrator-facing
prose and no longer leak into Flux prompts — they were the source of 80-130
tokens of unrenderable literary metaphor on every POI.

Dry-run against spaghetti_western/dust_and_lead now produces 87-97 token
prompts (was 187-234) while preserving the art-director's compositional
intent. Per handoff-2026-04-21-poi-prompt-composition-dev.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@slabgorb slabgorb merged commit 940652e into main Apr 21, 2026
@slabgorb slabgorb deleted the feat/poi-prompt-composition branch April 21, 2026 19:09