fix(render): prefer visual_prompt, drop narrator prose from POI composition#108

Merged
slabgorb merged 18 commits into main from feat/poi-prompt-composition
Apr 21, 2026
Conversation

@slabgorb
Owner

Summary

  • scripts/generate_poi_images.py now uses a POI's visual_prompt field verbatim when present, falling back to {name}: {description} when absent.
  • Chapter-level atmosphere and location (narrator-facing prose) no longer leak into Flux prompts — they were the source of 80-130 tokens of unrenderable literary metaphor on every POI.
  • Dry-run against spaghetti_western/dust_and_lead now produces 87-97 token prompts (was 187-234) while preserving the art-director's compositional intent.

Per .session/handoff-2026-04-21-poi-prompt-composition-dev.md. Related ADRs: 083, 084.
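The fallback rule in the summary can be sketched as a small pure function. This is illustrative only — the function name and POI dict shape are assumptions, not the actual API in `scripts/generate_poi_images.py`:

```python
# Sketch of the composition rule above (names are illustrative).
def compose_poi_prompt(poi: dict) -> str:
    """Prefer the art-director's visual_prompt verbatim; otherwise
    fall back to "{name}: {description}". Chapter-level atmosphere
    and location are deliberately never consulted."""
    visual = poi.get("visual_prompt")
    if visual:
        return visual
    return f"{poi['name']}: {poi['description']}"
```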

Test plan

slabgorb and others added 18 commits April 20, 2026 17:27
Spec for moving off Draw Things-trained LoRAs (which mflux silently
ignores) to mlx-examples + custom key-remapper, with hybrid genre/world
LoRA stacking, OTEL-backed verification, and a top-down rollout plan.

Covers: dataset layout, trainer wrapper, key remapper (mlx-npz ->
mflux-safetensors), storage conventions, visual_style.yaml multi-LoRA
schema with extend+exclude inheritance, verification gate (key-match +
SSIM + trigger-discrimination), daemon multi-LoRA runtime changes, and
a 6-phase migration plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TDD-structured task breakdown for the spec at
docs/superpowers/specs/2026-04-20-lora-pipeline-design.md.

6 phases, 23 tasks, 108 steps. Phase 0 captures mlx-examples output
as prerequisite knowledge for Phase 1 (the remapper). Phase 2 is the
top-down proof: one trained LoRA end-to-end before any infrastructure
scaffolding. Phase 3 formalizes the verification gate; no second LoRA
ships until it is live. Phase 4 widens daemon protocol to multi-LoRA
with OTEL instrumentation. Phase 5 proves the pipeline on the second
world (the_real_mccoy) to validate extend+exclude+add inheritance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures four architectural decisions extending ADR-032:

1. MLX-native training (mlx-examples/flux) + custom key-remapper as
   mandatory pipeline stage. Alternatives (Draw Things, Kohya,
   ai-toolkit) rejected with rationale.
2. Hybrid genre + world LoRA stacking with extend/exclude/add merge
   semantics in visual_style.yaml. Zero-config for common case;
   explicit for divergent worlds like the_real_mccoy.
3. Two-layer verification: SSIM-based pre-promotion gate + runtime
   matched_key_count as attribute on the existing flux_mlx.render
   OTEL span. Closes the silent-fallback trap from both ends.
4. Protocol widening to lora_paths[]/lora_scales[]; clean cutover,
   no compat shim.

Notes load-bearing constraints (mflux>=0.4,<0.5 pin, empirical SSIM
thresholds calibrated in plan Phase 3) and consequences (keymap YAML
becomes versioned artifact, mflux private-API dependency).

Proposed. Extends ADR-032. Supersedes nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes orchestrator-root Python project for the LoRA pipeline
tooling (per the plan at docs/superpowers/plans/2026-04-20-lora-pipeline.md).

Files:
- pyproject.toml — minimal declaration (numpy, pyyaml; pytest in dev extras)
- scripts/__init__.py + scripts/lora/__init__.py — package markers so
  `from scripts.lora.X import Y` resolves in tests
- tests/__init__.py + tests/lora/__init__.py — test package markers
- tests/lora/conftest.py — skeletal toy_npz_path + sample_keymap_path
  fixtures. Real MLX key patterns filled by Task 1.4 once Phase 0
  captures them in docs/superpowers/notes/.

Verification: `pytest tests/lora/ --collect-only` runs clean with
configfile=pyproject.toml.

Refs: ADR-083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Minimal TDD increment: one test, one failure-mode path.

scripts/lora/remap_mlx_to_mflux.py
- RemapError exception
- _load_keymap / _match_rule helpers (ready for Task 1.4 happy path)
- remap_npz_to_safetensors detects unmapped keys, raises RemapError
  with every offending key named. Does not write output.
- Happy-path write raises NotImplementedError — Task 1.4 fills it in
  once Phase 0 keymap data lands.

tests/lora/test_remap.py
- test_unknown_mlx_key_hard_fails: writes a .npz with one bogus key,
  asserts RemapError names it, asserts no partial safetensors written.

Per the no-silent-fallback rule: mapping gaps are loud, never dropped.
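The loud-failure contract can be sketched as follows — a minimal illustration, not the real module; `RemapError` matches the commit, but the rule representation (predicates) is an assumption:

```python
# Sketch of the no-silent-fallback check: collect EVERY unmapped key
# and raise once, naming all of them, before any output is written.
class RemapError(Exception):
    pass

def check_keys(input_keys, rules):
    unmapped = [k for k in input_keys if not any(rule(k) for rule in rules)]
    if unmapped:
        raise RemapError(f"unmapped keys: {sorted(unmapped)}")
```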

Refs: ADR-083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/lora/train.py (preflight surface only)
- PreflightError exception
- preflight_dataset(dir) validates:
    * directory exists
    * every image has a same-stem .txt caption (hard-fail on unpaired)
    * at least MIN_IMAGES=150 pairs (matches /sq-lora Step 3 floor)
- CLI + subprocess invocation deferred to Task 2.3 when the first
  real overnight training exercises them end-to-end.
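The three checks can be sketched like this — an illustrative version, not the actual `scripts/lora/train.py` code; the accepted image extensions are an assumption:

```python
from pathlib import Path

MIN_IMAGES = 150  # the /sq-lora Step 3 floor cited above

class PreflightError(Exception):
    pass

def preflight_dataset(dataset_dir: str) -> int:
    """Sketch: directory exists, every image has a same-stem .txt
    caption, and at least MIN_IMAGES pairs are present."""
    root = Path(dataset_dir)
    if not root.is_dir():
        raise PreflightError(f"not a directory: {root}")
    images = [p for p in root.iterdir()
              if p.suffix.lower() in {".jpg", ".jpeg", ".png"}]
    unpaired = [p.name for p in images if not p.with_suffix(".txt").exists()]
    if unpaired:
        raise PreflightError(f"images without captions: {sorted(unpaired)}")
    if len(images) < MIN_IMAGES:
        raise PreflightError(f"{len(images)} pairs < {MIN_IMAGES} minimum")
    return len(images)
```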

tests/lora/test_train.py
- 4 tests cover missing dir, unpaired files, low volume, valid dataset.

Full lora test suite: 5/5 GREEN. No regression on Task 1.3.

Refs: ADR-083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes Phase 0 of the LoRA pipeline plan. Captures:

- Commit SHA 796f5b5 of ml-explore/mlx-examples
- Trainer entrypoint: flux/dreambooth.py
- Output format: CORRECTION — safetensors, NOT .npz as plan assumed.
  Plan Tasks 1.3, 1.4, 1.5 need fixtures swapped from np.savez/np.load
  to safetensors.torch. Task 1.4 handles the correction.
- Observed 22 distinct module-path patterns, 608 keys at rank 4 /
  --lora-blocks -1 on flux-dev (19 double + 38 single blocks)
- Lowercase .lora_a / .lora_b (not lora_A/B, not lora_down/up)
- Shape convention: lora_a = (input_dim, rank); lora_b = (rank, output_dim)
  ↔ Kohya reverses both axes — every keymap rule needs transpose: true
- Complete MLX→Kohya name translation table (26 rows)
- Dataset format: train.jsonl (not paired .jpg+.txt) — plan Task 2.1
  needs a jsonl conversion step
- Empirical: 48.789 GB peak RAM, ~4.4s/iter at rank 4 on M3 Max 128GB

Toy artifact preserved at ~/mlx-toy-lora/final_adapters.safetensors
for plan Task 1.5's end-to-end render proof.
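The shape convention can be made concrete with NumPy (toy zero tensors, flux-dev's hidden size 3072 at rank 4, matching the smoke-test shapes reported later in this PR):

```python
import numpy as np

# MLX convention: lora_a = (input_dim, rank), lora_b = (rank, output_dim).
# Kohya/mflux expect both axes reversed, hence `transpose: true` on
# every keymap rule.
input_dim, rank, output_dim = 3072, 4, 3072
lora_a = np.zeros((input_dim, rank))   # MLX down-projection
lora_b = np.zeros((rank, output_dim))  # MLX up-projection

lora_down = lora_a.T  # Kohya down: (rank, input_dim) = (4, 3072)
lora_up = lora_b.T    # Kohya up: (output_dim, rank) = (3072, 4)
assert lora_down.shape == (4, 3072) and lora_up.shape == (3072, 4)
```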

Refs: ADR-083, docs/superpowers/plans/2026-04-20-lora-pipeline.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
26 rules covering every MLX-examples Flux LoRA key pattern observed in
Phase 0. 13 distinct module types × 2 directions (lora_a→lora_down,
lora_b→lora_up):

Double blocks (10 modules): img_attn.proj, img_attn.qkv,
img_mlp.layers.{0,2}, img_mod.lin, txt_attn.proj, txt_attn.qkv,
txt_mlp.layers.{0,2}, txt_mod.lin.

Single blocks (3 modules): linear1, linear2, modulation.lin.

Every rule has transpose: true — MLX stores lora_a as (input, rank)
and lora_b as (rank, output); Kohya/mflux expect the reverse of both
axes.

Verification against Phase 0 toy safetensors (~/mlx-toy-lora/):
608 keys, 0 unmatched. Per-rule counts: 19 matches for each
double-block rule (flux-dev has 19 double blocks), 38 matches for
each single-block rule (flux-dev has 38 single blocks). Perfect
coverage.

Refs: ADR-083, docs/superpowers/notes/2026-04-20-mlx-examples-flux-notes.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ection

Applies the Phase 0 correction: mlx-examples outputs safetensors, not
.npz. The remapper function renames + swaps its I/O path accordingly.

scripts/lora/remap_mlx_to_mflux.py
- remap_npz_to_safetensors → remap_mlx_safetensors (rename reflects
  corrected input type)
- I/O: safetensors.safe_open / safetensors.torch.save_file
  (was numpy.load / numpy.savez)
- Happy-path implementation: iterate input keys, translate via keymap,
  transpose axes [0, 1] per rule, accumulate into output dict, write
  via safetensors.torch.save_file
- Rank detection by min dim of any lora_down weight
- Unknown-key hard-fail preserved (Task 1.3 behavior)

tests/lora/conftest.py
- Fixture rename: toy_npz_path → toy_safetensors_path with real MLX
  key patterns at small shapes (rank 4, hidden 32)
- New fixture: real_keymap_path points at production keymap so
  happy-path tests catch keymap drift
- Fixture rename: sample_keymap_path → empty_keymap_path (always had
  zero rules; rename makes intent explicit)

tests/lora/test_remap.py
- Migrated test_unknown_mlx_key_hard_fails to safetensors I/O
- NEW test_happy_path_translates_known_keys: 4 MLX keys translate to
  4 correctly-named Kohya keys
- NEW test_transpose_flips_axes: verifies (input, rank) → (rank, input)
  and value-preservation

pyproject.toml: add safetensors>=0.4 and torch>=2.0 runtime deps.

End-to-end smoke on the real Phase 0 toy artifact (77MB, 608 keys):
remaps cleanly to 608 Kohya-named keys, rank=4 detected, first
lora_down shape (4, 3072), first lora_up shape (3072, 4).

Test suite: 7/7 GREEN.

Refs: ADR-083, docs/superpowers/notes/2026-04-20-mlx-examples-flux-notes.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l widening

Widens scripts/render_common.send_render() from singleton
(lora_path: str, lora_scale: float) to arrays (lora_paths: list[str] | None,
lora_scales: list[float] | None). No compat shim — per ADR-083, the
daemon is a sidecar with a small known consumer surface; a dual-protocol
shim would outlast the cutover.

Also updates the sole caller (render_batch → send_render) to promote
any legacy flat `lora:` / `lora_scale:` visual_style.yaml keys into
single-entry arrays. This keeps the existing generate_poi_images.py
working throughout the migration; Task 4.4 replaces the transitional
shim with compose_lora_stack() once the extend/exclude/add schema lands.

Pulled forward from Phase 4 per Architect correction #4 — Task 1.5's
end-to-end remapper proof needs this protocol to test against.

Validation:
- asymmetric length check: lora_scales count must match lora_paths
- legacy params lora_path/lora_scale removed from signature

Tests: 3 new tests in tests/test_render_common.py, 10/10 suite GREEN.

Refs: ADR-083, plan docs/superpowers/plans/2026-04-20-lora-pipeline.md §Task 4.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The end-to-end roundtrip test caught a real silent fallback in the v1
keymap: mflux's BFL loader splits the lora_down rank-axis into N chunks
(3 for double_blocks img/txt_attn_qkv, 4 for single_blocks linear1) and
applies them per Q/K/V/MLP target. mlx-examples trains those layers with
a SHARED rank-r down matrix, so the v1 keymap's transpose-only output
gave mflux a rank-4 down that it sliced into rank-1 chunks — while the
matching lora_up stayed rank-4. Result: a (1, seq, 1) @ (4, 3072) matmul
fail at runtime, exactly the silent-fallback the test was built to detect.

Fix: keymap gains a `replicate: N` field (default 1). The remapper tiles
the transposed lora_down N times along axis 0 for fused layers, so each
slice mflux extracts equals the original shared D — mathematically the
same delta as the fused mlx form, because (x @ D) @ U_split equals
split[(x @ D) @ U]. Non-fused layers (proj, mod, mlp_0/2, linear2,
modulation_lin) keep replicate=1 — the old transpose path is unchanged.

Source-rank inference moved off the output tensor (which is now tiled)
and onto the source lora_a, where min(shape) is the true rank.

Tests: two new unit tests verify the tile shape and per-chunk equality
to the original transposed D; the roundtrip test renders the toy LoRA
twice (with/without) and asserts visible pixel divergence — 169.83s
wall clock against a warm daemon, 608/608 keys matched, 494 layers
patched with no errors.

Keymap version bumped 1 → 2.
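The replicate fix can be checked numerically with NumPy (toy sizes; the real qkv case uses N=3): tiling the transposed shared down matrix N times along axis 0 means every rank-r chunk the consumer slices out equals the original shared D, so each per-target delta uses the same down-projection as the fused mlx form.

```python
import numpy as np

# replicate: N -> tile the transposed lora_down N times along axis 0.
rank, input_dim, n_chunks = 4, 32, 3
rng = np.random.default_rng(0)
d = rng.standard_normal((rank, input_dim))  # transposed shared down

tiled = np.tile(d, (n_chunks, 1))           # shape (N*rank, input_dim)
assert tiled.shape == (n_chunks * rank, input_dim)
for i in range(n_chunks):
    # every rank-r slice the loader extracts is the shared D
    assert np.array_equal(tiled[i * rank:(i + 1) * rank], d)
```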
Adds the pure-function resolver that merges genre+world visual_style.yaml
LoRA blocks into a flat per-render stack, filtered by tier. Per Architect
correction #5, lives in scripts/render_common.py rather than a separate
scripts/lora/compose.py — it's small, has no I/O, and the call sites
(load_visual_style, render_batch) all live in render_common already.

Schema (matches docs/superpowers/specs/2026-04-20-lora-pipeline-design.md):

  # genre — list of entries
  loras:
    - name: sw_landscape
      file: lora/spaghetti_western/sw_landscape.safetensors
      scale: 0.8
      applies_to: [landscape, scene]
      trigger: sw_landscape   # optional

  # world — dict with exclude/add
  loras:
    exclude: [sw_landscape]   # drop inherited entries by name
    add:                      # append world-only entries
      - name: ...

Composition order: genre entries (after exclude filter) come first, then
world `add:` entries in declared order, then the whole flat list is
filtered by `tier in entry["applies_to"]`.
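The composition order reduces to a few lines. A sketch only — the real resolver in `scripts/render_common.py` also enforces the hard-fails listed below, and the exact signature is an assumption here:

```python
# Sketch of the merge order: genre entries (after exclude) first,
# then world add: entries in declared order, then tier filter.
def compose_lora_stack(genre_loras, world_loras, tier):
    excluded = set(world_loras.get("exclude", []))
    stack = [e for e in genre_loras if e["name"] not in excluded]
    stack += world_loras.get("add", [])
    return [e for e in stack if tier in e["applies_to"]]
```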

Hard-fails (per "no silent fallback") on:
  - missing required fields (name, file, scale, applies_to)
  - empty applies_to (an entry that fires on no tier is always
    misconfiguration — never useful, always a typo)
  - world.loras given as a list (legacy v1-schema file that hasn't been
    migrated; refusing to silently inherit it forces explicit migration)
  - world.add reusing a genre name without first excluding it (ambiguous
    intent — operator must say which one wins)

Seven tests in tests/test_render_common.py cover: no-overrides inheritance,
exclude-drops, add-appends, tier filtering, duplicate-name rejection,
empty-applies_to rejection, and legacy list-form rejection. All green.

Wiring lands in Task 4.4 (load_visual_style returns resolved_loras) and
Task 4.6 (visual_style.yaml migration to the loras: schema). The
transitional legacy promotion in render_batch (lines ~263-272) stays in
place until 4.4 lands the wiring switchover.
…sual_style

Adds an optional `tier=` keyword to load_visual_style. When supplied, the
function calls compose_lora_stack(genre_style, world_style, tier) and
exposes the result as `merged["resolved_loras"]`. Without `tier=`, the
return shape is unchanged from before — the three existing callers
(generate_portrait_images.py, generate_creature_images.py,
generate_poi_images.py) keep working without modification until Task 4.6
migrates the YAML files and updates them to opt into the new path.

Two semantic changes worth pinning:

1. The `loras:` key is intentionally skipped during the field-by-field
   world overlay. Genre uses list-form, world uses dict-form
   `{exclude, add}`; a naive overlay would replace the genre list with
   the world dict and silently drop every genre LoRA. The merged dict
   keeps the *genre* list as `merged["loras"]`, and resolution happens
   through the dedicated `compose_lora_stack` path. A test pins this
   down so a future "let me clean this up" refactor doesn't reintroduce
   the trap.

2. Backward compat over plan-literal: the original Task 4.4 plan called
   for changing the signature to `(*, genre_dir, world_dir, tier)` —
   keyword-only, world_dir as Path, tier required. That breaks all
   three existing callers in this commit. Per spec authority hierarchy
   (spec > plan when they conflict), the design spec only mandates the
   schema and the resolved_loras key, not the function signature. The
   backward-compat keyword-only `tier=` matches the spec without
   forcing a coordinated multi-script update; the callers migrate in
   4.6 alongside their YAML files.

Four new tests in tests/test_render_common.py cover: tier-omitted
backward compat, genre-only resolution, world-exclude end-to-end, and
the loras-key-not-clobbered invariant. 23/23 fast tests pass.

The transitional legacy `lora:` / `lora_scale:` flat-key promotion in
render_batch (lines ~263-272) stays in place — it'll come off in 4.6
when the YAMLs are migrated to the loras: schema and the callers
start passing tier=.
…tier

The follow-on to Task 4.4. load_visual_style now produces
`resolved_loras` when `tier=` is passed, but the consumer side
(render_batch) was still reading the legacy flat `lora:` key inline.
This commit lands the wiring switchover with a small, testable helper.

scripts/render_common.py
  - New `resolve_lora_args(visual_style)` — pure function, returns the
    `(lora_paths, lora_scales)` tuple to hand to send_render. Three
    cases: resolved_loras present (preferred), legacy `lora:` flat
    key (transitional), neither (no LoRA). Hard-fails when both
    schemas are present in the same dict — the merged shape can't
    represent both, and silently picking one would mask a YAML
    mid-migration error.
  - render_batch now calls the helper instead of inlining the legacy
    promotion. The transitional comment in render_batch is gone — the
    transition lives in the helper's docstring where it's testable.
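The three-way dispatch can be sketched as below. Illustrative only: the entry-dict shape mirrors the `loras:` schema, and the 1.0 default scale for the legacy path is an assumption, not confirmed by the commit:

```python
# Sketch: resolved_loras preferred, legacy lora: transitional,
# neither means no LoRA; both present in one dict is a hard error.
def resolve_lora_args(visual_style: dict):
    resolved = visual_style.get("resolved_loras") or []
    legacy = visual_style.get("lora")
    if resolved and legacy:
        raise ValueError("both resolved_loras and legacy lora: present")
    if resolved:
        return ([e["file"] for e in resolved],
                [e["scale"] for e in resolved])
    if legacy:
        # assumed default scale when lora_scale: is omitted
        return ([legacy], [visual_style.get("lora_scale", 1.0)])
    return (None, None)
```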

scripts/generate_poi_images.py
  - load_visual_style call now passes tier="landscape" so that, once
    the spaghetti_western YAMLs migrate to the loras: schema in
    Task 4.6, POI renders pick up the resolved stack. Pre-migration
    YAMLs (no loras: block) get an empty resolved_loras list which
    naturally falls through to the legacy `lora:` path inside
    resolve_lora_args. No behavior change today, full plumbing for
    tomorrow.
  - Portrait and creature scripts will get the same tier= treatment
    when their YAMLs migrate; not touched here to keep the diff
    surface aligned with the one caller that's about to start
    benefiting (POI renders for the spaghetti_western LoRA work).

tests/test_render_common.py
  - 6 tests for resolve_lora_args: no-LoRA, legacy-only, default-scale,
    resolved-only, both-present-raises, empty-resolved-falls-through.
  - Rounds the file out to 20 tests covering the full Task 4.3-4.4
    surface end-to-end.

29/29 fast tests green. Task 4.5 (lora/ directory rename) and Task 4.6
(YAML schema migration) remain — both blocked: 4.5 conflicts with an
in-flight uncommitted local change to dust_and_lead/visual_style.yaml
in sidequest-content, and 4.6 needs the Phase 2 trained LoRA that
4.6's schema would reference. Both unblock together.
Symmetric with the POI script change in 7340f0f. Each script knows its
own tier (portrait, landscape, etc.) and passes it to load_visual_style
so resolved_loras populates ahead of the per-render resolve_lora_args
call. Pre-migration YAMLs produce an empty resolved_loras list which
falls through to the legacy `lora:` path inside the helper — no
behavior change today, full plumbing for once Phase 2's portrait LoRAs
are trained and the YAMLs migrate to the loras: schema.

29/29 fast tests still green. Smoke-tested against the elemental_harmony
genre pack across all three tiers — no regressions.
Drops the Draw Things CLI workflow entirely and points the skill at the
new pipeline that landed in Phases 1-4:

  Step 0  — environment check (mlx-examples checkout + venv + mflux pin)
  Step 1  — image collection (unchanged in spirit; snake_case dirs)
  Step 2  — paired .txt captions (matches scripts/lora/train.py preflight)
  Step 3  — preflight gate (scripts/lora/train.py::preflight_dataset)
  Step 4  — train: convert paired .txt → train.jsonl, then run
            ~/Projects/mlx-examples/flux/dreambooth.py (rank 8, 1500 iter)
  Step 5  — remap (scripts/lora/remap_mlx_to_mflux.py) + roundtrip test
            (tests/lora/test_remap_roundtrip.py — the silent-fallback
            detector) + OTEL render.lora.matched_keys check
  Step 6  — wire into visual_style.yaml under the loras: schema
            (genre list-form + world {exclude, add} dict-form, matching
            scripts/render_common.py::compose_lora_stack)
  Step 7  — re-render POIs and eyeball

Also surfaces:
  - Hard "no Draw Things, no .ckpt for new work" — legacy .ckpts now
    archived under lora/{genre}/archive/legacy/ (Task 4.5 layout)
  - The matched-key OTEL counter (Task 4.2b) as the canary for keymap
    drift and silent no-op LoRAs
  - The Architect correction #5 file path (compose_lora_stack lives in
    render_common.py, not a separate compose.py module)
  - ADR-083 + the design spec + the Phase 0 mlx-examples notes as the
    authoritative references

Reference table updated end-to-end: Draw Things and its model dir are
gone, replaced by the actual files in this repo + the mlx-examples
checkout path. The "Completed LoRAs" table now distinguishes legacy
.ckpt artifacts from the new mlx-pipeline .safetensors slots.

Skill description in frontmatter updated so /sq-lora's listing reads
"collect, tag, train via mlx-examples, remap to mflux, verify on the
daemon" — matches the new flow.
Caught during the first real run-through (genre=spaghetti_western,
tier=landscape) by Keith. The Step 2 captioning section I wrote
copied the subject-LoRA convention (trigger + style tags + per-image
subject tags), which is exactly wrong for a style LoRA.

Corrected per kohya-ss / ai-toolkit / civitai community consensus
(verified via Perplexity citation chain — see commit conversation):

  Style LoRA  → every .txt is the bare trigger token, identical
                across the dataset. Genre-level uses the genre name
                (spaghetti_western); world-level uses the world name
                (the_real_mccoy). No descriptors, no subject tags.

  Subject LoRA → detailed natural-language captions per image with
                trigger + descriptive tags. SideQuest hasn't trained
                any subject LoRAs yet; flagged in skill as a separate
                flow if/when one is needed.

Also fixed two downstream bugs:
  - Step 6 visual_style.yaml example showed `trigger: {genre}_{tier}`
    (e.g., `spaghetti_western_landscape`); should be the bare genre/
    world name to match what the model actually trained on. The
    `name` field still includes the tier suffix because it's the
    internal handle compose_lora_stack uses to address entries.
  - Step 3 manual checklist said "every caption starts with the
    trigger word" — replaced with caption-purity check that explicitly
    forbids any extra text past the trigger token, with a spot-check
    one-liner.
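In the spirit of that purity check, a hedged sketch (the function name is illustrative and only a trailing newline is tolerated past the trigger):

```python
from pathlib import Path

# Illustrative caption-purity check: every .txt must be exactly the
# bare trigger token, nothing more.
def impure_captions(dataset_dir: str, trigger: str) -> list:
    return [str(p) for p in sorted(Path(dataset_dir).glob("*.txt"))
            if p.read_text().rstrip("\n") != trigger]
```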

The why-it-matters: per-image subject tags actively poison a style
LoRA. The model starts attributing the trained register's effect to
"saloon" or "film_grain" instead of to the bare trigger, and prompts
using those tokens with no LoRA loaded will partially fire the
style. The legacy `leone_style` LoRA's existing 72 captions show the
prior generation of this mistake — 13 tags per image, all identical,
mixing trigger + style descriptors. Salvageable as a starting point
for the new run but the captions need a full rewrite to a single
token before training.

Skill description in frontmatter unchanged; the listing already says
"collect, tag, train via mlx-examples, remap to mflux, verify on the
daemon" which still fits.
…sition

scripts/generate_poi_images.py now uses a POI's visual_prompt field verbatim
when present (single-trigger + renderable-subject convention per ADR-083),
falling back to "{name}: {description}" only when the art-director has not
yet authored one. Chapter-level atmosphere and location are narrator-facing
prose and no longer leak into Flux prompts — they were the source of 80-130
tokens of unrenderable literary metaphor on every POI.

Dry-run against spaghetti_western/dust_and_lead now produces 87-97 token
prompts (was 187-234) while preserving the art-director's compositional
intent. Per handoff-2026-04-21-poi-prompt-composition-dev.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@slabgorb slabgorb merged commit 940652e into main Apr 21, 2026
@slabgorb slabgorb deleted the feat/poi-prompt-composition branch April 21, 2026 19:09