Release Phosphene 3.0 — characters, voice, Image Studio, A2V · mrbizarro/phosphene

⚠️ Updating from v2.x? Click Update TWICE.

Pinokio runs your existing v2 update.js on the first click — it pulls the new code but doesn't install 3.0's new Python deps (ltx-trainer, mlx-vlm for Gemma auto-caption, mflux 0.17.5, plus a handful of transitive packages). The panel will boot to errors after the first click. Click Update again and Pinokio runs the new 3.0 script, installs everything, and the panel boots clean. Fresh installs are unaffected.

What's in 3.0

This is the biggest release since the project started. 341 commits since v2.0.5.

Character training (in-panel)

Drop 30 to 80 photos, click Train, get a face LoRA back
Add a voice clip and get a voice LoRA stacked with it
Gemma 3 12B auto-captions the dataset locally
Letterbox crop preserves wide and portrait sources
~3 hours per character on M4 Max 64 GB
Validated recipe: rank 32, alpha 32, 100 epochs, lr 1e-4, 512² square

Image Studio

Three native-MLX engines: Qwen-Image-Edit-2511, our own MLX port of HiDream-O1, and the FLUX.1 family via mflux
Multi-reference composition up to 3 subjects
HiDream-O1 ported in 5 days after upstream release. ~67 s per 1024² on 64 GB
Family-install gate at the panel level — refuses to submit if the engine binary isn't installed

Audio-to-Video

New A2V mode drives video motion from an audio reference
Works with character LoRAs

Joint audio+video stays the differentiator

LTX-2 emits a synced audio track alongside the frames in the same model pass
Most local video models (Wan, Hunyuan, Mochi, CogVideoX) are silent
Ambient/diegetic audio is where it shines; dialogue is hit-or-miss

Full panel redesign

Three top-level tabs: Video, Images, Train Character
Capability tier auto-detection. Low-RAM Macs see a clean limited surface; 64 GB+ sees everything
Character is a first-class mode pill on the Video tab, not a buried chip
Round avatars, click-to-switch, voice badge, rename/delete in one click
Vertical-player chrome moved outside the right edge so 9:16 clips aren't occluded

Performance

Q8 HQ default for character renders. ~6 min for a 7-second clip at 1024×576 on M4 Max 64 GB (was ~15 min in 2.x)
Codex skip-step optimization on Q8 HQ (~12% faster)
Adaptive wall-time estimates that learn from your machine after two renders
TeaCache wired through both Extend and A2V stage 1
Server-side panel watchdog rescues stuck MLX deallocators (was a 10+ min freeze post-decode on Balanced)

Stability + UX

Eleven UX contradictions audited and fixed (Codex C+ pass)
Load Params on any clip restores the actual seed used and reopens in the right mode (Character / Video / Image)
Mac memory pressure shown as actual pressure %, not sticky swap
50+ small fixes across the panel, helper, and installer

Engineering

341 commits since v2.0.5
ltx-2-mlx pinned to v0.14.0 (dgrauet's request — upstream breaking changes deferred)
Server-side enforcement that Q4 + character_id is rejected (was producing identity-degraded output)
New post-decode SIGKILL rescue in the panel for stuck helper processes

Hardware

Apple Silicon only. No Intel Mac, no Linux, no Windows.

16 / 24 GB: 512 px video, image gen works
32 GB: 768 px
64 GB+: 1024×576 video, full HD image, character training
96 / 128 GB: same caps. Model is the bottleneck, not RAM.

Install

One-click via Pinokio (search Phosphene). Or clone the repo and follow the README.

Credits

Lightricks for LTX-Video 2.3 (their license applies to the weights)
dgrauet/ltx-2-mlx for the MLX port that makes this possible
HiDream AI for HiDream-O1
Qwen team + mflux for Qwen-Image-Edit
Apple for MLX
Phosphene the panel is MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phosphene 3.0 — characters, voice, Image Studio, A2V

Choose a tag to compare

Sorry, something went wrong.