Phosphene 3.0 — characters, voice, Image Studio, A2V
⚠️ Updating from v2.x? Click Update TWICE.
On the first click, Pinokio runs your existing v2 update.json: it pulls the new code but doesn't install 3.0's new Python deps (ltx-trainer, mlx-vlm for Gemma auto-caption, mflux 0.17.5, plus a handful of transitive packages). The panel will boot to errors after that first click. Click Update again and Pinokio runs the new 3.0 script, installs everything, and the panel boots clean. Fresh installs are unaffected.
What's in 3.0
This is the biggest release since the project started. 341 commits since v2.0.5.
Character training (in-panel)
- Drop 30 to 80 photos, click Train, get a face LoRA back
- Add a voice clip and get a voice LoRA stacked with it
- Gemma 3 12B auto-captions the dataset locally
- Letterbox crop preserves wide and portrait sources
- ~3 hours per character on M4 Max 64 GB
- Validated recipe: rank 32, alpha 32, 100 epochs, lr 1e-4, 512² square
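For readers who want the recipe and the letterbox step in concrete terms, here is a minimal sketch. The class and function names are hypothetical, not the trainer's actual API. It assumes letterboxing means scale-to-fit plus centered padding into the 512² square, and that the LoRA scale follows the common alpha / rank convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterRecipe:
    """Hypothetical container for the validated recipe listed above."""
    rank: int = 32
    alpha: int = 32
    epochs: int = 100
    learning_rate: float = 1e-4
    resolution: int = 512  # training images are 512 x 512

    @property
    def lora_scale(self) -> float:
        # Common LoRA convention: the update is scaled by alpha / rank.
        return self.alpha / self.rank


def letterbox_fit(width: int, height: int, target: int = 512):
    """Scale an image to fit inside a target x target square, keeping aspect ratio.

    Returns (new_width, new_height, pad_x, pad_y), where pad_x / pad_y are the
    left / top offsets that center the scaled image in the square.
    """
    scale = min(target / width, target / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return new_w, new_h, pad_x, pad_y


if __name__ == "__main__":
    print(CharacterRecipe())
    print(letterbox_fit(1920, 1080))  # wide source
    print(letterbox_fit(1080, 1920))  # portrait source
```

With rank and alpha both at 32, the scale works out to 1.0 under that convention, so the adapter is applied at full strength.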
Image Studio
- Three native-MLX engines: Qwen-Image-Edit-2511, our own MLX port of HiDream-O1, and the FLUX.1 family via mflux
- Multi-reference composition up to 3 subjects
- HiDream-O1 ported within 5 days of the upstream release. ~67 s per 1024² image on 64 GB
- Family-install gate at the panel level — refuses to submit if the engine binary isn't installed
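The install gate is easy to picture as a pre-submit check. The sketch below is illustrative only: the engine keys, the marker-file layout, and the `require_engine` name are all assumptions, not the panel's real code.

```python
from pathlib import Path

# Hypothetical layout: each engine family drops a marker file once installed.
ENGINE_MARKERS = {
    "qwen-image-edit": Path("qwen-image-edit") / "installed.marker",
    "hidream-o1": Path("hidream-o1") / "installed.marker",
    "flux": Path("flux") / "installed.marker",
}


class EngineNotInstalled(RuntimeError):
    """Raised when a job targets an engine family that is not installed."""


def require_engine(engine: str, engines_root: Path) -> Path:
    """Return the marker path if the engine is installed, otherwise raise.

    Call this before queueing a job so the request is refused up front
    instead of failing partway through a render.
    """
    try:
        marker = ENGINE_MARKERS[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    path = Path(engines_root) / marker
    if not path.is_file():
        raise EngineNotInstalled(f"{engine} is not installed; install it first")
    return path
```

Raising before the job is queued keeps the failure next to the Submit button rather than deep in a worker log.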
Audio-to-Video
- New A2V mode drives video motion from an audio reference
- Works with character LoRAs
Joint audio+video stays the differentiator
- LTX-2 emits a synced audio track alongside the frames in the same model pass
- Most local video models (Wan, Hunyuan, Mochi, CogVideoX) are silent
- Ambient/diegetic audio is where it shines; dialogue is hit-or-miss
Full panel redesign
- Three top-level tabs: Video, Images, Train Character
- Capability tier auto-detection. Low-RAM Macs see a clean limited surface; 64 GB+ sees everything
- Character is a first-class mode pill on the Video tab, not a buried chip
- Round avatars, click-to-switch, voice badge, rename/delete in one click
- Vertical-player chrome moved outside the right edge so 9:16 clips aren't occluded
Performance
- Q8 HQ default for character renders. ~6 min for a 7-second clip at 1024×576 on M4 Max 64 GB (was ~15 min in 2.x)
- Codex skip-step optimization on Q8 HQ (~12% faster)
- Adaptive wall-time estimates that learn from your machine after two renders
- TeaCache wired through both Extend and A2V stage 1
- Server-side panel watchdog rescues stuck MLX deallocators (was a 10+ min freeze post-decode on Balanced)
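The adaptive estimate above can be sketched roughly as follows. This is a guess at the shape, not the shipped estimator: it assumes a built-in seconds-per-frame default that is replaced by your machine's measured average once two renders have been recorded. The real estimator likely also accounts for resolution and quality tier.

```python
class WallTimeEstimator:
    """Hypothetical estimator: fixed default until enough local samples exist."""

    def __init__(self, default_sec_per_frame: float, min_samples: int = 2):
        self.default_sec_per_frame = default_sec_per_frame
        self.min_samples = min_samples
        self._samples = 0
        self._total_frames = 0
        self._total_seconds = 0.0

    def record(self, frames: int, seconds: float) -> None:
        """Store the measured wall time of one finished render."""
        if frames <= 0 or seconds <= 0:
            return  # ignore degenerate measurements
        self._samples += 1
        self._total_frames += frames
        self._total_seconds += seconds

    def estimate(self, frames: int) -> float:
        """Predict wall time in seconds for a render of `frames` frames."""
        if self._samples < self.min_samples:
            rate = self.default_sec_per_frame
        else:
            rate = self._total_seconds / self._total_frames
        return rate * frames
```

Until the second render is recorded the default rate is used, which matches the "after two renders" behavior described above.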
Stability + UX
- Eleven UX contradictions audited and fixed (Codex C+ pass)
- Load Params on any clip restores the actual seed used and reopens in the right mode (Character / Video / Image)
- Mac memory pressure shown as actual pressure %, not sticky swap
- 50+ small fixes across the panel, helper, and installer
Engineering
- 341 commits since v2.0.5
- ltx-2-mlx pinned to v0.14.0 (dgrauet's request — upstream breaking changes deferred)
- Server-side enforcement that Q4 + character_id is rejected (was producing identity-degraded output)
- New post-decode SIGKILL rescue in the panel for stuck helper processes
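The post-decode rescue boils down to "wait a bounded time, then kill". A minimal sketch using only the standard library follows. The function name and the grace period are assumptions, and the panel's real watchdog may decide that a helper is stuck in a different way.

```python
import subprocess
import sys


def wait_or_kill(proc: subprocess.Popen, grace_seconds: float) -> bool:
    """Wait for a helper to exit on its own; force-kill it if it does not.

    Returns True if the process had to be killed, False if it exited by itself.
    """
    try:
        proc.wait(timeout=grace_seconds)
        return False
    except subprocess.TimeoutExpired:
        proc.kill()   # sends SIGKILL on POSIX systems
        proc.wait()   # reap the process so it does not linger as a zombie
        return True


if __name__ == "__main__":
    # Stand-in for a helper that hangs after finishing its real work.
    helper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("killed:", wait_or_kill(helper, grace_seconds=1.0))
```

A hard kill is only reasonable here because it happens post-decode, so the output is already written and only the stuck teardown is lost.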
Hardware
Apple Silicon only. No Intel Mac, no Linux, no Windows.
- 16 / 24 GB: 512 px video, image gen works
- 32 GB: 768 px
- 64 GB+: 1024×576 video, full HD image, character training
- 96 / 128 GB: same caps. Model is the bottleneck, not RAM.
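As a rough illustration of how the tiers above map to what the panel exposes, here is a small lookup sketch. The `Tier` fields and the exact thresholds between the listed sizes are assumptions: for example, it treats anything from 32 GB up to just under 64 GB as the 768 px tier, and anything under 32 GB as the lowest tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    video_cap: str             # largest video size offered
    character_training: bool   # whether the Train Character tab is enabled


def capability_tier(ram_gb: int) -> Tier:
    """Map unified memory size (GB) to a capability tier."""
    if ram_gb >= 64:
        # 96 / 128 GB get the same caps: the model is the bottleneck, not RAM.
        return Tier(video_cap="1024x576", character_training=True)
    if ram_gb >= 32:
        return Tier(video_cap="768 px", character_training=False)
    return Tier(video_cap="512 px", character_training=False)


if __name__ == "__main__":
    for gb in (16, 24, 32, 64, 128):
        print(gb, capability_tier(gb))
```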
Install
One-click via Pinokio (search Phosphene). Or clone the repo and follow the README.
Credits
- Lightricks for LTX-Video 2.3 (their license applies to the weights)
- dgrauet/ltx-2-mlx for the MLX port that makes this possible
- HiDream AI for HiDream-O1
- Qwen team + mflux for Qwen-Image-Edit
- Apple for MLX
- The Phosphene panel itself is MIT-licensed