Phosphene 3.0 — characters, voice, Image Studio, A2V

@mrbizarro mrbizarro released this 23 May 06:10
· 34 commits to main since this release

⚠️ Updating from v2.x? Click Update TWICE.

On the first click, Pinokio runs your existing v2 update.js. That script pulls the new code but doesn't install 3.0's new Python dependencies (ltx-trainer, mlx-vlm for Gemma auto-caption, mflux 0.17.5, plus a handful of transitive packages), so the panel will boot with errors. Click Update again: Pinokio then runs the new 3.0 script, installs everything, and the panel boots clean. Fresh installs are unaffected.

What's in 3.0

This is the biggest release since the project started. 341 commits since v2.0.5.

Character training (in-panel)

  • Drop 30 to 80 photos, click Train, get a face LoRA back
  • Add a voice clip and get a voice LoRA stacked with it
  • Gemma 3 12B auto-captions the dataset locally
  • Letterbox crop preserves wide and portrait sources
  • ~3 hours per character on M4 Max 64 GB
  • Validated recipe: rank 32, alpha 32, 100 epochs, lr 1e-4, 512² square
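
As a rough illustration of the recipe and the letterbox step above, the values can be written down as a small config object. This is only a sketch: the class, field, and function names are hypothetical and are not taken from Phosphene or ltx-trainer, and the letterbox helper assumes "letterbox" means scale-to-fit with padding rather than cropping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterRecipe:
    """Validated recipe values from the release notes; field names are hypothetical."""
    rank: int = 32
    alpha: int = 32
    epochs: int = 100
    learning_rate: float = 1e-4
    resolution: int = 512  # 512 x 512 square training frames

    @property
    def lora_scale(self) -> float:
        # Conventional LoRA scaling factor (alpha / rank) applied to the low-rank update.
        return self.alpha / self.rank


def letterbox_size(src_w: int, src_h: int, target: int) -> tuple[int, int, int, int]:
    """Fit a source inside a target x target square without cropping content.

    Returns (new_w, new_h, pad_x, pad_y), where the pads are the left/top margins.
    Assumption: the source is scaled so its longest side equals `target`.
    """
    longest = max(src_w, src_h)
    new_w = round(src_w * target / longest)
    new_h = round(src_h * target / longest)
    return new_w, new_h, (target - new_w) // 2, (target - new_h) // 2


recipe = CharacterRecipe()
wide = letterbox_size(1920, 1080, recipe.resolution)      # 16:9 source
portrait = letterbox_size(1080, 1920, recipe.resolution)  # 9:16 source
```

With rank and alpha both at 32, the conventional LoRA scale works out to 1.0, so the adapter's update is applied at full strength.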

Image Studio

  • Three native-MLX engines: Qwen-Image-Edit-2511, our own MLX port of HiDream-O1, and the FLUX.1 family via mflux
  • Multi-reference composition up to 3 subjects
  • HiDream-O1 ported within 5 days of the upstream release. ~67 s per 1024² image on 64 GB
  • Family-install gate at the panel level: the panel refuses to submit a job if the engine binary isn't installed
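
The install gate can be pictured as a check that runs before a job is queued. The sketch below is an assumption about the shape of such a check, not the panel's actual code; the engine keys, the exception type, and the function name are all hypothetical.

```python
class EngineNotInstalledError(RuntimeError):
    """Raised when a job targets an engine family that is not installed (hypothetical)."""


def check_engine_installed(engine: str, installed: set[str]) -> None:
    # Refuse before submission so the user sees a clear message
    # instead of a failure partway through a render.
    if engine not in installed:
        raise EngineNotInstalledError(
            f"Engine '{engine}' is not installed; install it before submitting."
        )


# Hypothetical engine keys for illustration only.
installed_engines = {"qwen-image-edit", "flux"}

check_engine_installed("flux", installed_engines)  # passes silently

try:
    check_engine_installed("hidream-o1", installed_engines)
    rejected = False
except EngineNotInstalledError:
    rejected = True
```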

Audio-to-Video

  • New A2V mode drives video motion from an audio reference
  • Works with character LoRAs

Joint audio+video stays the differentiator

  • LTX-2 emits a synced audio track alongside the frames in the same model pass
  • Most local video models (Wan, Hunyuan, Mochi, CogVideoX) are silent
  • Ambient/diegetic audio is where it shines; dialogue is hit-or-miss

Full panel redesign

  • Three top-level tabs: Video, Images, Train Character
  • Capability tier auto-detection. Low-RAM Macs see a clean limited surface; 64 GB+ sees everything
  • Character is a first-class mode pill on the Video tab, not a buried chip
  • Round avatars, click-to-switch, voice badge, rename/delete in one click
  • Vertical-player chrome moved outside the right edge so 9:16 clips aren't occluded

Performance

  • Q8 HQ default for character renders. ~6 min for a 7-second clip at 1024×576 on M4 Max 64 GB (was ~15 min in 2.x)
  • Codex skip-step optimization on Q8 HQ (~12% faster)
  • Adaptive wall-time estimates that learn from your machine after two renders
  • TeaCache wired through both Extend and A2V stage 1
  • Server-side panel watchdog rescues stuck MLX deallocators (was a 10+ min freeze post-decode on Balanced)
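
One way to picture the adaptive wall-time estimate: start from a built-in baseline, then scale it by how this machine actually performed once two renders have been recorded. The following is a minimal sketch under that assumption; the class and its averaging rule are hypothetical and may differ from what the panel does.

```python
from statistics import mean


class WallTimeEstimator:
    """Scales a baseline estimate by the observed actual/baseline ratio (hypothetical)."""

    MIN_RENDERS = 2  # the notes say estimates adapt after two renders

    def __init__(self) -> None:
        self._ratios: list[float] = []

    def record(self, baseline_s: float, actual_s: float) -> None:
        # Store how much slower or faster this machine was than the baseline.
        self._ratios.append(actual_s / baseline_s)

    def estimate(self, baseline_s: float) -> float:
        # Until enough renders are recorded, fall back to the unscaled baseline.
        if len(self._ratios) < self.MIN_RENDERS:
            return float(baseline_s)
        return baseline_s * mean(self._ratios)


est = WallTimeEstimator()
before = est.estimate(360)          # no history yet
est.record(360, 720)
after_one = est.estimate(360)       # still the baseline after one render
est.record(360, 720)
after_two = est.estimate(360)       # now scaled by the observed ratio
```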

Stability + UX

  • Eleven UX contradictions audited and fixed (Codex C+ pass)
  • Load Params on any clip restores the actual seed used and reopens in the right mode (Character / Video / Image)
  • Mac memory pressure shown as actual pressure %, not sticky swap
  • 50+ small fixes across the panel, helper, and installer

Engineering

  • 341 commits since v2.0.5
  • ltx-2-mlx pinned to v0.14.0 (dgrauet's request — upstream breaking changes deferred)
  • The server now rejects any request that combines Q4 with a character_id (the combination was producing identity-degraded output)
  • New post-decode SIGKILL rescue in the panel for stuck helper processes
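
The Q4 + character_id rule amounts to a validation step on the incoming request. A minimal sketch, assuming the request arrives as a dict: `character_id` comes from the notes above, while the `quality` key and the function name are hypothetical.

```python
def validate_render_request(params: dict) -> None:
    """Reject Q4 renders that reference a character (hypothetical request shape)."""
    uses_character = params.get("character_id") is not None
    if params.get("quality") == "Q4" and uses_character:
        raise ValueError(
            "Q4 cannot be combined with character_id: output loses identity fidelity."
        )


validate_render_request({"quality": "Q8", "character_id": "abc"})  # accepted
validate_render_request({"quality": "Q4"})                          # accepted, no character

try:
    validate_render_request({"quality": "Q4", "character_id": "abc"})
    q4_character_rejected = False
except ValueError:
    q4_character_rejected = True
```

Enforcing the rule on the server rather than only in the UI means a stale or scripted client cannot submit the bad combination.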

Hardware

Apple Silicon only. No Intel Mac, no Linux, no Windows.

  • 16 / 24 GB: 512 px video, image gen works
  • 32 GB: 768 px
  • 64 GB+: 1024×576 video, full HD image, character training
  • 96 / 128 GB: same caps. Model is the bottleneck, not RAM.
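
The tiers above map naturally onto a lookup by installed RAM. The sketch below mirrors the listed caps only; the names are hypothetical, the behaviour for in-between sizes (for example 48 GB falling into the 32 GB tier) is an assumption, and machines under 16 GB are treated as unsupported because the notes don't list them.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    min_ram_gb: int
    video_cap: str
    character_training: bool


# Ordered from largest to smallest so the first match wins.
TIERS = [
    Tier(64, "1024x576", True),
    Tier(32, "768 px", False),
    Tier(16, "512 px", False),
]


def tier_for(ram_gb: int) -> Optional[Tier]:
    """Return the capability tier for a given amount of unified memory."""
    for tier in TIERS:
        if ram_gb >= tier.min_ram_gb:
            return tier
    return None  # below the smallest tier listed in the notes


top = tier_for(128)   # same caps as 64 GB: the model is the bottleneck
mid = tier_for(32)
low = tier_for(24)
```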

Install

One-click via Pinokio (search Phosphene). Or clone the repo and follow the README.

Credits

  • Lightricks for LTX-Video 2.3 (their license applies to the weights)
  • dgrauet/ltx-2-mlx for the MLX port that makes this possible
  • HiDream AI for HiDream-O1
  • Qwen team + mflux for Qwen-Image-Edit
  • Apple for MLX
  • Phosphene (the panel) is MIT-licensed