
v1.1.0 — performance research, batch overlap, OOM fix



@patraxo released this on 11 Jun, 07:03

What's new

  • Batch decode/encode overlap: 3-clip batch wall-clock time down 20.4% (92.5s → 73.7s). Finalize (decode/encode) of clip N now overlaps the denoise of clip N+1.
  • Warm-container OOM fix: the first request with a new prompt no longer runs out of memory at 93.7GB. When VRAM is tight, the text encoder now streams layer-wise (controlled by LTX_EMB_STREAM_FREE_GB), and the embeddings are identical to the non-streamed path.
  • Reference-exact VAE decode: setting LTX_VAE_TILE_PX=0 disables tiling entirely. Tiled output measures 50-51dB against the untiled decode, and disabling tiling is latency-neutral.
  • Performance research published: three findings write-ups in docs/ (SageAttention is 5% slower under torch.compile; NVENC vs memory-snapshot containers; the text-encoder OOM that benchmarks miss), plus the full measured record in references/.
  • Re-measured performance table: a 5s clip renders in 23s warm ($0.02); a 10s clip renders in 45s ($0.04).
  • Filed upstream: Lightricks/LTX-2#232 (per-component offload).
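The batch overlap item can be pictured as a two-stage pipeline: while clip N is being finalized on a background worker, clip N+1 is already denoising. The sketch below is a minimal illustration of that scheduling pattern only, not the project's code. `denoise`, `finalize` and `run_batch_overlapped` are hypothetical names, and `time.sleep` stands in for GPU work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins: the real stages run on the GPU; sleeps model their cost.
def denoise(clip_id: int) -> str:
    time.sleep(0.05)
    return f"latent-{clip_id}"


def finalize(latent: str) -> str:
    time.sleep(0.05)  # stands in for decode + encode
    return latent.replace("latent", "video")


def run_batch_overlapped(clip_ids):
    """Finalize clip N on a worker thread while clip N+1 is denoised."""
    outputs = []
    pending = None  # future for the clip currently being finalized
    with ThreadPoolExecutor(max_workers=1) as finalizer:
        for clip_id in clip_ids:
            latent = denoise(clip_id)  # runs concurrently with `pending`, if any
            if pending is not None:
                # Wait for the previous finalize so at most one is in flight.
                outputs.append(pending.result())
            pending = finalizer.submit(finalize, latent)
        if pending is not None:
            outputs.append(pending.result())  # drain the last clip
    return outputs
```

With equal stage costs d, a plain sequential loop takes about 2·n·d while this schedule takes about (n + 1)·d, so 6d versus 4d for a 3-clip batch. How much of that ideal saving shows up in practice depends on how much the two stages can actually run concurrently on the hardware.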
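For the OOM fix, layer-wise streaming means only one encoder layer's weights need to be on the GPU at a time, while the same operations still run in the same order. The sketch below assumes that LTX_EMB_STREAM_FREE_GB acts as a free-VRAM threshold below which streaming kicks in. That reading is inferred from the variable's name, not documented here. `should_stream`, `run_encoder`, the `default_gb` parameter and `DEMO_LAYERS` are hypothetical, and a Python set stands in for "weights resident in VRAM".

```python
import os
from typing import Callable, Mapping, Optional, Sequence, Tuple

Layer = Callable[[float], float]


def should_stream(free_vram_gb: float, default_gb: float,
                  env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumed semantics: stream when free VRAM is below LTX_EMB_STREAM_FREE_GB."""
    env = os.environ if env is None else env
    # `default_gb` is a hypothetical fallback; the real default is not stated.
    threshold_gb = float(env.get("LTX_EMB_STREAM_FREE_GB", default_gb))
    return free_vram_gb < threshold_gb


def run_encoder(layers: Sequence[Layer], x: float, stream: bool) -> Tuple[float, int]:
    """Run the layers in order; return (output, peak number of layers resident)."""
    resident = set()
    if not stream:
        resident.update(range(len(layers)))  # whole encoder loaded up front
    peak = len(resident)
    for i, layer in enumerate(layers):
        if stream:
            resident.add(i)  # stand-in for copying layer i's weights to VRAM
            peak = max(peak, len(resident))
        x = layer(x)  # same op, same order in both modes
        if stream:
            resident.discard(i)  # stand-in for freeing layer i before the next
    return x, peak


# Toy three-layer "encoder", for illustration only.
DEMO_LAYERS = [lambda v: v * 2, lambda v: v + 3, lambda v: v * v]
```

Both modes return the same output and differ only in peak residency (one layer versus all of them). That is consistent with the identical embeddings reported above: streaming changes where weights live, not what is computed.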
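The VAE item quotes a 50-51dB gap between tiled and untiled decodes without naming the metric. Assuming it is PSNR, the usual dB fidelity metric for images, the comparison can be reproduced on flattened pixel values normalized to [0, 1] with a helper like the one below. `psnr_db` is a hypothetical name, and this is a plain-Python sketch rather than the project's measurement code.

```python
import math
from typing import Sequence


def psnr_db(reference: Sequence[float], test: Sequence[float],
            max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio of `test` against `reference`, in dB."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # bit-identical output
    return 10 * math.log10(max_val ** 2 / mse)
```

On a [0, 1] scale, 50dB corresponds to a mean squared error of 1e-5, an RMS error of about 0.3% of full scale.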

Full details: see the "Performance research" section of the README.