Currently, we do not implement framewise encoding/decoding in the LTX Video VAE. Supporting it would considerably reduce peak memory usage, which would benefit both inference and training.
LoRA finetuning LTX Video on 49x512x768 videos can be done in under 6 GB if prompts and latents are pre-computed, but the pre-computation itself requires about 12 GB of memory because the VAE encodes/decodes the full video in one pass. Framewise encoding/decoding could reduce this considerably and lower the bar to entry for video model finetuning (see the sketch below). Our friends with potatoes need you!
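For anyone interested in picking this up, here is a minimal sketch of what temporally chunked encoding could look like, assuming a diffusers-style `vae.encode` that returns a latent distribution. The function name `encode_framewise` and the `chunk_size` parameter are hypothetical, and this naive version is only meant to illustrate the memory idea: since the LTX Video VAE uses causal 3D convolutions that mix information across frames, a real implementation would need to carry conv state across chunk boundaries (or overlap and blend chunks) to match the non-chunked output, and chunk sizes should align with the VAE's temporal compression factor.

```python
import torch

def encode_framewise(vae, video: torch.Tensor, chunk_size: int = 8) -> torch.Tensor:
    # Hypothetical sketch: encode a video in temporal chunks instead of all at
    # once, so peak activation memory scales with `chunk_size` rather than the
    # full frame count. `video` is (batch, channels, frames, height, width).
    #
    # NOTE: this naive version encodes each chunk independently. A causal 3D
    # VAE mixes information across frames, so results at chunk boundaries will
    # differ from a single full-video pass unless conv state is carried over
    # (or chunks are overlapped and blended).
    latent_chunks = []
    for start in range(0, video.shape[2], chunk_size):
        chunk = video[:, :, start : start + chunk_size]
        with torch.no_grad():
            # Assumes a diffusers-style API returning a latent distribution.
            latent_chunks.append(vae.encode(chunk).latent_dist.sample())
        # Activations for this chunk are freed before the next one is encoded,
        # which is where the memory savings come from.
    return torch.cat(latent_chunks, dim=2)
```

A framewise `decode` would follow the same pattern in reverse, iterating over the latent frame dimension. The main design question is how to handle the causal temporal receptive field at chunk boundaries without re-encoding overlapping frames more than necessary.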
As always, contributions are welcome 🤗 Happy new year!