Implement framewise encoding/decoding in LTX Video VAE #10333

Closed
@a-r-r-o-w

Description

Currently, we do not implement framewise encoding/decoding in the LTX Video VAE. Adding it would reduce memory usage by processing the video in temporal chunks instead of all frames at once, which would benefit both inference and training.
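A minimal sketch of what framewise encoding could look like, purely for illustration: the chunk size, the helper name, and the assumption that temporal chunks can be encoded independently are all hypothetical. A real implementation would need to account for the VAE's causal temporal compression at chunk boundaries (e.g. via overlap or carried-over state) rather than naive slicing.

```python
import torch

def encode_framewise(vae, video: torch.Tensor, frames_per_chunk: int = 16) -> torch.Tensor:
    # video: [batch, channels, num_frames, height, width]
    latents = []
    for start in range(0, video.shape[2], frames_per_chunk):
        # Take a temporal slice of the video. NOTE: this ignores boundary effects
        # from the causal temporal convolutions and is only meant to convey the idea.
        chunk = video[:, :, start : start + frames_per_chunk]
        with torch.no_grad():
            # encode() is the usual diffusers VAE entry point; sampling the
            # distribution per chunk here is only for illustration.
            latent = vae.encode(chunk).latent_dist.sample()
        latents.append(latent)
    # Concatenate the per-chunk latents along the temporal dimension.
    return torch.cat(latents, dim=2)
```

Decoding would follow the same pattern in reverse, splitting the latents along the temporal dimension and stitching the decoded chunks back together.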

LoRA finetuning LTX Video on 49x512x768 videos can be done in under 6 GB if prompts and latents are pre-computed, but the pre-computation itself requires about 12 GB of memory because of the VAE encode/decode. This could be reduced considerably, lowering the bar to entry for video model finetuning. Our friends with potatoes need you!

As always, contributions are welcome 🤗 Happy new year!
