Precalculating the text encoder embeddings can reduce VRAM usage: the text encoders only need to be loaded while the dataset is being preprocessed. The same applies to the VAE, so the only thing that has to be loaded during training is the main diffusion model (UNet / DiT).
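A rough sketch of the preprocessing pass. All names here are hypothetical, and the real encoder (e.g. UMT5-XXL) is replaced by a stub so the example is self-contained; in practice you would load the encoder, run this once over the dataset, then unload it before training starts:

```python
import hashlib
import pathlib
import pickle

def encode_caption(caption: str) -> list[float]:
    # Stand-in for the real text encoder: in the actual trainer this would
    # be a forward pass through e.g. T5-XXL / UMT5-XXL on the GPU.
    digest = hashlib.sha256(caption.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:8]]  # fake 8-dim "embedding"

def precompute_embeds(captions: dict[str, str], out_dir: str) -> dict[str, str]:
    """Encode every caption once, write each result to disk, and return a
    mapping of sample name -> embedding file path for the metadata file."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: dict[str, str] = {}
    for name, caption in captions.items():
        path = out / f"{name}.embed.pkl"
        with open(path, "wb") as f:
            pickle.dump(encode_caption(caption), f)
        paths[name] = str(path)
    return paths
```

The same pattern would apply to the VAE: run the images/videos through it once, save the latents next to the embeddings, and never load the VAE again during training.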
Something like a modified metadata.csv that maps each video/image name to its text encoder embed path and its latent path, so the trainer can locate the cached embed and latent for each sample.
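A minimal sketch of what that modified metadata.csv could look like, and how the trainer side could look entries up by file name. The column names and paths are illustrative, not an existing format in the repo:

```python
import csv
import io

# Preprocessing side: one row per sample, pointing at the cached files.
rows = [
    {"file_name": "clip_0001.mp4", "caption": "a cat",
     "embed_path": "cache/clip_0001.embed.pt",
     "latent_path": "cache/clip_0001.latent.pt"},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["file_name", "caption", "embed_path", "latent_path"]
)
writer.writeheader()
writer.writerows(rows)

# Trainer side: build a file_name -> row lookup so each batch can load the
# precomputed embedding and latent instead of running the encoders.
lookup = {r["file_name"]: r for r in csv.DictReader(io.StringIO(buf.getvalue()))}
```

Keeping the original caption column alongside the paths leaves the door open to re-encoding later if the caching scheme changes.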
This should apply to all models, so it can benefit the entire repo (it notably helps the models that use T5-XXL / UMT5-XXL, since those text encoders are very large). The main flaw is the loss of dynamic tag-based dropout, since the caption is already baked into a fixed embedding; whole-caption dropout could still work, though, by keeping a precalculated empty-string embedding.
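The whole-caption dropout idea can be sketched like this. The function name and paths are hypothetical; the point is just that, at batch time, the trainer swaps in the precalculated empty-string embedding with some probability instead of dropping individual tags:

```python
import random

def pick_embed_path(embed_path: str, empty_embed_path: str,
                    dropout_p: float, rng: random.Random) -> str:
    """With probability dropout_p, return the path to the precalculated
    empty-string embedding instead of the caption's embedding. This gives
    whole-caption dropout even though per-tag dropout is no longer possible
    once embeddings are cached."""
    return empty_embed_path if rng.random() < dropout_p else embed_path
```

Since the empty-string embedding is computed once during preprocessing, this keeps the text encoder entirely out of the training loop.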