[feature request] add pre-calculating latent / text encoder outputs #1070

@yoinked-h

Description

Pre-calculating the text encoder embeddings can reduce VRAM usage: the text encoders only need to be loaded while the dataset is being preprocessed. The same applies to the VAE, so the only thing that has to stay loaded during training is the main diffusion model (UNet / DiT / whatever the backbone is).
One way to do this would be a modified metadata.csv that records, for each video/image name, the path to its text encoder embedding and the path to its latent, so the trainer can find the cached embed / latent. A rough sketch of such a caching pass is below.
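
As an illustration only (not tied to this repo's actual loaders or dataset code), the caching pass could run the text encoder and the VAE one at a time, save each output to disk, and write an extended metadata.csv with the new paths. The loader callables (`load_text_encoder`, `load_vae`, `load_pixels`) and the column names (`embed_path`, `latent_path`) are assumptions:

```python
# Hypothetical pre-caching pass. The loader callables and CSV columns are
# assumptions for illustration, not this repo's actual API.
import csv
from pathlib import Path

import torch


@torch.no_grad()
def precompute_cache(metadata_in, metadata_out, cache_dir,
                     load_text_encoder, load_vae, load_pixels, device="cuda"):
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)

    with open(metadata_in, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # Phase 1: load only the text encoder (e.g. T5-XXL / UMT5-XXL), cache
    # every caption embedding, then free it before touching the VAE.
    text_encoder = load_text_encoder().to(device).eval()
    for row in rows:
        embed_path = cache / f"{Path(row['file_name']).stem}_te.pt"
        embed = text_encoder(row["caption"])            # hypothetical call
        torch.save(embed.cpu(), embed_path)
        row["embed_path"] = str(embed_path)
    del text_encoder
    torch.cuda.empty_cache()

    # Phase 2: load only the VAE and cache the latents the same way.
    vae = load_vae().to(device).eval()
    for row in rows:
        latent_path = cache / f"{Path(row['file_name']).stem}_latent.pt"
        pixels = load_pixels(row["file_name"]).to(device)  # image/video tensor
        latent = vae.encode(pixels)                        # hypothetical call
        torch.save(latent.cpu(), latent_path)
        row["latent_path"] = str(latent_path)
    del vae
    torch.cuda.empty_cache()

    # Write the extended metadata.csv that the trainer would read instead.
    with open(metadata_out, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```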

This should apply to all models, so it would benefit the entire repo (it especially helps the models that use T5-XXL / UMT5-XXL, since that encoder is very large). The only drawback is that dynamic tag-based dropout would no longer be possible, but whole-caption dropout could still work by pre-calculating an embedding of the empty string, as in the sketch below.
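
A minimal sketch of how the trainer's dataset could do whole-caption dropout against the cache, assuming the extended metadata.csv above plus one cached empty-string embedding (the class name and file layout are hypothetical):

```python
# Sketch of caption dropout over cached embeddings: instead of re-encoding a
# dropped caption, the loader swaps in a single precomputed "" embedding.
import random

import torch


class CachedEmbedDataset(torch.utils.data.Dataset):
    def __init__(self, rows, empty_embed_path, dropout_p=0.1):
        self.rows = rows                                  # rows of the extended metadata.csv
        self.empty_embed = torch.load(empty_embed_path)   # cached empty-string embedding
        self.dropout_p = dropout_p

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        latent = torch.load(row["latent_path"])
        if random.random() < self.dropout_p:
            embed = self.empty_embed                      # whole-caption dropout only
        else:
            embed = torch.load(row["embed_path"])
        return latent, embed
```

Since a dropped caption is just a tensor swap here, no text encoder has to stay resident during training.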
