Document Flux2Pipeline latents shape #12807
Open
Fixes #12755.
This PR documents the expected shape of the `latents` argument in `Flux2Pipeline.__call__`.

For the default `AutoencoderKLFlux2` VAE used by FLUX.2, the VAE first applies 8× spatial compression, and the pipeline then applies a 2×2 patch packing step. Together these reduce each spatial dimension by a factor of 16.

The expected shape for user-provided latents is therefore:

(batch_size, 128, height // 16, width // 16)

where `height` and `width` are the requested output image size. Passing latents with a different shape leads to shape mismatches inside the VAE and transformer.
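The shape arithmetic above can be sketched as a small helper. This is only an illustration: `expected_flux2_latents_shape` is a hypothetical name and is not part of diffusers, and the defaults (8× VAE compression, 2×2 packing, 128 channels) are taken from the description in this PR rather than read from a loaded pipeline.

```python
def expected_flux2_latents_shape(
    batch_size,
    height,
    width,
    vae_scale_factor=8,   # assumed: 8x spatial compression in the VAE
    patch_size=2,         # assumed: 2x2 patch packing in the pipeline
    packed_channels=128,  # assumed: channel count stated in this PR
):
    """Return the latents shape described in this PR for a requested image size.

    Hypothetical helper for illustration only; not a diffusers API.
    """
    # The two steps combine into a single 16x reduction per spatial dimension.
    downscale = vae_scale_factor * patch_size
    # Floor division mirrors the `height // 16` and `width // 16` terms above.
    return (batch_size, packed_channels, height // downscale, width // downscale)


print(expected_flux2_latents_shape(1, 1024, 1024))  # (1, 128, 64, 64)
```

A tensor with this shape (for example one drawn from a standard normal distribution) is what the `latents` argument would be expected to receive under these assumptions.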
Tests
Passing latents of shape `(1, 128, H // 16, W // 16)` runs end-to-end with the FLUX.2-dev checkpoint.