From d67491e7dcc58610625f3cd7a2e3f55c038c81d8 Mon Sep 17 00:00:00 2001 From: Nazar Kozak Date: Fri, 24 Apr 2026 23:34:53 -0700 Subject: [PATCH 1/2] docs(flux2): clarify image= is reference conditioning, not img2img --- docs/source/en/api/pipelines/flux2.md | 20 +++++++++++++++++++ .../pipelines/flux2/pipeline_flux2.py | 14 ++++++++----- .../pipelines/flux2/pipeline_flux2_klein.py | 14 ++++++++----- 3 files changed, 38 insertions(+), 10 deletions(-) diff --git a/docs/source/en/api/pipelines/flux2.md b/docs/source/en/api/pipelines/flux2.md index 2a2b39b95630..e62e877bf1c5 100644 --- a/docs/source/en/api/pipelines/flux2.md +++ b/docs/source/en/api/pipelines/flux2.md @@ -32,6 +32,26 @@ Flux.2 can potentially generate better better outputs with better prompts. We ca an input prompt by setting the `caption_upsample_temperature` argument in the pipeline call arguments. The [official implementation](https://github.com/black-forest-labs/flux2/blob/5a5d316b1b42f6b59a8c9194b77c8256be848432/src/flux2/text_encoder.py#L140) recommends this value to be 0.15. +## Reference conditioning vs. img2img + +The `image` argument on `Flux2Pipeline` and `Flux2KleinPipeline` is **reference conditioning**, not +img2img. Reference images are encoded into additional attention tokens that flow through the +transformer alongside the text prompt — there is no noisy latent initialization, and so no `strength` +parameter to scale. + +This differs from `StableDiffusionImg2ImgPipeline`, `FluxImg2ImgPipeline`, and +`FluxKontextInpaintPipeline`, which add noise to a latent encoding of the input image and then +partially denoise it. If you port code from those pipelines and pass `strength=...` to a Flux.2 +pipeline, you will see: + +``` +TypeError: Flux2Pipeline.__call__() got an unexpected keyword argument 'strength' +``` + +Drop the `strength` kwarg and pass references via `image=` (a single image, or a list for multiple +references). 
For Flux.2 inpainting (which does add noise to a latent and therefore does take a
+`strength` parameter), use `Flux2KleinInpaintPipeline` instead.
+
 ## Flux2Pipeline
 
 [[autodoc]] Flux2Pipeline
diff --git a/src/diffusers/pipelines/flux2/pipeline_flux2.py b/src/diffusers/pipelines/flux2/pipeline_flux2.py
index 4b60c6042d4f..2602a49934a9 100644
--- a/src/diffusers/pipelines/flux2/pipeline_flux2.py
+++ b/src/diffusers/pipelines/flux2/pipeline_flux2.py
@@ -769,11 +769,15 @@ def __call__(
 Args:
 image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`):
- `Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
- numpy array and pytorch tensor, the expected value range is between `[0, 1]` If it's a tensor or a list
- or tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
- list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image
- latents as `image`, but if passing latents directly it is not encoded again.
+ Reference image(s) used to condition generation. Flux.2 encodes them as additional attention tokens that
+ flow through the transformer alongside the text prompt; this is **reference conditioning**, not
+ SD/Flux.1-style img2img, so there is no companion `strength` argument. Pass a list to provide multiple
+ references.
+
+ For both numpy array and pytorch tensor, the expected value range is `[0, 1]`. If it's a tensor
+ or a list of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy
+ array or a list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. Can also accept
+ image latents directly, in which case they will not be re-encoded.
 prompt (`str` or `list[str]`, *optional*): The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`. instead.
diff --git a/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py b/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py
index 1f3b5c3c4fde..6593e81dba5f 100644
--- a/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py
+++ b/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py
@@ -635,11 +635,15 @@ def __call__(
 Args:
 image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`):
- `Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
- numpy array and pytorch tensor, the expected value range is between `[0, 1]` If it's a tensor or a list
- or tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
- list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)` It can also accept image
- latents as `image`, but if passing latents directly it is not encoded again.
+ Reference image(s) used to condition generation. Flux.2 encodes them as additional attention tokens that
+ flow through the transformer alongside the text prompt; this is **reference conditioning**, not
+ SD/Flux.1-style img2img, so there is no companion `strength` argument. Pass a list to provide multiple
+ references.
+
+ For both numpy array and pytorch tensor, the expected value range is `[0, 1]`. If it's a tensor
+ or a list of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy
+ array or a list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. Can also accept
+ image latents directly, in which case they will not be re-encoded.
 prompt (`str` or `List[str]`, *optional*): The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`. instead.
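The shape and value-range conventions the docstrings above describe can be illustrated with a small numpy sketch (illustrative only; it does not call any diffusers pipeline):

```python
import numpy as np

# Numpy reference images are channels-last with values in [0, 1]:
# (H, W, C) for a single image, (B, H, W, C) for a batch.
rng = np.random.default_rng(0)
img_hwc = rng.random((64, 64, 3), dtype=np.float32)

# Tensors use channels-first instead: (C, H, W), or (B, C, H, W) with a
# leading batch axis. Converting is a transpose plus an added axis.
img_chw = np.transpose(img_hwc, (2, 0, 1))
batch_bchw = img_chw[None, ...]

print(img_hwc.shape)     # (64, 64, 3)
print(img_chw.shape)     # (3, 64, 64)
print(batch_bchw.shape)  # (1, 3, 64, 64)
```

A `(H, W, C)` array and its `(C, H, W)` transpose hold the same pixels; only the layout the pipeline expects differs between the numpy and tensor input paths.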
From 2db8a9e7330c61de8a1aa6649822223d28788f56 Mon Sep 17 00:00:00 2001
From: Nazar Kozak
Date: Mon, 27 Apr 2026 14:18:30 -0700
Subject: [PATCH 2/2] =?UTF-8?q?docs(flux2):=20address=20review=20=E2=80=94?= =?UTF-8?q?=20tighten=20wording,=20use=20doc-link=20refs?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Apply @stevhliu's suggestions:

- Compress 'Reference conditioning vs. img2img' section.
- Use [`ClassName`] cross-reference syntax for Flux2Pipeline, Flux2KleinPipeline, FluxImg2ImgPipeline, Flux2KleinInpaintPipeline.
- Drop the 'This differs from...' paragraph; the TypeError example alone makes the point.
---
 docs/source/en/api/pipelines/flux2.md | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/docs/source/en/api/pipelines/flux2.md b/docs/source/en/api/pipelines/flux2.md
index e62e877bf1c5..9cd724aea0ef 100644
--- a/docs/source/en/api/pipelines/flux2.md
+++ b/docs/source/en/api/pipelines/flux2.md
@@ -34,23 +34,14 @@ The [official implementation](https://github.com/black-forest-labs/flux2/blob/5a
 
 ## Reference conditioning vs. img2img
 
-The `image` argument on `Flux2Pipeline` and `Flux2KleinPipeline` is **reference conditioning**, not
-img2img. Reference images are encoded into additional attention tokens that flow through the
-transformer alongside the text prompt — there is no noisy latent initialization, and so no `strength`
-parameter to scale.
-
-This differs from `StableDiffusionImg2ImgPipeline`, `FluxImg2ImgPipeline`, and
-`FluxKontextInpaintPipeline`, which add noise to a latent encoding of the input image and then
-partially denoise it. If you port code from those pipelines and pass `strength=...` to a Flux.2
-pipeline, you will see:
+The `image` argument on [`Flux2Pipeline`] and [`Flux2KleinPipeline`] is *reference conditioning*. Reference images are encoded as additional attention tokens that flow through the
+transformer alongside the text prompt.
Unlike [`FluxImg2ImgPipeline`], Flux.2 doesn't add noise to the input image. Passing `strength` to [`Flux2Pipeline`] raises:
 
 ```
 TypeError: Flux2Pipeline.__call__() got an unexpected keyword argument 'strength'
 ```
 
-Drop the `strength` kwarg and pass references via `image=` (a single image, or a list for multiple
-references). For Flux.2 inpainting (which does add noise to a latent and therefore does take a
-`strength` parameter), use `Flux2KleinInpaintPipeline` instead.
+Drop the `strength` argument and pass references with `image`. For inpainting, use [`Flux2KleinInpaintPipeline`] instead.
 
 ## Flux2Pipeline
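The migration this patch documents can be sketched as a small porting helper. `port_img2img_kwargs` is a hypothetical name used only for illustration and is not part of diffusers; it captures the two changes the doc asks for:

```python
def port_img2img_kwargs(call_kwargs):
    """Adapt SD/Flux.1 img2img call kwargs for a Flux.2 pipeline call.

    Hypothetical helper (not a diffusers API): drops `strength`, which Flux.2
    pipelines reject with a TypeError, and wraps a single `image` in a list,
    since Flux.2 reads a list as multiple reference images.
    """
    kwargs = dict(call_kwargs)     # leave the caller's dict untouched
    kwargs.pop("strength", None)   # img2img-only; unsupported by Flux.2
    image = kwargs.get("image")
    if image is not None and not isinstance(image, (list, tuple)):
        kwargs["image"] = [image]
    return kwargs


print(port_img2img_kwargs({"prompt": "a cat", "image": "ref.png", "strength": 0.6}))
# {'prompt': 'a cat', 'image': ['ref.png']}
```

The returned dict can then be splatted into the Flux.2 pipeline call without tripping the `TypeError` shown above.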