
FLUX.1-Kontext-dev batch inference throughput issue #12459

@Django-Jiang


Describe the bug

I am trying to run FLUX.1-Kontext-dev with batch inference, but the inference time scales linearly with the batch size, e.g., the time for batch size 2 is roughly 2× the time for batch size 1 (a rough timing sketch is included after the reproduction code below). This confuses me: am I doing something wrong, or is this an issue with the pipeline?

Reproduction

import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image


# Load the pipeline
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", 
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Two copies of the same source image, edited with different prompts in one batch
input_images = [
    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
    load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png"),
]

prompts = [
    "Make it look like a watercolor painting",
    "Add dramatic lighting and shadows",
]
# One generator per batch item so each result is reproducible
with torch.inference_mode():
    images = pipe(
        image=input_images,
        prompt=prompts,
        guidance_scale=2.5,
        num_inference_steps=28,
        generator=[
            torch.Generator("cuda").manual_seed(42),
            torch.Generator("cuda").manual_seed(123),
        ],
    ).images

for i, img in enumerate(images):
    img.save(f"output_i2i_batch_{i}.png")

Logs

System Info

  • 🤗 Diffusers version: 0.36.0.dev0
  • Platform: Linux-6.8.0-63-generic-x86_64-with-glibc2.39
  • Running on Google Colab?: No
  • Python version: 3.10.18
  • PyTorch version (GPU?): 2.8.0+cu128 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.35.3
  • Transformers version: 4.57.0
  • Accelerate version: 1.10.1
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.6.2
  • xFormers version: not installed
  • Accelerator: NVIDIA H100 NVL, 95830 MiB
    NVIDIA H100 NVL, 95830 MiB
    NVIDIA H100 NVL, 95830 MiB
    NVIDIA H100 NVL, 95830 MiB
    NVIDIA H100 NVL, 95830 MiB
    NVIDIA H100 NVL, 95830 MiB
    NVIDIA H100 NVL, 95830 MiB
    NVIDIA H100 NVL, 95830 MiB

Who can help?

@yiyixuxu @DN6

Labels

bug
