Description
Git commit
Operating System & Version
Windows 11
GGML backends
CUDA
Command-line arguments used
sd.exe -r "C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_SD.png" --diffusion-model "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf" --vae "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors" --clip_l "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors" --t5xxl "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors" -p "change 'flux.cpp' to 'kontext.cpp'" --cfg-scale 1.0 --sampling-method euler -v -W 1920 -H 1088 --diffusion-fa --vae-tiling --steps 15 --offload-to-cpu
Steps to reproduce
Hi, I'm experimenting with FLUX Kontext and I found an interesting bug: while I can generate "big" images (say 1920*1088, which is supposed to be the maximum allowed by FLUX), that only works if my reference image is not that big.
Let me explain the different situations:
output -> 1920*1088 + image ref -> 960*544 -> OK ( I get an image, though the result is not great )

output -> 1920*1088 + image ref -> 1920*1088 -> BLACK OUTPUT

output -> 960*544 + image ref -> 960*544 -> OK

output -> 960*544 + image ref -> 1920*1088 -> BLACK OUTPUT
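For triage it can help to confirm programmatically that an output really is all-black rather than just very dark. A minimal sketch (a hypothetical helper, not part of sd.cpp) that checks a decoded pixel buffer:

```python
def is_black(pixels, threshold=2):
    """Return True if every RGB pixel is at or below `threshold`.

    `pixels` is any iterable of (r, g, b) tuples, e.g. the result of
    Pillow's Image.open("output.png").convert("RGB").getdata().
    """
    return all(max(px) <= threshold for px in pixels)

# Example: a fully black 4-pixel buffer vs. one with a single lit pixel.
black = [(0, 0, 0)] * 4
lit = [(0, 0, 0)] * 3 + [(10, 128, 7)]
print(is_black(black))  # True
print(is_black(lit))    # False
```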

The reference images are:
SD (960*544)
HD (1920*1088)
What you expected to happen
If generating images at 1920*1088 is supported (which it is), it should be possible to feed in reference images of the same resolution.
What actually happened
The output is black whenever the reference image is large (1920*1088).
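One pattern worth noting: in Flux-family models the image is encoded to latents at 1/8 resolution and then packed into 2x2 latent patches, so each diffusion token covers a 16x16 pixel area and an image contributes (W/16)*(H/16) tokens. Under that assumption (standard Flux patchification; a back-of-the-envelope sketch, not confirmed against the sd.cpp code), the token counts for the four cases above look like this:

```python
def flux_tokens(width, height):
    # Assumes the standard Flux pipeline: the VAE downscales by 8,
    # then the DiT packs 2x2 latent patches into one token,
    # so one token per 16x16 pixel area.
    return (width // 16) * (height // 16)

cases = [
    # (output size, reference size, observed result)
    ((1920, 1088), (960, 544),   "OK"),
    ((1920, 1088), (1920, 1088), "BLACK"),
    ((960, 544),   (960, 544),   "OK"),
    ((960, 544),   (1920, 1088), "BLACK"),
]
for out, ref, result in cases:
    print(f"out={flux_tokens(*out):4d} tok + ref={flux_tokens(*ref):4d} tok -> {result}")
```

Every BLACK case has an 8160-token reference, while the combined length of 10200 tokens occurs in both an OK case and a BLACK case, so the failure seems tied to the reference image's size rather than the total sequence length.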
Logs / error messages / stack trace
sd.exe -r "C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_HD.png" --diffusion-model "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf" --vae "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors" --clip_l "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors" --t5xxl "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors" -p "change 'flux.cpp' to 'kontext.cpp'" --cfg-scale 1.0 --sampling-method euler -v -W 1920 -H 1088 --diffusion-fa --vae-tiling --steps 15 --offload-to-cpu
Option:
n_threads: 12
mode: img_gen
model_path:
wtype: unspecified
clip_l_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors
clip_g_path:
clip_vision_path:
t5xxl_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors
qwen2vl_path:
qwen2vl_vision_path:
diffusion_model_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf
high_noise_diffusion_model_path:
vae_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_HD.png
control_video_path:
increase_ref_index: false
offload_params_to_cpu: true
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: true
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: change 'flux.cpp' to 'kontext.cpp'
negative_prompt:
clip_skip: -1
width: 1920
height: 1088
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler, sample_steps: 15, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
prediction: default
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: true
force_sdxl_vae_conv_scale: false
upscale_repeats: 1
upscale_tile: 128
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:147 - Using CUDA backend
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:69 - Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:211 - loading diffusion model from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf'
[INFO ] model.cpp:1098 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf using gguf format
[DEBUG] model.cpp:1115 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf'
[INFO ] stable-diffusion.cpp:227 - loading clip_l from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors', prefix = 'text_encoders.clip_l.transformer.'
[INFO ] stable-diffusion.cpp:251 - loading t5xxl from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors', prefix = 'text_encoders.t5xxl.transformer.'
[INFO ] stable-diffusion.cpp:272 - loading vae from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:293 - Version: Flux
[INFO ] stable-diffusion.cpp:324 - Weight type: q4_0
[INFO ] stable-diffusion.cpp:325 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:326 - Diffusion model weight type: q4_0
[INFO ] stable-diffusion.cpp:327 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:329 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:356 - Using flash attention in the diffusion model
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[INFO ] flux.hpp:916 - Flux blocks: 19 double, 38 single
[DEBUG] ggml_extend.hpp:1758 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1758 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1758 - flux params backend buffer size = 6482.39 MB(RAM) (780 tensors)
[DEBUG] ggml_extend.hpp:1758 - vae params backend buffer size = 160.00 MB(RAM) (244 tensors)
[DEBUG] stable-diffusion.cpp:604 - loading weights
[DEBUG] model.cpp:2031 - using 12 threads for model loading
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf
|===========================> | 780/1439 - 636.73it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors
|=================================> | 976/1439 - 683.47it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors
|=========================================> | 1195/1439 - 390.52it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors
|==================================================| 1439/1439 - 440.74it/s
[INFO ] model.cpp:2358 - loading tensors completed, taking 3.27s (process: 0.01s, read: 2.69s, memcpy: 0.00s, convert: 0.01s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:702 - total params memory size = 15961.23MB (VRAM 15961.23MB, RAM 0.00MB): text_encoders 9318.83MB(VRAM), diffusion_model 6482.39MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:769 - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:811 - finished loaded file
[DEBUG] stable-diffusion.cpp:2481 - generate_image 1920x1088
[INFO ] stable-diffusion.cpp:2608 - TXT2IMG
[INFO ] stable-diffusion.cpp:2632 - EDIT mode
[DEBUG] stable-diffusion.cpp:1526 - VAE Tile size: 41x41
[DEBUG] ggml_extend.hpp:832 - num tiles : 10, 5
[DEBUG] ggml_extend.hpp:833 - optimal overlap : 0.460705, 0.420732 (targeting 0.500000)
[DEBUG] ggml_extend.hpp:866 - tile work buffer size: 1.44 MB
[INFO ] ggml_extend.hpp:879 - processing 50 tiles
[INFO ] ggml_extend.hpp:1682 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.10s
[DEBUG] ggml_extend.hpp:1582 - vae compute buffer size: 348.22 MB(VRAM)
|==================================================| 50/50 - 62.50it/s
[DEBUG] stable-diffusion.cpp:1550 - computing vae encode graph completed, taking 1.07s
[INFO ] stable-diffusion.cpp:2680 - encode_first_stage completed, taking 1.36s
[INFO ] stable-diffusion.cpp:960 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:980 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:981 - prompt after extract and remove lora: "change 'flux.cpp' to 'kontext.cpp'"
[DEBUG] conditioner.hpp:1039 - parse 'change 'flux.cpp' to 'kontext.cpp'' to [['change 'flux.cpp' to 'kontext.cpp'', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:402 - token length: 256
[INFO ] ggml_extend.hpp:1682 - clip offload params (235.06 MB, 196 tensors) to runtime backend (CUDA0), taking 0.04s
[DEBUG] clip.hpp:741 - identity projection
[DEBUG] ggml_extend.hpp:1582 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] clip.hpp:741 - identity projection
[INFO ] ggml_extend.hpp:1682 - t5 offload params (9083.77 MB, 219 tensors) to runtime backend (CUDA0), taking 0.90s
[DEBUG] ggml_extend.hpp:1582 - t5 compute buffer size: 68.25 MB(VRAM)
[DEBUG] conditioner.hpp:1158 - computing condition graph completed, taking 1014 ms
[INFO ] stable-diffusion.cpp:2219 - get_learned_condition completed, taking 1017 ms
[INFO ] stable-diffusion.cpp:2244 - sampling using Euler method
[INFO ] stable-diffusion.cpp:2338 - generating image: 1/1 - seed 42
[INFO ] ggml_extend.hpp:1682 - flux offload params (6482.39 MB, 780 tensors) to runtime backend (CUDA0), taking 0.65s
[DEBUG] ggml_extend.hpp:1582 - flux compute buffer size: 5232.50 MB(VRAM)
|==================================================| 15/15 - 5.07s/it
[INFO ] stable-diffusion.cpp:2375 - sampling completed, taking 76.38s
[INFO ] stable-diffusion.cpp:2383 - generating 1 latent images completed, taking 76.86s
[INFO ] stable-diffusion.cpp:2386 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1651 - VAE Tile size: 32x32
[DEBUG] ggml_extend.hpp:832 - num tiles : 14, 7
[DEBUG] ggml_extend.hpp:833 - optimal overlap : 0.500000, 0.458333 (targeting 0.500000)
[DEBUG] ggml_extend.hpp:866 - tile work buffer size: 0.81 MB
[INFO ] ggml_extend.hpp:879 - processing 98 tiles
[INFO ] ggml_extend.hpp:1682 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.05s
[DEBUG] ggml_extend.hpp:1582 - vae compute buffer size: 416.06 MB(VRAM)
|==================================================| 98/98 - 50.00it/s
[DEBUG] stable-diffusion.cpp:1677 - computing vae decode graph completed, taking 2.13s
[INFO ] stable-diffusion.cpp:2396 - latent 1 decoded, taking 2.13s
[INFO ] stable-diffusion.cpp:2400 - decode_first_stage completed, taking 2.13s
[INFO ] stable-diffusion.cpp:2714 - generate_image completed in 81.39s
save result PNG image to 'output.png'
Additional context / environment details
RTX 4090