Description
Git commit
Operating System & Version
Windows 11
GGML backends
CUDA
Command-line arguments used
sd.exe -r "C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_SD.png" --diffusion-model "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf" --vae "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors" --clip_l "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors" --t5xxl "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors" -p "change 'flux.cpp' to 'kontext.cpp'" --cfg-scale 1.0 --sampling-method euler -v -W 1920 -H 1088 --diffusion-fa --vae-tiling --steps 15 --offload-to-cpu
Steps to reproduce
Hi, I'm experimenting with FLUX Kontext and I found an interesting bug: while I can generate "big" images (say 1920*1088, which is supposed to be the maximum allowed by FLUX), that only works if my reference image is not that big.
Let me explain the different situations:
output -> 1920*1088 + image ref -> 960*544 -> OK ( I get an image, though the result is not great )

output -> 1920*1088 + image ref -> 1920*1088 -> BLACK OUTPUT

output -> 960*544 + image ref -> 960*544 -> OK

output -> 960*544 + image ref -> 1920*1088 -> BLACK OUTPUT
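For triage it can help to confirm programmatically that an output really is all-black rather than just very dark. A minimal sketch (a hypothetical helper, not part of sd.cpp) that checks a decoded pixel buffer:

```python
def is_black(pixels, threshold=2):
    """Return True if every RGB pixel is at or below `threshold`.

    `pixels` is any iterable of (r, g, b) tuples, e.g. the result of
    Pillow's Image.open("output.png").convert("RGB").getdata().
    """
    return all(max(px) <= threshold for px in pixels)

# Example: a fully black 4-pixel buffer vs. one with a single lit pixel.
black = [(0, 0, 0)] * 4
lit = [(0, 0, 0)] * 3 + [(10, 128, 7)]
print(is_black(black))  # True
print(is_black(lit))    # False
```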

The reference images are:
SD (960*544)
HD (1920*1088)
What you expected to happen
If generating images at 1920*1088 is supported (which it is), it should be possible to feed in reference images of the same resolution.
What actually happened
The output is black whenever the reference image is large (1920*1088).
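One pattern worth noting: in Flux-family models the image is encoded to latents at 1/8 resolution and then packed into 2x2 latent patches, so each diffusion token covers a 16x16 pixel area and an image contributes (W/16)*(H/16) tokens. Under that assumption (standard Flux patchification; a back-of-the-envelope sketch, not confirmed against the sd.cpp code), the token counts for the four cases above look like this:

```python
def flux_tokens(width, height):
    # Assumes the standard Flux pipeline: the VAE downscales by 8,
    # then the DiT packs 2x2 latent patches into one token,
    # so one token per 16x16 pixel area.
    return (width // 16) * (height // 16)

cases = [
    # (output size, reference size, observed result)
    ((1920, 1088), (960, 544),   "OK"),
    ((1920, 1088), (1920, 1088), "BLACK"),
    ((960, 544),   (960, 544),   "OK"),
    ((960, 544),   (1920, 1088), "BLACK"),
]
for out, ref, result in cases:
    print(f"out={flux_tokens(*out):4d} tok + ref={flux_tokens(*ref):4d} tok -> {result}")
```

Every BLACK case has an 8160-token reference, while the combined length of 10200 tokens occurs in both an OK case and a BLACK case, so the failure seems tied to the reference image's size rather than the total sequence length.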
Logs / error messages / stack trace
sd.exe -r "C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_HD.png" --diffusion-model "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf" --vae "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors" --clip_l "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors" --t5xxl "F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors" -p "change 'flux.cpp' to 'kontext.cpp'" --cfg-scale 1.0 --sampling-method euler -v -W 1920 -H 1088 --diffusion-fa --vae-tiling --steps 15 --offload-to-cpu
Option:
n_threads: 12
mode: img_gen
model_path:
wtype: unspecified
clip_l_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors
clip_g_path:
clip_vision_path:
t5xxl_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors
qwen2vl_path:
qwen2vl_vision_path:
diffusion_model_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf
high_noise_diffusion_model_path:
vae_path: F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors
taesd_path:
esrgan_path:
control_net_path:
embedding_dir:
photo_maker_path:
pm_id_images_dir:
pm_id_embed_path:
pm_style_strength: 20.00
output_path: output.png
init_image_path:
end_image_path:
mask_image_path:
control_image_path:
ref_images_paths:
C:/Users/pedro/Downloads/flux1-dev-q8_0 (1)_HD.png
control_video_path:
increase_ref_index: false
offload_params_to_cpu: true
clip_on_cpu: false
control_net_cpu: false
vae_on_cpu: false
diffusion flash attention: true
diffusion Conv2d direct: false
vae_conv_direct: false
control_strength: 0.90
prompt: change 'flux.cpp' to 'kontext.cpp'
negative_prompt:
clip_skip: -1
width: 1920
height: 1088
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler, sample_steps: 15, eta: 0.00, shifted_timestep: 0)
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
moe_boundary: 0.875
prediction: default
flow_shift: inf
strength(img2img): 0.75
rng: cuda
seed: 42
batch_count: 1
vae_tiling: true
force_sdxl_vae_conv_scale: false
upscale_repeats: 1
upscale_tile: 128
chroma_use_dit_mask: true
chroma_use_t5_mask: false
chroma_t5_mask_pad: 1
video_frames: 1
vace_strength: 1.00
fps: 16
System Info:
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:147 - Using CUDA backend
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[INFO ] ggml_extend.hpp:69 - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:69 - Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
[INFO ] stable-diffusion.cpp:211 - loading diffusion model from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf'
[INFO ] model.cpp:1098 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf using gguf format
[DEBUG] model.cpp:1115 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf'
[INFO ] stable-diffusion.cpp:227 - loading clip_l from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors', prefix = 'text_encoders.clip_l.transformer.'
[INFO ] stable-diffusion.cpp:251 - loading t5xxl from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors', prefix = 'text_encoders.t5xxl.transformer.'
[INFO ] stable-diffusion.cpp:272 - loading vae from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors'
[INFO ] model.cpp:1101 - load F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors using safetensors format
[DEBUG] model.cpp:1208 - init from 'F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:293 - Version: Flux
[INFO ] stable-diffusion.cpp:324 - Weight type: q4_0
[INFO ] stable-diffusion.cpp:325 - Conditioner weight type: f16
[INFO ] stable-diffusion.cpp:326 - Diffusion model weight type: q4_0
[INFO ] stable-diffusion.cpp:327 - VAE weight type: f32
[DEBUG] stable-diffusion.cpp:329 - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:356 - Using flash attention in the diffusion model
[DEBUG] clip.hpp:171 - vocab size: 49408
[DEBUG] clip.hpp:182 - trigger word img already in vocab
[INFO ] flux.hpp:916 - Flux blocks: 19 double, 38 single
[DEBUG] ggml_extend.hpp:1758 - clip params backend buffer size = 235.06 MB(RAM) (196 tensors)
[DEBUG] ggml_extend.hpp:1758 - t5 params backend buffer size = 9083.77 MB(RAM) (219 tensors)
[DEBUG] ggml_extend.hpp:1758 - flux params backend buffer size = 6482.39 MB(RAM) (780 tensors)
[DEBUG] ggml_extend.hpp:1758 - vae params backend buffer size = 160.00 MB(RAM) (244 tensors)
[DEBUG] stable-diffusion.cpp:604 - loading weights
[DEBUG] model.cpp:2031 - using 12 threads for model loading
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\flux1-kontext-dev-Q4_0.gguf
|===========================> | 780/1439 - 636.73it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\clip_l_FLUX.safetensors
|=================================> | 976/1439 - 683.47it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\t5xxl_fp16_FLUX.safetensors
|=========================================> | 1195/1439 - 390.52it/s
[DEBUG] model.cpp:2114 - loading tensors from F:\pedro\ENVIRONMENTS\pc_nuke_diffusion\models\FLUX\ae_FLUX.safetensors
|==================================================| 1439/1439 - 440.74it/s
[INFO ] model.cpp:2358 - loading tensors completed, taking 3.27s (process: 0.01s, read: 2.69s, memcpy: 0.00s, convert: 0.01s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:702 - total params memory size = 15961.23MB (VRAM 15961.23MB, RAM 0.00MB): text_encoders 9318.83MB(VRAM), diffusion_model 6482.39MB(VRAM), vae 160.00MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:769 - running in Flux FLOW mode
[DEBUG] stable-diffusion.cpp:811 - finished loaded file
[DEBUG] stable-diffusion.cpp:2481 - generate_image 1920x1088
[INFO ] stable-diffusion.cpp:2608 - TXT2IMG
[INFO ] stable-diffusion.cpp:2632 - EDIT mode
[DEBUG] stable-diffusion.cpp:1526 - VAE Tile size: 41x41
[DEBUG] ggml_extend.hpp:832 - num tiles : 10, 5
[DEBUG] ggml_extend.hpp:833 - optimal overlap : 0.460705, 0.420732 (targeting 0.500000)
[DEBUG] ggml_extend.hpp:866 - tile work buffer size: 1.44 MB
[INFO ] ggml_extend.hpp:879 - processing 50 tiles
[INFO ] ggml_extend.hpp:1682 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.10s
[DEBUG] ggml_extend.hpp:1582 - vae compute buffer size: 348.22 MB(VRAM)
|==================================================| 50/50 - 62.50it/s
[DEBUG] stable-diffusion.cpp:1550 - computing vae encode graph completed, taking 1.07s
[INFO ] stable-diffusion.cpp:2680 - encode_first_stage completed, taking 1.36s
[INFO ] stable-diffusion.cpp:960 - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:980 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:981 - prompt after extract and remove lora: "change 'flux.cpp' to 'kontext.cpp'"
[DEBUG] conditioner.hpp:1039 - parse 'change 'flux.cpp' to 'kontext.cpp'' to [['change 'flux.cpp' to 'kontext.cpp'', 1], ]
[DEBUG] clip.hpp:311 - token length: 77
[DEBUG] t5.hpp:402 - token length: 256
[INFO ] ggml_extend.hpp:1682 - clip offload params (235.06 MB, 196 tensors) to runtime backend (CUDA0), taking 0.04s
[DEBUG] clip.hpp:741 - identity projection
[DEBUG] ggml_extend.hpp:1582 - clip compute buffer size: 1.40 MB(VRAM)
[DEBUG] clip.hpp:741 - identity projection
[INFO ] ggml_extend.hpp:1682 - t5 offload params (9083.77 MB, 219 tensors) to runtime backend (CUDA0), taking 0.90s
[DEBUG] ggml_extend.hpp:1582 - t5 compute buffer size: 68.25 MB(VRAM)
[DEBUG] conditioner.hpp:1158 - computing condition graph completed, taking 1014 ms
[INFO ] stable-diffusion.cpp:2219 - get_learned_condition completed, taking 1017 ms
[INFO ] stable-diffusion.cpp:2244 - sampling using Euler method
[INFO ] stable-diffusion.cpp:2338 - generating image: 1/1 - seed 42
[INFO ] ggml_extend.hpp:1682 - flux offload params (6482.39 MB, 780 tensors) to runtime backend (CUDA0), taking 0.65s
[DEBUG] ggml_extend.hpp:1582 - flux compute buffer size: 5232.50 MB(VRAM)
|==================================================| 15/15 - 5.07s/it
[INFO ] stable-diffusion.cpp:2375 - sampling completed, taking 76.38s
[INFO ] stable-diffusion.cpp:2383 - generating 1 latent images completed, taking 76.86s
[INFO ] stable-diffusion.cpp:2386 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:1651 - VAE Tile size: 32x32
[DEBUG] ggml_extend.hpp:832 - num tiles : 14, 7
[DEBUG] ggml_extend.hpp:833 - optimal overlap : 0.500000, 0.458333 (targeting 0.500000)
[DEBUG] ggml_extend.hpp:866 - tile work buffer size: 0.81 MB
[INFO ] ggml_extend.hpp:879 - processing 98 tiles
[INFO ] ggml_extend.hpp:1682 - vae offload params (160.00 MB, 244 tensors) to runtime backend (CUDA0), taking 0.05s
[DEBUG] ggml_extend.hpp:1582 - vae compute buffer size: 416.06 MB(VRAM)
|==================================================| 98/98 - 50.00it/s
[DEBUG] stable-diffusion.cpp:1677 - computing vae decode graph completed, taking 2.13s
[INFO ] stable-diffusion.cpp:2396 - latent 1 decoded, taking 2.13s
[INFO ] stable-diffusion.cpp:2400 - decode_first_stage completed, taking 2.13s
[INFO ] stable-diffusion.cpp:2714 - generate_image completed in 81.39s
save result PNG image to 'output.png'
Additional context / environment details
RTX 4090