[Bug] ERNIE - white blank (diffusion-fa and gen-size affected) #1447

@SolicTous

Description

Git commit

44cca3d

Operating System & Version

Windows 10

GGML backends

CUDA

Command-line arguments used

chcp 65001
"../sd-cli.exe" ^
--diffusion-model ../../Models/Diffusers/ERNIE-Image/ernie-image-turbo-UD-Q4_K_M.gguf ^
--vae ../../Models/Diffusers/Vae/vae_ernie_image_turbo.safetensors ^
--llm ../../Models/Diffusers/Ministral/ministral-3-3b.safetensors ^
-p "a lovely cat holding a sign says 'ernie.cpp'" ^
--cfg-scale 1.0 ^
--steps 8 ^
--diffusion-fa ^
-H 1024 ^
-W 1024 ^
-v ^
-o ../output/cli_out.jpg
pause

Steps to reproduce

Run any ERNIE-Image model using the command above.
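
Since the title says the result depends on `--diffusion-fa` and on the generation size, it may help to run the same prompt over a small matrix of both. The following is only a sketch: the flags and model paths are copied from the logged command in this report, while the extra sizes (512, 768), the output file names, and the helper name `build_matrix` are assumptions made up for illustration.

```python
import itertools

# Flags and paths are copied from the logged command in this report.
BASE_ARGS = [
    "../sd-cli.exe",
    "--diffusion-model", "../../Models/Diffusers/ERNIE-Image/ernie-image-turbo-Q8_0.gguf",
    "--vae", "../../Models/Diffusers/Vae/vae_ernie_image_turbo.safetensors",
    "--llm", "../../Models/Diffusers/Ministral/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf",
    "-p", "a lovely cat holding a sign says 'ernie.cpp'",
    "--cfg-scale", "1.0",
    "--steps", "8",
    "-v",
]


def build_matrix(sizes=(512, 768, 1024), fa_options=(False, True)):
    """Return one argument list per (size, flash-attention) combination.

    The sizes other than 1024 and the output names are hypothetical;
    they only serve to vary the two settings named in the issue title.
    """
    commands = []
    for size, use_fa in itertools.product(sizes, fa_options):
        args = list(BASE_ARGS)
        if use_fa:
            args.append("--diffusion-fa")
        args += ["-H", str(size), "-W", str(size)]
        tag = "fa" if use_fa else "nofa"
        args += ["-o", f"../output/ernie_{size}_{tag}.jpg"]
        commands.append(args)
    return commands


if __name__ == "__main__":
    for cmd in build_matrix():
        # Print each argument list; pass it to subprocess.run(cmd) to execute.
        print(cmd)
```

Comparing the six outputs should show whether the blank image appears only with `--diffusion-fa`, only at certain sizes, or in every case.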

What you expected to happen

Images like those in the example.

What actually happened

The output is a blank white image.

Image

Logs / error messages / stack trace

"../sd-cli.exe" --diffusion-model ../../Models/Diffusers/ERNIE-Image/ernie-image-turbo-Q8_0.gguf --vae ../../Models/Diffusers/Vae/vae_ernie_image_turbo.safetensors --llm ../../Models/Diffusers/Ministral/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf -p "a lovely cat holding a sign says 'ernie.cpp'" --cfg-scale 1.0 --steps 8 --diffusion-fa -H 1024 -W 1024 -v -o ../output/cli_out.jpg
[DEBUG] main.cpp:547 - version: stable-diffusion.cpp version unknown, commit 44cca3d
[DEBUG] main.cpp:548 - System Info:
SSE3 = 1 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | VSX = 0 |
[DEBUG] main.cpp:549 - SDCliParams {
mode: img_gen,
output_path: "../output/cli_out.jpg",
image_path: "",
metadata_format: "text",
verbose: true,
color: false,
canny_preprocess: false,
convert_name: false,
preview_method: none,
preview_interval: 1,
preview_path: "preview.png",
preview_fps: 16,
taesd_preview: false,
preview_noisy: false,
metadata_raw: false,
metadata_brief: false,
metadata_all: false
}
[DEBUG] main.cpp:550 - SDContextParams {
n_threads: 8,
model_path: "",
clip_l_path: "",
clip_g_path: "",
clip_vision_path: "",
t5xxl_path: "",
llm_path: "../../Models/Diffusers/Ministral/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf",
llm_vision_path: "",
diffusion_model_path: "../../Models/Diffusers/ERNIE-Image/ernie-image-turbo-Q8_0.gguf",
high_noise_diffusion_model_path: "",
vae_path: "../../Models/Diffusers/Vae/vae_ernie_image_turbo.safetensors",
taesd_path: "",
esrgan_path: "",
control_net_path: "",
embedding_dir: "",
embeddings: {
}
wtype: NONE,
tensor_type_rules: "",
lora_model_dir: ".",
photo_maker_path: "",
rng_type: cuda,
sampler_rng_type: NONE,
offload_params_to_cpu: false,
enable_mmap: false,
control_net_cpu: false,
clip_on_cpu: false,
vae_on_cpu: false,
flash_attn: false,
diffusion_flash_attn: true,
diffusion_conv_direct: false,
vae_conv_direct: false,
circular: false,
circular_x: false,
circular_y: false,
chroma_use_dit_mask: true,
qwen_image_zero_cond_t: false,
chroma_use_t5_mask: false,
chroma_t5_mask_pad: 1,
prediction: NONE,
lora_apply_mode: auto,
force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:551 - SDGenerationParams {
loras: "{
}",
high_noise_loras: "{
}",
prompt: "a lovely cat holding a sign says 'ernie.cpp'",
negative_prompt: "",
clip_skip: -1,
width: 1024,
height: 1024,
batch_count: 1,
init_image_path: "",
end_image_path: "",
mask_image_path: "",
control_image_path: "",
ref_image_paths: [],
control_video_path: "",
auto_resize_ref_image: true,
increase_ref_index: false,
pm_id_images_dir: "",
pm_id_embed_path: "",
pm_style_strength: 20,
skip_layers: [7, 8, 9],
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 0, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 8, eta: inf, shifted_timestep: 0, flow_shift: inf),
high_noise_skip_layers: [7, 8, 9],
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 0, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: inf, shifted_timestep: 0, flow_shift: inf),
custom_sigmas: [],
cache_mode: "",
cache_option: "",
cache: disabled (threshold=inf, start=0.15, end=0.95),
moe_boundary: 0.875,
video_frames: 1,
fps: 16,
vace_strength: 1,
strength: 0.75,
control_strength: 0.9,
seed: 42,
upscale_repeats: 1,
upscale_tile_size: 128,
vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
}
[DEBUG] stable-diffusion.cpp:175 - Using CUDA backend
[INFO ] ggml_extend.hpp:81 - ggml_cuda_init: found 1 CUDA devices (Total VRAM: 12281 MiB):
[INFO ] ggml_extend.hpp:81 - Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, VRAM: 12281 MiB
[INFO ] stable-diffusion.cpp:269 - loading diffusion model from '../../Models/Diffusers/ERNIE-Image/ernie-image-turbo-Q8_0.gguf'
[INFO ] model.cpp:229 - load ../../Models/Diffusers/ERNIE-Image/ernie-image-turbo-Q8_0.gguf using gguf format
[DEBUG] model.cpp:278 - init from '../../Models/Diffusers/ERNIE-Image/ernie-image-turbo-Q8_0.gguf'
[INFO ] stable-diffusion.cpp:316 - loading llm from '../../Models/Diffusers/Ministral/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf'
[INFO ] model.cpp:229 - load ../../Models/Diffusers/Ministral/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf using gguf format
[DEBUG] model.cpp:278 - init from '../../Models/Diffusers/Ministral/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf'
[INFO ] stable-diffusion.cpp:330 - loading vae from '../../Models/Diffusers/Vae/vae_ernie_image_turbo.safetensors'
[INFO ] model.cpp:232 - load ../../Models/Diffusers/Vae/vae_ernie_image_turbo.safetensors using safetensors format
[DEBUG] model.cpp:307 - init from '../../Models/Diffusers/Vae/vae_ernie_image_turbo.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:355 - Version: Ernie Image
[INFO ] stable-diffusion.cpp:383 - Weight type stat: f32: 203 | f16: 26 | q8_0: 410 | bf16: 254
[INFO ] stable-diffusion.cpp:384 - Conditioner weight type stat: f32: 53 | f16: 26 | q8_0: 157
[INFO ] stable-diffusion.cpp:385 - Diffusion model weight type stat: f32: 150 | q8_0: 253 | bf16: 6
[INFO ] stable-diffusion.cpp:386 - VAE weight type stat: bf16: 248
[DEBUG] stable-diffusion.cpp:388 - ggml tensor size = 400 bytes
[DEBUG] mistral_tokenizer.cpp:23 - vocab size: 131072
[DEBUG] mistral_tokenizer.cpp:31 - merges size 269443
[DEBUG] llm.hpp:697 - llm: num_layers = 26, vocab_size = 131072, hidden_size = 3072, intermediate_size = 9216
[INFO ] ernie_image.hpp:383 - ernie_image: layers = 36, hidden_size = 4096, heads = 32, ffn_hidden_size = 12288, in_channels = 128, out_channels = 128
[DEBUG] ggml_extend.hpp:2050 - ministral3.3b params backend buffer size = 4285.00 MB(VRAM) (236 tensors)
[DEBUG] ggml_extend.hpp:2050 - ernie_image params backend buffer size = 8292.08 MB(VRAM) (409 tensors)
[INFO ] stable-diffusion.cpp:681 - using VAE for encoding / decoding
[INFO ] auto_encoder_kl.hpp:517 - vae decoder: ch = 128
[DEBUG] ggml_extend.hpp:2050 - vae params backend buffer size = 94.72 MB(VRAM) (140 tensors)
[INFO ] stable-diffusion.cpp:776 - Using flash attention in the diffusion model
[DEBUG] stable-diffusion.cpp:805 - loading weights
[DEBUG] model.cpp:755 - using 8 threads for model loading
[DEBUG] model.cpp:777 - loading tensors from ../../Models/Diffusers/ERNIE-Image/ernie-image-turbo-Q8_0.gguf
|======================> | 409/893 - 3.23GB/s
[DEBUG] model.cpp:777 - loading tensors from ../../Models/Diffusers/Ministral/Ministral-3-3B-Instruct-2512-UD-Q8_K_XL.gguf
|====================================> | 645/893 - 3.24GB/s
[DEBUG] model.cpp:777 - loading tensors from ../../Models/Diffusers/Vae/vae_ernie_image_turbo.safetensors
|==================================================| 893/893 - 3.09GB/s
[INFO ] model.cpp:1012 - loading tensors completed, taking 4.01s (process: 0.00s, read: 2.39s, memcpy: 0.00s, convert: 0.04s, copy_to_backend: 1.03s)
[DEBUG] stable-diffusion.cpp:845 - finished loaded file
[INFO ] stable-diffusion.cpp:912 - total params memory size = 12671.79MB (VRAM 12671.79MB, RAM 0.00MB): text_encoders 4285.00MB(VRAM), diffusion_model 8292.08MB(VRAM), vae 94.72MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:981 - running in FLOW mode
[INFO ] stable-diffusion.cpp:3160 - generate_image 1024x1024
[INFO ] denoiser.hpp:499 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:2736 - sampling using Euler method
[DEBUG] conditioner.hpp:1699 - parse 'a lovely cat holding a sign says 'ernie.cpp'' to [['a lovely cat holding a sign says 'ernie.cpp'', 1], ]
[DEBUG] bpe_tokenizer.cpp:183 - split prompt "a lovely cat holding a sign says 'ernie.cpp'" to tokens ["a", "Ġlovely", "Ġcat", "Ġholding", "Ġa", "Ġsign", "Ġsays", "Ġ'", "ern", "ie", ".cpp", "'", ]
[DEBUG] ggml_extend.hpp:1862 - ministral3.3b compute buffer size: 1.42 MB(VRAM)
[DEBUG] conditioner.hpp:1953 - computing condition graph completed, taking 85 ms
[INFO ] stable-diffusion.cpp:3090 - get_learned_condition completed, taking 0.09s
[INFO ] stable-diffusion.cpp:3194 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1862 - ernie_image compute buffer size: 647.99 MB(VRAM)
|==================================================| 8/8 - 2.14s/it
[INFO ] stable-diffusion.cpp:3225 - sampling completed, taking 17.57s
[INFO ] stable-diffusion.cpp:3245 - generating 1 latent images completed, taking 17.69s
[INFO ] stable-diffusion.cpp:3114 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1862 - vae compute buffer size: 6658.00 MB(VRAM)
[DEBUG] vae.hpp:206 - computing vae decode graph completed, taking 0.95s
[INFO ] stable-diffusion.cpp:3130 - latent 1 decoded, taking 0.99s
[INFO ] stable-diffusion.cpp:3134 - decode_first_stage completed, taking 0.99s
[INFO ] stable-diffusion.cpp:3255 - generate_image completed in 19.23s
[INFO ] main.cpp:438 - save result image 0 to '../output/cli_out.jpg' (success)
[INFO ] main.cpp:487 - 1/1 images saved
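
For anyone reading the log, the buffer sizes it reports can be tallied against the device VRAM figure. The sketch below does only that arithmetic on lines copied from the log above; it does not establish the cause of the blank output. It assumes the log's "MB" and "MiB" figures are the same unit, and the function name `summarize` is hypothetical.

```python
import re

# Lines copied verbatim from the log in this report.
LOG = """\
[INFO ] ggml_extend.hpp:81 - Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes, VRAM: 12281 MiB
[DEBUG] ggml_extend.hpp:2050 - ministral3.3b params backend buffer size = 4285.00 MB(VRAM) (236 tensors)
[DEBUG] ggml_extend.hpp:2050 - ernie_image params backend buffer size = 8292.08 MB(VRAM) (409 tensors)
[DEBUG] ggml_extend.hpp:2050 - vae params backend buffer size = 94.72 MB(VRAM) (140 tensors)
[DEBUG] ggml_extend.hpp:1862 - ministral3.3b compute buffer size: 1.42 MB(VRAM)
[DEBUG] ggml_extend.hpp:1862 - ernie_image compute buffer size: 647.99 MB(VRAM)
[DEBUG] ggml_extend.hpp:1862 - vae compute buffer size: 6658.00 MB(VRAM)
"""

PARAMS_RE = re.compile(r"- (\S+) params backend buffer size = ([\d.]+) MB\(VRAM\)")
COMPUTE_RE = re.compile(r"- (\S+) compute buffer size: ([\d.]+) MB\(VRAM\)")
DEVICE_RE = re.compile(r"Device \d+: .*VRAM: (\d+) MiB")


def summarize(log):
    """Collect parameter and compute buffer sizes plus the device VRAM figure."""
    params = {name: float(mb) for name, mb in PARAMS_RE.findall(log)}
    compute = {name: float(mb) for name, mb in COMPUTE_RE.findall(log)}
    device = DEVICE_RE.search(log)
    return {
        "device_vram": int(device.group(1)) if device else None,
        "params_total": round(sum(params.values()), 2),
        "largest_compute": max(compute.values(), default=0.0),
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

For this log, the parameter buffers add up to about 12671.8 MB (matching the "total params memory size" line), compared with the 12281 MiB reported for the device, and the largest compute buffer is the 6658.00 MB VAE decode buffer.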

Additional context / environment details

No response
