leejet (Owner) commented Oct 10, 2025

Qwen Image Edit

.\bin\Release\sd.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\Qwen_Image_Edit-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\qwen_image_vae.safetensors  --qwen2vl ..\..\ComfyUI\models\text_encoders\qwen_2.5_vl_7b.safetensors --cfg-scale 2.5 --sampling-method euler -v --offload-to-cpu --diffusion-fa --flow-shift 3 -r ..\assets\flux\flux1-dev-q8_0.png -p "change 'flux.cpp' to 'edit.cpp'" --seed 1118877715456453
[image: qwen_image_edit]

Qwen Image Edit 2509

.\bin\Release\sd.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\Qwen-Image-Edit-2509-Q8_0.gguf --vae ..\..\ComfyUI\models\vae\qwen_image_vae.safetensors  --qwen2vl ..\..\ComfyUI\models\text_encoders\qwen_2.5_vl_7b.safetensors --cfg-scale 2.5 --sampling-method euler -v --offload-to-cpu --diffusion-fa --flow-shift 3 -r .\qwen-pose2.png -r .\replicate-prediction-2rq8q6nrg5rmc0csex6818jzk8.jpeg -p "The woman in image 2 adopts the pose from image 1" -H 1024 -W 1024

Image 1:
[image]

Image 2:
[image]

Result:
[image: qwen_image_edit_multi]

leejet mentioned this pull request on Oct 10, 2025
LostRuins (Contributor) commented Oct 11, 2025

Seems to work fine on vulkan.

Edit: Running multiple generations on the same instance causes issues.
I get conditioner.hpp:1558: GGML_ASSERT(hidden_states->ne[1] > prompt_template_encode_start_idx)
I think this can be fixed by resetting prompt_template_encode_start_idx back to 34.

The quality is... weird. It seems a lot less coherent than the reference implementation. For simple tasks like background removal it's fine, but anything else seems off. Are we supposed to ensure the input reference image and the output image are exactly the same size?

leejet (Owner, Author) commented Oct 11, 2025

> I get conditioner.hpp:1558: GGML_ASSERT(hidden_states->ne[1] > prompt_template_encode_start_idx)
> I think this can be fixed by resetting prompt_template_encode_start_idx back to 34.

@LostRuins The Qwen image edit model uses a different system prompt, so it requires a different prompt_template_encode_start_idx. Can you share the detailed output? In theory, this issue shouldn’t be triggered.
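To make the role of the index concrete: in the reference diffusers pipeline, prompt_template_encode_start_idx is the number of leading system-template tokens dropped from the text encoder's hidden states, and the assert quoted above fires when the sequence is not longer than that prefix. The sketch below is a hypothetical, simplified model of that slicing on a plain vector of token rows; HiddenStates and strip_template_prefix are illustrative names, not stable-diffusion.cpp APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the text encoder output: one embedding row per token.
using HiddenStates = std::vector<std::vector<float>>;

// Drop the first `start_idx` rows (the tokens of the system-prompt template)
// and keep only the rows that belong to the user's prompt. The check mirrors
// the intent of the GGML_ASSERT above: at least one token must follow the prefix.
HiddenStates strip_template_prefix(const HiddenStates& hidden, std::size_t start_idx) {
    if (hidden.size() <= start_idx) {
        throw std::out_of_range("hidden states are not longer than the template prefix");
    }
    HiddenStates out(hidden.begin() + static_cast<std::ptrdiff_t>(start_idx), hidden.end());
    assert(out.size() == hidden.size() - start_idx);
    return out;
}
```

With a 40-token sequence, an index of 34 leaves 6 prompt rows, while a stale index of 64 fails the check. That is consistent with the failure reported above when one instance is reused for a request without reference images.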

> The quality is... weird. It seems a lot less coherent than the reference implementation. For simple tasks like background removal it's fine, but anything else seems off.

Can you give an example?

> Are we supposed to ensure the input reference image and the output image are exactly the same size?

That’s not necessary: the Qwen image edit pipeline automatically resizes the reference image to an appropriate size.
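This thread does not show how the size is chosen. As a hedged illustration only, the reference diffusers Qwen-Image-Edit pipeline picks dimensions that keep the reference image's aspect ratio while targeting a fixed pixel area of about 1024x1024, rounding each side to a multiple of 32. The sketch below reproduces that idea; fit_to_area and its default constants are assumptions, not stable-diffusion.cpp code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper (not an sd.cpp API): pick an output size with roughly
// `target_area` pixels and the same aspect ratio as the source image, with
// each side rounded to the nearest multiple of `multiple`.
// The defaults are assumptions modeled on the diffusers Qwen-Image-Edit pipeline.
std::pair<int, int> fit_to_area(int src_w, int src_h,
                                double target_area = 1024.0 * 1024.0,
                                int multiple = 32) {
    assert(src_w > 0 && src_h > 0 && multiple > 0);
    const double ratio = static_cast<double>(src_w) / static_cast<double>(src_h);
    const double w = std::sqrt(target_area * ratio);  // w * h == target_area, w / h == ratio
    const double h = w / ratio;
    // Round to the nearest multiple, but never below one multiple.
    const int out_w = std::max(multiple, static_cast<int>(std::lround(w / multiple)) * multiple);
    const int out_h = std::max(multiple, static_cast<int>(std::lround(h / multiple)) * multiple);
    return {out_w, out_h};
}
```

For example, a 2048x1024 source maps to 1440x736: close to the 1024x1024 pixel budget and, up to rounding, the same 2:1 shape.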

LostRuins (Contributor) commented
> In theory, this issue shouldn’t be triggered.

It won't be triggered from the CLI, but it can be in server mode, because the Conditioner is initialized only once, when the model loads:

struct Qwen2_5_VLCLIPEmbedder : public Conditioner {
    Qwen::Qwen2Tokenizer tokenizer;
    std::shared_ptr<Qwen::Qwen2_5_VLRunner> qwenvl;
    int prompt_template_encode_start_idx = 34;

Later the value is overwritten, but it is never reset, so an instance reused for a request without a reference image still carries the edit value:

    SDCondition get_learned_condition(ggml_context* work_ctx,
                                      int n_threads,
                                      const ConditionerParams& conditioner_params) {
        std::string prompt;
        std::vector<std::pair<int, ggml_tensor*>> image_embeds;
        size_t system_prompt_length = 0;
        if (qwenvl->enable_vision && conditioner_params.ref_images.size() > 0) {
            LOG_INFO("QwenImageEditPlusPipeline");
            prompt_template_encode_start_idx = 64;                            // never reset, so it leaks into later calls!

A simple fix is to reset it at the start of every call:

    SDCondition get_learned_condition(ggml_context* work_ctx,
                                      int n_threads,
                                      const ConditionerParams& conditioner_params) {
        std::string prompt;
        std::vector<std::pair<int, ggml_tensor*>> image_embeds;
        size_t system_prompt_length = 0;
        prompt_template_encode_start_idx = 34;                //reset it back in case the user removes their reference images.
        if (qwenvl->enable_vision && conditioner_params.ref_images.size() > 0) {
            LOG_INFO("QwenImageEditPlusPipeline");
            prompt_template_encode_start_idx = 64;
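The fix above keeps the index as mutable member state and resets it at the start of each call. An alternative, sketched here under stated assumptions, is to derive the index locally from the current request so nothing can leak between calls. ConditionerSketch and start_idx_for are hypothetical names; the real Qwen2_5_VLCLIPEmbedder does far more than this model, which only tracks the index (34 and 64 are the values quoted in this thread).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified model of the conditioner.
struct ConditionerSketch {
    static constexpr int kTextOnlyStartIdx = 34;  // default system prompt
    static constexpr int kEditStartIdx     = 64;  // edit prompt with reference images

    // Derive the index from the current request instead of mutating a member,
    // so a long-lived (server-mode) instance cannot carry a stale value over
    // from an earlier request that had reference images.
    static int start_idx_for(std::size_t num_ref_images, bool enable_vision) {
        return (enable_vision && num_ref_images > 0) ? kEditStartIdx : kTextOnlyStartIdx;
    }

    int get_learned_condition(std::size_t num_ref_images, bool enable_vision) const {
        const int start_idx = start_idx_for(num_ref_images, enable_vision);
        assert(start_idx == kTextOnlyStartIdx || start_idx == kEditStartIdx);
        // ... tokenize with the matching template and drop `start_idx` tokens ...
        return start_idx;  // returned only so the behavior is observable
    }
};
```

Because the value depends only on the arguments, calling with reference images and then without them yields 64 and then 34, regardless of call order. The one-line reset in the fix above achieves the same effect with a smaller diff.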

> Can you give an example?

Sure. These were generated with 20 steps using Qwen_Image_Edit-Q4_K_S.gguf:

Prompt 1: Remove the background
[image: s1]
Result 1:
[image: r1]

Prompt 2: Change the hair color to blue and add a cat
[image: s2]
Result 2:
[image: r2]

In each case I seem to lose a lot of quality and detail compared to the source. It's hard to describe exactly, but hopefully the pictures make it clear.

leejet (Owner, Author) commented Oct 11, 2025

The q8_0 results look good.

Prompt 1: Remove the background
[image: p1]

Prompt 2: Change the hair color to blue and add a cat
[image: p2]

leejet (Owner, Author) commented Oct 12, 2025

The q4_k_s results also look good now.

 .\bin\Release\sd.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\Qwen-Image-Edit-2509-Q4_K_S.gguf --vae ..\..\ComfyUI\models\vae\qwen_image_vae.safetensors  --qwen2vl ..\..\ComfyUI\models\text_encoders\Qwen2.5-VL-7B-Instruct-Q8_0.gguf --qwen2vl_vision ..\..\ComfyUI\models\text_encoders\Qwen2.5-VL-7B-Instruct.mmproj-Q8_0.gguf --cfg-scale 2.5 --sampling-method euler -v --offload-to-cpu --diffusion-fa --flow-shift 3 -r girl.png -p "Remove the background"
[image: output]

wbruna (Contributor) commented Oct 12, 2025

Just confirming that the Pruning models work fine with this branch. I noticed only very small differences in the images between this branch and the qwen_edit + Pruning PR.

LostRuins (Contributor) commented
Just out of curiosity, @leejet: how did you arrive at 1/128.f as the precision-fix scale for Qwen (and why is it 1/32 for T5 and to_add_out)?

leejet (Owner, Author) commented Oct 12, 2025

> Just out of curiosity, @leejet: how did you arrive at 1/128.f as the precision-fix scale for Qwen (and why is it 1/32 for T5 and to_add_out)?

The scaling value was determined through testing. I tested with different prompts and tried to keep the scaling value as small as possible while ensuring the issue was fixed.
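For readers wondering what such a scale buys: fp16 cannot represent magnitudes above 65504, so on backends that accumulate a matmul in half precision a large sum can overflow to infinity. Scaling the input down by a constant before the multiply and scaling the output back up afterwards is a no-op for a linear map, but it keeps the intermediate sum inside the fp16 range. The sketch below illustrates only that general idea with plain floats and an explicit range check; ScaledDot and dot_with_input_scale are hypothetical names, and the thread does not show exactly where stable-diffusion.cpp applies its 1/128 and 1/32 factors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr float kFp16Max = 65504.0f;  // largest finite IEEE 754 half-precision value

struct ScaledDot {
    float value;      // final result, with the input scale undone
    bool overflowed;  // true if any partial sum left the fp16 range
};

// Hypothetical illustration of an input-scaled dot product: the math runs in
// float, and we only *check* whether the running sum would have overflowed a
// half-precision accumulator. A bias term, if any, would be added afterwards.
ScaledDot dot_with_input_scale(const std::vector<float>& w,
                               const std::vector<float>& x,
                               float input_scale) {
    assert(w.size() == x.size() && input_scale > 0.0f);
    float acc = 0.0f;
    bool overflowed = false;
    for (std::size_t i = 0; i < w.size(); ++i) {
        acc += w[i] * (x[i] * input_scale);  // shrink the input before multiplying
        if (acc > kFp16Max || acc < -kFp16Max) {
            overflowed = true;
        }
    }
    return {acc / input_scale, overflowed};  // linearity: (w . (s*x)) / s == w . x
}
```

With 1024 weights of 1 and inputs of 100, the unscaled sum reaches 102400 and exceeds the fp16 limit, while a 1/128 scale keeps the running sum at or below 800 and still returns 102400. Scaling too aggressively pushes small activations toward fp16 underflow, which is one plausible reason to use no more down-scaling than testing shows is needed.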

LostRuins (Contributor) commented Oct 12, 2025

[image]

Much better now!

The quality has improved a lot after the fixes.

leejet changed the base branch from qwen_image to master on Oct 12, 2025, 16:06
leejet merged commit 2e9242e into master on Oct 13, 2025
8 checks passed