feat: Add QwenImageEditPlus to support future feature upgrades #12357
Conversation
Thanks a lot for working on this! My comments are minor in nature, except for the comment on using autocast for VAE encoding.

We can actually also pass a dict to torch_dtype when initializing the pipeline, like so (diffusers/tests/pipelines/test_pipelines_common.py, lines 2331 to 2332 in edd614e):

```python
torch_dtype_dict = {specified_key: torch.bfloat16, "default": torch.float16}
loaded_pipe = self.pipeline_class.from_pretrained(tmpdirname, torch_dtype=torch_dtype_dict)
```

LMK if anything is unclear.
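To illustrate the dict form, here is a minimal sketch of the resolution logic it implies (the `resolve_dtype` helper, the string dtype names, and the component names are illustrative assumptions, not the actual diffusers internals): an explicit per-component entry wins, otherwise the `"default"` entry applies.

```python
# Illustrative sketch of per-component torch_dtype resolution; resolve_dtype
# and the string dtype names are assumptions, not the real diffusers code.
def resolve_dtype(component_name, torch_dtype):
    # A dict maps component names to dtypes, with "default" as the fallback.
    if isinstance(torch_dtype, dict):
        return torch_dtype.get(component_name, torch_dtype["default"])
    # A plain dtype applies uniformly to every component.
    return torch_dtype

dtype_dict = {"vae": "float32", "default": "bfloat16"}
print(resolve_dtype("vae", dtype_dict))           # float32
print(resolve_dtype("text_encoder", dtype_dict))  # bfloat16
```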
```python
def _encode_vae_image(self, image: torch.Tensor, generator: torch.Generator):
    origin_dtype = image.dtype
    with torch.autocast(image.device.type, torch.float32):
```
Hmm, this is a bit of an anti-pattern. We avoid using autocast within pipeline code as much as possible. If the goal is to always ensure that the VAE encoding takes place in torch.float32, then we could instruct the user to do so, as we do in
https://huggingface.co/docs/diffusers/main/en/api/pipelines/wan
I was wondering if we might consider making FP32 VAE encoding the default, just to help users who may not catch the note in the documentation.
Hi @sayakpaul, thank you for your review! I've updated the code to include the reuse statement as suggested, with the exception of two functions that behave differently by design. Regarding the autocast issue: if we use torch_dtype_dict during checkpoint saving, would that allow users to encode images in FP32 transparently?
If we save using …
@naykun we can enforce upcast like in SDXL, but only when dtype isn't already torch.float32 (diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py, line 1267 in edd614e).
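The SDXL-style pattern referenced here can be sketched as follows (the `encode_with_optional_upcast` helper and the `vae_encode` callable are stand-ins for illustration, not the actual pipeline code): upcast the input to float32 only when the VAE is not already running in float32, then cast the latents back.

```python
import torch

def encode_with_optional_upcast(vae_encode, image, vae_dtype):
    # Stand-in for the SDXL upcast pattern; vae_encode represents
    # self.vae.encode in the real pipeline (an assumption for this sketch).
    needs_upcast = vae_dtype != torch.float32
    if needs_upcast:
        # Run the encode in float32 for numerical stability.
        image = image.float()
    latents = vae_encode(image)
    if needs_upcast:
        # Cast back so downstream components see the expected dtype.
        latents = latents.to(vae_dtype)
    return latents

x = torch.randn(1, 3, 8, 8, dtype=torch.float16)
print(encode_with_optional_upcast(lambda t: t * 0.5, x, torch.float16).dtype)  # torch.float16
```

The conditional keeps the fast path untouched when the VAE already runs in float32, so there is no extra copy in that case.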
@sayakpaul @yiyixuxu All other concerns have been addressed. Please let me know if you have any further suggestions or changes you'd like us to consider.
```python
# QwenImage latents are turned into 2x2 patches and packed. This means the latent
# width and height have to be divisible by the patch size. So the vae scale factor
# is multiplied by the patch size to account for this.
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor * 2)
self.vl_processor = processor
```
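The divisibility constraint described in that comment can be checked with a few lines (the `vae_scale_factor` value of 8 and the `round_down_to_multiple` helper are assumptions for this sketch; the factor of 2 is the patch size from the comment):

```python
# Illustrative check of the 2x2-patch divisibility constraint; the
# vae_scale_factor value of 8 is an assumption for this sketch.
vae_scale_factor = 8
multiple = vae_scale_factor * 2  # patching doubles the required multiple to 16

def round_down_to_multiple(x, m=multiple):
    # Largest size <= x that satisfies the divisibility requirement.
    return (x // m) * m

print(round_down_to_multiple(1023))  # 1008
print(1024 % multiple == 0)          # True
```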
(nit): we're not using vl_processor anywhere in the code.
nice catch 🤗
Thank you! That would be so great 😊
Introduces QwenImageEditPlusPipeline to support an upcoming version upgrade.
cc @yiyixuxu @sayakpaul