
Wan I2V support #788

Merged
quic-rishinr merged 7 commits into quic:main from tv-karthikeya:wan_i2v_rebase on Mar 25, 2026

Conversation

@tv-karthikeya
Contributor

@tv-karthikeya tv-karthikeya commented Feb 10, 2026

Support for the Wan image-to-video model.
Model card: "Wan-AI/Wan2.2-I2V-A14B-Diffusers"

@tv-karthikeya force-pushed the wan_i2v_rebase branch 2 times, most recently from 5981470 to f8b77d5 on February 16, 2026 07:39
@tv-karthikeya marked this pull request as ready for review on February 16, 2026 08:01
@tv-karthikeya marked this pull request as a draft on February 16, 2026 11:23
@tv-karthikeya force-pushed the wan_i2v_rebase branch 4 times, most recently from 86953b9 to 48a2afa on March 9, 2026 10:33
@tv-karthikeya marked this pull request as ready for review on March 9, 2026 10:34
@tv-karthikeya
Contributor Author

tv-karthikeya commented Mar 10, 2026

Results for Wan-AI/Wan2.2-I2V-A14B-Diffusers at 180p
https://github.com/user-attachments/assets/2b2f4f3e-a0f0-463e-a742-f9dfb6f2a195

QEfficient Diffusers Pipeline Inference Report
============================================================

Module-wise Inference Times:
------------------------------------------------------------
  Vae Encoder               89.2636 s
  Transformer               37.0940 s
    - Total steps: 4
    - Average per step:    9.2735 s
    - Min step time:       9.2583 s
    - Max step time:       9.2984 s
  Vae Decoder               5.8202 s
------------------------------------------------------------

End-to-End Inference Time: 132.1779 s
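A report like the one above can be produced with a simple per-module timer. The sketch below is illustrative only (the `ModuleTimer` helper and module names are hypothetical, not QEfficient's actual reporting code):

```python
import time
from contextlib import contextmanager


# Hypothetical timing helper; QEfficient's real report code may differ.
class ModuleTimer:
    def __init__(self):
        self.times = {}

    @contextmanager
    def track(self, name):
        # Accumulate wall-clock time under the given module name.
        start = time.perf_counter()
        yield
        self.times[name] = self.times.get(name, 0.0) + time.perf_counter() - start

    def report(self, step_times=None):
        lines = ["Module-wise Inference Times:", "-" * 60]
        for name, t in self.times.items():
            lines.append(f"  {name:<25} {t:.4f} s")
        if step_times:
            lines.append(f"    - Total steps: {len(step_times)}")
            lines.append(f"    - Average per step:    {sum(step_times) / len(step_times):.4f} s")
        lines.append("-" * 60)
        lines.append(f"End-to-End Inference Time: {sum(self.times.values()):.4f} s")
        return "\n".join(lines)


timer = ModuleTimer()
with timer.track("Vae Encoder"):
    pass  # run the VAE encoder here
print(timer.report())
```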

# - When you need to skip video generation and only prepare the model
#
# NOTE-1: If compile_config is not specified, the default configuration from
# QEfficient/diffusers/pipelines/wan/wan_i2v_config.json will be used
Contributor
The path seems incorrect; please fix it and cross-check the whole statement.

latents = (1 - first_frame_mask) * condition + first_frame_mask * latents
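The blend above keeps the conditioning image fixed where the mask is zero while the remaining latent frames evolve. A minimal tensor sketch of that masking pattern (shapes and variable names are illustrative, not the pipeline's actual ones):

```python
import torch

# Illustrative latent shape: (batch, channels, frames, height, width)
latents = torch.randn(1, 4, 8, 16, 16)
condition = torch.randn(1, 4, 8, 16, 16)

# Mask is 0 for the first (conditioned) frame and 1 elsewhere, so the
# first frame comes from `condition` and the rest stay as `latents`.
first_frame_mask = torch.ones(1, 1, 8, 1, 1)
first_frame_mask[:, :, 0] = 0.0

blended = (1 - first_frame_mask) * condition + first_frame_mask * latents

# First frame equals the condition; later frames are untouched latents.
assert torch.equal(blended[:, :, 0], condition[:, :, 0])
assert torch.equal(blended[:, :, 1:], latents[:, :, 1:])
```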

# Step 9: Decode latents to video
if not output_type == "latent":
Contributor

I feel the overall call could be cleaned up by removing code that is not relevant in our case. For example, this if/else condition can be removed.

self.model = model

# To have different hashing for encoder/decoder
self.model.config["type"] = type
Contributor

Why was this removed? With that flag we can use this class for both the encoder and the decoder.

Contributor Author

As of now, the config gets overwritten when both the VAE encoder and decoder are present, so this line was removed.
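The overwrite described here happens when the encoder and decoder wrappers mutate a shared config object. A hedged sketch of the issue and a copy-based fix (the class and field names are illustrative, not QEfficient's actual code):

```python
from copy import deepcopy

# One config object shared by both wrappers, as in the reported issue.
shared_config = {"in_channels": 4}


class VaeWrapper:
    def __init__(self, config, kind):
        # Copying first prevents the encoder and decoder wrappers from
        # clobbering each other's "type" flag on the shared config dict.
        self.config = deepcopy(config)
        self.config["type"] = kind  # distinct hash input per wrapper


encoder = VaeWrapper(shared_config, "encoder")
decoder = VaeWrapper(shared_config, "decoder")

# Each wrapper keeps its own flag, so encoder/decoder hash differently.
assert encoder.config["type"] == "encoder"
assert decoder.config["type"] == "decoder"
```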

Contributor

@quic-amitraj left a comment

LGTM

@quic-rishinr
Contributor

Please update the documentation and add the model to the validated list.

@@ -0,0 +1,655 @@
# -----------------------------------------------------------------------------
#
Contributor

Both tests are taking around 37 minutes. Please optimize them.

1154.56s call tests/diffusers/test_wan_i2v.py::test_wan_i2v_pipeline
1075.03s setup tests/diffusers/test_wan_i2v.py::test_wan_i2v_pipeline

Contributor Author

Updated the test with lower dims; it now takes 12 minutes. Will work on reducing it further in the next PR.

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
 - Updated project file to include diffusers config JSONs and YAML files
 - Refactored and cleaned up Wan I2V

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
@quic-rishinr quic-rishinr merged commit ea23498 into quic:main Mar 25, 2026
5 checks passed
quic-abhamidi pushed a commit to quic-abhamidi/efficient-transformers that referenced this pull request Mar 25, 2026
Support for Wan Image to video model
Model card: "Wan-AI/Wan2.2-I2V-A14B-Diffusers"

---------

Signed-off-by: vtirumal <vtirumal@qti.qualcomm.com>
quic-akuruvil pushed a commit to quic-akuruvil/efficient_transformers that referenced this pull request Mar 25, 2026
tv-karthikeya added a commit to tv-karthikeya/efficient-transformers that referenced this pull request Mar 25, 2026


3 participants