Wan2.2 multi-GPU inference issue #717

`VAE encoding: 100%|██████████| 1/1 [00:00<00:00,  1.12it/s]
VAE encoding: 100%|██████████| 1/1 [00:00<00:00,  1.16it/s]
VAE encoding: 100%|██████████| 1/1 [00:00<00:00,  1.15it/s]
VAE encoding: 100%|██████████| 1/1 [00:00<00:00,  1.11it/s]
  0%|          | 0/50 [00:00<?, ?it/s]
[rank1]: Traceback (most recent call last):
[rank1]:   File "/data2/users/fengxiaoyi/video_gen/Index-anisora/models/test_2_2.py", line 100, in <module>
[rank1]:     video = pipe(
[rank1]:             ^^^^^
[rank1]:   File "/data2/users/fengxiaoyi/conda_env/video/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank1]:     return func(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data2/users/fengxiaoyi/video_gen/Index-anisora/DiffSynth-Studio/diffsynth/pipelines/wan_video_new.py", line 449, in __call__
[rank1]:     noise_pred_posi = self.model_fn(**models, **inputs_shared, **inputs_posi, timestep=timestep)
[rank1]:                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data2/users/fengxiaoyi/video_gen/Index-anisora/DiffSynth-Studio/diffsynth/pipelines/wan_video_new.py", line 1101, in model_fn_wan_video
[rank1]:     x = block(x, context, t_mod, freqs)
[rank1]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data2/users/fengxiaoyi/conda_env/video/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data2/users/fengxiaoyi/conda_env/video/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data2/users/fengxiaoyi/video_gen/Index-anisora/DiffSynth-Studio/diffsynth/models/wan_video_dit.py", line 225, in forward
[rank1]:     input_x = modulate(self.norm1(x), shift_msa, scale_msa)
[rank1]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/data2/users/fengxiaoyi/video_gen/Index-anisora/DiffSynth-Studio/diffsynth/models/wan_video_dit.py", line 65, in modulate
[rank1]:     return (x * (1 + scale) + shift)
[rank1]:             ~~^~~~~~~~~~~~~
[rank1]: RuntimeError: The size of tensor a (878) must match the size of tensor b (3510) at non-singleton dimension 1
  0%|          | 0/50 [00:00<?, ?it/s]`


I am running inference with wan2.2-5B. It works on a single GPU, but multi-GPU inference fails with the error above.
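The two sizes in the error message may be related. As a rough sketch (not a confirmed diagnosis), assume 4 GPUs, inferred from the four VAE progress bars, and assume the token sequence is split evenly across ranks along dimension 1. Under those assumptions, 3510 tokens would give chunks of 878, which matches tensor a, while tensor b would still have the full, unsplit length. The helper names below are hypothetical and are not part of DiffSynth-Studio.

```python
import math


def chunk_sizes(seq_len, world_size):
    """Chunk lengths when a sequence is split into world_size pieces
    of size ceil(seq_len / world_size); the last piece may be shorter."""
    chunk = math.ceil(seq_len / world_size)
    sizes = []
    remaining = seq_len
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes


def broadcastable(a, b):
    """Two sizes along one dimension broadcast only if equal or one is 1."""
    return a == b or a == 1 or b == 1


full_len = 3510   # size of tensor b in the error message
world_size = 4    # assumption: 4 GPUs, inferred from the four VAE progress bars

sizes = chunk_sizes(full_len, world_size)
print(sizes)  # [878, 878, 878, 876]

local_len = sizes[1]  # chunk length on rank 1, matching tensor a (878)

# Per-token scale/shift left at full length cannot broadcast with the local chunk.
print(broadcastable(local_len, full_len))  # False

# A single scale/shift shared by all tokens (size 1) would broadcast fine.
print(broadcastable(local_len, 1))  # True
```

If this reading is right, the per-token `scale` and `shift` passed to `modulate` would need to be split across ranks the same way as `x` before the multiplication. Whether that is what actually happens in `wan_video_new.py` would need to be checked against the code.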
