
"RuntimeError: Input type (c10::Half) and bias type (float) should be the same" when running examples/make_music_video.py #150

Open
philgzl opened this issue Jan 22, 2023 · 3 comments


philgzl (Contributor) commented Jan 22, 2023

When trying to generate a music video using examples/make_music_video.py locally, I get the following error:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /zhome/d6/0/134239/stable-diffusion-videos/examples/make_music_video.py:49   │
│ in <module>                                                                  │
│                                                                              │
│   46 │   1326004,                                                            │
│   47 │   5019608,                                                            │
│   48 ]                                                                       │
│ ❱ 49 pipe.walk(                                                              │
│   50 │   prompts=prompts,                                                    │
│   51 │   seeds=seeds,                                                        │
│   52 │   num_interpolation_steps=num_interpolation_steps,                    │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/stable_diffusion_videos/stable_di │
│ ffusion_pipeline.py:840 in walk                                              │
│                                                                              │
│   837 │   │   │   audio_offset = audio_start_sec + sum(num_interpolation_ste │
│   838 │   │   │   audio_duration = num_step / fps                            │
│   839 │   │   │                                                              │
│ ❱ 840 │   │   │   self.make_clip_frames(                                     │
│   841 │   │   │   │   prompt_a,                                              │
│   842 │   │   │   │   prompt_b,                                              │
│   843 │   │   │   │   seed_a,                                                │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/stable_diffusion_videos/stable_di │
│ ffusion_pipeline.py:624 in make_clip_frames                                  │
│                                                                              │
│   621 │   │                                                                  │
│   622 │   │   frame_index = skip                                             │
│   623 │   │   for _, embeds_batch, noise_batch in batch_generator:           │
│ ❱ 624 │   │   │   outputs = self(                                            │
│   625 │   │   │   │   latents=noise_batch,                                   │
│   626 │   │   │   │   text_embeddings=embeds_batch,                          │
│   627 │   │   │   │   height=height,                                         │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /torch/autograd/grad_mode.py:27 in decorate_context                          │
│                                                                              │
│    24 │   │   @functools.wraps(func)                                         │
│    25 │   │   def decorate_context(*args, **kwargs):                         │
│    26 │   │   │   with self.clone():                                         │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                           │
│    28 │   │   return cast(F, decorate_context)                               │
│    29 │                                                                      │
│    30 │   def _wrap_generator(self, func):                                   │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/stable_diffusion_videos/stable_di │
│ ffusion_pipeline.py:527 in __call__                                          │
│                                                                              │
│   524 │   │   │   │   callback(i, t, latents)                                │
│   525 │   │                                                                  │
│   526 │   │   latents = 1 / 0.18215 * latents                                │
│ ❱ 527 │   │   image = self.vae.decode(latents).sample                        │
│   528 │   │                                                                  │
│   529 │   │   image = (image / 2 + 0.5).clamp(0, 1)                          │
│   530                                                                        │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /diffusers/models/vae.py:605 in decode                                       │
│                                                                              │
│   602 │   │   │   decoded_slices = [self._decode(z_slice).sample for z_slice │
│   603 │   │   │   decoded = torch.cat(decoded_slices)                        │
│   604 │   │   else:                                                          │
│ ❱ 605 │   │   │   decoded = self._decode(z).sample                           │
│   606 │   │                                                                  │
│   607 │   │   if not return_dict:                                            │
│   608 │   │   │   return (decoded,)                                          │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /diffusers/models/vae.py:576 in _decode                                      │
│                                                                              │
│   573 │   │   return AutoencoderKLOutput(latent_dist=posterior)              │
│   574 │                                                                      │
│   575 │   def _decode(self, z: torch.FloatTensor, return_dict: bool = True)  │
│ ❱ 576 │   │   z = self.post_quant_conv(z)                                    │
│   577 │   │   dec = self.decoder(z)                                          │
│   578 │   │                                                                  │
│   579 │   │   if not return_dict:                                            │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /torch/nn/modules/module.py:1194 in _call_impl                               │
│                                                                              │
│   1191 │   │   # this function, and just call forward.                       │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._ │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                     │
│   1195 │   │   # Do not call functions when jit is used                      │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:            │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /torch/nn/modules/conv.py:463 in forward                                     │
│                                                                              │
│    460 │   │   │   │   │   │   self.padding, self.dilation, self.groups)     │
│    461 │                                                                     │
│    462 │   def forward(self, input: Tensor) -> Tensor:                       │
│ ❱  463 │   │   return self._conv_forward(input, self.weight, self.bias)      │
│    464                                                                       │
│    465 class Conv3d(_ConvNd):                                                │
│    466 │   __doc__ = r"""Applies a 3D convolution over an input signal compo │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /torch/nn/modules/conv.py:459 in _conv_forward                               │
│                                                                              │
│    456 │   │   │   return F.conv2d(F.pad(input, self._reversed_padding_repea │
│    457 │   │   │   │   │   │   │   weight, bias, self.stride,                │
│    458 │   │   │   │   │   │   │   _pair(0), self.dilation, self.groups)     │
│ ❱  459 │   │   return F.conv2d(input, weight, bias, self.stride,             │
│    460 │   │   │   │   │   │   self.padding, self.dilation, self.groups)     │
│    461 │                                                                     │
│    462 │   def forward(self, input: Tensor) -> Tensor:                       │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

However, when using the snippet in README.md, everything works fine.
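
For reference, the README-style setup loads the whole pipeline, including its default VAE, in a single dtype, so no mismatch can occur. A minimal sketch paraphrased from the README (argument values are illustrative):

    import torch
    from stable_diffusion_videos import StableDiffusionWalkPipeline

    # Everything, including the default VAE, is cast to float16 here.
    pipeline = StableDiffusionWalkPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,
    ).to("cuda")

    pipeline.walk(
        prompts=["a cat", "a dog"],
        seeds=[42, 1337],
        num_interpolation_steps=3,
    )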

Atomic-Germ (Contributor) commented:

Can you give a description of your environment? Python version, OS and version, etc.

philgzl (Contributor, Author) commented Jan 23, 2023

Python: 3.10.7
OS: Scientific Linux 7.9 (Nitrogen)

It seems the issue is caused by setting vae=AutoencoderKL.from_pretrained(f"stabilityai/sd-vae-ft-ema") when initializing the StableDiffusionWalkPipeline here: the fine-tuned VAE is loaded in float32 while the rest of the pipeline runs in half precision, hence the dtype mismatch in the error. This is not done in the code snippet in README.md, and commenting this line out fixes the issue.
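
A minimal sketch of a possible fix, assuming the rest of the pipeline is loaded with torch_dtype=torch.float16 (model IDs are illustrative): cast the fine-tuned VAE to the same dtype instead of leaving it at its default float32.

    import torch
    from diffusers.models import AutoencoderKL
    from stable_diffusion_videos import StableDiffusionWalkPipeline

    # Load the fine-tuned VAE in half precision so its weights match the fp16 pipeline.
    vae = AutoencoderKL.from_pretrained(
        "stabilityai/sd-vae-ft-ema",
        torch_dtype=torch.float16,
    )

    pipe = StableDiffusionWalkPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        vae=vae,
        torch_dtype=torch.float16,
    ).to("cuda")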

Maybe the script should be updated to use SD 2.1 as well. I might open a PR.

lingster commented:

Alternatively, you could resolve this as follows:

    revision = "fp16"
    model_path = "runwayml/stable-diffusion-v1-5"

    # Pass this as the `vae` argument when constructing your StableDiffusionWalkPipeline:
    vae = AutoencoderKL.from_pretrained(
        model_path,
        subfolder="vae",
        revision=revision,
        torch_dtype=torch_dtype,  # same dtype as the rest of the pipeline, e.g. torch.float16
    )
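
For context, a sketch of how that would plug into the pipeline construction (assuming a half-precision pipeline; torch_dtype and model IDs as above, not verified against the example script):

    import torch
    from diffusers.models import AutoencoderKL
    from stable_diffusion_videos import StableDiffusionWalkPipeline

    revision = "fp16"
    model_path = "runwayml/stable-diffusion-v1-5"
    torch_dtype = torch.float16

    pipe = StableDiffusionWalkPipeline.from_pretrained(
        model_path,
        vae=AutoencoderKL.from_pretrained(
            model_path,
            subfolder="vae",
            revision=revision,
            torch_dtype=torch_dtype,
        ),
        revision=revision,
        torch_dtype=torch_dtype,
    ).to("cuda")

Since the VAE is loaded from the same fp16 revision of the checkpoint, its weights already match the half-precision pipeline, which avoids the Half/float mismatch.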
