
"RuntimeError: Input type (c10::Half) and bias type (float) should be the same" when running examples/make_music_video.py #150

Open
philgzl opened this issue Jan 22, 2023 · 3 comments


philgzl (Contributor) commented Jan 22, 2023

When trying to generate a music video using examples/make_music_video.py locally, I get the following error:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /zhome/d6/0/134239/stable-diffusion-videos/examples/make_music_video.py:49   │
│ in <module>                                                                  │
│                                                                              │
│   46 │   1326004,                                                            │
│   47 │   5019608,                                                            │
│   48 ]                                                                       │
│ ❱ 49 pipe.walk(                                                              │
│   50 │   prompts=prompts,                                                    │
│   51 │   seeds=seeds,                                                        │
│   52 │   num_interpolation_steps=num_interpolation_steps,                    │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/stable_diffusion_videos/stable_di │
│ ffusion_pipeline.py:840 in walk                                              │
│                                                                              │
│   837 │   │   │   audio_offset = audio_start_sec + sum(num_interpolation_ste │
│   838 │   │   │   audio_duration = num_step / fps                            │
│   839 │   │   │                                                              │
│ ❱ 840 │   │   │   self.make_clip_frames(                                     │
│   841 │   │   │   │   prompt_a,                                              │
│   842 │   │   │   │   prompt_b,                                              │
│   843 │   │   │   │   seed_a,                                                │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/stable_diffusion_videos/stable_di │
│ ffusion_pipeline.py:624 in make_clip_frames                                  │
│                                                                              │
│   621 │   │                                                                  │
│   622 │   │   frame_index = skip                                             │
│   623 │   │   for _, embeds_batch, noise_batch in batch_generator:           │
│ ❱ 624 │   │   │   outputs = self(                                            │
│   625 │   │   │   │   latents=noise_batch,                                   │
│   626 │   │   │   │   text_embeddings=embeds_batch,                          │
│   627 │   │   │   │   height=height,                                         │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /torch/autograd/grad_mode.py:27 in decorate_context                          │
│                                                                              │
│    24 │   │   @functools.wraps(func)                                         │
│    25 │   │   def decorate_context(*args, **kwargs):                         │
│    26 │   │   │   with self.clone():                                         │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                           │
│    28 │   │   return cast(F, decorate_context)                               │
│    29 │                                                                      │
│    30 │   def _wrap_generator(self, func):                                   │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/stable_diffusion_videos/stable_di │
│ ffusion_pipeline.py:527 in __call__                                          │
│                                                                              │
│   524 │   │   │   │   callback(i, t, latents)                                │
│   525 │   │                                                                  │
│   526 │   │   latents = 1 / 0.18215 * latents                                │
│ ❱ 527 │   │   image = self.vae.decode(latents).sample                        │
│   528 │   │                                                                  │
│   529 │   │   image = (image / 2 + 0.5).clamp(0, 1)                          │
│   530                                                                        │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /diffusers/models/vae.py:605 in decode                                       │
│                                                                              │
│   602 │   │   │   decoded_slices = [self._decode(z_slice).sample for z_slice │
│   603 │   │   │   decoded = torch.cat(decoded_slices)                        │
│   604 │   │   else:                                                          │
│ ❱ 605 │   │   │   decoded = self._decode(z).sample                           │
│   606 │   │                                                                  │
│   607 │   │   if not return_dict:                                            │
│   608 │   │   │   return (decoded,)                                          │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /diffusers/models/vae.py:576 in _decode                                      │
│                                                                              │
│   573 │   │   return AutoencoderKLOutput(latent_dist=posterior)              │
│   574 │                                                                      │
│   575 │   def _decode(self, z: torch.FloatTensor, return_dict: bool = True)  │
│ ❱ 576 │   │   z = self.post_quant_conv(z)                                    │
│   577 │   │   dec = self.decoder(z)                                          │
│   578 │   │                                                                  │
│   579 │   │   if not return_dict:                                            │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /torch/nn/modules/module.py:1194 in _call_impl                               │
│                                                                              │
│   1191 │   │   # this function, and just call forward.                       │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._ │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                     │
│   1195 │   │   # Do not call functions when jit is used                      │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []         │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:            │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /torch/nn/modules/conv.py:463 in forward                                     │
│                                                                              │
│    460 │   │   │   │   │   │   self.padding, self.dilation, self.groups)     │
│    461 │                                                                     │
│    462 │   def forward(self, input: Tensor) -> Tensor:                       │
│ ❱  463 │   │   return self._conv_forward(input, self.weight, self.bias)      │
│    464                                                                       │
│    465 class Conv3d(_ConvNd):                                                │
│    466 │   __doc__ = r"""Applies a 3D convolution over an input signal compo │
│                                                                              │
│ /zhome/d6/0/134239/stable-diffusion-videos/venv/lib/python3.10/site-packages │
│ /torch/nn/modules/conv.py:459 in _conv_forward                               │
│                                                                              │
│    456 │   │   │   return F.conv2d(F.pad(input, self._reversed_padding_repea │
│    457 │   │   │   │   │   │   │   weight, bias, self.stride,                │
│    458 │   │   │   │   │   │   │   _pair(0), self.dilation, self.groups)     │
│ ❱  459 │   │   return F.conv2d(input, weight, bias, self.stride,             │
│    460 │   │   │   │   │   │   self.padding, self.dilation, self.groups)     │
│    461 │                                                                     │
│    462 │   def forward(self, input: Tensor) -> Tensor:                       │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

However, when using the snippet in README.md, everything works fine.
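
For reference, the README-style setup loads the whole pipeline, including its default VAE, in a single dtype, so no mismatch can occur. A minimal sketch paraphrased from the README (argument values are illustrative):

    import torch
    from stable_diffusion_videos import StableDiffusionWalkPipeline

    # Everything, including the default VAE, is cast to float16 here.
    pipeline = StableDiffusionWalkPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,
    ).to("cuda")

    pipeline.walk(
        prompts=["a cat", "a dog"],
        seeds=[42, 1337],
        num_interpolation_steps=3,
    )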

Atomic-Germ (Contributor) commented:

Can you give a description of your environment? Python version, OS and version, etc.

philgzl (Contributor, Author) commented Jan 23, 2023

Python: 3.10.7
OS: Scientific Linux 7.9 (Nitrogen)

It seems the issue is caused by setting vae=AutoencoderKL.from_pretrained(f"stabilityai/sd-vae-ft-ema") when initializing the StableDiffusionWalkPipeline here: the fine-tuned VAE is loaded in float32 while the rest of the pipeline runs in half precision, hence the dtype mismatch in the error. This is not done in the code snippet in README.md, and commenting this line out fixes the issue.
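
A minimal sketch of a possible fix, assuming the rest of the pipeline is loaded with torch_dtype=torch.float16 (model IDs are illustrative): cast the fine-tuned VAE to the same dtype instead of leaving it at its default float32.

    import torch
    from diffusers.models import AutoencoderKL
    from stable_diffusion_videos import StableDiffusionWalkPipeline

    # Load the fine-tuned VAE in half precision so its weights match the fp16 pipeline.
    vae = AutoencoderKL.from_pretrained(
        "stabilityai/sd-vae-ft-ema",
        torch_dtype=torch.float16,
    )

    pipe = StableDiffusionWalkPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        vae=vae,
        torch_dtype=torch.float16,
    ).to("cuda")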

Maybe the script should be updated to use SD 2.1 as well. I might open a PR.

lingster commented:

Alternatively, you could resolve this as follows:

    revision = "fp16"
    model_path = "runwayml/stable-diffusion-v1-5"

    # Pass this as the `vae` argument when constructing your StableDiffusionWalkPipeline:
    vae = AutoencoderKL.from_pretrained(
        model_path,
        subfolder="vae",
        revision=revision,
        torch_dtype=torch_dtype,  # same dtype as the rest of the pipeline, e.g. torch.float16
    )
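
For context, a sketch of how that would plug into the pipeline construction (assuming a half-precision pipeline; torch_dtype and model IDs as above, not verified against the example script):

    import torch
    from diffusers.models import AutoencoderKL
    from stable_diffusion_videos import StableDiffusionWalkPipeline

    revision = "fp16"
    model_path = "runwayml/stable-diffusion-v1-5"
    torch_dtype = torch.float16

    pipe = StableDiffusionWalkPipeline.from_pretrained(
        model_path,
        vae=AutoencoderKL.from_pretrained(
            model_path,
            subfolder="vae",
            revision=revision,
            torch_dtype=torch_dtype,
        ),
        revision=revision,
        torch_dtype=torch_dtype,
    ).to("cuda")

Since the VAE is loaded from the same fp16 revision of the checkpoint, its weights already match the half-precision pipeline, which avoids the Half/float mismatch.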
