[StableDiffusionPipeline] Image height and width both divisible by 8, throws error in unet_blocks

### Describe the bug

Dear diffusers team, 

I encountered a bug: when the image width and height are both divisible by 8 but not by 64 (here `width=512`, `height=520`), calling the pipeline raises the following runtime error:  
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 18 but got size 17 for tensor number 1 in the list.

This doesn't happen when the image width and height are both divisible by 64 (at least for the few sizes my graphics card can handle).
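
A rough sketch of where the 18 vs. 17 might come from. This is only my guess at the arithmetic, not something I checked against the diffusers source: I assume the latent is 1/8 of the pixel size, that the UNet downsamples three times with stride-2 convolutions (kernel 3, padding 1), and that each upsampling step doubles the size. The names `conv_out` and `trace_unet_sizes` are made up for this illustration.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a strided convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_unet_sizes(pixels, vae_factor=8, num_downsamples=3):
    """Trace one spatial axis through an assumed UNet layout.

    Returns (skip_sizes, mismatches), where each mismatch is a pair
    (upsampled_size, skip_size) that could not be concatenated.
    """
    size = pixels // vae_factor  # assumed latent size
    skips = [size]
    for _ in range(num_downsamples):
        size = conv_out(size)
        skips.append(size)

    mismatches = []
    # Walk back up: each upsampling doubles the size, and the result
    # must match the skip connection saved on the way down.
    for skip in reversed(skips[:-1]):
        size *= 2
        if size != skip:
            mismatches.append((size, skip))
            size = skip  # keep tracing as if the shapes had matched
    return skips, mismatches


for pixels in (512, 520):
    skips, mismatches = trace_unet_sizes(pixels)
    print(pixels, skips, mismatches)
```

Under these assumptions, 520 pixels gives a latent of 65, which shrinks to 33, 17, and 9. Doubling 9 gives 18, which no longer matches the saved 17, and that is the pair in the error message. With 512 the sizes are 64, 32, 16, 8 and everything lines up again on the way back.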

I don't think this is a huge issue; I just found it interesting.

Best, Sebastian

### Reproduction

``` python
from diffusers import StableDiffusionPipeline
import torch
from torch import autocast

# Load the fp16 weights of Stable Diffusion v1.4
model = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token="",  # TODO enter token here
)
model.to("cuda")

with autocast("cuda"):
    # height=520 is divisible by 8 but not by 64; this call raises the error
    image = model("astronaut on mars", width=512, height=520)["sample"]
```
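
A possible workaround, assuming the multiple-of-64 observation above holds in general (I only tried a few sizes): round the requested dimensions down to a multiple of 64 before calling the pipeline. `snap_to_multiple` is a hypothetical helper, not part of diffusers.

```python
def snap_to_multiple(value, multiple=64):
    """Round value down to a multiple of `multiple`, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


# Requested size from the reproduction above
width = snap_to_multiple(512)
height = snap_to_multiple(520)
print(width, height)  # 512 512
```

The snapped `width` and `height` can then be passed to the pipeline call in place of the original values.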

### Logs

```shell
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "C:/Users/Watson/Desktop/gradio_diffusers/playground.py", line 12, in <module>
    image = model("astronaut on mars", width=512, height=520)["sample"]
  File "C:\Users\Watson\miniconda3\envs\diffuser\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Watson\miniconda3\envs\diffuser\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 137, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings)["sample"]
  File "C:\Users\Watson\miniconda3\envs\diffuser\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Watson\miniconda3\envs\diffuser\lib\site-packages\diffusers\models\unet_2d_condition.py", line 168, in forward
    sample = upsample_block(
  File "C:\Users\Watson\miniconda3\envs\diffuser\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Watson\miniconda3\envs\diffuser\lib\site-packages\diffusers\models\unet_blocks.py", line 1034, in forward
    hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 18 but got size 17 for tensor number 1 in the list.

Process finished with exit code 1
```


### System Info

```shell
diffusers==0.2.4
torch: conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

OS: Windows 11
```

