Closed
Labels
bug: Something isn't working
Description
Describe the bug
Kandinsky 3.0 fails with a CUDA "out of memory" error as soon as the pipeline starts running.
Other models, such as SDXL, run without problems, including with calls like "pipe.to('cuda')", but Kandinsky 3 does not.
GPU: 1x T4 (Google Colab)
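To make the hardware description above easier to verify, the free and total GPU memory can be printed before the pipeline is loaded. This is a minimal sketch and not part of the original report: `gpu_memory_gib` is a hypothetical helper name, and it relies only on `torch.cuda.is_available()` and `torch.cuda.mem_get_info()`.

```python
def gpu_memory_gib():
    """Hypothetical helper: return (free, total) GPU memory in GiB.

    Returns None when PyTorch is not installed or no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info() reports (free_bytes, total_bytes) for the current device.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    gib = 1024 ** 3
    return free_bytes / gib, total_bytes / gib


memory = gpu_memory_gib()
if memory is None:
    print("No CUDA device (or PyTorch) available")
else:
    print(f"free: {memory[0]:.2f} GiB, total: {memory[1]:.2f} GiB")
```

Running this in a fresh Colab session, before any model is loaded, would show how much of the T4's memory is actually available to the pipeline.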
Reproduction
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()
prompt = "Any prompt"
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]  # <-- OutOfMemoryError is raised here
image.save('1.png')
Logs
Loading pipeline components...: 100%
5/5 [00:02<00:00, 1.75it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%
5/5 [00:01<00:00, 3.13it/s]
---------------------------------------------------------------------------
OutOfMemoryError Traceback (most recent call last)
<ipython-input-1-7c6f4c265399> in <cell line: 10>()
8
9 generator = torch.Generator(device="cpu").manual_seed(0)
---> 10 image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
18 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in convert(t)
1156 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1157 non_blocking, memory_format=convert_to_format)
-> 1158 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
1159
1160 return self._apply(convert)
OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 24.81 MiB is free. Process 79636 has 14.72 GiB memory in use. Of the allocated memory 14.62 GiB is allocated by PyTorch, and 1.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
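The figures in the error message can be checked with a little arithmetic. The sketch below is not part of the original report: `parse_oom_message` is a hypothetical helper that extracts the sizes from the message text above with regular expressions and reports how far short the request fell. The message string is copied from the log (including PyTorch's own spelling "capacty").

```python
import re

# Unit multipliers, expressed in MiB.
_UNITS_IN_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Copied from the log above, truncated after the figures of interest.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 32.00 MiB. "
    "GPU 0 has a total capacty of 14.75 GiB of which 24.81 MiB is free. "
    "Process 79636 has 14.72 GiB memory in use."
)


def _to_mib(value, unit):
    """Convert a (number, unit) pair from the message into MiB."""
    return float(value) * _UNITS_IN_MIB[unit]


def parse_oom_message(message):
    """Hypothetical helper: pull the requested, total, and free sizes out of the message."""
    size = r"([\d.]+) (KiB|MiB|GiB)"
    requested = re.search(r"Tried to allocate " + size, message)
    total = re.search(r"total capac\w* of " + size, message)
    free = re.search(r"of which " + size + r" is free", message)
    if not (requested and total and free):
        raise ValueError("message does not look like a CUDA OOM report")

    requested_mib = _to_mib(*requested.groups())
    total_mib = _to_mib(*total.groups())
    free_mib = _to_mib(*free.groups())
    return {
        "requested_mib": requested_mib,
        "total_mib": total_mib,
        "free_mib": free_mib,
        # How much more memory the failed allocation needed.
        "shortfall_mib": requested_mib - free_mib,
        # Share of the card already occupied when the request was made.
        "used_fraction": 1 - free_mib / total_mib,
    }


info = parse_oom_message(OOM_MESSAGE)
print(f"shortfall: {info['shortfall_mib']:.2f} MiB, "
      f"GPU already {info['used_fraction']:.2%} full")
```

On these numbers the card is more than 99.8% full before the 32 MiB request is made, and the request misses by only about 7 MiB. Together with the traceback ending in `module.py` `convert(t)`, this suggests the memory is exhausted while weights are being moved onto the GPU rather than by one large allocation during sampling.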
System Info
- diffusers version: 0.25.0.dev0
- Platform: Linux-5.15.120+-x86_64-with-glibc2.35 (Google Colab)
- Python version: 3.10.12
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Huggingface_hub version: 0.19.4
- Transformers version: 4.35.2
- Accelerate version: 0.25.0
- xFormers version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?