
Kandinsky 3.0 "CUDA Out of memory" error #6028

@Atm4x

Description

Describe the bug

Kandinsky 3.0 fails with a CUDA "out of memory" error as soon as the pipeline starts running.

Other models such as SDXL work without problems on the same setup, including calls like pipe.to('cuda'), but Kandinsky 3 runs out of memory.

GPU: 1x T4 GPU (Google colab)

Reproduction

from diffusers import AutoPipelineForText2Image
import torch

# Load the fp16 variant and enable per-model CPU offload to reduce VRAM usage
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "Any prompt"

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]  # <-- OOM is raised here

image.save('1.png')
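
For what it's worth, here is a lower-memory variant that may fit on a 16 GB T4 (a sketch, not a confirmed fix): enable_sequential_cpu_offload() offloads individual submodules instead of whole models, which is much slower but has a far smaller peak VRAM footprint.

from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
# Sketch: sequential offload keeps only the currently executing submodule
# on the GPU, trading inference speed for peak memory
pipe.enable_sequential_cpu_offload()

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe("Any prompt", num_inference_steps=25, generator=generator).images[0]
image.save('1.png')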

Logs

Loading pipeline components...: 100% 5/5 [00:02<00:00, 1.75it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100% 5/5 [00:01<00:00, 3.13it/s]
---------------------------------------------------------------------------
OutOfMemoryError                          Traceback (most recent call last)
<ipython-input-1-7c6f4c265399> in <cell line: 10>()
      8 
      9 generator = torch.Generator(device="cpu").manual_seed(0)
---> 10 image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]

18 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in convert(t)
   1156                 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
   1157                             non_blocking, memory_format=convert_to_format)
-> 1158             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
   1159 
   1160         return self._apply(convert)

OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 24.81 MiB is free. Process 79636 has 14.72 GiB memory in use. Of the allocated memory 14.62 GiB is allocated by PyTorch, and 1.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
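
As the message itself suggests, allocator fragmentation can sometimes be reduced via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch follows (the variable must be set before the first CUDA allocation); note that with ~14.6 GiB genuinely allocated by the model, this alone is unlikely to be enough here:

import os
# Must be set before torch initializes the CUDA allocator
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported afterwards so the allocator picks up the setting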

System Info

  • diffusers version: 0.25.0.dev0
  • Platform: Linux-5.15.120+-x86_64-with-glibc2.35 (Google Colab)
  • Python version: 3.10.12
  • PyTorch version (GPU?): 2.1.0+cu121 (True)
  • Huggingface_hub version: 0.19.4
  • Transformers version: 4.35.2
  • Accelerate version: 0.25.0
  • xFormers version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@yiyixuxu @patrickvonplaten
