LoRA error when running train_text_to_image_lora.py: "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" #5897

@MhDang

Description

Describe the bug

I tried to experiment with LoRA training following examples/text_to_image/README.md#training-with-lora.

However, I got the error RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm) at line 801 of the script.

The same issue did not occur when I tried the same example (with the implementation at that time) months ago. I noticed there have been several commits since then.

I followed the README.md to install the packages, and the non-LoRA training script works fine.

Thank you very much!
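
For context, this is the generic error PyTorch raises whenever a matmul mixes tensors on different devices. A minimal standalone sketch (not the training script itself, just the same failure mode) that reproduces the identical message:

import torch
import torch.nn as nn

# A Linear layer whose parameters were left on the CPU receives a CUDA
# input, so F.linear sees weights on cpu and activations on cuda:0,
# the same mismatch the traceback below reports for the LoRA layers.
linear = nn.Linear(320, 320)                # parameters stay on CPU
x = torch.randn(2, 77, 320, device="cuda")  # activations on the GPU
linear(x)  # RuntimeError: Expected all tensors to be on the same device ...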

Reproduction

  1. Install the packages following README.md:
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Then cd into the folder examples/text_to_image and run

pip install -r requirements.txt

  2. In the directory examples/text_to_image, run the following:
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora" \
  --validation_prompt="cute dragon creature" --report_to="wandb"
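
Before launching, a quick sanity check (a minimal sketch, independent of the training script) to confirm that PyTorch sees the GPU and to print the installed diffusers version, which helps when bisecting a regression introduced by a recent commit:

import torch
import diffusers

# Confirm the GPU is visible to PyTorch and print the installed
# diffusers version before launching the training run.
print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())
print(diffusers.__version__)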

Logs

11/22/2023 08:36:20 - INFO - __main__ - ***** Running training *****
11/22/2023 08:36:20 - INFO - __main__ -   Num examples = 833
11/22/2023 08:36:20 - INFO - __main__ -   Num Epochs = 100
11/22/2023 08:36:20 - INFO - __main__ -   Instantaneous batch size per device = 1
11/22/2023 08:36:20 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
11/22/2023 08:36:20 - INFO - __main__ -   Gradient Accumulation steps = 1
11/22/2023 08:36:20 - INFO - __main__ -   Total optimization steps = 83300
Steps:   0%|                                                                                                                                | 0/83300 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./repo/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 975, in <module>
    main()
  File "./repo/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 801, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 1075, in forward
    sample, res_samples = downsample_block(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/unet_2d_blocks.py", line 1160, in forward
    hidden_states = attn(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/transformer_2d.py", line 375, in forward
    hidden_states = block(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention.py", line 258, in forward
    attn_output = self.attn1(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 1211, in __call__
    query = attn.to_q(hidden_states, *args)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/lora.py", line 433, in forward
    out = super().forward(hidden_states) + (scale * self.lora_layer(hidden_states))
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/lora.py", line 220, in forward
    down_hidden_states = self.down(hidden_states.to(dtype))
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
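
The last frames show where the mismatch happens: the LoRA down-projection (self.down in diffusers/models/lora.py) runs F.linear with weights still on the CPU while the activations arrive on cuda:0, which suggests the freshly attached LoRA layers were never moved to the accelerator device. A minimal diagnostic sketch (an assumption about placement: insert it just before the training loop, where unet is the model the script builds) to list any parameters left on CPU:

# Hypothetical diagnostic, placed just before the training loop in
# train_text_to_image_lora.py; `unet` is the model the script builds.
cpu_params = [name for name, p in unet.named_parameters() if p.device.type != "cuda"]
print(f"{len(cpu_params)} parameters on CPU")
for name in cpu_params[:10]:
    print(name)  # the LoRA projection names should show up here

If only the LoRA parameters are listed, one possible workaround (an assumption, not a confirmed fix) is to repeat the device move after the LoRA layers are attached, e.g. unet.to(accelerator.device), so the new parameters follow the base weights onto the GPU.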

System Info

  • diffusers version: 0.24.0.dev0
  • Platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.31
  • Python version: 3.9.18
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Huggingface_hub version: 0.19.4
  • Transformers version: 4.35.2
  • Accelerate version: 0.24.1
  • xFormers version: not installed
  • Using GPU in script?: yes (cuda:0, per the traceback)
  • Using distributed or parallel set-up in script?: no (single device; total train batch size equals the per-device batch size)

Who can help?

@sayakpaul @patrickvonplaten

Labels

bug (Something isn't working), stale (Issues that haven't received updates)
