
indices should be either on cpu or on the same device as the indexed tensor (cpu) #239

Closed
andydhancock opened this issue Aug 23, 2022 · 21 comments
Labels
bug Something isn't working

Comments

@andydhancock

Describe the bug

image_to_image.py line 92 throws the error in the title:

init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)

I've tried adding .to(self.device) to each of the three arguments.
The device should be 'cuda', though.
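
For reference, this is roughly what that attempt looked like (a sketch reconstructed from the description above; the exact edit isn't in the report, and it did not resolve the error):

# Hypothetical reconstruction of the attempted workaround:
init_latents = self.scheduler.add_noise(
    init_latents.to(self.device), noise.to(self.device), timesteps.to(self.device)
)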

Reproduction

device = "cuda"

pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)

response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')

init_image = Image.open(io.BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image = preprocess(init_image)

outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)

Logs

No response

System Info

diffusers==0.2.4
nvidia/cuda:11.3.0-cudnn8-devel-ubuntu20.04
andydhancock added the bug label on Aug 23, 2022
@patrickvonplaten
Contributor

@patil-suraj could you take a look here?

@andydhancock
Author

I downloaded the source and tracked the problem down to this line...
diffusers/schedulers/scheduling_pndm.py:254

    sqrt_alpha_prod = self.alphas_cumprod[timesteps] ** 0.5
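
For context, this failure mode is easy to reproduce in isolation: indexing a CPU tensor with CUDA indices raises exactly this error. A minimal sketch (assumes a CUDA machine; the table size is illustrative):

import torch

table = torch.rand(1000)                        # CPU tensor, like the scheduler's alphas_cumprod
idx = torch.tensor([10, 20], device="cuda")     # CUDA indices, like the pipeline's timesteps
table[idx]
# RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)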

@andydhancock
Author

Solved:
Maybe I should listen to the error message sometimes... it should all be on CPU.
init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)
becomes
init_latents = self.scheduler.add_noise(init_latents, noise, timesteps.cpu())
and all is well.

@loretoparisi

> Solved: maybe I should listen to the error message sometimes... it should all be on CPU.
> init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)
> becomes
> init_latents = self.scheduler.add_noise(init_latents, noise, timesteps.cpu())
> and all is well.

Is .to(device) not supported for timesteps then?

@andydhancock
Author

> Is .to(device) not supported for timesteps then?

I think it is supported, but 'device' was cuda and the scheduler is doing something with numpy, which runs on the CPU.
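
That matches how the scheduler is set up: its noise tables are built from numpy arrays, and tensors created from numpy start out on the CPU, so moving the pipeline with .to(device) does not move them. A rough sketch of the pattern (the schedule values here are illustrative, not the library's exact numbers):

import numpy as np
import torch

# Scheduler-style setup: tables derived from numpy live on the CPU,
# no matter where the rest of the pipeline has been moved.
betas = np.linspace(0.0001, 0.02, 1000, dtype=np.float64)
alphas_cumprod = torch.from_numpy(np.cumprod(1.0 - betas, axis=0))
print(alphas_cumprod.device)  # cpu

# Indexing this table with CUDA timesteps is what triggers the RuntimeError above.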

@patil-suraj
Contributor

patil-suraj commented Aug 26, 2022

Hi @andydhancock

This is working fine for me. Here's the code snippet I tried.

cd into examples/inference and run

import requests
from PIL import Image
from io import BytesIO

from torch import autocast


from image_to_image import *

device = "cuda"
pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)


response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image = preprocess(init_image)

prompt = "a cat, artstation"

with autocast("cuda"):
    outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)

@andydhancock
Author

> This is working fine for me. Here's the code snippet I tried.
> [same snippet as in the previous comment]

Strange that it wouldn't work on mine, then! I got it working with the timesteps.cpu() change anyway, so it's not a problem anymore.

@loretoparisi

@andydhancock thanks! On an Apple M1 it worked on CPU after removing fp16 and autocast:
🥇

import requests
from PIL import Image
from io import BytesIO

from torch import autocast

from image_to_image import *

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'running on {device}')

pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    #revision="fp16", 
    #torch_dtype=torch.float16,
    use_auth_token=True
).to(device)


response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image = preprocess(init_image)

prompt = "a cat, artstation"
samples = 1
outputs = []
if device=='cuda':
    with autocast("cuda"):
        outputs = pipei2i(prompt=[prompt]*samples, 
            init_image=init_image, 
            #strength=0.75, 
            num_inference_steps=50,
            guidance_scale=7.5)
else:
    outputs = pipei2i(prompt=[prompt]*samples,
        init_image=init_image, 
        #strength=0.75, 
        num_inference_steps=50,
        guidance_scale=7.5)

A question: the default guidance_scale is 7.5, not 0.75 as in the snippet above, right? And the same for strength, I assume.
Thank you.

@andydhancock
Author

Yeah, guidance_scale should be 7.5 and strength 0.75... that was a typo; they're variables in my actual code.

@cherrerajobs

cherrerajobs commented Sep 1, 2022

Wait, why was this closed? Shouldn't this be fixed in the source? Or is the Colab someone else's problem?

@patrickvonplaten
Contributor

Doesn't this work for you @cherrerajobs:

import torch
import requests
from PIL import Image
from io import BytesIO

from torch import autocast

from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda"
pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)


response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))

prompt = "a cat, artstation"

with autocast("cuda"):
    outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)

It should work now on master

@FahimF

FahimF commented Sep 3, 2022

Using the code from the main branch and the following script, it fails on Apple Silicon devices:

import torch
import requests
from PIL import Image
from io import BytesIO

from torch import autocast


from diffusers import StableDiffusionImg2ImgPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-4",
).to(device)


response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))

prompt = "a cat, artstation"

with autocast("cpu"):
    outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)

The error is as follows:

Traceback (most recent call last):
  File "test.py", line 24, in <module>
    outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)
  File "/Users/fahim/miniforge3/envs/ml/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/fahim/miniforge3/envs/ml/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py", line 101, in __call__
    init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)
  File "/Users/fahim/miniforge3/envs/ml/lib/python3.8/site-packages/diffusers/schedulers/scheduling_pndm.py", line 270, in add_noise
    sqrt_alpha_prod = self.alphas_cumprod[timesteps] ** 0.5
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

It appears that self.alphas_cumprod is on cpu while timesteps is on mps in scheduling_pndm.py at line 268? (I know the traceback says 270, but I made some changes to the code there to check something...)

If I change the references to timesteps to timesteps.cpu() in that method, the code works. But wouldn't it be better to put self.alphas_cumprod on whatever device is being used instead? Just asking, since I'm not sure whether that is a better idea or not...
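
For what it's worth, here is a minimal sketch of that alternative, assuming add_noise keeps its current signature; this is the general direction rather than the literal patch that was later merged:

def add_noise(self, original_samples, noise, timesteps):
    # Move the lookup table to wherever the timesteps live, instead of
    # requiring callers to move timesteps to the CPU.
    alphas_cumprod = self.alphas_cumprod.to(timesteps.device)

    sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
    sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5

    # Broadcast the per-timestep scalars over the sample dimensions.
    while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
        sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

    return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise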

@patrickvonplaten
Contributor

Interesting! I don't have a mac computer - @patil-suraj @pcuenca do you have one? Could you give it a try maybe? :-)

@FahimF

FahimF commented Sep 3, 2022

@patrickvonplaten Thanks for looking into this. I'm able to get around this currently by simply modifying code to do the following instead of the existing code at line 268 in scheduling_pndm.py:

        device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
        arr = self.alphas_cumprod.to(device)

Then I use arr instead of self.alphas_cumprod in the rest of the method. I just wasn't sure if that was the best way to go... I'm a developer, but not a Python developer 😄 So if there's anything I can do to help with troubleshooting the issue, please do let me know.

haowang1013 added a commit to directivegames/diffusers that referenced this issue Sep 4, 2022
@patrickvonplaten
Contributor

I think @pcuenca is currently working on a fix in #355 🙂

@FahimF

FahimF commented Sep 5, 2022

Thanks guys 😄 Is there anything I should have done differently about this bug report? Just asking in case I should have reported MPS issues elsewhere?

I do have two other issues (both on MPS) which are not crashes but the behaviour is different to what is expected (and what is observed on non-MPS devices). So just wondering if I should report or wait till you guys are done with the MPS changes first?

pcuenca added a commit that referenced this issue Sep 5, 2022
@pcuenca
Member

pcuenca commented Sep 5, 2022

@FahimF Thanks for reporting this!

By all means, do report any other issues you've found. If you are sure they are related to the mps device, feel free to say so in the title :)

@FahimF

FahimF commented Sep 5, 2022

@pcuenca Thanks, will do 🙂 And yes, both are mps device issues, though I don't know whether they are in the Hugging Face code or upstream in PyTorch. Just that the behaviour is distinctly different than on a non-mps device. I will open new issues for those rather than discussing them here... unless there is already an existing issue. Haven't checked yet...

pcuenca added a commit that referenced this issue Sep 8, 2022
* Initial support for mps in Stable Diffusion pipeline.

* Initial "warmup" implementation when using mps.

* Make some deterministic tests pass with mps.

* Disable training tests when using mps.

* SD: generate latents in CPU then move to device.

This is especially important when using the mps device, because
generators are not supported there. See for example
pytorch/pytorch#84288.

In addition, the other pipelines seem to use the same approach: generate
the random samples then move to the appropriate device.

After this change, generating an image in MPS produces the same result
as when using the CPU, if the same seed is used.

* Remove prints.

* Pass AutoencoderKL test_output_pretrained with mps.

Sampling from `posterior` must be done in CPU.

* Style

* Do not use torch.long for log op in mps device.

* Perform incompatible padding ops in CPU.

UNet tests now pass.
See pytorch/pytorch#84535

* Style: fix import order.

* Remove unused symbols.

* Remove MPSWarmupMixin, do not apply automatically.

We do apply warmup in the tests, but not during normal use.
This adopts some PR suggestions by @patrickvonplaten.

* Add comment for mps fallback to CPU step.

* Add README_mps.md for mps installation and use.

* Apply `black` to modified files.

* Restrict README_mps to SD, show measures in table.

* Make PNDM indexing compatible with mps.

Addresses #239.

* Do not use float64 when using LDMScheduler.

Fixes #358.

* Fix typo identified by @patil-suraj

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Adapt example to new output style.

* Restore 1:1 results reproducibility with CompVis.

However, mps latents need to be generated in CPU because generators
don't work in the mps device.

* Move PyTorch nightly to requirements.

* Adapt `test_scheduler_outputs_equivalence` to MPS.

* mps: skip training tests instead of ignoring silently.

* Make VQModel tests pass on mps.

* mps ddim tests: warmup, increase tolerance.

* ScoreSdeVeScheduler indexing made mps compatible.

* Make ldm pipeline tests pass using warmup.

* Style

* Simplify casting as suggested in PR.

* Add Known Issues to readme.

* `isort` import order.

* Remove _mps_warmup helpers from ModelMixin.

And just make changes to the tests.

* Skip tests using unittest decorator for consistency.

* Remove temporary var.

* Remove spurious blank space.

* Remove unused symbol.

* Remove README_mps.

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@FahimF

FahimF commented Sep 9, 2022

This issue appears to have cropped up again in the 0.3.0 release, in scheduling_lms_discrete.py at line 187. I believe the fix might have been made only in scheduling_pndm.py, perhaps?

Update: tagging @pcuenca since the ticket is closed and I'm not sure anybody gets notified.

@patrickvonplaten
Contributor

Should we maybe open a new issue here stating that this concerns mostly "mps"?

@pcuenca
Member

pcuenca commented Sep 13, 2022

I just opened #501. If we keep discovering mps issues we could create a label for that.

@pcuenca pcuenca closed this as completed Sep 13, 2022
PhaneeshB pushed a commit to nod-ai/diffusers that referenced this issue Mar 1, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this issue Dec 25, 2023