
indices should be either on cpu or on the same device as the indexed tensor (cpu) #239

Closed
andydhancock opened this issue Aug 23, 2022 · 21 comments
Labels
bug Something isn't working

Comments

@andydhancock

Describe the bug

image_to_image.py line 92 throws the error in the title:

init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)

I've tried adding .to(self.device) to each of the three arguments.
The device should be 'cuda', though.
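
For reference, this is roughly what that attempt looked like (a sketch reconstructed from the description above; the exact edit isn't in the report, and it did not resolve the error):

# Hypothetical reconstruction of the attempted workaround:
init_latents = self.scheduler.add_noise(
    init_latents.to(self.device), noise.to(self.device), timesteps.to(self.device)
)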

Reproduction

device = "cuda"

pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)

response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')

init_image = Image.open(io.BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image = preprocess(init_image)

outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)

Logs

No response

System Info

diffusers==0.2.4
nvidia/cuda:11.3.0-cudnn8-devel-ubuntu20.04
andydhancock added the bug label on Aug 23, 2022
@patrickvonplaten
Contributor

@patil-suraj could you take a look here?

@andydhancock
Author

I downloaded the source and tracked the problem down to this line...
diffusers/schedulers/scheduling_pndm.py:254

    sqrt_alpha_prod = self.alphas_cumprod[timesteps] ** 0.5
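
For context, this failure mode is easy to reproduce in isolation: indexing a CPU tensor with CUDA indices raises exactly this error. A minimal sketch (assumes a CUDA machine; the table size is illustrative):

import torch

table = torch.rand(1000)                        # CPU tensor, like the scheduler's alphas_cumprod
idx = torch.tensor([10, 20], device="cuda")     # CUDA indices, like the pipeline's timesteps
table[idx]
# RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)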

@andydhancock
Author

Solved:
Maybe I should listen to the error message sometimes... it should all be on CPU.
init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)
becomes
init_latents = self.scheduler.add_noise(init_latents, noise, timesteps.cpu())
and all is well.

@loretoparisi

> Solved: maybe I should listen to the error message sometimes... it should all be on CPU.
> init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)
> becomes
> init_latents = self.scheduler.add_noise(init_latents, noise, timesteps.cpu())
> and all is well.

Is .to(device) not supported for timesteps then?

@andydhancock
Author

> Is .to(device) not supported for timesteps then?

I think it is supported, but 'device' was cuda and the scheduler is doing something with numpy, which runs on the CPU.
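
That matches how the scheduler is set up: its noise tables are built from numpy arrays, and tensors created from numpy start out on the CPU, so moving the pipeline with .to(device) does not move them. A rough sketch of the pattern (the schedule values here are illustrative, not the library's exact numbers):

import numpy as np
import torch

# Scheduler-style setup: tables derived from numpy live on the CPU,
# no matter where the rest of the pipeline has been moved.
betas = np.linspace(0.0001, 0.02, 1000, dtype=np.float64)
alphas_cumprod = torch.from_numpy(np.cumprod(1.0 - betas, axis=0))
print(alphas_cumprod.device)  # cpu

# Indexing this table with CUDA timesteps is what triggers the RuntimeError above.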

@patil-suraj
Contributor

patil-suraj commented Aug 26, 2022

Hi @andydhancock

This is working fine for me. Here's the code snippet I tried.

cd into examples/inference and run

import requests
from PIL import Image
from io import BytesIO

from torch import autocast


from image_to_image import *

device = "cuda"
pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)


response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image = preprocess(init_image)

prompt = "a cat, artstation"

with autocast("cuda"):
    outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)

@andydhancock
Author

> This is working fine for me. Here's the code snippet I tried.
> [same snippet as in the previous comment]

Strange that it wouldn't work on mine, then! I got it working with the timesteps.cpu() change anyway, so it's not a problem anymore.

@loretoparisi

@andydhancock thanks! On an Apple M1 it worked on CPU after removing fp16 and autocast:
🥇

import requests
from PIL import Image
from io import BytesIO

from torch import autocast

from image_to_image import *

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'running on {device}')

pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    #revision="fp16", 
    #torch_dtype=torch.float16,
    use_auth_token=True
).to(device)


response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
init_image = preprocess(init_image)

prompt = "a cat, artstation"
samples = 1
outputs = []
if device=='cuda':
    with autocast("cuda"):
        outputs = pipei2i(prompt=[prompt]*samples, 
            init_image=init_image, 
            #strength=0.75, 
            num_inference_steps=50,
            guidance_scale=7.5)
else:
    outputs = pipei2i(prompt=[prompt]*samples,
        init_image=init_image, 
        #strength=0.75, 
        num_inference_steps=50,
        guidance_scale=7.5)

A question: the default guidance_scale is 7.5, not 0.75 as in the snippet above, right? And the same for strength, I assume.
Thank you.

@andydhancock
Author

Yeah, guidance_scale should be 7.5 and strength 0.75... that was a typo; they're variables in my actual code.

@cherrerajobs

cherrerajobs commented Sep 1, 2022

Wait, why was this closed? Shouldn't this be fixed in the source? Or is the Colab someone else's problem?

@patrickvonplaten
Contributor

Doesn't this work for you @cherrerajobs:

import torch
import requests
from PIL import Image
from io import BytesIO

from torch import autocast

from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda"
pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
).to(device)


response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))

prompt = "a cat, artstation"

with autocast("cuda"):
    outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)

It should work now on master

@FahimF

FahimF commented Sep 3, 2022

Using the code from the main branch and the following script, it fails on Apple Silicon devices:

import torch
import requests
from PIL import Image
from io import BytesIO

from torch import autocast


from diffusers import StableDiffusionImg2ImgPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
pipei2i = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-4",
).to(device)


response = requests.get('https://pbs.twimg.com/media/Fa1_7_vWYAEwfX-.png')
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))

prompt = "a cat, artstation"

with autocast("cpu"):
    outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)

The error is as follows:

Traceback (most recent call last):
  File "test.py", line 24, in <module>
    outputs = pipei2i(prompt=prompt, init_image=init_image, strength=0.75, num_inference_steps=75,guidance_scale=0.75)
  File "/Users/fahim/miniforge3/envs/ml/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/fahim/miniforge3/envs/ml/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py", line 101, in __call__
    init_latents = self.scheduler.add_noise(init_latents, noise, timesteps)
  File "/Users/fahim/miniforge3/envs/ml/lib/python3.8/site-packages/diffusers/schedulers/scheduling_pndm.py", line 270, in add_noise
    sqrt_alpha_prod = self.alphas_cumprod[timesteps] ** 0.5
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

It appears that self.alphas_cumprod is on cpu while timesteps is on mps in scheduling_pndm.py at line 268? (I know the traceback says 270, but I made some changes to the code there to check something...)

If I change the references to timesteps to timesteps.cpu() in that method, the code works. But wouldn't it be better to put self.alphas_cumprod on whatever device is being used instead? Just asking, since I'm not sure whether that is a better idea or not...
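
For what it's worth, here is a minimal sketch of that alternative, assuming add_noise keeps its current signature; this is the general direction rather than the literal patch that was later merged:

def add_noise(self, original_samples, noise, timesteps):
    # Move the lookup table to wherever the timesteps live, instead of
    # requiring callers to move timesteps to the CPU.
    alphas_cumprod = self.alphas_cumprod.to(timesteps.device)

    sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
    sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5

    # Broadcast the per-timestep scalars over the sample dimensions.
    while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
        sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

    return sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise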

@patrickvonplaten
Contributor

Interesting! I don't have a mac computer - @patil-suraj @pcuenca do you have one? Could you give it a try maybe? :-)

@FahimF

FahimF commented Sep 3, 2022

@patrickvonplaten Thanks for looking into this. I'm able to get around this currently by simply modifying code to do the following instead of the existing code at line 268 in scheduling_pndm.py:

        device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
        arr = self.alphas_cumprod.to(device)

Then I use arr instead of self.alphas_cumprod in the rest of the method. I just wasn't sure if that was the best way to go... I'm a developer, but not a Python developer 😄 So if there's anything I can do to help with troubleshooting the issue, please do let me know.

haowang1013 added a commit to directivegames/diffusers that referenced this issue Sep 4, 2022
@patrickvonplaten
Contributor

I think @pcuenca is currently working on a fix in #355 🙂

@FahimF

FahimF commented Sep 5, 2022

Thanks guys 😄 Is there anything I should have done differently about this bug report? Just asking in case I should have reported MPS issues elsewhere?

I do have two other issues (both on MPS) which are not crashes but the behaviour is different to what is expected (and what is observed on non-MPS devices). So just wondering if I should report or wait till you guys are done with the MPS changes first?

pcuenca added a commit that referenced this issue Sep 5, 2022
@pcuenca
Member

pcuenca commented Sep 5, 2022

@FahimF Thanks for reporting this!

By all means, do report any other issues you've found. If you are sure they are related to the mps device, feel free to say so in the title :)

@FahimF

FahimF commented Sep 5, 2022

@pcuenca Thanks, will do 🙂 And yes, both are mps device issues, though I don't know whether they are in the Hugging Face code or upstream in PyTorch. Just that the behaviour is distinctly different than on a non-mps device. I will open new issues for those rather than discussing them here... unless there is already an existing issue. Haven't checked yet...

pcuenca added a commit that referenced this issue Sep 8, 2022
* Initial support for mps in Stable Diffusion pipeline.

* Initial "warmup" implementation when using mps.

* Make some deterministic tests pass with mps.

* Disable training tests when using mps.

* SD: generate latents in CPU then move to device.

This is especially important when using the mps device, because
generators are not supported there. See for example
pytorch/pytorch#84288.

In addition, the other pipelines seem to use the same approach: generate
the random samples then move to the appropriate device.

After this change, generating an image in MPS produces the same result
as when using the CPU, if the same seed is used.

* Remove prints.

* Pass AutoencoderKL test_output_pretrained with mps.

Sampling from `posterior` must be done in CPU.

* Style

* Do not use torch.long for log op in mps device.

* Perform incompatible padding ops in CPU.

UNet tests now pass.
See pytorch/pytorch#84535

* Style: fix import order.

* Remove unused symbols.

* Remove MPSWarmupMixin, do not apply automatically.

We do apply warmup in the tests, but not during normal use.
This adopts some PR suggestions by @patrickvonplaten.

* Add comment for mps fallback to CPU step.

* Add README_mps.md for mps installation and use.

* Apply `black` to modified files.

* Restrict README_mps to SD, show measures in table.

* Make PNDM indexing compatible with mps.

Addresses #239.

* Do not use float64 when using LDMScheduler.

Fixes #358.

* Fix typo identified by @patil-suraj

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Adapt example to new output style.

* Restore 1:1 results reproducibility with CompVis.

However, mps latents need to be generated in CPU because generators
don't work in the mps device.

* Move PyTorch nightly to requirements.

* Adapt `test_scheduler_outputs_equivalence` to MPS.

* mps: skip training tests instead of ignoring silently.

* Make VQModel tests pass on mps.

* mps ddim tests: warmup, increase tolerance.

* ScoreSdeVeScheduler indexing made mps compatible.

* Make ldm pipeline tests pass using warmup.

* Style

* Simplify casting as suggested in PR.

* Add Known Issues to readme.

* `isort` import order.

* Remove _mps_warmup helpers from ModelMixin.

And just make changes to the tests.

* Skip tests using unittest decorator for consistency.

* Remove temporary var.

* Remove spurious blank space.

* Remove unused symbol.

* Remove README_mps.

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@FahimF

FahimF commented Sep 9, 2022

This issue appears to have cropped up again in the 0.3.0 release, in scheduling_lms_discrete.py at line 187. I believe the fix might have been made only in scheduling_pndm.py, perhaps?

Update: tagging @pcuenca since the ticket is closed and I'm not sure anybody gets notified.

@patrickvonplaten
Contributor

Should we maybe open a new issue here stating that this concerns mostly "mps"?

@pcuenca
Member

pcuenca commented Sep 13, 2022

I just opened #501. If we keep discovering mps issues we could create a label for that.

@pcuenca pcuenca closed this as completed Sep 13, 2022
PhaneeshB pushed a commit to nod-ai/diffusers that referenced this issue Mar 1, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this issue Dec 25, 2023