torch.compile error with dynamic=True: Found &lt;class 'sympy.core.relational.Unequality'&gt;, which is not a supported top level IR node #103587
Comments
I'll take a closer look in a bit, but try this patch:
Also, as an alternative, try instead setting this config:
which will enable dynamic shapes in a less aggressive way (we will be turning this on by default soon).
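The config snippet itself did not survive the copy; judging from the flags the reporter applies later in this thread, it was presumably `automatic_dynamic_shapes`. A hedged sketch of that setup (`my_model` is a placeholder name, not from the thread):

```python
import torch
import torch._dynamo

# Assume static shapes on the first compile, then automatically
# mark dimensions dynamic on recompile, instead of forcing
# everything dynamic up front with dynamic=True
torch._dynamo.config.automatic_dynamic_shapes = True
torch._dynamo.config.assume_static_by_default = True

def my_model(x):  # placeholder module for illustration
    return x * 2

compiled = torch.compile(my_model)
```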
Hi, I saw that. Now I'm using it. And this is the content:

```python
from ctypes import c_void_p, c_long
import torch
import math
import random
import os
import tempfile
from math import inf, nan
from torch._inductor.hooks import run_intermediate_hooks
from torch._inductor.utils import maybe_profile
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile
from torch._inductor.select_algorithm import extern_kernels

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()
async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1, arg2_1, arg3_1 = args
    args.clear()
    s0 = arg0_1
    s1 = arg1_1
    s2 = arg2_1
    assert_size_stride(arg3_1, (s0, s1, s2, s2), (s1*(s2*s2), s2*s2, s2, 1))
    return (Ne(Mod(s2, 8), 0), Ne(Mod(s2, 8), 0), )


def benchmark_compiled_module(times=10, repeat=10):
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = 2
    arg1_1 = 4
    arg2_1 = 64
    arg3_1 = rand_strided((2, 4, 64, 64), (16384, 4096, 64, 1), device='cuda:0', dtype=torch.float16)
    return print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1]), times=times, repeat=repeat)


if __name__ == "__main__":
    from torch._inductor.utils import compiled_module_main
    compiled_module_main('None', benchmark_compiled_module)
```

Am I missing something?
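For context on the error in the title: the `Ne(Mod(s2, 8), 0)` values that `call` returns above are plain sympy relational objects, and sympy's `Ne` constructor is exactly where the unsupported `Unequality` node comes from. A minimal sketch (assuming sympy is available; PyTorch depends on it):

```python
import sympy

# A symbolic size, analogous to Inductor's s2 in the generated module
s2 = sympy.Symbol("s2", positive=True, integer=True)

# The "is s2 not divisible by 8?" guard that Inductor emitted
expr = sympy.Ne(sympy.Mod(s2, 8), 0)

# Ne on an undecidable comparison stays unevaluated as an Unequality
# node -- the "not a supported top level IR node" from the error
print(type(expr))  # <class 'sympy.core.relational.Unequality'>
```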
No, you just hit another bug. I'll send a patch tomorrow.
Confirmed with #104104 the test script runs all the way to completion.
Confirmed it's working on pytorch-nightly 2.1.0.dev20230625. Thanks a lot!
Sorry, it's working for the first run, but if the input shapes are changed for the second run, it seems to hang for a long time before I have to Ctrl-C. Could you try this and confirm?

```python
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V2.0", torch_dtype=torch.float16
)
pipe.to("cuda:0")
pipe.unet = torch.compile(pipe.unet, dynamic=True)
pipe(prompt="prompt", height=512, width=512)
pipe(prompt="prompt", height=768, width=768)
```
@sunhs Instead of
@ezyang It completed the run, but still took a long time if the input shape changed. Already upgraded pytorch.

```python
from diffusers import StableDiffusionPipeline
import torch
import torch._dynamo
import datetime

torch._dynamo.config.dynamic_shapes = True
torch._dynamo.config.automatic_dynamic_shapes = True
torch._dynamo.config.assume_static_by_default = True

pipe = StableDiffusionPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V2.0", torch_dtype=torch.float16
)
pipe.to("cuda:0")
pipe.unet = torch.compile(pipe.unet)

start = datetime.datetime.now()
pipe(prompt="prompt", height=512, width=512)
first_done = datetime.datetime.now()
pipe(prompt="prompt", height=768, width=768)
second_done = datetime.datetime.now()
print("first elapsed:", first_done - start)
print("second elapsed:", second_done - first_done)
```

Log:
Not sure if this
So, I'm still checking up on things, but at least in the example you posted, I do expect you need to compile once: the first time we assume that you only wanted one image size, and then the second time we recompile, trying to keep the kernel dynamic in size. I'm not too sure if the resulting kernel is dynamic or not, but when I asked it to do a third run at 512x768 at least it didn't recompile. Do you have a list of sizes you want to work, by any chance?
I'm wrong, it actually does recompile the third time.
@sunhs Please feel free to file a new issue with more details about how few kernels you need for your use case. My impression with SD is there are not too many sizes people typically want to run generation on and it is not a big burden to compile each of them, but I could be wrong. The inductor generated kernels are specialized in somewhat hard to parse ways, so it will take some more serious debugging to diagnose.
@ezyang It depends on the use case. For instance, an AI horde worker might get any arbitrary shape and size coming in from clients. Or even a single user running an img2img pipeline might swap out resolutions very frequently. And that's just the tip of the iceberg... In the wild, beyond pure 512x512, anything up to 1024x1024 in 64-pixel, non-square increments is probably a sane range? I noticed that AITemplate allows for manually specifying a dynamic range and changing weights, which is very nice. @sunhs If you are running Linux or WSL, you should probably give it a shot, as it's very quick after the initial compile: https://github.com/facebookincubator/AITemplate/tree/main/examples/05_stable_diffusion#alternative-pipeline
Ok. That suggests there aren't fundamental problems with dynamic compilation here; we just need to knock out the remaining problems. I will take a closer look at some point.
🐛 Describe the bug
Trying to perform `torch.compile` with `dynamic=True` on the UNet from Hugging Face's StableDiffusionPipeline, and an error occurs. With `dynamic=False` it works, but every time the prompt is changed (and thus the prompt embedding could have a different shape) the compilation is triggered, which takes a long time.

Reproduction code
Log
Versions
cc @ezyang @msaroufim @wconstab @bdhirsh @anijain2305