FP16 inference with Cuda 11.1 returns NaN on Nvidia GTX 1660 #58123
Comments
I'm unable to reproduce the NaN outputs on a 2080Ti (same compute capability).
Yes, every run and irrespective of the input values. I have tested with a 2080ti and a 1080ti and found valid outputs as well - I was only able to produce the NaN output on both a 1660 and a 1660ti.
I've managed to put together a minimal reproducible example with just Conv2d operations - see below:
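The original snippet was not preserved in this copy of the thread; the following is a sketch of the kind of repro described, with the layer sizes and input shape being my assumptions rather than the reporter's exact values.

```python
# Sketch of a minimal fp16 Conv2d repro (assumed shapes): run one
# half-precision Conv2d forward pass on the GPU and check for NaNs.
import math


def any_nan(values):
    """Return True if any float in the flat iterable is NaN."""
    return any(math.isnan(v) for v in values)


def conv2d_fp16_repro():
    """Run a single half-precision Conv2d forward pass on the GPU.

    Returns True if the output contains NaNs, False if it is clean,
    and None when no CUDA-capable PyTorch install is available.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda().half()
    x = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.float16)
    with torch.no_grad():
        out = conv(x)
    return bool(torch.isnan(out).any())
```

On the affected 1660/1660 Ti setups described in this thread, `conv2d_fp16_repro()` would be expected to return True; on other cards it returns False.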
which produces NaN outputs on the 1660.
Thank you for the follow-up!
@hamishc the fix will land in the upcoming cuDNN 8.2.2 release (just verified it).
@ptrblck I'm on cuDNN v8.3.0.2 and I'm still seeing this.
@illtellyoulater let me try to re-verify the fix and try to reproduce your issue to see if you are running into the same or a new issue. I'll ping you in the linked issue in case I need help in reproducing it.
Hello, I've been debugging an issue for almost a week full-time now: a couple of users (both on a GTX 1660) can't run VQGAN-CLIP powered by PyTorch, because operations such as nn.Conv2d or nn.MultiheadAttention produce NaN in situations that work fine on other machines. Since I used PyInstaller to package the whole thing into a single directory, it's highly unlikely to be caused by differing library code. I googled "pytorch gtx 1660" in desperation, and this issue seems to be related.
Updating cuDNN to a version above 8.2.2 was sufficient to fix my issue, even with CUDA toolkit 11.3. It was not necessary to downgrade the CUDA toolkit.
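Per the comments above, the fix reportedly shipped in cuDNN 8.2.2. A small check might look like the following sketch; it assumes the single-integer encoding that `torch.backends.cudnn.version()` reports (e.g. 8202 for 8.2.2).

```python
# Minimal sketch: decide whether the cuDNN build bundled with PyTorch is at
# least 8.2.2, where the fix reportedly landed. torch.backends.cudnn.version()
# returns a single integer (e.g. 8202 for 8.2.2), or None when cuDNN is absent.
def cudnn_has_fix(version, minimum=8202):
    """True if the reported cuDNN version integer is at or above the fix."""
    return version is not None and version >= minimum


def check_installed_cudnn():
    """Return (version, ok) for the local install, or None without torch."""
    try:
        import torch
    except ImportError:
        return None
    v = torch.backends.cudnn.version()
    return v, cudnn_has_fix(v)
```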
Same problem here (also same hardware).
Just to clarify: as of 2023 this topic remains unsolved, and no one can run Stable Diffusion using the supposedly dedicated FP16 cores provided by this card - or is there any news in this regard?
I'm running SD with the crippling "--no-half" option.
I was able to isolate this problem to PyTorch using the following test cases on an NVidia GTX 1660 Super machine:

(1) torch==2.0.1+cu117 torchvision==0.15.2+cu117
(2) torch==1.13.1+cu117 torchvision==0.14.1+cu117

Without "--no-half", scenario (1) produces NaNs and scenario (2) does not. I've also tried scenario (1) with the current version of cuDNN (8.9.4.25), but still got NaNs. My guess is that the issue originates from PyTorch assuming that tensor cores are in use whenever half precision is used, as 16xx cards are the only(?) hardware where that assumption is incorrect. Only a guess, though.
@andreszs I have found several ways around the blank-images issue when not using "--no-half", but all of them increase the time it takes to generate images by at least 100%. This does have some use, as it reduces VRAM usage by almost 2GB during generation and may better enable you to play games and such while generating, but it's mostly bad. And if the game wants to use your fp16 cores, it may be bad for that, too.

Still, ideally, TU116 cards would be able to run without "--no-half", never produce NaNs in any configuration, and receive a small speed boost and VRAM saving compared to running with "--no-half", but I can see how mixing this small quantity of fp16 cores with a larger quantity of fp32 cores in an efficient way might be difficult for PyTorch and other programs to do.
Could you please tell me how you managed to accomplish the fix? I'm an unhappy 1660 user, and I'm willing to wait an extended period of time until I can afford a good video card if it will at least just work.
Well, to be clear, the avenue I would most recommend is to use "--no-half" and stick to dimensions that a 1660 Super can handle (512x768 with a 2x upscale is an example that should work).

But if you're looking to force it to use the 128 FP16 cores, then it's all about having the right torch and torchvision versions and then not using "--no-half". One combination that does this is: torch==1.13.1+cu117 torchvision==0.14.1+cu117. You may be able to get newer versions of torch/torchvision to work without "--no-half" by manually updating your cuDNN version, but you'd have to try it out.

For a more industrious option: I wound up getting a Tesla M40 as a 2nd GPU along with a cooling fan for it (since it doesn't come with one) for < $150 USD. You'd have to make sure your rig can support it, fit it, power it, etc. It was a pain to set up initially, but I haven't had any problems since then and can even generate images using both cards at once.
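To compare setups like the ones traded in this thread, the relevant facts can be gathered with a small diagnostic. This is only a sketch using standard torch attributes; nothing in it is specific to this issue.

```python
# Hedged diagnostic sketch: collect the details that mattered in this thread
# (torch / CUDA / cuDNN versions and the GPU model, to spot TU116 cards such
# as the GTX 1660 / 1660 Ti / 1660 Super).
def describe_environment():
    try:
        import torch
    except ImportError:
        return "torch not installed"
    lines = [
        f"torch {torch.__version__}",
        f"CUDA {torch.version.cuda}",          # None on CPU-only builds
        f"cuDNN {torch.backends.cudnn.version()}",
    ]
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            lines.append(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    return "\n".join(lines)


print(describe_environment())
```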
🐛 Bug
Half-precision inference returns NaNs for a number of models when run on a 1660 with CUDA 11.1.
To Reproduce
Expected behavior
I would expect the FP16 output from each to be approximately the same as the FP32 output, but the above script produces (truncated for clarity):
Testing with other nets produces the same results, e.g.:
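The expected-behavior check above can be expressed as a small helper. The tolerances here are my assumptions, not values from the report; the report only says the outputs should be approximately the same and, in particular, free of NaNs.

```python
# Sketch of the fp16-vs-fp32 comparison described above (assumed tolerances).
import math


def outputs_agree(fp32_vals, fp16_vals, rel_tol=1e-2, abs_tol=1e-3):
    """True if every fp16 value is finite and close to its fp32 counterpart."""
    return all(
        math.isfinite(h) and math.isclose(f, h, rel_tol=rel_tol, abs_tol=abs_tol)
        for f, h in zip(fp32_vals, fp16_vals)
    )
```

With torch tensors one could call, for example, `outputs_agree(out_fp32.flatten().tolist(), out_fp16.float().flatten().tolist())`; on the affected 1660 setups the fp16 side is all NaN, so the check returns False.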
Running this same test on other GPUs (e.g. 1080ti, 2080ti) provides valid fp16 output on each, while testing on another machine with a 1660 produced the same results as above.
Similarly, the same test on a 1660 with torch 1.8.1 / CUDA 10.2 does not reproduce the issue.
Environment
PyTorch 1.8.1+cu111 was installed with pip / Python 3.7 from the following wheel:
https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp37-cp37m-linux_x86_64.whl
Additional context
cc @ngimel @csarofeen @ptrblck @xwang233