
Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available! #38

Closed
1 task done
imamqaum1 opened this issue Mar 10, 2023 · 67 comments
Labels
directml DirectML related or specific issue

Comments

@imamqaum1

imamqaum1 commented Mar 10, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

Stable Diffusion crashes after generating a few pixels, with the error: Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available!

Steps to reproduce the problem

  1. Go to txt2img
  2. Enter a prompt and a negative prompt
  3. Generate

What should have happened?

Stable Diffusion should run normally and generate the images.

Commit where the problem happens

ff558348682fea569785dcfae1f1282cfbefda6b

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Microsoft Edge

Command Line Arguments

--lowvram --disable-nan-check --autolaunch --no-half

List of extensions

None

Console logs

venv "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Commit hash: ff558348682fea569785dcfae1f1282cfbefda6b
Installing requirements for Web UI
Launching Web UI with arguments: --lowvram --disable-nan-check --autolaunch --no-half
Warning: experimental graphic memory optimization is disabled due to gpu vendor. Currently this optimization is only available for AMDGPUs.
Disabled experimental graphic memory optimizations.
Interrogations are fallen back to cpu. This doesn't affect on image generation. But if you want to use interrogate (CLIP or DeepBooru), check out this issue: https://github.com/lshqqytiger/stable-diffusion-webui-directml/issues/10
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
No module 'xformers'. Proceeding without it.
Loading weights [bfcaf07557] from D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\models\Stable-diffusion\768-v-ema.ckpt
Creating model from config: D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\configs\stable-diffusion\v2-inference-v.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Applying cross attention optimization (InvokeAI).
Textual inversion embeddings loaded(0):
Model loaded in 235.4s (load weights from disk: 133.7s, find config: 48.2s, load config: 0.3s, create model: 3.4s, apply weights to model: 40.4s, apply dtype to VAE: 0.8s, load VAE: 2.6s, move model to device: 5.0s, hijack: 0.1s, load textual inversion embeddings: 0.8s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Calculating sha256 for D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\models\Stable-diffusion\aresMix_v01.safetensors: 6ecece11bf069e9950746d33ab346826c5352acf047c64a3ab74c8884924adf0
Loading weights [6ecece11bf] from D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\models\Stable-diffusion\aresMix_v01.safetensors
Creating model from config: D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (InvokeAI).
Model loaded in 42.4s (create model: 1.7s, apply weights to model: 40.2s, load textual inversion embeddings: 0.2s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00,  8.41s/it]
Error completing request███████████████████████████████████████████████████████████████| 20/20 [02:05<00:00,  6.39s/it]
Arguments: ('task(c6cyhnv8oj55v19)', 'photo of a 22 years old Japanese girl, detailed facial features, beautiful detailed face, perfect face, dreamy face expression, high detailed skin, white skin texture, detailed eyes, seductive eyes, alluring eyes, beautiful eyes, full red lips, hourglass body, perfect body, skinny, petite, red pussy, showing pussy, nude, small breast, sitting, hijab, hijab, elegant, sexually suggestive, sex appeal, seductive look, bedroom, submissive, fantasy environment, magical atmosphere, dramatic style, golden hour, embers swirling, soft lighting, volumetric lighting, realistic lighting, cinematic lighting, natural lighting, long exposure trails, hyper detailed, sharp focus, bokeh, masterpiece, award winning photograph, epic character composition,Key light, backlight, soft natural lighting, photography 800 ISO film grain 50mm lens RAW aperture f1.6, highly detailed, Girl, full body, full body view, full body shoot, full body photograph', '(asian:1.2), black and white, sepia, bad art, b&w, canvas frame, cartoon, 3d, Photoshop, video game, 3d render, semi-realistic, cgi, render, sketch, drawing, anime, worst quality, low quality, jpeg artifacts, duplicate, messy drawing, black-white, doll, illustration, lowres, deformed, disfigured, mutation, amputation, distorted, mutated, mutilated, poorly drawn, bad anatomy, wrong anatomy, bad proportions, gross proportions, double body, long body, unnatural body, extra limb, missing limb, floating limb, disconnected limbs, malformed limbs, missing arms, extra arms, disappearing arms, missing legs, extra legs, broken legs, disappearing legs, deformed thighs, malformed hands, mutated hands and fingers, double hands, extra fingers, poorly drawn hands, mutated hands, fused fingers, too many fingers, poorly drawn feet, poorly drawn hands, big hands, hand with more than 5 fingers, hand with less than 5 fingers, bad feet, poorly drawn feet, fused feet, missing feet, bad knee, extra knee, more than 2 legs, poorly drawn face, cloned face, double face, bad hairs, poorly drawn hairs, fused hairs, cross-eye, ugly eyes, bad eyes, poorly drawn eyes, asymmetric eyes, cross-eyed, ugly mouth, missing teeth, crooked teeth, bad mouth, poorly drawn mouth, dirty teeth, bad tongue, fused ears, bad ears, poorly drawn ears, extra ears, heavy ears, missing ears, poorly drawn breasts, more than 2 nipples, missing nipples, different nipples, fused nipples, bad nipples, poorly drawn nipples, bad asshole, poorly drawn asshole, fused asshole, bad anus, bad pussy, bad crotch, fused anus, fused pussy, poorly drawn crotch, poorly drawn anus, poorly drawn pussy, bad clit, fused clit, fused pantie, poorly drawn pantie, fused cloth, poorly drawn cloth, bad pantie, obese, ugly, disgusting, morbid, big muscles, blurry, censored, oversaturated, watermark, watermarked, extra digit, fewer digits, signature, text', [], 20, 15, False, False, 1, 1, 6, -1.0, -1.0, 0, 0, 0, False, 720, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
Traceback (most recent call last):
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\processing.py", line 486, in process_images
    res = process_images_inner(p)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\processing.py", line 634, in process_images_inner
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\processing.py", line 634, in <listcomp>
    x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1].to(dtype=devices.dtype_vae))[0].cpu() for i in range(samples_ddim.size(0))]
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\processing.py", line 423, in decode_first_stage
    x = model.decode_first_stage(x)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\modules\lowvram.py", line 52, in first_stage_model_decode_wrap
    return first_stage_model_decode(z)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 132, in forward
    h = nonlinearity(h)
  File "D:\Data Imam\Imam File\web-ui\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\functional.py", line 2059, in silu
    return torch._C._nn.silu(input)
RuntimeError: Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available!

Additional information

RX 570 4GB
Ryzen 5 3500
RAM 8GB single channel
Driver AMD Software PRO Edition
DirectX 12

@ethan0228

Me too, I have the same error...

@chenshaoju

Try adding --precision full to COMMANDLINE_ARGS=.

Here is my example (5500 XT):
set COMMANDLINE_ARGS=--listen --medvram --precision full --opt-split-attention-v1 --no-half --no-half-vae --opt-sub-quad-attention --disable-nan-check --use-cpu interrogate gfpgan bsrgan esrgan scunet codeformer

@Miraihi

Miraihi commented Mar 11, 2023

First - the arguments.
Second - I'm not sure what maximum resolution your GPU can handle. I can generate at most 600x800 on my RX 580 (8GB) with the arguments --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check.

@lylerolleman

The above arguments got it working for me for single pictures. Batches still fail, but at least it works (even on a 5800X, running this on the CPU was painful...).

Tried --lowvram with the same results. Running an RX 580 8GB.

@Miraihi

Miraihi commented Mar 12, 2023

Tried --lowvram with the same results. Running an RX 580 8GB.

--lowvram makes your GPU heavily limit its utilization (50-60%), so --medvram is the way to go. (You still have to check the Low VRAM box for ControlNet, though.)

@SunGreen777

Thank you, my RX 570 (8GB) is OK now.

@TheWingAg

First - the arguments. Second - I'm not sure what maximum resolution your GPU can handle. I can generate at most 600x800 on my RX 580 (8GB) with the arguments --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check.

In my case (R2700 + RX 6800 + 16GB RAM, Windows 10) I can generate 512x512 images normally, but I can't generate anything bigger; I get the error "...There is not enough GPU video memory available." Hmm, the RX 6800 has 16GB of VRAM.

@Miraihi

Miraihi commented Mar 30, 2023

Some cards have their own quirks; search for mentions of your card in the discussions. The latest collection of arguments the community has come up with is:
set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-split-attention-invokeai --always-batch-cond-uncond --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --disable-nan-check --upcast-sampling
set SAFETENSORS_FAST_GPU=1

@justanothernguyen

It is unfortunately because of the memory inefficiency of DirectML (which is what made this repo possible in the first place). Not being able to use xformers also hurts performance and VRAM usage.

What's weird is that when I run with a 6900 XT, I notice "shared GPU memory" being used (only about 2GB, but still). This is not the case when I run the regular A1111 webui with a 3060.

Maybe you can generate at 512x512 and upscale in img2img using SD upscale (in the Script section at the bottom of the img2img tab).
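For anyone who wants to script that "generate small, then upscale" workflow, here is a rough sketch using the webui HTTP API (requires launching with --api; the endpoint names and payload keys follow the public AUTOMATIC1111 API, so verify them against your fork):

import base64
import requests

BASE = "http://127.0.0.1:7860"

# 1) Generate at a VRAM-friendly resolution.
txt = requests.post(f"{BASE}/sdapi/v1/txt2img", json={
    "prompt": "a photo of a castle",
    "width": 512,
    "height": 512,
    "steps": 20,
}).json()
image_b64 = txt["images"][0]

# 2) Upscale the result 2x with an ESRGAN-family upscaler, which needs far
#    less VRAM than generating at the target resolution directly.
up = requests.post(f"{BASE}/sdapi/v1/extra-single-image", json={
    "image": image_b64,
    "upscaling_resize": 2,
    "upscaler_1": "R-ESRGAN 4x+",  # any upscaler listed in your UI works
}).json()

with open("upscaled.png", "wb") as f:
    f.write(base64.b64decode(up["image"]))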

@TheWingAg

It is unfortunately because of the memory inefficiency of DirectML (which is what made this repo possible in the first place). Not being able to use xformers also hurts performance and VRAM usage.

What's weird is that when I run with a 6900 XT, I notice "shared GPU memory" being used (only about 2GB, but still). This is not the case when I run the regular A1111 webui with a 3060.

Maybe you can generate at 512x512 and upscale in img2img using SD upscale (in the Script section at the bottom of the img2img tab).

Thanks, same for me. I use an RX 6800 (16GB). I think the shared RAM is available because I can see it in Task Manager, but it isn't being used. The maximum image size I can generate is about 420,000 pixels (width x height). Do you think so too?

@tornado73

tornado73 commented Apr 4, 2023

My 6800, Win 11 Pro 22H2,
Adrenalin Edition 23.4.1.

1. It is important for me that the SD folder is in the root of drive C.

2. Open CMD in the root of the stable-diffusion-webui-directml directory and run:

git pull (to ensure the latest update)
pip install -r requirements.txt

<- it was at this point I knew I effed up during initial setup, because I saw several missing items getting installed.

3. In the webui-user.bat file, I added the following line:
set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-sub-quad-attention --opt-split-attention --opt-split-attention-v1 --disable-nan-check --autolaunch


Result at 1024x1024:

Euler a ---- MAX 26/26 [01:16<00:00, 2.96s/it]
DPM++ 2M Karras ----- MAX 26/26 [02:19<00:18, 6.05s/it]

with my trained model (.ckpt)

With the model deliberate_v2 (.safetensors) at 1024x1280:

DPM++ 2M Karras ----- max 26/26 [01:50<00:00, 4.24s/it]

I usually generate at 440x640, 4 pictures at a time, and then do the necessary upscaling in Topaz Photo AI.

Good luck

P.S. At 1280x1280: RuntimeError: Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available! -)))

@thegr1mmer

First - the arguments. Second - I'm not sure what maximum resolution your GPU can handle. I can generate at most 600x800 on my RX 580 (8GB) with the arguments --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check.

In my case (R2700 + RX 6800 + 16GB RAM, Windows 10) I can generate 512x512 images normally, but I can't generate anything bigger; I get the error "...There is not enough GPU video memory available." Hmm, the RX 6800 has 16GB of VRAM.

Exactly the same for me.

@justanothernguyen

Guys, have you tried the tiled VAE extension? It should dramatically reduce VRAM usage:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Extensions#multidiffusion-with-tiled-vae
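For intuition, here is a minimal sketch of the idea behind tiled VAE decoding (this is not the extension's actual implementation, which blends tile seams much more carefully): the latent is decoded in small tiles, so peak memory scales with the tile size instead of the full image.

import torch

def decode_tiled(vae_decode, latent, tile=32, overlap=8):
    # latent: (1, 4, H/8, W/8); vae_decode maps a latent tile to an image tile (8x larger)
    _, _, h, w = latent.shape
    out = None
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            part = latent[:, :, y:y + tile, x:x + tile]
            dec = vae_decode(part).cpu()  # only one tile's activations live on the GPU at a time
            if out is None:
                out = torch.zeros(1, dec.shape[1], h * 8, w * 8)
            dh, dw = dec.shape[2], dec.shape[3]
            out[:, :, y * 8:y * 8 + dh, x * 8:x * 8 + dw] = dec
    return out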

@Neoony

Neoony commented Apr 11, 2023

Also having this problem.
It was definitely reduced by --opt-split-attention-v1 --opt-sub-quad-attention.
However, it still occasionally crashes with this error.

Running
--medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check
or
--precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check

Radeon 7900 XTX Nitro+ (24GB VRAM)

If I set a very high resolution (e.g. above 1280), it will likely crash.
If I go lower (e.g. 768x1024), I can generate images fine; however, with a bigger batch count, or when generating an animation (e.g. Deforum), it will eventually crash.
I can generate hundreds of images fine, but at some point it crashes with not enough memory.
I have been messing with various settings, but no luck getting rid of it.
I will be checking whether tiled VAE helps in the meantime.
:(

@Miraihi

Miraihi commented Apr 11, 2023

Also having this problem. It was definitely reduced by --opt-split-attention-v1 --opt-sub-quad-attention. However, it still occasionally crashes with this error.

Running --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check or --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check

Recently I've discovered another combination of arguments that (seemingly) allowed me to run the basic fp16 canny ControlNet model without the lowvram flag, when I couldn't before.

So, here it is:
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128
set COMMANDLINE_ARGS=--medvram --always-batch-cond-uncond --precision full --no-half --opt-split-attention --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --disable-nan-check --use-cpu interrogate gfpgan codeformer --upcast-sampling --autolaunch --api
set SAFETENSORS_FAST_GPU=1

I'm not entirely sure PYTORCH_CUDA_ALLOC_CONF actually works here; maybe it's a placebo. It requires more testing, but the log doesn't complain.

Also, if you want to generate really big images, use this (works for me) or that (doesn't work for me, but seems objectively the better option) extension. In general, most modern models are trained at 768x768 and don't handle anything above 1024 pixels well.

@Chocobollitoo

Hi. I just tried to generate some images, and when the AI is close to finishing, this error appears and I can't see any of the images I may have generated; just the same error as in this issue's title. Any ideas?

@lshqqytiger
Owner

That error is the same as OOM (out of memory).
The resolution or batch size of the image you tried to generate may be too large.
(DirectML does not support freeing unused memory yet.)
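A minimal repro of what that failure looks like on torch-directml (a sketch; the exact byte count depends on your GPU; 377487360 bytes corresponds to a float32 tensor of roughly 94 million elements, about the size of one VAE activation at high resolution):

import torch
import torch_directml

dev = torch_directml.device()
blocks = []
try:
    while True:
        # keep allocating ~360MB float32 tensors until the allocator gives up
        blocks.append(torch.empty(94_371_840, dtype=torch.float32, device=dev))
except RuntimeError as e:
    print(e)  # "Could not allocate tensor with N bytes. There is not enough GPU video memory available!"

Because DirectML cannot yet free this memory mid-session, the same error can surface after many successful generations, which matches the reports above.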

@Chocobollitoo

512x512 is too large to generate? I didn't know that.

@lshqqytiger
Owner

It depends on how much VRAM your GPU has available.
Add --opt-sub-quad-attention or --medvram, or both.

@Chocobollitoo

Added both and nothing 🤷. My GPU is an RX 6600.

@lshqqytiger
Owner

lshqqytiger commented Apr 13, 2023

My RX 5700 XT can generate 512x768 with hires fix x1.5 when I turn off everything except the webui and necessary processes.
I use --no-half --precision full --opt-sub-quad-attention.

@Neoony

Neoony commented Apr 13, 2023

Try these:
--precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check

That at least helped me generate at bigger resolutions, or keep running without the error for much longer (it might still crash).
Mainly these two:
--opt-split-attention-v1
enable older version of split attention optimization that does not consume all the VRAM it can find

--opt-sub-quad-attention
enable memory efficient sub-quadratic cross-attention layer optimization
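For intuition, a simplified sketch of what these attention optimizations do: process the queries in chunks so the full N x N attention matrix is never materialized at once. (The real sub-quadratic implementation also chunks the key/value side with a streaming softmax, so treat this as an illustration only.)

import torch

def chunked_attention(q, k, v, chunk=512):
    # q, k, v: (batch, seq, dim); naive attention would build a full (seq, seq) matrix
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for i in range(0, q.shape[1], chunk):
        qc = q[:, i:i + chunk]                            # (batch, chunk, dim)
        attn = torch.softmax(qc @ k.transpose(-2, -1) * scale, dim=-1)
        out[:, i:i + chunk] = attn @ v                    # peak memory: chunk x seq
    return out

Smaller chunk sizes (like the --sub-quad-q-chunk-size 512 in the argument lists above) trade speed for a lower memory peak.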

@lshqqytiger lshqqytiger added the directml DirectML related or specific issue label Apr 14, 2023
@strykenyne

Some cards have their own quirks; search for mentions of your card in the discussions. The latest collection of arguments the community has come up with is: set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-split-attention-invokeai --always-batch-cond-uncond --opt-sub-quad-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --disable-nan-check --upcast-sampling set SAFETENSORS_FAST_GPU=1

Thank you so much for sharing this. I used to only be able to do 512x512 images at 20 steps max before running out of VRAM. Now I'm doing 1024x768 at 50 steps... 1024x1024 still runs me out of VRAM, but hey, it's a major improvement! :D

@Nathan-dm

Nathan-dm commented Apr 18, 2023

I tried every argument in this issue; none of them works at any resolution or with any sampler. I tried replacing --medvram with --lowvram; some arguments work, but the maximum I am able to generate is 592x600 with --lowvram added.
My system specs:
Acer Swift 3X
I5 1135G7
16 GB Ram
Intel Xe Graphics 80 EU (shared memory)
Intel Xe Max (4GB VRAM)

@Miraihi

Miraihi commented Apr 18, 2023

but the maximum I am able to generate is 592x600 with --lowvram added.

The only workaround to reach higher resolutions for now is the img2img Ultimate SD Upscale script.

@tornado73

tornado73 commented Apr 19, 2023

Install a second system: Ubuntu.

Ubuntu 20.04 + 6800:
Total progress: 100%|███████████████████████████| 26/26 [00:03<00:00, 7.46it/s] 512x512
Total progress: 100%|███████████████████████████| 20/20 [00:05<00:00, 3.79it/s] 640x640
Total progress: 100%|███████████████████████████| 20/20 [00:08<00:00, 2.37it/s] 768x768
Total progress: 100%|███████████████████████████| 20/20 [00:26<00:00, 1.33s/it] 1024x1024
Total progress: 100%|███████████████████████████| 20/20 [01:58<00:00, 5.93s/it] 1280x1280

When generating at 768x768, 6.7GB of the 16GB of memory is used, together with the system and two browsers that consume 3GB when idle.


One thing: you have to set it up manually; the auto-installation from the topic does not work, and other instructions are outdated.

This is my compilation from different sources, tested on my 6800 :-)
It seems complicated and cumbersome, but it is only an hour of your time;
you spend more time waiting for the generation :-)

Install Ubuntu 20.04, start a terminal,
and let's go:

sudo apt update
sudo apt install wget gnupg2 gawk curl
sudo apt install libnuma-dev libncurses5
sudo reboot
sudo usermod -a -G video <username>
sudo usermod -a -G render <username>
sudo apt update
wget https://repo.radeon.com/amdgpu-install/5.4.2/ubuntu/focal/amdgpu-install_5.4.50402-1_all.deb
sudo apt-get install ./amdgpu-install_5.4.50402-1_all.deb
sudo amdgpu-install --usecase=rocm,hip,mllib --no-dkms

You can use 5.4.3, but this version suits me.

rocminfo

name: gfx1030 --ok

sudo reboot
sudo apt-get install python3
alias python=python3
nano ~/.bashrc

add

alias python=python3
export HSA_OVERRIDE_GFX_VERSION=10.3.0


Save (Ctrl+X, then Y, then Enter).

sudo apt install python3-venv
sudo reboot
sudo apt-get install git
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
python -m venv venv
sudo apt install python3-pip
python -m pip install --upgrade pip wheel
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
pip list

if it shows pytorch 2.0.0+rocm5.4.2 --- OK

python launch.py

If your card is not from the 6000 series, try:

python launch.py --precision full --no-half

or

python launch.py --precision full --no-half --medvram

Add to webui-user.sh:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
python -m venv venv
source venv/bin/activate


Save.

To launch: double-click webui.sh and choose "run in terminal".

The generation rate goes from 0.33 it/s on Windows to 6.8-7 it/s on Ubuntu at 512x512.
It's worth it -)
And no memory leaks and crashes -)

Good luck!

lshqqytiger pushed a commit that referenced this issue Apr 23, 2023
@hellozhaoming

hellozhaoming commented Apr 24, 2023

The SD webui on Win10 works after I shut down WSL, even though SD didn't free the memory. WSL and DirectML on Windows may not work together.

@justanothernguyen

justanothernguyen commented Apr 24, 2023

All of you guys think this is because the GPU memory is too small or the image size is too large.

Maybe read the discussion again...?

We know the issue is DirectML not releasing memory, so by cutting memory usage in the first place, DirectML also hogs less memory. Think of step 1 taking 1GB of VRAM instead of 1.5GB, etc.; with the same budget you will be able to go 12 steps instead of 8.

Also, optimizing memory is the only actionable thing most of us can do. Or are you suggesting everyone go fix DirectML instead?

@Neoony

Neoony commented Jun 28, 2023

Try deleting the venv folder and then running the .bat again.
I guess it might be good to do that for any update.

Maybe also the repositories folder, but I'm not sure if that's needed.

@waldolin

waldolin commented Jun 28, 2023

Error code: 128
stderr: fatal: reference is not a tree: c9fe758757e022f05ca5a53fa8fac28889e4f1cf

I deleted the venv and ran the .bat again, but it is the same:

venv "C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Version:
Commit hash: 06296ff
Fetching updates for K-diffusion...
Checking out commit for K-diffusion with hash: c9fe758757e022f05ca5a53fa8fac28889e4f1cf...
Traceback (most recent call last):
  File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 38, in <module>
    main()
  File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 29, in main
    prepare_environment()
  File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 304, in prepare_environment
    git_clone(k_diffusion_repo, repo_dir('k-diffusion'), "K-diffusion", k_diffusion_commit_hash)
  File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 145, in git_clone
    run(f'"{git}" -C "{dir}" checkout {commithash}', f"Checking out commit for {name} with hash: {commithash}...", f"Couldn't checkout commit {commithash} for {name}")
  File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 102, in run
    raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't checkout commit c9fe758757e022f05ca5a53fa8fac28889e4f1cf for K-diffusion.
Command: "C:\Users\lin\stable-diffusion-webui\git\cmd/git.exe" -C "C:\Users\lin\stable-diffusion-webui-directml\repositories\k-diffusion" checkout c9fe758757e022f05ca5a53fa8fac28889e4f1cf
Error code: 128
stderr: fatal: reference is not a tree: c9fe758757e022f05ca5a53fa8fac28889e4f1cf

The message shows:
Creating venv in directory C:\Users\lin\stable-diffusion-webui-directml\venv using python "C:\Users\lin\stable-diffusion-webui\python\python.exe"
venv "C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Version:
Commit hash: 06296ff
Installing torch and torchvision
Collecting torch==2.0.0
Using cached torch-2.0.0-cp310-cp310-win_amd64.whl (172.3 MB)
Collecting torchvision==0.15.1
Using cached torchvision-0.15.1-cp310-cp310-win_amd64.whl (1.2 MB)
Collecting torch-directml
Using cached torch_directml-0.2.0.dev230426-cp310-cp310-win_amd64.whl (8.2 MB)
Collecting sympy
Using cached sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting networkx
Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting jinja2
Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting filelock
Downloading filelock-3.12.2-py3-none-any.whl (10 kB)
Collecting typing-extensions
Using cached typing_extensions-4.6.3-py3-none-any.whl (31 kB)
Collecting numpy
Using cached numpy-1.25.0-cp310-cp310-win_amd64.whl (15.0 MB)
Collecting requests
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting pillow!=8.3.*,>=5.3.0
Using cached Pillow-9.5.0-cp310-cp310-win_amd64.whl (2.5 MB)
Collecting MarkupSafe>=2.0
Using cached MarkupSafe-2.1.3-cp310-cp310-win_amd64.whl (17 kB)
Collecting certifi>=2017.4.17
Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Collecting charset-normalizer<4,>=2
Using cached charset_normalizer-3.1.0-cp310-cp310-win_amd64.whl (97 kB)
Collecting urllib3<3,>=1.21.1
Using cached urllib3-2.0.3-py3-none-any.whl (123 kB)
Collecting idna<4,>=2.5
Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting mpmath>=0.19
Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, torch, torchvision, torch-directml
Successfully installed MarkupSafe-2.1.3 certifi-2023.5.7 charset-normalizer-3.1.0 filelock-3.12.2 idna-3.4 jinja2-3.1.2 mpmath-1.3.0 networkx-3.1 numpy-1.25.0 pillow-9.5.0 requests-2.31.0 sympy-1.12 torch-2.0.0 torch-directml-0.2.0.dev230426 torchvision-0.15.1 typing-extensions-4.6.3 urllib3-2.0.3

[notice] A new release of pip available: 22.2.2 -> 23.1.2
[notice] To update, run: C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\python.exe -m pip install --upgrade pip
Installing gfpgan
Installing clip
Installing open_clip
Fetching updates for K-diffusion...
Checking out commit for K-diffusion with hash: c9fe758757e022f05ca5a53fa8fac28889e4f1cf...
Traceback (most recent call last):
  File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 38, in <module>
    main()
  File "C:\Users\lin\stable-diffusion-webui-directml\launch.py", line 29, in main
    prepare_environment()
  File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 304, in prepare_environment
    git_clone(k_diffusion_repo, repo_dir('k-diffusion'), "K-diffusion", k_diffusion_commit_hash)
  File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 145, in git_clone
    run(f'"{git}" -C "{dir}" checkout {commithash}', f"Checking out commit for {name} with hash: {commithash}...", f"Couldn't checkout commit {commithash} for {name}")
  File "C:\Users\lin\stable-diffusion-webui-directml\modules\launch_utils.py", line 102, in run
    raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't checkout commit c9fe758757e022f05ca5a53fa8fac28889e4f1cf for K-diffusion.
Command: "C:\Users\lin\stable-diffusion-webui\git\cmd/git.exe" -C "C:\Users\lin\stable-diffusion-webui-directml\repositories\k-diffusion" checkout c9fe758757e022f05ca5a53fa8fac28889e4f1cf
Error code: 128
stderr: fatal: reference is not a tree: c9fe758757e022f05ca5a53fa8fac28889e4f1cf

@Neoony

Neoony commented Jun 28, 2023

And did you try deleting the repositories folder?

@waldolin

waldolin commented Jun 28, 2023

And did you try deleting the repositories folder?

I deleted the repositories folder and it works, thank you.
But I have a new problem: my PC restarts when I generate. Nothing like this happened before when generating. What can I do?

It happened only one time; it works normally now.


venv "C:\Users\lin\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Version:
Commit hash: 06296ff
Installing requirements

No module 'xformers'. Proceeding without it.
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
If submitting an issue on github, please provide the full startup log for debugging purposes.

Initializing Dreambooth
Dreambooth revision: dc413a14379b165355502d9f65856c40a4bb5b6f
Successfully installed accelerate-0.19.0 fastapi-0.94.1 gitpython-3.1.31 transformers-4.29.2

Does your project take forever to startup?
Repetitive dependency installation may be the reason.
Automatic1111's base project sets strict requirements on outdated dependencies.
If an extension is using a newer version, the dependency is uninstalled and reinstalled twice every startup.

[!] xformers NOT installed.
[+] torch version 2.0.0 installed.
[+] torchvision version 0.15.1 installed.
[+] accelerate version 0.19.0 installed.
[+] diffusers version 0.16.1 installed.
[+] transformers version 4.29.2 installed.
[+] bitsandbytes version 0.35.4 installed.

Launching Web UI with arguments: --lowvram --precision full --no-half --no-half-vae --opt-sub-quad-attention --enable-insecure-extension-access --deepdanbooru --disable-nan-check --backend directml
C:\Users\lin\stable-diffusion-webui-directml\venv\lib\site-packages\pkg_resources\__init__.py:123: PkgResourcesDeprecationWarning: llow is an invalid version and will not be supported in a future release
warnings.warn(
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
[AddNet] Updating model hashes...
0it [00:00, ?it/s]
2023-06-28 22:26:15,122 - ControlNet - INFO - ControlNet v1.1.227
ControlNet preprocessor location: C:\Users\lin\stable-diffusion-webui-directml\extensions\sd-webui-controlnet\annotator\downloads
2023-06-28 22:26:15,274 - ControlNet - INFO - ControlNet v1.1.227
Loading weights [fc2511737a] from C:\Users\lin\stable-diffusion-webui-directml\models\Stable-diffusion\chilloutmix_NiPrunedFp32Fix.safetensors
Creating model from config: C:\Users\lin\stable-diffusion-webui-directml\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Textual inversion embeddings loaded(0):
Model loaded in 3.6s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 2.6s, load VAE: 0.1s).
Applying optimization: sub-quadratic... done.
CUDA SETUP: Loading binary C:\Users\lin\stable-diffusion-webui-directml\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().

my PC
MSI B660M
RX-6650XT,
64G RAM

@liquiddandruff

liquiddandruff commented Jul 9, 2023

@Neoony The AUTOMATIC1111 1.3 webui merge happened. Now there's a new option called "Optimizations" where you can choose the --opt variations without adding them to webui-user.bat (if you're wondering - yes, only one --opt argument can be used; you can't choose both --opt-split-attention-v1 and --opt-sub-quad-attention). Also, token merging happened. You can use my settings; they aren't too bad, and the performance boost is significant (you can't use sub-quad attention with token merging, though - you'll get black images left and right). But 1.3 broke the memory management even more, to the point where I can't even use any ControlNet model, so I run the vladmandic-directml version in parallel when I have to deal with ControlNet. Also, live preview now works properly, and there's a brand new, performance-efficient and good-looking method.

Thanks for this info. I was at 3284ccc, and after updating I also got the super slow generation of ~7 seconds per iteration.

After copying your optimization settings I was able to return to generation speeds similar to before, ~2 iterations per second.

However, like you, I find that all ControlNet attempts now fail with GPU OOM errors (I can't upscale, not even with tiled upscaling, etc.).

@Miraihi

Miraihi commented Jul 9, 2023

However, like you, I find that all ControlNet attempts now fail with GPU OOM errors (I can't upscale, not even with tiled upscaling, etc.).

True, ControlNet is unusable on the current commit, no matter the model or resolution. The maximum feasible image size has also decreased in general (I can't render anything at 1024 pixels anymore). But at least token merging and Negative Guidance minimum sigma bring a massive speedup. I use Stable Diffusion primarily for inpainting, so the inability to use ControlNet is not that critical for me. I keep the earlier version around just in case I need it.

@Booty3ater900

Man, Stable Diffusion's a bitch.

@Miraihi

Miraihi commented Jul 11, 2023

I urge you all to try vladmandic/automatic. It has a functioning ControlNet and a ton of settings not seen in the classic branch. It's multiplatform right now and can be run in ROCm and DirectML modes.

@FrakerKill

I urge you all to try vladmandic/automatic. It has a functioning ControlNet and a ton of settings not seen in the classic branch. It's multiplatform right now and can be run in ROCm and DirectML modes.

But does it support AMD GPUs?

@Miraihi

Miraihi commented Jul 11, 2023

I repeat, it's multiplatform and can be run in ROCm and DirectML modes. That implies that AMD cards are supported. I'm using it right now myself.

@Grathew

Grathew commented Jul 13, 2023

I am getting this error during image generation. It seems like memory isn't being released after each image is created.

@Eleiyas

Eleiyas commented Jul 18, 2023

Inside webui-user.bat:

set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check
set SAFETENSORS_FAST_GPU=1

Works for me with a 6800XT card (16GB).
I can now actually generate above 512x512 without it immediately crashing. I still get some issues, but I can generate tens of images before it even thinks about being weird.

Any of the other command-line args I see other people use make the program completely hang and refuse to generate anything, so if you have the same card as I do, just use what I put in the code block ;)

@Miraihi

Miraihi commented Jul 18, 2023

After the token merging update, you pretty much have to set token merging to about 0.5 and Negative Guidance minimum sigma to about 3 (Optimizations tab in Options). This gives a great boost in performance and memory efficiency without sacrificing much. But you can't use the sub-quadratic optimization with token merging.
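If the webui is started with --api, the same two settings can also be applied programmatically; a small sketch (the option keys "token_merging_ratio" and "s_min_uncond" match recent AUTOMATIC1111 builds, but verify them against GET /sdapi/v1/options on your install):

import requests

# apply the values suggested above: token merging ~0.5, minimum sigma ~3
requests.post("http://127.0.0.1:7860/sdapi/v1/options", json={
    "token_merging_ratio": 0.5,
    "s_min_uncond": 3.0,  # "Negative Guidance minimum sigma" in the UI
})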


@Miraihi

Miraihi commented Jul 19, 2023

@Grathew I mentioned that you can't use sub-quad attention with token merging. Choose Doggettx or V1. If you're not aware, only one of the --opt arguments can be active at a time; having several just keeps one of them active and the others inactive.

@upright2003

I reinstalled the new version of the AMD driver, and pictures now appear normally at a resolution of 960x540 on my RX 5500 XT 4GB.
But it is almost impossible to use the Hires. fix function. How should I set it?

@evascroll

evascroll commented Oct 5, 2023

I just fixed mine using this arg: (set COMMANDLINE_ARGS= --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check). I had been trying to generate for a couple of days without any progress; after adding the args, I can now generate 512x512 with hires fix to 1024x1024 (upscale by 2), batch count 8, up to 50 steps, using 3 ControlNets, no problem. I'm using the latest AMD driver 23.9.3 and the latest chipset drivers. Specs: Windows 11, CPU AMD 5800X, GPU ASUS Dual 6700 XT OC, 32GB RAM, ControlNet 1.1.410, the A1111 fork from lshqqytiger, checkpoints 1.5, 2.0, 2.1 (SDXL no luck, still testing). Hope it helps.

@AuroraTheDragon

First - the arguments. Second - I'm not sure what maximum resolution your GPU can handle. I can generate at most 600x800 on my RX 580 (8GB) with the arguments --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check.

I am having a similar issue. I have an RX 580 with 8GB of VRAM and 2x16GB RAM. About 5 days ago I was still able to generate images well above 768x512 and could even upscale to around 4x, I believe, with no issues at all. But all of a sudden, yesterday, it just stopped working, claiming that I don't have enough GPU video memory available. I tried uninstalling everything (Python 3.10.6, Git, and Stable Diffusion) and then reinstalled it all; it still didn't work. I'm really hoping this isn't a graphics card problem, and I don't think it is, because I can run triple-A games smoothly without crashes, so maybe it has something to do with Stable Diffusion's latest updates.

@Sepacc

Sepacc commented Nov 19, 2023

I have an RX 580 (8GB) and 2x8GB RAM. I tried the arguments mentioned before and it works fairly well for me (at least I can generate 600x800 now; in the past I was getting an error every 2-3 512x512 images, and 600x800 was a guaranteed error). Also, I'm using official SD.Next, if that's important.

@Menober

Menober commented Nov 29, 2023

Same error with memory allocation. Is there no way to chunk this data?

@FrancoContegni

I can't find the webui-user.bat file

@thedevtechs

thedevtechs commented Dec 4, 2023

First - the arguments. Second - I'm not sure what maximum resolution your GPU can handle. I can generate at most 600x800 on my RX 580 (8GB) with the arguments --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check.

Thank you!! This got me up and running on my AMD RX 6600 (finally).

@thisisnotreal459

My 6800, Win 11 Pro 22H2, Adrenalin Edition 23.4.1.

1. It is important for me that the SD folder is in the root of drive C. 2. Open CMD in the root of the stable-diffusion-webui-directml directory and run:

git pull (to ensure the latest update) pip install -r requirements.txt

<- it was at this point I knew I effed up during initial setup, because I saw several missing items getting installed. 3. In the webui-user.bat file, I added the following line: set COMMANDLINE_ARGS=--medvram --precision full --no-half --no-half-vae --opt-sub-quad-attention --opt-split-attention --opt-split-attention-v1 --disable-nan-check --autolaunch

Result at 1024x1024:

Euler a ---- MAX 26/26 [01:16<00:00, 2.96s/it] DPM++ 2M Karras ----- MAX 26/26 [02:19<00:18, 6.05s/it]

with my trained model (.ckpt)

With the model deliberate_v2 (.safetensors) at 1024x1280: DPM++ 2M Karras ----- max 26/26 [01:50<00:00, 4.24s/it]

I usually generate at 440x640, 4 pictures at a time, and then do the necessary upscaling in Topaz Photo AI.

Good luck

P.S. At 1280x1280: RuntimeError: Could not allocate tensor with 377487360 bytes. There is not enough GPU video memory available! -)))

So I went ahead and tried this solution, and it was after the "pip install -r requirements.txt" step that things went wrong for me. Now whenever I run webui-user.bat it spits out this:

venv "M:\Program Files\Stable Diffusion\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: 1.7.0
Commit hash: cfa6e40e6d7e290b52940253bf705f282477b890
Traceback (most recent call last):
  File "M:\Program Files\Stable Diffusion\stable-diffusion-webui-directml\launch.py", line 48, in <module>
    main()
  File "M:\Program Files\Stable Diffusion\stable-diffusion-webui-directml\launch.py", line 39, in main
    prepare_environment()
  File "M:\Program Files\Stable Diffusion\stable-diffusion-webui-directml\modules\launch_utils.py", line 560, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
Press any key to continue . . .

@Sairu9

Sairu9 commented Feb 21, 2024

I followed this tutorial:
https://www.youtube.com/watch?v=mKxt0kxD5C0&t=1087s&ab_channel=FE-Engineer

and then, in the webui-user.bat file, I added the following line:
set COMMANDLINE_ARGS=--use-directml --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check

With --medvram --precision full --no-half --no-half-vae --opt-split-attention-v1 --opt-sub-quad-attention --disable-nan-check added, it only works with 1.5 models, not with XL models. Adding the args speeds up generation significantly, but I lose the XL models.

rx 6800 gpu

@lshqqytiger closed this as not planned Feb 27, 2024
@JonathanDiez

Added both and nothing 🤷. My GPU is an RX 6600.

Did you fix this?

@CoolnJuicy

For me it was the simple combo of adding --medvram to the .bat file and checking the Low VRAM box in ControlNet. I installed ControlNet last night; come morning I was getting the OP's error. This worked.

Ryzen 3600, RX 580, 16GB.

@zmsoft

zmsoft commented May 21, 2024

Can a low-memory graphics card run this on the CPU only?
AMD RX 550 2GB
set COMMANDLINE_ARGS=--use-directml --lowvram --opt-split-attention --enable-insecure-extension-access --skip-torch-cuda-test
