Getting stuck at "Distributed - injecting images 100%" #28
Comments
Could you add your logs? Also, yes, you're using it just fine as long as the compute device for each instance is different.
Thanks for the reply. I ran both instances; this is the output of the log on the web-ui:
I've not encountered this issue before using the extension, and it goes away if I disable it when generating.
How long was your main instance running before you encountered this? Also, did you happen to use the interrupt/skip button on a request in the web-ui at any point before this happened?
I had just restarted both instances, so they were newly started. I didn't interrupt the generation. I tried with no prior generations before the distributed one, and with one generation on each instance beforehand; same result. D:
When you say restarted, do you mean you fully stopped sdwui and restarted both (and didn't use the restart button built into the web interface)?
Correct, stopped and started again.
So this is happening every time, on the first try, after rebooting sdwui?
Yes, every time I try generating an image with the Distributed checkbox ticked, it happens, regardless of whether it's the first image generated on the instance or not.
What seems most likely is that something in your config is off. Could you add this debug statement? Could you also post your config?
I tried generating one image.
Slave:
Config: {
"workers": [
{
"master": {
"avg_ipm": 3.468163269192995,
"master": true,
"address": "0.0.0.0",
"port": 7860,
"eta_percent_error": [],
"tls": false,
"state": 1,
"user": null,
"password": null,
"pixel_cap": -1
}
},
{
"slave": {
"avg_ipm": 118.3148549027504,
"master": false,
"address": "0.0.0.0",
"port": 7861,
"eta_percent_error": [],
"tls": false,
"state": 1,
"user": "None",
"password": "None",
"pixel_cap": -1
}
}
],
"benchmark_payload": {
"prompt": "A herd of cows grazing at the bottom of a sunny valley",
"negative_prompt": "",
"steps": 20,
"width": 512,
"height": 512,
"batch_size": 1
},
"job_timeout": 3,
"enabled": true,
"complement_production": true
}
What happened to your speed ratings btw? Before, it showed both of your instances going at about 3 ipm, but now the slave is at around 120 ipm?
It's strange: it sometimes shows something normal like 3, but usually the slave is really high, at about 120 ipm. I wish it could do 120 ipm, haha.
Can you let me know if this also happens consistently on 424a1c8?
Just tested, and on that commit I cannot generate anything with the extension enabled. :( It hangs like this.
Master:
Slave:
WebUI extension log:
It seems that it doesn't send the command to the slave instance?
Does this happen with no other extensions enabled (builtin ones should be fine)? Also, can you post a list of the extensions you've been using?
This is the only non-builtin extension I'm using, and it works fine if I disable it.
In that case:
1. Were you reloading the config yourself multiple times in a row? At least initially, it looks like your slave instance wasn't running yet, since you were getting a connection refused error.
2. Remove the extension from the slave worker; you only need it installed on the main instance. If you're using the same installation root for more than one instance, you'll probably need to use sdwui's command-line options to force the slave instance to use a separate config that has the extension disabled (sketched below). You can see that the slave instance is trying to connect back to itself as a worker (since the port is the same), which shouldn't happen.
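For concreteness, a rough sketch of that two-config launch, assuming stock sdwui (AUTOMATIC1111) command-line options (--device-id, --port, and --ui-settings-file all exist upstream). The settings file names here are placeholders; config-slave.json would be a copy of your settings with this extension listed in its disabled extensions setting (key name may vary by webui version):

# Run each instance in its own terminal, from the same installation root.
# Master: GPU 0, extension enabled
python launch.py --device-id 0 --port 7860 --ui-settings-file config-master.json
# Slave: GPU 1, separate settings file with this extension disabled
python launch.py --device-id 1 --port 7861 --ui-settings-file config-slave.json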
Thank you - I was able to get it working by adding the command-line options you suggested.
But I am now facing an issue where my slave instance seems to run out of VRAM when generating through the extension. Low sampling step counts work, but anything higher than ~10-15 seems to cause it to run out of VRAM. It's strange, because it works just fine if I generate through its own web-ui, at any number of sampling steps, even at higher resolutions. Do you think this could be an issue with how the extension spreads the workload?
The problem in this case is that your slave's ipm was ending up at around 120 for some reason (3 ipm, like before, sounds about right). The best thing to do would be to rebenchmark, or manually adjust that ipm in the config. Then the distribution logic should split your requests about evenly, and this should be far less of an issue.
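If the split is proportional to those ratings, the slave's share of each request would be about 118.3 / (118.3 + 3.5) ≈ 97%, so nearly the whole batch lands on one GPU. A minimal sketch of the manual fix, assuming the config shown above lives in a JSON file (the path below is a placeholder; point it at wherever the extension actually stores its config):

# Placeholder path; adjust to the extension's actual config file.
CONFIG=path/to/distributed-config.json
# Reset the slave's avg_ipm (second entry under "workers") to a realistic value,
# then restart sdwui so the new rating is picked up.
jq '.workers[1].slave.avg_ipm = 3.5' "$CONFIG" > "$CONFIG.tmp" && mv "$CONFIG.tmp" "$CONFIG"

This uses jq for the in-place edit; you could equally change the avg_ipm value by hand in a text editor.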
If there are still issues, let me know and I can reopen the issue.
After another benchmark and some restarting, I was able to get it to work. Thank you for your time and help!
Original issue:
Hi, I first want to thank you for this project. I'm running into an issue where, after generating the image, Stable Diffusion gets stuck at: Distributed - injecting images 100%.
I currently have 2 GPUs installed. One instance is running on device 0, the other on device 1, and I can confirm they are both being used through nvtop and nvidia-smi. Both instances are run from the same folder, on different ports; unsure if this is how it's supposed to be used. I have installed the extension and, judging by the log, it seems to work. It generates an image but gets stuck at the mentioned status. There are no errors I can see. When that status is displayed, the log reports that the 2nd instance is idle.
Am I doing something wrong? If so, can you expand a bit on what the proper usage of this extension looks like? Please let me know if you need more information. Thank you.
Output from the main instance: