
support more than one simultaneous GPU/worker #38

Closed
ssube opened this issue Jan 13, 2023 · 8 comments · Fixed by #91
Labels: status/fixed (issues that have been fixed and released), type/feature (new features)
Milestone: v0.7

Comments

ssube (Owner) commented on Jan 13, 2023

Even with the thread limit from #15, making multiple requests to generate images, even with the same pipeline and model, will likely crash the API:

reusing existing pipeline
10.2.2.16 - - [13/Jan/2023 10:34:50] "POST /api/txt2img?cfg=6.0&steps=25&model=stable-diffusion-onnx-v1-5&platform=amd&scheduler=euler-a&seed=-1&prompt=a+stone+magnifying+glass,+a+wooden+desk,+steampunk,+realistic,+highly+detailed,+oil+painting&negativePrompt=&width=512&height=512 HTTP/1.1" 200 -
10.2.2.16 - - [13/Jan/2023 10:34:50] "GET /api/ready?output=txt2img_1200220748_9ff672bc153bad59ccd8bf173beb011cf170aaa3ef6f579713ab1ceaf5fbf04c.png HTTP/1.1" 200 -
2023-01-13 10:34:50.0965716 [E:onnxruntime:, sequential_executor.cc:369 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running Reshape node. Name:'/up_blocks.3/attentions.1/transformer_blocks.0/attn2/Reshape_2' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1971)\onnxruntime_pybind11_state.pyd!00007FFAF7CC11AB: (caller: 00007FFAF82D0F7F) Exception(2) tid(c90) 8007023E {Application Error}
The exception %s (0x
 13%|████████████████████▌ | 10/75 [00:04<00:28,  2.25it/s]

(onnx_env) ssube@SSUBE-DESKTOP C:\Users\ssube\stabdiff\onnx-try-2\onnx-web\api>flask --app=onnx_web.serve run --host=0.0.0.0
 * Serving Flask app 'onnx_web.serve'
ssube added the status/new (issues that have not been confirmed yet) and type/bug (broken features) labels on Jan 13, 2023
ssube added this to the v0.5 milestone on Jan 13, 2023
ssube (Owner, Author) commented on Jan 19, 2023

This was a result of setting the number of background workers to a value greater than the number of GPUs I had available, causing multiple pipelines to run simultaneously and compete for resources. Reducing the number of workers to 1 fixes the crashes, but does not allow more than one GPU to be used.
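
As a rough sketch of the direction this points (purely illustrative, not the actual onnx-web worker code), the pool can be sized to the device list so that each GPU only ever runs one job at a time:

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

# Illustrative only: one queue and one worker thread per device, so at most
# one pipeline runs on each GPU at any time. Names here are hypothetical.
devices = ["cuda:0", "cuda:1"]
queues = {device: Queue() for device in devices}
pool = ThreadPoolExecutor(max_workers=len(devices))

def device_worker(device: str) -> None:
    while True:
        job = queues[device].get()
        if job is None:  # shutdown sentinel
            break
        job(device)  # run the pipeline pinned to this device

for device in devices:
    pool.submit(device_worker, device)
```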

ssube removed this from the v0.5 milestone on Jan 19, 2023
ssube added the type/feature (new features) label and removed the type/bug label on Jan 19, 2023
ssube changed the title from "prevent API from crashing during concurrent image requests" to "support more than one simultaneous GPU/worker" on Jan 19, 2023
ssube added this to the v0.6 milestone on Jan 22, 2023
ssube added the status/progress (issues that are in progress and have a branch) label and removed the status/new label on Jan 31, 2023
elliotcourant commented
Would this require multiple GPUs? I might be able to help with that on one of my servers as long as the pipeline doesn't require full PCIe bandwidth for each GPU.

ssube (Owner, Author) commented on Jan 31, 2023

@elliotcourant I want to test both cases: 2 different GPUs with different device IDs, and 1 GPU with enough memory for 2 models and 2 simultaneous workers. I don't think PCIe bandwidth matters as much for this as it does for gaming, since the model is usually loaded once and run many times, but I expect it will have some impact on those load times at least.

ssube added the status/fixed label and removed the status/progress label on Feb 4, 2023
ssube (Owner, Author) commented on Feb 4, 2023

This is mostly implemented, although I need to make sure all of the pipeline code is using .to(device).

2023-02-04T21:04:09.740185783Z [2023-02-04 21:04:09,739] INFO: onnx_web.serve: overriding default platform to cuda
2023-02-04T21:04:09.756684842Z [2023-02-04 21:04:09,756] INFO: onnx_web.serve: available acceleration platforms: cuda:0 - CUDAExecutionProvider, cuda:1 - CUDAExecutionProvider
2023-02-04T21:04:09.758146513Z [2023-02-04 21:04:09,757] INFO: onnx_web.device_pool: creating thread pool executor for 2 devices: ['cuda:0', 'cuda:1']
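
For reference, a sketch of how a device list like the one in that log could be assembled; counting CUDA devices through torch is an assumption for illustration, not necessarily what onnx-web actually does:

```python
import onnxruntime

def list_devices():
    """Sketch: one (name, provider, options) entry per visible CUDA device."""
    devices = []
    if "CUDAExecutionProvider" in onnxruntime.get_available_providers():
        import torch  # assumption: torch is available for device counting
        for i in range(torch.cuda.device_count()):
            devices.append((f"cuda:{i}", "CUDAExecutionProvider", {"device_id": i}))
    if not devices:
        # fall back to CPU when no accelerator provider is available
        devices.append(("cpu", "CPUExecutionProvider", {}))
    return devices
```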


ssube (Owner, Author) commented on Feb 4, 2023

This is sort of working, but device_id is not being passed through properly, so jobs may land on the same CUDA device.


ssube reopened this on Feb 4, 2023
ssube (Owner, Author) commented on Feb 5, 2023

Passing session options requires the SessionOptions type:

2023-02-05T03:45:16.273561080Z [2023-02-05 03:45:16,273] WARNING: onnx_web.device_pool: job txt2img_2106418342_0f3f2e6aac66681d43af44d681f1418615afe68d35edd5254f6ca1148faa5b1a_1675568716.png failed with an error: __init__(): incompatible constructor arguments. The following argument types are supported:
2023-02-05T03:45:16.273582180Z     1. onnxruntime.capi.onnxruntime_pybind11_state.InferenceSession(arg0: onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions, arg1: str, arg2: bool, arg3: bool)
2023-02-05T03:45:16.273592541Z 
2023-02-05T03:45:16.273594891Z Invoked with: {'device_id': 1}, '/data/models/diffusion-openjourney/vae_decoder/model.onnx', True, False
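
For comparison, plain onnxruntime takes these as separate arguments: sess_options must be a SessionOptions instance, while the device_id dict belongs in provider_options, paired with the provider name. A minimal sketch, using the model path from the log above:

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()  # must be a SessionOptions object, not a dict

session = ort.InferenceSession(
    "/data/models/diffusion-openjourney/vae_decoder/model.onnx",
    sess_options=sess_options,
    providers=["CUDAExecutionProvider"],
    provider_options=[{"device_id": 1}],  # device selection lives here
)
```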

Trying to run a second image causes the first job to crash with an error about an unwanted list of tensors:

2023-02-05T04:11:50.027730899Z 
  0%|          | 0/50 [00:00<?, ?it/s]
  2%|▏         | 1/50 [00:00<00:11,  4.42it/s]The `scale_model_input` function should be called before `step` to ensure correct denoising. See `StableDiffusionPipeline` for a usage example.
2023-02-05T04:11:50.029684833Z [2023-02-05 04:11:50,029] WARNING: onnx_web.device_pool: job txt2img_1798687040_e1773f83e93b6ca71035044bbd97b805c05847788e9fa4b11ae866e02694cc82_1675570309.png failed with an error: only one element tensors can be converted to Python scalars

ssube added the status/progress label and removed the status/fixed label on Feb 5, 2023
ssube modified the milestones: v0.6 → v0.7 on Feb 5, 2023
ssube (Owner, Author) commented on Feb 5, 2023

The LPW pipeline from #27 is not aware of the provider_options either and needs to be patched:

Keyword arguments {'provider_options': None} are not expected by OnnxStableDiffusionLongPromptWeightingPipeline and will be ignored.

ssube removed this from the v0.7 milestone on Feb 5, 2023
ssube (Owner, Author) commented on Feb 15, 2023

There are still a few issues that can arise around reusing the same model on multiple workers. I'm not sure whether the process worker pool would resolve those, or whether I need to keep track of the device in the model cache. However, this is mostly working:

[2023-02-15 03:46:38,536] INFO: onnx_web.serve: request from 100.64.0.23: 25 rounds of EulerAncestralDiscreteScheduler using /data/models/diffusion-knollingcase on any device, 512x512, 6.0, 1995794559 - an astronaut eating a hamburger
[2023-02-15 03:46:38,537] DEBUG: onnx_web.server.device_pool: pruning 0 of 1 pending jobs
[2023-02-15 03:46:38,537] DEBUG: onnx_web.server.device_pool: jobs queued by device: [(0, 2), (1, 1)]
[2023-02-15 03:46:38,537] INFO: onnx_web.server.device_pool: assigning job txt2img_1995794559_9cba334ddf90f4c92a7fcb2c6238394a942c86cc872efffdf2cb2345d44a84b5_1676432798.png to device 1: cuda - CUDAExecutionProvider ({'device_id': 1})
[2023-02-15 03:46:38,540] DEBUG: onnx_web.server.device_pool: job txt2img_1995794559_9cba334ddf90f4c92a7fcb2c6238394a942c86cc872efffdf2cb2345d44a84b5_1676432798.png assigned to device cuda - CUDAExecutionProvider ({'device_id': 1})
[2023-02-15 03:46:42,444] DEBUG: onnx_web.server.device_pool: setting progress for job txt2img_1457963384_7109d8eee4a2df3670903abff9e0436d18b02c8b1b9521139aba253c3f3f46ac_1676432789.png to 0
[2023-02-15 03:46:47,117] INFO: onnx_web.serve: request from 100.64.0.23: 25 rounds of EulerAncestralDiscreteScheduler using /data/models/stable-diffusion-onnx-v1-5 on any device, 512x512, 6.0, 1923654220 - an astronaut eating a hamburger
[2023-02-15 03:46:47,117] DEBUG: onnx_web.server.device_pool: pruning 0 of 2 pending jobs
[2023-02-15 03:46:47,117] DEBUG: onnx_web.server.device_pool: jobs queued by device: [(0, 2), (1, 2)]
[2023-02-15 03:46:47,117] INFO: onnx_web.server.device_pool: assigning job txt2img_1923654220_ce18927320e7397fbc88a7ef6a968386d80f20a7eda45122b492743f4dc44934_1676432807.png to device 0: cuda - CUDAExecutionProvider ({'device_id': 0})


One of the possible cache errors:

[2023-02-15 03:46:29,413] WARNING: onnx_web.server.device_pool: job txt2img_1277851170_17596a8e8d7a8d6b0d8956120e6e1adbb7253014ae50f17f4368f71373573ad3_1676432780.png failed with an error:
Traceback (most recent call last):
  File "/onnx-web/api/onnx_web/server/device_pool.py", line 243, in job_done
    f.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/onnx-web/api/onnx_web/diffusion/run.py", line 58, in run_txt2img_pipeline
    result = pipe(
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py", line 296, in __call__
    [self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py", line 296, in <listcomp>
    [self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/diffusers/pipelines/onnx_utils.py", line 61, in __call__
    return self.model.run(None, inputs)
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED ; GPU=0 ; hostname=a0bd396b64fd ; expr=cublasCreate(&cublas_handle_);
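
One way to avoid handing a session created for one GPU to a job on another, as mentioned above, would be to key the model cache on the device as well as the model name. A hypothetical sketch, not the actual onnx-web cache:

```python
from typing import Any, Callable, Dict, Tuple

class DeviceModelCache:
    """Hypothetical cache keyed by (model, device), so a pipeline loaded for
    cuda:1 is never reused by a job scheduled on cuda:0."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], Any] = {}

    def get_or_load(self, model: str, device: str, loader: Callable[[], Any]) -> Any:
        key = (model, device)
        if key not in self._cache:
            self._cache[key] = loader()
        return self._cache[key]
```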

ssube added this to the v0.7 milestone on Feb 15, 2023
ssube added the status/fixed label and removed the status/progress label on Feb 15, 2023
ssube mentioned this issue on Feb 15, 2023 (51 tasks)
ssube closed this as completed on Feb 15, 2023