
support more than one simultaneous GPU/worker #38

Closed
ssube opened this issue Jan 13, 2023 · 8 comments · Fixed by #91
Labels: status/fixed (issues that have been fixed and released), type/feature (new features)
Milestone: v0.7

Comments

ssube (Owner) commented on Jan 13, 2023

Even with the thread limit from #15, making multiple requests to generate images, even with the same pipeline and model, will likely crash the API:

reusing existing pipeline
10.2.2.16 - - [13/Jan/2023 10:34:50] "POST /api/txt2img?cfg=6.0&steps=25&model=stable-diffusion-onnx-v1-5&platform=amd&scheduler=euler-a&seed=-1&prompt=a+stone+magnifying+glass,+a+wooden+desk,+steampunk,+realistic,+highly+detailed,+oil+painting&negativePrompt=&width=512&height=512 HTTP/1.1" 200 -
10.2.2.16 - - [13/Jan/2023 10:34:50] "GET /api/ready?output=txt2img_1200220748_9ff672bc153bad59ccd8bf173beb011cf170aaa3ef6f579713ab1ceaf5fbf04c.png HTTP/1.1" 200 -
2023-01-13 10:34:50.0965716 [E:onnxruntime:, sequential_executor.cc:369 onnxruntime::SequentialExecutor::Execute] Non-zero status code returned while running Reshape node. Name:'/up_blocks.3/attentions.1/transformer_blocks.0/attn2/Reshape_2' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1971)\onnxruntime_pybind11_state.pyd!00007FFAF7CC11AB: (caller: 00007FFAF82D0F7F) Exception(2) tid(c90) 8007023E {Application Error}
The exception %s (0x
 13%|████████████████████▌ | 10/75 [00:04<00:28,  2.25it/s]

(onnx_env) ssube@SSUBE-DESKTOP C:\Users\ssube\stabdiff\onnx-try-2\onnx-web\api>flask --app=onnx_web.serve run --host=0.0.0.0
 * Serving Flask app 'onnx_web.serve'
ssube added the status/new (issues that have not been confirmed yet) and type/bug (broken features) labels on Jan 13, 2023
ssube added this to the v0.5 milestone on Jan 13, 2023
ssube (Owner, Author) commented on Jan 19, 2023

This was a result of setting the number of background workers to a value greater than the number of GPUs I had available, causing multiple pipelines to run simultaneously and compete for resources. Reducing the number of workers to 1 fixes the crashes, but does not allow more than one GPU to be used.
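
As a rough sketch of the direction this points (purely illustrative, not the actual onnx-web worker code), the pool can be sized to the device list so that each GPU only ever runs one job at a time:

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

# Illustrative only: one queue and one worker thread per device, so at most
# one pipeline runs on each GPU at any time. Names here are hypothetical.
devices = ["cuda:0", "cuda:1"]
queues = {device: Queue() for device in devices}
pool = ThreadPoolExecutor(max_workers=len(devices))

def device_worker(device: str) -> None:
    while True:
        job = queues[device].get()
        if job is None:  # shutdown sentinel
            break
        job(device)  # run the pipeline pinned to this device

for device in devices:
    pool.submit(device_worker, device)
```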

ssube removed this from the v0.5 milestone on Jan 19, 2023
ssube added the type/feature (new features) label and removed the type/bug label on Jan 19, 2023
ssube changed the title from "prevent API from crashing during concurrent image requests" to "support more than one simultaneous GPU/worker" on Jan 19, 2023
ssube added this to the v0.6 milestone on Jan 22, 2023
ssube added the status/progress (issues that are in progress and have a branch) label and removed the status/new label on Jan 31, 2023
elliotcourant commented
Would this require multiple GPUs? I might be able to help with that on one of my servers as long as the pipeline doesn't require full PCIe bandwidth for each GPU.

ssube (Owner, Author) commented on Jan 31, 2023

@elliotcourant I want to test both cases: 2 different GPUs with different device IDs, and 1 GPU with enough memory for 2 models and 2 simultaneous workers. I don't think PCIe bandwidth matters as much for this as it does for gaming, since the model is usually loaded once and run many times, but I expect it will have some impact on those load times at least.

ssube added the status/fixed label and removed the status/progress label on Feb 4, 2023
ssube (Owner, Author) commented on Feb 4, 2023

This is mostly implemented, although I need to make sure all of the pipeline code is using .to(device).

2023-02-04T21:04:09.740185783Z [2023-02-04 21:04:09,739] INFO: onnx_web.serve: overriding default platform to cuda
2023-02-04T21:04:09.756684842Z [2023-02-04 21:04:09,756] INFO: onnx_web.serve: available acceleration platforms: cuda:0 - CUDAExecutionProvider, cuda:1 - CUDAExecutionProvider
2023-02-04T21:04:09.758146513Z [2023-02-04 21:04:09,757] INFO: onnx_web.device_pool: creating thread pool executor for 2 devices: ['cuda:0', 'cuda:1']
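
For reference, a sketch of how a device list like the one in that log could be assembled; counting CUDA devices through torch is an assumption for illustration, not necessarily what onnx-web actually does:

```python
import onnxruntime

def list_devices():
    """Sketch: one (name, provider, options) entry per visible CUDA device."""
    devices = []
    if "CUDAExecutionProvider" in onnxruntime.get_available_providers():
        import torch  # assumption: torch is available for device counting
        for i in range(torch.cuda.device_count()):
            devices.append((f"cuda:{i}", "CUDAExecutionProvider", {"device_id": i}))
    if not devices:
        # fall back to CPU when no accelerator provider is available
        devices.append(("cpu", "CPUExecutionProvider", {}))
    return devices
```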


ssube (Owner, Author) commented on Feb 4, 2023

This is sort of working, but device_id is not being passed through properly, so jobs may land on the same CUDA device.


ssube reopened this on Feb 4, 2023
ssube (Owner, Author) commented on Feb 5, 2023

Passing session options requires the SessionOptions type:

2023-02-05T03:45:16.273561080Z [2023-02-05 03:45:16,273] WARNING: onnx_web.device_pool: job txt2img_2106418342_0f3f2e6aac66681d43af44d681f1418615afe68d35edd5254f6ca1148faa5b1a_1675568716.png failed with an error: __init__(): incompatible constructor arguments. The following argument types are supported:
2023-02-05T03:45:16.273582180Z     1. onnxruntime.capi.onnxruntime_pybind11_state.InferenceSession(arg0: onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions, arg1: str, arg2: bool, arg3: bool)
2023-02-05T03:45:16.273592541Z 
2023-02-05T03:45:16.273594891Z Invoked with: {'device_id': 1}, '/data/models/diffusion-openjourney/vae_decoder/model.onnx', True, False
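
For comparison, plain onnxruntime takes these as separate arguments: sess_options must be a SessionOptions instance, while the device_id dict belongs in provider_options, paired with the provider name. A minimal sketch, using the model path from the log above:

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()  # must be a SessionOptions object, not a dict

session = ort.InferenceSession(
    "/data/models/diffusion-openjourney/vae_decoder/model.onnx",
    sess_options=sess_options,
    providers=["CUDAExecutionProvider"],
    provider_options=[{"device_id": 1}],  # device selection lives here
)
```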

Trying to run a second image causes the first job to crash with an error about an unwanted list of tensors:

2023-02-05T04:11:50.027730899Z 
  0%|          | 0/50 [00:00<?, ?it/s]
  2%|▏         | 1/50 [00:00<00:11,  4.42it/s]The `scale_model_input` function should be called before `step` to ensure correct denoising. See `StableDiffusionPipeline` for a usage example.
2023-02-05T04:11:50.029684833Z [2023-02-05 04:11:50,029] WARNING: onnx_web.device_pool: job txt2img_1798687040_e1773f83e93b6ca71035044bbd97b805c05847788e9fa4b11ae866e02694cc82_1675570309.png failed with an error: only one element tensors can be converted to Python scalars

ssube added the status/progress label and removed the status/fixed label on Feb 5, 2023
ssube modified the milestones: v0.6 → v0.7 on Feb 5, 2023
ssube (Owner, Author) commented on Feb 5, 2023

The LPW pipeline from #27 is not aware of the provider_options either and needs to be patched:

Keyword arguments {'provider_options': None} are not expected by OnnxStableDiffusionLongPromptWeightingPipeline and will be ignored.

ssube removed this from the v0.7 milestone on Feb 5, 2023
ssube (Owner, Author) commented on Feb 15, 2023

There are still a few issues that can arise around reusing the same model on multiple workers. I'm not sure whether the process worker pool would resolve those, or whether I need to keep track of the device in the model cache. However, this is mostly working:

[2023-02-15 03:46:38,536] INFO: onnx_web.serve: request from 100.64.0.23: 25 rounds of EulerAncestralDiscreteScheduler using /data/models/diffusion-knollingcase on any device, 512x512, 6.0, 1995794559 - an astronaut eating a hamburger
[2023-02-15 03:46:38,537] DEBUG: onnx_web.server.device_pool: pruning 0 of 1 pending jobs
[2023-02-15 03:46:38,537] DEBUG: onnx_web.server.device_pool: jobs queued by device: [(0, 2), (1, 1)]
[2023-02-15 03:46:38,537] INFO: onnx_web.server.device_pool: assigning job txt2img_1995794559_9cba334ddf90f4c92a7fcb2c6238394a942c86cc872efffdf2cb2345d44a84b5_1676432798.png to device 1: cuda - CUDAExecutionProvider ({'device_id': 1})
[2023-02-15 03:46:38,540] DEBUG: onnx_web.server.device_pool: job txt2img_1995794559_9cba334ddf90f4c92a7fcb2c6238394a942c86cc872efffdf2cb2345d44a84b5_1676432798.png assigned to device cuda - CUDAExecutionProvider ({'device_id': 1})
[2023-02-15 03:46:42,444] DEBUG: onnx_web.server.device_pool: setting progress for job txt2img_1457963384_7109d8eee4a2df3670903abff9e0436d18b02c8b1b9521139aba253c3f3f46ac_1676432789.png to 0
[2023-02-15 03:46:47,117] INFO: onnx_web.serve: request from 100.64.0.23: 25 rounds of EulerAncestralDiscreteScheduler using /data/models/stable-diffusion-onnx-v1-5 on any device, 512x512, 6.0, 1923654220 - an astronaut eating a hamburger
[2023-02-15 03:46:47,117] DEBUG: onnx_web.server.device_pool: pruning 0 of 2 pending jobs
[2023-02-15 03:46:47,117] DEBUG: onnx_web.server.device_pool: jobs queued by device: [(0, 2), (1, 2)]
[2023-02-15 03:46:47,117] INFO: onnx_web.server.device_pool: assigning job txt2img_1923654220_ce18927320e7397fbc88a7ef6a968386d80f20a7eda45122b492743f4dc44934_1676432807.png to device 0: cuda - CUDAExecutionProvider ({'device_id': 0})


One of the possible cache errors:

[2023-02-15 03:46:29,413] WARNING: onnx_web.server.device_pool: job txt2img_1277851170_17596a8e8d7a8d6b0d8956120e6e1adbb7253014ae50f17f4368f71373573ad3_1676432780.png failed with an error:
Traceback (most recent call last):
  File "/onnx-web/api/onnx_web/server/device_pool.py", line 243, in job_done
    f.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/onnx-web/api/onnx_web/diffusion/run.py", line 58, in run_txt2img_pipeline
    result = pipe(
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py", line 296, in __call__
    [self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_onnx_stable_diffusion.py", line 296, in <listcomp>
    [self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/diffusers/pipelines/onnx_utils.py", line 61, in __call__
    return self.model.run(None, inputs)
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:124 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:117 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cublasStatus_t; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUBLAS failure 3: CUBLAS_STATUS_ALLOC_FAILED ; GPU=0 ; hostname=a0bd396b64fd ; expr=cublasCreate(&cublas_handle_);
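
One way to avoid handing a session created for one GPU to a job on another, as mentioned above, would be to key the model cache on the device as well as the model name. A hypothetical sketch, not the actual onnx-web cache:

```python
from typing import Any, Callable, Dict, Tuple

class DeviceModelCache:
    """Hypothetical cache keyed by (model, device), so a pipeline loaded for
    cuda:1 is never reused by a job scheduled on cuda:0."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], Any] = {}

    def get_or_load(self, model: str, device: str, loader: Callable[[], Any]) -> Any:
        key = (model, device)
        if key not in self._cache:
            self._cache[key] = loader()
        return self._cache[key]
```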

ssube added this to the v0.7 milestone on Feb 15, 2023
ssube added the status/fixed label and removed the status/progress label on Feb 15, 2023
ssube mentioned this issue on Feb 15, 2023 (51 tasks)
ssube closed this as completed on Feb 15, 2023