
Failures on insufficient GPU memory #94

Closed
Johnz86 opened this issue Sep 6, 2023 · 5 comments

Comments

Johnz86 commented Sep 6, 2023

This is my local setup:

NVIDIA-SMI 535.103   Driver Version: 537.13   CUDA Version: 12.2
GPU 0: NVIDIA RTX A2000 Laptop GPU | Persistence-M On | Bus-Id 00000000:01:00.0 | Disp.A On | Volatile Uncorr. ECC N/A
Fan N/A | Temp 49C | Perf P0 | Pwr:Usage/Cap 11W / 40W | Memory-Usage 3886MiB / 4096MiB | GPU-Util 2% | Compute M. Default | MIG M. N/A

The container seems to start and load the model:

PS C:\Users\z0034zpz> docker run -d --rm -p 8008:8008 --env SERVER_API_TOKEN=LocalTokenInDockerContainer -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting
a455ae67d1b7b829460361bec6d3530629f7d2c1577b5e5c495318d5f378223a
PS C:\Users\z0034zpz> docker logs -f a455

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

20230906 12:56:00 adding job model-contrastcode-3b-multi-0.cfg
20230906 12:56:00 adding job enum_gpus.cfg
20230906 12:56:00 adding job filetune.cfg
20230906 12:56:00 adding job filetune_filter_only.cfg
20230906 12:56:00 adding job process_uploaded.cfg
20230906 12:56:00 adding job webgui.cfg
20230906 12:56:00 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi --compile
 -> pid 30
20230906 12:56:00 CVD= starting python -m self_hosting_machinery.scripts.enum_gpus
 -> pid 31
20230906 12:56:00 CVD= starting python -m self_hosting_machinery.webgui.webgui
 -> pid 32
-- 32 -- 20230906 12:56:00 WEBUI Started server process [32]
-- 32 -- 20230906 12:56:00 WEBUI Waiting for application startup.
-- 32 -- 20230906 12:56:00 WEBUI Application startup complete.
-- 32 -- 20230906 12:56:00 WEBUI Uvicorn running on http://0.0.0.0:8008 (Press CTRL+C to quit)
-- 30 -- 20230906 12:56:06 MODEL STATUS loading model
-- 32 -- 20230906 12:56:23 WEBUI Invalid HTTP request received.
-- 30 -- 20230906 12:56:44 MODEL STATUS test batch
20230906 12:57:22 30 finished python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi @:gpu00, retcode 0
/finished compiling as recognized by watchdog
20230906 12:57:23 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi
 -> pid 105
-- 105 -- 20230906 12:57:25 MODEL STATUS loading model
-- 105 -- 20230906 12:57:51 MODEL STATUS test batch
-- 32 -- 20230906 12:58:15 WEBUI Invalid HTTP request received.

I tried to run the VSCode extension with and without an API key:
(screenshot)
I tried to use the extension, but it stays in progress forever:
(screenshot)
The logs inside the container signal an issue, but do not specify what:

-- 32 -- 20230906 13:25:05 WEBUI 127.0.0.1:45898 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 32 -- 20230906 13:25:08 WEBUI Invalid HTTP request received.

Only when I open the webui can I see:
(screenshot)
Required memory exceeds the GPU's memory.

Could you please improve the logs so that a more detailed message is visible, and provide a clear warning that there is not enough memory on the graphics card?
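For reference, a quick way to check how much VRAM is actually free on the host (assuming nvidia-smi is on the PATH) is:

nvidia-smi --query-gpu=memory.free,memory.total --format=csv

With 3886MiB of 4096MiB already in use here, there is very little headroom left for the model.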

mitya52 (Member) commented Sep 6, 2023

@Johnz86 hi!

'Required memory exceeds the GPU's memory' is just a warning; it does not affect inference at all. But CONTRASTcode/3B requires ~8GB of VRAM at full context, so using 4GB on large files can lead to OOM. This warning is unclear and we will fix it in the future.
I think your problem is in the inference URL (infurl): change https to http.
SERVER_API_TOKEN is not used by the new docker container, so you can remove it.
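For example (a sketch of the suggested setup, with http://localhost:8008 as an assumed inference URL), the container can be started without the token:

docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting

and the plugin then pointed at http://localhost:8008 instead of an https URL.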

Johnz86 (Author) commented Sep 7, 2023

I tried it with http, and "Invalid HTTP request received." no longer appears in the logs. The issue is that no inference is happening, and I cannot determine from the logs or from any response what the state of the process is.
(screenshot)
Is there any way to determine whether I should wait for the inference, or whether the process does not work at all?

mitya52 (Member) commented Sep 7, 2023

@Johnz86 you can check the error that occurred in refact.ai below the chat (yellow box). Also, please share the server logs; it should be an OOM or something like that.
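For example, a quick way to pull the relevant lines out of the container logs (assuming grep is available on the host; <container-id> is a placeholder) would be:

docker logs <container-id> 2>&1 | grep -iE "error|out of memory|cuda"

On Windows PowerShell, Select-String can be used instead of grep.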

Johnz86 (Author) commented Sep 7, 2023

Here is an example of docker container logs:

PS C:\Users\z0034zpz> docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting
9b1d43ca00bbe3f68e05876e2c18da266348a81cdd851553494643b27ae9afcc
PS C:\Users\z0034zpz> docker logs -f 9b1d

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

20230907 09:06:14 adding job model-contrastcode-3b-multi-0.cfg
20230907 09:06:14 adding job enum_gpus.cfg
20230907 09:06:14 adding job filetune.cfg
20230907 09:06:14 adding job filetune_filter_only.cfg
20230907 09:06:14 adding job process_uploaded.cfg
20230907 09:06:14 adding job webgui.cfg
20230907 09:06:14 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi --compile
 -> pid 31
20230907 09:06:14 CVD= starting python -m self_hosting_machinery.scripts.enum_gpus
 -> pid 32
20230907 09:06:14 CVD= starting python -m self_hosting_machinery.webgui.webgui
 -> pid 33
-- 33 -- 20230907 09:06:14 WEBUI Started server process [33]
-- 33 -- 20230907 09:06:14 WEBUI Waiting for application startup.
-- 33 -- 20230907 09:06:14 WEBUI Application startup complete.
-- 33 -- 20230907 09:06:14 WEBUI Uvicorn running on http://0.0.0.0:8008 (Press CTRL+C to quit)
-- 31 -- 20230907 09:06:19 MODEL STATUS loading model
-- 31 -- 20230907 09:07:03 MODEL STATUS test batch
20230907 09:07:46 31 finished python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi @:gpu00, retcode 0
/finished compiling as recognized by watchdog
20230907 09:07:47 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi
 -> pid 111
-- 111 -- 20230907 09:07:50 MODEL STATUS loading model
-- 33 -- 20230907 09:07:51 WEBUI 172.17.0.1:41986 - "GET /v1/login HTTP/1.1" 200
-- 111 -- 20230907 09:08:17 MODEL STATUS test batch
-- 111 -- 20230907 09:08:52 MODEL STATUS serving CONTRASTcode/3b/multi
-- 111 -- 20230907 09:09:02 MODEL 10008.3ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:02 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:12 MODEL 10004.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:12 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:22 MODEL 10003.5ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:22 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 33 -- 20230907 09:09:26 WEBUI comp-SvVQSvapACW6 model resolve "gpt3.5" -> error "model is not loaded (2)" from XXX
-- 33 -- 20230907 09:09:26 WEBUI 172.17.0.1:41990 - "POST /v1/chat HTTP/1.1" 400
-- 111 -- 20230907 09:09:32 MODEL 10005.4ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:32 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:42 MODEL 10005.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:42 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:52 MODEL 10002.7ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:52 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:02 MODEL 10002.7ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:02 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:12 MODEL 10003.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:12 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:22 MODEL 10006.3ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:22 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200

Here is how it looks in the UI:
(screenshot)

olegklimov (Contributor) commented
"model is not loaded (2)" -- it can't access the model, according to the logs.

I guess the good way to go about solving this is to react to configuration changes faster.

#158
