
Failures on insufficient GPU memory #94

Closed
Johnz86 opened this issue Sep 6, 2023 · 5 comments

Comments

Johnz86 commented Sep 6, 2023

This is my local setup:

NVIDIA-SMI 535.103   Driver Version: 537.13   CUDA Version: 12.2
GPU 0: NVIDIA RTX A2000 Laptop GPU | Persistence-M On | Bus-Id 00000000:01:00.0 | Disp.A On | Volatile Uncorr. ECC N/A
Fan N/A | Temp 49C | Perf P0 | Pwr:Usage/Cap 11W / 40W | Memory-Usage 3886MiB / 4096MiB | GPU-Util 2% | Compute M. Default | MIG M. N/A

The container seems to start and load the model:

PS C:\Users\z0034zpz> docker run -d --rm -p 8008:8008 --env SERVER_API_TOKEN=LocalTokenInDockerContainer -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting
a455ae67d1b7b829460361bec6d3530629f7d2c1577b5e5c495318d5f378223a
PS C:\Users\z0034zpz> docker logs -f a455

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

20230906 12:56:00 adding job model-contrastcode-3b-multi-0.cfg
20230906 12:56:00 adding job enum_gpus.cfg
20230906 12:56:00 adding job filetune.cfg
20230906 12:56:00 adding job filetune_filter_only.cfg
20230906 12:56:00 adding job process_uploaded.cfg
20230906 12:56:00 adding job webgui.cfg
20230906 12:56:00 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi --compile
 -> pid 30
20230906 12:56:00 CVD= starting python -m self_hosting_machinery.scripts.enum_gpus
 -> pid 31
20230906 12:56:00 CVD= starting python -m self_hosting_machinery.webgui.webgui
 -> pid 32
-- 32 -- 20230906 12:56:00 WEBUI Started server process [32]
-- 32 -- 20230906 12:56:00 WEBUI Waiting for application startup.
-- 32 -- 20230906 12:56:00 WEBUI Application startup complete.
-- 32 -- 20230906 12:56:00 WEBUI Uvicorn running on http://0.0.0.0:8008 (Press CTRL+C to quit)
-- 30 -- 20230906 12:56:06 MODEL STATUS loading model
-- 32 -- 20230906 12:56:23 WEBUI Invalid HTTP request received.
-- 30 -- 20230906 12:56:44 MODEL STATUS test batch
20230906 12:57:22 30 finished python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi @:gpu00, retcode 0
/finished compiling as recognized by watchdog
20230906 12:57:23 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi
 -> pid 105
-- 105 -- 20230906 12:57:25 MODEL STATUS loading model
-- 105 -- 20230906 12:57:51 MODEL STATUS test batch
-- 32 -- 20230906 12:58:15 WEBUI Invalid HTTP request received.

I tried to run the VSCode extension with and without an API key:
(screenshot)
I tried to use the extension, but it stays in progress forever:
(screenshot)
The logs inside the container signal an issue, but do not specify what:

-- 32 -- 20230906 13:25:05 WEBUI 127.0.0.1:45898 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 32 -- 20230906 13:25:08 WEBUI Invalid HTTP request received.

Only when I open the webui can I see:
(screenshot)
Required memory exceeds the GPU's memory.

Could you please improve the logs so that a more detailed message is visible, and provide a clear warning that there is not enough memory on the graphics card?
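For reference, a quick way to check how much VRAM is actually free on the host (assuming nvidia-smi is on the PATH) is:

nvidia-smi --query-gpu=memory.free,memory.total --format=csv

With 3886MiB of 4096MiB already in use here, there is very little headroom left for the model.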

mitya52 (Member) commented Sep 6, 2023

@Johnz86 hi!

'Required memory exceeds the GPU's memory' is just a warning; it does not affect inference at all. But CONTRASTcode/3B requires ~8GB of VRAM at full context, so using 4GB on large files can lead to OOM. This warning is unclear and we will fix it in the future.
I think your problem is in the inference URL (infurl): change https to http.
SERVER_API_TOKEN is not used by the new docker container, so you can remove it.
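For example (a sketch of the suggested setup, with http://localhost:8008 as an assumed inference URL), the container can be started without the token:

docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting

and the plugin then pointed at http://localhost:8008 instead of an https URL.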

Johnz86 (Author) commented Sep 7, 2023

I tried it with http, and "Invalid HTTP request received." no longer appears in the logs. The issue is that no inference is happening, and I cannot determine from the logs or from any response what the state of the process is.
(screenshot)
Is there any way to determine whether I should wait for the inference, or whether the process does not work at all?

mitya52 (Member) commented Sep 7, 2023

@Johnz86 you can check the error that occurred in refact.ai below the chat (yellow box). Also, please share the server logs; it should be an OOM or something like that.
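For example, a quick way to pull the relevant lines out of the container logs (assuming grep is available on the host; <container-id> is a placeholder) would be:

docker logs <container-id> 2>&1 | grep -iE "error|out of memory|cuda"

On Windows PowerShell, Select-String can be used instead of grep.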

Johnz86 (Author) commented Sep 7, 2023

Here is an example of docker container logs:

PS C:\Users\z0034zpz> docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting
9b1d43ca00bbe3f68e05876e2c18da266348a81cdd851553494643b27ae9afcc
PS C:\Users\z0034zpz> docker logs -f 9b1d

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

20230907 09:06:14 adding job model-contrastcode-3b-multi-0.cfg
20230907 09:06:14 adding job enum_gpus.cfg
20230907 09:06:14 adding job filetune.cfg
20230907 09:06:14 adding job filetune_filter_only.cfg
20230907 09:06:14 adding job process_uploaded.cfg
20230907 09:06:14 adding job webgui.cfg
20230907 09:06:14 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi --compile
 -> pid 31
20230907 09:06:14 CVD= starting python -m self_hosting_machinery.scripts.enum_gpus
 -> pid 32
20230907 09:06:14 CVD= starting python -m self_hosting_machinery.webgui.webgui
 -> pid 33
-- 33 -- 20230907 09:06:14 WEBUI Started server process [33]
-- 33 -- 20230907 09:06:14 WEBUI Waiting for application startup.
-- 33 -- 20230907 09:06:14 WEBUI Application startup complete.
-- 33 -- 20230907 09:06:14 WEBUI Uvicorn running on http://0.0.0.0:8008 (Press CTRL+C to quit)
-- 31 -- 20230907 09:06:19 MODEL STATUS loading model
-- 31 -- 20230907 09:07:03 MODEL STATUS test batch
20230907 09:07:46 31 finished python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi @:gpu00, retcode 0
/finished compiling as recognized by watchdog
20230907 09:07:47 CVD=0 starting python -m self_hosting_machinery.inference.inference_worker --model CONTRASTcode/3b/multi
 -> pid 111
-- 111 -- 20230907 09:07:50 MODEL STATUS loading model
-- 33 -- 20230907 09:07:51 WEBUI 172.17.0.1:41986 - "GET /v1/login HTTP/1.1" 200
-- 111 -- 20230907 09:08:17 MODEL STATUS test batch
-- 111 -- 20230907 09:08:52 MODEL STATUS serving CONTRASTcode/3b/multi
-- 111 -- 20230907 09:09:02 MODEL 10008.3ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:02 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:12 MODEL 10004.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:12 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:22 MODEL 10003.5ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:22 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 33 -- 20230907 09:09:26 WEBUI comp-SvVQSvapACW6 model resolve "gpt3.5" -> error "model is not loaded (2)" from XXX
-- 33 -- 20230907 09:09:26 WEBUI 172.17.0.1:41990 - "POST /v1/chat HTTP/1.1" 400
-- 111 -- 20230907 09:09:32 MODEL 10005.4ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:32 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:42 MODEL 10005.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:42 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:09:52 MODEL 10002.7ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:09:52 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:02 MODEL 10002.7ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:02 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:12 MODEL 10003.0ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:12 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200
-- 111 -- 20230907 09:10:22 MODEL 10006.3ms http://127.0.0.1:8008/infengine-v1/completions-wait-batch WAIT
-- 33 -- 20230907 09:10:22 WEBUI 127.0.0.1:41988 - "POST /infengine-v1/completions-wait-batch HTTP/1.1" 200

Here is how it looks in the UI:
(screenshot)

olegklimov (Contributor) commented
"model is not loaded (2)" -- it can't access the model, according to the logs.

I guess the good way to go about solving this is to react to configuration changes faster.

#158
