This repository was archived by the owner on Aug 7, 2025. It is now read-only.

Using CUDA_VISIBLE_DEVICES cannot set NVIDIA device visibility #2515

@twwch

Description


🐛 Describe the bug

GPUs 0, 1, and 2 on my server are occupied by other programs, so following https://pytorch.org/serve/configuration.html?highlight=ts+config I set NVIDIA device visibility with CUDA_VISIBLE_DEVICES, but it does not take effect. The program still sees all GPUs and fails with RuntimeError: 507 - System out of memory.
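For context on the "invalid device ordinal" error in the logs below: when CUDA_VISIBLE_DEVICES is honored, the listed physical devices are renumbered so the first one becomes ordinal 0 inside the process. A minimal sketch of that renumbering (plain Python, no CUDA needed; `visible_ordinals` is a made-up helper for illustration, not a torch API):

```python
def visible_ordinals(cuda_visible_devices: str) -> dict:
    """Map the in-process CUDA ordinals to the physical device IDs
    listed in CUDA_VISIBLE_DEVICES (first listed ID becomes ordinal 0)."""
    physical = [int(d) for d in cuda_visible_devices.split(",") if d.strip()]
    return {ordinal: phys for ordinal, phys in enumerate(physical)}

mapping = visible_ordinals("3,4,5,6,7")
print(mapping)       # {0: 3, 1: 4, 2: 5, 3: 6, 4: 7}
print(5 in mapping)  # False: a worker asking for ordinal 5 would fail
```

So with five devices exposed, only ordinals 0-4 are valid; any code that still addresses physical IDs like 3 or 7, or an ordinal beyond 4, hits "invalid device ordinal".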

Error logs

2023-08-04T09:27:13,680 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-08-04T09:27:13,732 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-08-04T09:27:13,978 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.8.1
TS Home: /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages
Current directory: /data0/chenhao/codes/serve/examples/Huggingface_Transformers
Temp directory: /tmp
Metrics config path: /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 5
Number of CPUs: 192
Max heap size: 30688 M
Python executable: /home/zhangchong/miniconda3/envs/bs-model/bin/python
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /data0/chenhao/codes/serve/examples/Huggingface_Transformers/model_store
Initial Models: cama164w=cama164w.mar
Log dir: /data0/chenhao/codes/serve/examples/Huggingface_Transformers/logs
Metrics dir: /data0/chenhao/codes/serve/examples/Huggingface_Transformers/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 5
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.|http(s)?://.]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: log
Disable system metrics: false
Workflow Store: /data0/chenhao/codes/serve/examples/Huggingface_Transformers/model_store
Model config: N/A
2023-08-04T09:27:13,985 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2023-08-04T09:27:14,000 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: cama164w.mar
2023-08-04T09:31:25,024 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model cama164w
2023-08-04T09:31:25,024 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model cama164w
2023-08-04T09:31:25,024 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model cama164w loaded.
2023-08-04T09:31:25,025 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: cama164w, count: 5
2023-08-04T09:31:25,038 [DEBUG] W-9000-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,038 [DEBUG] W-9002-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9002, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,040 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-08-04T09:31:25,038 [DEBUG] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9004, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,038 [DEBUG] W-9001-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9001, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,038 [DEBUG] W-9003-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9003, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,154 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-08-04T09:31:25,154 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-08-04T09:31:25,156 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-08-04T09:31:25,156 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-08-04T09:31:25,157 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2023-08-04T09:31:25,371 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-08-04T09:31:26,174 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9000, pid=2578423
2023-08-04T09:31:26,175 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2023-08-04T09:31:26,182 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,182 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - [PID]2578423
2023-08-04T09:31:26,183 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,183 [DEBUG] W-9000-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,183 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,187 [INFO ] W-9000-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2023-08-04T09:31:26,195 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2023-08-04T09:31:26,197 [INFO ] W-9000-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686197
2023-08-04T09:31:26,218 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:26,235 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9003, pid=2578421
2023-08-04T09:31:26,236 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9003
2023-08-04T09:31:26,242 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9004, pid=2578422
2023-08-04T09:31:26,243 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,243 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9004
2023-08-04T09:31:26,243 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - [PID]2578421
2023-08-04T09:31:26,243 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,243 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9002, pid=2578419
2023-08-04T09:31:26,243 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,244 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9002
2023-08-04T09:31:26,243 [DEBUG] W-9003-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9003-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,243 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9001, pid=2578420
2023-08-04T09:31:26,244 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9001
2023-08-04T09:31:26,244 [INFO ] W-9003-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9003
2023-08-04T09:31:26,247 [INFO ] W-9003-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686247
2023-08-04T09:31:26,247 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9003.
2023-08-04T09:31:26,250 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,250 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - [PID]2578422
2023-08-04T09:31:26,251 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,251 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,251 [DEBUG] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9004-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,251 [INFO ] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9004
2023-08-04T09:31:26,251 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,251 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,252 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - [PID]2578419
2023-08-04T09:31:26,252 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - [PID]2578420
2023-08-04T09:31:26,252 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,252 [DEBUG] W-9002-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,252 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,252 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,252 [DEBUG] W-9001-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,252 [INFO ] W-9002-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9002
2023-08-04T09:31:26,252 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,256 [INFO ] W-9001-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9001
2023-08-04T09:31:26,258 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9004.
2023-08-04T09:31:26,257 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:26,258 [INFO ] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686258
2023-08-04T09:31:26,259 [INFO ] W-9002-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686259
2023-08-04T09:31:26,259 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9002.
2023-08-04T09:31:26,260 [INFO ] W-9001-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686260
2023-08-04T09:31:26,260 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9001.
2023-08-04T09:31:26,276 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:26,277 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:26,277 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:27,925 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,926 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:271.55054473876953|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,926 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:150.51158142089844|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:35.7|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:96.89697265625|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:39689.0|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:97.20947265625|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:39817.0|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:98.03955078125|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:40157.0|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:1007783.95703125|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:17006.27734375|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:2.3|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:31,215 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,215 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,217 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:31:31,226 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,226 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,227 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:31:31,232 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,233 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,233 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:31:31,244 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,244 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,244 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:31:31,245 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,245 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,245 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:271.55013275146484|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:150.51199340820312|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:35.7|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:96.89697265625|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:39689.0|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:97.20947265625|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:39817.0|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:98.03955078125|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:40157.0|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:723539.37890625|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:301250.87109375|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:29.8|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Load model cama164w cuda OOM, exception CUDA error: invalid device ordinal
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py", line 131, in load_model
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - service = model_loader.load(
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_loader.py", line 135, in load
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - initialize_fn(service.context)
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/tmp/models/2c2150aaa38d4b6ba57b7a2023d4dae2/Transformer_handler_generalized.py", line 104, in initialize
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - self.model.to(self.device)
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1900, in to
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - return super().to(*args, **kwargs)
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - return self._apply(convert)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - module._apply(fn)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - module._apply(fn)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - param_applied = fn(param)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - RuntimeError: CUDA error: invalid device ordinal
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2023-08-04T09:33:00,171 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2023-08-04T09:33:00,171 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
2023-08-04T09:33:02,409 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG -
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Backend worker process died.
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py", line 253, in
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - worker.run_server()
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py", line 221, in run_server
2023-08-04T09:33:02,412 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - self.handle_connection(cl_socket)
2023-08-04T09:33:02,412 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py", line 189, in handle_connection
2023-08-04T09:33:02,412 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - raise RuntimeError("{} - {}".format(code, result))
2023-08-04T09:33:02,412 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - RuntimeError: 507 - System out of memory
2023-08-04T09:33:02,417 [DEBUG] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - s

Installation instructions

install torchserve from source: yes

Model Packaging

torch-model-archiver --model-name cama164w --version 1.0 --serialized-file /data/models/CaMA-164w/pytorch_model.bin --handler ./Transformer_handler_generalized.py --extra-files "/data/models/CaMA-164w/config.json,./setup_config.json"

config.properties

number_of_gpu=5
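With number_of_gpu=5 and five default workers per model, TorchServe presumably hands each worker one device ordinal in round-robin order. A sketch of that assumption (not TorchServe's actual scheduler code): the ordinals it produces are only valid if the CUDA runtime really exposes five devices to the worker process.

```python
# Hypothetical round-robin worker-to-GPU assignment, an assumption about
# TorchServe's behavior for illustration only: gpu_id = worker_index % number_of_gpu.
number_of_gpu = 5
workers = 5
assignment = {f"W-{9000 + i}": i % number_of_gpu for i in range(workers)}
print(assignment)
# {'W-9000': 0, 'W-9001': 1, 'W-9002': 2, 'W-9003': 3, 'W-9004': 4}
```

If CUDA_VISIBLE_DEVICES were ignored somewhere in the chain, ordinals 0-4 would instead land on the already-occupied physical GPUs 0-2, which matches the 507 out-of-memory failure.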

Versions


Environment headers

Torchserve branch:

torchserve==0.8.1
torch-model-archiver==0.8.1

Python version: 3.9 (64-bit runtime)
Python executable: /home/zhangchong/miniconda3/envs/bs-model/bin/python

Versions of relevant python libraries:
captum==0.6.0
numpy==1.25.2
nvgpu==0.10.0
psutil==5.9.5
requests==2.31.0
sentencepiece==0.1.99
torch==2.0.1
torch-model-archiver==0.8.1
torchserve==0.8.1
transformers==4.31.0
wheel==0.38.4
**Warning: torchtext not present ..
**Warning: torchvision not present ..
**Warning: torchaudio not present ..

Java Version:

OS: Ubuntu 20.04 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.27.0

Is CUDA available: Yes
CUDA runtime version: 11.7.64
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB
Nvidia driver version: 515.105.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.1

Repro instructions

export CUDA_DEVICE_ORDER="PCI_BUS_ID"

export CUDA_VISIBLE_DEVICES="3,4,5,6,7"

torchserve --start --model-store model_store --ts-config config.properties --models cama164w=cama164w.mar --ncs
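A quick way to sanity-check the repro environment before starting torchserve is to confirm the variable actually reaches the process. This sketch only inspects the environment variable; on a GPU machine, torch.cuda.device_count() inside a worker would be the authoritative check:

```python
import os

# Mirrors the repro export; in practice this would already be set by the shell.
os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,5,6,7"

devices = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(len(devices))  # 5 visible devices, renumbered as ordinals 0-4
```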

Possible Solution

No response

Metadata

Assignees: no one assigned
Labels: triaged (issue has been reviewed and triaged)
Milestone: none