This repository was archived by the owner on Aug 7, 2025. It is now read-only.

Using CUDA_VISIBLE_DEVICES cannot set NVIDIA device visibility #2515

@twwch

Description


🐛 Describe the bug

GPUs 0, 1, and 2 on my server are occupied by other programs, so following https://pytorch.org/serve/configuration.html?highlight=ts+config I set NVIDIA device visibility with CUDA_VISIBLE_DEVICES, but it does not take effect. The program still sees all GPUs and fails with RuntimeError: 507 - System out of memory.
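For context on the "invalid device ordinal" error in the logs below: when CUDA_VISIBLE_DEVICES is honored, the listed physical devices are renumbered so the first one becomes ordinal 0 inside the process. A minimal sketch of that renumbering (plain Python, no CUDA needed; `visible_ordinals` is a made-up helper for illustration, not a torch API):

```python
def visible_ordinals(cuda_visible_devices: str) -> dict:
    """Map the in-process CUDA ordinals to the physical device IDs
    listed in CUDA_VISIBLE_DEVICES (first listed ID becomes ordinal 0)."""
    physical = [int(d) for d in cuda_visible_devices.split(",") if d.strip()]
    return {ordinal: phys for ordinal, phys in enumerate(physical)}

mapping = visible_ordinals("3,4,5,6,7")
print(mapping)       # {0: 3, 1: 4, 2: 5, 3: 6, 4: 7}
print(5 in mapping)  # False: a worker asking for ordinal 5 would fail
```

So with five devices exposed, only ordinals 0-4 are valid; any code that still addresses physical IDs like 3 or 7, or an ordinal beyond 4, hits "invalid device ordinal".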

Error logs

2023-08-04T09:27:13,680 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-08-04T09:27:13,732 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-08-04T09:27:13,978 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.8.1
TS Home: /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages
Current directory: /data0/chenhao/codes/serve/examples/Huggingface_Transformers
Temp directory: /tmp
Metrics config path: /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 5
Number of CPUs: 192
Max heap size: 30688 M
Python executable: /home/zhangchong/miniconda3/envs/bs-model/bin/python
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /data0/chenhao/codes/serve/examples/Huggingface_Transformers/model_store
Initial Models: cama164w=cama164w.mar
Log dir: /data0/chenhao/codes/serve/examples/Huggingface_Transformers/logs
Metrics dir: /data0/chenhao/codes/serve/examples/Huggingface_Transformers/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 5
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.|http(s)?://.]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: log
Disable system metrics: false
Workflow Store: /data0/chenhao/codes/serve/examples/Huggingface_Transformers/model_store
Model config: N/A
2023-08-04T09:27:13,985 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2023-08-04T09:27:14,000 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: cama164w.mar
2023-08-04T09:31:25,024 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model cama164w
2023-08-04T09:31:25,024 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model cama164w
2023-08-04T09:31:25,024 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model cama164w loaded.
2023-08-04T09:31:25,025 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: cama164w, count: 5
2023-08-04T09:31:25,038 [DEBUG] W-9000-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,038 [DEBUG] W-9002-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9002, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,040 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-08-04T09:31:25,038 [DEBUG] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9004, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,038 [DEBUG] W-9001-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9001, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,038 [DEBUG] W-9003-cama164w_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/zhangchong/miniconda3/envs/bs-model/bin/python, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9003, --metrics-config, /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-08-04T09:31:25,154 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-08-04T09:31:25,154 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-08-04T09:31:25,156 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-08-04T09:31:25,156 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-08-04T09:31:25,157 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2023-08-04T09:31:25,371 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-08-04T09:31:26,174 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9000, pid=2578423
2023-08-04T09:31:26,175 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2023-08-04T09:31:26,182 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,182 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - [PID]2578423
2023-08-04T09:31:26,183 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,183 [DEBUG] W-9000-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,183 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,187 [INFO ] W-9000-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2023-08-04T09:31:26,195 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2023-08-04T09:31:26,197 [INFO ] W-9000-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686197
2023-08-04T09:31:26,218 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:26,235 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9003, pid=2578421
2023-08-04T09:31:26,236 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9003
2023-08-04T09:31:26,242 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9004, pid=2578422
2023-08-04T09:31:26,243 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,243 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9004
2023-08-04T09:31:26,243 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - [PID]2578421
2023-08-04T09:31:26,243 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,243 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9002, pid=2578419
2023-08-04T09:31:26,243 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,244 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9002
2023-08-04T09:31:26,243 [DEBUG] W-9003-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9003-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,243 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - s_name_part0=/tmp/.ts.sock, s_name_part1=9001, pid=2578420
2023-08-04T09:31:26,244 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9001
2023-08-04T09:31:26,244 [INFO ] W-9003-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9003
2023-08-04T09:31:26,247 [INFO ] W-9003-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686247
2023-08-04T09:31:26,247 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9003.
2023-08-04T09:31:26,250 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,250 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - [PID]2578422
2023-08-04T09:31:26,251 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,251 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,251 [DEBUG] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9004-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,251 [INFO ] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9004
2023-08-04T09:31:26,251 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,251 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Successfully loaded /home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-08-04T09:31:26,252 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - [PID]2578419
2023-08-04T09:31:26,252 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - [PID]2578420
2023-08-04T09:31:26,252 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,252 [DEBUG] W-9002-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,252 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Torch worker started.
2023-08-04T09:31:26,252 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,252 [DEBUG] W-9001-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-cama164w_1.0 State change null -> WORKER_STARTED
2023-08-04T09:31:26,252 [INFO ] W-9002-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9002
2023-08-04T09:31:26,252 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Python runtime: 3.9.17
2023-08-04T09:31:26,256 [INFO ] W-9001-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9001
2023-08-04T09:31:26,258 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9004.
2023-08-04T09:31:26,257 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:26,258 [INFO ] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686258
2023-08-04T09:31:26,259 [INFO ] W-9002-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686259
2023-08-04T09:31:26,259 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9002.
2023-08-04T09:31:26,260 [INFO ] W-9001-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1691112686260
2023-08-04T09:31:26,260 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9001.
2023-08-04T09:31:26,276 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:26,277 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:26,277 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - model_name: cama164w, batchSize: 1
2023-08-04T09:31:27,925 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,926 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:271.55054473876953|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,926 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:150.51158142089844|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:35.7|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:96.89697265625|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:39689.0|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:97.20947265625|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:39817.0|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,927 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:98.03955078125|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:40157.0|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,928 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:0.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,929 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:1007783.95703125|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:17006.27734375|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:27,930 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:2.3|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112687
2023-08-04T09:31:31,215 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,215 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,217 [INFO ] W-9003-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:31:31,226 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,226 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,227 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:31:31,232 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,233 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,233 [INFO ] W-9000-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:31:31,244 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,244 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,244 [INFO ] W-9001-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:31:31,245 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Enabled tensor cores
2023-08-04T09:31:31,245 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-08-04T09:31:31,245 [INFO ] W-9002-cama164w_1.0-stdout MODEL_LOG - Transformers version 4.31.0
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:271.55013275146484|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:150.51199340820312|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:35.7|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:96.89697265625|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,578 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:39689.0|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:97.20947265625|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:39817.0|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:98.03955078125|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:40157.0|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,579 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:0.0048828125|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:2.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,580 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:1|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:2|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:3|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:4|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:5|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:6|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:7|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:723539.37890625|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:301250.87109375|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:32:27,581 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:29.8|#Level:Host|#hostname:txsh-gpu02,timestamp:1691112747
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Load model cama164w cuda OOM, exception CUDA error: invalid device ordinal
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-08-04T09:33:00,168 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py", line 131, in load_model
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - service = model_loader.load(
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_loader.py", line 135, in load
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - initialize_fn(service.context)
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/tmp/models/2c2150aaa38d4b6ba57b7a2023d4dae2/Transformer_handler_generalized.py", line 104, in initialize
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - self.model.to(self.device)
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1900, in to
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - return super().to(*args, **kwargs)
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
2023-08-04T09:33:00,169 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - return self._apply(convert)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - module._apply(fn)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - module._apply(fn)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - param_applied = fn(param)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - RuntimeError: CUDA error: invalid device ordinal
2023-08-04T09:33:00,170 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2023-08-04T09:33:00,171 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2023-08-04T09:33:00,171 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
2023-08-04T09:33:02,409 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG -
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Backend worker process died.
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py", line 253, in
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - worker.run_server()
2023-08-04T09:33:02,410 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py", line 221, in run_server
2023-08-04T09:33:02,412 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - self.handle_connection(cl_socket)
2023-08-04T09:33:02,412 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - File "/home/zhangchong/miniconda3/envs/bs-model/lib/python3.9/site-packages/ts/model_service_worker.py", line 189, in handle_connection
2023-08-04T09:33:02,412 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - raise RuntimeError("{} - {}".format(code, result))
2023-08-04T09:33:02,412 [INFO ] W-9004-cama164w_1.0-stdout MODEL_LOG - RuntimeError: 507 - System out of memory
2023-08-04T09:33:02,417 [DEBUG] W-9004-cama164w_1.0 org.pytorch.serve.wlm.WorkerThread - s

Installation instructions

install torchserve from source: yes

Model Packaging

torch-model-archiver --model-name cama164w --version 1.0 --serialized-file /data/models/CaMA-164w/pytorch_model.bin --handler ./Transformer_handler_generalized.py --extra-files "/data/models/CaMA-164w/config.json,./setup_config.json"

config.properties

number_of_gpu=5
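With number_of_gpu=5 and five default workers per model, TorchServe presumably hands each worker one device ordinal in round-robin order. A sketch of that assumption (not TorchServe's actual scheduler code): the ordinals it produces are only valid if the CUDA runtime really exposes five devices to the worker process.

```python
# Hypothetical round-robin worker-to-GPU assignment, an assumption about
# TorchServe's behavior for illustration only: gpu_id = worker_index % number_of_gpu.
number_of_gpu = 5
workers = 5
assignment = {f"W-{9000 + i}": i % number_of_gpu for i in range(workers)}
print(assignment)
# {'W-9000': 0, 'W-9001': 1, 'W-9002': 2, 'W-9003': 3, 'W-9004': 4}
```

If CUDA_VISIBLE_DEVICES were ignored somewhere in the chain, ordinals 0-4 would instead land on the already-occupied physical GPUs 0-2, which matches the 507 out-of-memory failure.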

Versions


Environment headers

Torchserve branch:

torchserve==0.8.1
torch-model-archiver==0.8.1

Python version: 3.9 (64-bit runtime)
Python executable: /home/zhangchong/miniconda3/envs/bs-model/bin/python

Versions of relevant python libraries:
captum==0.6.0
numpy==1.25.2
nvgpu==0.10.0
psutil==5.9.5
requests==2.31.0
sentencepiece==0.1.99
torch==2.0.1
torch-model-archiver==0.8.1
torchserve==0.8.1
transformers==4.31.0
wheel==0.38.4
**Warning: torchtext not present ..
**Warning: torchvision not present ..
**Warning: torchaudio not present ..

Java Version:

OS: Ubuntu 20.04 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: version 3.27.0

Is CUDA available: Yes
CUDA runtime version: 11.7.64
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB
Nvidia driver version: 515.105.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.1

Repro instructions

export CUDA_DEVICE_ORDER="PCI_BUS_ID"

export CUDA_VISIBLE_DEVICES="3,4,5,6,7"

torchserve --start --model-store model_store --ts-config config.properties --models cama164w=cama164w.mar --ncs
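A quick way to sanity-check the repro environment before starting torchserve is to confirm the variable actually reaches the process. This sketch only inspects the environment variable; on a GPU machine, torch.cuda.device_count() inside a worker would be the authoritative check:

```python
import os

# Mirrors the repro export; in practice this would already be set by the shell.
os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,5,6,7"

devices = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(len(devices))  # 5 visible devices, renumbered as ordinals 0-4
```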

Possible Solution

No response

Metadata

Assignees: no one assigned
Labels: triaged (issue has been reviewed and triaged)
Milestone: none