Torchserve gives errors while running docker image on k8s but not when running image locally #2300

stefanknegt opened this issue May 2, 2023 · 12 comments
stefanknegt commented May 2, 2023

🐛 Describe the bug

When running TorchServe on a k8s cluster (Minikube locally), I get a lot of errors, while running the exact same Docker image locally works fine. The errors seem to be different every time: sometimes they are about loading transformer models, for instance a .json file being reported as corrupt (which it is not), but sometimes it is also just a lot of TorchServe Java errors.

Error logs

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-05-02T08:43:35,845 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-05-02T08:43:36,299 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.7.0
TS Home: /usr/local/lib/python3.8/dist-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 3494 M
Python executable: /usr/bin/python3.8
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/model-server/model-store
Initial Models: qr-model.mar,bloom-model.mar
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server/model-store
Model config: N/A
2023-05-02T08:43:36,351 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-05-02T08:43:36,480 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: qr-model.mar
2023-05-02T08:43:53,220 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 2.0 for model qr-model
2023-05-02T08:43:53,221 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 2.0 for model qr-model
2023-05-02T08:43:53,221 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model qr-model loaded.
2023-05-02T08:43:53,222 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: qr-model, count: 4
2023-05-02T08:43:53,266 [DEBUG] W-9001-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:43:53,266 [DEBUG] W-9000-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:43:53,274 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: bloom-model.mar
2023-05-02T08:43:53,306 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9002, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:43:53,293 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9003, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:15,400 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9002
2023-05-02T08:44:15,706 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2023-05-02T08:44:15,711 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml.
2023-05-02T08:44:15,471 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2023-05-02T08:44:15,713 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - [PID]81
2023-05-02T08:44:15,716 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Torch worker started.
2023-05-02T08:44:15,717 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Python runtime: 3.8.10
2023-05-02T08:44:15,768 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml.
2023-05-02T08:44:15,766 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml.
2023-05-02T08:44:15,764 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9002-qr-model_2.0 State change null -> WORKER_STARTED
2023-05-02T08:44:15,773 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - [PID]73
2023-05-02T08:44:15,771 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - [PID]76
2023-05-02T08:44:15,776 [DEBUG] W-9000-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9000-qr-model_2.0 State change null -> WORKER_STARTED
2023-05-02T08:44:15,779 [DEBUG] W-9001-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9001-qr-model_2.0 State change null -> WORKER_STARTED
2023-05-02T08:44:15,779 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Torch worker started.
2023-05-02T08:44:15,780 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9003
2023-05-02T08:44:15,786 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Python runtime: 3.8.10
2023-05-02T08:44:15,778 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Torch worker started.
2023-05-02T08:44:15,799 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Python runtime: 3.8.10
2023-05-02T08:44:15,807 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml.
2023-05-02T08:44:15,861 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - [PID]83
2023-05-02T08:44:15,865 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9003-qr-model_2.0 State change null -> WORKER_STARTED
2023-05-02T08:44:15,865 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Torch worker started.
2023-05-02T08:44:15,866 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Python runtime: 3.8.10
2023-05-02T08:44:15,986 [INFO ] W-9001-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2023-05-02T08:44:15,986 [INFO ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9003
2023-05-02T08:44:15,986 [INFO ] W-9000-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2023-05-02T08:44:15,986 [INFO ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9002
2023-05-02T08:44:16,671 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9003.
2023-05-02T08:44:16,675 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2023-05-02T08:44:16,678 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9002.
2023-05-02T08:44:16,693 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2023-05-02T08:44:16,707 [INFO ] W-9000-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1683017056706
2023-05-02T08:44:16,706 [INFO ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1683017056706
2023-05-02T08:44:16,706 [INFO ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1683017056706
2023-05-02T08:44:16,706 [INFO ] W-9001-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1683017056706
2023-05-02T08:44:17,185 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - model_name: qr-model, batchSize: 1
2023-05-02T08:44:17,185 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - model_name: qr-model, batchSize: 1
2023-05-02T08:44:17,185 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - model_name: qr-model, batchSize: 1
2023-05-02T08:44:17,205 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - model_name: qr-model, batchSize: 1
2023-05-02T08:44:35,964 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model bloom-model
2023-05-02T08:44:35,968 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model bloom-model
2023-05-02T08:44:35,985 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model bloom-model loaded.
2023-05-02T08:44:35,995 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: bloom-model, count: 4
2023-05-02T08:44:36,162 [DEBUG] W-9004-bloom-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9004, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:36,105 [DEBUG] W-9006-bloom-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9006, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:36,282 [DEBUG] W-9007-bloom-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9007, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:36,103 [DEBUG] W-9005-bloom-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9005, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:36,474 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-05-02T08:44:37,266 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-05-02T08:44:37,273 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-05-02T08:44:37,362 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-05-02T08:44:37,368 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-05-02T08:44:37,380 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2023-05-02T08:44:47,762 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-05-02T08:44:49,871 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:100.0|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,886 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:63.14881896972656|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,891 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:33.694358825683594|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,892 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:34.8|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,892 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:3462.1875|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,893 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:10041.5|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,894 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:75.2|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:45:02,580 [INFO ] W-9003-qr-model_2.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9003-qr-model_2.0-stdout
2023-05-02T08:45:02,575 [INFO ] W-9003-qr-model_2.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9003-qr-model_2.0-stderr
2023-05-02T08:45:02,966 [INFO ] epollEventLoopGroup-5-4 org.pytorch.serve.wlm.WorkerThread - 9003 Worker disconnected. WORKER_STARTED
2023-05-02T08:45:03,081 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2023-05-02T08:45:07,475 [INFO ] W-9002-qr-model_2.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9002-qr-model_2.0-stdout
2023-05-02T08:45:07,475 [INFO ] W-9002-qr-model_2.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9002-qr-model_2.0-stderr
2023-05-02T08:45:07,477 [INFO ] epollEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - 9002 Worker disconnected. WORKER_STARTED
2023-05-02T08:45:07,480 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2023-05-02T08:45:07,495 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:191) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
2023-05-02T08:45:03,082 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:191) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
2023-05-02T08:45:07,695 [WARN ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: qr-model, error: Worker died.
2023-05-02T08:45:07,696 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9003-qr-model_2.0 State change WORKER_STARTED -> WORKER_STOPPED
2023-05-02T08:45:07,696 [WARN ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: qr-model, error: Worker died.
2023-05-02T08:45:07,697 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9002-qr-model_2.0 State change WORKER_STARTED -> WORKER_STOPPED
2023-05-02T08:45:07,762 [WARN ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9003-qr-model_2.0-stderr
2023-05-02T08:45:07,763 [WARN ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9003-qr-model_2.0-stdout
2023-05-02T08:45:07,767 [WARN ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9002-qr-model_2.0-stderr
2023-05-02T08:45:07,767 [WARN ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9002-qr-model_2.0-stdout
2023-05-02T08:45:07,791 [INFO ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9002 in 1 seconds.

Installation instructions

I am using the following base image: pytorch/torchserve:0.7.0-cpu

Model Packaging

I am packaging the models locally and then copying the .mar files into the Docker image.

torch-model-archiver \
  --model-name qr-model  \
  --version 2.0 \
  --serialized-file models/qr-model/qr_model_pytorch_model.bin \
  --handler qr_model_handler.py \
  --export-path model-store

torch-model-archiver \
  --model-name bloom-model  \
  --version 1.0 \
  --serialized-file models/bloom-model/bloom_model_pytorch_model.bin \
  --handler bloom_model_handler.py \
  --export-path model-store

config.properties

# Location where the models are stored
model_store=/home/model-server/model-store

load_models=qr-model.mar,bloom-model.mar

enable_metrics_api=true

Versions

------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.7.0b20221212
torch-model-archiver==0.7.0b20221212

Python version: 3.8 (64-bit runtime)
Python executable: /usr/bin/python

Versions of relevant python libraries:
captum==0.5.0
intel-extension-for-pytorch==1.13.0
numpy==1.23.5
nvgpu==0.9.0
psutil==5.9.4
pygit2==1.11.1
pylint==2.6.0
pytest==7.2.0
pytest-cov==4.0.0
pytest-mock==3.10.0
requests==2.28.1
requests-toolbelt==0.10.1
sentence-transformers==2.2.2
sentencepiece==0.1.98
torch==1.13.0+cpu
torch-model-archiver==0.7.0b20221212
torch-workflow-archiver==0.2.6b20221212
torchaudio==0.13.0+cpu
torchserve==0.7.0b20221212
torchtext==0.14.0
torchvision==0.14.0+cpu
transformers==4.25.1
wheel==0.38.4
torch==1.13.0+cpu
torchtext==0.14.0
torchvision==0.14.0+cpu
torchaudio==0.13.0+cpu

Java Version:


OS: Ubuntu 20.04.5 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: N/A

Repro instructions

Build and run the following Dockerfile, where the models are archived using the bash scripts below.

FROM pytorch/torchserve:0.7.0-cpu

COPY requirements.txt .

RUN python -m pip install --upgrade pip && pip install -r requirements.txt

COPY config.properties /home/model-server

COPY model-store/qr-model.mar model-store/
COPY model-store/bloom-model.mar model-store/

CMD ["torchserve", "--start", "--ts-config", "config.properties"]

cd models/qr-model/ || return

zip -r qr_model_pytorch_model.bin .

cd - || return

mkdir model-store

torch-model-archiver \
  --model-name qr-model  \
  --version 2.0 \
  --serialized-file models/qr-model/qr_model_pytorch_model.bin \
  --handler qr_model_handler.py \
  --export-path model-store

rm models/qr-model/qr_model_pytorch_model.bin

cd models/bloom-model/ || return

zip -r bloom_model_pytorch_model.bin .

cd - || return

mkdir model-store

torch-model-archiver \
  --model-name bloom-model  \
  --version 1.0 \
  --serialized-file models/bloom-model/bloom_model_pytorch_model.bin \
  --handler bloom_model_handler.py \
  --export-path model-store

rm models/bloom-model/bloom_model_pytorch_model.bin

Possible Solution

The only suspicion I currently have is that it has something to do with packaging the models on a Mac M1 and then serving them on a Linux server. Could anyone tell me whether this could cause these errors?
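
A minimal way to check this suspicion (assuming unzip and shasum are available; the paths follow the scripts above) is to compare the archive built on the Mac with one built on Linux:

unzip -l model-store/qr-model.mar        # list the contents; look for macOS artifacts such as __MACOSX or .DS_Store entries
shasum -a 256 model-store/qr-model.mar   # compare the hash against the Linux-built archive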

stefanknegt changed the title from "Backend worker monitoring thread interrupted or backend worker process died" to "Torchserve gives errors while running docker image on k8s but not when running image locally" on May 2, 2023
stefanknegt commented May 2, 2023

Tested with archiving and building on Linux, but same issue: it runs fine locally but gives errors on k8s (I also tested different clusters).

Another strange behaviour I observed: when the initial models are set to only one of the two models, it sometimes also works on k8s, not just locally.

@agunapal agunapal self-assigned this May 2, 2023
agunapal commented May 3, 2023

@stefanknegt Do you mind sharing the YAML file you are using to deploy to the cluster and how you are calling the inference API?

Also, it's not clear what the 2nd and 3rd parts of the repro steps are.

Or is the entire thing part of one Dockerfile?

stefanknegt commented May 4, 2023

I have bash scripts (the second and third code snippets in the repro steps) that are used to make the .mar files. I am not calling the inference API, since it already 'breaks' before I can make calls to it (see the logs in the initial post).

In order to reproduce you can do the following:

  1. Install Minikube (https://minikube.sigs.k8s.io/docs/start/)
  2. Make two .mar files for random Transformer models (I've used this one: https://huggingface.co/pdelobelle/robbert-v2-dutch-base) by downloading the Hugging Face models, zipping the downloaded model files
cd FOLDER_DOWNLOADED_MODEL
zip -r MODEL_NAME_1.bin .
zip -r MODEL_NAME_2.bin .

and running torch-model-archiver:

torch-model-archiver \
  --model-name MODEL_NAME_1 \
  --version 1.0 \
  --serialized-file models/MODEL_NAME_1/MODEL_NAME_1.bin \
  --handler model_handler1.py \
  --export-path model-store
torch-model-archiver \
  --model-name MODEL_NAME_2 \
  --version 1.0 \
  --serialized-file models/MODEL_NAME_2/MODEL_NAME_2.bin \
  --handler model_handler2.py \
  --export-path model-store

You have to do this for two models, since loading just one model sometimes does work.

  3. This is a minimal implementation of the handler (the code already breaks during initialisation, so you do not need the preprocess functions etc.). You can use the same handler for both models; you only need to change the line where the .bin file is loaded (MODEL_NAME_1.bin in the snippet below).
from __future__ import annotations

import zipfile

import numpy as np
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer


class ModelHandler(object):
    """ """

    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None

    def initialize(self, ctx):
        self.manifest = ctx.manifest
        properties = ctx.system_properties
        model_dir = properties.get("model_dir")

        try:
            with zipfile.ZipFile(model_dir + "/MODEL_NAME_1.bin", "r") as zip_ref:
                zip_ref.extractall(model_dir)
        except FileExistsError:
            # There can be multiple threads that try to unzip the file.
            # In case it already exists we do not need to raise an exception here
            print("tried unzipping again")
            return

        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.eval()

        self.initialized = True

    def preprocess(self, requests):
        return requests

    def inference(self, text):
        return text

    def postprocess(self, logits):
        return logits


_service = ModelHandler()


def handle(data, context):
    """
    Entry point for ModelHandler handler
    """
    try:
        if not _service.initialized:
            _service.initialize(context)

        if data is None:
            return None

        data = _service.preprocess(data)
        data = _service.inference(data)
        data = _service.postprocess(data)

        return data
    except Exception as e:
        raise Exception("Unable to process input data. " + str(e))
  4. Build a Docker image using the following Dockerfile:
FROM pytorch/torchserve:0.7.0-cpu

COPY requirements.txt .

RUN python -m pip install --upgrade pip && pip install -r requirements.txt

COPY config.properties /home/model-server

COPY model-store/MODEL_NAME_1.mar model-store/
COPY model-store/MODEL_NAME_2.mar model-store/

CMD ["torchserve", "--start", "--ts-config", "config.properties"]
  5. Load the image into Minikube: minikube image load IMAGE_NAME
  6. Make a deployment: kubectl create deployment torchserve --image=IMAGE_NAME
  7. Inspect the logs of the created pod (a sketch of the commands is below this list).
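
For step 7, something along these lines should show the failures (POD_NAME is a placeholder for the pod created by the deployment):

kubectl get pods                 # find the torchserve pod name
kubectl logs POD_NAME            # the TorchServe frontend/worker logs shown in the first post
kubectl describe pod POD_NAME    # check the Events section for restarts or OOMKilled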

@agunapal Thanks for helping me out. If you have any questions, please let me know.

agunapal commented May 4, 2023

@stefanknegt I made this PR showing a k8s MNIST example with Minikube:

#2323

stefanknegt (Author):

@agunapal Do you have any thoughts on how I can fix this issue?

stefanknegt (Author):

@agunapal Is there anything I can do to get this fixed? Thanks!

lee-junjie:

+1.

My code runs well on a local image, but fails on a k8s cluster with the same image.

However, unlike your random errors, the error from my side is always the same.

#2391

arnavmehta7 (Contributor):

Check if you guys have set self.init, and self.context = context.bla

stefanknegt (Author):

@arnavmehta7 See my code comment above, I have both.

jagadeeshi2i (Collaborator):

Hi @stefanknegt
Please refer to this example for serving large models. If your .mar file generation is right, then make sure default_response_timeout is set as required:

default_response_timeout=6000
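
For reference, in the config.properties from the original post this would look like (a sketch; the timeout value is the one suggested above):

model_store=/home/model-server/model-store
load_models=qr-model.mar,bloom-model.mar
enable_metrics_api=true
default_response_timeout=6000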

sowmyay commented Sep 12, 2023

I ran into a similar issue when deploying TorchServe to Kubernetes. It turned out to be due to OOM: increasing the memory limits fixed the issue for me.
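
For the deployment created in the repro steps, the limits can be raised for example like this (a sketch; the memory sizes are illustrative and need to fit the models being loaded):

kubectl set resources deployment torchserve --requests=memory=4Gi --limits=memory=8Gi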

ronit29 commented Jan 10, 2024

@sowmyay, right finding... I used JConsole to figure this out, but there were no relevant logs in the exception. Thanks for the help!
Thanks for the help!!
