Torchserve gives errors while running docker image on k8s but not when running image locally #2300

stefanknegt opened this issue May 2, 2023 · 12 comments
stefanknegt commented May 2, 2023

🐛 Describe the bug

When running TorchServe on a k8s cluster (Minikube locally), I get a lot of errors, while running the exact same Docker image locally works fine. The errors seem to be different every time: sometimes they are about loading transformer models, for instance a .json file being reported as corrupt (which it is not), but sometimes it is also just a lot of TorchServe Java errors.

Error logs

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-05-02T08:43:35,845 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-05-02T08:43:36,299 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.7.0
TS Home: /usr/local/lib/python3.8/dist-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 3494 M
Python executable: /usr/bin/python3.8
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/model-server/model-store
Initial Models: qr-model.mar,bloom-model.mar
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server/model-store
Model config: N/A
2023-05-02T08:43:36,351 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-05-02T08:43:36,480 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: qr-model.mar
2023-05-02T08:43:53,220 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 2.0 for model qr-model
2023-05-02T08:43:53,221 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 2.0 for model qr-model
2023-05-02T08:43:53,221 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model qr-model loaded.
2023-05-02T08:43:53,222 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: qr-model, count: 4
2023-05-02T08:43:53,266 [DEBUG] W-9001-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:43:53,266 [DEBUG] W-9000-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:43:53,274 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: bloom-model.mar
2023-05-02T08:43:53,306 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9002, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:43:53,293 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9003, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:15,400 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9002
2023-05-02T08:44:15,706 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2023-05-02T08:44:15,711 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml.
2023-05-02T08:44:15,471 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2023-05-02T08:44:15,713 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - [PID]81
2023-05-02T08:44:15,716 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Torch worker started.
2023-05-02T08:44:15,717 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Python runtime: 3.8.10
2023-05-02T08:44:15,768 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml.
2023-05-02T08:44:15,766 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml.
2023-05-02T08:44:15,764 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9002-qr-model_2.0 State change null -> WORKER_STARTED
2023-05-02T08:44:15,773 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - [PID]73
2023-05-02T08:44:15,771 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - [PID]76
2023-05-02T08:44:15,776 [DEBUG] W-9000-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9000-qr-model_2.0 State change null -> WORKER_STARTED
2023-05-02T08:44:15,779 [DEBUG] W-9001-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9001-qr-model_2.0 State change null -> WORKER_STARTED
2023-05-02T08:44:15,779 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Torch worker started.
2023-05-02T08:44:15,780 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9003
2023-05-02T08:44:15,786 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Python runtime: 3.8.10
2023-05-02T08:44:15,778 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Torch worker started.
2023-05-02T08:44:15,799 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Python runtime: 3.8.10
2023-05-02T08:44:15,807 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Successfully loaded /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml.
2023-05-02T08:44:15,861 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - [PID]83
2023-05-02T08:44:15,865 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9003-qr-model_2.0 State change null -> WORKER_STARTED
2023-05-02T08:44:15,865 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Torch worker started.
2023-05-02T08:44:15,866 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Python runtime: 3.8.10
2023-05-02T08:44:15,986 [INFO ] W-9001-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2023-05-02T08:44:15,986 [INFO ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9003
2023-05-02T08:44:15,986 [INFO ] W-9000-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2023-05-02T08:44:15,986 [INFO ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9002
2023-05-02T08:44:16,671 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9003.
2023-05-02T08:44:16,675 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2023-05-02T08:44:16,678 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9002.
2023-05-02T08:44:16,693 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2023-05-02T08:44:16,707 [INFO ] W-9000-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1683017056706
2023-05-02T08:44:16,706 [INFO ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1683017056706
2023-05-02T08:44:16,706 [INFO ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1683017056706
2023-05-02T08:44:16,706 [INFO ] W-9001-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1683017056706
2023-05-02T08:44:17,185 [INFO ] W-9001-qr-model_2.0-stdout MODEL_LOG - model_name: qr-model, batchSize: 1
2023-05-02T08:44:17,185 [INFO ] W-9000-qr-model_2.0-stdout MODEL_LOG - model_name: qr-model, batchSize: 1
2023-05-02T08:44:17,185 [INFO ] W-9002-qr-model_2.0-stdout MODEL_LOG - model_name: qr-model, batchSize: 1
2023-05-02T08:44:17,205 [INFO ] W-9003-qr-model_2.0-stdout MODEL_LOG - model_name: qr-model, batchSize: 1
2023-05-02T08:44:35,964 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model bloom-model
2023-05-02T08:44:35,968 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model bloom-model
2023-05-02T08:44:35,985 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model bloom-model loaded.
2023-05-02T08:44:35,995 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: bloom-model, count: 4
2023-05-02T08:44:36,162 [DEBUG] W-9004-bloom-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9004, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:36,105 [DEBUG] W-9006-bloom-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9006, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:36,282 [DEBUG] W-9007-bloom-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9007, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:36,103 [DEBUG] W-9005-bloom-model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3.8, /usr/local/lib/python3.8/dist-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9005, --metrics-config, /usr/local/lib/python3.8/dist-packages/ts/configs/metrics.yaml]
2023-05-02T08:44:36,474 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-05-02T08:44:37,266 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2023-05-02T08:44:37,273 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-05-02T08:44:37,362 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2023-05-02T08:44:37,368 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-05-02T08:44:37,380 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2023-05-02T08:44:47,762 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2023-05-02T08:44:49,871 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:100.0|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,886 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:63.14881896972656|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,891 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:33.694358825683594|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,892 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:34.8|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,892 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:3462.1875|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,893 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:10041.5|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:44:49,894 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:75.2|#Level:Host|#hostname:torchserve-7f8d7779fc-7v75m,timestamp:1683017089
2023-05-02T08:45:02,580 [INFO ] W-9003-qr-model_2.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9003-qr-model_2.0-stdout
2023-05-02T08:45:02,575 [INFO ] W-9003-qr-model_2.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9003-qr-model_2.0-stderr
2023-05-02T08:45:02,966 [INFO ] epollEventLoopGroup-5-4 org.pytorch.serve.wlm.WorkerThread - 9003 Worker disconnected. WORKER_STARTED
2023-05-02T08:45:03,081 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2023-05-02T08:45:07,475 [INFO ] W-9002-qr-model_2.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9002-qr-model_2.0-stdout
2023-05-02T08:45:07,475 [INFO ] W-9002-qr-model_2.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9002-qr-model_2.0-stderr
2023-05-02T08:45:07,477 [INFO ] epollEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - 9002 Worker disconnected. WORKER_STARTED
2023-05-02T08:45:07,480 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2023-05-02T08:45:07,495 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:191) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
2023-05-02T08:45:03,082 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679) ~[?:?]
	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:191) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
2023-05-02T08:45:07,695 [WARN ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: qr-model, error: Worker died.
2023-05-02T08:45:07,696 [DEBUG] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9003-qr-model_2.0 State change WORKER_STARTED -> WORKER_STOPPED
2023-05-02T08:45:07,696 [WARN ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: qr-model, error: Worker died.
2023-05-02T08:45:07,697 [DEBUG] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - W-9002-qr-model_2.0 State change WORKER_STARTED -> WORKER_STOPPED
2023-05-02T08:45:07,762 [WARN ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9003-qr-model_2.0-stderr
2023-05-02T08:45:07,763 [WARN ] W-9003-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9003-qr-model_2.0-stdout
2023-05-02T08:45:07,767 [WARN ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9002-qr-model_2.0-stderr
2023-05-02T08:45:07,767 [WARN ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9002-qr-model_2.0-stdout
2023-05-02T08:45:07,791 [INFO ] W-9002-qr-model_2.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9002 in 1 seconds.

Installation instructions

I am using the following base image: pytorch/torchserve:0.7.0-cpu

Model Packaging

I am packaging the models locally and then copying the .mar files into the Docker image.

torch-model-archiver \
  --model-name qr-model  \
  --version 2.0 \
  --serialized-file models/qr-model/qr_model_pytorch_model.bin \
  --handler qr_model_handler.py \
  --export-path model-store

torch-model-archiver \
  --model-name bloom-model  \
  --version 1.0 \
  --serialized-file models/bloom-model/bloom_model_pytorch_model.bin \
  --handler bloom_model_handler.py \
  --export-path model-store

config.properties

# Location where the models are stored
model_store=/home/model-server/model-store

load_models=qr-model.mar,bloom-model.mar

enable_metrics_api=true

Versions

------------------------------------------------------------------------------------------
Environment headers
------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.7.0b20221212
torch-model-archiver==0.7.0b20221212

Python version: 3.8 (64-bit runtime)
Python executable: /usr/bin/python

Versions of relevant python libraries:
captum==0.5.0
intel-extension-for-pytorch==1.13.0
numpy==1.23.5
nvgpu==0.9.0
psutil==5.9.4
pygit2==1.11.1
pylint==2.6.0
pytest==7.2.0
pytest-cov==4.0.0
pytest-mock==3.10.0
requests==2.28.1
requests-toolbelt==0.10.1
sentence-transformers==2.2.2
sentencepiece==0.1.98
torch==1.13.0+cpu
torch-model-archiver==0.7.0b20221212
torch-workflow-archiver==0.2.6b20221212
torchaudio==0.13.0+cpu
torchserve==0.7.0b20221212
torchtext==0.14.0
torchvision==0.14.0+cpu
transformers==4.25.1
wheel==0.38.4
torch==1.13.0+cpu
torchtext==0.14.0
torchvision==0.14.0+cpu
torchaudio==0.13.0+cpu

Java Version:


OS: Ubuntu 20.04.5 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: N/A

Repro instructions

Build and run the following Dockerfile, where the models are archived using the bash scripts below.

FROM pytorch/torchserve:0.7.0-cpu

COPY requirements.txt .

RUN python -m pip install --upgrade pip && pip install -r requirements.txt

COPY config.properties /home/model-server

COPY model-store/qr-model.mar model-store/
COPY model-store/bloom-model.mar model-store/

CMD ["torchserve", "--start", "--ts-config", "config.properties"]

cd models/qr-model/ || return

zip -r qr_model_pytorch_model.bin .

cd - || return

mkdir model-store

torch-model-archiver \
  --model-name qr-model  \
  --version 2.0 \
  --serialized-file models/qr-model/qr_model_pytorch_model.bin \
  --handler qr_model_handler.py \
  --export-path model-store

rm models/qr-model/qr_model_pytorch_model.bin

cd models/bloom-model/ || return

zip -r bloom_model_pytorch_model.bin .

cd - || return

mkdir model-store

torch-model-archiver \
  --model-name bloom-model  \
  --version 1.0 \
  --serialized-file models/bloom-model/bloom_model_pytorch_model.bin \
  --handler bloom_model_handler.py \
  --export-path model-store

rm models/bloom-model/bloom_model_pytorch_model.bin

Possible Solution

The only suspicion I currently have is that it has something to do with packaging the models on a Mac M1 and then serving them on a Linux server. Could anyone tell me whether this could cause these errors?
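
A minimal way to check this suspicion (assuming unzip and shasum are available; the paths follow the scripts above) is to compare the archive built on the Mac with one built on Linux:

unzip -l model-store/qr-model.mar        # list the contents; look for macOS artifacts such as __MACOSX or .DS_Store entries
shasum -a 256 model-store/qr-model.mar   # compare the hash against the Linux-built archive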

stefanknegt changed the title from "Backend worker monitoring thread interrupted or backend worker process died" to "Torchserve gives errors while running docker image on k8s but not when running image locally" on May 2, 2023
stefanknegt commented May 2, 2023

Tested with archiving and building on Linux, but same issue: it runs fine locally but gives errors on k8s (I also tested different clusters).

Another strange behaviour I observed: when the initial models are set to only one of the two models, it sometimes also works on k8s, not just locally.

@agunapal agunapal self-assigned this May 2, 2023
agunapal commented May 3, 2023

@stefanknegt Do you mind sharing the YAML file you are using to deploy to the cluster and how you are calling the inference API?

Also, it's not clear what the 2nd and 3rd parts of the repro steps are.

Or is the entire thing part of one Dockerfile?

stefanknegt commented May 4, 2023

I have bash scripts (the second and third code snippets in the repro steps) that are used to make the .mar files. I am not calling the inference API, since it already 'breaks' before I can make calls to it (see the logs in the initial post).

In order to reproduce you can do the following:

  1. Install Minikube (https://minikube.sigs.k8s.io/docs/start/)
  2. Make two .mar files for random Transformer models (I've used this one: https://huggingface.co/pdelobelle/robbert-v2-dutch-base) by downloading the Hugging Face models, zipping the downloaded model files
cd FOLDER_DOWNLOADED_MODEL
zip -r MODEL_NAME_1.bin .
zip -r MODEL_NAME_2.bin .

and running torch-model-archiver:

torch-model-archiver \
  --model-name MODEL_NAME_1 \
  --version 1.0 \
  --serialized-file models/MODEL_NAME_1/MODEL_NAME_1.bin \
  --handler model_handler1.py \
  --export-path model-store
torch-model-archiver \
  --model-name MODEL_NAME_2 \
  --version 1.0 \
  --serialized-file models/MODEL_NAME_2/MODEL_NAME_2.bin \
  --handler model_handler2.py \
  --export-path model-store

You have to do this for two models, since loading just one model sometimes does work.

  3. This is a minimal implementation of the handler (the code already breaks during initialisation, so you do not need the preprocess functions etc.). You can use the same handler for both models; you only need to change the line where the .bin file is loaded (MODEL_NAME_1.bin in the snippet below).
from __future__ import annotations

import zipfile

import numpy as np
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer


class ModelHandler(object):
    """ """

    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None

    def initialize(self, ctx):
        self.manifest = ctx.manifest
        properties = ctx.system_properties
        model_dir = properties.get("model_dir")

        try:
            with zipfile.ZipFile(model_dir + "/MODEL_NAME_1.bin", "r") as zip_ref:
                zip_ref.extractall(model_dir)
        except FileExistsError:
            # There can be multiple threads that try to unzip the file.
            # In case it already exists we do not need to raise an exception here
            print("tried unzipping again")
            return

        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.model.eval()

        self.initialized = True

    def preprocess(self, requests):
        return requests

    def inference(self, text):
        return text

    def postprocess(self, logits):
        return logits


_service = ModelHandler()


def handle(data, context):
    """
    Entry point for ModelHandler handler
    """
    try:
        if not _service.initialized:
            _service.initialize(context)

        if data is None:
            return None

        data = _service.preprocess(data)
        data = _service.inference(data)
        data = _service.postprocess(data)

        return data
    except Exception as e:
        raise Exception("Unable to process input data. " + str(e))
  4. Build a Docker image using the following Dockerfile:
FROM pytorch/torchserve:0.7.0-cpu

COPY requirements.txt .

RUN python -m pip install --upgrade pip && pip install -r requirements.txt

COPY config.properties /home/model-server

COPY model-store/MODEL_NAME_1.mar model-store/
COPY model-store/MODEL_NAME_2.mar model-store/

CMD ["torchserve", "--start", "--ts-config", "config.properties"]
  5. Load the image into Minikube: minikube image load IMAGE_NAME
  6. Make a deployment: kubectl create deployment torchserve --image=IMAGE_NAME
  7. Inspect the logs of the created pod (a sketch of the commands is below this list).
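
For step 7, something along these lines should show the failures (POD_NAME is a placeholder for the pod created by the deployment):

kubectl get pods                 # find the torchserve pod name
kubectl logs POD_NAME            # the TorchServe frontend/worker logs shown in the first post
kubectl describe pod POD_NAME    # check the Events section for restarts or OOMKilled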

@agunapal Thanks for helping me out. If you have any questions, please let me know.

agunapal commented May 4, 2023

@stefanknegt I made this PR showing a k8s MNIST example with Minikube:

#2323

stefanknegt (Author):

@agunapal Do you have any thoughts on how I can fix this issue?

stefanknegt (Author):

@agunapal Is there anything I can do to get this fixed? Thanks!

lee-junjie:

+1.

My code runs well on a local image, but fails on a k8s cluster with the same image.

However, unlike your random errors, the error from my side is always the same.

#2391

arnavmehta7 (Contributor):

Check if you guys have set self.init, and self.context = context.bla

stefanknegt (Author):

@arnavmehta7 See my code comment above, I have both.

jagadeeshi2i (Collaborator):

Hi @stefanknegt
Please refer to this example for serving large models. If your .mar file generation is right, then make sure default_response_timeout is set as required:

default_response_timeout=6000
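
For reference, in the config.properties from the original post this would look like (a sketch; the timeout value is the one suggested above):

model_store=/home/model-server/model-store
load_models=qr-model.mar,bloom-model.mar
enable_metrics_api=true
default_response_timeout=6000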

sowmyay commented Sep 12, 2023

I ran into a similar issue when deploying TorchServe to Kubernetes. It turned out to be due to OOM: increasing the memory limits fixed the issue for me.
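
For the deployment created in the repro steps, the limits can be raised for example like this (a sketch; the memory sizes are illustrative and need to fit the models being loaded):

kubectl set resources deployment torchserve --requests=memory=4Gi --limits=memory=8Gi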

ronit29 commented Jan 10, 2024

@sowmyay, right finding... I used JConsole to figure this out, but there were no relevant logs in the exception. Thanks for the help!
Thanks for the help!!
