
IllegalMonitorState Exceptions on running benchmark tests #82

Closed
chauhang opened this issue Mar 7, 2020 · 11 comments · Fixed by #172
Labels: bug

Comments

@chauhang
Contributor

chauhang commented Mar 7, 2020

Seeing many IllegalMonitorStateException errors when running the benchmark tests. Tested on a p2.8xlarge
with the default configuration, using the config.properties in the benchmark folder with the model store path added to it.
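
For anyone reproducing this, a minimal config.properties along these lines should be enough; the model-store path below is a placeholder, and the addresses are the TorchServe defaults:

```properties
inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
model_store=/path/to/model-store
```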

Output

(pytorch_p36) ubuntu@ip-172-31-41-247:~/serve/benchmarks$ python benchmark.py throughput --ts http://127.0.0.1:8080
Running benchmark throughput with model resnet-18
Processing jmeter output
Output available at /tmp/TSBenchmark/out/throughput/resnet-18
Report generated at /tmp/TSBenchmark/out/throughput/resnet-18/report/index.html
[{'throughput_resnet-18_Inference_Request_Average': 670,
'throughput_resnet-18_Inference_Request_Median': 526,
'throughput_resnet-18_Inference_Request_Throughput': 137.0,
'throughput_resnet-18_Inference_Request_aggregate_report_90_line': 681,
'throughput_resnet-18_Inference_Request_aggregate_report_99_line': 8585,
'throughput_resnet-18_Inference_Request_aggregate_report_error': '0.00%'}]

The JMeter reports don't show any errors:
https://sagemaker-pt.s3-us-west-2.amazonaws.com/bench.zip

Even though hits/sec reaches 200/sec, the metrics logged on the console never show more than one active request at any time, so it is unclear whether the metrics are being logged properly. Requests2XX.Count is always 1, as below:

2020-03-07 01:58:50,800 [INFO ] W-9015-resnet-18_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null

Errors in TorchServe console logs:
2020-03-07 01:58:51,109 [INFO ] W-9008-resnet-18_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null
2020-03-07 01:58:51,109 [DEBUG] W-9008-resnet-18_1.0 org.pytorch.serve.wlm.Job - Waiting time: 0, Backend time: 38
2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 40
2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0 ACCESS_LOG - /127.0.0.1:50756 "POST /predictions/resnet-18 HTTP/1.1" 200 50
2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:ip-172-31-41-247,timestamp:null
2020-03-07 01:58:51,115 [DEBUG] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.Job - Waiting time: 0, Backend time: 49
2020-03-07 01:58:51,115 [INFO ] W-9023-resnet-18_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:36.98|#ModelName:resnet-18,Level:Model|#hostname:ip-172-31-41-247,requestID:63c3d128-d2ab-4e60-8ab7-fa9905b6de93,timestamp:1583546331
2020-03-07 01:58:51,130 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.ModelVersionedRefs - Removed model: resnet-18 version: 1.0
2020-03-07 01:58:51,131 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9031-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:51,133 [WARN ] W-9031-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:51,145 [INFO ] epollEventLoopGroup-4-32 org.pytorch.serve.wlm.WorkerThread - 9031 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:51,146 [DEBUG] W-9031-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9031-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:51,148 [DEBUG] W-9031-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:51,315 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9030-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:51,316 [WARN ] W-9030-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:51,317 [INFO ] epollEventLoopGroup-4-29 org.pytorch.serve.wlm.WorkerThread - 9030 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:51,317 [DEBUG] W-9030-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9030-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:51,318 [DEBUG] W-9030-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:51,584 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9029-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:51,585 [WARN ] W-9029-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:51,585 [INFO ] epollEventLoopGroup-4-31 org.pytorch.serve.wlm.WorkerThread - 9029 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:51,586 [DEBUG] W-9029-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9029-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:51,587 [DEBUG] W-9029-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:51,855 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9028-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:51,855 [WARN ] W-9028-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:51,856 [INFO ] epollEventLoopGroup-4-28 org.pytorch.serve.wlm.WorkerThread - 9028 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:51,857 [DEBUG] W-9028-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9028-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:51,858 [DEBUG] W-9028-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:52,124 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9027-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:52,125 [WARN ] W-9027-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:52,126 [INFO ] epollEventLoopGroup-4-30 org.pytorch.serve.wlm.WorkerThread - 9027 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:52,127 [DEBUG] W-9027-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9027-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:52,129 [DEBUG] W-9027-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:52,394 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9026-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:52,394 [WARN ] W-9026-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:52,395 [INFO ] epollEventLoopGroup-4-26 org.pytorch.serve.wlm.WorkerThread - 9026 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:52,395 [DEBUG] W-9026-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9026-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:52,396 [DEBUG] W-9026-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:52,664 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9025-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:52,664 [WARN ] W-9025-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:52,665 [INFO ] epollEventLoopGroup-4-27 org.pytorch.serve.wlm.WorkerThread - 9025 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:52,665 [DEBUG] W-9025-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9025-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:52,666 [DEBUG] W-9025-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:52,932 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9024-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:52,932 [WARN ] W-9024-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:52,933 [INFO ] epollEventLoopGroup-4-24 org.pytorch.serve.wlm.WorkerThread - 9024 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:52,933 [DEBUG] W-9024-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9024-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:52,934 [DEBUG] W-9024-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:53,202 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9023-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:53,202 [WARN ] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:53,203 [INFO ] epollEventLoopGroup-4-25 org.pytorch.serve.wlm.WorkerThread - 9023 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:53,203 [DEBUG] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9023-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:53,204 [DEBUG] W-9023-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:53,359 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9022-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:53,359 [WARN ] W-9022-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:53,360 [INFO ] epollEventLoopGroup-4-17 org.pytorch.serve.wlm.WorkerThread - 9022 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:53,360 [DEBUG] W-9022-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9022-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:53,361 [DEBUG] W-9022-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:53,626 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9021-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:53,626 [WARN ] W-9021-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:53,627 [INFO ] epollEventLoopGroup-4-19 org.pytorch.serve.wlm.WorkerThread - 9021 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:53,627 [DEBUG] W-9021-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9021-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:53,628 [DEBUG] W-9021-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:53,892 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9020-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:53,893 [WARN ] W-9020-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:53,893 [INFO ] epollEventLoopGroup-4-21 org.pytorch.serve.wlm.WorkerThread - 9020 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:53,893 [DEBUG] W-9020-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9020-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:53,895 [DEBUG] W-9020-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:54,158 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9019-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:54,158 [WARN ] W-9019-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:54,159 [INFO ] epollEventLoopGroup-4-18 org.pytorch.serve.wlm.WorkerThread - 9019 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:54,159 [DEBUG] W-9019-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9019-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:54,159 [DEBUG] W-9019-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:54,427 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9018-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:54,428 [INFO ] epollEventLoopGroup-4-4 org.pytorch.serve.wlm.WorkerThread - 9018 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:54,429 [DEBUG] W-9018-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Shutting down the thread .. Scaling down.
2020-03-07 01:58:54,429 [DEBUG] W-9018-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9018-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:54,429 [DEBUG] W-9018-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:54,696 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9017-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:54,697 [WARN ] W-9017-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:54,698 [INFO ] epollEventLoopGroup-4-9 org.pytorch.serve.wlm.WorkerThread - 9017 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:54,698 [DEBUG] W-9017-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9017-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:54,698 [DEBUG] W-9017-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:54,964 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9016-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:54,965 [WARN ] W-9016-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:54,965 [INFO ] epollEventLoopGroup-4-13 org.pytorch.serve.wlm.WorkerThread - 9016 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:54,965 [DEBUG] W-9016-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9016-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:54,965 [DEBUG] W-9016-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:55,235 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9015-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:55,235 [WARN ] W-9015-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:55,235 [INFO ] epollEventLoopGroup-4-22 org.pytorch.serve.wlm.WorkerThread - 9015 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:55,235 [DEBUG] W-9015-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9015-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:55,236 [DEBUG] W-9015-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:55,389 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9014-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:55,389 [WARN ] W-9014-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:55,390 [INFO ] epollEventLoopGroup-4-5 org.pytorch.serve.wlm.WorkerThread - 9014 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:55,390 [DEBUG] W-9014-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9014-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:55,390 [DEBUG] W-9014-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:55,654 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9013-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:55,654 [WARN ] W-9013-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:55,654 [INFO ] epollEventLoopGroup-4-8 org.pytorch.serve.wlm.WorkerThread - 9013 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:55,655 [DEBUG] W-9013-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9013-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:55,655 [DEBUG] W-9013-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:55,919 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9012-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:55,919 [WARN ] W-9012-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:55,920 [INFO ] epollEventLoopGroup-4-11 org.pytorch.serve.wlm.WorkerThread - 9012 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:55,920 [DEBUG] W-9012-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9012-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:55,920 [DEBUG] W-9012-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:56,185 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9011-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:56,185 [WARN ] W-9011-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:56,185 [INFO ] epollEventLoopGroup-4-14 org.pytorch.serve.wlm.WorkerThread - 9011 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:56,186 [DEBUG] W-9011-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9011-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:56,187 [DEBUG] W-9011-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:56,449 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9010-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:56,449 [WARN ] W-9010-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:56,449 [INFO ] epollEventLoopGroup-4-20 org.pytorch.serve.wlm.WorkerThread - 9010 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:56,450 [DEBUG] W-9010-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9010-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:56,450 [DEBUG] W-9010-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:56,712 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9009-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:56,712 [INFO ] epollEventLoopGroup-4-15 org.pytorch.serve.wlm.WorkerThread - 9009 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:56,712 [WARN ] W-9009-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:56,713 [DEBUG] W-9009-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9009-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:56,713 [DEBUG] W-9009-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:56,976 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9008-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:56,976 [INFO ] epollEventLoopGroup-4-1 org.pytorch.serve.wlm.WorkerThread - 9008 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:56,976 [WARN ] W-9008-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:56,977 [DEBUG] W-9008-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9008-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:56,977 [DEBUG] W-9008-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:57,240 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9007-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:57,240 [INFO ] epollEventLoopGroup-4-23 org.pytorch.serve.wlm.WorkerThread - 9007 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:57,240 [WARN ] W-9007-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:57,240 [DEBUG] W-9007-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9007-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:57,240 [DEBUG] W-9007-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:57,391 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9006-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:57,391 [INFO ] epollEventLoopGroup-4-2 org.pytorch.serve.wlm.WorkerThread - 9006 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:57,391 [WARN ] W-9006-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:57,391 [DEBUG] W-9006-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9006-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:57,392 [DEBUG] W-9006-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:57,655 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9005-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:57,656 [INFO ] epollEventLoopGroup-4-6 org.pytorch.serve.wlm.WorkerThread - 9005 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:57,656 [WARN ] W-9005-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:57,656 [DEBUG] W-9005-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9005-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:57,656 [DEBUG] W-9005-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:57,919 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9004-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:57,919 [INFO ] epollEventLoopGroup-4-16 org.pytorch.serve.wlm.WorkerThread - 9004 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:57,919 [DEBUG] W-9004-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Shutting down the thread .. Scaling down.
2020-03-07 01:58:57,920 [DEBUG] W-9004-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9004-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:57,920 [DEBUG] W-9004-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:58,182 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9003-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:58,182 [WARN ] W-9003-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:58,182 [INFO ] epollEventLoopGroup-4-7 org.pytorch.serve.wlm.WorkerThread - 9003 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:58,183 [DEBUG] W-9003-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9003-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:58,183 [DEBUG] W-9003-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:58,443 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9002-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:58,443 [WARN ] W-9002-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalMonitorStateException
at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:151)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1261)
at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)
at org.pytorch.serve.wlm.Model.pollBatch(Model.java:175)
at org.pytorch.serve.wlm.BatchAggregator.getRequest(BatchAggregator.java:33)
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-03-07 01:58:58,443 [INFO ] epollEventLoopGroup-4-3 org.pytorch.serve.wlm.WorkerThread - 9002 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:58,444 [DEBUG] W-9002-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:58,444 [DEBUG] W-9002-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:58,706 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9001-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:58,706 [DEBUG] W-9001-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Shutting down the thread .. Scaling down.
2020-03-07 01:58:58,706 [INFO ] epollEventLoopGroup-4-10 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:58,707 [DEBUG] W-9001-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:58,707 [DEBUG] W-9001-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:58,968 [DEBUG] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet-18_1.0 State change WORKER_MODEL_LOADED -> WORKER_SCALED_DOWN
2020-03-07 01:58:58,968 [DEBUG] W-9000-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Shutting down the thread .. Scaling down.
2020-03-07 01:58:58,968 [INFO ] epollEventLoopGroup-4-12 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_SCALED_DOWN
2020-03-07 01:58:58,968 [DEBUG] W-9000-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-resnet-18_1.0 State change WORKER_SCALED_DOWN -> WORKER_STOPPED
2020-03-07 01:58:58,969 [DEBUG] W-9000-resnet-18_1.0 org.pytorch.serve.wlm.WorkerThread - Worker terminated due to scale-down call.
2020-03-07 01:58:59,344 [INFO ] epollEventLoopGroup-3-17 org.pytorch.serve.wlm.ModelManager - Model resnet-18 unregistered.

Corresponding GPU / CPU utilization via CloudWatch (using the gpumon.py script from AWS):
[Screenshot: Screen Shot 2020-03-06 at 6.38.39 PM]

@mycpuorg
Collaborator

mycpuorg commented Mar 8, 2020

@dhaniram-kshirsagar can you attach your benchmark report please?

@mycpuorg mycpuorg added the enhancement New feature or request label Mar 8, 2020
@harshbafna
Contributor

@chauhang: The IllegalMonitorState exceptions are observed while unregistering a model. In this case the exception is logged for each worker when the benchmark script unregisters the resnet-18 model used for the benchmark tests.
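
For illustration, the stack trace points at ReentrantLock.unlock() inside Model.pollBatch, which is the classic symptom of a finally block releasing a lock the thread never actually acquired. A minimal standalone sketch of that pattern (not the actual Model.pollBatch code; the scale-down interrupt is simulated here) produces the same exception:

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnlockWithoutHoldDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Thread worker = new Thread(() -> {
            try {
                lock.lockInterruptibly();   // blocks here, then is interrupted before acquiring
                // ... poll the batch queue ...
            } catch (InterruptedException e) {
                // the scale-down call interrupts the worker while it waits for the lock
            } finally {
                lock.unlock();              // IllegalMonitorStateException: lock was never held
            }
        });
        lock.lock();                        // main thread holds the lock so the worker has to wait
        worker.start();
        Thread.sleep(100);                  // give the worker time to block on lockInterruptibly()
        worker.interrupt();                 // simulate the scale-down call
        worker.join();
        lock.unlock();
    }
}
```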

We will look into the metrics logging related issue.

@chauhang
Contributor Author

chauhang commented Mar 9, 2020

> @dhaniram-kshirsagar can you attach your benchmark report please?

Please see the S3 URL above for the JMeter reports.

@chauhang
Contributor Author

chauhang commented Mar 9, 2020

> @chauhang: The IllegalMonitorState exceptions are observed while unregistering a model. In this case the exception is logged for each worker when the benchmark script unregisters the resnet-18 model used for the benchmark tests.
>
> We will look into the metrics logging related issue.

@harshbafna This looks like a bug. Can you explain why it is marked as an Enhancement?

@mycpuorg mycpuorg added bug Something isn't working and removed enhancement New feature or request labels Mar 11, 2020
@mycpuorg
Collaborator

It's a bug.

@lokeshgupta1975 lokeshgupta1975 added this to High priority in IssuesBacklog Mar 12, 2020
@fbbradheintz fbbradheintz changed the title IllegalMonitorState Exceptions on running benchmark tests on GPU machine IllegalMonitorState Exceptions on running benchmark tests Mar 12, 2020
@fbbradheintz
Contributor

This happens on CPU as well.

@harshbafna
Contributor

@chauhang, @fbbradheintz:

As indicated in the previous comment, the IllegalMonitorState exception is expected in the backend logs when a model is unregistered and its workers are scaled down.

The Requests2XX, Requests4XX and Requests5XX metrics are logged per request and do not represent a cumulative count across all requests.

We will update the system metrics documentation and add the following description:

| Metric Name | Dimension | Unit | Semantics |
|---|---|---|---|
| Requests2XX | host | count | logged for every request responded to with a status code in the 200-300 range |
| Requests4XX | host | count | logged for every request responded to with a status code in the 400-500 range |
| Requests5XX | host | count | logged for every request responded to with a status code above 500 |
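
Since each of these metrics is emitted with Count:1 per request, a cumulative total has to be aggregated from the log lines. A quick sketch (the log path is an assumption; point it at wherever the TS_METRICS output is captured):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class CountRequests2XX {
    public static void main(String[] args) throws Exception {
        // Count how many times the per-request Requests2XX metric was emitted.
        long total = Files.lines(Paths.get("logs/ts_metrics.log"))
                .filter(line -> line.contains("Requests2XX.Count"))
                .count();
        System.out.println("Total 2XX requests: " + total);
    }
}
```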

@harshbafna
Contributor

Updated the documentation in the following commit on the stage_release branch: 410311e

@harshbafna harshbafna added staged_release merged to stage_release triaged_wait Waiting for the Reporter's resp labels Mar 13, 2020
@fbbradheintz
Contributor

Removing the launch blocker tags for this, since the benchmark still runs. I'm not closing it, though, because we don't know whether this condition affects subsequent benchmarks (e.g., by not releasing system resources).

@fbbradheintz fbbradheintz removed the triaged_wait Waiting for the Reporter's resp label Mar 16, 2020
@harshbafna harshbafna self-assigned this Mar 17, 2020
@harshbafna
Contributor

PR #172 fixes the exception trace logged on worker scale down / model unregistration.

@fbbradheintz fbbradheintz moved this from To do to In Testing in TorchServe v0.1.1 Issues Lifecycle May 1, 2020
@fbbradheintz fbbradheintz added triaged_wait Waiting for the Reporter's resp and removed staged_release merged to stage_release triaged_wait Waiting for the Reporter's resp labels May 1, 2020
@fbbradheintz fbbradheintz moved this from In Testing to In progress in TorchServe v0.1.1 Issues Lifecycle May 1, 2020
@fbbradheintz fbbradheintz moved this from In progress to In Testing in TorchServe v0.1.1 Issues Lifecycle May 2, 2020
@chauhang
Contributor Author

@harshbafna Please see the comments in the PR and verify that removing the code that terminates the child processes does not cause any other side effects.

@chauhang chauhang moved this from In Testing to Changes requested in TorchServe v0.1.1 Issues Lifecycle May 10, 2020
@harshbafna harshbafna linked a pull request Jun 1, 2020 that will close this issue
2 tasks
@maaquib maaquib closed this as completed Jun 9, 2020
IssuesBacklog automation moved this from High priority to Closed Jun 9, 2020
TorchServe v0.1.1 Issues Lifecycle automation moved this from Changes requested to Done Jun 9, 2020