
TorchServe fails to start multiple worker threads on multiple GPUs with large model. #71

Closed
fbbradheintz opened this issue Feb 26, 2020 · 14 comments
Labels: bug, close_after_merge, launch blocker, staged_release

Comments

@fbbradheintz (Contributor)

On a c5.12xlarge instance, I was able to run 16 instances of the FairSeq English-to-German translation model, all simultaneously running translations. This model's weights take up about 2.5GB on disk (though its resident footprint in memory seems smaller).

Attempting a similar feat on a p3.8xlarge turned out to be impossible. I could get a single instance running, but when I attempted to start even 4 workers, they crashed repeatedly with CUDA out-of-memory errors:

2020-02-26 02:46:34,454 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Torch worker started.
2020-02-26 02:46:34,454 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Python runtime: 3.6.6
2020-02-26 02:46:34,454 [DEBUG] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-fairseq_model_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2020-02-26 02:46:34,454 [INFO ] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9001
2020-02-26 02:46:34,455 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Connection accepted: /tmp/.ts.sock.9001.
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Backend worker process die.
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Traceback (most recent call last):
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/model_service_worker.py", line 163, in <module>
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     worker.run_server()
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/model_service_worker.py", line 141, in run_server
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     self.handle_connection(cl_socket)
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/model_service_worker.py", line 105, in handle_connection
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service, result, code = self.load_model(msg)
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/model_service_worker.py", line 83, in load_model
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     service = model_loader.load(model_name, model_dir, handler, gpu, batch_size)
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ts/model_loader.py", line 107, in load
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     entry_point(None, service.context)
2020-02-26 02:46:38,734 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/fa2ad7fda70376da33595966d0cf3c38702ea6d1/fairseq_handler.py", line 120, in handle
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     raise e
2020-02-26 02:46:38,735 [INFO ] epollEventLoopGroup-4-15 org.pytorch.serve.wlm.WorkerThread - 9001 Worker disconnected. WORKER_STARTED
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/fa2ad7fda70376da33595966d0cf3c38702ea6d1/fairseq_handler.py", line 109, in handle
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     _service.initialize(context)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/tmp/models/fa2ad7fda70376da33595966d0cf3c38702ea6d1/fairseq_handler.py", line 73, in initialize
2020-02-26 02:46:38,735 [DEBUG] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     state_dict = torch.load(model_pt_path, map_location=self.device)
2020-02-26 02:46:38,735 [WARN ] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: fairseq_model, error: Worker died.
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 529, in load
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
2020-02-26 02:46:38,735 [DEBUG] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-fairseq_model_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 702, in _legacy_load
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     result = unpickler.load()
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 665, in persistent_load
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     deserialized_objects[root_key] = restore_location(obj, location)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 21 seconds.
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 740, in restore_location
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     return default_restore_location(storage, str(map_location))
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 156, in default_restore_location
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     result = fn(storage, location)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 136, in _cuda_deserialize
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     return storage_type(obj.size())
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -   File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 480, in _lazy_new
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle -     return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)

Digging through the logs, it appears that TorchServe is attempting to start all workers on the same GPU. The following is the output of grep GPU ts_log.log:

2020-02-26 02:45:13,375 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 3.03 GiB already allocated; 19.12 MiB free; 3.03 GiB reserved in total by PyTorch)
2020-02-26 02:45:13,375 [INFO ] W-9000-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 3.16 GiB already allocated; 19.12 MiB free; 3.16 GiB reserved in total by PyTorch)
2020-02-26 02:45:13,383 [INFO ] W-9003-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 2.60 GiB already allocated; 19.12 MiB free; 2.60 GiB reserved in total by PyTorch)
2020-02-26 02:45:26,382 [INFO ] W-9003-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 15.75 GiB total capacity; 3.05 GiB already allocated; 19.12 MiB free; 3.05 GiB reserved in total by PyTorch)
2020-02-26 02:45:26,519 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 15.75 GiB total capacity; 2.47 GiB already allocated; 19.12 MiB free; 2.47 GiB reserved in total by PyTorch)
2020-02-26 02:45:38,899 [INFO ] W-9003-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 2.89 GiB already allocated; 7.12 MiB free; 2.89 GiB reserved in total by PyTorch)
2020-02-26 02:45:38,904 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.75 GiB total capacity; 2.65 GiB already allocated; 7.12 MiB free; 2.65 GiB reserved in total by PyTorch)
2020-02-26 02:45:52,201 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 15.75 GiB total capacity; 3.05 GiB already allocated; 19.12 MiB free; 3.05 GiB reserved in total by PyTorch)
2020-02-26 02:45:59,593 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)
2020-02-26 02:46:08,962 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)
2020-02-26 02:46:21,358 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)
2020-02-26 02:46:38,735 [INFO ] W-9001-fairseq_model_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - RuntimeError: CUDA out of memory. Tried to allocate 802.00 MiB (GPU 0; 15.75 GiB total capacity; 1.69 GiB already allocated; 605.12 MiB free; 1.69 GiB reserved in total by PyTorch)

Note that multiple workers (W-9000, W-9001, W-9003) are shown, but only one GPU turns up (GPU 0). The p3.8xlarge has 4 GPUs.

I attempted to fix this with Management API arguments such as number_gpu=4, but nothing worked; the result was the same every time.

@mycpuorg (Collaborator)

We only do a round-robin distribution of workers across GPUs:
https://github.com/pytorch/serve/blob/master/frontend/server/src/main/java/org/pytorch/serve/wlm/WorkLoadManager.java#L152

Looks like it needs testing with large models.
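
For reference, the assignment amounts to roughly the following (a simplified Python sketch of the linked Java code in WorkLoadManager; names are illustrative):

def assign_gpu(worker_number, num_gpus):
    # Round-robin: worker 0 -> GPU 0, worker 1 -> GPU 1, ..., wrapping around.
    if num_gpus == 0:
        return None  # CPU-only host
    return worker_number % num_gpus

The frontend only picks a gpu_id per worker; the handler still has to use that id when it builds its torch.device, otherwise every worker ends up on GPU 0.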

@fbbradheintz (Contributor, Author) commented on Feb 26, 2020

It would be helpful for testing and diagnosis if we also got logging when a worker successfully attached to a GPU - right now I can only see the intended device target when there's a failure. There's no way to verify correct behavior from the logs.

fbbradheintz changed the title from "TorchServe fails to start multiple worker threads on GPU with large model." to "TorchServe fails to start multiple worker threads on multiple GPUs with large model." on Feb 27, 2020
fbbradheintz added the launch blocker and bug labels on Feb 27, 2020
@mycpuorg (Collaborator)

@fbbradheintz Can you share the model archive you tested this with?

@fbbradheintz (Contributor, Author)

The files are very large. It will take me a while to upload them.

@fbbradheintz (Contributor, Author)

@mycpuorg - Your email inbox should contain a Dropbox link to a folder containing:

  • CPU and GPU copies of the Fairseq en->de translation model
  • Code for this model
  • A model handler
  • Sample input
  • A README with the command line I used for testing.

Please let me know if there's anything else you need in relation to this.

@harshbafna (Contributor)

We are currently analyzing this issue with the shared .mar file.

@harshbafna (Contributor) commented on Mar 3, 2020

@fbbradheintz :

We need to specify the GPU id when creating the torch.device object so that every worker uses a different GPU. The following small change in your custom model handler resolves the issue:

  • Set device to None in the handler's __init__ method:
self.device = None
  • Add the following line in your handler's initialize method, so that each worker uses a different GPU device (a fuller handler sketch is shown after this list):
self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
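
For clarity, here is a minimal sketch of the resulting handler pattern, assuming a handler along the lines of the fairseq_handler.py shown in the traceback above (the class name and the model.pt file name are illustrative, not taken from the shared archive):

import os
import torch

class FairseqTranslationHandler:  # illustrative name
    def __init__(self):
        # Do not pin a device here; the assigned GPU is only known once the worker starts.
        self.device = None
        self.model = None
        self.initialized = False

    def initialize(self, context):
        properties = context.system_properties
        # Use the GPU that TorchServe assigned to this worker (round-robin),
        # falling back to CPU when CUDA is unavailable.
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id"))
            if torch.cuda.is_available()
            else "cpu"
        )
        model_dir = properties.get("model_dir")
        model_pt_path = os.path.join(model_dir, "model.pt")  # illustrative file name
        state_dict = torch.load(model_pt_path, map_location=self.device)
        # ... rebuild the model from state_dict and move it to self.device ...
        self.initialized = True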

After the above changes, I was able to register the fairseq model with TorchServe.

ubuntu@ip-172-31-13-12:~$ curl http://127.0.0.1:8081/models/fairseq_model
[
  {
    "modelName": "fairseq_model",
    "modelVersion": "1.0",
    "modelUrl": "fairseq_gpu.mar",
    "runtime": "python3",
    "minWorkers": 4,
    "maxWorkers": 4,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": true,
    "workers": [
      {
        "id": "9000",
        "startTime": "2020-03-03T17:03:04.999Z",
        "status": "READY",
        "gpu": true,
        "memoryUsage": 0
      },
      {
        "id": "9001",
        "startTime": "2020-03-03T17:03:05.001Z",
        "status": "READY",
        "gpu": true,
        "memoryUsage": 0
      },
      {
        "id": "9002",
        "startTime": "2020-03-03T17:03:05.001Z",
        "status": "READY",
        "gpu": true,
        "memoryUsage": 0
      },
      {
        "id": "9003",
        "startTime": "2020-03-03T17:03:05.002Z",
        "status": "READY",
        "gpu": true,
        "memoryUsage": 0
      }
    ]
  }
]

Could you please try the above changes and confirm?

Note: We have also updated the MNIST example's custom handler with a similar change in the "stage_release" branch.

harshbafna added the triaged_wait label on Mar 3, 2020
mycpuorg added the staged_release label on Mar 3, 2020
@chauhang (Contributor) commented on Mar 4, 2020

@harshbafna Is this the Java properties or context.system_properties? It would help to update one of the examples to use GPUs via the proposed solution.

@harshbafna (Contributor)

@chauhang: It is context.system_properties, as indicated in the previous comment. We have already updated the custom handler example (MNIST digit classifier) in the "stage_release" branch:

https://github.com/pytorch/serve/blob/stage_release/examples/image_classifier/mnist/mnist_handler.py

All the default handlers have been updated to use the same pattern as well.

@fbbradheintz (Contributor, Author)

I can confirm that this works with the changes prescribed by @harshbafna - thank you for the added guidance.

The thing I don't see is documentation. If we don't tell people about this quirk in custom handlers, they're going to do the same thing I did and file a bug like I did.

Please document this, preferably in examples/image_classifier/mnist/README.md, and then we can close this after merge.

@harshbafna (Contributor)

@fbbradheintz,
We have updated the documentation in the stage_release branch. Instead of adding the details to mnist/README, we added them to the custom_service documentation and referenced the mnist_handler there.

Could you please review the following commit: 7a029ef

mycpuorg added this to the Experimental_Launch milestone on Mar 6, 2020
@fbbradheintz (Contributor, Author)

Looks good! Please close after merging.

fbbradheintz added the close_after_merge label and removed the triaged_wait label on Mar 9, 2020
lokeshgupta1975 added this to High priority in IssuesBacklog on Mar 12, 2020
mycpuorg added a commit that referenced this issue Mar 12, 2020
Stage release - Merge 

Periodic merge from the stage_release branch. The highlights of this merge are listed below.
When things settle down a bit I expect this to slow down, and we can give each merge a new RC version.

Highlights:
Fixed:

    Benchmarks have dependency on Mxnet #72
    TorchServe fails to start multiple worker threads on multiple GPUs with large model #71
    Java concurrency crash when attempting batch processing #66
    Add handler for audio models with an example #60
    Undocumented options for config.properties #55
    Can't log custom metrics #53
    Add example for Custom Service #49

WIP:

    benchmark dependencies install script failing on fresh ubuntu 18.04 #36
    Unable to install on GPU machine #30
    Incorrect docs for --model-store option #23
IssuesBacklog automation moved this from High priority to Closed Mar 12, 2020
@okgrammer

Hi @fbbradheintz, I am interested in deploying my Fairseq models with TorchServe. Would it be possible to share some of the examples or pointers you used above? I noticed that in the MNIST example a custom model file is given, but it is only a single file. What is the correct way to do it when the model definition spans multiple files, as in Fairseq?

@harshbafna (Contributor)

@okgrammer, you can supply all the other dependency files via the --extra-files parameter, as a comma-separated list, when creating the model archive.

You can refer to the image_segmenter example, which supplies multiple .py files via the --extra-files parameter when creating the model archive.
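
For illustration, a multi-file model archive can be built with a command along these lines (the file names here are placeholders, not the actual Fairseq files):

torch-model-archiver --model-name fairseq_model --version 1.0 \
    --serialized-file model.pt \
    --handler fairseq_handler.py \
    --extra-files "fairseq_model_def.py,data_utils.py,vocab.txt" \
    --export-path model_store

Everything passed via --extra-files is extracted into the same model directory as the handler, so the handler can import or open those files in its initialize method.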
