TorchServe mentions that it is derived from the Multi Model Server (https://github.com/awslabs/multi-model-server).
As far as I remember, MMS allows dynamic batching: the method for processing instances always gets an array of instances.
Depending on the configuration, if the server receives more than BATCHSIZE requests within a configurable timespan, these requests are dynamically collected into a batch, run through the model together, and the results are returned to the individual callers again.
This is a crucial feature for models where running single instances through the model is highly inefficient.
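To illustrate the kind of interface I mean, here is a minimal sketch of an MMS-style batch-aware handler (the contract assumed here is that the framework passes a list of requests per call and expects a list of results of the same length and order; the model, key names and shapes are placeholders):

```python
import json
import torch


class BatchHandler:
    """Sketch of a handler whose entry point receives a whole batch of requests."""

    def __init__(self):
        self.model = None
        self.initialized = False

    def initialize(self, context):
        # A real handler would load the model from the model directory given in
        # `context`; a placeholder module is used here to keep the sketch short.
        self.model = torch.nn.Identity()
        self.model.eval()
        self.initialized = True

    def handle(self, data, context):
        # `data` contains one entry per request that was dynamically batched.
        inputs = []
        for req in data:
            body = req.get("body") or req.get("data")
            if isinstance(body, (bytes, bytearray)):
                body = json.loads(body)
            inputs.append(torch.tensor(body, dtype=torch.float32))
        batch = torch.stack(inputs)  # assumes all requests have the same shape
        with torch.no_grad():
            out = self.model(batch)  # one forward pass for the whole batch
        # Split the batched output back into one response per request.
        return [o.tolist() for o in out]
```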
I could not figure out if or how this is already supported by TorchServe, and I could not find anything about it in the documentation either.
Could somebody confirm that this is actually missing from TorchServe, or point me to where it is documented if it is already implemented?
Hi @johann-petrak, this PR, #1125, should have instructions for doing this and should get merged soon. Let me know if this is indeed what you were looking for.
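For reference, a minimal sketch of how dynamic batching is typically enabled in TorchServe, namely per model at registration time through the management API (the model archive name and the concrete values are placeholders; treat the details as an assumption until the documentation from the PR above is merged):

```python
import requests

# Register a model with dynamic batching via TorchServe's management API.
# batch_size: maximum number of requests aggregated into one forward pass.
# max_batch_delay: time in milliseconds to wait for a full batch before
#                  running whatever has arrived so far.
resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "my_model.mar",   # placeholder model archive
        "initial_workers": 1,
        "batch_size": 8,
        "max_batch_delay": 50,
    },
)
print(resp.status_code, resp.text)
```

With such a configuration the frontend collects up to batch_size requests (or waits at most max_batch_delay ms) and hands them to the handler as a single list, which matches the MMS behaviour described above.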
Thanks @msaroufim, this looks exactly like what I was looking for! Looking forward to this getting merged and released, as the component I was originally looking at is the AWS SageMaker PyTorch inference container, which depends on TorchServe.