Dynamic batching? #1132

Closed
johann-petrak opened this issue Jun 19, 2021 · 2 comments

Comments

@johann-petrak

TorchServe mentions that it is derived from the Multi Model Server (MMS): https://github.com/awslabs/multi-model-server

As far as I remember, MMS supports dynamic batching: the method that processes instances always receives an array of instances.
Depending on the configuration, if the server receives more than BATCHSIZE requests within a configurable timespan, these requests are dynamically collected into batches, run through the model, and returned individually again.
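
To make the behaviour I have in mind concrete, here is a minimal sketch of the idea in plain Python. It is not MMS or TorchServe code: `DynamicBatcher`, `predict_batch`, `batch_size`, and `max_delay_s` are hypothetical names chosen for illustration, and the "model" is a stand-in function. The worker waits for the first request, keeps collecting until the batch is full or the time window expires, runs the whole batch once, and then hands each caller its own result.

```python
import queue
import threading
import time
from concurrent.futures import Future


class DynamicBatcher:
    """Illustrative only: groups individual requests into batches."""

    def __init__(self, predict_batch, batch_size=4, max_delay_s=0.05):
        self.predict_batch = predict_batch  # takes a list, returns a list
        self.batch_size = batch_size        # upper bound on batch length
        self.max_delay_s = max_delay_s      # how long to wait for more requests
        self._requests = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, instance):
        """Queue one instance and return a Future for its individual result."""
        future = Future()
        self._requests.put((instance, future))
        return future

    def _worker(self):
        while True:
            # Block until at least one request is available.
            batch = [self._requests.get()]
            deadline = time.monotonic() + self.max_delay_s
            # Keep collecting until the batch is full or the window closes.
            while len(batch) < self.batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._requests.get(timeout=remaining))
                except queue.Empty:
                    break
            instances = [instance for instance, _ in batch]
            try:
                outputs = self.predict_batch(instances)
                # Return each output to the caller that sent that instance.
                for (_, future), output in zip(batch, outputs):
                    future.set_result(output)
            except Exception as exc:
                for _, future in batch:
                    future.set_exception(exc)


batch_sizes = []  # records how many instances each model call received


def double_all(instances):
    # Stand-in for a model that is only efficient on whole batches.
    batch_sizes.append(len(instances))
    return [2 * x for x in instances]


batcher = DynamicBatcher(double_all, batch_size=4, max_delay_s=0.05)
futures = [batcher.submit(i) for i in range(10)]
results = [f.result(timeout=5) for f in futures]
```

Each caller only ever sees its own result, while the model function is called with lists of at most `batch_size` instances. How the ten requests are split across batches depends on timing, so the exact contents of `batch_sizes` can vary between runs.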

This is a crucial feature for models where running single instances through the model is highly inefficient.

I could not figure out whether or how TorchServe already supports this, and I could not find anything about it in the documentation either.

Could somebody confirm that this is actually missing in TorchServe, or tell me where to find information about it if it is already implemented?

@msaroufim
Member

Hi @johann-petrak, PR #1125 should have instructions for this and should get merged soon. Let me know if this is indeed what you were looking for.

@msaroufim msaroufim added the triaged_wait Waiting for the Reporter's resp label Jun 20, 2021
@johann-petrak
Author

Thanks @msaroufim, this looks exactly like what I was looking for! I am looking forward to this getting merged and released, since the component I was originally looking at is the AWS SageMaker PyTorch inference container, which depends on TorchServe.

@msaroufim msaroufim self-assigned this Jun 21, 2021