TorchServe mentions that it is derived from the Multi Model Server (https://github.com/awslabs/multi-model-server).
As far as I remember, MMS allows dynamic batching: the method for processing instances always gets an array of instances.
Depending on the configuration, if the server receives more than BATCHSIZE requests within a configurable timespan, these requests are dynamically collected into a batch, run through the model together, and the results are returned to the individual callers again.
This is a crucial feature for models where running single instances through the model is highly inefficient.
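To illustrate the kind of interface I mean, here is a minimal sketch of an MMS-style batch-aware handler (the contract assumed here is that the framework passes a list of requests per call and expects a list of results of the same length and order; the model, key names and shapes are placeholders):

```python
import json
import torch


class BatchHandler:
    """Sketch of a handler whose entry point receives a whole batch of requests."""

    def __init__(self):
        self.model = None
        self.initialized = False

    def initialize(self, context):
        # A real handler would load the model from the model directory given in
        # `context`; a placeholder module is used here to keep the sketch short.
        self.model = torch.nn.Identity()
        self.model.eval()
        self.initialized = True

    def handle(self, data, context):
        # `data` contains one entry per request that was dynamically batched.
        inputs = []
        for req in data:
            body = req.get("body") or req.get("data")
            if isinstance(body, (bytes, bytearray)):
                body = json.loads(body)
            inputs.append(torch.tensor(body, dtype=torch.float32))
        batch = torch.stack(inputs)  # assumes all requests have the same shape
        with torch.no_grad():
            out = self.model(batch)  # one forward pass for the whole batch
        # Split the batched output back into one response per request.
        return [o.tolist() for o in out]
```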
I could not figure out if or how this is already supported by TorchServe, and I could not find anything about it in the documentation either.
Could somebody confirm that this is actually missing from TorchServe, or point me to where it is documented if it is already implemented?
Hi @johann-petrak, this PR, #1125, should have instructions for doing this and should get merged soon. Let me know if this is indeed what you were looking for.
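For reference, a minimal sketch of how dynamic batching is typically enabled in TorchServe, namely per model at registration time through the management API (the model archive name and the concrete values are placeholders; treat the details as an assumption until the documentation from the PR above is merged):

```python
import requests

# Register a model with dynamic batching via TorchServe's management API.
# batch_size: maximum number of requests aggregated into one forward pass.
# max_batch_delay: time in milliseconds to wait for a full batch before
#                  running whatever has arrived so far.
resp = requests.post(
    "http://localhost:8081/models",
    params={
        "url": "my_model.mar",   # placeholder model archive
        "initial_workers": 1,
        "batch_size": 8,
        "max_batch_delay": 50,
    },
)
print(resp.status_code, resp.text)
```

With such a configuration the frontend collects up to batch_size requests (or waits at most max_batch_delay ms) and hands them to the handler as a single list, which matches the MMS behaviour described above.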
Thanks @msaroufim, this looks exactly like what I was looking for! Looking forward to this getting merged and released, as the component I was originally looking at is the AWS SageMaker PyTorch inference container, which depends on TorchServe.