
Native Support for KServe Open Inference REST/gRPC Protocol #2373

Closed
yuzisun opened this issue May 31, 2023 · 4 comments · Fixed by #2609
Labels
kubernetes triaged Issue has been reviewed and triaged

Comments


yuzisun commented May 31, 2023

🚀 The feature

KServe has now rebranded the v2 inference protocol as the Open Inference Protocol (OIP) specification. Can we implement OIP natively in TorchServe, as other model servers such as Triton, MLServer, OpenVINO Model Server, and AMD Inference Server do?

Motivation, pitch

Currently TorchServe places the KServe Python server in front of the TorchServe Netty server to adapt to the KServe v1/v2 REST protocols. However, I think this extra layer provides minimal value and causes numerous maintenance and performance issues. The KServe Python SDK is primarily designed for native Python inference runtimes, where users want to implement arbitrary inference code with pre/post processing. TorchServe already provides comparable custom handlers. There is therefore no good reason to keep both and route all KServe inference requests through KServe Python server -> Netty -> TorchServe Python worker.

Alternatives

No response

Additional context

  • Can we remove the KServe Python wrapper altogether?
  • It appears we can send KServe inference requests directly to the Netty server, since v1/v2 REST requests are already handled there.
  • For gRPC, I believe we would need to implement the OIP gRPC specification natively.
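To make the "send requests directly to the Netty server" idea concrete, here is a minimal sketch of what an OIP (v2) REST inference request looks like. The URL path and JSON body layout follow the Open Inference Protocol specification; the host/port (`localhost:8080`), model name (`mnist`), and input tensor name (`input-0`) are placeholder assumptions, not values from this issue.

```python
import json


def build_oip_infer_request(host, model_name, data, shape, datatype="FP32"):
    """Build an OIP/v2 REST inference request as (url, json_body).

    Path and body layout follow the Open Inference Protocol spec;
    host, model name, and tensor name are placeholders.
    """
    url = f"http://{host}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": "input-0",     # tensor name the model expects (assumption)
                "shape": shape,        # e.g. [1, 2]
                "datatype": datatype,  # OIP dtype string: FP32, INT64, BYTES, ...
                "data": data,          # flattened tensor contents
            }
        ]
    }
    return url, json.dumps(body)


url, body = build_oip_infer_request("localhost:8080", "mnist", [0.1, 0.2], [1, 2])
print(url)  # http://localhost:8080/v2/models/mnist/infer
```

If TorchServe served this endpoint natively, a client could POST this body with `Content-Type: application/json` straight to the frontend, with no KServe Python wrapper in between.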
@yuzisun yuzisun changed the title Native Support for KServe Open Inference Protocol for REST/gRPC Native Support for KServe Open Inference REST/gRPC Protocol May 31, 2023
@msaroufim msaroufim added kubernetes triaged Issue has been reviewed and triaged labels May 31, 2023

yuzisun commented Jun 14, 2023

@msaroufim Can we set up a call with you to discuss this?


msaroufim commented Jun 14, 2023

Hi @yuzisun, sure! I'd be happy to. I just forwarded this to the core team; let me get back to you with a few times that work.

Might be easiest to email me, it's marksaroufim@meta.com

@gavrissh

Quick suggestion: during my experiments I noticed that with the KServe v1/v2 REST protocol, TorchServe cannot use the dynamic batching done by the Netty server. This causes a performance gap compared to using raw inputs.
Batching support would be ideal. Thanks!


yuzisun commented Jun 14, 2023

Makes sense. The goal of this issue is to remove the KServe wrapper entirely and implement OIP natively in TorchServe. @gavrishp I think it is possible to simply disable the KServe wrapper and send requests directly to the Netty server using the KServe v1/v2 REST protocol; can you try that?
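As a concrete sketch of that experiment: a KServe v1 REST request is a POST to `/v1/models/{name}:predict` with an `{"instances": [...]}` body, per the KServe v1 data-plane spec. The host/port (TorchServe's inference port, typically 8080) and model name below are placeholder assumptions for illustration.

```python
import json


def build_v1_predict_request(host, model_name, instances):
    """Build a KServe v1 REST ':predict' request as (url, json_body).

    Path and body layout follow the KServe v1 data-plane spec;
    host/port and model name are placeholders.
    """
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# Pointed directly at TorchServe's Netty frontend (no KServe Python
# wrapper in between), sending it would look roughly like:
#   req = urllib.request.Request(url, body.encode(),
#                                headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
url, body = build_v1_predict_request("localhost:8080", "mnist", [{"data": [0, 1, 2]}])
print(url)  # http://localhost:8080/v1/models/mnist:predict
```

Whether the Netty frontend accepts this payload as-is is exactly what the suggested experiment would verify.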
