
Triton naming #747

Merged
merged 3 commits into from Jun 6, 2020

Conversation

deadeyegoodwin
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:


@k8s-ci-robot
Contributor

Hi @deadeyegoodwin. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@deadeyegoodwin
Contributor Author

/assign @ellis-bigelow

@deadeyegoodwin
Contributor Author

@ellis-bigelow @rakelkar
I'm not sure I got all the references or fixed them correctly. There are other mentions of TensorRT, but some seem related to integration with TF and those should stay as TensorRT. It seems like TensorRTSpec should be renamed, but I'm not sure.

@deadeyegoodwin
Contributor Author

There is also a related PR for KubeFlow website: kubeflow/website#1846

@kubeflow-bot

This change is Reviewable

Two review threads on docs/samples/triton/README.md (outdated, resolved)
@yuzisun
Member

yuzisun commented Mar 30, 2020

/ok-to-test

@ellistarn
Collaborator

Yeah we're definitely missing changes for https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha2/framework_tensorrt.go
I'd run make test after making the changes.

@yuzisun
Member

yuzisun commented Mar 30, 2020

Yeah we're definitely missing changes for https://github.com/kubeflow/kfserving/blob/master/pkg/apis/serving/v1alpha2/framework_tensorrt.go
I'd run make test after making the changes.

Good point, gonna add an e2e test in my other triton example PR.

@deadeyegoodwin
Contributor Author

So do I need to make changes to framework_tensorrt.go, or @yuzisun are you going to handle that in your followup PR?

@yuzisun
Member

yuzisun commented Apr 1, 2020

So do I need to make changes to framework_tensorrt.go, or @yuzisun are you going to handle that in your followup PR?

@deadeyegoodwin I can take care of that in my PR, once I am done we can merge together.

@rakelkar
Contributor

rakelkar commented Apr 2, 2020

Not sure this change makes sense to me? In my understanding KFServing "Frameworks" are ML frameworks - which inference server we use to inference them should be an implementation detail.

Triton is an inference server, and we should certainly update the TensorRT framework to use it (and possibly some other or all other frameworks to use it as well e.g. #744 ) but I don't think Triton is an ML framework.

So it seems logical to me to include TensorRT as a framework rather (assuming NVIDIA cares enough to support it) and consider updating the TensorRT sample to use an actual TRT model.plan instead of TF as it currently does.
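(For reference, a native TensorRT model served by the Triton/TensorRT Inference Server sits in a model repository laid out roughly like this; the model name below is a placeholder:)

```
model_repository/
  simple_plan/          # placeholder model name
    config.pbtxt        # model configuration, e.g. platform: "tensorrt_plan"
    1/                  # numeric version directory
      model.plan        # serialized TensorRT engine
```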

@ellis-bigelow @cliveseldon @animeshsingh @yuzisun

/hold

@yuzisun
Member

yuzisun commented Apr 2, 2020

@rakelkar I think currently the way we organize is really by inference server; maybe the naming is a bit confusing since it sounds like ML framework. E.g. when you use onnx the actual model can be pytorch but served with onnxruntime; when you use pytorch it is really using the KFServing-maintained pytorch server to serve the pytorch model. Are we suggesting we should make it two levels, like the user first specifies the ML framework and then chooses which inference server serves the model?

Predictor:
  pytorch:
    storageUri: XXX
    inferenceServer: triton/onnx/kfserving  # a list of choices which can serve the pytorch model

@ellistarn
Collaborator

ellistarn commented Apr 2, 2020

I agree with Rakesh's point. QQ though, is the TensorRT name unchanged re: the framework?

@ellistarn
Collaborator

@yuzisun , I'd lean away from specifying inferenceserver in framework. This multilayered model seems more complex than it's worth.

@rakelkar
Contributor

rakelkar commented Apr 2, 2020

@yuzisun when we say ONNX, it has to be a model serialized in ONNX format... it doesn't matter how you got to that point... and from the user's perspective it doesn't matter whether you use the C++ ONNXServer for inference or your own Python server that hosts the ONNX runtime... that is a serving infra decision...

So IMO framework maps directly to the serialized model format...

@rakelkar
Contributor

rakelkar commented Apr 2, 2020

@yuzisun , I'd lean away from specifying inferenceserver in framework. This multilayered model seems more complex than it's worth.

I agree with Rakesh's point. QQ though, is the TensorRT name unchanged re: the framework?

I believe they only changed the name of the inference server (which makes sense since it supports multiple model formats)...

@yuzisun
Member

yuzisun commented Apr 2, 2020

@yuzisun , I'd lean away from specifying inferenceserver in framework. This multilayered model seems more complex than it's worth.

I agree with Rakesh's point. QQ though, is the TensorRT name unchanged re: the framework?

I believe they only changed the name of the inference server (which makes sense since it supports multiple model formats)...

So when a user wants to serve a tensorflow model with the Triton inference server, how do we specify that in our spec?

@rakelkar
Contributor

rakelkar commented Apr 2, 2020

like user first specify ML framework and then choose which inference server to serve the model

btw you can achieve this today by changing the configmap to point to a different inference server for the framework, but that is a platform-wide opinion, not per-model... I guess our stance is that the platform chooses for the 90% case... and edge cases are covered via custom? This is what cloud providers do too, btw (e.g. SageMaker).
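As a rough illustration of the configmap approach described above (a sketch only; the configmap name, keys, and image tag are assumptions and may differ from the actual KFServing release):

```yaml
# Hypothetical sketch: pointing the "tensorrt" framework at a different
# serving image via the platform-wide inference service configmap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config   # assumed name
  namespace: kfserving-system
data:
  predictors: |
    {
      "tensorrt": {
        "image": "nvcr.io/nvidia/tritonserver",
        "defaultImageVersion": "20.03-py3"
      }
    }
```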

wdyt?

@deadeyegoodwin
Contributor Author

The TensorRT "inference accelerator" is still called "TensorRT": https://developer.nvidia.com/tensorrt

@yuzisun
Member

yuzisun commented Apr 2, 2020

like user first specify ML framework and then choose which inference server to serve the model

btw you can achieve this today by changing the configmap to point to a different inference server for the framework - but it is platform-wide opinion and not per-model... I guess our stance is the platform chooses for the 90% ... and edge cases are covered via custom? This is what cloud providers do too btw (e.g. sagemaker)

wdyt?

I would agree with you, but we are not there yet, I think. Also, currently it is not simply a change in the configmap with the image for a given ML framework; the startup command and parameters are pretty hard-coded for a given inference server in the controller. Even if that changes in the future, I would still think users need the ability to choose which inference server to use per model. For example, for pytorch, what if the model cannot be converted to onnx and the user just wants to serve on a simple KFServing python server without going the custom route?

@rakelkar
Contributor

rakelkar commented Apr 2, 2020

model can not be converted to onnx and user just want to serve on a simple KFServing python server without going custom route

But for that we already support pytorch as a framework and have a PYT server (that loads native PYT serialized models and does not require ONNX)?

@yuzisun
Member

yuzisun commented Apr 3, 2020

model can not be converted to onnx and user just want to serve on a simple KFServing python server without going custom route

But for that we already support pytorch as a framework and have a PYT server (that loads native PYT serialized models and does not require ONNX)?

Ya, just making an example. A more relevant question regarding the Triton inference server: since it is a multi-framework inference server, how can we allow users to serve tensorflow/pytorch models on it, leveraging its auto-batching and multi-model co-hosting features, without having to convert to the accelerated TensorRT model format? Currently the only way to use the Triton inference server is to put it under the tensorrt spec, but that is confusing and the model may not actually be in TensorRT format. How do you think we can proceed from here?

@ellistarn
Collaborator

ellistarn commented Apr 3, 2020

I think model server might be a more accurate descriptor. If it's capable of serving many different model formats, triton definitely seems clearer than tensorrt to me.

The only place I hesitate: I don't want to change xgboost or sklearn to have a server suffix, nor do I want to remove them. Those specs are framework-based. Can we simply treat triton as a slight exception to the framework rule for now? We can resolve this more holistically once we have a clearer understanding of where the industry is moving with model servers.
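Concretely, treating triton as its own predictor entry would look something like this (a sketch assuming the v1alpha2 InferenceService layout; the name and storageUri are illustrative):

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: triton-simple        # illustrative name
spec:
  default:
    predictor:
      triton:                # renamed from "tensorrt"
        storageUri: gs://kfserving-samples/models/triton/simple  # illustrative path
```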

@rakelkar
Contributor

rakelkar commented Apr 3, 2020

Can we simply treat triton as a slight exception to the framework rule for now and we can resolve this more holistically once we have a clearer understanding of where the industry is moving with model servers

Sounds fair!

/unhold

@yuzisun
Member

yuzisun commented Jun 2, 2020

Oh, we need to move the test model from gs://kfserving-samples/models/tensorrt to gs://kfserving-samples/models/triton.

@ellistarn
Collaborator

Are we planning another 0.3.x release before the v1beta1 API changes? Much of this renaming will happen there.

@edenbuaa
Contributor

edenbuaa commented Jun 4, 2020

Are we planning another 0.3.x release before the v1beta1 API changes? Much of this renaming will happen there.

Also want to know.

@yuzisun changed the title from "WIP: Triton naming" to "Triton naming" on Jun 5, 2020
@yuzisun
Member

yuzisun commented Jun 5, 2020

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yuzisun
Member

yuzisun commented Jun 6, 2020

/retest

@k8s-ci-robot k8s-ci-robot merged commit 23e2a03 into kserve:master Jun 6, 2020
2 of 3 checks passed
@edenbuaa
Contributor

edenbuaa commented Jun 9, 2020

@yuzisun Does kfserving v0.3.0 support the Triton server now?

ellistarn pushed a commit to ellistarn/kfserving that referenced this pull request Jul 28, 2020
* Triton renaming

* Update docs/samples/triton/simple_string/triton.yaml

Co-authored-by: Dan Sun <dsun20@bloomberg.net>

* Update test_triton.py

Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants