feat: add all methods to vertex API #192

Merged

merged 4 commits into main from feat/vertex_complete Mar 21, 2024

Conversation

OlivierDehaene
Member

@drbh what do you think?

@philschmid FYI this will modify the API to:

curl 127.0.0.1:3000/vertex \
    -X POST \
    -d '{"instances": [{"type": "embed", "inputs": "I like you."}]}' \
    -H 'Content-Type: application/json'
    
 # [{"type":"embed","result":[[-0.057003036,-0.022593308, ...]]}]
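For illustration, the `instances` envelope from the curl call above can be built client-side like this. This is a hypothetical sketch: the `type`/`inputs` field names come from the example in this PR, and `build_vertex_payload` is not a real TEI helper.

```python
import json

def build_vertex_payload(requests):
    """Wrap (type, inputs) pairs in the Vertex-style "instances" envelope.

    Field names follow the curl example in this PR; this helper is an
    illustration, not part of the TEI client API.
    """
    return {"instances": [{"type": t, "inputs": inputs} for t, inputs in requests]}

# Mixed request: one embed instance and one tokenize instance in a single call.
payload = build_vertex_payload([("embed", "I like you."), ("tokenize", "I like you.")])
print(json.dumps(payload))
```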

@philschmid
Member

We should not make it different from what people would get on GKE. I think we should rather make it configurable via an env var which "type" is used, e.g. INFERENCE_TYPE="embed", "rank", .... We will have the same question for SageMaker too.
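A minimal sketch of the alternative proposed here: the served task is fixed once at startup from an env var. The `ROUTES` mapping and the `rank` → `/rerank` pairing are assumptions for illustration, not documented TEI behavior.

```python
import os

# Hypothetical mapping from INFERENCE_TYPE values to the TEI route served.
ROUTES = {
    "embed": "/embed",
    "rank": "/rerank",
    "tokenize": "/tokenize",
}

def select_route(env=None):
    """Pick the single served route from INFERENCE_TYPE, defaulting to embed."""
    env = os.environ if env is None else env
    inference_type = env.get("INFERENCE_TYPE", "embed")
    if inference_type not in ROUTES:
        raise ValueError(f"unsupported INFERENCE_TYPE: {inference_type!r}")
    return ROUTES[inference_type]

print(select_route({"INFERENCE_TYPE": "rank"}))  # -> /rerank
```

The trade-off debated below follows directly from this design: one deployment serves exactly one route, chosen before the first request arrives.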

@OlivierDehaene
Member Author

Then how do you use the multiple routes? You just decide at start time which route you want to serve? I think that's an inferior solution in every respect.

@OlivierDehaene
Member Author

type here allows you to use all TEI functionalities with a single route: embed, embed_all, embed_sparse, tokenize, ...
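Conceptually, the single `/vertex` route would then dispatch each instance on its `type` field, something like the sketch below. The handler bodies are placeholders standing in for TEI's real implementations; only the dispatch shape is what this PR describes.

```python
def dispatch(instance, handlers):
    """Route one instance from the "instances" array to the handler named by its "type"."""
    kind = instance.get("type")
    if kind not in handlers:
        raise ValueError(f"unknown type: {kind!r}")
    return handlers[kind](instance["inputs"])

# Placeholder handlers; TEI's actual embed/tokenize logic lives server-side.
handlers = {
    "embed": lambda text: {"type": "embed", "result": [[0.0, 0.0, 0.0]]},
    "tokenize": lambda text: {"type": "tokenize", "result": text.split()},
}

out = [dispatch(i, handlers) for i in [{"type": "tokenize", "inputs": "I like you."}]]
print(out)
```

This is what lets a single deployment serve mixed workloads (e.g. embedding and tokenization) through one endpoint.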

@drbh
Collaborator

drbh commented Mar 7, 2024

@OlivierDehaene approved and looks good but deferring to @philschmid on the best interface

drbh previously approved these changes Mar 7, 2024
Collaborator

@drbh left a comment

lgtm

@OlivierDehaene
Member Author

Not my fault none of the clouds can figure out that, oh surprise, you may want to have more than a single route on your service...

@philschmid
Member

Then how do you use the multiple routes? You just decide at start time which route you want to serve? I think that's an inferior solution in every respect.

You deploy 1 model, so you define 1 task, similar to other models, like token-classification. If you want to change it, change your deployment; I don't think that's an issue for companies.
It's way more confusing if payloads differ between solutions, and we'd have to make sure we document those correctly. I would not advise having a different request format for Vertex AI and GKE. That's not good UX, especially since the deployment UX on Google Cloud is almost identical.

The single route is something all Cloud ML Services currently have with Vertex and SageMaker.

@OlivierDehaene
Member Author

You deploy 1 model, so you define 1 task, similar to other models, like token-classification. If you want to change it, change your deployment; I don't think that's an issue for companies.

Except that's not the case. Some users use both /embed and /embed_all with one deployment. A lot of users use their deployment's main method and /tokenize at the same time.

I would not advise having a different request for Vertex AI and GKE

Is that not already the case? Does GKE use the /vertex route? If not, they don't have the instances shenanigans (@drbh, is it mandatory to wrap the payload in instances?).

The single route is something all Cloud ML Services currently have with Vertex and SageMaker.

Yeah, and that's a poor design decision that everybody now needs to live with.

@philschmid
Member

Does GKE use the /vertex route?

No, GKE can use all the existing routes. There is no instances wrapper. I commented in the other PR that we don't need any /vertex route at all and can just map to the right route, e.g. /embed, when we are on Vertex AI.

@OlivierDehaene
Member Author

OlivierDehaene commented Mar 10, 2024

What? From #183 (review):

My bad, we need a new route since the Vertex request has the instances object again.

Which one is it, then?

@OlivierDehaene
Member Author

So if I understood you correctly:

  • GKE can use the container as is
  • Vertex deployments cannot, because we need to add instances and predictions to the payloads.

So WHATEVER we do, we will have different APIs between the two.
Given the above, I think it is better to give users the ability to use all routes via an additional type than to arbitrarily map a single route.

@OlivierDehaene
Member Author

@philschmid

@philschmid
Member

GKE can use the container as is

Yes

Vertex deployments cannot because we need to add instances and predictions to the payloads.

Yes

Given the above, I think it is better to give users the ability to use all routes via an additional type than to arbitrarily map a single route.

I would not do this, since it would be different from what we did in TGI. Even if they have a different payload, it's the "same" payload just put into instances; adding a "type" would make it more different and complex. I would rather define the "type" when a model is deployed.

@OlivierDehaene OlivierDehaene merged commit a57cf61 into main Mar 21, 2024
3 checks passed
@OlivierDehaene OlivierDehaene deleted the feat/vertex_complete branch March 21, 2024 16:44