feat: add all methods to vertex API #192

Merged

merged 4 commits into main from feat/vertex_complete Mar 21, 2024

Conversation

OlivierDehaene
Member

@drbh what do you think?

@philschmid FYI this will modify the API to:

curl 127.0.0.1:3000/vertex \
    -X POST \
    -d '{"instances": [{"type": "embed", "inputs": "I like you."}]}' \
    -H 'Content-Type: application/json'
    
 # [{"type":"embed","result":[[-0.057003036,-0.022593308, ...]]}]
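For illustration, the `instances` envelope from the curl call above can be built client-side like this. This is a hypothetical sketch: the `type`/`inputs` field names come from the example in this PR, and `build_vertex_payload` is not a real TEI helper.

```python
import json

def build_vertex_payload(requests):
    """Wrap (type, inputs) pairs in the Vertex-style "instances" envelope.

    Field names follow the curl example in this PR; this helper is an
    illustration, not part of the TEI client API.
    """
    return {"instances": [{"type": t, "inputs": inputs} for t, inputs in requests]}

# Mixed request: one embed instance and one tokenize instance in a single call.
payload = build_vertex_payload([("embed", "I like you."), ("tokenize", "I like you.")])
print(json.dumps(payload))
```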

@philschmid
Member

We should not make it different from what people would get on GKE. I think we should rather make it configurable via an env var which "type" is used, e.g. INFERENCE_TYPE="embed", "rank", .... We will have the same question for SageMaker too.
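A minimal sketch of the alternative proposed here: the served task is fixed once at startup from an env var. The `ROUTES` mapping and the `rank` → `/rerank` pairing are assumptions for illustration, not documented TEI behavior.

```python
import os

# Hypothetical mapping from INFERENCE_TYPE values to the TEI route served.
ROUTES = {
    "embed": "/embed",
    "rank": "/rerank",
    "tokenize": "/tokenize",
}

def select_route(env=None):
    """Pick the single served route from INFERENCE_TYPE, defaulting to embed."""
    env = os.environ if env is None else env
    inference_type = env.get("INFERENCE_TYPE", "embed")
    if inference_type not in ROUTES:
        raise ValueError(f"unsupported INFERENCE_TYPE: {inference_type!r}")
    return ROUTES[inference_type]

print(select_route({"INFERENCE_TYPE": "rank"}))  # -> /rerank
```

The trade-off debated below follows directly from this design: one deployment serves exactly one route, chosen before the first request arrives.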

@OlivierDehaene
Member Author

Then how do you use the multiple routes? You just decide at start time which route you want to serve? I think that's an inferior solution in every respect.

@OlivierDehaene
Member Author

type here allows you to use all TEI functionalities with a single route: embed, embed_all, embed_sparse, tokenize, ...
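Conceptually, the single `/vertex` route would then dispatch each instance on its `type` field, something like the sketch below. The handler bodies are placeholders standing in for TEI's real implementations; only the dispatch shape is what this PR describes.

```python
def dispatch(instance, handlers):
    """Route one instance from the "instances" array to the handler named by its "type"."""
    kind = instance.get("type")
    if kind not in handlers:
        raise ValueError(f"unknown type: {kind!r}")
    return handlers[kind](instance["inputs"])

# Placeholder handlers; TEI's actual embed/tokenize logic lives server-side.
handlers = {
    "embed": lambda text: {"type": "embed", "result": [[0.0, 0.0, 0.0]]},
    "tokenize": lambda text: {"type": "tokenize", "result": text.split()},
}

out = [dispatch(i, handlers) for i in [{"type": "tokenize", "inputs": "I like you."}]]
print(out)
```

This is what lets a single deployment serve mixed workloads (e.g. embedding and tokenization) through one endpoint.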

@drbh
Collaborator

drbh commented Mar 7, 2024

@OlivierDehaene approved and looks good but deferring to @philschmid on the best interface

drbh previously approved these changes Mar 7, 2024
Collaborator

@drbh left a comment

lgtm

@OlivierDehaene
Member Author

Not my fault none of the clouds can figure out that, oh surprise, you may want to have more than a single route on your service...

@philschmid
Member

Then how do you use the multiple routes? You just decide at start time which route you want to serve? I think that's an inferior solution in every respect.

You deploy 1 model, so you define 1 task, similar to other models, like token-classification. If you want to change it, change your deployment; I don't think that's an issue for companies.
It's way more confusing if payloads differ between solutions, and we'd have to make sure we document those correctly. I would not advise having a different request format for Vertex AI and GKE. That's not good UX, especially since the deployment UX on Google Cloud is almost identical.

The single route is something all Cloud ML Services currently have with Vertex and SageMaker.

@OlivierDehaene
Member Author

You deploy 1 model, so you define 1 task, similar to other models, like token-classification. If you want to change it, change your deployment; I don't think that's an issue for companies.

Except that's not the case. Some users use both /embed and /embed_all with one deployment. A lot of users use their deployment's main method and /tokenize at the same time.

I would not advise having a different request for Vertex AI and GKE

Is that not already the case? Does GKE use the /vertex route? If not, they don't have the instances shenanigans (@drbh, is it mandatory to wrap the payload in instances?).

The single route is something all Cloud ML Services currently have with Vertex and SageMaker.

Yeah, and that's a poor design decision that everybody now needs to live with.

@philschmid
Member

Does GKE use the /vertex route?

No, GKE can use all the existing routes. There is no instances wrapper. I commented in the other PR that we don't need any /vertex route at all and can just map to the right route, e.g. /embed, when we are on Vertex AI.

@OlivierDehaene
Member Author

OlivierDehaene commented Mar 10, 2024

What? From #183 (review):

My bad, we need a new route since the Vertex request has the instances object again.

Which one is it, then?

@OlivierDehaene
Member Author

So if I understood you correctly:

  • GKE can use the container as is
  • Vertex deployments cannot, because we need to add instances and predictions to the payloads.

So WHATEVER we do, we will have different APIs between the two.
Given the above, I think it is better to give users the ability to use all routes via an additional type than to arbitrarily map a single route.

@OlivierDehaene
Member Author

@philschmid

@philschmid
Member

GKE can use the container as is

Yes

Vertex deployments cannot because we need to add instances and predictions to the payloads.

Yes

Given the above, I think it is better to give users the ability to use all routes via an additional type than to arbitrarily map a single route.

I would not do this, since it would be different from what we did in TGI. Even if they have a different payload, it's the "same" payload just put into instances; adding a "type" would make it more different and complex. I would rather define the "type" when a model is deployed.

@OlivierDehaene OlivierDehaene merged commit a57cf61 into main Mar 21, 2024
3 checks passed
@OlivierDehaene OlivierDehaene deleted the feat/vertex_complete branch March 21, 2024 16:44