
[question] Standard API endpoints? #845

Closed
vsoch opened this issue Feb 14, 2022 · 15 comments

@vsoch
Contributor

vsoch commented Feb 14, 2022

Hi! Is there any work in the online ML community to define a standard set of API endpoints / interactions for a service (which implementations could then adopt and extend as needed)? An example in the containers community would be the OCI distribution-spec: https://github.com/opencontainers/distribution-spec and I've made one for workflows too: https://github.com/panoptes-organization/monitor-schema/blob/main/spec.md.

I'm asking because if a bunch of us are making similar servers, it might make sense to work from the same or a similar design. Thank you!

@MaxHalford
Member

Hey!

I don't believe there is such a standard, which says a lot about the maturity of the field. But I may be wrong :)

I'm sure it's not too hard to work out some specs. Is there an established format to write down such a spec? Like an RFC?

@vsoch
Contributor Author

vsoch commented Feb 18, 2022

I've never gone through creating a formal RFC, although I did create an RFC template for opencontainers! https://specs.opencontainers.org/image-spec/?v=v1.0.1. I think early work probably wouldn't need to be official - my thinking is I'll write up a spec.md doc alongside what I'm testing and see if anyone else is interested.

@vsoch
Contributor Author

vsoch commented Mar 1, 2022

heyo! So I started a very basic spec, and it's based off of chantilly and then django-river-ml. I tried to keep it as simple as possible since it's the first shot 👉 https://vsoch.github.io/riverapi/getting_started/spec.html. Please provide any feedback, and point anyone in this direction who might be interested in helping or thinking more about it! Closing the issue since my question is answered and resolved.

vsoch closed this as completed Mar 1, 2022
@MaxHalford
Member

Very cool!

I take it this overlaps with tools like OpenAPI and Swagger. But those are generated once the implementation is done; they're not specs.

The routes look good to me. One thing though: in my view there should also be a /label route. You give it a label and an ID, and those are matched with the features passed during a prediction. You don't pass the features directly in the /learn route. I know this is counter-intuitive, but it makes sense when you think about it. Maybe you already know what I mean, so I won't expand, but please let me know if this isn't clear.

@vsoch
Contributor Author

vsoch commented Mar 2, 2022

I take it this overlaps with tools like OpenAPI and Swagger. But those are generated once the implementation is done; they're not specs.

Oh indeed! Yes I can add that view to the django plugin - there are easy ways to do that.

The routes look good to me. One thing though: in my view there should also be a /label route. You give it a label and an ID, and those are matched with the features passed during a prediction. You don't pass the features directly in the /learn route. I know this is counter-intuitive, but it makes sense when you think about it. Maybe you already know what I mean, so I won't expand, but please let me know if this isn't clear.

So you are saying /label would be the route to provide features with a ground truth, and /learn doesn't require it? I don't know exactly what you mean so the additional explanation would be helpful! I can definitely make more time later this week to hack on this bit.

@MaxHalford
Member

So you are saying /label would be the route to provide features with a ground truth, and /learn doesn't require it? I don't know exactly what you mean so the additional explanation would be helpful! I can definitely make more time later this week to hack on this bit.

So this is what I try to explain in my talks, but it's not an easy concept. Basically:

  • /predict takes as input an ID and a set of features.
  • /label takes as input an ID and a label.

Under the hood, the features and the label can be joined to make the model learn. This is helpful because it avoids stuttering: the features are passed once in /predict, and not a second time in /label. This isn't just convenient; it's also the more correct way to proceed, because it avoids having feature discrepancies between /predict and /label.

It is up to the system to decide what to do when /label happens. It can update the model synchronously, essentially doing what a /learn route would do. Or it can store the label in a DB and let the learning happen in the background.

Does that make more sense?
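
To make the join concrete, here's a minimal sketch of what a server could do (hypothetical Flask routes, an in-memory cache, and a toy river model; not chantilly's actual implementation):

from flask import Flask, jsonify, request
from river import linear_model, preprocessing

app = Flask(__name__)
model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
cache = {}  # prediction ID -> features, kept until the ground truth arrives

@app.route('/predict', methods=['POST'])
def predict():
    payload = request.get_json()
    cache[payload['id']] = payload['features']  # remember the features for the later join
    return jsonify({'prediction': model.predict_one(payload['features'])})

@app.route('/label', methods=['POST'])
def label():
    payload = request.get_json()
    x = cache.pop(payload['id'])          # join the label with the features seen at /predict
    model.learn_one(x, payload['label'])  # or enqueue this and learn in the background
    return jsonify({})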

@vsoch
Contributor Author

vsoch commented Mar 2, 2022

I can give it a shot! If you updated the example in your test.py for chantilly with this approach, what would that look like?

@MaxHalford
Member

Ah well something like this I suppose:

import requests

x = {...}   # the features for this observation
uuid = ...  # a unique identifier for this prediction

requests.post('/predict', json={'features': x, 'id': uuid})

# later, once the ground truth is known
label = True
requests.post('/label', json={'id': uuid, 'label': label})

@vsoch
Contributor Author

vsoch commented Mar 2, 2022

Gotcha, so you would store one or more labels with a model name and identifier? E.g.:

self.db[f"labels/{model_name}/{identifier}"] = ["label"]

@vsoch
Contributor Author

vsoch commented Mar 2, 2022

And a label != a ground truth provided in /learn?

@MaxHalford
Member

Yes, you could store it like that. But once you consider the case of multiple models being updated in parallel, this storage scheme might not make much sense.

And yes, label and ground truth are synonyms.

@vsoch
Contributor Author

vsoch commented Mar 2, 2022

Agree, so just to clarify the use cases:

  • If I know a label at the time of learning, I can provide it as ground truth
  • If I don't know a label at learning time, I can add it later, but I need to keep track of the identifier

Where exactly does this identifier come from? I see it's optional for various endpoints, but it's not clear how it's generated. Shouldn't the server be generating it (and returning it somewhere) for the user, and then the user could do something like update a previous identifier?

I also think that if ground truth == label, the API should use the terms consistently, choosing either ground truth or label (but not both). What do you think?

@MaxHalford
Member

If I know a label at the time of learning, I can provide it as ground truth

Indeed, when you have a ground truth, it usually means you made a prediction beforehand.

Where exactly does this identifier come from? I see it's optional for various endpoints, but it's not clear how it's generated. Shouldn't the server be generating it (and returning it somewhere) for the user, and then the user could do something like update a previous identifier?

It depends. Ideally the user should provide this, but you could also generate one for each prediction as a convenience for the user.

I also think that if ground truth == label, the API should use the terms consistently, choosing either ground truth or label (but not both). What do you think?

Yes, I suppose so. I would go with ground truth, as label is usually only used for classification.
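
Putting those two points together, the client side might look something like this, assuming the server mints an identifier when one isn't supplied and the field is named ground_truth (hypothetical names and URL, not part of any finalized spec):

import requests

base = 'http://localhost:8000'  # wherever the service is running

# No identifier supplied: the server generates one and returns it.
resp = requests.post(f'{base}/predict', json={'features': {'x1': 1.0, 'x2': 0.5}})
identifier = resp.json()['id']  # assumed response field

# Later, the same identifier ties the ground truth back to those features.
requests.post(f'{base}/label', json={'id': identifier, 'ground_truth': 1.5})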

@vsoch
Contributor Author

vsoch commented Mar 5, 2022

Follow-up question about the label here: instead of trying to store it, can we not just use it to update the metrics from the previous prediction (and then delete the identifier from the cache, since we've labeled it and reflected the accuracy etc. in the model)? It looks like in the current implementation, when we get a ground truth for a label we:

  1. use it to update metrics
  2. use it in model.learn_one along with the features from the prediction
  3. announce to any stream listeners

So I'm inclined for /label to do the same and not actually save/cache the label anywhere; it's basically the same as /predict minus doing the prediction, because we get it from the cache (rough sketch below). Does that sound ok?
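
A rough sketch of that /label behavior, assuming a river-style model and metric, a cache of prior predictions, and a placeholder announce hook (all hypothetical names, not the actual chantilly or django-river-ml code):

def handle_label(payload, model, metric, cache, announce):
    """Sketch of a /label handler: no extra label storage, just update and move on."""
    entry = cache.pop(payload['id'])                   # features + prediction saved at /predict time
    y_true = payload['label']

    metric.update(y_true, entry['prediction'])         # 1. update metrics with the ground truth
    model.learn_one(entry['features'], y_true)         # 2. learn from the cached features
    announce({'id': payload['id'], 'label': y_true})   # 3. notify any stream listeners

    return {'ok': True}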

@MaxHalford
Member

Yes of course, you can do that. I'm only saying that doing the learning in the background might be desirable for performance reasons.
