Define common API for prediction #102

Open
jlewi opened this Issue Jan 8, 2018 · 18 comments

@jlewi
Contributor

jlewi commented Jan 8, 2018

We need a common API for inference. This is critical to being able to support a variety of different frameworks (e.g. TensorFlow, scikit-learn, xgboost).

There's a lot of prior art here. A lot of the existing frameworks define inference APIs. We should start by collecting those and seeing if it makes sense to adopt a slightly modified version of one of them.

I think the aim should be to produce an OpenAPI spec for the prediction API.

Some frameworks we should look at:

Does anyone have links to definitions for the H2O and PipelineAI APIs?

/cc @cfregly @seeldon @cliveseldon @lluunn

@jlewi jlewi added the area/inference label Jan 8, 2018

@cliveseldon

Contributor

cliveseldon commented Jan 8, 2018

At Seldon, as the above link shows, our API is described as a gRPC proto spec. For the external API we provide two endpoints, predict and feedback. Feedback allows "rewards" to be sent, so users of the API can report the success (or not) of predictions for use by online optimization algorithms, for example multi-armed bandit solvers.
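To make the feedback/reward idea concrete, here is a minimal sketch (not Seldon's implementation; the class and method names are hypothetical) of an epsilon-greedy bandit router that a feedback endpoint could drive:

```python
import random

# Illustrative only: a plain epsilon-greedy multi-armed bandit that a
# feedback endpoint's "reward" signal could update. Not Seldon's code.
class EpsilonGreedyRouter:
    def __init__(self, n_models, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_models
        self.values = [0.0] * n_models

    def route(self):
        # explore occasionally, otherwise pick the best-performing model
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda i: self.values[i])

    def feedback(self, model_idx, reward):
        # incremental mean of observed rewards for this model
        self.counts[model_idx] += 1
        n = self.counts[model_idx]
        self.values[model_idx] += (reward - self.values[model_idx]) / n

router = EpsilonGreedyRouter(n_models=2)
chosen = router.route()
router.feedback(chosen, reward=1.0)
```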

The core message type SeldonMessage has several variants. DefaultData has either a Tensor representation (most likely for use by gRPC users) or an NDArray representation (most likely for use by REST JSON users sending a list of lists of features). There are also general string and binary representations for users whose use case doesn't fit DefaultData.

Added to this, there are Meta and Status parts of the message. This keeps SeldonMessage quite generic, so it can be used for both the request and response parts of calls.

There seem to be a few questions:

  1. Does KubeFlow want to handle both REST and gRPC?
  2. If both, how will the gRPC spec be derived from the OpenAPI spec? We have an issue that discusses our situation, where we want to create the OpenAPI spec from the gRPC spec. I think Kubernetes starts with commented Go structs and generates the OpenAPI spec and proto files from those.

There is also the question of how KubeFlow will integrate with the other projects mentioned and whether the API needs to be a separate thing that multiple projects can adhere to?

@Maximophone from Seldon might have some comments as he worked on the API.

@elmiko


elmiko commented Jan 8, 2018

> I think the aim should be to produce an OpenAPI spec for the prediction API.

+1 for creating something with OpenAPI. I think it will give implementers the greatest flexibility with regard to creating tooling and interacting with the kubeflow processes.

> Does KubeFlow want to handle both REST and gRPC?

This sounds like a really interesting idea. The technical advantages of using gRPC offer some nice features, but I am concerned about getting too entrenched in something that ties down the part of the stack that will have the highest visibility.

@cliveseldon I read the linked issue. Have you made any progress on deriving an OpenAPI spec from the gRPC spec?

@cliveseldon

Contributor

cliveseldon commented Jan 8, 2018

@elmiko In the issue I link to the OpenAPI issue discussing this in detail, and also to the NY Times project for OpenAPI->proto, which I have not tried yet.

@jlewi

Contributor

jlewi commented Jan 8, 2018

I think we definitely want to support both REST and gRPC.

@jlewi jlewi added this to the Kubecon Europe milestone Jan 9, 2018

@lluunn

Contributor

lluunn commented Jan 11, 2018

@jlewi

Contributor

jlewi commented Jan 11, 2018

The CMLE predict payload doesn't seem very structured compared to Seldon's; it's just a list of arbitrary objects. In comparison, Seldon's payload is:

```proto
message SeldonMessage {
  Status status = 1;
  Meta meta = 2;
  oneof data_oneof {
    DefaultData data = 3;
    bytes binData = 4;
    string strData = 5;
  }
}
```

So Seldon explicitly has metadata:

```proto
message Meta {
  string puid = 1;
  map<string, google.protobuf.Value> tags = 2;
  map<string, int32> routing = 3;
}
```


Tags seem very useful, and in the context of K8s I kind of expect every object to have a list of arbitrary key-value pairs. As an example, I could see tags being useful for providing a key that could be used to sessionize logs. How would this be handled in the CMLE API?

The Seldon API also provides a oneof that allows sending data with more structure:

```proto
message DefaultData {
  repeated string names = 1;
  oneof data_oneof {
    Tensor tensor = 2;
    google.protobuf.ListValue ndarray = 3;
  }
}

message Tensor {
  repeated int32 shape = 1 [packed=true];
  repeated double values = 2 [packed=true];
}
```

@cfregly


cfregly commented Jan 12, 2018

Here's a link to the PipelineAI OpenAPI docs:

https://github.com/PipelineAI/pipeline/tree/master/docs/api

Few Notes:

  • Our API is intentionally generic (similar to CMLE) to support a wide variety of use cases (including existing APIs).
  • I'd suggest not over-architecting this piece of the stack - not now, anyway. We went this route initially, but found it better to be more generic.
  • We rely on the server-side implementation of the predict() method to handle the inputs however they wish.
  • Keeping things generic lets us re-use the same model implementation behind a REST endpoint or a Kafka stream. We just need to adjust the transform_request() and transform_response() methods inside the predict() method.
  • That link above points to a reverse-engineered OpenAPI Spec of the AWS SageMaker Predict API.
  • AWS SageMaker's Predict API is also very generic as you can see.
  • Whether you like AWS SageMaker or not, it's been released, and we should work with that group to support their Predict API.
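A minimal sketch of the generic predict() pattern described above, with hypothetical stand-ins for the model and transforms (the method names follow the comment, not PipelineAI's actual code):

```python
import json

# Sketch of the generic predict() pattern: the server-side implementation
# owns the input handling via transform_request()/transform_response().
# The model and payload here are hypothetical stand-ins.
def transform_request(raw_body: bytes):
    return json.loads(raw_body)          # e.g. REST: JSON bytes -> dict

def transform_response(result) -> bytes:
    return json.dumps(result).encode()   # dict -> JSON bytes

def predict(raw_body: bytes, model) -> bytes:
    inputs = transform_request(raw_body)
    outputs = model(inputs)              # framework-specific call
    return transform_response(outputs)

# The same predict() can sit behind a REST endpoint or a Kafka consumer;
# only the transforms change per transport/model.
print(predict(b'{"x": 2}', lambda d: {"y": d["x"] * 2}))  # b'{"y": 4}'
```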

Great discussion. Thanks for driving this, @jlewi!

Look forward to creating some standards among our prediction APIs!

/cc @owen-t @laurenyu (from AWS SageMaker team)

@cfregly


cfregly commented Jan 15, 2018

Also, I'd like to emphasize the ping() method used to check the status of the service. We have this in the PipelineAI Predict API (linked above).

@jlewi

Contributor

jlewi commented Jan 16, 2018

What's the benefit of an API spec that just takes an arbitrary JSON object? Doesn't that tightly couple clients and models, in that client code has to be written for a specific model?

The Seldon API seems more self-documenting than the CMLE/PipelineAI generic APIs. For example, if I look at the Seldon spec it's pretty obvious which fields I need to set in order to send strings, binary data, tensors, etc. As a consequence, I'd expect auto-generated client tools to be much better in the Seldon case.

With a generic API spec that just takes arbitrary json I'd actually have to know what the server implementation was doing.

As one example, with the CMLE API you have to dig through the docs to realize that you send binary data by using a dictionary with the magic key "b64". Whereas with the Seldon API, the spec along with the proto3 JSON mapping makes it clear you would just use the binData field and base64.
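The contrast shows up in the JSON payloads themselves. In this sketch, the "instances"/"b64" convention is CMLE's documented one, the field name image_bytes is a hypothetical example, and binData follows the proto3 JSON mapping of bytes fields to base64 strings:

```python
import base64
import json

raw = b"\x00\x01some-binary-bytes"
b64 = base64.b64encode(raw).decode("ascii")

# CMLE-style payload: binary data hides behind the magic "b64" key,
# which you only learn about from the docs ("image_bytes" is a made-up
# model-specific input name).
cmle_payload = {"instances": [{"image_bytes": {"b64": b64}}]}

# Seldon-style payload: the proto3 JSON mapping of SeldonMessage makes
# the binary field explicit (bytes fields serialize as base64 strings).
seldon_payload = {"binData": b64}

print(json.dumps(cmle_payload))
print(json.dumps(seldon_payload))
```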

What does being generic have to do with using a Kafka stream? I've never used Kafka, but I'd expect it doesn't care what the schema of the payload is.

@bhack


bhack commented Jan 19, 2018

@whitlockjc


whitlockjc commented Jan 19, 2018

OpenAPI is geared toward describing REST APIs. If you're describing those, I'm not sure of a better format for doing what you want. If it's not for REST APIs, you could potentially still use OpenAPI and benefit from its tooling and computer/human readability, which might make it a good input format (well known, easy to author, etc.) that allows for transformation into other things.

Those are my initial thoughts without diving too deep into the context.

@yupbank

Member

yupbank commented Feb 1, 2018

CMLE currently supports only TensorFlow models.

And tensorflow/serving has already defined its own API in gRPC and has a Python package for it: https://pypi.python.org/pypi/tensorflow-serving-api/1.4.0

The tensorflow/serving API is quite flexible: it leaves the input and output schema to the tensorflow SavedModel. And CMLE is backed by tensorflow/serving, so they are the same.

jlewi added a commit that referenced this issue Feb 5, 2018

add http proxy for tf-serving (#183)
Since the REST API is still widely used and the tf-serving C++ binary doesn't support HTTP requests, this PR uses Python Tornado, an async-capable web server, to do the proxying.

Relationship between this PR and #65
Differences:

#65 provides a big application supporting tf-serving, xgboost and sklearn, while this PR is just a thin layer on top of tf-serving.
#65 uses the Paste web server instead of Tornado.
#65 spawns a tf-serving process inside the Docker container; this PR uses tf-serving as a gRPC provider.
Similarities:

Both utilize tensorflow/serving.
Both provide a CMLE-like REST API.
Since #65 is too big for one PR and the author intends to split it, this PR can potentially help with the tf-serving part, as the code is only one file.

Related to:

Define common prediction API(#102)
TfServing Uber issue (#64)

@jlewi jlewi removed this from the Kubecon Europe milestone Feb 13, 2018

@jlewi

Contributor

jlewi commented Feb 13, 2018

I'm removing this from the Kubecon Europe milestone. I don't see us converging on a single prediction API by then and I don't see an immediate need for one.

@yupbank

Member

yupbank commented Feb 13, 2018

The goal is actually quite ambitious if we are aiming to support all the tools with the same API.
Even having a unified API between REST and gRPC is already complicated. Not to mention, the input data format differs so much across tools: tensorflow/serving expects input as a tensor protobuf, H2O/xgboost expect data in a DMatrix, and vowpal-wabbit expects data as a raw string.
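A toy sketch of that mismatch: converting one feature dict into each tool's preferred shape. These converters are illustrative stand-ins, not the real tensorflow/xgboost/vowpal-wabbit input builders:

```python
# Sketch of why a single payload schema is hard: each backend wants its
# own native input format. Real code would build a TensorProto, an
# xgboost DMatrix, a vowpal-wabbit example line, etc.
def to_vw_line(features):
    # vowpal-wabbit consumes raw text lines like "| name:value ..."
    return "| " + " ".join(f"{k}:{v}" for k, v in features.items())

def to_dense_rows(features):
    # a DMatrix is typically constructed from dense numeric rows
    return [list(features.values())]

def to_tensor_dict(features):
    # tensorflow/serving expects named, batched tensors
    return {k: [v] for k, v in features.items()}

row = {"age": 42, "income": 30000}
print(to_vw_line(row))      # | age:42 income:30000
print(to_dense_rows(row))   # [[42, 30000]]
print(to_tensor_dict(row))  # {'age': [42], 'income': [30000]}
```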

@yupbank

Member

yupbank commented Feb 13, 2018

But the flip side is: do we need to support all the tools just for serving/prediction? Even though different tools come with different magic for the same machine learning model, most of the magic stays in the training phase. After training, the overlap between different tools is huge; that's why we can have https://github.com/onnx/onnx.
And most of the time, prediction has nothing to do with training. Why don't we focus on tensorflow only for now, and then we can have one or two (REST + gRPC) "unified prediction APIs"?

@jlewi jlewi added the priority/p3 label Mar 7, 2018

@bhack


bhack commented Mar 21, 2018

Just to mention another K8s-related project: https://github.com/IBM/FfDL

@bhack


bhack commented Mar 27, 2018

Something is moving inside the LinuxFoundation perimeter: https://www.acumos.org/platform/
