Batch Prediction using GPUs with local runner #251
Comments
/assign yixinshi
With the new integration between TensorRT and TensorFlow 1.7, TensorRT optimizes compatible sub-graphs and lets TensorFlow execute the rest.
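For reference, a minimal sketch of that conversion using the TF 1.7 `tensorflow.contrib.tensorrt` module; the frozen-graph path, output node name, and batch size here are placeholders, not anything specific to this project:

```python
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

# Load a frozen GraphDef (path is a placeholder).
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# TensorRT rewrites compatible sub-graphs into TRT engine ops;
# the remaining ops are still executed by TensorFlow.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["logits"],               # placeholder output node name
    max_batch_size=32,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16")
```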
@yixinshi How's this going?
I think I will start with a TF Serving container image as the base image for batch prediction, then optimize it to use TensorRT and/or GRE.
P1 because we'd like to have this in our 0.2 release. However, we will probably only support the local Beam runner for GPUs in 0.2.
@yixinshi What is the likelihood this will make 0.2? |
Is TensorFlow Serving compilable with TensorRT? tensorflow/serving#864
@bhack Don't know. |
@yixinshi Can we close this issue? I think batch prediction with GPUs and local runner is working now? |
Using GPUs for batch prediction could be really valuable.
There are a couple of different ways we could support this.
If we use a framework like Spark or Flink running on a K8s cluster with GPU nodes, then the workers should be able to use the GPUs directly.
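As a rough sketch of that first option with the local (Direct) Beam runner, assuming a worker that can see a GPU and a TF 1.x SavedModel at a placeholder path with placeholder tensor names:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import tensorflow as tf


class PredictDoFn(beam.DoFn):
    """Loads the model once per bundle and runs inference on the local GPU."""

    def start_bundle(self):
        # The session places ops on the GPU if one is visible to the worker.
        self._sess = tf.Session()
        tf.saved_model.loader.load(
            self._sess, [tf.saved_model.tag_constants.SERVING], "/models/my_model")

    def process(self, batch):
        # Tensor names are placeholders for whatever the SavedModel exports.
        yield self._sess.run("logits:0", feed_dict={"inputs:0": batch})


with beam.Pipeline(options=PipelineOptions(["--runner=DirectRunner"])) as p:
    _ = (p
         | "ReadBatches" >> beam.Create([[[0.0] * 784]])  # toy input batch
         | "Predict" >> beam.ParDo(PredictDoFn()))
```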
If we run Spark/Flink/Dataflow external to K8s, such that the workers don't have direct access to GPUs, we could deploy TF Serving on a K8s cluster with GPUs and have the workers send batches of requests to the model to do inference.
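And a sketch of the second option, a worker sending one batch to a TF Serving instance over gRPC; the service host, model name, and input/output names below are placeholders:

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to a TF Serving deployment exposed inside the cluster (placeholder host).
channel = grpc.insecure_channel("tf-serving.kubeflow.svc:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                   # placeholder model name
request.model_spec.signature_name = "serving_default"
request.inputs["inputs"].CopyFrom(
    tf.make_tensor_proto([[0.0] * 784], shape=[1, 784]))  # toy batch

response = stub.Predict(request, 30.0)  # 30-second deadline
print(response.outputs["logits"])
```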