Batch Prediction using GPUs with local runner #251
Comments
/assign yixinshi
With the new integration between TensorRT and TensorFlow 1.7, TensorRT optimizes compatible sub-graphs and lets TensorFlow execute the rest.
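For reference, a minimal sketch of that conversion using the TF 1.7 `tensorflow.contrib.tensorrt` module; the frozen-graph path, output node name, and batch size here are placeholders, not anything specific to this project:

```python
import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

# Load a frozen GraphDef (path is a placeholder).
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# TensorRT rewrites compatible sub-graphs into TRT engine ops;
# the remaining ops are still executed by TensorFlow.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["logits"],               # placeholder output node name
    max_batch_size=32,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16")
```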
@yixinshi How's this going?
I think I will start with a TF Serving container image as the base image for batch prediction, then optimize it to use TensorRT and/or GRE.
P1 because we'd like to have this in our 0.2 release. However, we will probably only support the local Beam runner for GPUs in 0.2.
@yixinshi What is the likelihood this will make 0.2? |
Is TensorFlow Serving compilable with TensorRT? tensorflow/serving#864
@bhack Don't know. |
@yixinshi Can we close this issue? I think batch prediction with GPUs and local runner is working now? |
Using GPUs for batch prediction could be really valuable.
There are a couple of different ways we could support this.
If we use a framework like Spark or Flink running on a K8s cluster with GPU nodes, then the workers should be able to use the GPUs directly.
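As a rough sketch of that first option with the local (Direct) Beam runner, assuming a worker that can see a GPU and a TF 1.x SavedModel at a placeholder path with placeholder tensor names:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import tensorflow as tf


class PredictDoFn(beam.DoFn):
    """Loads the model once per bundle and runs inference on the local GPU."""

    def start_bundle(self):
        # The session places ops on the GPU if one is visible to the worker.
        self._sess = tf.Session()
        tf.saved_model.loader.load(
            self._sess, [tf.saved_model.tag_constants.SERVING], "/models/my_model")

    def process(self, batch):
        # Tensor names are placeholders for whatever the SavedModel exports.
        yield self._sess.run("logits:0", feed_dict={"inputs:0": batch})


with beam.Pipeline(options=PipelineOptions(["--runner=DirectRunner"])) as p:
    _ = (p
         | "ReadBatches" >> beam.Create([[[0.0] * 784]])  # toy input batch
         | "Predict" >> beam.ParDo(PredictDoFn()))
```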
If we run Spark/Flink/Dataflow external to K8s, such that the workers don't have direct access to GPUs, we could deploy TF Serving on a K8s cluster with GPUs and have the workers send batches of requests to the model to do inference.
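And a sketch of the second option, a worker sending one batch to a TF Serving instance over gRPC; the service host, model name, and input/output names below are placeholders:

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to a TF Serving deployment exposed inside the cluster (placeholder host).
channel = grpc.insecure_channel("tf-serving.kubeflow.svc:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                   # placeholder model name
request.model_spec.signature_name = "serving_default"
request.inputs["inputs"].CopyFrom(
    tf.make_tensor_proto([[0.0] * 784], shape=[1, 784]))  # toy batch

response = stub.Predict(request, 30.0)  # 30-second deadline
print(response.outputs["logits"])
```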