TFServing deployment should support GPUs #292

jlewi · 2018-02-24T22:50:52Z

Our TFServing component should have options to support serving with GPUs.

jlewi · 2018-03-07T20:11:53Z

/assign jlewi
/unassign @lluunn

I've started work on this.

…clouds (#387)

* Refactor the TFServing component to better support GPUs and specific clouds.
* To support GPUs and specific clouds, we refactor the component to make it easy to override the parts we care about (e.g. container environment variables, resources, etc.).
* We do this by moving the things we care about up to the root of tf-serving.libsonnet.
* We rely on jsonnet late binding (http://jsonnet.org/docs/tutorial.html). Late binding allows us to define dictionaries (e.g. params, tfServingContainer) in tf-serving.libsonnet. We can then create manifests based on those objects (e.g. tfDeployment). We can then override values (e.g. params), and the derived objects (e.g. tfDeployment) will use the overridden values.
* We introduce a parameter "cloud" which allows us to control which "prototype" to use. We use this to apply cloud-specific customizations, like setting the environment variables on AWS to use S3.
* Late binding also makes it possible to select an appropriate default image based on whether GPUs are being used or not, while still allowing the user to override the images.
* We remove parameter definitions from the prototypes. The set of parameters ends up being conditional on flags like cloud and GPUs, so it's not clear how scalable that was.
* Use camelCase, not underscores, for parameters. See #303.

Related Issues: Fix #292

Update the test to work with the changes.

* Parameters are now camelCase. They also aren't parameters of the prototype, so we can't set them in the call to generate.
* So we need to modify deploy to take a list of the parameters to set on the component.
* jsonnet format.
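To make the late-binding idea in the commit message concrete, here is a minimal sketch of the pattern. It is not the actual tf-serving.libsonnet: the names params, tfServingContainer, tfDeployment and the cloud parameter come from the commit message, while numGpus, modelPath, defaultImage, env, the image names, the AWS_REGION value, the apiVersion and the nvidia.com/gpu resource key are assumptions chosen only for illustration.

```jsonnet
// Hypothetical sketch of the late-binding pattern; not the real component.
local tfServing = {
  // Hidden (::) fields are left out of the rendered output but can still
  // be overridden by whoever extends this object.
  params:: {
    name: 'model-server',
    numGpus: 0,  // assumed parameter name
    cloud: 'none',
    modelPath: 'gs://example-bucket/model',  // placeholder path
  },

  // Default image chosen from numGpus. `$` is resolved late, so an
  // override of params is visible here. Image names are placeholders.
  defaultImage:: if $.params.numGpus > 0
  then 'example.com/model-server-gpu:latest'
  else 'example.com/model-server:latest',

  // Cloud-specific environment variables (placeholder value).
  env:: if $.params.cloud == 'aws'
  then [{ name: 'AWS_REGION', value: 'us-west-1' }]
  else [],

  tfServingContainer:: {
    name: $.params.name,
    // A user-supplied params.image wins over the computed default.
    image: if std.objectHas($.params, 'image')
    then $.params.image
    else $.defaultImage,
    args: ['--model_base_path=' + $.params.modelPath],
    env: $.env,
    // Assumed GPU resource key; request GPUs only when numGpus > 0.
    resources: if $.params.numGpus > 0
    then { limits: { 'nvidia.com/gpu': $.params.numGpus } }
    else {},
  },

  // The only visible field, so this is what gets rendered.
  tfDeployment: {
    apiVersion: 'apps/v1',
    kind: 'Deployment',
    metadata: { name: $.params.name },
    spec: {
      selector: { matchLabels: { app: $.params.name } },
      template: {
        metadata: { labels: { app: $.params.name } },
        spec: { containers: [$.tfServingContainer] },
      },
    },
  },
};

// Override only params; the container and deployment pick up the change.
tfServing { params+:: { numGpus: 1, cloud: 'aws' } }
```

Under these assumptions, evaluating the file renders a tfDeployment whose container uses the GPU image, carries a GPU limit of 1, and has the AWS environment variable set, even though only params was overridden. Adding image to the params override would replace the computed default image in the same way.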

jlewi added the area/inference label Feb 24, 2018

jlewi assigned lluunn Feb 24, 2018

jlewi mentioned this issue Feb 24, 2018

Add GPU Support for k8s-model-server on Kubeflow #194

Closed

jlewi added the priority/p1 label Mar 7, 2018

jlewi mentioned this issue Mar 7, 2018

Create a GPU model deployment to use for E2E testing of serving with GPUs #362

Merged

k8s-ci-robot assigned jlewi and unassigned lluunn Mar 7, 2018

This was referenced Mar 7, 2018

Refactor the TFServing component to better support GPUs and specific clouds #387

Merged

TFServing prototype for using GCS with service account key #385

Closed

E2E Test for TFServing with GPUs #291

Closed

k8s-ci-robot closed this as completed in #387 Mar 8, 2018

yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021

get metricscollector by API (kubeflow#292)

33b2e58

Signed-off-by: YujiOshima <yuji.oshima0x3fd@gmail.com>

elenzio9 pushed a commit to arrikto/kubeflow that referenced this issue Oct 31, 2022

Adding member to github org (kubeflow#292)

b214ddc
