Refactor the TFServing component to better support GPUs and specific clouds #387
Conversation
/cc @lluunn
/test all
…clouds.

* To support GPUs and specific clouds we refactor the component to make it easy to override the parts we care about (e.g. container environment variables, resources, etc.).
* We do this by moving the things we care about up to the root of tf-serving.libsonnet.
* We rely on jsonnet late binding (http://jsonnet.org/docs/tutorial.html). Late binding allows us to define dictionaries (e.g. params, tfServingContainer) in tf-serving.libsonnet. We can then create manifests based on those objects (e.g. tfDeployment). We can then override values (e.g. params) and the derived objects (e.g. tfDeployment) will use the overridden values.
* We introduce a parameter "cloud" which allows us to control which "prototype" to use. We use this for cloud-specific customizations, like setting the environment variables on AWS to use S3.
* Late binding also makes it possible to select an appropriate default image based on whether GPUs are being used, while still allowing the user to override the images.
* We remove parameter definitions from the prototypes. The set of parameters ends up being conditional on flags like cloud and GPUs, so it's not clear how scalable that was.
* Use camelCase, not underscores, for parameters. See kubeflow#303.

Related Issues: Fix kubeflow#292

Update the test to work with the changes.

* Parameters are now camelCase. They also aren't parameters of the prototype, so we can't set them in the call to generate.
* So we need to modify deploy to take a list of the parameters to set on the component.
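The late-binding pattern described above can be sketched roughly as follows. This is a hypothetical fragment in the spirit of tf-serving.libsonnet: the names params, tfServingContainer, and tfDeployment come from the description, but the field values and image tags are purely illustrative.

```jsonnet
// Illustrative sketch, not the actual component.
{
  params:: {
    name: "inception",
    numGpus: 0,
    // Late binding: defaultImage is re-evaluated whenever numGpus is
    // overridden, so the GPU image is picked up automatically.
    defaultImage::
      if self.numGpus > 0 then
        "tensorflow/serving:latest-gpu"   // illustrative tag
      else
        "tensorflow/serving:latest",      // illustrative tag
    modelServerImage: self.defaultImage,
  },

  tfServingContainer:: {
    name: $.params.name,
    image: $.params.modelServerImage,
  },

  // Derived object: uses whatever params resolve to after any overrides.
  tfDeployment:: {
    apiVersion: "extensions/v1beta1",
    kind: "Deployment",
    metadata: { name: $.params.name },
    spec: {
      template: {
        spec: { containers: [$.tfServingContainer] },
      },
    },
  },
}
```

A caller could then write `(import "tf-serving.libsonnet") { params+:: { numGpus: 1 } }` and tfDeployment would pick up the GPU image without the caller touching the container definition.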
Reviewed 6 of 6 files at r1. kubeflow/tf-serving/tf-serving.libsonnet, line 143 at r1 (raw file):
Do you want to lower these numbers? Comments from Reviewable
There are conflicts.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. kubeflow/tf-serving/tf-serving.libsonnet, line 143 at r1 (raw file): Previously, lluunn (Lun-Kai Hsu) wrote…
We probably should. If you have good numbers we can change it; otherwise we'll leave it for a follow-on PR. There's an existing issue related to http proxy resources. Comments from Reviewable
Reviewed 3 of 3 files at r2. Comments from Reviewable
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lluunn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
```jsonnet
components:: {
  all::
    if $.params.cloud == "aws" then
```
This would be better as params.s3Enable. The S3 functionality isn't specific to AWS.
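A rough sketch of what that suggestion could look like. This is illustrative only; `s3parts` and `parts` mirror names used elsewhere in this diff, and the flag name follows the reviewer's proposed `params.s3Enable`.

```jsonnet
// Illustrative sketch of gating on an explicit flag instead of on cloud.
{
  params:: {
    cloud: "aws",
    // Orthogonal to cloud: S3-compatible object stores also exist off AWS.
    s3Enable: false,
  },

  all::
    if $.params.s3Enable then
      $.s3parts.all
    else
      $.parts.all,
}
```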
```jsonnet
// Parts specific to S3
s3parts:: $.parts {
  s3Env:: [
    { name: "AWS_ACCESS_KEY_ID", valueFrom: { secretKeyRef: { name: $.s3params.s3SecretName, key: $.s3params.s3SecretAcesskeyidKeyName } } },
```
s/s3SecretAcesskeyidKeyName/s3SecretAccesskeyidKeyName
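With the suggested rename applied, the entry above would read as follows (same fragment, typo corrected; otherwise illustrative only):

```jsonnet
s3Env:: [
  {
    name: "AWS_ACCESS_KEY_ID",
    valueFrom: {
      secretKeyRef: {
        name: $.s3params.s3SecretName,
        // Renamed: s3SecretAcesskeyidKeyName -> s3SecretAccesskeyidKeyName
        key: $.s3params.s3SecretAccesskeyidKeyName,
      },
    },
  },
],
```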
Looks like this got auto-merged because Lunkai is an approver (it's a quirk of the automatic review system). I'll send a new PR.
* Use the param s3Enable, not cloud, to decide whether to enable the S3 modifications. Use of S3 can be orthogonal to the cloud.
* Fix a typo in the variable name.
* PR kubeflow#387 was automatically committed, as a quirk of our auto-merge code, even though there were outstanding comments.
Related Issues:
#376 patterns for ksonnet prototypes.
Fix #292