Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New TF Serving template #1589

Merged
merged 13 commits into from Sep 27, 2018
Merged

New TF Serving template #1589

merged 13 commits into from Sep 27, 2018

Conversation

lluunn
Copy link
Contributor

@lluunn lluunn commented Sep 20, 2018

For #1264

Move things into jsonnet instead of relying on import.
It's easier to customize.

  • previous params remain the same, just the component name is tf-serving-template
  • e2e test is now using the new tf-serving-template

/cc @jlewi


This change is Reviewable

@lluunn
Copy link
Contributor Author

lluunn commented Sep 20, 2018

/retest

1 similar comment
@lluunn
Copy link
Contributor Author

lluunn commented Sep 20, 2018

/retest

@lluunn lluunn changed the title [WIP] New TF Serving template New TF Serving template Sep 21, 2018
@jlewi
Copy link
Contributor

jlewi commented Sep 21, 2018

/assign @kunmingg

@jlewi
Copy link
Contributor

jlewi commented Sep 21, 2018

/cc @kkasravi

@jlewi
Copy link
Contributor

jlewi commented Sep 21, 2018

I think we're headed in the direction of having multiple prototypes (we already have 2) this adds a 3rd.
I think if we follow my original suggestion of putting everything into the prototype (.jsonnet) file we will get a lot of unnecessary code duplication that will be difficult to maintain overtime.

@kkasravi's recent prototype #1501 suggests to me (see comment) that we should define a base libsonnet file and then use k8s libsonnet to modify it for each prototype.

I considered having a basic prototype with all the logic in the .jsonnet file that people can use as a simple example so they don't have to learn the k8s lib. But even that seems like it would get pretty unruly.

My initial thought is that we should have the following prototypes

  • TFServing using S3
  • TFServing using GCS
  • TFServing using PVC

So just supporting those 3 different use cases will lead to a lot of code duplication if we put all the code into the .jsonnet file.

@lluunn
Copy link
Contributor Author

lluunn commented Sep 21, 2018

I was thinking the we can use only this one jsonnet later.
The optional features are not mutually exclusive. They (GCP credential, S3, istio, request log) can co-exist, so have their own "enabled" flag.

Here the jsonnet structure has two parts (before and after line 75):

  1. Flags and customization.
    Flags for enabling features and can be easily set. Customization for each feature can be set by flags
    or by modify the component directly (the component is the copy of whole jsonnet)
  2. Service/Container/Deployment
    This part may look a little complicated as features being added, env: [] + if XX then [a] else [] + ...
    But it should not need to be changed by users.

WDYT?

@jlewi
Copy link
Contributor

jlewi commented Sep 21, 2018

One of the reasons for having parameters is for making it easy for users to figure out what they have to set.
The problem with the way its currently configured is that the parameters don't define the things the user has to set for S3 vs. GCS. They'd have to read the code to realize there are parameters like "gcpCredentialSecretName" and "s3SecretName".

If we have separate prototypes for GCP and S3. Then in the prototype for GCP we can explicitly define all the relevant parameters e.g.

// @optionalParam gcpSecretName string "user-sa" Name of the secret containing the GCP service account

Then when the user invokes help on the component they can get a meaningful list of parameters that they could set.

@lluunn
Copy link
Contributor Author

lluunn commented Sep 22, 2018

But how about istio and request logs? They can co-exist with gcp credentials and S3.
We can't make one prototype just for istio and another just for request logs because users might want both (and gcp credential as well).

We can add both gcpSecretName and s3Enabled as flags in this prototype, and add better documentation. WDYT?

@jlewi
Copy link
Contributor

jlewi commented Sep 22, 2018

I think its fine to include options for istio and request logs since those aren't mutually exclusive.

We should introduce different prototypes for things that are exclusive; the choice of cloud is exclusive. You are likely either running on AWS or running on GCP.

We already have 8 non cloud specific parameters

S3 has 7 different parameters
s3SecretName: "",
s3SecretAccesskeyidKeyName: "AWS_ACCESS_KEY_ID",
s3SecretSecretaccesskeyKeyName: "AWS_SECRET_ACCESS_KEY",
s3AwsRegion: "us-west-1",
s3UseHttps: "true",
s3VerifySsl: "true",
s3Endpoint

GCP has 1
gcpCredentialSecretName

The whole point of adding parameters is to make it easy for users to customize the variables they care about. IMO the easy of use starts to decrease if we provide so many parameters that the user can't figure out which ones they care about. A GCP use could reasonable be confused by the fact that there are a bunch of S3 related parameters and wonder why they have to set them.

The situation will only get worse as we add support for other Clouds (e.g. ACK, Azure).

So I think having multiple prototypes is more tractable and we can iterate on what belong in a separate prototype vs. options in the existing prototypes.

I think a good place to start would be to think about a prototype specific to GCP and add the parameters and options that make sense on GCP.

Copy link
Contributor

@kkasravi kkasravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like defining the separate prototypes as @jlewi outlined:

  • TFServing using S3
    • kubeflow/tf-serving/prototypes/tf-serving-aws.jsonnet
  • TFServing using GCP
    • kubeflow/tf-serving/prototypes/tf-serving-gcp.jsonnet
  • TFServing using PVC
    • kubeflow/tf-serving/prototypes/tf-serving-pvc.jsonnet

though the istio, request logs would need to be added to all 3 prototypes via a mixin at the libsonnet level.

Tensorboard has

  • Tensorboard using S3
    • kubeflow/tensorboard/prototypes/tensorboard-aws.jsonnet
  • Tensorboard using GCP
    • kubeflow/tensorboard/prototypes/tensorboard-gcp.jsonnet

however in {tensorboard-aws.jsonnet, tensorboard-gcp.jsonnet} I neglected to include the params in the comments at the top of each file. I'll submit a PR for the tensorboard change.

// @optionalParam deployHttpProxy string false Whether to deploy http proxy
// @optionalParam modelBasePath string gs://kubeflow-examples-data/mnist The model path
// @optionalParam modelName string mnist The model name
// @optionalParam s3Enable string false Whether to enable S3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be moved into a file called kubeflow/tf-serving/prototypes/tf-serving-aws.jsonnet

} + params;

// Parameters that control S3 access. Need to set params.s3Enable to true
local s3params = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these were moved into the comments in a kubeflow/tf-serving/prototypes/tf-serving-aws.jsonnet then it would be a better locality of reference. For example

// @apiversion 0.1
// @name io.ksonnet.pkg.tf-serving-template
// @description TensorFlow serving
// @shortDescription A TensorFlow serving deployment
// @param name string Name to give to each of the components
// @optionalParam namespace string kubeflow The namespace
// @optionalParam numGpus string 0 Number of gpus to use
// @optionalParam deployHttpProxy string false Whether to deploy http proxy
// @optionalParam modelBasePath string gs://kubeflow-examples-data/mnist The model path
// @optionalParam modelName string mnist The model name
// @optionalParam s3SecretName string "" Name of the k8s secrets containing S3 credentials
// @optionalParam s3SecretAccesskeyidKeyName string "AWS_ACCESS_KEY_ID" Name of the key in the k8s secret containing AWS_ACCESS_KEY_ID
// @optionalParam s3AwsRegion string "us-west-1" S3 region
// @optionalParam s3UseHttps string "true" Whether or not to use https
// @optionalParam s3VerifySsl string "true" Whether or not to verify https certificates for S3 connections
// @optionalParam s3Endpoint string "http://s3.us-west-1.amazonaws.com" URL for your s3-compatible endpoint

},
type: "ClusterIP",
},
}; // service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a
service:: service

as described here

},
],
env: []
+ if util.toBool(params.s3Enable) then s3Env else []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this into deployment.mapContainers in the kubeflow/tf-serving/prototypes/tf-serving-aws.jsonnet file as described above

httpProxyContainer,
] else [],
volumes: []
+ if gcpParams.gcpCredentialSecretName != "" then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move into a kubeflow/tf-serving/prototypes/tf-serving-gcp.jsonnet file

@lluunn
Copy link
Contributor Author

lluunn commented Sep 24, 2018

Thanks @jlewi @kkasravi , PTAL

Copy link
Contributor

@kkasravi kkasravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment otherwise
/lgtm

}, // tfDeployment
tfDeployment:: tfDeployment,

all:: [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this if we're generating the lists in the prototypes?

@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 25, 2018
@lluunn
Copy link
Contributor Author

lluunn commented Sep 25, 2018

/retest

2 similar comments
@lluunn
Copy link
Contributor Author

lluunn commented Sep 25, 2018

/retest

@lluunn
Copy link
Contributor Author

lluunn commented Sep 25, 2018

/retest

@lluunn
Copy link
Contributor Author

lluunn commented Sep 25, 2018

argo test failed in kfctl
Failed to run argo workflow

@lluunn
Copy link
Contributor Author

lluunn commented Sep 25, 2018

/retest

@lluunn
Copy link
Contributor Author

lluunn commented Sep 25, 2018

@lluunn
Copy link
Contributor Author

lluunn commented Sep 26, 2018

still argo test failure, but I don't see how this PR is related to that.

@@ -1,5 +1,7 @@
// Some useful routines.
{
local k = import "k.libsonnet",

// Convert a string to upper case.
upper:: function(x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std library has std.asciiUpper(str)
http://jsonnet.org/ref/stdlib.html

Copy link
Contributor

@kkasravi kkasravi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kkasravi
Copy link
Contributor

@lluunn jfmt errors

@k8s-ci-robot k8s-ci-robot removed the lgtm label Sep 27, 2018
@lluunn
Copy link
Contributor Author

lluunn commented Sep 27, 2018

Thanks @kkasravi

@jlewi
Copy link
Contributor

jlewi commented Sep 27, 2018

This looks great.
/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lluunn
Copy link
Contributor Author

lluunn commented Sep 27, 2018

not sure why notebook release workflow is triggered...

@lluunn
Copy link
Contributor Author

lluunn commented Sep 27, 2018

/retest

@k8s-ci-robot k8s-ci-robot merged commit b31ca6b into kubeflow:master Sep 27, 2018
k8s-ci-robot pushed a commit that referenced this pull request Sep 28, 2018
* new template

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* review

* fix
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
* new template

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* review

* make timeout longer for deploy argo

* fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants