-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ksonnet to configure/deploy Kubeflow #36
Conversation
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the |
README.ksonnet.md
Outdated
|
||
## Build ks | ||
|
||
ksonnet doesn't support adding non default registries yet ((ksonnet/ksonnet/issues/38)[https://github.com/ksonnet/ksonnet/issues/38]) so we need to modify and [build from source](https://ksonnet.io/docs/build-install). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub isn't rendering this link correctly. I believe it should be: [ksonnet/ksonnet/issues/38](https://github.com/ksonnet/ksonnet/issues/38)
README.ksonnet.md
Outdated
go get github.com/ksonnet/ksonnet | ||
``` | ||
|
||
Checkout PR 228 which adds support for registry add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been merged: ksonnet/ksonnet@29e3cd4
Thanks @paulbaumgart PTAL |
@jlewi The README.ksonnet.md updates look good. Might be worth taking another pass at https://github.com/jlewi/kubeflow/blob/ksonnet_crd/kubeflow/README.md (perhaps just linking to the README.ksonnet.md section?) The doc-gen link in kubeflow/README.md also seems to be broken. |
@paulbaumgart Thanks. I also auto-generated READMEs for each component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The abstraction seems ok although quite a bit complicated - it is a bit concerning how much friction here is due to changes we're waiting on in ksonnet.
|
||
You need a version of ksonnet newer than the [0.7.0 release](https://github.com/ksonnet/ksonnet/releases). | ||
|
||
As of this writing, Heptio hasn't released any newer prebuilt binaries so you will need to [build from source](https://ksonnet.io/docs/build-install). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will there be a new release soon? This seems like a bit of friction until then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully, heptio might push a release this week but no guarantees.
|
||
Create the Kubeflow core component. The core component includes | ||
* JupyterHub | ||
* TensorFlow job controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename core
to training
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming something core
indicates that it's a dependency for other subsequent steps, which is not strictly true here I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JupyterHub isn't training specific and the CRD is needed for subsequent steps. The idea here is that we have certain core components that you probably only deploy once. Whereas for tf-serving and tf-job you probably have multiple ones.
I'm open to other names but training doesn't seem right.
Parameters can be set using `ks param` e.g. to set the Docker image used | ||
|
||
``` | ||
ks param set ${JOB_NAME} image ${IMAGE} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead recommend the declarative method here? I think people would want to version their config as a reproducible artifact, and specifying it this way doesn't allow doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is versioned. ks param set modifies params.jsonnet which is checked into code control.
ks env add cloud | ||
ks param set --cloud=gke | ||
``` | ||
* The cloud parameter triggers a set of curated cloud configs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Describe what parameters this sets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now nothing, but it should control things like the CRD config for different clusters.
@@ -0,0 +1,29 @@ | |||
"""A simple utility to convert a manifest to corresponding jsonnet.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parts of this script seem brittle, what's the purpose of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a hack hence its location in "hacks" for converting our existing yaml manifests to jsonnet.
In a subsequent PR when we clean up the old stuff and move to ksonnet we can delete it.
|
||
### io.ksonnet.pkg.kubeflow-core | ||
|
||
Kubeflow core components |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above about naming parts of it core
"kind": "ServiceAccount", | ||
"metadata": { | ||
"labels": { | ||
"app": name + "nfs-provisioner" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the nfs-provisioner be its own component? Not sure if it is here already. I see it as - if it launches a new (optional) "thing" in the user's cluster, it should be its own component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally and long term yes. I think the ideal way to do this would be to have distinct packages for each thing; e.g a package for JupyterHub, a package for Spark, a package for the CRD etc...
We could then have one or more packages with one or more prototypes defining useful combinations of this components. Its not immediately clear what the organizing principle for those higher level packages should be.
In the short term though, its more convenient for users if they have fewer packages to install. So that was the motivation for putting them all in the same package.
Using more packages should be reasonable when ksonnet adds better support for managing package dependencies ; e.g. so that core can depend on some other packages and installing core automatically pulls in the other packages.
kubeflow/core/parts.yaml
Outdated
"keywords": [ | ||
"kubeflow", | ||
"tensorflow", | ||
"database" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
database?
@foxish this should be ready for another look. |
Please squash before merging. Also - can we have an issue tracking the fixes that need to be done afterwards (like removing the hack/ directory, separating out NFS provisioner, etc) |
* Jupyter is hitting problems in non-default namespaces because it relies on default service account.
…ving problems with NFS.
* Remove instructions to PRs that have been merged. * Assume the PR adding ksonnet registry to google/kubeflow has been merged.
* Auto generate README files for each component.
I had to force the merge because the CLA check wouldn't run. |
* initial seldon manifests * Namespaced and clusterwide deployment * Remove labels added by commonLabels * initial seldon operator kustomize update * add application and recreate * update from review * change name to seldon-core-operator * update service selector
Signed-off-by: Ce Gao <gaoce@caicloud.io>
[pull] master from kubeflow:master
Use ksonnet to customize/deploy Kubeflow.
Its intended to inform the discussion in Tooling to manage configuration and deployment #23
The current implementation supports JupyterHub, TfJob operator, TfServing
The README.ksonnet.md provides instructions on how to use ksonnet.
Provide a prototype for running the TfCnn benchmark.
The directory organization is based on what ksonnet expects. Specifically the directory within the repo gets checked out into the repo. So we use the directory kubeflow so that pkgs will be installed in ".../vendor/kubeflow/".
The core component has an option to specify a list of persistent disks to be exported and mounted via NFS.
/cc @aronchick @foxish