Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ksonnet to configure/deploy Kubeflow #36

Merged
merged 55 commits into from
Dec 20, 2017
Merged

Conversation

jlewi
Copy link
Contributor

@jlewi jlewi commented Dec 14, 2017

Use ksonnet to customize/deploy Kubeflow.

  • Its intended to inform the discussion in Tooling to manage configuration and deployment #23

  • The current implementation supports JupyterHub, TfJob operator, TfServing

  • The README.ksonnet.md provides instructions on how to use ksonnet.

  • Provide a prototype for running the TfCnn benchmark.

  • The directory organization is based on what ksonnet expects. Specifically the directory within the repo gets checked out into the repo. So we use the directory kubeflow so that pkgs will be installed in ".../vendor/kubeflow/".

  • The core component has an option to specify a list of persistent disks to be exported and mounted via NFS.

/cc @aronchick @foxish

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this State. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@jlewi
Copy link
Contributor Author

jlewi commented Dec 18, 2017

This is ready for review.

/cc @foxish @vishh

@jlewi jlewi changed the title [Wip] Example of using ksonnet to configure/deploy Kubeflow Use ksonnet to configure/deploy Kubeflow Dec 18, 2017

## Build ks

ksonnet doesn't support adding non default registries yet ((ksonnet/ksonnet/issues/38)[https://github.com/ksonnet/ksonnet/issues/38]) so we need to modify and [build from source](https://ksonnet.io/docs/build-install).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub isn't rendering this link correctly. I believe it should be: [ksonnet/ksonnet/issues/38](https://github.com/ksonnet/ksonnet/issues/38)

go get github.com/ksonnet/ksonnet
```

Checkout PR 228 which adds support for registry add

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been merged: ksonnet/ksonnet@29e3cd4

@jlewi
Copy link
Contributor Author

jlewi commented Dec 19, 2017

Thanks @paulbaumgart PTAL

@jlewi
Copy link
Contributor Author

jlewi commented Dec 19, 2017

@foxish @vishh Do you want to review this before I merge it in?

@paulbaumgart
Copy link

@jlewi The README.ksonnet.md updates look good. Might be worth taking another pass at https://github.com/jlewi/kubeflow/blob/ksonnet_crd/kubeflow/README.md (perhaps just linking to the README.ksonnet.md section?)

The doc-gen link in kubeflow/README.md also seems to be broken.

@jlewi
Copy link
Contributor Author

jlewi commented Dec 19, 2017

@paulbaumgart Thanks. I also auto-generated READMEs for each component.

Copy link
Contributor

@foxish foxish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstraction seems ok although quite a bit complicated - it is a bit concerning how much friction here is due to changes we're waiting on in ksonnet.


You need a version of ksonnet newer than the [0.7.0 release](https://github.com/ksonnet/ksonnet/releases).

As of this writing, Heptio hasn't released any newer prebuilt binaries so you will need to [build from source](https://ksonnet.io/docs/build-install).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be a new release soon? This seems like a bit of friction until then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully, heptio might push a release this week but no guarantees.


Create the Kubeflow core component. The core component includes
* JupyterHub
* TensorFlow job controller
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename core to training?

Copy link
Contributor

@foxish foxish Dec 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming something core indicates that it's a dependency for other subsequent steps, which is not strictly true here I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JupyterHub isn't training specific and the CRD is needed for subsequent steps. The idea here is that we have certain core components that you probably only deploy once. Whereas for tf-serving and tf-job you probably have multiple ones.

I'm open to other names but training doesn't seem right.

Parameters can be set using `ks param` e.g. to set the Docker image used

```
ks param set ${JOB_NAME} image ${IMAGE}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead recommend the declarative method here? I think people would want to version their config as a reproducible artifact, and specifying it this way doesn't allow doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is versioned. ks param set modifies params.jsonnet which is checked into code control.

ks env add cloud
ks param set --cloud=gke
```
* The cloud parameter triggers a set of curated cloud configs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Describe what parameters this sets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now nothing, but it should control things like the CRD config for different clusters.

@@ -0,0 +1,29 @@
"""A simple utility to convert a manifest to corresponding jsonnet."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parts of this script seem brittle, what's the purpose of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a hack hence its location in "hacks" for converting our existing yaml manifests to jsonnet.
In a subsequent PR when we clean up the old stuff and move to ksonnet we can delete it.


### io.ksonnet.pkg.kubeflow-core

Kubeflow core components
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above about naming parts of it core

"kind": "ServiceAccount",
"metadata": {
"labels": {
"app": name + "nfs-provisioner"
Copy link
Contributor

@foxish foxish Dec 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the nfs-provisioner be its own component? Not sure if it is here already. I see it as - if it launches a new (optional) "thing" in the user's cluster, it should be its own component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally and long term yes. I think the ideal way to do this would be to have distinct packages for each thing; e.g a package for JupyterHub, a package for Spark, a package for the CRD etc...

We could then have one or more packages with one or more prototypes defining useful combinations of this components. Its not immediately clear what the organizing principle for those higher level packages should be.

In the short term though, its more convenient for users if they have fewer packages to install. So that was the motivation for putting them all in the same package.

Using more packages should be reasonable when ksonnet adds better support for managing package dependencies ; e.g. so that core can depend on some other packages and installing core automatically pulls in the other packages.

"keywords": [
"kubeflow",
"tensorflow",
"database"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

database?

@jlewi
Copy link
Contributor Author

jlewi commented Dec 19, 2017

@foxish this should be ready for another look.

foxish
foxish previously approved these changes Dec 20, 2017
@foxish
Copy link
Contributor

foxish commented Dec 20, 2017

Please squash before merging. Also - can we have an issue tracking the fixes that need to be done afterwards (like removing the hack/ directory, separating out NFS provisioner, etc)

@jlewi
Copy link
Contributor Author

jlewi commented Dec 20, 2017

I had to force the merge because the CLA check wouldn't run.

kimwnasptd pushed a commit to arrikto/kubeflow that referenced this pull request Mar 5, 2019
yanniszark pushed a commit to arrikto/kubeflow that referenced this pull request Nov 1, 2019
* initial seldon manifests

* Namespaced and clusterwide deployment

* Remove labels added by commonLabels

* initial seldon operator kustomize update

* add application and recreate

* update from review

* change name to seldon-core-operator

* update service selector
yanniszark pushed a commit to arrikto/kubeflow that referenced this pull request Feb 15, 2021
Signed-off-by: Ce Gao <gaoce@caicloud.io>
VaishnaviHire pushed a commit to VaishnaviHire/kubeflow that referenced this pull request Dec 6, 2022
[pull] master from kubeflow:master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants