New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a minikube setup script. #1387
Conversation
scripts/setup-minikube.sh
Outdated
|
||
# if user requested a local fs path to be mounted, make it accessible via | ||
# Jupyter Notebooks | ||
function mount_local_fs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To better fit this into the kfctl.sh pattern;
pv.yaml should be dumped to the k8s_specs directory during the generate phase of the kfctl command. And it should then be created during the kfctl apply phase
scripts/setup-minikube.sh
Outdated
kubectl create -n kubeflow -f ./pv-claim.yaml | ||
|
||
# update tf-hub stateful set env to use the claim | ||
kubectl -n kubeflow set env statefulset tf-hub -e KF_PVC_LIST=local-notebooks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this in kfctl.sh during the generate phase?
Can we check if platform is minikube and then set ksonnet parameters to set these parameters on tf-hub?
Right now this script is wrapping kfctl.sh; can we instead have kfctl.sh call out to the functions in this script? One time setup functions can be called during the init phase; e.g.
|
scripts/setup-minikube.sh
Outdated
../kfctl.sh apply all | ||
popd | ||
|
||
if is_kubeflow_ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per other comments; if we call this from kfctl.sh then most of the functionality up to this line should already be handled by kfctl.sh
@jlewi I think its ready for review now. Updated based on your comments.
When a new notebook is spawned it should automatically mount the local fs directory if specified and shows up as local-notebooks alongside work directory. I haven't been able to test the parameterizing the jupyterhub ksonnet part. Since the script pulls all the sonnets etc. from github, I will retest after this is merged and send any fixes if needed. |
You should be able to run kfctl.sh directly from you local git repo and it should use whatever files you have in your local copy |
@@ -13,6 +13,8 @@ | |||
// @optionalParam repoName string kubeflow-images-public The repository name for JupyterNotebook. | |||
// @optionalParam disks string null Comma separated list of Google persistent disks to attach to jupyter environments. | |||
// @optionalParam gcpSecretName string user-gcp-sa The name of the secret containing service account credentials for GCP | |||
// @optionalParam notebookUid string 1000 UserId of the host user for minikube local fs mount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we changing the default behavior? It doesn't look like KubeFormSpawner was setting the UID/GID before but now it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is doing this now only in minikube case. This is to make sure that the locally mounted filesystem is read/writable via Jupyter Notebook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updating to nulls.
kubeflow/core/kubeform_spawner.py
Outdated
if env_uid and env_uid != 'null': | ||
c.KubeSpawner.singleuser_uid = int(env_uid) | ||
env_gid = os.environ.get('NOTEBOOK_GID') | ||
if env_gid and env_gid != 'null': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this break by default? It looks like you are defaulting env_gid to 100. So you will now add a post pod hook that tries to mount local notebooks into the container and that command will presumably fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. Updating above default values of params to null
kubeflow/core/kubeform_spawner.py
Outdated
env_gid = os.environ.get('NOTEBOOK_GID') | ||
if env_gid and env_gid != 'null': | ||
c.KubeSpawner.singleuser_fs_gid = int(env_gid) | ||
def modify_pod_hook(spawner, pod): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we guard the lifecycle with its own hook? Maybe define an environment variable for the directory containing the notebooks.
I worry it might be brittle to infer this based on gid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a separate env var for this.
@@ -177,6 +177,26 @@ def _expand_user_properties(self, template): | |||
c.KubeSpawner.singleuser_working_dir = '/home/jovyan' | |||
volumes = [] | |||
volume_mounts = [] | |||
|
|||
# Allow environment vars to override uid and gid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pdmack Would you mind taking a look at this code? I remember it taking us a while to get uid and gid to work correctly when using a PV backing the home directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I have been able to test that I can access the local directory when specified via the Jupyter Notebook and create/modify notebooks.
Also, if not specified, the notebook behaves as earlier.
I haven't tested after creating the corresponding jsonnet params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed this. Merged but I'll take a closer look this weekend anyway.
|
||
# Allow environment vars to override uid and gid. | ||
# This allows local host path mounts to be read/writable | ||
env_uid = os.environ.get('NOTEBOOK_UID') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abhi-g How does this compare to the current behavior? i.e. if you run on GKE with a PV for /home/jovyan how does this code changing the UID and GID change what user and group we run the notebook as?
We had a lot of trouble getting the permissions correct when using a PV for the home directory.
The thing to do would be to run it on GKE with the default settings and verify that the user had RW access to the home directory /home/jovyan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll definitely test that. Except, last time I tried, I still get stuck with IAM issues on GKE
scripts/kfctl.sh
Outdated
@@ -15,9 +15,14 @@ WHAT=$2 | |||
|
|||
ENV_FILE="env.sh" | |||
|
|||
KUBEFLOW_VERSION=${KUBEFLOW_VERSION:-"master"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need KUBEFLOW_VERSION?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To download the appropriate repository archive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be set here, because createEnv is called after downloading corresponding shell scripts that are needed by kfctl.sh.
scripts/kfctl.sh
Outdated
@@ -15,9 +15,14 @@ WHAT=$2 | |||
|
|||
ENV_FILE="env.sh" | |||
|
|||
KUBEFLOW_VERSION=${KUBEFLOW_VERSION:-"master"} | |||
KUBEFLOW_DEPLOY=${KUBEFLOW_DEPLOY:-true} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need KUBEFLOW_DEPLOY anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
scripts/kfctl.sh
Outdated
KUBEFLOW_VERSION=${KUBEFLOW_VERSION:-"master"} | ||
KUBEFLOW_DEPLOY=${KUBEFLOW_DEPLOY:-true} | ||
K8S_NAMESPACE=${K8S_NAMESPACE:-"kubeflow"} | ||
KUBEFLOW_CLOUD=${KUBEFLOW_CLOUD:-"minikube"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of KUBEFLOW_CLOUD? This should be provided by the parameter --platform passed on kfctl init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
scripts/kfctl.sh
Outdated
@@ -15,9 +15,14 @@ WHAT=$2 | |||
|
|||
ENV_FILE="env.sh" | |||
|
|||
KUBEFLOW_VERSION=${KUBEFLOW_VERSION:-"master"} | |||
KUBEFLOW_DEPLOY=${KUBEFLOW_DEPLOY:-true} | |||
K8S_NAMESPACE=${K8S_NAMESPACE:-"kubeflow"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need K8S_NAMESPACE to be provided as an environment variable?
This should probably be a parameter of kfctl init if its really needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already defined in CREATE_ENV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed K8S_NAMESPACE here as well
scripts/kfctl.sh
Outdated
set +x | ||
infer_minikube_settings | ||
set -x | ||
download_kubeflow_source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you calling download_kubeflow_source here?
The needed source will already be there; source is downloaded by
https://github.com/kubeflow/kubeflow/blob/master/scripts/download.sh
The source is located based on the location of kfctl.sh
Once you get rid of it it will use your local code and you can test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the local experience, I don't like the idea of a user downloading the kubeflow source first. In fact, in its current form, they only need to download kfctl.sh, and then:
kfctl.sh init appName --platform minikube
cd appName
../kfctl.sh generate all
../kfctl.sh apply all
scripts/kfctl.sh
Outdated
echo HOST_PLATFPORM=${HOST_PLATFPORM} >> ${ENV_FILE} | ||
echo MOUNT_LOCAL=${MOUNT_LOCAL} >> ${ENV_FILE} | ||
echo MINIKUBE_CMD=\"${MINIKUBE_CMD}\" >> ${ENV_FILE} | ||
echo KUBEFLOW_REPO="$(pwd)/kubeflow-${KUBEFLOW_VERSION}" >> ${ENV_FILE} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KUBEFLOW_REPO is already defined on line 24
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the experience described above, download_kubeflow_source will get a copy of the repo into the appName dir. This overrides KUBEFLOW_REPO to point to that location.
scripts/kfctl.sh
Outdated
@@ -92,6 +107,17 @@ createEnv() { | |||
fi | |||
} | |||
|
|||
# For minikube single script download experience | |||
function download_kfctl_scripts() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete this;
We have a separate script to do the download.sh
https://github.com/kubeflow/kubeflow/blob/master/scripts/download.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can leverage this script to download everything.. But from an experience point of view:
- download the download script.
- run it to download the source and scripts.
- find out where kfctl.sh is.
- either run from within that directory (creating appName dir there) or move the scripts out elsewhere and then make sure source is pointing to right location.
Seems a bit more demanding than downloading and running a VM image. The goal with initial local experience is to keep it really simple first.
scripts/kfctl.sh
Outdated
download_kfctl_scripts | ||
fi | ||
|
||
source "${DIR}/util.sh" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this back to the top; they will already exist because kfctl.sh doesn't download the scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 7 unresolved discussions (waiting on @jlewi, @abhi-g, and @richardsliu)
kubeflow/core/kubeform_spawner.py, line 187 at r3 (raw file):
Previously, abhi-g (Abhishek Gupta) wrote…
You are correct. Updating above default values of params to null
In the jsonnet can we avoid setting the environment variables if we don't want to override the values, then env_gid and env_uid should be None if the environment variables aren't set
kubeflow/core/prototypes/jupyterhub.jsonnet, line 16 at r3 (raw file):
Previously, jlewi (Jeremy Lewi) wrote…
Did you forget to push, not seeing the change?
Isn't 0 a valid uid? i.e. doesn't 0 mean root?
Should we use -1?
Then in the jsonnet only set the environment variable if its >= 0.
Then in the Python code os.getEnv should return None and we can use that to determine its not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 8 unresolved discussions (waiting on @jlewi, @abhi-g, and @richardsliu)
scripts/kfctl.sh, line 18 at r7 (raw file):
ENV_FILE="env.sh" KUBEFLOW_DOCKER_REGISTRY=${KUBEFLOW_DOCKER_REGISTRY:-""}
I think this should be in createEnv
Right now we are only doing it for platform ack but we should just always do it
We only want to inherit environment variables from the environment when init is called
after that we want to get them from env.sh so that we always use the same values.
Just a couple nits and then this should be ready to go. |
Addressed your last couple of comments @jlewi |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jlewi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
adding back lgtm label. Only fixed a 1 line conflict with upstream. /lgtm |
@abhi-g: you cannot LGTM your own PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Good to know @k8s-ci-robot |
/lgtm |
fixes #1153 |
* Added a minikube setup script. * Make local-notebooks accessible via Jupyter Notebooks * Restructure minikube deployment script to work with kfctl patterns * Cleanup * fix comparison in jupyterhub.libsonnet for the new params * Fix jsonnet string param parsing issues. * Addressing some of jlewi's comments * Refactored to split out kfctl.sh and setup-minikube.sh * Remove unused function. * Updated to use -1 on the env vars for UID / GID * Move Docker registry env var into createEnv
Adding a shell script to make it easier for someone trying to setup local Minikube based Kubeflow deployment.
Some features:
/hold
This change is