New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify NB Dockerfile and start scripts to support mount of /home/jovyan #786
Conversation
Can you run autoformat_jsonnet to fix lint issues? The minikube failure should have been transient. /test all |
What is being installed in /home/jovyan ? Could that be installed someplace else? |
We can install anywhere writable in the base image (i.e., |
ff52482
to
aa594a2
Compare
I'm perfectly fine with user installed packages not persisting across pod restarts. This conforms to my normal expectation that if a pod restarts you get back the original OS as baked into the image. From a user perspective, I think the fact that a pod restart terminates your kernel and you lose all state is more problematic. So in that regard Jupyter notebooks need to have high availability. If users want to install additional packages and be resilient to restarts I think the way to do that is to build a custom docker image which has the additional packages. So my preference would be to
So I'd be in favor of installing into /opt or /usr even if it means any additional user installed packages would need to be reinstalled on restart. |
OK, apparently there’s misgivings about shared, multi-user conda installs since it is by design geared for separate user environments. However, in our case it is indeed a one-user env. I’ll have a look tomorrow. |
11d81af
to
bbeedae
Compare
bbeedae
to
7d6136b
Compare
jupyter labextension install @jupyter-widgets/jupyterlab-manager | ||
jupyter labextension install @jupyter-widgets/jupyterlab-manager && \ | ||
# Do chown in this layer for significant size savings | ||
chown -R ${NB_USER}:users $HOME && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
|
||
# Setup work directory for backward-compatibility | ||
RUN mkdir /home/$NB_USER/work | ||
|
||
# Install Tini - used as entrypoint for container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a comment that we want a single layer to minimize size so that people don't split it up later into multiple commands not realizing the benefit of using a single layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -16,7 +16,7 @@ ENV DEBIAN_FRONTEND noninteractive | |||
ENV NB_USER jovyan | |||
ENV NB_UID 1000 | |||
ENV HOME /home/$NB_USER | |||
ENV CONDA_DIR=$HOME/.conda | |||
ENV CONDA_DIR=/opt/conda |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment explaining why we use /opt/conda and not /home
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
RUN useradd -m -s /bin/bash -N -u $NB_UID $NB_USER && \ | ||
chown -R ${NB_USER}:users /usr/local/bin | ||
# but allow for non-initial launches of the notebook to have | ||
# $HOME provided by the contents of a PV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are we doing to allow non-initial launches of the notebook to have HOME provided by contents of a PV?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
|
||
# Install common packages from requirements.txt for both python2 and python3 | ||
# NB: the COPY chown can't expand a bash variable for NB_USER | ||
COPY --chown=jovyan:users requirements.txt $HOME/requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use $HOME here as opposed to some directory in /opt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
RUN pip --no-cache-dir install -r $HOME/requirements.txt && \ | ||
source activate py2 && \ | ||
pip --no-cache-dir install -r $HOME/requirements.txt | ||
|
||
# Tar and delete staged content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do this? I thought the whole point of installing in /opt was to avoid having to create a tarball and then installing that tarball?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
# limitations under the License. | ||
|
||
# stored in the NB Dockerfile | ||
SRC_TAR=/tmp/$NB_USER.tar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do this? Can't we just pip install the packages into a directory other than home? So that we don't have to reinstall them on startup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlewi I'm kind of struggling here. It seems like there is a minimal set of content from the Dockerfile build that should be put aside and then loaded into a "fresh" /home/jovyan
including:
- gcloud config
- jupyter_notebook_config.py
- requirements.txt (at least as a user-visible record of what was installed...sure they can do pip list, I suppose...)
If the PV is empty then they will have none of that since it will mask that content.
/cc @ankushagarwal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deciding what we want and what we don't want to put on a PV is really tricky. I can argue that we might want to put python packages into PV as well since it's very unlikely that users will be using only the packages pre-installed in the image - users would want those packages to survive container restarts as well.
Putting things in PV is also really tricky because to put things on PV, we can't do it at build time and have to do it at runtime with solutions like copying the tarball. I don't have a good solution on how to do this cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing precludes a user from installing whatever they want into their home directory / PV after they've started the pod.
If at all possible I'd like to avoid having the Dockerfile put anything in /home/jovyan so we can avoid the complexity of having to copy files into /home after mounting the PVC.
gcloud config - I don't think we need this. My guess is it will be created automatically if a user runs gcloud.
requirements.txt I don't think a user visible record is that important. So we can just move it somewhere else or just forget about it for now.
.jupyter/jupyter_notebook_config.py - Looks like this is autocreated when you start a notebook.
The only file I think we need is .bashrc
and I think it would be fine to copy that on startup to the PVC (although even better would be if it uses the system default by default).
I ran a simple experiment and removed all the "." directories in my home directory and launched a notebook; it ran fine.
So I think if you've installed all the pip packages in a directory other than /home/jovyan I think we should be good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jlewi, we seem to be fairly opinionated in the current Dockerfile....
gcloud:
gcloud config set core/disable_usage_reporting true && \
gcloud config set component_manager/disable_update_check true && \
gcloud config set metrics/environment github_docker_image
.jupyter/jupyter_notebook_config.py:
We definitely don't generate a default for this in the Dockerfile with jupyter notebook --generate-config
. Instead we have a custom one copied in that seems specifically crafted.
requirements.txt:
Agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willingc @yuvipanda @foxish Any reason why we would need a custom jupyternotebook_config?
For the gcloud config its fine to overwrite those defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if there is no other content to preserve then I will update the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I suggest we try it out and if we found we blew away something we shouldn't have we can add it back.
7d6136b
to
1106e28
Compare
1106e28
to
dd63deb
Compare
/retest |
/retest |
@jlewi PTAL
|
This is great. Thank you so much for doing this. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jlewi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
BTW I took the liberty of updating the PR description. I filed #854 to verify it is working. |
* fix: Add log Signed-off-by: Ce Gao <gaoce@caicloud.io> * WIP Signed-off-by: Ce Gao <gaoce@caicloud.io>
Put all of /home/jovyan on a PVC when using PVC with Jupyter notebooks.
hidden by the PVC once its mounted
start into /home/jovyan so that they will be copied onto the PVC
Fixes #561
/cc @jlewi
/cc @inc0
Pretty much what @jlewi described. Dockerfile installs conda, pip, etc. in situ at/home/jovyan
then we tar it up to/tmp
and remove the content at/home/jovyan
. Thenstart-singleuser.sh
invokes a newpvc-check.sh
which determines whether or not we init a new/home/jovyan
, regardless of whether it is mounted or not (it doesn't know). If it detects content there from a PVC it just re-installs from that pip cache. If nothing is there it untars to that location which may or may not be a mount (doesn't matter).~~The downside of the tar technique is that it unfortunately adds a 2.5 GB (mostly
.conda
) layer to the NB image, but I'm at a loss for a cleaner way to do this. I don't think it's worth doing compression in the Docker build, nor decompression at pod launch. ~~I tested with a hostPath PV like this. Obviously that dir needs to be
chmod 777
for the hostPath to work.Could use some tips/help/suggestions on e2e testing this. README updates can follow but IMHO we better agree on this strategy first.
This change is