Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Kaggle notebook Dockerfile #1109

Merged
merged 1 commit into from
Jul 15, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
78 changes: 78 additions & 0 deletions components/contrib/kaggle-notebook-image/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.

# use basic syntax for now
FROM gcr.io/kaggle-images/python:latest

USER root

ENV DEBIAN_FRONTEND noninteractive

ENV NB_USER jovyan
ENV NB_UID 1000
ENV HOME /home/$NB_USER
# We prefer to have a global conda install
# to minimize the amount of content in $HOME
ENV CONDA_DIR=/opt/conda
ENV PATH $CONDA_DIR/bin:$PATH

# Use bash instead of sh
SHELL ["/bin/bash", "-c"]

# add https support
RUN apt-get update && apt-get install -yq --no-install-recommends apt-transport-https

RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && \
locale-gen

ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8

# Create jovyan user with UID=1000 and in the 'users' group
# but allow for non-initial launches of the notebook to have
# $HOME provided by the contents of a PV
RUN useradd -M -s /bin/bash -N -u $NB_UID $NB_USER && \
chown -R ${NB_USER}:users /usr/local/bin && \
mkdir -p $HOME

RUN export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \
echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
apt-get update && \
apt-get install -y google-cloud-sdk kubectl && \
# pin to 0.8.1 due to conflicts with tornado>=5.x and pyzmq>=17.x
# we just really need jupyterhub-singleuser for our KF scripts
pip install jupyterhub==0.8.1

# Install Tini - used as entrypoint for container
RUN cd /tmp && \
wget --quiet https://github.com/krallin/tini/releases/download/v0.10.0/tini && \
echo "1361527f39190a7338a0b434bd8c88ff7233ce7b9a4876f3315c22fce7eca1b0 *tini" | sha256sum -c - && \
mv tini /usr/local/bin/tini && \
chmod +x /usr/local/bin/tini

RUN chown -R ${NB_USER}:users $HOME

ENV GITHUB_REF https://raw.githubusercontent.com/kubeflow/kubeflow/master/components/tensorflow-notebook-image

ADD --chown=jovyan:users $GITHUB_REF/jupyter_notebook_config.py /tmp

# Wipe $HOME for PVC detection later
WORKDIR $HOME
RUN rm -fr $(ls -A $HOME)

# Get init scripts from kubeflow
ADD --chown=jovyan:users \
$GITHUB_REF/start-singleuser.sh \
$GITHUB_REF/start-notebook.sh \
$GITHUB_REF/start.sh \
$GITHUB_REF/pvc-check.sh \
/usr/local/bin/

RUN chmod a+rx /usr/local/bin/*

# Configure container startup
EXPOSE 8888
ENTRYPOINT ["tini", "--"]
CMD ["start-notebook.sh"]
17 changes: 17 additions & 0 deletions components/contrib/kaggle-notebook-image/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Kaggle notebook

This Dockerfile builds an image that is derived from the latest [Kaggle python image](gcr.io/kaggle-images/pythoni:latest) but which is compatible for launching from the Kubeflow JupyterHub. [Kaggle](https://www.kaggle.com/) is the home of data science collaboration and competition.

Important notes:
* this notebook is not curated by the Kubeflow project and is not regularly tested
* the versions of TensorFlow, PyTorch, and the other libraries included may change at any time
* this is a very large notebook, over 21 Gb in size. Since our notebook uses the latest Kaggle image, docker pulls (and notebook launches) can take a lengthy period of time.
* the base image size for docker devicemapper is 10 Gb, which won't be large enough to run this image. Your docker daemon must be configured for at least 30 Gb (`-storage-opt dm.basesize=30G`) or use a storage driver like overlay2.
* the Kaggle image includes TensorFlow 1.9 or greater built with AVX2 support, so the image may not run on older CPU
* the other Kubeflow curated notebooks have the feature that the jovyan user is able to install new packages from pip or conda. Unfortunately this is not the case with the Kaggle image due to the impact on the image size due to adding a new ownership layer.

To build the image run:
```
docker build --pull -t kubeflow-kaggle-notebook:latest .
```
Of course specify whatever repo and image tag you need for your purposes.