pytorch jupyter notebooks? #1057

Closed
jlewi opened this issue Jun 21, 2018 · 31 comments

@jlewi
Contributor

jlewi commented Jun 21, 2018

With the PyTorch operator coming, should we add better support for PyTorch to our notebooks?

I can think of a few options:

  1. Add PyTorch to our existing Jupyter notebooks
  2. Create a new set of PyTorch Jupyter images
  3. Use someone else's images
  4. Use the Kaggle image (assuming it supports PyTorch); see [discussion] How can we play well with Kaggle? #258

/cc @johnugeorge
/cc @pdmack

@jlewi jlewi added area/jupyter Issues related to Jupyter area/0.3.0 labels Jun 21, 2018
@pdmack
Member

pdmack commented Jun 21, 2018

Users can run conda install pytorch torchvision -c pytorch themselves in the current notebook image, right?

@ankushagarwal
Contributor

With the PyTorch operator coming should we add better support for PyTorch to our notebooks.

Supporting <framework>'s operator should be orthogonal to supporting <framework> in our Jupyter notebooks.

+1 to @pdmack's suggestion. Generally speaking, our story for supporting a pip / Python package in a Jupyter notebook should be pip install or conda install.
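
As a rough illustration of the "install it yourself" story, a notebook cell could first check which packages are missing and only then suggest an install command. This is a minimal sketch, not something the Kubeflow images ship: the helper names `missing_packages` and `pip_install_command` are hypothetical, and it assumes the import names `torch` and `torchvision` match the distribution names passed to pip.

```python
import importlib.util
import sys


def missing_packages(import_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def pip_install_command(distributions):
    """Build a pip command bound to the interpreter the kernel is running."""
    return [sys.executable, "-m", "pip", "install", *distributions]


# "torch" is the import name PyTorch uses; the check itself is generic.
missing = missing_packages(["torch", "torchvision"])
if missing:
    # Printed rather than executed, so the user decides when to install.
    print("Missing:", missing)
    print("Suggested:", " ".join(pip_install_command(missing)))
else:
    print("PyTorch and torchvision are already importable.")
```

Binding the command to `sys.executable` keeps the install in the same environment as the running kernel, which matters when an image carries more than one Python.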

@johnugeorge
Member

johnugeorge commented Jun 22, 2018 via email

@jlewi
Contributor Author

jlewi commented Jun 22, 2018

I think we should try to provide a holistic experience for other frameworks, just like we do for TF.

So now that we are adding PyTorch support, we should think about supporting it across the stack, e.g.:

  • Notebooks
  • Training
  • Serving
  • Monitoring

@pdmack
Member

pdmack commented Jun 22, 2018

Regarding option 1, the names of the notebook images advertise TF 1.x. I don't know whether that would be an impediment to PyTorch users, since the installed PyTorch version would be more or less unknown from just browsing image names.

@jlewi
Contributor Author

jlewi commented Jun 22, 2018

@pdmack Agreed, but we could always rename the images.

For me the question is what's the right balance between the overhead of maintaining a set of container images, providing a container that has what you need to do things out of the box, and keeping image size tolerable.

I like the idea of building an uber, kitchen-sink container like Kaggle's, mostly as an experiment to see whether customers find that extreme solution valuable.

@jlewi
Contributor Author

jlewi commented Jun 22, 2018

Also I think it would be really valuable if users could run any Kaggle solution right out of the box.

@pdmack
Member

pdmack commented Jun 22, 2018

What about that? A big, honkin' Kaggle image that we de-couple from the release cadence. Latest of all the frameworks, lightly curated, and loosely maintained. At least to start. Keep it out of the spawner defaults but promote it through the docs and website. Make it clear that it's a long pull, note other caveats, etc.

@johnugeorge
Member

I feel there is always value in providing what users need rather than asking them to install something before using it.

@pdmack
Member

pdmack commented Jun 22, 2018

FYI, I just tried to build the Kaggle image but ran out of room in my base dm. It was at least 59 minutes in and approaching 12 GB in size.

@pdmack
Member

pdmack commented Jun 22, 2018

I'm skeptical that a multi-stage build with the Kaggle image would work but I'll have a look. But we could consider deriving a Kaggle image from our TF 1.8 image and adding in the missing parts. I'm guessing there's a good amount of overlap.

@pdmack
Member

pdmack commented Jun 22, 2018

Oh my...

kaggle/python latest cdc6ffe1b12c 2 months ago 16.8GB

https://gist.github.com/pdmack/6bea356917d6edbad0eccf46a27970eb
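
To put the "long pull" concern in numbers, here is a back-of-the-envelope sketch. The 16.8 GB figure is the size reported above; the link speeds are hypothetical, and the function name `pull_minutes` is made up for illustration. Because registries transfer compressed layers while the size shown for a local image is uncompressed, treat the result as a rough upper bound on transfer time only.

```python
def pull_minutes(image_gb, link_mbit_per_s):
    """Minutes to transfer image_gb (decimal GB) over a link of the given speed."""
    bits = image_gb * 8e9
    seconds = bits / (link_mbit_per_s * 1e6)
    return seconds / 60


# 16.8 GB is the size reported above; the link speeds are hypothetical.
for mbit in (100, 1000):
    print(f"{mbit} Mbit/s: {pull_minutes(16.8, mbit):.1f} min")
```

At an assumed 100 Mbit/s this works out to roughly 22 minutes before layer extraction even starts.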

@pdmack
Member

pdmack commented Jun 22, 2018

Successfully built 178635b0097c
Successfully tagged kaggle/python-build:latest
real	119m34.774s

"gulp"
This was on a somewhat older Xeon CPU, but it's a 24-way machine with 48 GB of RAM:
Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

@jlewi
Contributor Author

jlewi commented Jun 28, 2018

@pdmack This is pretty cool. Were you able to start Jupyter in the Kaggle image and use it with JupyterHub?

@pdmack
Member

pdmack commented Jun 28, 2018

No, it doesn't have start-singleuser.sh in place. Note that this wasn't the proposed multi-stage build or anything like that, just the upstream Dockerfile build. I suppose I could still look at that by doing a COPY of our /usr/local/bin/ scripts. But 2 hours for an almost 20 GB image? Do we want to go there?

@pdmack
Member

pdmack commented Jun 28, 2018

This is as far as I got reusing the kaggle docker image and adding our special sauce:

Traceback (most recent call last):
  File "/opt/conda/bin/jupyterhub-singleuser", line 3, in <module>
    from jupyterhub.singleuser import main
  File "/opt/conda/lib/python3.6/site-packages/jupyterhub/singleuser.py", line 34, in <module>
    from notebook.notebookapp import (
  File "/opt/conda/lib/python3.6/site-packages/notebook/notebookapp.py", line 40, in <module>
    ioloop.install()
  File "/opt/conda/lib/python3.6/site-packages/zmq/eventloop/ioloop.py", line 210, in install
    assert (not ioloop.IOLoop.initialized()) or \
AttributeError: type object 'IOLoop' has no attribute 'initialized'

https://github.com/pdmack/kubeflow/tree/kaggle-nb-image
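
An AttributeError like the one above is commonly seen when Tornado 5 or later (which dropped `IOLoop.initialized`) is installed alongside a pyzmq release older than 17. Whether that is the cause in this particular image is an assumption; the sketch below only checks for that version pairing. The helper names and the thresholds encoded in `ioloop_mismatch` are illustrative, not part of either library.

```python
from importlib import metadata


def parse_major(version):
    """Return the leading integer of a version string such as '4.5.3'."""
    return int(version.split(".")[0])


def ioloop_mismatch(tornado_version, pyzmq_version):
    """True when tornado is 5+ but pyzmq predates 17 (assumed thresholds)."""
    return parse_major(tornado_version) >= 5 and parse_major(pyzmq_version) < 17


def installed_version(distribution):
    """Look up an installed distribution's version, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


tornado_v = installed_version("tornado")
pyzmq_v = installed_version("pyzmq")
if tornado_v and pyzmq_v:
    print("mismatch suspected:", ioloop_mismatch(tornado_v, pyzmq_v))
else:
    print("tornado or pyzmq not installed; nothing to check")
```

If the check reports a mismatch, the usual remedies are upgrading pyzmq or pinning Tornado below 5 in the image.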

@jzf2101

jzf2101 commented Jun 29, 2018

2c from the Binder ecosystem: I've seen people install PyTorch and torchvision on Binder after creating a conda environment using @soumith's conda channel, though these are not the latest installation instructions.

See example:

https://github.com/jrzech/reproduce-chexnet/blob/master/postBuild

@jlewi
Contributor Author

jlewi commented Jun 29, 2018

@pdmack Point taken.

Perhaps we should create an examples container intended to be used for running the examples. By definition, the image will contain the union of dependencies needed to run the Kubeflow examples. So it will grow over time until either (i) it becomes too large or (ii) version conflicts arise.

By that definition it would make sense to start building an image with PyTorch and TF.
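
The "union of dependencies" idea, and the way it eventually runs into version conflicts, can be sketched as follows. The example names and version pins are hypothetical, and `merge_requirements` is an illustrative helper rather than anything in the Kubeflow repositories.

```python
def merge_requirements(examples):
    """Union the pins of every example and collect packages pinned two ways.

    `examples` maps an example name to a {package: version} dict.
    """
    merged = {}
    conflicts = {}
    for pins in examples.values():
        for package, version in pins.items():
            # The first pin seen wins; later disagreements are recorded.
            seen = merged.setdefault(package, version)
            if seen != version:
                conflicts.setdefault(package, {seen}).add(version)
    return merged, conflicts


# Hypothetical examples and pins, for illustration only.
examples = {
    "mnist-tf": {"tensorflow": "1.8.0", "numpy": "1.14.5"},
    "mnist-pytorch": {"torch": "0.4.0", "numpy": "1.14.5"},
    "seq2seq": {"tensorflow": "1.7.0"},
}
merged, conflicts = merge_requirements(examples)
print("packages in the image:", sorted(merged))
print("conflicting pins:", conflicts)
```

An empty `conflicts` dict means the examples can still share one image; a non-empty one marks the point where the image would have to be split or the examples updated.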

For 0.3 I'd like to be able to

  • Offer users Click to Deploy for Kubeflow
  • Drop the user into Jupyter
  • Have the user walk through one or more Kubeflow codelabs in JupyterLab

I'd like users to be able to do everything from JupyterLab; i.e. JupyterLab provides a notebook editor, basic text editor, and terminal which is sufficient for running the codelabs.

@jlewi
Contributor Author

jlewi commented Jul 6, 2018

Looks like @pdmack published a version of the image in
gcr.io/kubeflow-dev/kubeflow-kaggle-notebook:latest

@jlewi
Contributor Author

jlewi commented Jul 6, 2018

I retagged into gcr.io/kubeflow-images-public/kaggle-notebook:v20180629

I retagged it using Google Container Builder because trying to use gcloud container images add-tag choked.
Here's the GCB config:
https://github.com/jlewi/kubeflow-dev/tree/master/kaggle-image

@jlewi
Contributor Author

jlewi commented Aug 10, 2018

Let's add PyTorch to our codelab notebook image (see #1157) and then close this out.

@chrisheecho

/remove-priority p2

@jlewi
Contributor Author

jlewi commented Dec 3, 2018

Anyone working on this?
Should we punt this to 0.5?

@carmine

carmine commented Dec 4, 2018

Move to 0.5.0, same priority.

@jlewi jlewi removed this from the 0.4.0 milestone Dec 17, 2018
@jlewi jlewi added this to New in 0.5.0 via automation Dec 17, 2018
@jlewi jlewi removed this from To do in 0.4.0 Dec 17, 2018
@jlewi jlewi moved this from New to Build / Train / Deploy from notebook in 0.5.0 Jan 6, 2019
@jlewi
Contributor Author

jlewi commented Feb 4, 2019

Downgrading to P2 since we are focusing on xgboost and TF.

I think a good next step would be to try to use some stock Jupyter images for PyTorch with the new notebook CR.

Ideally, we'd like existing Jupyter images to just work so that we don't need to build custom images; see #2208. This should be more doable now that we no longer use JupyterHub.
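
For readers unfamiliar with the notebook CR, the sketch below builds a minimal Notebook resource around an arbitrary image. It assumes the resource wraps an ordinary pod template under `spec.template.spec`; the `apiVersion` string, resource name, namespace, and image reference are all placeholders to be checked against the CRD installed in a given cluster, and `notebook_manifest` is a hypothetical helper.

```python
import json


def notebook_manifest(name, namespace, image):
    """Build a minimal Notebook custom resource as a plain dict."""
    return {
        # Assumed API version; check the CRD installed in your cluster.
        "apiVersion": "kubeflow.org/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": name, "image": image}],
                }
            }
        },
    }


# Names and image are placeholders, not a tested configuration.
manifest = notebook_manifest(
    "pytorch-nb", "kubeflow-user", "example.org/pytorch-notebook:latest"
)
print(json.dumps(manifest, indent=2))  # kubectl apply -f also accepts JSON
```

Whether a stock image works this way still depends on it serving Jupyter on the port and URL prefix the controller expects.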

@jlewi jlewi added this to New in 0.6.0 via automation Mar 10, 2019
@jlewi jlewi removed this from Build / Train / Deploy from notebook in 0.5.0 Mar 10, 2019
@siddsuresh97

siddsuresh97 commented Mar 21, 2019

Hello @jlewi,
I'm new to contributing to open source. I would like to work on this issue. Could you help me get started?

@jlewi
Contributor Author

jlewi commented Apr 22, 2019

@siddsuresh97 Docs for creating custom Jupyter images are here:
https://www.kubeflow.org/docs/notebooks/custom-notebook/

So if you are interested, you could start by building a Jupyter image suitable for PyTorch and making it work with Kubeflow. You could then publish it on DockerHub and provide instructions here or in kubeflow/website on how people could use it for PyTorch.

@stale

stale bot commented Jul 21, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@jlewi jlewi added this to To Do in Needs Triage Jul 26, 2019
@stale stale bot closed this as completed Jul 28, 2019
Needs Triage automation moved this from To Do to Closed Jul 28, 2019
@jlewi jlewi removed this from Closed in Needs Triage Aug 2, 2019
@davidspek
Contributor

@jlewi I was going through the Notebook issues and found this one. I created a PyTorch image based on jupyter/scipy-notebook that seems to work. It also contains TensorFlow; I'm not sure whether that is an unnecessary combination (I don't work with either), but it can easily be removed if necessary. Hopefully this helps; otherwise I can do more testing if you specify what needs to be done. Here is the Dockerfile:

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
ARG BASE_CONTAINER=jupyter/scipy-notebook
FROM $BASE_CONTAINER

LABEL maintainer="Jupyter Project <jupyter@googlegroups.com>"

# Install Tensorflow
RUN pip install --quiet --no-cache-dir \
    'tensorflow==2.3.0' && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

USER $NB_UID

RUN conda config --system --append channels pytorch

RUN conda install --quiet --yes -c pytorch \
    'pytorch' \
    'torchvision' \
    'cpuonly' \
    && \
    conda clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

# Configure container startup
EXPOSE 8888
USER jovyan
ENTRYPOINT ["tini", "--"]
CMD ["sh","-c", "jupyter lab --notebook-dir=/home/${NB_USER} --ip=0.0.0.0 --no-browser --allow-root --port=8888 --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.allow_origin='*' --NotebookApp.base_url=${NB_PREFIX}"]
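
The CMD line above packs several settings into one shell string. As a reading aid, the sketch below expands it into an argument list from a given environment. It mirrors the flags in the Dockerfile as written; the fallback values for an unset `NB_USER` or `NB_PREFIX` are assumptions added here, and the prefix used in the usage lines is a made-up example.

```python
import os


def jupyter_lab_args(env):
    """Mirror the Dockerfile CMD as an argument list built from `env`."""
    user = env.get("NB_USER", "jovyan")   # assumed fallback: docker-stacks default user
    prefix = env.get("NB_PREFIX", "/")    # assumed fallback: serve at the root path
    return [
        "jupyter", "lab",
        f"--notebook-dir=/home/{user}",
        "--ip=0.0.0.0",                   # listen on all interfaces inside the pod
        "--no-browser",
        "--allow-root",
        "--port=8888",                    # matches EXPOSE 8888
        "--NotebookApp.token=",           # empty token: no token check in Jupyter
        "--NotebookApp.password=",
        "--NotebookApp.allow_origin=*",
        f"--NotebookApp.base_url={prefix}",
    ]


print(" ".join(jupyter_lab_args(os.environ)))

# Hypothetical prefix, to show where the value ends up.
example = jupyter_lab_args({"NB_USER": "jovyan", "NB_PREFIX": "/notebook/ns/nb"})
print(example[-1])
```

With the token and password emptied, access control is left entirely to whatever sits in front of the notebook, so this configuration only makes sense behind an authenticating proxy.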

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.57


yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021
* Add first run condition in BO Suggestion

* Tell to Optimizer only about new Trials

* Logging
Return new trials in each getSuggestion call

* Small fix log

* Remove n_points from ask

* Fix log

* Add newline to log

* Change log

* Change dict to list of recorded trials

* Get search space only for the first run