Construction and Deployment of a Docker image #5

Merged
jonathan-taylor merged 20 commits into intro-stat-learning:main on Aug 21, 2023

Conversation

Contributor

@tschm tschm commented Aug 21, 2023

You need to control who can introduce tags. You need your own Docker Hub account...

@jonathan-taylor
Collaborator

This launches a huge download of several GB (at least the first time). I guess this is a one-time download of jupyter/scipy-notebook. I presume this download wouldn't have to happen for further updates, but I'm not sure.

I also get an error with docker run -p 8888:8888 tschm/islp_labs:v0.0.1 because I happen to be using ports 8888 and 8889 with jupyter lab. So, this is another detail a user would need to check.

After choosing a free port, the log gives me links that should point to a Jupyter server, but these links don't work in Chrome. Perhaps port 8888 is hard-coded into the Docker image, so I'm out of luck if my port 8888 is in use?

log.txt

@jonathan-taylor
Collaborator

Also, the log indicates that this image is for a different architecture than mine. My Mac is an M1; this was probably built for Intel? It still runs, but I'm not sure if this is an issue -- do Docker images depend on an architecture?

@jonathan-taylor
Collaborator

Overall, I think this can wait until we actually have several people who want an "official" docker image.

By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

@tschm
Contributor Author

tschm commented Aug 21, 2023

Also, the log indicates that this image is for a different architecture than mine. My Mac is an M1; this was probably built for Intel? It still runs, but I'm not sure if this is an issue -- do Docker images depend on an architecture?

Yes, Docker images are very much an Ubuntu thing. That's a huge advantage, as you can use them on Windows, Mac, or Ubuntu. I am using a Mac with an M1, too.
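
On the architecture question: Docker images are built for a specific OS and CPU platform (for example linux/amd64 or linux/arm64). Docker Desktop on an Apple Silicon Mac can usually run an amd64 image through emulation, which would explain why the image "still runs" despite the warning in the log. A minimal sketch for comparing the host platform with the image platform, assuming a POSIX shell; `docker_platform_for` is a hypothetical helper, and the image name is the one used in this thread:

```shell
# Map the output of `uname -m` to the Docker platform string for that CPU.
# `docker_platform_for` is a hypothetical helper, not part of Docker.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

HOST_PLATFORM=$(docker_platform_for "$(uname -m)")
echo "host platform: $HOST_PLATFORM"

# Only query Docker if the CLI is installed; fall back to a message if the
# image has not been pulled yet or the daemon is not running.
IMAGE="tschm/islp_labs:v0.0.1"
if command -v docker >/dev/null 2>&1; then
  # Prints the platform the image was built for, e.g. linux/amd64.
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "$IMAGE" 2>/dev/null \
    || echo "image $IMAGE not available locally"
fi
```

If the two platforms differ, the image runs under emulation (typically slower); publishing a multi-platform image would avoid that.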

@tschm
Contributor Author

tschm commented Aug 21, 2023

This launches a huge download of several GB (at least the first time). I guess this is a one-time download of jupyter/scipy-notebook. I presume this download wouldn't have to happen for further updates, but I'm not sure.

I also get an error with docker run -p 8888:8888 tschm/islp_labs:v0.0.1 because I happen to be using ports 8888 and 8889 with jupyter lab. So, this is another detail a user would need to check.

After choosing a free port, the log gives me links that should point to a Jupyter server, but these links don't work in Chrome. Perhaps port 8888 is hard-coded into the Docker image, so I'm out of luck if my port 8888 is in use?

log.txt

Yes, loading the image the first time is a huge operation if you don't have the scipy-notebook layers in cache...
I think the scipy-notebook is very helpful though... Has conda, pip, non-root user, ...

@jonathan-taylor
Collaborator

jonathan-taylor commented Aug 21, 2023 via email

@tschm
Contributor Author

tschm commented Aug 21, 2023

For the port:
The Docker image always runs Jupyter internally on port 8888. You can forward this port to a different host port, though. The choice is yours, e.g. something like 3000:8888 is possible. Then the Jupyter server would be reachable on port 3000 on the host.
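
The port forwarding described above can be sketched as follows. The `-p` option takes HOST:CONTAINER, so only the left-hand number changes when 8888 is busy on the host. The URL in the sketch is a placeholder with a made-up token; the likely reason the logged links fail in the browser is that they still mention the container's port 8888 rather than the forwarded host port. Variable names are illustrative only:

```shell
# Host port to expose Jupyter on; change this if it is already taken.
HOST_PORT=3000
# Port Jupyter listens on inside the container (8888, per this thread).
CONTAINER_PORT=8888
IMAGE="tschm/islp_labs:v0.0.1"

# -p takes HOST:CONTAINER, so only the left-hand side needs to change.
RUN_CMD="docker run -p ${HOST_PORT}:${CONTAINER_PORT} ${IMAGE}"
echo "$RUN_CMD"

# The link printed in the container log refers to the container's port.
# Substitute the forwarded host port before opening it in a browser.
LOG_URL="http://127.0.0.1:8888/lab?token=EXAMPLE"   # placeholder, not a real token
HOST_URL=$(printf '%s\n' "$LOG_URL" | sed "s/:${CONTAINER_PORT}\//:${HOST_PORT}\//")
echo "$HOST_URL"
```

The snippet only prints the command and the rewritten URL; it does not start a container.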

@tschm
Contributor Author

tschm commented Aug 21, 2023

Overall, I think this can wait until we actually have several people who want an "official" docker image.

By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would not use conda or recommend it :-) Where do you get jupyterlab from?

@jonathan-taylor
Collaborator

Overall, I think this can wait until we actually have several people who want an "official" docker image.
By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would not use conda or recommend it :-) Where do you get jupyterlab from?

Well, conda is a community standard (even if it has flaws). I typically just use it to create a minimal environment, then pip for everything else. Could use mamba instead. Both are much lighter weight than docker.

Fair enough about jupyterlab. This is generally enough

pip install jupyterlab

@jonathan-taylor
Collaborator

For the port: The Docker image always runs Jupyter internally on port 8888. You can forward this port to a different host port, though. The choice is yours, e.g. something like 3000:8888 is possible. Then the Jupyter server would be reachable on port 3000 on the host.

Yep, docker --help pointed that out...

@tschm
Contributor Author

tschm commented Aug 21, 2023

Overall, I think this can wait until we actually have several people who want an "official" docker image.

By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would recommend keeping and documenting both options. The results of your pip install will not be reproducible, as you can't control the dependencies of your dependencies. Also, some versions you point to may disappear. Once you bake them into an image, they are there for eternity. You may not need this level of robustness, though.
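
The reproducibility concern can be made concrete: a requirements file fixes only the packages it lists, and only where a line uses `==`. A minimal sketch, assuming a POSIX shell; `unpinned_requirements` is a hypothetical helper, and the sample file content is made up for illustration (it is not the project's actual requirements.txt):

```shell
# Print requirement lines that are not pinned to an exact version with `==`.
# Blank lines and comments are ignored. Hypothetical helper for illustration.
unpinned_requirements() {
  grep -v -E '^[[:space:]]*(#|$)' "$1" | grep -v '==' || true
}

# Made-up sample file, not the real ISLP_labs requirements.txt.
REQ_FILE=$(mktemp)
cat > "$REQ_FILE" <<'EOF'
# pinned
numpy==1.24.2
# not pinned
pandas>=1.5
jupyterlab
EOF

UNPINNED=$(unpinned_requirements "$REQ_FILE")
echo "$UNPINNED"
rm -f "$REQ_FILE"
```

Even a fully pinned file leaves transitive dependencies open. Running `pip freeze` in a working environment records the full resolved set, whereas an image stores the installed files themselves.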

@tschm
Contributor Author

tschm commented Aug 21, 2023

Overall, I think this can wait until we actually have several people who want an "official" docker image.
By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would not use conda or recommend it :-) Where do you get jupyterlab from?

Well, conda is a community standard (even if it has flaws). I typically just use it to create a minimal environment, then pip for everything else. Could use mamba instead. Both are much lighter weight than docker.

Fair enough about jupyterlab. This is generally enough

pip install jupyterlab

I wish the community standard were to set up a virtual environment in the first place, as you do. To me it seems people just pip install into their central Python environment.

@jonathan-taylor
Collaborator

jonathan-taylor commented Aug 21, 2023 via email

@tschm
Contributor Author

tschm commented Aug 21, 2023

You can also create an even bigger image that has both R and Python installed. See the Jupyter Docker Stacks documentation.

@jonathan-taylor
Collaborator

jonathan-taylor commented Aug 21, 2023 via email

@@ -0,0 +1,10 @@
FROM docker.io/jupyter/scipy-notebook:lab-4.0.4
Collaborator

This implicitly adds more requirements to requirements.txt that could clash.

Contributor Author

Yes, see

Everything in jupyter/minimal-notebook and its ancestor images
altair, beautifulsoup4, bokeh, bottleneck, cloudpickle, conda-forge::blas=*=openblas, cython, dask, dill, h5py, jupyterlab-git, matplotlib-base, numba, numexpr, openpyxl, pandas, patsy, protobuf, pytables, scikit-image, scikit-learn, scipy, seaborn, sqlalchemy, statsmodels, sympy, widgetsnbextension, xlrd packages
ipympl and ipywidgets for interactive visualizations and plots in Python notebooks
Facets for visualizing machine learning datasets

Contributor Author

You could replace the scipy-notebook with the minimal-notebook: a smaller image and no unwanted packages.

@jonathan-taylor
Collaborator

jonathan-taylor commented Aug 21, 2023 via email

@tschm
Contributor Author

tschm commented Aug 21, 2023

OK, choosing -p 10000:8888 works for me. So, this just opens essentially the same thing as this: https://mybinder.org/v2/gh/intro-stat-learning/ISLP_labs/v2.1 So, on the whole, this is "effectively" capturing the docker image that binder builds. It has more packages due to the FROM docker.io/jupyter/scipy-notebook line. This could lead to conflicts if requirements.txt is not current with that image... Using binder doesn't make that assumption.

You need to pin the version of the scipy-notebook image. I think I am using something like 4.0.4. For Binder, there are ways to build the image directly on Binder infrastructure and keep it in their cache. I'm not an expert, though... I think your image might be a bit too big for Binder. It takes ages to construct it from your requirements.

@tschm
Contributor Author

tschm commented Aug 21, 2023

But it seems a little heavy-handed to say the solution is to use docker instead of teaching them to manage a virtual environment....

The virtual environment thing is not that easy. It exposes you to all sorts of OS dependency problems.

@jonathan-taylor
Collaborator

jonathan-taylor commented Aug 21, 2023 via email

@tschm
Contributor Author

tschm commented Aug 21, 2023

OK, choosing -p 10000:8888 works for me. So, this just opens essentially the same thing as this: https://mybinder.org/v2/gh/intro-stat-learning/ISLP_labs/v2.1 So, on the whole, this is "effectively" capturing the docker image that binder builds. It has more packages due to the FROM docker.io/jupyter/scipy-notebook line. This could lead to conflicts if requirements.txt is not current with that image... Using binder doesn't make that assumption.

I think the order is wrong :-) You should build the image and binder should capture it :-) Binder is somewhat tricky about being pointed to docker images.

@tschm
Contributor Author

tschm commented Aug 21, 2023

I have updated the underlying image, see https://hub.docker.com/r/tschm/islp_labs/tags. The resulting image is now smaller but still close to 3 GB... let's check the files copied into the image

@tschm
Contributor Author

tschm commented Aug 21, 2023

I have tried to address the somewhat large size of the resulting images. However, it seems that's a direct consequence of installing the NVIDIA packages. I did an analysis with SLIM.ai, and the constructed Python environment takes several GB. I kept the Dockerfile somewhat standard and readable. When I build the image locally, it reports about 2.1 GB. After a round trip via Docker Hub, the same image is now 6 GB after a pull? Weird...

jonathan-taylor and others added 2 commits August 21, 2023 11:58
Change to manual dispatch, where images will get stored
.dockerignore (outdated, resolved)
@tschm
Contributor Author

tschm commented Aug 21, 2023

You have the merge power. I am not sure you do yourself a favor with the manual release of the docker image. The pushed image will have no strong link to a tag then (if I understand the manual workflow correctly)...

@jonathan-taylor jonathan-taylor merged commit 353df68 into intro-stat-learning:main Aug 21, 2023
@jonathan-taylor
Collaborator

Manual dispatch works fine: jetaylor74/islp_labs should have v2.1.1 and latest

I tried to get it to run on push to stable, but the workflow is not being triggered. Will eventually sort it out.
