Construction and Deployment of a Docker image #5

tschm · 2023-08-21T04:10:05Z

You need to control who can introduce tags. You also need your own Docker Hub account...

jonathan-taylor · 2023-08-21T06:12:46Z

This launches a huge download of several GB (at least the first time). I guess this is related to a one-time download of the jupyter/scipy-notebook image. I presume this download wouldn't have to happen for further updates, but I'm not sure.

I also get an error with docker run -p 8888:8888 tschm/islp_labs:v0.0.1 because I happen to be using ports 8888 and 8889 with JupyterLab. So this is another detail a user would need to check.

After choosing a free port, the log gives me a link that should point me to a Jupyter server, but these links don't work in Chrome. Perhaps port 8888 is hard-coded into the Docker image, so I'm out of luck if my port 8888 is in use?

log.txt

jonathan-taylor · 2023-08-21T06:16:48Z

Also, the log indicates that this image is for a different architecture than mine. My Mac is an M1; probably this was built for Intel? It still runs, but I'm not sure if this is an issue. Do Docker images depend on the architecture?

jonathan-taylor · 2023-08-21T06:22:44Z

Overall, I think this can wait until we actually have several people who want an "official" docker image.

By simply running pip install -r requirements.txt, this really provides no more isolation of code than if I were to do:

conda create -n my_islp_env python=3.11 -y
conda activate my_islp_env
pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

tschm · 2023-08-21T06:26:26Z

> Also, log indicates that this image is for different architecture than my. My Mac is an M1, probably this was built for Intel? Still runs, but not sure if this is an issue -- do the docker images depend on an architecture?

Yes, Docker images are very much an Ubuntu thing. That's a huge advantage, as you can use them on Windows, Mac or Ubuntu. I am using a Mac with an M1, too.
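For context on the architecture question: Docker images target Linux, and each image is built for a particular CPU architecture such as linux/amd64 or linux/arm64. Docker Desktop on Apple silicon can run linux/amd64 images under emulation, which is a plausible reason the image still runs on an M1 despite the warning in the log. The sketch below is a hypothetical helper (not part of the ISLP_labs repo) that maps the host's machine type, as reported by Python's platform.machine(), to the platform string accepted by docker run --platform or docker buildx build --platform. The mapping table is an assumption that covers only the common cases.

```python
import platform
from typing import Optional

# Hypothetical mapping from platform.machine() values (lower-cased) to
# Docker platform strings. Only common cases are covered.
_MACHINE_TO_DOCKER = {
    "x86_64": "linux/amd64",   # Intel/AMD Linux and Intel Macs
    "amd64": "linux/amd64",    # Windows reports "AMD64"
    "arm64": "linux/arm64",    # macOS on Apple silicon (M1/M2)
    "aarch64": "linux/arm64",  # Linux on 64-bit ARM
}


def docker_platform(machine: Optional[str] = None) -> str:
    """Return the Docker platform string for the given (or current) machine type."""
    key = (machine or platform.machine()).lower()
    try:
        return _MACHINE_TO_DOCKER[key]
    except KeyError:
        raise ValueError(f"unknown machine type: {key!r}") from None


if __name__ == "__main__":
    # On an M1 Mac this should print linux/arm64. An image built on a
    # GitHub-hosted x86_64 runner without extra configuration is typically
    # linux/amd64, hence the mismatch warning.
    print(docker_platform())
```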

tschm · 2023-08-21T06:28:33Z

> This launches a huge download (at least on first time) several GB. I guess this is related to a one-time download of the jupyter/scipy-notebook. I presume this download wouldn't have to happen for further updates but not sure.
>
> I also get an error with docker run -p 8888:8888 tschm/islp_labs:v0.0.1 because I happen to be using ports 8888,8889 with jupyter lab. So, this is another detail a user would need to check.
>
> Choosing a proper port the log gives me a link that should point me to a jupyter server but these links don't work on chrome. Perhaps the 8888 port is hard-coded into the docker image so I'm out of luck if my port 8888 is in use?

Yes, loading the image the first time is a huge operation if you don't have the scipy-notebook layers in your cache...
I think the scipy-notebook image is very helpful, though: it has conda, pip, a non-root user, ...

jonathan-taylor · 2023-08-21T06:29:26Z

Got it, you made a tag v0.0.1 (https://github.com/tschm/ISLP_labs/releases/tag/v0.0.1). I just wasn't sure where v0.0.1 came from, since the intro-stat-learning repo doesn't have that tag. Could also change it to a manual dispatch or something else, I suppose.

Quoted from @tschm's review comment on .github/workflows/docker.yml:

> The docker image constructed is tagged. The workflow is only executed on: release: types: [published], hence the tag is picked up and used to tag the image. If you just do a simple commit, no new docker image is constructed. At the same time, the image :latest is updated.
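The release-triggered tagging described in the review comment on .github/workflows/docker.yml can be sketched as follows. This is a hypothetical Python helper, not the contents of docker.yml: it assumes the job runs on a published release, where GitHub Actions sets GITHUB_REF to refs/tags/<tag>, and returns the two image references the comment mentions, one for the release tag and one for :latest. The image name tschm/islp_labs is taken from this thread.

```python
import os
from typing import List, Optional

TAG_PREFIX = "refs/tags/"


def image_tags(image: str, ref: Optional[str] = None) -> List[str]:
    """Return the Docker tags to push for a release ref such as 'refs/tags/v0.0.1'.

    Falls back to the GITHUB_REF environment variable set by GitHub Actions.
    """
    ref = ref if ref is not None else os.environ.get("GITHUB_REF", "")
    if not ref.startswith(TAG_PREFIX):
        # Refuse branch refs; with the release trigger, plain commits never get here.
        raise ValueError(f"not a tag ref: {ref!r}")
    tag = ref[len(TAG_PREFIX):]
    # Push the release tag and move :latest to the same image.
    return [f"{image}:{tag}", f"{image}:latest"]


if __name__ == "__main__":
    print(image_tags("tschm/islp_labs", "refs/tags/v0.0.1"))
    # prints ['tschm/islp_labs:v0.0.1', 'tschm/islp_labs:latest']
```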

tschm · 2023-08-21T06:31:42Z

For the port:
The Docker image always runs Jupyter internally on port 8888. You can forward this port to a different host port, though. The choice is up to you; something like 3000:8888 is possible. Then the Jupyter server would be reachable on port 3000 on the host.
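Choosing the host side of the -p HOST:8888 forwarding can be automated. The sketch below is a hypothetical helper (not part of the repo) that uses Python's socket module to find the first free TCP port on localhost, starting at 8888, and builds the matching docker run argument list; the image tag tschm/islp_labs:v0.0.1 is the one used in this thread. A port can be taken between the check and the docker run call, so treat the result as a best guess.

```python
import socket
from typing import List

CONTAINER_PORT = 8888  # Jupyter listens on 8888 inside the container


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the TCP port can currently be bound on the host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start: int = CONTAINER_PORT, stop: int = CONTAINER_PORT + 100) -> int:
    """Return the first bindable port in [start, stop)."""
    for port in range(start, stop):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in range {start}-{stop - 1}")


def docker_run_args(image: str, host_port: int) -> List[str]:
    """Build a docker run command that forwards host_port to the container's 8888."""
    return ["docker", "run", "-p", f"{host_port}:{CONTAINER_PORT}", image]


if __name__ == "__main__":
    port = first_free_port()
    print(" ".join(docker_run_args("tschm/islp_labs:v0.0.1", port)))
```

With a mapping such as 10000:8888, the URL printed in the container log still refers to the container-side port 8888 (and possibly the container's hostname), so the browser needs http://localhost:10000 with the token from that log instead.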

tschm · 2023-08-21T06:34:23Z

> Overall, I think this can wait until we actually have several people who want an "official" docker image.
>
> By simply install pip -r requirements.txt this really does no more isolation of code than if I were to do:
> conda create -n my_islp_env python=3.11 -y
> conda activate my_islp_env
> pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would not use conda or recommend it :-) Where do you get jupyterlab from?

jonathan-taylor · 2023-08-21T06:39:11Z

> I would not use conda or recommend it :-) Where do you get jupyterlab from?

Well, conda is a community standard (even if it has flaws). I typically just use it to create a minimal environment, then pip for everything else. Could use mamba instead. Both are much lighter-weight than Docker.

Fair enough about jupyterlab. This is generally enough:

pip install jupyterlab

jonathan-taylor · 2023-08-21T06:40:34Z

> For the port, The docker image runs internally always on 8888. You can forward this port to a different port though. At the choice is up to yours, e.g. something like 3000:8888 is possible. Then the Jupyter server would run on port 3000 on the host.

Yep, docker --help pointed that out...

tschm · 2023-08-21T06:41:38Z

> Overall, I think this can wait until we actually have several people who want an "official" docker image.
>
> By simply install pip -r requirements.txt this really does no more isolation of code than if I were to do:
> conda create -n my_islp_env python=3.11 -y
> conda activate my_islp_env
> pip install -r https://raw.githubusercontent.com/intro-stat-learning/ISLP_labs/v2.1/requirements.txt

I would recommend keeping and documenting both options. The results of your pip install will not be invariant over time, as you can't control the dependencies of your dependencies. Also, some versions you point to may disappear. Once you bake them into an image, they are there for eternity. You may not need this level of robustness, though.
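One way to record "the dependencies of your dependencies" without Docker is to list every installed distribution with its exact version, similar to what pip freeze prints. The sketch below is an illustration using only importlib.metadata from the standard library; redirecting its output to a file such as requirements-lock.txt (a hypothetical name) would give a fully pinned list. It still cannot protect against a pinned version later disappearing from the package index, which is the point made above about baking versions into an image.

```python
from importlib.metadata import distributions
from typing import Dict, List


def frozen_requirements() -> List[str]:
    """Return sorted 'name==version' lines for every distribution in this environment."""
    pins: Dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs without metadata
            # Keep the first copy found on sys.path, as the import system would.
            pins.setdefault(name.lower(), f"{name}=={dist.version}")
    return [pins[key] for key in sorted(pins)]


if __name__ == "__main__":
    # e.g. python freeze_env.py > requirements-lock.txt  (hypothetical names)
    print("\n".join(frozen_requirements()))
```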

tschm · 2023-08-21T06:43:23Z

> Well, conda is a community standard (even if it has flaws). I typically just use it to create a minimal environment, then pip for everything else. Could use mamba instead. Both are much lighter weight than docker.
>
> Fair enough about jupyterlab. This is generally enough
> pip install jupyterlab

I wish the community standard were to set up a virtual environment in the first place, as you do. To me it seems people just pip install into their central Python env.
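Whether an interpreter is running inside a virtual environment, rather than the "central" Python install, can be checked from Python itself. A minimal sketch, assuming a venv/virtualenv-style environment, where sys.prefix points at the environment while sys.base_prefix still points at the installation it was created from. This check does not recognize conda environments: each conda environment is its own Python install, so sys.prefix equals sys.base_prefix there.

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Return True if the prefixes indicate a venv/virtualenv environment."""
    return prefix != base_prefix


if __name__ == "__main__":
    if not in_virtualenv():
        print("Not in a venv: pip install will modify this interpreter's own site-packages.")
```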

jonathan-taylor · 2023-08-21T06:43:48Z

OK, choosing -p 10000:8888 works for me. So this essentially opens the same thing as https://mybinder.org/v2/gh/intro-stat-learning/ISLP_labs/v2.1. On the whole, this is "effectively" capturing the Docker image that Binder builds. It has more packages due to the FROM docker.io/jupyter/scipy-notebook line. This could lead to conflicts if requirements.txt is not consistent with that image... Using Binder doesn't make that assumption.

tschm · 2023-08-21T06:44:36Z

You can also create an even bigger image that has both R and Python installed. See the Jupyter Docker Stacks documentation.

jonathan-taylor · 2023-08-21T06:44:53Z

But it seems a little heavy-handed to say the solution is to use docker instead of teaching them to manage a virtual environment....

jonathan-taylor · 2023-08-21T06:45:02Z

docker/Dockerfile

@@ -0,0 +1,10 @@
+FROM docker.io/jupyter/scipy-notebook:lab-4.0.4


This implicitly adds more requirements to requirements.txt that could clash.

Yes, see

Everything in jupyter/minimal-notebook and its ancestor images
altair, beautifulsoup4, bokeh, bottleneck, cloudpickle, conda-forge::blas=*=openblas, cython, dask, dill, h5py, jupyterlab-git, matplotlib-base, numba, numexpr, openpyxl, pandas, patsy, protobuf, pytables, scikit-image, scikit-learn, scipy, seaborn, sqlalchemy, statsmodels, sympy, widgetsnbextension, xlrd packages
ipympl and ipywidgets for interactive visualizations and plots in Python notebooks
Facets for visualizing machine learning datasets

You could replace the scipy-notebook with the minimal-notebook: a smaller image and no unwanted packages.
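To see whether packages that come with the base image clash with requirements.txt, one option is to run a small check inside the container after the build. The sketch below is hypothetical: it reads only plain name==version lines (other specifiers, extras and environment markers are skipped rather than parsed), compares versions as exact strings (so a wildcard pin such as 1.24.* is reported as a mismatch), and asks importlib.metadata for what is actually installed.

```python
import os
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List, Tuple


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Extract {name: version} from plain 'name==version' requirement lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line or ";" in line or "[" in line:
            continue  # not a plain pin: skip instead of guessing
        name, _, ver = line.partition("==")
        pins[name.strip()] = ver.strip()
    return pins


def mismatches(pins: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (name, pinned, installed) for every pin the environment does not match."""
    problems = []
    for name, pinned in sorted(pins.items()):
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = "not installed"
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems


if __name__ == "__main__":
    path = "requirements.txt"  # assumed to be in the working directory
    if os.path.exists(path):
        with open(path) as fh:
            for name, pinned, installed in mismatches(parse_pins(fh)):
                print(f"{name}: requirements.txt pins {pinned}, found {installed}")
    else:
        print(f"{path} not found")
```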

jonathan-taylor · 2023-08-21T06:46:28Z

I believe it. It's basically an Ubuntu server; you can do tons. Of course, linking to R introduces more dependencies.

tschm · 2023-08-21T06:46:50Z

> OK, by choosing -p 10000:8888 works for me. So, this is just opens essentially the same thing as this: https://mybinder.org/v2/gh/intro-stat-learning/ISLP_labs/v2.1 So, on the whole, this is "effectively" capturing the docker image that binder builds. It has more packages due to the FROM docker.io/jupyter/scipy-notebook line. This could lead to conflicts if requirements.txt is not current with that image... Using binder doesn't make that assumption.

You need to pin the version of the scipy-notebook image; I think I am using something like 4.0.4. For Binder, there are ways to build the image directly on Binder infrastructure and keep it in their cache. Not an expert, though... I think your image might be a bit too big for Binder. It takes ages to construct it from your requirements.
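Pinning the base image means the FROM line carries an explicit tag, such as lab-4.0.4 in docker.io/jupyter/scipy-notebook:lab-4.0.4 from this PR's Dockerfile, instead of falling back to the implicit :latest. A rough, hypothetical linter for that rule in Python: it handles the common FROM [--flag] image[:tag] [AS name] form and digest references (@sha256:...), but ignores build arguments and would also flag a later FROM that refers to an earlier build stage by name.

```python
from typing import List


def unpinned_bases(dockerfile_text: str) -> List[str]:
    """Return FROM image references that lack an explicit tag or use :latest."""
    problems = []
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=linux/amd64 to reach the image reference.
        image = next((p for p in parts[1:] if not p.startswith("--")), "")
        if not image or image == "scratch" or "@" in image:
            continue  # nothing to check, the empty base, or pinned by digest
        # Only look for ':' after the last '/', so a registry port such as
        # localhost:5000/img is not mistaken for a tag.
        last = image.rsplit("/", 1)[-1]
        tag = last.partition(":")[2]
        if tag in ("", "latest"):
            problems.append(image)
    return problems
```

Run on the Dockerfile from this PR, which starts with FROM docker.io/jupyter/scipy-notebook:lab-4.0.4, the helper should return an empty list.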

tschm · 2023-08-21T06:48:46Z

> But it seems a little heavy-handed to say the solution is to use docker instead of teaching them to manage a virtual environment....

The virtual environment approach is not that easy. It exposes you to all sorts of OS dependency problems.

jonathan-taylor · 2023-08-21T06:50:34Z

Sigh. Binder is not something we "support". It's a service that people can try. It has limited resources and its own way of managing them. And yes, a fresh build takes some time. Docker images are cached on Binder, and if you read the documentation, it indicates that repos that get a lot of traffic eventually have quicker startup times. My point was that making this Docker image available would give users the same experience as launching Binder, but it can be faster.

tschm · 2023-08-21T06:54:40Z

> OK, by choosing -p 10000:8888 works for me. So, this is just opens essentially the same thing as this: https://mybinder.org/v2/gh/intro-stat-learning/ISLP_labs/v2.1 So, on the whole, this is "effectively" capturing the docker image that binder builds. It has more packages due to the FROM docker.io/jupyter/scipy-notebook line. This could lead to conflicts if requirements.txt is not current with that image... Using binder doesn't make that assumption.

I think the order is wrong :-) You should build the image, and Binder should capture it :-) Binder is somewhat tricky about being pointed to Docker images.

tschm · 2023-08-21T07:11:05Z

I have updated the underlying image; see https://hub.docker.com/r/tschm/islp_labs/tags. The resulting image is now smaller but still close to 3 GB... Let's check the files copied into the image.

tschm · 2023-08-21T17:46:54Z

I have tried to address the somewhat large size of the resulting images. However, it seems that's a direct consequence of installing the NVIDIA packages. I did an analysis with SLIM.ai, and the constructed Python environment takes several GB. I kept the Dockerfile fairly standard and readable. When I build the image locally, it reports about 2.1 GB. After the round trip via Docker Hub, the same image is 6 GB after a pull? Weird...

tschm · 2023-08-21T20:00:09Z

You have the merge power. I am not sure you are doing yourself a favor with the manual release of the Docker image. The pushed image will then have no strong link to a tag (if I understand the manual workflow correctly)...

jonathan-taylor · 2023-08-21T22:34:58Z

Manual dispatch works fine: jetaylor74/islp_labs should have v2.1.1 and latest.

I tried to get it to work on push to stable, but it's not getting triggered. Will eventually sort it out.
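With a workflow_dispatch trigger, a run can also be started through the GitHub REST API endpoint POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches, whose JSON body names the ref (branch or tag) to run on. Dispatching on a release tag may be one way to keep the link between the pushed image and a tag raised above, provided the workflow file exists at that tag and the workflow uses the ref to tag the image. The sketch below only builds the request with urllib; sending it needs a token allowed to run workflows. The repository and docker.yml come from this thread, while the ref v2.1.1 (assumed here to exist as a git tag) and the token are placeholders.

```python
import json
import urllib.request

API = "https://api.github.com"


def dispatch_request(owner: str, repo: str, workflow: str, ref: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a workflow_dispatch request for the GitHub REST API."""
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow}/dispatches"
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    req = dispatch_request("intro-stat-learning", "ISLP_labs", "docker.yml", "v2.1.1", "<TOKEN>")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it; GitHub answers 204 No Content on success.
```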

tschm added 9 commits August 20, 2023 20:50

Create Dockerfile

09d888f

Create .dockerignore

9ca33a6

Create .dockerignore

c2c9f40

Delete .dockerignore

28f757e

Update Dockerfile

166a9d8

Update .dockerignore

3fe78e1

Update Dockerfile

714a769

Create docker.yml

3a94ea5

Merge pull request #1 from intro-stat-learning/main

b2d3c70

V2.1rc (#4)

jonathan-taylor reviewed Aug 21, 2023

.github/workflows/docker.yml (outdated, resolved)

jonathan-taylor reviewed Aug 21, 2023

Update Dockerfile

10fa653

tschm and others added 7 commits August 21, 2023 00:13

Update .dockerignore

07d4385

Update Dockerfile

7d8563d

Using manual dispatch, updated user name for docker

65ea5a9

remove the push event

c69d0cc

use name for job

d3fba0e

using previous job name

d63ad71

Dockerfile

44d8928

jonathan-taylor and others added 2 commits August 21, 2023 11:58

caps for ISLP

9ab76fa

Merge pull request #2 from jonathan-taylor/tschm_main

abe129c

Change to manual dispatch, where images will get stored

jonathan-taylor reviewed Aug 21, 2023

.dockerignore (outdated, resolved)

Update .dockerignore

bc4f08a

jonathan-taylor merged commit 353df68 into intro-stat-learning:main Aug 21, 2023
