
Builds on DockerHub #2

Closed · parente opened this issue Jul 27, 2015 · 16 comments

@parente (Member) commented Jul 27, 2015

I'd like to get stack builds cranking on Docker Hub as soon as there are a few more images worth building. (Note to self: hurry up with more images. :) To do so, though, we need to answer a few questions.

  1. How should builds from Docker image definitions in this git repo surface on Docker Hub? Take the minimal-notebook and r-notebook stacks as examples. Do they appear as jupyter/minimal-notebook and jupyter/r-notebook on Docker Hub? The naming is consistent with everything else in the Jupyter project, but will users be able to tell jupyter/minimal-notebook apart from jupyter/notebook (setup for dev / test) or jupyter/minimal-demo (base image for tmpnb.org)? Do we put them in a new Docker repository like docker-stacks/minimal-notebook, docker-stacks/r-notebook, etc.? Can we even use a different Docker repository name, or is it linked to the GitHub user name?
  2. When should builds kick off? Automatically? Manually? A single GitHub repo for multiple images is not how Docker Hub wants to work. Docker Hub builds are either automatic on push or manual. If we set all the images to automatic, every image rebuilds immediately when there's a push to the repo, even if unnecessary. Worse, if an image starts rebuilding before its parent image has built, it's possible to wind up with a failed build or an incorrect image. We can configure linked builds on Docker Hub to trigger child container rebuilds after the parent build succeeds, but since the child lives in the same git repo as the parent, there will still be a wasted build. (Maybe we don't care? One way to script the manual option is sketched after this list.)
  3. Does Docker tagging help us with anything? We could tag images by the version of the Jupyter project contained therein. But that's a myopic reflection when there's Python 2.x, Python 3.x, Scala 2.x, Spark 1.4.x, Mesos x.x, etc. all in a container and potentially varying from build to build. Do we just stick with latest?
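
For reference, Docker Hub exposes a per-repository remote build trigger that could script the "manual" option in (2). A rough sketch, assuming the trigger URL format Docker Hub documented at the time, with the token supplied via an environment variable:

    # Hypothetical: fire a build for one repository via its build trigger URL.
    # TRIGGER_TOKEN comes from the repository's Build Triggers settings page.
    curl -X POST "https://registry.hub.docker.com/u/jupyter/minimal-notebook/trigger/${TRIGGER_TOKEN}/"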
@DrPaulBrewer

These are my opinions as a Docker and IPython user; I am not an official member of your project.

Re: (3) Does Docker tagging help us with anything?

As a user, it would help if I could rely upon a certain numbered version as being unchanging, i.e. a designated release, so I can use it to reliably build functionality that will continue to work for processes or customers that need a frozen environment.

:latest should be for Jupyter devs, testers, and bleeding-edgers who can afford a malfunction or are willing to investigate one.

As to the myriad versions of everything else, the choices seem to be: (a) aggregation into one container, (b) joining various containers, or (c) a customizable build script left in the container.

(a) Aggregate into one container. This is good for end-user ease-of-use, until they decide they want something different from the stock arrangement. One issue is that this style tends to produce huge containers that take forever to download. Some of the spark/hadoop docker containers released by others already suffer from being too big and having too many layers. One solution would be to use a suitable base image that everyone using docker already has, or should have, but as I write this, the ubuntu:latest and debian:latest images have python3 and no python2, while centos:latest has python2 and no python3... not to mention the other lesser-used components.

(b) Split across containers. Here an environment would be built from several docker containers linked to the jupyter container, perhaps as docker volumes. This is often done for tcp/ip linking of a database running in one container to an app, like a web front end, running in another; the mysql/maria containers provide examples, but that doesn't seem to be the primary problem faced here. Instead, the problem here seems to be "what lang/environment is this jupyter for? Can I adjust that without downloading the entirety of jupyter again?" For those cases a -v volume option exists that allows mounting one container's filesystem inside another container. This configuration would seem to be harder on developers and end users, but perhaps more flexible for creating various configurations. I'm unaware of a way for a bunch of containers to each dump executables into /usr/bin of a single container; in the absence of that functionality, some clever planning and setting of PATH, PYTHONPATH, etc. would need to be done (see the sketch below).
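
A rough sketch of that volume-based wiring, using the data-volume-container pattern; the myorg/spark-env image name and the /opt/spark path are assumptions for illustration:

    # Hypothetical: a data-only container exporting a Spark install as a volume.
    docker create --name spark-env -v /opt/spark myorg/spark-env

    # Mount that volume into the notebook container and point the environment at it.
    docker run -it --volumes-from spark-env \
        -e SPARK_HOME=/opt/spark \
        jupyter/minimal-notebook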

(c) Customizable build script left in the container. The idea is to leave a script in the container that uses root privilege along with apt, yum, pip, and similar tools to customize the base environment into something that works, and then to run the resulting environment as an ordinary user.
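
A minimal sketch of what such a script might look like; the script name, packages, user name, and start command are assumptions, not anything shipped in these images:

    #!/bin/bash
    # customize.sh -- hypothetical option (c) script: run once with root
    # privilege to tailor the base environment, then start the notebook
    # server as an ordinary user.
    set -e

    apt-get update
    apt-get install -y --no-install-recommends libpq-dev  # example system package
    pip install psycopg2                                  # example Python package

    # Hand off to an unprivileged user for the actual server process.
    exec su jovyan -c "ipython notebook --ip=0.0.0.0"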

As a potential end user, (a) and (c) currently sound best to me.

@parente (Member Author) commented Jul 28, 2015

I believe we're on track for (c), but please check this assumption.

The initial Docker image definitions in this repo install conda and pip in an environment so that the unprivileged jovyan container user can install new packages. The minimal-notebook definition, upon which the others are based, accepts a GRANT_SUDO env var on start to give that jovyan user the ability to run apt-get if the user is in a trusted environment (e.g., his/her own VM). In addition, once these images are published to Docker Hub, end users can script the creation of new images to install libs that will not be lost across container destroy/create cycles, à la:

    # Dockerfile for a custom image derived from the published stack
    FROM jupyter/python-notebook-stack
    USER $NB_USER
    RUN conda install <some additional lib for Python 3 that you want in your custom image>

and then building it:

    docker build -t parente/my-python-notebook-stack .
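
For the GRANT_SUDO path, the run command would look roughly like this (a sketch; the exact flags the image's start script expects may differ):

    # Hypothetical: opt the jovyan user into sudo in a trusted environment.
    docker run -it -p 8888:8888 -e GRANT_SUDO=yes jupyter/minimal-notebook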

@rgbkrk (Member) commented Jul 29, 2015

Most of the issue we had with automated builds was how permissions on Docker Hub worked. We weren't getting deterministic builds for jupyter/demo, permissions-wise, and sometimes our builds took too long on the Hub, so we started building them manually instead.

I'm a bigger fan of automated, trusted (as much as they can be) builds, triggered via the normal GitHub -> Docker Hub webhook.

@parente (Member Author) commented Aug 12, 2015

I'd like to give the automated builds another shot. I started by getting minimal-notebook working properly under my personal namespace on Docker Hub. It built without a hitch. I'll try scipy-notebook too (hacked to point to parente/*) and ensure the automation properly rebuilds scipy-notebook when a new build of minimal finishes.

If all that works, it would be good to get the builds going under jupyter/*.

@parente (Member Author) commented Aug 14, 2015

I set up parente/minimal-notebook and parente/pyspark-notebook, the latter triggered by the former. I made changes to the minimal notebook in my fork and confirmed the latter rebuilt. I've been using the resulting pyspark image locally all day without permission-related or other problems.

I think we should give the automated builds a shot again and only fall back on a bespoke solution if necessary.

Under what org should we build the images on Docker Hub? Should we create a new jupyterstacks org to avoid naming conflicts with existing images? Or would you like them all under jupyter?

@minrk (Member) commented Aug 14, 2015

I think they can just be under jupyter. @rgbkrk?

@rgbkrk (Member) commented Aug 15, 2015

I think they can be under jupyter.

@parente (Member Author) commented Aug 16, 2015

Works for me. @rgbkrk, can you grant me permissions in the jupyter org to set it up?

@parente (Member Author) commented Aug 19, 2015

I set up the automated builds for "latest" versions of the stacks we have. Everything went smoothly except r-notebook, which has the new behavior (on Docker Hub and locally for me) of hanging while conda solves package specs. I'll look into debugging it locally.

@parente (Member Author) commented Aug 19, 2015

The r-notebook problem was related to r-devtools. Bumping it to 1.8 and adjusting for the newer R 3.2 release and incompatible packages solved the problem. It's now on Docker Hub. (I'm still not clear why conda was struggling with the older versions, but moving on...)
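
The fix presumably amounts to pins along these lines; a sketch only, since the channel flag and exact specs live in the Dockerfile:

    # Hypothetical conda pins (the -c r channel and version specs are assumptions).
    conda install --yes -c r 'r-base=3.2*' 'r-devtools=1.8*'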

Along the way I noticed it was installing a second copy of IPy. PR #8 should fix it.

@rgbkrk The Docker Hub webhooks for this git repo don't seem to be enabled. I don't have admin permissions on the repo to check. (And I'm a bit lost on where I would configure a Docker Hub organization to enable them for a GitHub organization.) Did you or someone else manage to set them up for ipython/docker-images originally?

@rgbkrk (Member) commented Aug 19, 2015

Let me see what I can do there too.

@rgbkrk (Member) commented Aug 19, 2015

Want to try setting up hooks now? You now have admin access on this repo instead of just write access.

@parente (Member Author) commented Aug 19, 2015

I'll give it a shot later today. Thanks.

@parente (Member Author) commented Aug 19, 2015

I think it's now enabled, but we'll have to wait for the next git push / PR to find out. If it doesn't work, it's honestly not too bad at the moment, with the few stacks we have, to go trigger the Docker Hub builds manually; it gives better control, too. As it stands, any merge touching any stack folder is going to trigger all of them to rebuild, and there's nothing we can do about that while all the stacks are in one repo.

At any rate, all the "latest" images are now pushed to Docker Hub using notebook 3.2.1. I plan to now create a 3.2.1 branch, update all the descendants of minimal-notebook to build FROM jupyter/minimal-notebook:3.2.1, and get those tagged builds going too. If that all works, we can move on to issue #6 once conda has a 4.0 build.

@parente (Member Author) commented Aug 20, 2015

Created branch 3.2.x (following the pattern of jupyter/notebook branches). Set up builds for that branch on Docker Hub. Manually triggered the build of jupyter/minimal-notebook:3.2. (Notice no patch number, since the conda command allows the patch to vary.) Updated all Dockerfiles in the branch to build FROM that tagged 3.2 image. Pushed that minor change to the 3.2.x branch and all images started rebuilding automatically. They all finished successfully.
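
A sketch of the two pieces of that scheme; the conda package name is an assumption, and sed stands in for the manual Dockerfile edits:

    # Hypothetical: in minimal-notebook, pin only the minor version so conda
    # may pick any 3.2.x patch release.
    conda install --yes 'ipython-notebook=3.2*'

    # Hypothetical: point each descendant stack at the tagged base image.
    sed -i 's|^FROM jupyter/minimal-notebook.*|FROM jupyter/minimal-notebook:3.2|' */Dockerfile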

@parente (Member Author) commented Aug 20, 2015

(1) and (2) from the original description are done. (3) is done enough (tags reflect the main process version). I opened issue #12 for the finer points of how to capture versions of installed libraries in a stable manner.

parente closed this as completed Aug 20, 2015
rochaporto pushed a commit to rochaporto/docker-stacks that referenced this issue Jan 23, 2019
finalspy pushed a commit to finalspy/docker-stacks that referenced this issue Jan 31, 2020: "try use debian stretch slim in place of ubuntu for pyspark image"