Proposal for versioning, immutable historical builds #12

Closed
parente opened this issue Aug 20, 2015 · 12 comments

Comments

@parente
Member

parente commented Aug 20, 2015

Current Setup

  • Docker Hub (DH) automatically builds git master and tags it with latest
  • DH automatically builds version branches like 4.0.x and tags them with 4.0
  • We only ever roll the latest master branch and latest version branch forward with fixes.

Problems

  • We don't tag images per git commit, so users cannot easily roll back to prior Docker image versions. This matters when major libraries change (e.g., Spark). Some users want the latest version, which should go into master, while others want to stay on the current version.
  • A change on any branch in the git repo causes a build storm on Docker Hub. (See issue #15, "Docker Hub automated builds and undesirable rebuilds".) All tags jump to the latest build even when nothing changed in the build definition.

Proposed Improvement

  • Stop relying on Docker Hub automated builds. (We need more control.)
  • Stop relying on the 3.2.x and 4.0.x branches and branches in general. (Too coarse grained.)
  • Adopt the mantra that when a PR wants to bump a library version in one of the stacks, we take it without question.
  • Set up a build VM that maintainers can access to easily pull from this repo and build images. (Manually for now. We can automate later.)
  • Beef up the Makefile so that a make latest (or some such) does the following:
    • Builds the necessary stacks (change in minimal = build everything, change in scipy = just that stack, etc.) from master HEAD
    • Tags the latest built images (even if they were not rebuilt just now) with latest
    • Tags those same images with the current git commit SHA
    • Pushes all of those tags and new builds to Docker Hub (Note: The client will be smart about only sending deltas / tag metadata for images that did not change.)
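
The steps in the list above can be sketched in shell roughly as follows. This is a minimal sketch and not the Makefile from the attached PR: the function names, the `OWNER`, `STACKS`, and `DOCKER` variables, the two-entry stack list, and the assumption that each stack lives in a directory named after it are all hypothetical. Only `docker build`, `docker tag`, and `docker push` are real commands.

```shell
#!/usr/bin/env bash
# Sketch of a `make latest`-style flow. Every name here is an illustrative assumption.

OWNER="${OWNER:-jupyter}"            # Docker Hub organization used in this thread's examples
STACKS="${STACKS:-minimal scipy}"    # hypothetical stack list; minimal is assumed to be the base
DOCKER="${DOCKER:-docker}"           # set DOCKER="echo docker" for a dry run

# Print the stacks that need a rebuild, given the stacks whose build
# definitions changed. A change in the base stack rebuilds everything.
stacks_to_build() {
  local changed
  for changed in "$@"; do
    if [ "$changed" = "minimal" ]; then
      echo "$STACKS"
      return 0
    fi
  done
  echo "$*"
}

# Build each named stack and tag the result as latest.
# Assumes the build context is a directory named after the stack.
build_stacks() {
  local stack
  for stack in "$@"; do
    $DOCKER build -t "$OWNER/$stack:latest" "./$stack"
  done
}

# Tag the current latest image of every stack (rebuilt or not) with the
# given git SHA, then push both tags.
tag_and_push() {
  local sha="$1" stack
  for stack in $STACKS; do
    $DOCKER tag "$OWNER/$stack:latest" "$OWNER/$stack:$sha"
    $DOCKER push "$OWNER/$stack:latest"
    $DOCKER push "$OWNER/$stack:$sha"
  done
}

# Example use from a checkout of master:
#   sha="$(git rev-parse --short HEAD)"
#   build_stacks $(stacks_to_build scipy)
#   tag_and_push "$sha"
```

Routing every Docker call through `$DOCKER` makes it possible to print the commands first and inspect them before anything is pushed.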

Net Result for End Users

For people that want to walk-up-and-use the latest and greatest:

docker run jupyter/some-stack-name

For people that want to depend on a specific container image configuration tied to some point in time in the docker-stacks git history:

docker run jupyter/some-stack-name:<some-git-sha>

where GitHub / git makes the contents of that particular tagged image visible to the user (i.e., find the SHA in git and look at the Dockerfiles).
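
As a rough illustration of going from a tagged image back to its definition, the helpers below pull the tag out of an image reference and print a `git show` command for the matching Dockerfile. The function names are hypothetical, and the sketch assumes the tag is a git SHA and that each stack's Dockerfile sits in a directory named after the image.

```shell
# Print the tag portion of an image reference such as
# jupyter/some-stack-name:9bd33dc. Prints "latest" when no tag is given,
# which is the tag Docker assumes by default.
image_tag() {
  local ref="$1"
  local name="${ref##*/}"      # drop any registry/owner prefix
  case "$name" in
    *:*) echo "${name##*:}" ;;
    *)   echo "latest" ;;
  esac
}

# Print the git command that shows the stack's Dockerfile at the commit
# named by the image tag. The <stack>/Dockerfile layout is an assumption.
show_dockerfile_cmd() {
  local ref="$1"
  local name="${ref##*/}"
  local stack="${name%%:*}"    # image name without the tag
  echo "git show $(image_tag "$ref"):$stack/Dockerfile"
}
```

Running the printed command inside a clone of the docker-stacks repo would display the Dockerfile as it stood at that commit.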

@parente changed the title from "How to reflect other package metadata within an image" to "Proposal for versioning, immutable historical builds" on Oct 14, 2015
@parente
Member Author

parente commented Oct 14, 2015

/cc @rgbkrk @minrk @kbroughton

@parente
Member Author

parente commented Oct 14, 2015

I should add that in order to effect this change, we'll need to delete the existing automated rebuild repos and recreate them as manual repos from pushed images. There will be a window during this process in which the images are not available. There's just no way to manually push to an automated build repo or convert it to a manual repo.

Maybe we could ask Docker support, but I doubt it'll be possible.

@rgbkrk
Member

rgbkrk commented Oct 15, 2015

We've definitely gone through this before, when I moved jupyter/demo over to manual builds. Momentary downtime for this is going to happen. It's ok.

@rgbkrk
Member

rgbkrk commented Oct 15, 2015

I'm definitely a fan of requiring people to pin to the (now available) SHA hashes instead of us maintaining the branches. That's a lot of work otherwise.

@parente
Member Author

parente commented Oct 15, 2015

The problem with Docker SHA is that you don't really know the content without pulling the image. I'm suggesting a bit of make automation that tags the Docker image with the git sha before push so that your view into the image is simply the contents of the git repo at that git sha.

@rgbkrk
Member

rgbkrk commented Oct 15, 2015

Ohhhhhh

@minrk
Member

minrk commented Oct 15, 2015

I think that makes sense. I guess there's no room for traditional version tags, since the images contain so many different packages, is there? Can you mock up what the buildbot would look like?

@parente
Member Author

parente commented Oct 15, 2015

I think we could still do additional version tags manually at key points, but what those points are and how to capture the version has eluded me. When the primary process version changes? When a major library changes? With tags like notebook_4.0.1_spark_1.5.0? The good part of the new scheme is that if we ever figure that out or want to tag specific images, we're free to do so at will. The Docker Hub automated build precluded it.

> Can you mock up what the buildbot would look like?

The attached PR has the simple make steps that any CI system should be able to run. As a next step after trying it manually for a bit to make sure there are no surprises, we can try to do it via Travis, or Circle, or our own Jenkins, or ...

@parente
Member Author

parente commented Oct 16, 2015

The new Makefile is in. I'm running the first build using it in a tmux session on the VM documented in the README. Since there were Debian fixes, it's a pretty big rebuild. The box probably needs a bit of disk performance tuning too. Will keep an eye on it.

In the meantime, all the original images are still available on Docker Hub as they were before. So no "outage" while we get the latest and greatest built and pushed.

@parente
Member Author

parente commented Oct 17, 2015

Built master at SHA 9bd33dc and pushed all tags to Docker Hub. The disk buffering for the image layers was really slow on the VM for some reason compared to other VMs in the same data center. I'll dig into it over time.

@parente closed this as completed on Oct 17, 2015
@parente
Member Author

parente commented Oct 22, 2015

Update on slowness: moby/moby#15493

It appears that Docker 1.9.0 includes this PR to address the problem, which others have seen as well. In the meantime, we deal with it.

@dnk8n

dnk8n commented Oct 28, 2018

I think a maintained document listing the versions would be all that is required (it could even be a JSON or bash variable manifest that the Dockerfile also references). That way, for the less complicated version bumps, only one file changes. It would also be a lot easier to find the right tag (although scouring the source control should not be a requirement).

Since docker images and git commits can be referenced by multiple tags, a git tag like datascience-1.0.1 and docker tag like 1.0.1 could be applied every time the version manifest changes.

A dev then could look in one place at the version manifest and the tags would be clearly visible. Creating some documentation somewhere central would be trivial.

I am happy to help but I guess this might not be high on your priority list and something like this would require some buy in from a lot of people.

The nice thing is that the current system can remain in place. Extra tags wouldn't break anything for anyone downstream.

At least adding special tags for programming language changes would be ideal. For example, move the tag python3.6.6 to whichever the latest docker image is with that version of python.
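
A bash variable manifest along these lines might look like the sketch below. It is illustrative only: the variable and function names are hypothetical, and the values are just the examples mentioned in this comment. The idea is that a version bump touches only the variables, and the git and Docker tag names are derived from them.

```shell
# Hypothetical version manifest; in practice this could live in its own
# file that both the build script and the Dockerfile build args read.
STACK_NAME="datascience"
STACK_VERSION="1.0.1"
PYTHON_VERSION="3.6.6"

# Name of the git tag to apply when the manifest changes.
git_tag_name() {
  echo "${STACK_NAME}-${STACK_VERSION}"
}

# Docker tags to apply to the newest image, one per line:
# the stack version plus a movable language-version tag.
docker_tag_names() {
  echo "$STACK_VERSION"
  echo "python${PYTHON_VERSION}"
}
```

Because these tags are added on top of the existing ones, images already pinned by git SHA would keep working unchanged.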

Just some ideas from a downstream user perspective. The system is usable now that I know how it works though. With a bit of digging I was able to find the appropriate image.

rochaporto pushed a commit to rochaporto/docker-stacks that referenced this issue Jan 23, 2019

4 participants