[MRG] Start reusing existing docker images if content hasn't changed #461

Merged (2 commits), Dec 4, 2018


@betatim
Member

betatim commented Nov 4, 2018

Content providers can specify a "content ID" which is used to identify
versions of the content. The ID is used in the docker image name and if
we find an existing image for a given content ID the build step is
skipped.

(This is also physics @ctb 😀)

Closes #452

@betatim
Member Author

betatim commented Nov 4, 2018

Should we switch to using tags instead of putting this in the image name?

@betatim changed the title from "[WIP] Start reusing existing docker images if content hasn't changed" to "[MRG] Start reusing existing docker images if content hasn't changed" on Nov 5, 2018
@psychemedia

So this makes me wonder...

Could repo2docker do something like:

  • rebuild an "environment" image from a repo if build related files have changed
  • use a pre-existing image if only "content" files (e.g. notebooks) have changed, and mount those into a notebooks volume shared with the environment image?

@betatim
Member Author

betatim commented Nov 6, 2018

> Could repo2docker do something like:

That idea is being explored in #410. The tricky part is figuring out what is a "content" file and what is an "environment" file. For example, if you use a requirements.txt as the environment file, any file in the repo could still influence what is installed, because you could run pip install -e . in any of the subdirectories of the repo.
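
To make the difficulty concrete, here is a minimal sketch (not repo2docker code) of a filename-based classifier and why it falls short. The `ENVIRONMENT_FILES` set and both function names are hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical set of file names a naive classifier would treat as
# "environment" files. This is an assumption, not repo2docker's actual list.
ENVIRONMENT_FILES = {"requirements.txt", "environment.yml", "postBuild"}


def naive_is_environment_file(path: str) -> bool:
    """Classify a file purely by its name."""
    return PurePosixPath(path).name in ENVIRONMENT_FILES


def editable_install_dirs(requirements_text: str) -> List[str]:
    """Return local directories named by `-e` / `--editable` lines.

    Every file under these directories can change what gets installed,
    even though the naive classifier calls them "content".
    """
    dirs = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if line.startswith("-e ") or line.startswith("--editable "):
            target = line.split(None, 1)[1].strip()
            if "://" not in target:  # skip VCS URLs, keep local paths
                dirs.append(target)
    return dirs
```

With a requirements.txt containing `-e ./mylib`, a change to `mylib/core.py` alters the installed environment, yet `naive_is_environment_file("mylib/core.py")` returns `False`.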

@choldgraf
Member

Am I correct that this is what the PR is implementing:

  • Lets people specify a kwarg "content_id" that is meant to be a unique identifier for content that the user manually specifies
  • Adds that ID to the built docker image registry somehow
  • If a new build is triggered and the content ID is the same, then it just re-uses an old image?

(maybe more generally, some more context about what this PR implements, why, what the big upsides are, etc would be helpful for me to understand what's going on!)

@betatim
Member Author

betatim commented Nov 11, 2018

Right now if you run repo2docker https://github.com/awesome-org/super-repo twice (without the remote repo actually changing) you have to build the image twice. This PR allows the Git content provider to return a "content ID" (in this case the SHA-1 of the commit) which repo2docker uses to check if there already is an image for this. If yes it will skip building and directly launch the image.

It is like the behaviour of a BinderHub that won't rebuild a repo unless there is a new commit.

The user doesn't specify anything. It is up to a content provider to return a "content ID". The local directory content provider doesn't do this so we always rebuild the image (this is fine because if nothing has changed we profit from the caching that docker does).
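
A rough sketch of that division of responsibility, assuming a provider exposes its identifier as a `content_id` property. The class names, the constructor argument, and `should_try_reuse` are illustrative assumptions rather than the code in this PR.

```python
from typing import Optional


class ContentProvider:
    """Hypothetical base class for anything that fetches repo contents."""

    @property
    def content_id(self) -> Optional[str]:
        # None means "no stable identifier is available, so always rebuild".
        return None


class GitProvider(ContentProvider):
    """Hypothetical Git provider that reports the checked-out commit."""

    def __init__(self, resolved_sha1: str):
        # Assumption: the SHA-1 was resolved while fetching the repo.
        self._sha1 = resolved_sha1

    @property
    def content_id(self) -> Optional[str]:
        return self._sha1


class LocalProvider(ContentProvider):
    """Local directories have no content ID and inherit the None default."""


def should_try_reuse(provider: ContentProvider) -> bool:
    """Only look for an existing image when the provider has a content ID."""
    return provider.content_id is not None
```

The caller never passes an ID in; it only asks the provider whether one exists.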

So far we add a timestamp to the end of the docker image name so we have something like r2d-https-blahblahblah-124235 as image name. If we rebuild the same repo 10 seconds later we end up with r2d-https-blahblahblah-124245.

If we merge this PR we will have names like r2d-https-blahblahblah-$CONTENTID which lets us check if an image with that name already exists and if yes skip building it.
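
The difference between the two naming schemes can be sketched as follows. `image_name` and `needs_build` are hypothetical helpers, and the set of existing image names stands in for whatever the docker daemon would report.

```python
import time
from typing import Iterable, Optional


def image_name(base: str,
               content_id: Optional[str] = None,
               now: Optional[float] = None) -> str:
    """Build an image name from a base plus a content ID or a timestamp."""
    if content_id is not None:
        # Same content always maps to the same name.
        return f"{base}-{content_id}"
    # Fallback: a timestamp suffix, which differs on every run.
    timestamp = int(time.time() if now is None else now)
    return f"{base}-{timestamp}"


def needs_build(name: str, existing_images: Iterable[str]) -> bool:
    """Skip the build when an image with this exact name already exists."""
    return name not in set(existing_images)


# Timestamp names never match an earlier build of the same repo.
first = image_name("r2d-https-blahblahblah", now=124235)
second = image_name("r2d-https-blahblahblah", now=124245)

# Content-ID names are stable, so the second run can find the first image.
stable = image_name("r2d-https-blahblahblah", content_id="8c9f08c")
```

Here `needs_build(second, [first])` is `True`, while `needs_build(stable, [stable])` is `False`, which is the case where the build step is skipped.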

@yuvipanda
Collaborator

@betatim we put this in tags in binderhub, right? If so, maybe we should do that here too. But we can explore that in a different PR.

@betatim
Member Author

betatim commented Dec 4, 2018

Nods, I thought about using tags but then decided not to because the timestamp (current behaviour) isn't in a tag either. I'll change to using tags and write some tests.

@yuvipanda
Collaborator

@betatim cool. I've merged this now, can move to tags after :)

@yuvipanda yuvipanda merged commit 8c9f08c into jupyterhub:master Dec 4, 2018
@betatim betatim deleted the caching-builds branch December 5, 2018 06:45
markmo pushed a commit to markmo/repo2docker that referenced this pull request Jan 22, 2021
[MRG] Start reusing existing docker images if content hasn't changed
5 participants