Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying images to reuse cache from #478

Merged
merged 9 commits into from Dec 11, 2018

Conversation

@yuvipanda
Copy link
Collaborator

yuvipanda commented Nov 29, 2018

This lets us explicitly specify images that repo2docker
should try to re-use cached layers from. Docker normally only
looks for layers from images that were built locally - if
we want it to look in images that were pulled from a registry,
we need to specify it here.

This isn't a problem on binder, since we don't pull images from
elsewhere yet. However, this is a problem for hubploy, and
would make the life of yuvipanda/hubploy#4
much easier.

@yuvipanda yuvipanda changed the title Allow specifying images to reuse cache from [WIP] Allow specifying images to reuse cache from Nov 29, 2018
@yuvipanda

This comment has been minimized.

Copy link
Collaborator Author

yuvipanda commented Nov 29, 2018

I am not sure at all how to test this.

@betatim

This comment has been minimized.

Copy link
Member

betatim commented Dec 1, 2018

I like it! Two ponderings:

  • should we stick to using exactly the same command-line argument name as docker build uses instead of changing the name slightly?
  • can we offer some (automatic) behaviour to the unsuspecting user that "does the right thing" and speeds up their local builds? See last paragraph for some thoughts

re: testing, will docker inspect tell you if a layer was built or taken from the cache? Alternatively I think a layer that is shared between two images should have the same ID in both so if we inspect the cache-from image and the newly built one we should be able to find shared layers.


We have discussed building and pushing images that contain the layers before copying in the repo contents for each build pack to docker hub. This would speed up the 100% shared part of the builds, for the user who has never built that buildpack before locally. If we do this then we could have a --cache-from=auto setting that adds all those images to the list. Would we end up pulling all images or only those that are used?

There is also #310 for which this PR could be useful.

@yuvipanda

This comment has been minimized.

Copy link
Collaborator Author

yuvipanda commented Dec 5, 2018

I like it! Two ponderings:

* should we stick to using exactly the same command-line argument name as `docker build` uses instead of changing the name slightly?

Yeah, that sounds like a good idea.

* can we offer some (automatic) behaviour to the unsuspecting user that "does the right thing" and speeds up their local builds? See last paragraph for some thoughts

This already happens for images already built locally. --cache-from is only needed to re-use cache from images pulled from an external repository.

re: testing, will docker inspect tell you if a layer was built or taken from the cache? Alternatively I think a layer that is shared between two images should have the same ID in both so if we inspect the cache-from image and the newly built one we should be able to find shared layers.

Since cache is used both for images already built locally and images pulled from a remote registry, there's no easy way to find out if this was effective or not.

We have discussed building and pushing images that contain the layers before copying in the repo contents for each build pack to docker hub. This would speed up the 100% shared part of the builds, for the user who has never built that buildpack before locally. If we do this then we could have a --cache-from=auto setting that adds all those images to the list. Would we end up pulling all images or only those that are used?

This seems like a larger discussion that we should have in a separate issue.

There is also #310 for which this PR could be useful.

yuvipanda added 4 commits Nov 29, 2018
This lets us explicitly specify images that repo2docker
should try to re-use cached layers from. Docker normally only
looks for layers from images that were *built* locally - if
we want it to look in images that were *pulled* from a registry,
we need to specify it here.
More accurate name in our context
@yuvipanda yuvipanda force-pushed the yuvipanda:cache-from branch from b8d53e5 to 89665ea Dec 5, 2018
@yuvipanda

This comment has been minimized.

Copy link
Collaborator Author

yuvipanda commented Dec 6, 2018

The way to test this would be:

  1. Set up a local docker image registry
  2. Build and push an image to it
  3. Delete all locally built images (so we can be sure there isn't any local cache reuse)
  4. Pull the image we just pushed
  5. Rebuild the image, and check the logs for messages about cache-reuse / use docker inspect

Step (3) might drastically slow down other tests if run in the same place as them, so we gotta isolate this one.

@yuvipanda

This comment has been minimized.

Copy link
Collaborator Author

yuvipanda commented Dec 6, 2018

Maybe from #482 (comment), another option here is to mock the docker client. I think that'd be much easier, actually! The integration test would be testing that docker itself works fine, while we only care wether we pass through the param properly

yuvipanda added 3 commits Dec 10, 2018
Merges in this revert: #481
- Creating a new client with 'auto' version causes repeated
  unnecessary network requests to discover version of docker daemon.
- Testing is easier this way, since we can inject a mocked docker
  client more easily
@yuvipanda yuvipanda changed the title [WIP] Allow specifying images to reuse cache from Allow specifying images to reuse cache from Dec 10, 2018
Got lost in a merge
@yuvipanda yuvipanda requested review from betatim and minrk Dec 10, 2018
This was referenced Dec 11, 2018
@minrk minrk merged commit 694e728 into jupyter:master Dec 11, 2018
4 checks passed
4 checks passed
ci/circleci: build_docs Your tests passed on CircleCI!
Details
codecov/patch 90% of diff hit (target 0%)
Details
codecov/project 75.8% (target 0%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@davidar davidar referenced this pull request Mar 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.