Add support for pinning repo2docker version #490

Open
betatim opened this issue Dec 10, 2018 · 10 comments


@betatim
Member

commented Dec 10, 2018

One thing missing for fully reproducible environments is a mechanism for pinning the version of repo2docker.

Several people have proposed ideas, and it seems we have a plan: repo2docker will start, inspect a /binder/repo2docker.version file that specifies the version (a git commit or a release version) the user wants to use, and then start a new copy of repo2docker at that version.

The way I'd start on this is to experiment with adding a new entrypoint/CLI script that starts the current repo2docker CLI script inside a container together with the right mounts(?). Docker hub has a collection of images for lots of git revisions which we can use.
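A minimal sketch of the "inspect the version file" step, assuming the file lives at binder/repo2docker.version inside the fetched repository and holds a single version string. The function name `read_pinned_version` is hypothetical and is not part of repo2docker.

```python
from pathlib import Path
from typing import Optional

# Location taken from the proposal above; the final file name may differ.
VERSION_FILE = Path("binder") / "repo2docker.version"


def read_pinned_version(repo_dir: str) -> Optional[str]:
    """Return the pinned version string, or None if the repo pins nothing."""
    version_path = Path(repo_dir) / VERSION_FILE
    if not version_path.is_file():
        return None
    # Assumption: the whole file is one version string, possibly with
    # surrounding whitespace or a trailing newline.
    version = version_path.read_text().strip()
    return version or None
```

Returning None for a missing or empty file lets the caller fall back to the installed version.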

(related #170)

@minrk

Member

commented Dec 12, 2018

Since you can run repo2docker in docker by mounting the docker socket (no mount necessary with docker-machine), this should be quite doable:

docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -e DOCKER_HOST=unix:///var/run/docker.sock jupyter/repo2docker:tag ...

So the only trick is going to be to make sure that repo2docker knows enough to start a repo2docker container with the docker socket mounted. The default case of /var/run/docker.sock ought to be straightforward. If there's any complexity, it's probably going to be in capturing and forwarding output from the sub-container.
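One way the output forwarding could look, as a sketch only: the parent process starts the sub-container command with `subprocess.Popen`, merges stderr into stdout, and echoes each line as it arrives. The helper name `run_and_forward` is hypothetical.

```python
import subprocess
import sys
from typing import List


def run_and_forward(argv: List[str]) -> int:
    """Run argv, echo its combined output line by line, return its exit code."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered reads on the parent side
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        sys.stdout.write(line)
        sys.stdout.flush()
    return proc.wait()
```

It would be called with the docker run argument list from the command above. Whether the `-it` flags can stay when output is captured through a pipe is something the sketch does not settle.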

@jzf2101

Collaborator

commented Dec 12, 2018

We have to publish versions on pip though- I thought that was still in process?

@betatim

Member Author

commented Dec 17, 2018

> We have to publish versions on pip though- I thought that was still in process?

Releases of repo2docker are listed on PyPI: https://pypi.org/project/jupyter-repo2docker/

@craig-willis

Contributor

commented Dec 21, 2018

As discussed on the call and in https://discourse.jupyter.org/t/repo2docker-roadmap-review/249/4, I'd like to work on this but will need to come up to speed on a few things.

Running a specified version of the jupyter-repo2docker package from within a Docker image makes sense. What would the expected behavior be for someone running the CLI directly -- just a warning or error? What are the current use cases for running via CLI?

@betatim

Member Author

commented Jan 3, 2019

IMHO running from the CLI should do the same thing as using repo2docker as a library. This means that running repo2docker https://github.com/org/repo does the following:

  1. fetch the repository
  2. inspect the repository to find the version of repo2docker it specifies; default to the version of repo2docker that is installed
  3. run docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock -e DOCKER_HOST=unix:///var/run/docker.sock jupyter/repo2docker:<r2dversionhere> ... with the ... replaced by the original CLI arguments
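Steps 2 and 3 above can be sketched as a pure function that picks the version and assembles the docker argument list. This is an illustration under assumptions, not the implementation: `pinned_docker_command` is a hypothetical name, and the installed version is passed in rather than looked up.

```python
from typing import List, Optional

DOCKER_SOCKET = "/var/run/docker.sock"
IMAGE = "jupyter/repo2docker"


def pinned_docker_command(
    cli_args: List[str], pinned: Optional[str], installed: str
) -> List[str]:
    """Build the docker run argv for the requested repo2docker version."""
    # Step 2: default to the installed version when the repo pins nothing.
    version = pinned or installed
    # Step 3: same flags as the command in the list above, followed by the
    # original CLI arguments unchanged.
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{DOCKER_SOCKET}:{DOCKER_SOCKET}",
        "-e", f"DOCKER_HOST=unix://{DOCKER_SOCKET}",
        f"{IMAGE}:{version}",
        *cli_args,
    ]
```

Keeping the command construction free of side effects makes it easy to test without a Docker daemon.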

This is the MVP I'd try to implement and see what happens/breaks. One thing that could get tricky is to get repo2docker /path/to/local/directory to work properly when forwarding the CLI arguments.

I think almost all uses of repo2docker right now are via the CLI; I don't know many people who use repo2docker as a library. So I think we should aim for pip install jupyter-repo2docker; repo2docker https://gitlab.com/org/repo-with-version-specified-in-it to "just work" as something you can type into a terminal.

@craig-willis

Contributor

commented Jan 9, 2019

I've taken a first pass at this in #550 -- any feedback welcome.

I've made the naive assumption that repo2docker.version will contain a valid Docker image tag, which may be undesirable. My initial thinking is that the version should be pinned to official releases. Alternatives might include a git commit hash or even a git repo URL for someone working with a fork, and repo2docker could build the repo2docker image...
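If the version string is used directly as an image tag, it could be checked before being handed to Docker. The sketch below follows Docker's documented tag rules (letters, digits, underscores, periods and hyphens; no leading period or hyphen; at most 128 characters). The function name `is_valid_docker_tag` is hypothetical.

```python
import re

# First character: letter, digit or underscore.
# Remaining (up to 127) characters may also be periods or hyphens.
_TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_docker_tag(version: str) -> bool:
    """Return True if version is syntactically usable as a Docker image tag."""
    return _TAG_RE.fullmatch(version) is not None
```

This only checks syntax; it does not confirm that a matching image exists on Docker Hub.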

I also tried to limit the Docker call to just build the image, not necessarily push and run. This introduced the complexity of needing to control the image name passed from the current repo2docker session into the Docker container.

Handling the local directory was indeed tricky. I opted to mount it to a specific path in the container, which I think makes things easier.
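A sketch of the local-directory handling described here, not the code in #550: when the repo argument is a directory on the host, bind-mount it at a fixed container path and pass that path instead. The mount point `/repo` and the name `mount_local_repo` are assumptions for illustration.

```python
import os
from typing import List, Tuple

CONTAINER_REPO_PATH = "/repo"  # hypothetical fixed mount point


def mount_local_repo(repo: str) -> Tuple[List[str], str]:
    """Return (extra docker run flags, repo argument to use in the container)."""
    if os.path.isdir(repo):
        # Docker bind mounts need an absolute host path.
        host_path = os.path.abspath(repo)
        return ["-v", f"{host_path}:{CONTAINER_REPO_PATH}"], CONTAINER_REPO_PATH
    # Anything else (for example a URL) is forwarded unchanged.
    return [], repo
```

The returned flags would go before the image name in the docker run command, and the returned argument would replace the original repo argument.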

@yuvipanda

Collaborator

commented Jan 9, 2019

I'd say the real way to pin it long term is to have a defined repo2docker standard version, and to pin to that rather than to a specific version of the Python package. That might take a while though, so until then I think it's OK to pin to a specific version of the released package. I don't think we should support pinning to commit hashes.

@betatim

Member Author

commented Jan 10, 2019

What is the thinking behind not allowing pinning against "anything for which there is a tag on dockerhub"? In practice this would be a SHA1 or a release of repo2docker.

I think having a link to a git repository that contains a repo2docker version, which then gets built in order to build the repo named in repo2docker.version, is too meta. I'd try to support that via pip install https://github.com/org/repo2docker-fork; repo2docker https://gitlab.com/org/repo-with-version-specified-in-it, where the fork now understands and is responsible for providing docker images for the version string in the version file.

@craig-willis

Contributor

commented Jan 17, 2019

Here's my takeaway from the 1/17 community call discussing feedback on #550. Let me know if I missed anything.

  • repo2docker itself shouldn't have logic to get and run other versions of repo2docker. It should check if the version matches and, if not, print an error and exit.
  • An external tool (e.g., repo2docker-runner) should have the knowledge to get the right version and run it. This could depend only on the CLI (preferred) or on the library if needed.
  • This can be two different scripts in the repo2docker repository, but preferably independently packaged and versioned. The goal here is separation of concerns, since the repo2docker-runner may be running different versions of repo2docker and may not have the same update frequency.
  • There are patterns with multiple packages sharing common files. This way the repo2docker-runner could leverage existing fetch, subdir, and binder directory handling.
  • It may be reasonable for repo2docker to have a fetch-only mode that enables using the latest version for new content providers, and the subsequent build/run call would use an older version to build the image.
  • r2d-runner can depend on r2d to do the fetch, read the version file if present, and call the appropriate r2d version using the local provider.
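The first bullet (check the version, otherwise print an error and exit) could look roughly like this. It is a sketch under assumptions: the name `check_pinned_version`, the exact-match comparison and the exit code 1 are all illustrative choices.

```python
import sys
from typing import Optional


def check_pinned_version(installed: str, pinned: Optional[str]) -> None:
    """Exit with an error if the repo pins a different repo2docker version."""
    if pinned is None or pinned == installed:
        return  # nothing pinned, or the pin matches: carry on as usual
    print(
        f"This repository requires repo2docker {pinned}, "
        f"but version {installed} is installed.",
        file=sys.stderr,
    )
    sys.exit(1)
```

With this split, fetching and launching the right version stays in the external runner, and repo2docker itself only refuses to build on a mismatch.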

Since it sounds like this can all live in the same repo, I'll take another pass and update the PR.

@yuvipanda

Collaborator

commented Jan 17, 2019

This sounds good to me, @craig-willis! re 'fetch only mode', you can do that with the library right now - and I think you should be able to just use that instead of relying on the CLI.
