Run rstudio directly from repo2docker #533

Open
yuvipanda opened this Issue Dec 20, 2018 · 14 comments

yuvipanda commented Dec 20, 2018

If you have an install.R file, we install RStudio into the image. By default, we use nbserverproxy to make it available from JupyterHub / MyBinder.org, but if you are just building images with repo2docker you don't have to use Jupyter at all - you can launch directly into RStudio.

We should document this.

/cc @craig-willis

yuvipanda commented Dec 20, 2018

If I run `repo2docker -p 8788:8787 <path-to-repo-with-R-support> /usr/lib/rstudio-server/bin/rserver` on my local machine, I can access RStudio at `localhost:8788`.

psychemedia commented Jan 4, 2019

Related to this, is there a general pattern that could be documented for running arbitrary applications under a similar model, e.g. @betatim's nbopenrefineproxy?

betatim commented Jan 4, 2019

I think the general pattern is that you need to know three bits of information:

  1. which port the process listens on inside the container (8787 in the RStudio example above, published as 8788 on the host)
  2. which command to execute to start the process
  3. how to combine the two on the repo2docker command line

This means you can use it to start OpenRefine directly, but you will have to know which port needs forwarding and what the OpenRefine command is.
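As a concrete sketch of that pattern for OpenRefine -- everything below (the `refine` command name, its flags, and port 3333, OpenRefine's usual default) is an assumption, not a verified recipe:

```shell
# Hypothetical direct launch of OpenRefine with repo2docker.
# Assumes the repository's build steps put a `refine` launcher on the
# PATH and that it listens on its default port 3333.
repo2docker -p 3333:3333 <path-to-repo-with-openrefine> refine -i 0.0.0.0 -p 3333
```

The three bits of information map directly onto the command line: the port pair in `-p`, the repo path, and the trailing command to run instead of the notebook server.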

craig-willis commented Jan 4, 2019

Has there been any discussion of putting the port and command into the generated Dockerfile instead of overriding values during docker run? For the Whole Tale (WT) use case, if we build an image to run RStudio, it seems odd for the default command to be jupyter.

psychemedia commented Jan 4, 2019

If the port is aliased by a proxy path, you presumably don't need to know the port number?

Some services allow the port they run on to be specified, in which case it would make sense to have a standard recipe/convention for handling that, e.g. passing the port number in via a Docker environment variable with -e arguments.

Something that struck me in the context of the OpenRefine service, which was started on a dynamically allocated port, was that it would be useful to have the port number available via introspection, e.g. from within a notebook kernel, by writing it to a config file in a standard location.
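A minimal sketch of such a convention -- the file path, service name, and port value here are all illustrative assumptions, not an existing standard:

```shell
# Hypothetical convention: the launcher records the dynamically
# allocated port in a well-known file so that code running later
# (e.g. in a notebook kernel) can discover it. The /tmp path and
# the port value are purely illustrative.
PORT=35739                           # pretend this was allocated at startup
echo "$PORT" > /tmp/openrefine.port  # launcher writes the port out
cat /tmp/openrefine.port             # any other process reads it back
```

Any kernel or script that knows the agreed location can then read the file instead of guessing the port.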

By the by, one other thing I noticed running repo2docker on a Mac was that the default HOME user inside the built container was the same user as the account I ran repo2docker with on the host, whereas I'd expected it to be jovyan, for no very good reason. This leaks information into a Docker image for an unwary user who doesn't run repo2docker with e.g. --user jovyan set.

yuvipanda commented Jan 4, 2019

Adding aliases is easy now with https://jupyter-server-proxy.readthedocs.io/

betatim commented Jan 4, 2019

I think we are talking about two different things here. This issue is specifically about directly launching an executable (like RStudio or OpenRefine) without going via the notebook server and its proxy.

I think it is worth creating separate issues for discussing each of these topics.


Has there been any discussion of putting the port and command into the generated Dockerfile instead of overriding values during docker run?

Can you explain a bit what you mean? My understanding of how Docker works is that even if you put an EXPOSE <porthere> in the Dockerfile, you still need to specify how to map that port with -p port:port when you start the container. Do you think we should try to guess a good mapping instead of relying on the user to provide it?
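For reference, a sketch of that distinction (the image name is illustrative):

```shell
# EXPOSE in a Dockerfile only documents the port; it does not publish it.
# Publishing still requires an explicit mapping at run time.
docker build -t my-r-image .         # Dockerfile contains: EXPOSE 8787
docker run -p 8788:8787 my-r-image   # host port 8788 -> container port 8787
docker run -P my-r-image             # or publish all EXPOSEd ports to random host ports
```

So baking EXPOSE into the generated Dockerfile documents intent, but the user (or `-P`) still supplies the actual host mapping.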

psychemedia commented Jan 5, 2019

Does BinderHub always start with a CMD that launches Jupyter? I.e., could MyBinder be used just to run RStudio? More generally, is there a set-up for MyBinder where I could autostart a service in addition to a notebook server, such as a Postgres database?

psychemedia commented Jan 5, 2019

(@betatim re: separate issues: yes, apologies, I was lumping together different ideas of starting arbitrary services, e.g. ones that a user might start from a notebook menu and run via a proxy, ones that might autostart, etc.)

betatim commented Jan 5, 2019

Currently the creator of a binder/repo can't control the command that BinderHub will run. You can hook yourself into the startup process, though, via https://repo2docker.readthedocs.io/en/latest/config_files.html#start-run-code-before-the-user-sessions-starts -- e.g. to start up a DB.

(I don't know what will happen if you don't add exec "$@" to the end of your start file. That seems like a hack worth trying, to see if you can "override" the command used from the outside. I'd expect weird stuff to happen, and would definitely class it as "voids your warranty" use.)
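A minimal sketch of such a start file, assuming Postgres as the extra service (the pg_ctl line is illustrative and commented out so the sketch stays self-contained; the /tmp path is only for demonstration -- in a repo the file is named `start` at the repository root):

```shell
# Sketch of a repo2docker "start" config file: do per-session setup,
# then exec the command repo2docker/BinderHub passes in, so the
# notebook server still becomes the container's main process.
cat > /tmp/start <<'EOF'
#!/bin/bash
# pg_ctl -D "$PGDATA" -l /tmp/pg.log start   # hypothetical: launch a database first
exec "$@"                                    # hand control to the original command
EOF
chmod +x /tmp/start

# Demonstration: whatever arguments are passed become the final process.
/tmp/start echo "notebook server would start here"
```

Because of the `exec`, the started command replaces the shell, keeping signal handling and container lifecycle behavior sane.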

yuvipanda commented Jan 5, 2019

BinderHub requires a notebook server to be running as CMD, since it does token authentication through that. One option is to use a simple binary that does proxying, supervision & token authentication only, which might be useful to explore. The alternative I've been exploring is, of course, jupyter-server-proxy, which can now spawn and supervise additional processes, including postgres. The ability to autostart services and have arbitrary readiness checks doesn't exist yet, but could be added easily!

craig-willis commented Jan 5, 2019

This is in response to @betatim's question #533 (comment). Sorry to be so verbose, but since Whole Tale will be using repo2docker outside of Binder, I feel like I need to provide more context. I'm happy to move this to another issue if appropriate, since it's tangential to the main issue topic.

Can you explain a bit what you mean? My understanding of how docker works is that even if you put a EXPOSE <porthere> in the docker file you still need to specify how to map that port with -p port:port when you start the container. Do you think we should try and guess a good mapping instead of relying on the user to provide it?

My question was more motivated by the idea of having the generated Dockerfile/image accurately reflect what was passed to repo2docker.

For example, something like:

```
repo2docker --expose 8787 --cmd /usr/lib/rstudio-server/bin/rserver <path-to-repo-with-R-support>
```

would produce a Dockerfile with:

```dockerfile
FROM buildpack-deps:bionic
...
EXPOSE 8787
...
CMD ["/usr/lib/rstudio-server/bin/rserver"]
```

This same information could be used regardless of whether repo2docker is used to run the image. This wouldn't necessarily replace -p, since the user may still want to map to a different port during run.

The basic WT use case is as follows: a researcher comes to WT to create a Tale. They select from a set of supported interactive environments -- e.g., Jupyter or RStudio. We run a vanilla base environment for them to start their work. They upload/create any necessary data/code/documentation, etc., and custom configuration via repo2docker config files. (During development, they can rebuild the running environment to apply any config changes.) Once complete, they can publish the Tale to an external repository (e.g., DataONE, Dataverse). Later, another user discovers and runs the Tale, either from the WT system or via the external repository, potentially downloading a zip archive to run locally. At this point a given Tale is based on a single environment -- RStudio or Jupyter.

Note also that we're currently considering publishing the generated repo2docker Dockerfile as an additional artifact (similar to what's been discussed for the Odum CoRe2 project) so that a user doesn't need to run repo2docker to read it. From the archival perspective, the Dockerfile may also have value in the long run regardless of whether Docker is around or the image can actually be built.

For now, WT will use repo2docker to build (but not run) the image. We will need to store information about the port and default command for each environment (e.g., Jupyter, RStudio). In the typical RStudio case, where the user is not a Jupyter user, using repo2docker as it works today the generated Dockerfile and image would have the wrong port (EXPOSE) and default command (CMD). We would need to include the RStudio-specific information as part of the published Tale and would be less likely to include the generated Dockerfile, since it could cause confusion. With the current repo2docker implementation, I'd probably include a simple generated script or readme instructing the user how to regenerate and run the image locally using repo2docker with the full command (#533 (comment)).

However, by enabling overrides of EXPOSE and CMD during build as well as at runtime, the Dockerfile (and resulting image, if inspected) would reflect the intent of the user -- for anyone running it to use RStudio, not Jupyter. From my understanding, Binder achieves this through the "launch in" badges/links, which specify the environment a user should access.

yuvipanda commented Jan 5, 2019

@craig-willis Thank you for the well thought out comment. I've generated two specific issues from it: #545 and #546. Let's continue these discussions there.

Currently, the Dockerfile generated by repo2docker can't be built by itself - see #202 for discussions there. It is still useful as documentation, though.

craig-willis commented Jan 5, 2019

Thanks, @yuvipanda -- I'll comment on the new tickets. I wasn't aware of #202 and that's very good to know.

@jonc1438 -- you may also be interested in parts of this discussion.
