
Run rstudio directly from repo2docker #533

Open
yuvipanda opened this issue Dec 20, 2018 · 15 comments

@yuvipanda
Collaborator

If you have an install.R file, we install RStudio into the image. By default, we use nbrsessionproxy to make this available from JupyterHub / MyBinder.org, but if you are just building images with repo2docker you don't have to use Jupyter at all: you can launch directly into RStudio.

We should document this.

/cc @craig-willis

@yuvipanda
Collaborator Author

If I run `repo2docker -p 8788:8787 <path-to-repo-with-R-support> /usr/lib/rstudio-server/bin/rserver` on my local machine, I can access RStudio on localhost:8788

@psychemedia

Related to this, is there a general pattern that could be documented for running arbitrary applications under a similar model, e.g. @betatim's nbopenrefineproxy?

@betatim
Member

betatim commented Jan 4, 2019

I think the general pattern is that you need to know three bits of information:

  1. which port the process will be listening on (8787 inside the container in this case, published as 8788 on the host)
  2. which command to execute to start the process
  3. how to combine these two bits of information on the repo2docker command line

This means that you can use it to start OpenRefine directly, but you will have to know which port needs forwarding and what the OpenRefine command is.
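For example, a hypothetical OpenRefine invocation might look like this, assuming the image already provides an `openrefine` launcher on the PATH that accepts a port flag (both are assumptions; the exact command depends on how OpenRefine was installed):

```bash
# Publish host port 3333 to container port 3333, then start OpenRefine on it.
# `openrefine -p 3333` is an assumed launcher interface, not something
# repo2docker itself provides.
repo2docker -p 3333:3333 <path-to-repo-with-openrefine> openrefine -p 3333
```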

@craig-willis
Contributor

Has there been any discussion of putting the port and command into the generated Dockerfile instead of overriding values during docker run? For the WT use case, if we build an image to run Rstudio, it seems odd to have the default command be jupyter.

@psychemedia

If the port is aliased by a proxy path, you presumably don't need to know the port number?

Some services allow the port they run on to be specified, in which case it would make sense to have a standard convention for handling that, e.g. a standardised way of passing a port number in as a Docker environment variable via -e arguments.

Something that struck me in the context of the OpenRefine service, which was started on a dynamically allocated port, is that it would be useful to have the port number available via introspection, e.g. within a notebook kernel, by writing it to a config file in a standard location.
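A minimal sketch of what such a convention could look like, assuming repo2docker's -e/--env flag is available in your version; SERVICE_PORT and the config-file location are hypothetical, not existing conventions:

```bash
# Pass the port in as an environment variable and publish it.
repo2docker -e SERVICE_PORT=4321 -p 4321:4321 <path-to-repo>

# Inside the container, a start script could then record the port in a
# well-known location so a notebook kernel can introspect it:
echo "{\"port\": ${SERVICE_PORT}}" > "$HOME/.service-port.json"
```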

By the by, one other thing I noticed running repo2docker on a Mac was that the default HOME user inside the built container was the same user as the account I ran repo2docker with on the host, whereas I'd expected it to be jovyan, for no very good reason. This leaks information into a Docker image for an unwary user who doesn't run repo2docker with e.g. --user-name jovyan set.

@yuvipanda
Collaborator Author

Adding aliases is easy now with https://jupyter-server-proxy.readthedocs.io/

@betatim
Member

betatim commented Jan 4, 2019

I think we are talking about two different things here. This issue is specifically about directly launching an executable (like RStudio or OpenRefine) without going via the notebook server and its proxy.

I think it is worth creating separate issues for discussing each of these topics and the things related to them.


> Has there been any discussion of putting the port and command into the generated Dockerfile instead of overriding values during docker run?

Can you explain a bit what you mean? My understanding of how docker works is that even if you put an EXPOSE <porthere> in the Dockerfile you still need to specify how to map that port with -p port:port when you start the container. Do you think we should try and guess a good mapping instead of relying on the user to provide it?
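To illustrate the point with plain docker (a sketch; the image name is made up):

```bash
# `EXPOSE 8787` in a Dockerfile is metadata only; the port is not reachable
# from the host until it is published at run time:
docker run my-r2d-image                 # 8787 stays container-internal
docker run -p 8788:8787 my-r2d-image    # host 8788 -> container 8787
```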

@psychemedia

psychemedia commented Jan 5, 2019

Does BinderHub always start with a CMD that launches jupyterhub? I.e. could MyBinder be used just to run RStudio? More generally, is there a set-up for MyBinder where I could autostart a service in addition to a notebook server, such as a postgres database?

@psychemedia

(@betatim re: separate issues: yes, apologies, I was lumping together ideas about starting arbitrary services, e.g. ones that a user might start from a notebook menu and run via a proxy, ones that might autostart, etc.)

@betatim
Member

betatim commented Jan 5, 2019

Currently the creator of a binder/repo can't control the command that BinderHub will run. You can hook into the startup process, though, via https://repo2docker.readthedocs.io/en/latest/config_files.html#start-run-code-before-the-user-sessions-starts, e.g. to start up a DB.
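A sketch of such a start file, assuming PostgreSQL has been installed into the image (e.g. via environment.yml) and a data directory initialised under $HOME (the paths are assumptions):

```bash
#!/bin/bash
# repo2docker runs this `start` script before the user session begins.
# Launch PostgreSQL in the background; the data-directory path is an
# assumption about how the image was set up.
pg_ctl -D "$HOME/pgdata" -l "$HOME/pgdata/log" start

# Hand control back to the command repo2docker intended to run.
exec "$@"
```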

(I don't know what will happen if you don't add `exec "$@"` to the end of your start file. That would seem like a hack worth trying, to see if you can "override" the command used from the outside. I'd expect weird stuff to happen, and would definitely class it as "voids your warranty" use.)

@yuvipanda
Collaborator Author

BinderHub requires a notebook server to be running as CMD, since it does token authentication through that. One option is to use a simple binary that does proxying, supervision & token authentication only, which might be useful to explore. The alternative I've been exploring is, of course, jupyter-server-proxy, which can now spawn and supervise additional processes, including postgres. The ability to autostart services and have arbitrary readiness checks doesn't exist yet, but could be added easily!

@craig-willis
Contributor

craig-willis commented Jan 5, 2019

This is in response to @betatim's question #533 (comment). Sorry to be so verbose, but since Whole Tale will be using repo2docker outside of Binder, I feel like I need to provide more context. I'm happy to move this to another issue if appropriate, since it's tangential to the main issue topic.

> Can you explain a bit what you mean? My understanding of how docker works is that even if you put an EXPOSE <porthere> in the Dockerfile you still need to specify how to map that port with -p port:port when you start the container. Do you think we should try and guess a good mapping instead of relying on the user to provide it?

My question was more motivated by the idea of having the generated Dockerfile/image accurately reflect what was passed to repo2docker.

For example, something like:

repo2docker --expose 8787 --cmd /usr/lib/rstudio-server/bin/rserver  <path-to-repo-with-R-support>

Would produce a Dockerfile with

FROM buildpack-deps:bionic
...
EXPOSE 8787
...
CMD ["/usr/lib/rstudio-server/bin/rserver"]

This same information could be used regardless of whether repo2docker is used to run the image. This wouldn't necessarily replace -p, since the user may still want to map to a different port during run.

The basic WT use case is as follows: A researcher comes to WT to create a Tale. They select from a set of supported interactive environments -- e.g., Jupyter or Rstudio. We run a vanilla base environment for them to start their work. They upload/create any necessary data/code/documentation, etc., and custom configuration via repo2docker config files. (During development, they can rebuild the running environment to apply any config changes.) Once complete, they can publish the Tale to an external repository (e.g., DataONE, Dataverse, etc.). Later, another user discovers and runs the Tale either from the WT system or via the external repository, potentially downloading a zip archive to run locally. At this point a given Tale will be based on a single environment -- Rstudio or Jupyter.

Note also that we're currently considering publishing the generated repo2docker Dockerfile as an additional artifact (similar to what's been discussed for the Odum CoRe2 project) so that a user doesn't need to run repo2docker to read it. From the archival perspective, the Dockerfile may also have value in the long run regardless of whether Docker is around or the image can actually be built.

For now, WT will use repo2docker to build (but not run) the image. We will need to store information about the port and default command for each environment (e.g., Jupyter, Rstudio). In the typical Rstudio case, where the user is not a Jupyter user, using repo2docker as described above means the generated Dockerfile and image would have the wrong port (EXPOSE) and default command (CMD). We would need to include the Rstudio-specific information as part of the published Tale, and would be less likely to include the generated Dockerfile, since it would potentially cause confusion. Using the current repo2docker implementation, I'd probably include a simple generated script or readme instructing the user how to regenerate and run locally using repo2docker with the full command (#533 (comment))
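Such a script could be as simple as the following sketch (the port mapping and rserver path are the ones used earlier in this thread):

```bash
#!/bin/bash
# Hypothetical helper shipped with a published Tale: rebuild the image
# from the current directory and launch RStudio directly.
repo2docker -p 8788:8787 . /usr/lib/rstudio-server/bin/rserver
```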

However, if overriding the EXPOSE and CMD were possible during build as well as at runtime, the Dockerfile (and the resulting image, if inspected) would reflect the intent of the user -- for anyone running it to use Rstudio, not Jupyter. From my understanding, Binder achieves this through the "launch in" badges/links, which specify the environment a user should access.

@yuvipanda
Collaborator Author

@craig-willis Thank you for the well thought out comment. I've generated two specific issues from it: #545 and #546. Let's continue these discussions there.

Currently, the Dockerfile generated by repo2docker can't be built by itself - see #202 for discussions there. It is still useful as documentation, though.

@craig-willis
Contributor

Thanks, @yuvipanda -- I'll comment on the new tickets. I wasn't aware of #202 and that's very good to know.

@jonc1438 -- you may also be interested in parts of this discussion.

@nasiegel88

> If I run `repo2docker -p 8788:8787 <path-to-repo-with-R-support> /usr/lib/rstudio-server/bin/rserver` on my local machine, I can access RStudio on localhost:8788

How are you doing this? I am running a similar command and I get what appears to be a success message; however, going to localhost:8888 yields an error.

repo2docker -p 8888:8787 $PWD /usr/lib/rstudio-server/bin/rserver

Results in...

 ---> Running in a448e223cb32
Removing intermediate container a448e223cb32
 ---> b75ea650e2ed
Step 57/61 : ENV PYTHONUNBUFFERED=1
 ---> Running in cc6fdfb1ea0f
Removing intermediate container cc6fdfb1ea0f
 ---> 68006a8975b1
Step 58/61 : COPY /python3-login /usr/local/bin/python3-login
 ---> 74e01a8e934f
Step 59/61 : COPY /repo2docker-entrypoint /usr/local/bin/repo2docker-entrypoint
 ---> 4f85a207d078
Step 60/61 : ENTRYPOINT ["/usr/local/bin/repo2docker-entrypoint"]
 ---> Running in b740cb1ff035
Removing intermediate container b740cb1ff035
 ---> 51855a58f677
Step 61/61 : CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]
 ---> Running in 72db9f88b2c2
Removing intermediate container 72db9f88b2c2
 ---> 2fc93bf925c3
{"aux": {"ID": "sha256:2fc93bf925c377b0562d9e5560ff1f52f19643da3d676d13ad63521e22a36279"}}Successfully built 2fc93bf925c3
Successfully tagged r2d-2fhome-2fnoah-2fprojects-2fh-2epylori-2dsiegel-5fet-5fal-5f20221654723301:latest
CONTAINER FINISHED RUNNING.

[screenshot of the error page]
