Add an image with R in it #163

Open
yuvipanda opened this issue Nov 19, 2020 · 8 comments

Comments

@yuvipanda
Member

In 2i2c-org/farallon-image#15, we added R to a base pangeo image so folks from Farallon can use it. Most of that is upstreamable. Would there be interest in adding an R image here? It would provide the R kernel, RStudio, and probably an install.R onbuild system similar to what repo2docker offers.
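As a rough illustration of what an install.R onbuild step might do, the sketch below checks a build context for an `install.R` file and returns the command such a step could run. The function name `onbuild_commands` and the choice to invoke `Rscript install.R` are assumptions made for illustration; this is not how repo2docker or this repository actually implement it.

```python
from pathlib import Path


def onbuild_commands(context_dir):
    """List the commands a hypothetical onbuild step would run.

    Only install.R is considered here; a real onbuild system would
    likely look at other files in the build context as well.
    """
    commands = []
    if (Path(context_dir) / "install.R").is_file():
        # Assumption: user-supplied R packages are installed by running
        # the script once at image build time.
        commands.append(["Rscript", "install.R"])
    return commands
```

A derived image with no `install.R` in its build context would get an empty command list, so the step would be a no-op for Python-only users.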

@rsignell-usgs
Member

The USGS has a huge R community and we have RStudio on our Pangeo JupyterHub. So big thumbs up here!
[Image: 2020-11-19_6-48-38]

@rabernat
Member

Are there strong reasons to have R in the same image as python? Do users often switch between them in the same session? @rsignell-usgs's solution seems simpler.

@willirath
Member

Until a few days ago, I'd have said there aren't. But then I saw a PhD student as she "quickly drop[ed] to R, because the API for some online repository of observational data is more mature there", pulled a dataframe back to Python, and powered on with Pandas a minute later all without ever leaving the same notebook.
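The comment does not say how the dataframe was moved between languages. One way to picture such a handoff, assuming only that an `Rscript` executable is on the PATH in the same image as Python, is to have R print a data.frame as CSV and parse it on the Python side. This is a minimal sketch of that idea, not the student's method; the helper names `parse_csv_text` and `r_dataframe_as_rows` are hypothetical.

```python
import csv
import io
import shutil
import subprocess


def parse_csv_text(text):
    """Turn CSV text (header row first) into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def r_dataframe_as_rows(r_expression):
    """Evaluate an R expression yielding a data.frame and return its rows.

    Assumes R and Python live in the same image, so Rscript is on PATH.
    """
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found; R is not installed in this image")
    # Writing CSV to stdout gives a plain-text handoff format.
    r_code = f"write.csv({r_expression}, stdout(), row.names = FALSE)"
    result = subprocess.run(
        ["Rscript", "-e", r_code],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_csv_text(result.stdout)
```

With R available, something like `r_dataframe_as_rows("head(mtcars, 3)")` would return rows that can be passed to `pandas.DataFrame`. All values arrive as strings, so numeric columns would still need converting.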

@scottyhq
Member

We ran a hub this summer that had both Python and RStudio (installed via conda-forge): https://github.com/RPVote/jhub-rstudio. The image size was gigantic, and while people did jump back and forth between interfaces (Jupyter notebook or RStudio), they mostly seemed to stick to one language.

I guess another question - are R users connecting to a k8s cluster for distributed computing? This isn't really necessary for a pangeo image (in fact most pangeo hub users are likely using dask-gateway a small fraction of the time!), but a key design principle for this repository was to force pangeo hub images to use the same dask versions to stay in sync. So images in this repo get updated together rather than separately.
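To make the "same dask versions" principle concrete, here is a small sketch of a check that flags images whose pins have drifted apart. The function name, the image names, and the version strings are all made up for illustration; the repository's actual automation is not shown in this thread.

```python
def find_version_drift(image_pins, packages=("dask", "distributed")):
    """Return {package: {image: version}} for packages whose pins differ.

    image_pins maps an image name to a {package: version} dict. A package
    missing from an image counts as a different version (None).
    """
    drift = {}
    for package in packages:
        versions = {
            image: pins.get(package) for image, pins in image_pins.items()
        }
        if len(set(versions.values())) > 1:
            drift[package] = versions
    return drift


# Hypothetical pins: the R image lags behind on dask.
example_pins = {
    "pangeo-notebook": {"dask": "2021.4.0", "distributed": "2021.4.0"},
    "r-notebook": {"dask": "2021.3.0", "distributed": "2021.4.0"},
}
drift = find_version_drift(example_pins)
```

Building the images together from one repository avoids the need for such a check. Images maintained elsewhere would need something like it to stay compatible with the same Dask clusters.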

Another option: if the 2i2c or USGS images were pushed to DockerHub or Quay, that would make it easy for any hub to point to them, right?

@rabernat
Member

We also had an earlier version that had a Julia environment bundled into the image.

This brings us back to a discussion we had earlier. It would be amazing if we could provide environments via shareable kernels rather than docker images. If there were some way to publish kernels, that would be much simpler. The docker image could then just contain the base jupyterhub stuff and leave all the computational details to the kernels.

@yuvipanda
Member Author

> Until a few days ago, I'd have said there aren't. But then I saw a PhD student as she "quickly drop[ed] to R, because the API for some online repository of observational data is more mature there", pulled a dataframe back to Python, and powered on with Pandas a minute later all without ever leaving the same notebook.

I think this is a big use case. Another is sharing code between users who use primarily python and users who primarily use R. This often leads to mixed code, so having them in the same image is very useful.

However, that doesn't mean it needs to be in this repository. It's useful for other repos to build on top of this repo, with all the automation this has. So that would be an option, especially since we can move it into this repo if there is a lot of use.

Another option is to start from the R-specific rocker image and add Pangeo-specific things (dask-gateway, etc.). However, I think that'll be a lot more work to keep up to date.

@lucamarletta

I'm new to the community. I'm implementing a cluster for researchers at an EU facility, and I'd be very interested in having RStudio spawned by JupyterHub.

Is it planned or in progress?

@TomAugspurger
Member

TomAugspurger commented Apr 5, 2021

I have an environment that builds off our base image, but doesn't include RStudio. It only offers jupyterlab as a UI. Would people be interested in that as a stop-gap until optional RStudio support can be integrated into our base image?

edit: The branch is at https://github.com/pangeo-data/pangeo-docker-images/compare/feature/r?expand=1. I haven't done any work to integrate it into this repository's build automation.
