
binder integration and paths #483

Open
jasmainak opened this issue Apr 30, 2019 · 16 comments


@jasmainak
Contributor

jasmainak commented Apr 30, 2019

When trying to integrate binder links in the examples of MNE, @larsoner and I noticed a couple of problems.

  1. sphinx-gallery expects the notebooks to be in a directory called notebooks/. First, this is restrictive: it does not allow building documentation for multiple versions (as MNE does). Second, it makes wasteful copies, since the notebooks are already available for download from the _downloads/ directory. Why are these not reused?

  2. The binder requirements file needs to be in a folder called binder/ relative to the root. Again, this is very restrictive and will involve manual steps at least in the case of MNE.

Could these requirements be relaxed somehow?

cc @choldgraf

@choldgraf
Contributor

choldgraf commented May 6, 2019

yo - a couple thoughts:

  • re: your first point, that's the directory where the notebooks are generated, right? The reason notebooks are placed there (in addition to the _downloads directory, where they're placed by default) is in order to let them keep their relative file positioning (e.g. so the folder hierarchy in Binder is the same as the one in your repository). I agree it's redundant, but I couldn't find a way for things stored in the _downloads folder to be anything other than a flat list :-/
  • re: your second point, that's a hard restriction of Binder - environment files must be either 1. in the root of the repository, or 2. in a folder called binder/ or .binder/ in the root of the repository.
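
To illustrate the first bullet, here is a minimal sketch of why a flat directory cannot preserve the folder hierarchy. The file names and the two helper functions are hypothetical, and the flat layout is modelled only as described in the comment above; the actual layout of _downloads may differ between Sphinx versions.

```python
from pathlib import PurePosixPath


def flat_target(src: str) -> str:
    """Flat layout (hypothetical model): keep only the file name."""
    return str(PurePosixPath("_downloads") / PurePosixPath(src).name)


def mirrored_target(src: str, notebooks_dir: str = "notebooks") -> str:
    """Mirrored layout: keep the path relative to the docs root."""
    return str(PurePosixPath(notebooks_dir) / src)


# Two hypothetical notebooks that share a file name in different folders.
sources = [
    "auto_examples/io/plot_read.ipynb",
    "auto_examples/stats/plot_read.ipynb",
]

flat = {flat_target(s) for s in sources}          # both map to the same path
mirrored = {mirrored_target(s) for s in sources}  # paths stay distinct
```

In the flat model the two notebooks collapse onto one path, while the mirrored layout keeps both and matches the hierarchy a user would see in Binder.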

I know it'll involve some degree of duplication, but I'm making the assumption that most folks are hosting documentation on a different branch from their main codebase (e.g. gh-pages) so hoping that some sub-optimal organization is OK on those branches

@larsoner
Contributor

larsoner commented May 6, 2019

@choldgraf what do you mean by repository? Do you mean where the docs are built and eventually uploaded? Because the paths do not seem to be relative to that, but rather the host root.

To give a concrete example, let's say I want to be able to have docs up at two URLs with the same host https://mne-tools.github.io, say:

  • https://mne-tools.github.io/dev/...
  • https://mne-tools.github.io/stable/...

In sphinx and sphinx-gallery in general this is not problematic because all file URLs are built relative to the sphinx build root directory using relative path names, so the docs can be uploaded to any arbitrary path like mne-tools.github.io/dev/... and all links will end up looking like mne-tools.github.io/dev/whatever.html. However, binder did not seem to work that way; rather, it seemed to require some files to live at the host root, i.e. mne-tools.github.io/notebooks, not mne-tools.github.io/dev/notebooks. Is it meant to work this way?
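
The difference between the two kinds of link can be sketched with the standard library. The page and notebook names here are made up for illustration; only the host and the dev/ and stable/ prefixes come from the example above.

```python
from urllib.parse import urljoin

# Hypothetical example pages under the two version prefixes.
dev_page = "https://mne-tools.github.io/dev/auto_examples/plot_demo.html"
stable_page = "https://mne-tools.github.io/stable/auto_examples/plot_demo.html"

# A relative link, as sphinx generates, versus a link anchored at the host root.
relative_link = "../_downloads/plot_demo.ipynb"
root_link = "/notebooks/plot_demo.ipynb"

# Relative links follow the page wherever the docs are uploaded.
dev_relative = urljoin(dev_page, relative_link)
stable_relative = urljoin(stable_page, relative_link)

# Host-root links ignore the version prefix entirely.
dev_root = urljoin(dev_page, root_link)
stable_root = urljoin(stable_page, root_link)
```

The relative link resolves under /dev/ or /stable/ as appropriate, whereas the host-root link resolves to the same location for both versions, which is the behaviour being questioned.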

@choldgraf
Contributor

I mean when Binder looks at a repo (generally, on GitHub) to decide how its environment should be built, it'll check the root of the repository, or the binder/ folders, for the environment files.

So relative to a git repository, you couldn't store the files in myrepo/mysubfolder/binder. They'd need to be in myrepo/binder/. There is a one-to-one mapping of "Binder environment" to "git repository" (so you can't specify multiple binder environments in a single repo, e.g. with mysubfolder1/binder/ and mysubfolder2/binder/).
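
As a rough sketch of the lookup rule described here (not repo2docker's actual code): the function name, the list of environment file names, and the precedence between binder/ and .binder/ are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# A few common environment file names (illustrative, not exhaustive).
ENV_FILES = ("environment.yml", "requirements.txt", "Dockerfile")


def find_binder_config_dir(repo_root: Path) -> Optional[Path]:
    """Return the directory where environment files would be looked up.

    Only binder/, .binder/ (both directly under the repository root) and
    the root itself are considered; nested folders are never searched.
    """
    for name in ("binder", ".binder"):  # assumed precedence order
        candidate = repo_root / name
        if candidate.is_dir():
            return candidate
    if any((repo_root / f).is_file() for f in ENV_FILES):
        return repo_root
    return None
```

Under this rule a requirements file at myrepo/mysubfolder/binder/ is simply never found, which is why one repository maps to one environment.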

@jasmainak
Contributor Author

jasmainak commented May 7, 2019

The reason notebooks are placed there (in addition to the _downloads directory, where they're placed by default) is in order to let them keep their relative file positioning

@choldgraf I don't quite follow. Shouldn't it just be a matter of using the right relative paths for the notebooks? How does one particular structure help as opposed to the other? At the end of the day, you are adding the relative path to the URL, so it shouldn't matter whether it's flat or not. Or am I missing something?

There is a one-to-one mapping of "Binder environment" to "git repository"

Fair enough, but shouldn't it still be possible to keep the binder environment in any location? In principle, the path to the environment file (relative to the root of the git repository, or an arbitrary URL on the web) could be added as a query string of the mybinder link. That would be a cleaner way to do this. Would this be something easy to incorporate?

As an aside, this one-to-one mapping assumption also seems to produce surprising results when caching the Docker images. I could be wrong, but it seems to me that mybinder looks for changes in the git repository and rebuilds the image whenever it has been updated. Instead, it should rebuild only when the environment has changed.

Let me know if this part of the discussion is more appropriate in one of the binder repositories (which one?). However, part one should be fixed as part of sphinx gallery I guess?

@choldgraf
Contributor

If you're talking about where the dependency files live before you build the docs, those can be anywhere (and should be specified in the dependencies field of the binder config, see the config docs for info).
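
For reference, a conf.py fragment along these lines might look as follows. This is a sketch only: the organization, repository, branch and file path are placeholders, and the key names are as I recall them from the sphinx-gallery configuration docs; they have changed between releases (for example `url` versus `binderhub_url`), so check the docs for the version you use.

```python
# Fragment of a Sphinx conf.py (all values are placeholders).
sphinx_gallery_conf = {
    "binder": {
        # Repository and branch where the built docs and notebooks are hosted.
        "org": "my-org",
        "repo": "my-org.github.io",
        "branch": "gh-pages",
        # The BinderHub deployment to link to.
        "binderhub_url": "https://mybinder.org",
        # Where the dependency files live *before* the docs are built;
        # this can be any path in the source repository.
        "dependencies": ["../environment.yml"],
    },
}
```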

re: "you should only build a new docker image when the environment changes", that's a more complex question than it sounds, technically speaking. For now, and the foreseeable future, repo2docker (which Binder uses) will re-build the image any time the commit hash changes for a repository. It's been brought up in the repo2docker and binderhub repositories before but I can't find the links to the issues where it was discussed :-/ you're welcome to open another issue if you like, if so, probably jupyter/repo2docker is where the change would be made.
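
To make the proposal concrete, here is a hypothetical sketch of a cache key derived from the environment files alone. This is not how repo2docker or BinderHub behave today (they key on the commit hash, as noted above), and `environment_digest` is a made-up name.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def environment_digest(repo_root: Path, env_files: Iterable[str]) -> str:
    """Hash only the named environment files, ignoring everything else.

    Two commits that differ only in notebooks or docs produce the same
    digest, so an image keyed on it would not need rebuilding.
    """
    digest = hashlib.sha256()
    for rel in sorted(env_files):
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        path = repo_root / rel
        if path.is_file():
            digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Part of what makes the real question harder than this sketch is that a Dockerfile or build script can pull in arbitrary files from the repository, so the set of "environment files" is not always knowable in advance.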

@banesullivan
Contributor

banesullivan commented May 17, 2019

Something relevant that I've created is a script to upload all the notebooks that are autogenerated by sphinx-gallery to a separate repository that can be launched on MyBinder.

I created a cookie-cutter here that has all the needed details for MyBinder to properly install dependencies needed for our project, PyVista. Using this, I set up a script that runs after our documentation build on Travis CI that:

  1. Uses the cookie-cutter to make a new repo
  2. Copies all the autogenerated notebooks from sphinx-gallery into that cookie-cutter repo.
  3. Commits and force pushes those notebooks to the remote of that new repository

It's mildly hacky, but works perfectly for both PyVista and PVGeo. See the script for PyVista here but in brief, all you need is:

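# Echo each command as it runs. Note that the remote URL below embeds
# ${GH_TOKEN}, so make sure the CI masks that variable in its logs.
set -x
# Render a fresh examples repo from the cookie-cutter, next to the package checkout
cookiecutter -f --no-input --config-file ./docs/my-package-binder-config.yml -o .. https://github.com/pyvista/cookiecutter-pyvista-binder.git
# Remove any notebooks/ directory that came with the rendered template
rm -rf ../my-package-examples/notebooks/
# Copy the autogenerated notebooks, preserving their directory layout
cd ./docs/
find ./examples -type f -name '*.ipynb' | cpio -p -d -v ../../my-package-examples/
cd ../../my-package-examples/
# Set the commit identity before committing
git config --global user.name "${GH_NAME}"
git config --global user.email "${GH_EMAIL}"
git init
git add .
git commit -m "${TRAVIS_JOB_NUMBER} : Autogenerated notebooks from Travis"
REMOTE="https://${GH_TOKEN}@github.com/pyvista/my-package-examples"
git remote add origin "${REMOTE}"
# Force push: the examples repo is regenerated from scratch on every build
git push -uf origin master
cd ../my-package/

# ... do other stuff like deploy using doctr
set +x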

It would be really awesome if there were a config parameter for sphinx-gallery so that the "Launch on Binder" button could just point to that MyBinder URL and pass a file name.

For example, our Binder can be launched at:

https://mybinder.org/v2/gh/pyvista/pyvista-examples/master

So maybe we could just feed that URL to sphinx-gallery's config and it would append file paths for the notebooks. For example, we have a notebook at examples/01-filter/streamlines.ipynb

So all sphinx-gallery would need to do is make the Launch button and link the above URL plus ?filepath=examples/01-filter/streamlines.ipynb to create:

https://mybinder.org/v2/gh/pyvista/pyvista-examples/master?filepath=examples/01-filter/streamlines.ipynb
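
The URL construction being proposed could be sketched like this. The function `binder_launch_url` is hypothetical and is not part of sphinx-gallery; only the base URL and notebook path are taken from the example above.

```python
from urllib.parse import quote


def binder_launch_url(binder_base: str, notebook_path: str) -> str:
    """Append a notebook path to a MyBinder repo URL as ?filepath=...

    Slashes are kept as-is so the path stays readable; other characters
    that are unsafe in a query string are percent-encoded.
    """
    return f"{binder_base.rstrip('/')}?filepath={quote(notebook_path, safe='/')}"


url = binder_launch_url(
    "https://mybinder.org/v2/gh/pyvista/pyvista-examples/master",
    "examples/01-filter/streamlines.ipynb",
)
```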

Note: this is the separate repo where we push the examples: https://github.com/pyvista/pyvista-examples

@larsoner
Contributor

Something relevant that I've created is a script to upload all the notebooks that are autogenerated by sphinx-gallery to a separate repository that can be launched on MyBinder.

What limitation did you hit with the existing binder integration:

https://sphinx-gallery.github.io/configuration.html#binder-links

that forced you to do these extra manual steps?

@banesullivan
Contributor

What limitation did you hit with the existing binder integration:

Setting up OpenGL headless displays and needing to install dependencies from specific dev channels on Anaconda.

@larsoner
Contributor

This should "just" be a matter of setting up a proper dockerfile, no?

@larsoner
Contributor

... perhaps mne-tools/mne-python#6177 can be helpful to see if there is an alternative / simpler way to do things? We didn't end up merging that PR because the MNE repo+dockerfile ended up being a bit "heavy" to rebuild all the time, but maybe this approach would simplify things for you.

@larsoner
Contributor

I am no expert on these sorts of things, though -- @choldgraf knows more, any thoughts on this MyBinder integration methodology?

@jasmainak
Contributor Author

One advantage I can see to @banesullivan's approach is that all the notebooks live in a single repository. So you can switch between notebooks easily (from the jupyter file menu) without having to relaunch binder (which can take a while).

I have a feeling that the mybinder integration in sphinx-gallery needs more documentation for it to work for new repositories. It's not just a matter of making a pull request with the new config. You also need to make a pull request to the gh-pages branch (or wherever the documentation is hosted) with the new notebooks and the Dockerfile. This process is not automated and is prone to error.

@banesullivan
Contributor

banesullivan commented May 17, 2019

This should "just" be a matter of setting up a proper dockerfile, no?

In hindsight, yes, but I don't know how to make Dockerfiles... I also could have set up all the MyBinder config parameters that come from that cookie-cutter in a separate repo and used the existing binder integration to push the notebooks to that repo.

Moving forward, I'll probably still use this approach for PyVista, PVGeo, this project, and soon a few other PyVista-based projects, as it allows me to control all of the setup for the Binder in one place (the cookie-cutter). That way, if a dependency needs to be changed or an environment variable needs to be added, I only have to change it in the cookie-cutter for it to propagate across all these projects.

Honestly, I think when I made that MyBinder script, I didn't realize Sphinx-Gallery had this capability

@banesullivan
Contributor

approach is that all the notebooks live in a flat directory. So you can switch between notebooks easily (from the jupyter file menu) without having to relaunch binder (which can take a while).

Ah yes, this has been super useful for newcomers wanting to test drive a few of our examples really quickly

@GaelVaroquaux
Contributor

GaelVaroquaux commented May 17, 2019 via email

@jasmainak
Contributor Author

but I don't know how to make Dockerfiles

Take a look at the PR that @larsoner linked to above. It has a Dockerfile; in particular, look at this line at the end:

ENTRYPOINT ["tini", "-g", "--", "xvfb-run"]

to make it work in headless mode. You can use repo2docker to test it locally without having to push to github.

Do you think that you can help here?

I knew I had set up a trap for myself :) I will try to set it up for another repo in a couple of days so I can help improve the docs when I do this.

5 participants