New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding binder custom tree option #371

Merged
merged 16 commits into from May 31, 2018

Conversation

Projects
None yet
5 participants
@choldgraf
Contributor

choldgraf commented Apr 27, 2018

This PR adds the ability to collect jupyter notebooks in a way that preserves their relative paths/folders. It also generates Binder URLs that point to these paths.

The challenge

To my knowledge, sphinx gallery doesn't have access to the _downloads folder until after the site is built. To that extent, I don't think we can modify / duplicate those files after they've been created. I also don't think it's possible to tell the :download: directive to put a file in some relative path in _downloads. It only dumps it into the root of the directory.

So I think our only option (used in this PR) is kind of hacky: putting the folders/files in _static just before the site is built.

So, this PR does:

  1. If the binder configuration static_folder is defined (relative to conf.py) it will trigger the following process:
  2. Each ipynb file will be copied to a sub-tree within <staticfolder>/binder/path/to/ntbk.ipynb during the build process
  3. Binder links will point to this sub-tree rather than the _downloads folder

It also adds in the ability to specify use_lab=True in the binder config, in which case it will change the end of the link from:

?filepath=_downloads/myfile.ipynb to ?urlpath=lab/tree/_downloads/myfile.ipynb

LMK what folks think! It's a bit hacky but I think our only option. In the docs, we should recommend that people put the output folder under their gitignore.

Here's a binder link that shows what the final folder/file structure looks like:

https://mybinder.org/v2/gh/choldgraf/sphinx-gallery/gh-pages?urlpath=lab/tree/_static/binder/auto_examples/plot_colors.ipynb

LMKWYT!

cc @lesteve

closes #353 #368

doc/conf.py Outdated
@@ -341,10 +341,15 @@ def setup(app):
'find_mayavi_figures': find_mayavi_figures,
'expected_failing_examples': ['../examples/no_output/plot_raise.py',
'../examples/no_output/plot_syntaxerror.py'],
'binder': {'org': 'sphinx-gallery',
'repo': 'sphinx-gallery.github.io',
'binder': {#'org': 'sphinx-gallery',

This comment has been minimized.

@choldgraf

choldgraf Apr 27, 2018

Contributor

I will update these back to their respective values when we're +1 on a merge...this is just for testing

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 1, 2018

Thanks a lot for the PR! I have not looked at the details yet, but the main question I have is: is there a good reason that we have to use _static? I would have thought that the variable should be called notebooks_output_folder or something similar and that and that the user could chose any folder he wants but I may be missing a sphinx quirk.

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 1, 2018

Also some CI failures seem genuine, but I don't think that is the main problem right now.

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 1, 2018

@lesteve as far as I could figure out, sphinx won't copy over anything that isn't RST or markdown (e.g., it doesn't copy over anything other than this in the "auto_examples" folder, but correct me if I'm wrong) so the only option is to put it in _static/myfolder.

Another option is if we could edit html_static_path. If you changed conf.py so that you had a line like:

html_static_path = ['_static', 'myfolder']

then it would copy over anything that's inside myfolder.

Is there a way to edit the sphinx configuration at build time? Alternatively, we could instruct users via docs to take this step, though it'd add another step to an already cumbersome process.

Maybe we should:

  1. Rename the binder configuration variable to notebooks_output_folder
  2. In the docs say that they can either make this a sub-folder of _static (so something like _static/myfolder) or they will need to add the folder to the list in html_static_path.

WDYT?

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 2, 2018

Can we not copy the notebooks ourselves? This was my naive strategy originally:

  • we may be able to know the html output folder (_build/html in the sphinx-gallery case): if app is a Sphinx object, it looks like app.outdir is what we want.
  • once we know the html output folder we add notebooks_output_folder to it so that the notebook are inside the generated html. That means that the notebooks are part of the repo and we can use binder with the appropriate URLs.
@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 2, 2018

mmm, I think we could do this...it feels a little bit hacky since we're basically totally circumnavigating the build process, but perhaps that's fine. What do others think? @Titan-C @GaelVaroquaux @larsoner

So you're suggesting we:

  1. Let users specify output_dir name
  2. Use the sphinx configuration to find build/output/path
  3. Manually create that folder if it doesn't exist and move all notebooks to build/output/path/html/output_dir/myfolder/myfile.ipynb etc.

yeah?

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 3, 2018

Something like this, copying the notebooks rather than moving seems safer. Just to be clear, in my mind notebook_output_dir is a subfolder or app.outdir. Just to take an example in the sphix-gallery case, if I have notebook_output_dir='my/notebooks' I should have the notebooks in _build/html/my/notebooks.

I agree that this is a bit working around sphinx but I feel this is a bit what sphinx-gallery is already doing to some respect (e.g. the backreferences aka mini-galleries in the reference API doc comes to mind). Of course, if there are ways to do it the sphinx way for this PR or others, this would be more than welcome!

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 3, 2018

lemme give this a shot and we'll see how it looks :-)

@codecov-io

This comment has been minimized.

codecov-io commented May 3, 2018

Codecov Report

Merging #371 into master will increase coverage by 0.9%.
The diff coverage is 96.62%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #371     +/-   ##
=========================================
+ Coverage   92.07%   92.97%   +0.9%     
=========================================
  Files          27       27             
  Lines        1842     1908     +66     
=========================================
+ Hits         1696     1774     +78     
+ Misses        146      134     -12
Impacted Files Coverage Δ
sphinx_gallery/gen_rst.py 95.96% <100%> (+0.28%) ⬆️
sphinx_gallery/tests/test_binder.py 100% <100%> (ø) ⬆️
sphinx_gallery/gen_gallery.py 86.74% <100%> (+0.94%) ⬆️
sphinx_gallery/tests/test_gen_gallery.py 100% <100%> (ø) ⬆️
sphinx_gallery/binder.py 95.95% <94.11%> (+15.95%) ⬆️
sphinx_gallery/tests/test_docs_resolv.py 100% <0%> (ø) ⬆️
sphinx_gallery/docs_resolv.py 71.3% <0%> (+0.84%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0496ff3...f666660. Read the comment docs.

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 3, 2018

@lesteve what do you think about the latest push? You can now specify an arbitrary folder, and the notebooks will be copied to that folder within _build.

e.g.:

http://predictablynoisy.com/sphinx-gallery/auto_examples/plot_seaborn.html#sphx-glr-auto-examples-plot-seaborn-py

I think part of the errors are because I'm using a python 3 function somewhere, I'll try and run that down, but in the meantime WDYT about the current pattern?

Also - do you think that we should actually do this automatically, rather than have people provide a config to enable it? E.g., we could basically make notebooks_dir to default to notebooks.

link_base, replace_py_ipynb(relative_link))
else:
# Assume that we're in the root of _downloads
fname = os.path.basename(fname)

This comment has been minimized.

@Titan-C

Titan-C May 4, 2018

Member

why do you need this extra basename?

This comment has been minimized.

@choldgraf

choldgraf May 4, 2018

Contributor

I believe that fname here is a full path, so this just grabs the name of the file itself

save_notebook(example_nb, path_ipynb)
# Copy the ipynb file to the static folder, preserving folder structure
if binder_conf.get('notebooks_folder'):

This comment has been minimized.

@Titan-C

Titan-C May 4, 2018

Member

I don't like doing the copy part here nor passing the app object this deep, this breaks my desire of making sphinx-gallery more modular. This copy task should happen at the end of the build in the same way we implement embed_code_links.
That is have in setup something like this

app.connect('build-finished',copy_jupyter_nb)

And then in binder.py the funtion that does the copy.

This comment has been minimized.

@Titan-C

Titan-C May 4, 2018

Member

maybe for the the copy just use an os.walk and copy everything that is a notebook.

This comment has been minimized.

@choldgraf

choldgraf May 4, 2018

Contributor

yeah I think os.walk is the path forward here...lemme give a shot at modularizing this a little bit more.

@choldgraf choldgraf force-pushed the choldgraf:notebook_output branch from b204508 to 8409a74 May 5, 2018

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 5, 2018

ok I think the latest push should address much of @lesteve 's concerns. It factors out the notebook copying step to the very end as you suggested. Also uses os.walk now, so perhaps that means the python 2 builds will be happy again

@choldgraf choldgraf force-pushed the choldgraf:notebook_output branch from 8409a74 to 2e7ea3e May 7, 2018

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 7, 2018

oops, that was @Titan-C 's concerns not @lesteve 's concerns :-)

Latest version sets notebooks_folder to default to notebooks/, and lets people change this if they wish.

LMK if there's anything else I should tackle other than making tests pass.

@choldgraf choldgraf force-pushed the choldgraf:notebook_output branch 6 times, most recently from 66ca801 to 5d7b6e5 May 7, 2018

@choldgraf choldgraf force-pushed the choldgraf:notebook_output branch from 5d7b6e5 to a37b827 May 7, 2018

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 7, 2018

OK I think we're looking good w/ tests and coverage. This is ready for another review, or ready to 🚢

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 7, 2018

Thanks a lot for your work on this! I quickly tried out this PR by making make clean html and I am a bit confused by the structure of _build/html/notebooks. To me it looks like the structure of the examples folders is not correctly mirrored on the generated notebook side:

As an example, sin_func is a subfolder of examples

❯ tree -d examples 
examples
├── no_output
└── sin_func

so in the generated notebook folders the same should be true, but here sin_func is on the same level as auto_examples:

❯ tree -d doc/_build/html/notebooks     
doc/_build/html/notebooks
├── auto_examples
├── no_output
├── seaborn
├── sin_func
└── tutorials

Another small comment before I look more closely at the PR diff, we are using _dir for other directory options we may want to use a similar name rather than _folder for consistency reasons.

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 7, 2018

huh - yes I see the same thing too! I didn't check to see whether extra files and folders were created haha...lemme check on this. And I agree with _dir being consistent with the broader project

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 7, 2018

OK I think I fixed it up. Here's the my build tree now:

doc/_build/html/notebooks/
├── auto_examples
│   ├── no_output
│   │   ├── just_code.ipynb
│   │   ├── plot_raise.ipynb
│   │   ├── plot_strings.ipynb
│   │   └── plot_syntaxerror.ipynb
│   ├── plot_choose_thumbnail.ipynb
│   ├── plot_colors.ipynb
│   ├── plot_exp.ipynb
│   ├── plot_function_identifier.ipynb
│   ├── plot_gallery_version.ipynb
│   ├── plot_quantum.ipynb
│   ├── plot_seaborn.ipynb
│   ├── plot_sys_argv.ipynb
│   ├── plot_unicode_everywhere.ipynb
│   └── sin_func
│       ├── plot_sin_black_background.ipynb
│       └── plot_sin.ipynb
└── tutorials
    ├── plot_notebook.ipynb
    ├── plot_parse.ipynb
    └── seaborn
        └── plot_seaborn_notebook.ipynb

also replaced _folder with _dir

@choldgraf choldgraf force-pushed the choldgraf:notebook_output branch from 4778867 to 4cb5f12 May 13, 2018

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 13, 2018

Thanks for the patch! I cleaned up the filtering function a little bit since the multi-part boolean logic was confusing to me. If that looks OK to you, I think this is ready to go from my end. Here's a link to the live build:

https://predictablynoisy.com/sphinx-gallery/auto_examples/sin_func/plot_sin.html#sphx-glr-auto-examples-sin-func-plot-sin-py

I'll hold off until I get a 👍 from @lesteve and @Titan-C , and then I'll change the old config before merge.

@Titan-C

This comment has been minimized.

Member

Titan-C commented May 13, 2018

I liked my list comprehension a lot more it almost reads like english(I'm aware english is a terribly ambiguous language and that generates a lot of confusion too), but if you think your alternative is more maintainable is fine although for such an isolated function we wont really need to maintain it. On a second review and taking into account your changes in the docstring, we do need to work a bit more on this. It's a contradicting to call the function ignore and then talking about keeping stuff(I'm the first to be blamed).

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 14, 2018

Sounds good - I am all for beautifully complex list comprehensions, but I must admit it took me like 10 minutes to figure out what that one was doing :-)

I think the main thing that confused me was these lines:

entry.endswith('.ipynb') != os.path.isdir(os.path.join(path, entry))

anyway feel free to suggest alternate namings etc...

@Titan-C

This comment has been minimized.

Member

Titan-C commented May 14, 2018

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 14, 2018

I tried updating the functions name/description/structure to make it a bit more clear...LMK what you think (or feel free to make edits)

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 18, 2018

Sorry I haven't had time to look at this which I think is really important. I'll try to find some time beginning of next week.

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 24, 2018

friendly ping on this :-)

@Titan-C

This comment has been minimized.

Member

Titan-C commented May 24, 2018

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 24, 2018

ok - updated the conf

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 25, 2018

I am going to look at this one today.

@lesteve

A few comments, generally looks good.

It would be nice to have this tested on a repo that use a separate source and build directories (I think scikit-image does that for example) just to make sure there is no suprise here.

# Optional keys
'filepath_prefix': '<prefix>' # A prefix to append to any filepaths in Binder links.
'notebooks_dir': '<notebooks-directory-name>' # Jupyter notebooks for Binder will be moved to this directory (relative to site root).
'use_lab': <bool> # Whether Binder links should start Jupyter Lab instead of the Jupyter Notebook interface.

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

I would slightly prefer use_jupyter_lab to be more explicit.

use this if you move your built documentation to a sub-folder of your repository (e.g., "v2.1")
# Optional keys
'filepath_prefix': '<prefix>' # A prefix to append to any filepaths in Binder links.
'notebooks_dir': '<notebooks-directory-name>' # Jupyter notebooks for Binder will be moved to this directory (relative to site root).

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

Will they be moved or copied? It seems like this is not consistent between here and a few lines below.

}
}
Note that ``branch:`` should be the branch on which your documentation is hosted.
If you host your documentation on GitHub, this is usually ``gh-pages`` or ``master``.
A copy of each generated Jupyter Notebook will be copied to the folder

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

Each generated Jupyter notebook will be copied ...

}
}
Note that ``branch:`` should be the branch on which your documentation is hosted.
If you host your documentation on GitHub, this is usually ``gh-pages`` or ``master``.
A copy of each generated Jupyter Notebook will be copied to the folder
specified in ``notebooks_dir``. Binder links will point to these notebooks.

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

It would be nice to specify that notebooks_dir is a subfolder of the sphinx output directory (or whatever the appropriate sphinx name for this is).

binder_fpath = '/'.join([fpath_prefix.strip('/'), binder_fpath])
path_link = '/'.join([fpath_prefix.strip('/'), path_link])
# Make sure we have the right slashes

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

Maybe add "on Windows"

gallery_conf = app.config.sphinx_gallery_conf
binder_conf = check_binder_conf(gallery_conf.get('binder'))
# Copy the requirements files for binder

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

Remove the comment.

if entry.endswith('.ipynb'):
# Don't include ipynb files
pass
elif (entry != "images") and os.path.isdir(os.path.join(path, entry)):

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

Can you explain the use-case for keeping folders named images?

This comment has been minimized.

@choldgraf

choldgraf May 25, 2018

Contributor

not sure, this one was @Titan-C 's , I was just adapting it w/o changing functionality :-)

This comment has been minimized.

@Titan-C

Titan-C May 25, 2018

Member

We keep images because is an ignore list and I don't care about them. We need the notebooks and directories containing them. The image folder just adds to the noise of folders copied.

gallery_dirs, 'copying binder notebooks...', length=len(gallery_dirs))
for i_folder in iterator:
shutil.copytree(os.path.join(app.srcdir, i_folder),

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

In terms of readability I feel it would be clearer to walk the folder and copy all the .ipynb files explicitly. You would have to change the the beginning of the path from app.srcdir to i_folder by hand.

In other words if we only want to select *.ipynb files, I find the ignore API is not that well suited.

Reflecting about this though, I am wondering whether notebooks need other files to be fully functional. One example that comes to mind is if you do an import of a local python module (I seem to remember someone, maybe matplotlib, was doing it).

This comment has been minimized.

@Titan-C

Titan-C May 25, 2018

Member

I first proposed to use os.walk but after looking at @choldgraf's use of it I though it was a very bad idea to push forward, I needed few nested loops and you are joining paths all the time manually. I find the copytree implementation way cleaner. It copies the entire tree structure of the examples but only keeping the notebook files. It may feel counter intuitive to think about the ignore list but it's simpler to extend later on.
The ignore list is easier to extend and test as it is a separate function.

Regarding your use case of needing to copy other files that are not an example, the ignore list makes it easier. Unfortunately we will meet other roadblocks on that area regarding the way Sphinx-Gallery works. Examples are run in the examples_dirs, and thus look for file local to that place, in that place we put local modules, data files(like nilearn), or what ever else needed to run the script. But the gallery_dirs was only planned for web display, thus has only source code of executed file, rst representation, jupyter notebook representation and output images.

That means if we need to copy other files to give a full reproducible experience where we need local files, we would need to add an extra copy procedure to copy from the examples_dirs

This comment has been minimized.

@Titan-C

Titan-C May 25, 2018

Member

No offence is meant to @choldgraf for his implementation. That was probably the way to use os.walk, it just didn't look good to have it.

This comment has been minimized.

@choldgraf

choldgraf May 25, 2018

Contributor

No worries - I just want a clear and short path forward to get this PR merged. What shall we agree on for this particular point? Keep it the same?

This comment has been minimized.

@lesteve

lesteve May 28, 2018

Contributor

I think it is fine to keep it like this for now.

'gallery_dirs': ['ex'],
'binder': {'url': 'http://test1.com', 'org': 'org',
'repo': 'repo', 'branch': 'branch',
'dependencies': 'requirements.txt'}

This comment has been minimized.

@lesteve

lesteve May 25, 2018

Contributor

It would be nice to have a non-default notebook folder here, just to double-check that the notebook output directory is fine.

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 25, 2018

I believe that I addressed all of @lesteve 's comments (there are two remaining ones where it wasn't clear if/what I should change). Let me know what you guys think!

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 28, 2018

I ran this on scikit-learn and I am seeing this warning:

WARNING: Duplicate file name(s) found. Having duplicate file names will break some links. List of files: ['../examples/tree/plot_iris.py']

Unless I am missing something, we are now mirroring the folder structure, so duplicate file names are not a problem anymore and we should remove this logic.

@larsoner

This comment has been minimized.

Contributor

larsoner commented May 28, 2018

@Titan-C

This comment has been minimized.

Member

Titan-C commented May 28, 2018

Here we are copying the folder structure to generate the binder links. Be it for single notebook or binder lab. In this way we should not see any problems with links, which was the unpredictable part as name collisions would change the filenames when they were moved to the _downloads folder. Regardless of that, keeping this as a warning is very useful and we should keep it.

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 28, 2018

OK fine for leaving it as is. I checked and found #244 (comment) for example. Although the warnings was introduced in a binder context but you could argue that it makes sense in general.

@larsoner

This comment has been minimized.

Contributor

larsoner commented May 28, 2018

@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 28, 2018

so.......is there anything for me to do here? :-)

lesteve added some commits May 31, 2018

@lesteve

This comment has been minimized.

Contributor

lesteve commented May 31, 2018

I pushed some tweaks, merging, thanks a lot @choldgraf, great stuff !

@lesteve lesteve merged commit da0771f into sphinx-gallery:master May 31, 2018

0 of 3 checks passed

ci/circleci CircleCI is running your tests
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@choldgraf

This comment has been minimized.

Contributor

choldgraf commented May 31, 2018

woooo!!!! 🎉 🎉 🎉 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment