Add Dask Gateway example #2

Merged
merged 4 commits into from Mar 30, 2020

TomAugspurger
Contributor

@rabernat a test / example using dask. Some questions

  1. I see the other notebooks have their output included. Is that necessary? Creating the Gateway won't generally work outside of a machine that's been configured (like hub.pangeo.io)
  2. (where) are these run as part of CI / CD? Wherever that is will need to have dask-gateway installed & configured.

If this is likely to cause issues we might want to hold off on merging.

@TomAugspurger
Contributor Author

Ah cool, CI is set up here, and it did fail with ModuleNotFoundError: No module named 'dask_gateway'.
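
A minimal sketch of how a notebook could guard against this failure, assuming the goal is to skip the cluster step cleanly when `dask_gateway` is missing. The helper names `module_available` and `try_gateway_client` are hypothetical and not part of this PR; `Gateway()`, `new_cluster()`, `scale()` and `get_client()` are the dask-gateway calls as I understand them, and `Gateway()` with no arguments only works where the environment already supplies the gateway configuration (as on hub.pangeo.io).

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def try_gateway_client(n_workers: int = 2):
    """Hypothetical helper: return a Dask client, or None if dask_gateway is absent.

    Assumes the environment is preconfigured for Dask Gateway; on an
    unconfigured machine the Gateway() call itself is expected to fail.
    """
    if not module_available("dask_gateway"):
        print("dask_gateway is not installed; skipping the cluster example")
        return None

    from dask_gateway import Gateway

    gateway = Gateway()              # picks up address/auth from the environment's config
    cluster = gateway.new_cluster()  # request a new cluster from the gateway
    cluster.scale(n_workers)         # ask for a fixed number of workers
    return cluster.get_client()
```

Calling `module_available("dask_gateway")` at the top of a CI job would surface the missing dependency before any notebook is executed.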

I'm happy to help out here, but I'll wait to hear from you. I suspect you've already given some thought to running real examples that access cluster resources.

@rabernat
Contributor

All this stuff is configured here:

name: Example Gallery
description: >-
  An example gallery of notebooks used for debugging the Pangeo Gallery
  infrastructure.
gallery_repository: pangeo-gallery/pangeo-gallery
binder_url: "http://mybinder.org"
binder_repo: choldgraf/binder-sandbox
binder_ref: master
binderbot_target_branch: binderbot-built

The error occurs because the configuration is not pointing at the right binder. I would love it if you would update your PR to point at the correct binder repo (pangeo-gallery/default-binder). Let's see if we can turn it green.

@TomAugspurger
Contributor Author

TomAugspurger commented Mar 30, 2020

Perfect. Let's see if fd7a0a2 got it.

(fixed the typo in fd7a0a2)

@TomAugspurger
Contributor Author

TomAugspurger commented Mar 30, 2020

Great, binderbot finished. Last commit should get this passing (perhaps a recent update to yamllint requires that files start with ---)

##[error]missing document start "---" (document-start)
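
For reference, a rough sketch of what the `document-start` check amounts to, assuming a simplified reading of the rule: the first line that is neither blank nor a comment must be the `---` marker. The function `has_document_start` is hypothetical and is not yamllint's implementation; it ignores YAML directives such as `%YAML`.

```python
def has_document_start(text: str) -> bool:
    """Approximate check that a YAML document begins with a '---' marker.

    Blank lines and comment lines before the marker are skipped.
    This is a simplification, not yamllint's actual logic.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip leading blank lines and comments
        # The marker must sit at column 0, alone or followed by a space.
        return line.rstrip() == "---" or line.startswith("--- ")
    return False  # empty file, or only comments and blank lines
```

Under this reading, a config file whose first line is `name: Example Gallery` fails the check, and prepending a `---` line makes it pass.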

@TomAugspurger
Contributor Author

All green! (and the binderbot stage finished in <1 minute!)

@rabernat rabernat merged commit fc93bd2 into pangeo-gallery:master Mar 30, 2020
@rabernat
Contributor

I'd love some help brainstorming how to make this workflow better.

What just happened is that binderbot built all the notebooks. Now that I merged, it will build them all again on the master branch.

In the future, these notebooks might be very expensive, requiring lots of workers and a long run time. I would like to avoid running them more than is necessary. It seems like we should be able to generate an artifact for each notebook tied to a commit hash. If the notebook has not been updated in the current commit, then we can skip building it and instead download it from the artifact.

Does anyone have any idea how to make this work?
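
One possible shape for the skip-if-unchanged idea, as a sketch only. It swaps the commit hash for a content hash of each notebook's cell sources, so a notebook counts as changed only when its inputs change, not when binderbot rewrites its outputs. The manifest (a mapping from path to digest, stored alongside the built artifacts) and the names `notebook_digest` and `notebooks_to_rebuild` are assumptions, not existing binderbot features.

```python
import hashlib
import json
from pathlib import Path


def notebook_digest(path) -> str:
    """Hash only cell types and sources, so stored outputs do not affect the digest."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    sources = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        if isinstance(src, list):  # nbformat allows a list of lines
            src = "".join(src)
        sources.append([cell.get("cell_type", ""), src])
    payload = json.dumps(sources).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def notebooks_to_rebuild(paths, manifest):
    """Return the notebooks whose digest is missing from, or differs from, the manifest.

    `manifest` is a hypothetical dict of {path: digest} saved from the last build.
    """
    return [str(p) for p in paths if manifest.get(str(p)) != notebook_digest(p)]
```

A CI job could load the manifest from the previous build's artifact, run only the notebooks returned by `notebooks_to_rebuild`, and reuse the stored copies for the rest.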

@rabernat
Contributor

Also, it looks like sphinx did not like this notebook because it didn't have a title:

https://github.com/pangeo-gallery/pangeo-gallery/runs/546585339?check_suite_focus=true#step:7:31

@TomAugspurger
Contributor Author

Just to confirm, the second build is done by binderbot and pushed to binderbot-built?

The results are sitting somewhere in the user directory of the "pull request" binder user. However, that's going to go away somewhat soon after the pull request completes.

But... what if binderbot has an admin token for the jupyterhub? Then if we know the name of the binder user (how would we know this?) we could find the built notebooks and download them. Lots of holes in this idea, but it might be something.

I'll fix the example, and try to update the linter to fail CI when there's no title.
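
A sketch of what such a title check might look like, assuming "title" means a level-one Markdown heading (`# ...`) somewhere in a markdown cell. The exact rule Sphinx applies may differ, and `notebook_has_title` is a hypothetical name, not part of the existing linter.

```python
import json
import re

# A level-one heading: a single '#', whitespace, then some text.
TITLE_RE = re.compile(r"^#\s+\S")


def notebook_has_title(nb: dict) -> bool:
    """Return True if any markdown cell contains a level-one heading line."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # '# ...' in a code cell is a comment, not a title
        src = cell.get("source", "")
        if isinstance(src, list):  # nbformat allows a list of lines
            src = "".join(src)
        if any(TITLE_RE.match(line) for line in src.splitlines()):
            return True
    return False


def file_has_title(path: str) -> bool:
    """Load a notebook file from disk and apply the same check."""
    with open(path, encoding="utf-8") as f:
        return notebook_has_title(json.load(f))
```

A linter step could call `file_has_title` on each notebook and exit non-zero when it returns False, which would fail CI before the Sphinx stage.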

@TomAugspurger TomAugspurger deleted the dask-example branch March 30, 2020 19:59
@rabernat
Contributor

> the second build is done by binderbot and pushed to binderbot-built?

> the results are sitting somewhere in the user directory of the "pull request" binder user.

They are actually downloaded to the local run directory for the GitHub workflow, overwriting the original notebooks. (This is part of binderbot.) I believe this simplifies things considerably.
