Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up notebooks with Papermill to add them to CI (#820) #860

Merged
merged 21 commits into from Feb 13, 2020

Conversation

timpitman
Copy link
Contributor

@timpitman timpitman commented Feb 13, 2020

Some notebooks were too slow for CI #820 - we can alter basic parameters like number of epochs, walk lengths etc to make them run faster and add to CI.

This PR adds the following notebooks to CI that were skipped for #820:

calibration-pubmed-node-classification.ipynb
ensemble-link-prediction-example.ipyn
stellargraph-node2vec-weighted-random-walks.ipynb
ensemble-node-classification-example.ipynb

Some notebooks are still too slow and will be addressed in future:

calibration-pubmed-link-prediction.ipynb
stellargraph-metapath2vec.ipynb
movielens-recommender.ipynb

Papermill is used to substitute new (smaller) values for:

epochs
walk_length
batch_size
n_estimators
n_predictions

This is done by adding a special tag to a single cell on each of the notebooks - Papermill inserts a new cell immediately after which replaces the values - this can be seen by viewing the Buildkite artefact generated when running each notebook, which will clearly show the inserted cell.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

@codeclimate
Copy link

codeclimate bot commented Feb 13, 2020

Code Climate has analyzed commit e58712a and detected 0 issues on this pull request.

View more on Code Climate.

@timpitman timpitman changed the title Speed up notebooks with Papermill to add them to CI Speed up notebooks with Papermill to add them to CI (#820) Feb 13, 2020
@timpitman timpitman requested a review from huonw February 13, 2020 03:01
@timpitman timpitman marked this pull request as ready for review February 13, 2020 03:05
@codecov-io
Copy link

codecov-io commented Feb 13, 2020

Codecov Report

Merging #860 into develop will decrease coverage by 0.8%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           develop    #860     +/-   ##
=========================================
- Coverage     83.1%   82.3%   -0.8%     
=========================================
  Files           47      46      -1     
  Lines         5490    4968    -522     
=========================================
- Hits          4562    4088    -474     
+ Misses         928     880     -48
Impacted Files Coverage Δ
stellargraph/core/graph_networkx.py
stellargraph/core/graph.py 98.9% <0%> (+0.3%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5c611a5...05c3a50. Read the comment docs.

Copy link
Member

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just question and a minor tweak.

"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.792604501607717"
"0.4340836012861736"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The accuracy here and on lines 805 and 1122 has dropped massively? Has something gone wrong here, or is it just very variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well spotted! I'm not sure what happened here, however I re-ran and got much better results (now pushed); it's possible that I accidentally committed results from a run with altered parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, I guess walk_length = 5 would reduce performance...

@@ -51,7 +51,8 @@ echo "+++ :python: running $f"
cd "$(dirname "$f")"
# run the notebook, saving it back to where it was, printing everything
exitCode=0
papermill --execution-timeout=600 --log-output "$f" "$f" || exitCode=$?
# papermill will replace parameters on some notebooks to make them run faster in CI
papermill --execution-timeout=600 -p epochs 2 -p walk_length 5 -p batch_size 5 -p n_estimators 2 -p n_predictions 2 --log-output "$f" "$f" || exitCode=$?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be clearer as the YAML format, e.g.:

parameters='
---
epochs: 2
walk_length: 5
batch_size: 5
n_estimators: 2
n_predictors: 2
'

papermill --execution-timeout=600 --parameters_yaml "$parameters" ...

Or, even have it as a separate file like .buildkite/notebook-parameters.yml and write:

papermill --execution-timeout=600 --parameters_file ".buildkite/notebook-parameters.yml" ...

Copy link
Contributor Author

@timpitman timpitman Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a separate YAML file keeps it clearest, I think 👍

@timpitman timpitman requested a review from huonw February 13, 2020 03:34
@@ -4,4 +4,4 @@ epochs: 2
walk_length: 5
batch_size: 5
n_estimators: 2
n_predictors: 2
n_predictors: 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the best unless it's a huge speed-up. My thinking is that there's more likely to be a difference between 1 and 2, than between 2 and 3, or 2 and 4. E.g. axis reductions might not happen correctly, or there might be special cases for a single predictor.

That is,

Suggested change
n_predictors: 1
n_predictors: 2

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants