Speed up notebooks with Papermill to add them to CI (#820) #860

timpitman · 2020-02-13T01:00:04Z

Some notebooks were too slow to run on CI (#820). We can reduce basic parameters, such as the number of epochs and the walk lengths, to make them run faster and then add them to CI.

This PR adds the following notebooks to CI, all of which were previously skipped because of #820:

calibration-pubmed-node-classification.ipynb
ensemble-link-prediction-example.ipynb
stellargraph-node2vec-weighted-random-walks.ipynb
ensemble-node-classification-example.ipynb

Some notebooks are still too slow; they will be addressed in the future:

calibration-pubmed-link-prediction.ipynb
stellargraph-metapath2vec.ipynb
movielens-recommender.ipynb

Papermill is used to substitute new (smaller) values for:

epochs
walk_length
batch_size
n_estimators
n_predictions

This is done by adding Papermill's special "parameters" tag to a single cell in each notebook. Papermill then inserts a new cell immediately after the tagged cell, which overrides the values. The inserted cell is clearly visible in the Buildkite artefact generated when each notebook is run.
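To make the mechanism concrete, here is a simplified imitation in plain Python of what happens to the notebook JSON. It is not Papermill's implementation: the helpers cell_source, tag_parameters_cell and inject_parameters and the toy notebook are hypothetical. Only the "parameters" and "injected-parameters" tag names follow Papermill's conventions. The point it shows is ordering: the injected cell runs after the defaults, so later cells see the smaller CI values.

```python
# Simplified imitation of Papermill's parameter injection (NOT Papermill's code).
# A notebook is represented as the dict you would get from json.load() on an .ipynb file.


def cell_source(cell: dict) -> str:
    """Return a cell's source as one string (.ipynb stores a list of lines or a str)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def tag_parameters_cell(nb: dict, marker: str) -> int:
    """Hypothetical helper: tag the first code cell containing `marker` with 'parameters'.

    In this PR the tag is added by hand in Jupyter; this just scripts the same edit.
    """
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] == "code" and marker in cell_source(cell):
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if "parameters" not in tags:
                tags.append("parameters")
            return i
    raise ValueError(f"no code cell contains {marker!r}")


def inject_parameters(nb: dict, params: dict) -> int:
    """Hypothetical helper: insert a cell of overriding assignments directly after the
    cell tagged 'parameters', tagged 'injected-parameters'. Returns the new index."""
    idx = next(
        i
        for i, cell in enumerate(nb["cells"])
        if "parameters" in cell.get("metadata", {}).get("tags", [])
    )
    injected = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": [f"{name} = {value!r}\n" for name, value in params.items()],
    }
    nb["cells"].insert(idx + 1, injected)
    return idx + 1


# Usage: a tiny notebook whose defaults are too slow for CI.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo\n"]},
        {"cell_type": "code", "metadata": {}, "source": ["epochs = 50\n", "walk_length = 100\n"]},
        {"cell_type": "code", "metadata": {}, "source": ["result = (epochs, walk_length)\n"]},
    ]
}
tag_parameters_cell(notebook, "epochs =")
inject_parameters(notebook, {"epochs": 2, "walk_length": 5})

# Execute the code cells in order: the injected assignments run after the defaults.
namespace: dict = {}
for cell in notebook["cells"]:
    if cell["cell_type"] == "code":
        exec(cell_source(cell), namespace)
print(namespace["result"])
```

Because only the injected cell changes, the notebook still runs with its original defaults when opened normally; the overrides exist only in the CI run's output artefact.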

review-notebook-app · 2020-02-13T01:00:11Z

Check out this pull request on ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

codeclimate · 2020-02-13T01:00:38Z

Code Climate has analyzed commit e58712a and detected 0 issues on this pull request.

View more on Code Climate.

codecov-io · 2020-02-13T03:08:49Z

Codecov Report

Merging #860 into develop will decrease coverage by 0.8%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           develop    #860     +/-   ##
=========================================
- Coverage     83.1%   82.3%   -0.8%     
=========================================
  Files           47      46      -1     
  Lines         5490    4968    -522     
=========================================
- Hits          4562    4088    -474     
+ Misses         928     880     -48

Impacted Files	Coverage Δ
stellargraph/core/graph_networkx.py
stellargraph/core/graph.py	`98.9% <0%> (+0.3%)`	⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5c611a5...05c3a50.

huonw

Looks good, just a question and a minor tweak.

huonw · 2020-02-13T03:07:23Z

demos/node-classification/node2vec/stellargraph-node2vec-weighted-random-walks.ipynb

   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "0.792604501607717"
+       "0.4340836012861736"


The accuracy here and on lines 805 and 1122 has dropped massively. Has something gone wrong, or is it just very variable?

Well spotted! I'm not sure what happened here, but I re-ran it and got much better results (now pushed). It's possible that I accidentally committed results from a run with altered parameters.

Ah, right, I guess walk_length = 5 would reduce performance...

huonw · 2020-02-13T03:10:46Z

.buildkite/steps/test-demo-notebooks.sh

@@ -51,7 +51,8 @@ echo "+++ :python: running $f"
 cd "$(dirname "$f")"
 # run the notebook, saving it back to where it was, printing everything
 exitCode=0
-papermill --execution-timeout=600 --log-output "$f" "$f" || exitCode=$?
+# papermill will replace parameters on some notebooks to make them run faster in CI
+papermill --execution-timeout=600 -p epochs 2 -p walk_length 5 -p batch_size 5 -p n_estimators 2 -p n_predictions 2 --log-output "$f" "$f" || exitCode=$?


This might be clearer in YAML format, e.g.:

parameters='
---
epochs: 2
walk_length: 5
batch_size: 5
n_estimators: 2
n_predictors: 2
'
papermill --execution-timeout=600 --parameters_yaml "$parameters" ...

Or even put it in a separate file, like .buildkite/notebook-parameters.yml, and write:

papermill --execution-timeout=600 --parameters_file ".buildkite/notebook-parameters.yml" ...

Using a separate YAML file keeps it clearest, I think 👍
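The difference between the two invocations can be seen in a small Python sketch that only builds the argument lists (it does not run Papermill). The flags --execution-timeout, -p, --parameters_file and --log-output are the ones used in this thread. The helper names, the CI_PARAMETERS dict and demo.ipynb are hypothetical. The values are taken from the -p flags in the original CI command.

```python
import shlex

# Values from the -p flags in the original CI command (n_predictions, as in the PR description).
CI_PARAMETERS = {
    "epochs": 2,
    "walk_length": 5,
    "batch_size": 5,
    "n_estimators": 2,
    "n_predictions": 2,
}


def inline_argv(notebook: str, params: dict, timeout: int = 600) -> list:
    """Every parameter spelled out as a repeated `-p NAME VALUE` pair."""
    argv = ["papermill", f"--execution-timeout={timeout}"]
    for name, value in params.items():
        argv += ["-p", name, str(value)]
    return argv + ["--log-output", notebook, notebook]


def to_flat_yaml(params: dict) -> str:
    """Render a flat mapping of numbers as YAML (not a general-purpose YAML writer)."""
    return "".join(f"{name}: {value}\n" for name, value in params.items())


def file_argv(notebook: str, params_file: str, timeout: int = 600) -> list:
    """The same run, with the values read from a YAML file via --parameters_file."""
    return [
        "papermill", f"--execution-timeout={timeout}",
        "--parameters_file", params_file,
        "--log-output", notebook, notebook,
    ]


nb = "demo.ipynb"  # hypothetical notebook name
print(shlex.join(inline_argv(nb, CI_PARAMETERS)))
print(to_flat_yaml(CI_PARAMETERS), end="")
print(shlex.join(file_argv(nb, ".buildkite/notebook-parameters.yml")))
```

With the file-based form, the script's command line stays the same length no matter how many parameters are added, and the values can be reviewed in one place.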

huonw · 2020-02-13T03:55:11Z

.buildkite/notebook-parameters.yml

@@ -4,4 +4,4 @@ epochs: 2
 walk_length: 5
 batch_size: 5
 n_estimators: 2
-n_predictors: 2
+n_predictors: 1


I don't think this is the best choice unless it gives a huge speed-up. My thinking is that there's more likely to be a difference between 1 and 2 than between 2 and 3, or 2 and 4. E.g. axis reductions might not happen correctly, or there might be special cases for a single predictor.

That is,

Suggested change

- n_predictors: 1
+ n_predictors: 2

What do you think?
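One concrete example of a single-predictor special case comes from Python's standard library: statistics.stdev computes the sample standard deviation, which needs at least two data points. The summarise helper below is hypothetical and is not taken from StellarGraph's ensemble code. It is a sketch of a summary step over repeated predictions, showing that n_predictions = 2 exercises the general path while n_predictions = 1 takes a different one.

```python
import statistics


def summarise(predictions: list) -> tuple:
    """Hypothetical summary of one sample's predictions across n_predictions repeats:
    the mean and the sample standard deviation."""
    return statistics.mean(predictions), statistics.stdev(predictions)


# n_predictions = 2 goes through the general code path.
mean, std = summarise([0.7, 0.9])
print(f"mean={mean:.2f} std={std:.4f}")

# n_predictions = 1 hits a special case: the sample standard deviation is undefined.
try:
    summarise([0.7])
    single_ok = True
except statistics.StatisticsError as err:
    single_ok = False
    print("n_predictions = 1 fails:", err)
```

A CI setting of 1 would therefore test different behaviour from the values the notebooks use in practice, whereas 2 exercises the same reductions as larger values.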

timpitman added 10 commits February 12, 2020 12:26

test param override

9fffedf

fix typo

1322da4

speed up notebooks for ci

14cf5d9

speed up ensemble notebooks for ci

66ebc3a

speed up notebook for ci

4156366

reduce batch size for ensembles

5e83d55

reduce batch size for ensembles

8fa9ad2

reduce estimators for ensembles

b86dcec

reduce predictors for ensembles

87fba3d

more papermill tweaks

2e9568b

timpitman added 6 commits February 13, 2020 12:05

reduce walk length

5678dbb

comment

1277e7a

decrease walk length to speed up metapath2vec

e0d5d76

revert calibration notebook change

2bd90a1

further reduction in params

15bd5a7

skip testing notebooks that are still slow

0440d93

timpitman changed the title ~~Speed up notebooks with Papermill to add them to CI~~ Speed up notebooks with Papermill to add them to CI (#820) Feb 13, 2020

timpitman mentioned this pull request Feb 13, 2020

Some notebooks take a long time to run on CI and therefore aren't tested #820

Closed

timpitman requested a review from huonw February 13, 2020 03:01

timpitman marked this pull request as ready for review February 13, 2020 03:05

huonw reviewed Feb 13, 2020


move papermill params to a YAML file

0dbf46a

huonw mentioned this pull request Feb 13, 2020

Run memory-heavy notebooks on CI, post-networkx removal #863

Closed

re-ran notebook to get correct output

05c3a50

timpitman requested a review from huonw February 13, 2020 03:34

huonw approved these changes Feb 13, 2020


slight speed up for ensemble notebooks

45641ba

huonw reviewed Feb 13, 2020


timpitman added 2 commits February 13, 2020 14:59

speed up emsemble notebooks

40343ad

fix wrong param name to speed up emsembles

e58712a

timpitman merged commit a120fa0 into develop Feb 13, 2020

timpitman deleted the feature/papermill-speedup branch February 13, 2020 04:18

timpitman mentioned this pull request Feb 16, 2020

Add remaining notebooks to CI that were too slow to test (#820) #874

Merged

timpitman mentioned this pull request Mar 9, 2020

Feature/node2vec for issue Word2Vec in StellarGraph #255 #536

Merged
