Speed up notebooks with Papermill to add them to CI (#820) #860
Check out this pull request on ReviewNB. You'll be able to see Jupyter notebook diffs and discuss changes. Powered by ReviewNB.
Code Climate has analyzed commit e58712a and detected 0 issues on this pull request. View more on Code Climate.
Codecov Report
@@           Coverage Diff            @@
##           develop    #860    +/-   ##
========================================
- Coverage     83.1%   82.3%   -0.8%
========================================
  Files           47      46      -1
  Lines         5490    4968    -522
========================================
- Hits          4562    4088    -474
+ Misses         928     880     -48
Continue to review full report at Codecov.
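The percentages in the report follow from the hit and line counts. A quick check of that arithmetic, assuming coverage is computed as hits divided by lines (the helper name `coverage_percent` is made up for illustration):

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage, rounded to one decimal place."""
    return round(100 * hits / lines, 1)


# Figures copied from the Codecov report.
base = coverage_percent(hits=4562, lines=5490)  # develop
head = coverage_percent(hits=4088, lines=4968)  # this pull request
delta = round(head - base, 1)

print(base, head, delta)
```

Hits and misses also sum to the line count on both sides (4562 + 928 = 5490 and 4088 + 880 = 4968), so the drop reflects the changed line counts rather than an inconsistency in the report.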
Looks good, just a question and a minor tweak.
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
-       "0.792604501607717"
+       "0.4340836012861736"
The accuracy here and on lines 805 and 1122 has dropped massively? Has something gone wrong here, or is it just very variable?
Well spotted! I'm not sure what happened here; however, I re-ran and got much better results (now pushed). It's possible that I accidentally committed results from a run with altered parameters.
Ah, right, I guess walk_length = 5 would reduce performance...
@@ -51,7 +51,8 @@ echo "+++ :python: running $f"
 cd "$(dirname "$f")"
 # run the notebook, saving it back to where it was, printing everything
 exitCode=0
-papermill --execution-timeout=600 --log-output "$f" "$f" || exitCode=$?
+# papermill will replace parameters on some notebooks to make them run faster in CI
+papermill --execution-timeout=600 -p epochs 2 -p walk_length 5 -p batch_size 5 -p n_estimators 2 -p n_predictions 2 --log-output "$f" "$f" || exitCode=$?
This might be clearer as the YAML format, e.g.:
parameters='
---
epochs: 2
walk_length: 5
batch_size: 5
n_estimators: 2
n_predictors: 2
'
papermill --execution-timeout=600 --parameters_yaml "$parameters" ...
Or, even have it as a separate file like .buildkite/notebook-parameters.yml and write:
papermill --execution-timeout=600 --parameters_file ".buildkite/notebook-parameters.yml" ...
Using a separate YAML file keeps it clearest, I think 👍
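As a rough sketch of the separate-file suggestion, the command could be assembled as below. It is written in Python purely for illustration (the CI script in this PR is shell). `build_papermill_command` and `example.ipynb` are hypothetical names; the flags are the ones quoted in this thread.

```python
import shlex
from typing import List


def build_papermill_command(
    notebook: str, parameters_file: str, timeout: int = 600
) -> List[str]:
    """Build the argument list for running one notebook in place.

    Hypothetical helper: it only assembles the arguments and does not
    invoke papermill itself.
    """
    return [
        "papermill",
        f"--execution-timeout={timeout}",
        "--parameters_file",
        parameters_file,
        "--log-output",
        notebook,  # input notebook
        notebook,  # output path: save the result back over the input
    ]


command = build_papermill_command(
    "example.ipynb",  # placeholder notebook name
    ".buildkite/notebook-parameters.yml",
)
print(shlex.join(command))
```

Keeping the values in one file means the CI script does not change when a parameter is added or tuned; only the YAML file does.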
.buildkite/notebook-parameters.yml
@@ -4,4 +4,4 @@ epochs: 2
 walk_length: 5
 batch_size: 5
 n_estimators: 2
-n_predictors: 2
+n_predictors: 1
I don't think this is the best choice unless it's a huge speed-up. My thinking is that there's more likely to be a difference between 1 and 2 than between 2 and 3, or 2 and 4. E.g. axis reductions might not happen correctly, or there might be special cases for a single predictor.
That is,
-n_predictors: 1
+n_predictors: 2
What do you think?
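To illustrate the concern, here is a contrived sketch (not code from this repository) of a reduction bug that a single predictor would hide. `buggy_mean` is a made-up function that forgets to reduce over the predictor axis:

```python
from typing import List


def correct_mean(predictions: List[List[float]]) -> List[float]:
    """Average over predictors (outer axis), keeping one value per sample."""
    n_predictors = len(predictions)
    return [sum(column) / n_predictors for column in zip(*predictions)]


def buggy_mean(predictions: List[List[float]]) -> List[float]:
    """Hypothetical bug: returns the first predictor instead of averaging."""
    return predictions[0]


one_predictor = [[0.2, 0.8]]
two_predictors = [[0.2, 0.8], [0.6, 0.4]]

# With one predictor the two functions agree, so the bug goes unnoticed.
hidden = buggy_mean(one_predictor) == correct_mean(one_predictor)
# With two predictors the results differ, so CI would catch it.
caught = buggy_mean(two_predictors) != correct_mean(two_predictors)
print(hidden, caught)
```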
Some notebooks were too slow for CI (#820). We can alter basic parameters, such as the number of epochs and walk lengths, to make them run faster and add them to CI.
This PR adds the following notebooks to CI that were skipped for #820:
Some notebooks are still too slow and will be addressed in future:
Papermill is used to substitute new (smaller) values for:
This is done by adding a special tag to a single cell in each notebook. Papermill inserts a new cell immediately after the tagged cell, and the new cell overrides the values. The inserted cell is clearly visible in the Buildkite artefact generated when running each notebook.
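The mechanism can be sketched on a notebook represented as a plain dictionary. This is a simplified illustration, not Papermill's actual implementation. It assumes the special tag is `parameters` and that the inserted cell is tagged `injected-parameters`, as described in Papermill's documentation; `inject_parameters` is a hypothetical helper and the default values (50 and 100) are invented.

```python
import copy
from typing import Any, Dict


def inject_parameters(
    notebook: Dict[str, Any], parameters: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of the notebook with an override cell inserted
    directly after the first cell tagged "parameters"."""
    result = copy.deepcopy(notebook)
    cells = result["cells"]
    for index, cell in enumerate(cells):
        tags = cell.get("metadata", {}).get("tags", [])
        if "parameters" in tags:
            # One assignment per parameter, e.g. "epochs = 2".
            source = "\n".join(
                f"{name} = {value!r}" for name, value in parameters.items()
            )
            injected = {
                "cell_type": "code",
                "metadata": {"tags": ["injected-parameters"]},
                "source": source,
                "outputs": [],
                "execution_count": None,
            }
            cells.insert(index + 1, injected)
            break  # only the first tagged cell is used in this sketch
    return result


notebook = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "epochs = 50\nwalk_length = 100",  # invented defaults
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print(epochs, walk_length)",
        },
    ]
}

ci_notebook = inject_parameters(notebook, {"epochs": 2, "walk_length": 5})
for cell in ci_notebook["cells"]:
    print(cell["metadata"].get("tags", []), repr(cell["source"]))
```

Because the injected cell runs after the tagged one, its assignments win, while the defaults stay in the notebook for anyone running it by hand.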