[ENH] Add parallel `fit` and `predict_residuals` for calculation of `residuals_matrix` in `ConformalIntervals` #3414

bethrice44 · 2022-09-12T13:35:59Z

What does this implement/fix? Explain your changes.

The ConformalIntervals wrapper can be slow due to the large number of fit and predict_residuals routines required to create the residuals_matrix.

This PR adds `Parallel` (from joblib) to the routine, since the individual fit and predict_residuals calls are independent of each other. It follows the pattern used in forecasting.base._meta.py and forecasting.model_selection._tune.py, which add a nested function for the parallelised routine.
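As a rough illustration of that pattern (not the code in this PR), here is a minimal sketch with a nested helper dispatched through joblib's `Parallel` and `delayed`. `MeanForecaster`, `compute_residual_rows`, and the `cutoffs` argument are hypothetical names invented for this example, and the threading backend is chosen only to keep the toy example lightweight.

```python
from joblib import Parallel, delayed


class MeanForecaster:
    """Toy stand-in for a forecaster (hypothetical): predicts the training mean."""

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict_residuals(self, y):
        return [value - self.mean_ for value in y]


def compute_residual_rows(y, cutoffs, n_jobs=1):
    """Fit on y[:cutoff] and compute residuals on y[cutoff:], once per cutoff."""

    def _fit_and_residuals(cutoff):
        # Each call builds its own forecaster, so the calls share no state
        # and can safely run in parallel.
        forecaster = MeanForecaster().fit(y[:cutoff])
        return forecaster.predict_residuals(y[cutoff:])

    # Results are returned in the same order as the cutoffs.
    return Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(_fit_and_residuals)(cutoff) for cutoff in cutoffs
    )


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rows = compute_residual_rows(y, cutoffs=[2, 3, 4], n_jobs=2)
```

Because every cutoff is handled independently, running with `n_jobs=1` or `n_jobs=2` gives the same rows; only the wall-clock time should differ.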

What should a reviewer concentrate their feedback on?

Is this the right solution for this issue, or are there better ways to speed this up? Slow runtime is particularly an issue for larger training sets, even when sample_frac is set to something small (there is a limit on how small this can be without losing information).

PR checklist

For all contributions

I've added myself to the list of contributors.
Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
I've added unit tests and made sure they pass locally.
The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.

For new estimators

I've added the estimator to the online documentation.
I've updated the existing example notebooks or provided a new one to showcase how my estimator works.


fkiraly

Looks good!

Change requests:

please add the new n_jobs parameter to the docstring. Simple copy-paste job from the other estimators with that parameter.
should the default be 1?
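For reference, a docstring entry of the kind requested might look like the sketch below. The wording follows the scikit-learn convention for `n_jobs` and is an assumption, not necessarily the text used elsewhere in sktime; `IntervalWrapperSketch` is a hypothetical class used only to carry the docstring.

```python
class IntervalWrapperSketch:
    """Hypothetical wrapper, used only to illustrate an ``n_jobs`` docstring entry.

    Parameters
    ----------
    n_jobs : int or None, optional (default=None)
        The number of jobs to run in parallel.
        ``None`` means 1 unless in a ``joblib.parallel_backend`` context.
        ``-1`` means using all processors.
    """

    def __init__(self, n_jobs=None):
        # Stored unchanged, following the scikit-learn convention for parameters.
        self.n_jobs = n_jobs


wrapper = IntervalWrapperSketch()
```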

bethrice44 · 2022-09-13T19:49:44Z

> Looks good!
>
> Change requests:
>
> please add the new n_jobs parameter to the docstring. Simple copy-paste job from the other estimators with that parameter.
>
> should the default be 1?

Will do 👍

None = 1 in this case (and most cases for Parallel)

fkiraly · 2022-09-13T22:17:55Z

> None = 1 in this case (and most cases for Parallel)

Then it's probably fine. I was just thinking about consistency, since elsewhere in sktime it seems to be 1?

fkiraly

Looks good.

Comments:

are we sure None always means 1? Your docstring says "unless in a ... context", which is not clear (what is it then?), and may mean it's not?

bethrice44 · 2022-09-14T08:37:25Z

> Looks good.
>
> Comments:
>
> are we sure None always means 1? Your docstring says "unless in a ... context", which is not clear (what is it then?), and may mean it's not?

According to https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html, None is only different from 1 if you override it with the parallel_backend() context manager.
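A quick way to check this locally is sketched below, using joblib's `effective_n_jobs` helper to resolve `None` both outside and inside a `parallel_backend` context. The threading backend and `n_jobs=2` are arbitrary choices for illustration, and the resolved values may depend on the installed joblib version.

```python
from joblib import effective_n_jobs, parallel_backend

# Outside any context manager, n_jobs=None resolves to a single job.
outside = effective_n_jobs(None)

# Inside parallel_backend, n_jobs=None picks up the n_jobs set by the context.
with parallel_backend("threading", n_jobs=2):
    inside = effective_n_jobs(None)

print(outside, inside)
```

So a default of None only behaves differently from a default of 1 when the caller has explicitly opted into a backend context.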

fkiraly · 2022-09-14T16:44:51Z

elsewhere it's 1, do you think we can leave it as is, or should we make it consistent?
Maybe None is the better choice and we should change it everywhere else?

bethrice44 · 2022-09-15T08:15:37Z

> elsewhere it's 1, do you think we can leave it as is, or should we make it consistent? Maybe None is the better choice and we should change it everywhere else?

Happy to change to 1, as that seems to be the more common default, but _tune has the default as None (which is where I copied it from).

fkiraly · 2022-09-18T20:05:24Z

argh, we have an inconsistent default then.
Would appreciate an issue opened, and would merge either way then, if it's already inconsistent.

Let me know which way to prefer, and which of us you prefer to open the issue.

If I don't hear from you, I'll merge this whichever way it is end of Monday (tomorrow)

bethrice44 · 2022-09-20T08:25:31Z

> argh, we have an inconsistent default then. Would appreciate an issue opened, and would merge either way then, if it's already inconsistent.
>
> Let me know which way to prefer, and which of us you prefer to open the issue.
>
> If I don't hear from you, I'll merge this whichever way it is end of Monday (tomorrow)

More files use n_jobs=1, so let's stick with that, and I'll raise an issue.

fkiraly

Great, thanks!

bethrice44 and others added 9 commits May 27, 2022 14:44

Align with NoaiveVariance and add sample_frac option

81dc2b1

Fix conflict

d035fb4

Merge branch 'alan-turing-institute-main'

d519cec

Merge pull request #2 from alan-turing-institute/main

1194f5b

Update fork with main

Merge remote-tracking branch 'upstream/main'

1e2b9c5

Merge remote-tracking branch 'upstream/main'

275c2c4

Merge remote-tracking branch 'upstream/main'

004ce36

Add parallel fitting and predicting residuals for matrix

27ca19d

Merge remote-tracking branch 'upstream/main' into conformal-parallel

d71fa49

bethrice44 requested review from fkiraly and aiwalter as code owners September 12, 2022 13:35

Add n_jobs attribute

016afc3

fkiraly requested changes Sep 13, 2022


Add n_jobs docstring

c68b1a3

bethrice44 requested review from fkiraly and removed request for aiwalter September 13, 2022 19:50

fkiraly previously approved these changes Sep 13, 2022


bethrice44 mentioned this pull request Sep 20, 2022

[MNT] Inconsistent default for the n_jobs argument #3448

Open

Change default to 1

e0d46f9

bethrice44 dismissed fkiraly’s stale review via e0d46f9 September 20, 2022 08:27

bethrice44 requested a review from fkiraly September 20, 2022 09:09

fkiraly approved these changes Sep 21, 2022


fkiraly merged commit f76b706 into sktime:main Sep 21, 2022

bethrice44 deleted the conformal-parallel branch September 22, 2022 07:50
