Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add parallel fit and predict_residuals for calculation of residuals_matrix in ConformalIntervals #3414

Merged
merged 12 commits into from Sep 21, 2022

Conversation

bethrice44
Copy link
Contributor

@bethrice44 bethrice44 commented Sep 12, 2022

What does this implement/fix? Explain your changes.

The ConformalIntervals wrapper can be slow due to the large number of fit and predict_residuals routines required to create the residuals_matrix.

This PR adds Parallel to the routine as the fit and predict_residuals routines are independent. It follows the pattern used in forecasting.base._meta.py and forecasting.model_selection._tune.py which adds a nested function for the parallelised routine.

What should a reviewer concentrate their feedback on?

Is this the right solution for this issue, or are there other better ways to speed this up. It is particularly an issue for larger training sets, even when sample_frac is set to something small (there is a limit on how small this can be without losing information).

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Change requests:

  • please add the new n_jobs parameter to the docstring. Simple copy-paste job from the other estimators with that parameter.
  • should the default be 1?

@bethrice44
Copy link
Contributor Author

Looks good!

Change requests:

  • please add the new n_jobs parameter to the docstring. Simple copy-paste job from the other estimators with that parameter.
  • should the default be 1?

Will do 👍

None = 1 in this case (and most cases for Parallel)

@bethrice44 bethrice44 requested review from fkiraly and removed request for aiwalter September 13, 2022 19:50
@fkiraly
Copy link
Collaborator

fkiraly commented Sep 13, 2022

None = 1 in this case (and most cases for Parallel)

Then it's probably fine. I was just thinking from consistency, since elsewhere in sktime it seems to be 1?

fkiraly
fkiraly previously approved these changes Sep 13, 2022
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Comments:

  • are we sure None always means 1? Your docstring says "unless in a ... context", which is not clear (what is it then?), and may mean it's not?

@bethrice44
Copy link
Contributor Author

bethrice44 commented Sep 14, 2022

Looks good.

Comments:

  • are we sure None always means 1? Your docstring says "unless in a ... context", which is not clear (what is it then?), and may mean it's not?

https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html <- Only not 1 if you overwrite with the parallel_backend() context manager

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 14, 2022

elsewhere it's 1, do you think we can leave it as is, or should we make it consistent?
Maybe None is the better choice and we should change it everywhere else?

@bethrice44
Copy link
Contributor Author

elsewhere it's 1, do you think we can leave it as is, or should we make it consistent? Maybe None is the better choice and we should change it everywhere else?

Happy to change to 1 as that seems more common default, but _tune has the default as None (where I copied from)

@fkiraly
Copy link
Collaborator

fkiraly commented Sep 18, 2022

argh, we have an inconsistent default then.
Would appreciate an issue opened, and would merge either way then, if it's already inconsistent.

Let me know which way to prefer, and which of us you prefer to open the issue.

If I don't hear from you, I'll merge this whichever way it is end of Monday (tomorrow)

@bethrice44
Copy link
Contributor Author

argh, we have an inconsistent default then. Would appreciate an issue opened, and would merge either way then, if it's already inconsistent.

Let me know which way to prefer, and which of us you prefer to open the issue.

If I don't hear from you, I'll merge this whichever way it is end of Monday (tomorrow)

More files use n_jobs=1 so lets stick with that and I'll raise an issue.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

@fkiraly fkiraly merged commit f76b706 into sktime:main Sep 21, 2022
@bethrice44 bethrice44 deleted the conformal-parallel branch September 22, 2022 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants