Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Student's t-distribution #5050

Merged
merged 19 commits into from Aug 24, 2023
Merged

Conversation

Alex-JG3
Copy link
Contributor

@Alex-JG3 Alex-JG3 commented Aug 7, 2023

Reference Issues/PRs

Related to #4518

What does this implement/fix? Explain your changes.

Add t-distribution to the proba subpackage.

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Not yet.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, I've added myself and possibly others to the CODEOWNERS file - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

sktime/proba/t.py Outdated Show resolved Hide resolved
sktime/proba/t.py Outdated Show resolved Hide resolved
sktime/proba/t.py Outdated Show resolved Hide resolved
sktime/proba/t.py Outdated Show resolved Hide resolved
@Alex-JG3
Copy link
Contributor Author

Alex-JG3 commented Aug 7, 2023

@fkiraly, my apologies, this is basically a copy paste from the normal currently. I have only changed the pdf function at the moment. Still working through the rest. I should have made this clearer.

@fkiraly fkiraly added implementing algorithms Implementing algorithms, estimators, objects native to sktime enhancement Adding new functionality module:probability&simulation probability distributions and simulators labels Aug 8, 2023
@fkiraly
Copy link
Collaborator

fkiraly commented Aug 8, 2023

just posting this here in case you're not aware
https://xkcd.com/1347/

@fkiraly fkiraly changed the title Initialise student's t-distribution [ENH] Student's t-distribution Aug 9, 2023
@Alex-JG3 Alex-JG3 requested a review from fkiraly August 12, 2023 10:07
@Alex-JG3
Copy link
Contributor Author

@fkiraly, I couldn't find a formula for energy, if you have one I am happy to implement it. Also, from what I have read it sounds like there isn't an analytic solution for probability point function for the t-distribution. Are you aware of an analytic solution for the ppf?

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 12, 2023

I couldn't find a formula for energy, if you have one I am happy to implement it.

Thanks! It's nasty though, so don't do it if it feels to unpleasant.

Appendix A.2 of "evaluating forecasts with scoringRules" has a few explicit formulae for the energy, including the t-distribution. The expression is hard to track in implementation, so I would advise comparing against the Monte-Carlo default if you implement it.

Are you aware of an analytic solution for the ppf?

The inverse t cdf tends to appear, here's what Wolfram Alpha says: https://www.wolframalpha.com/input/?i=inverse+cdf+student%27s+t+distribution

@Alex-JG3
Copy link
Contributor Author

here's what Wolfram Alpha says: https://www.wolframalpha.com/input/?i=inverse+cdf+student%27s+t+distribution

Just added the ppf using the Wolfram Alpha link. Thanks for sharing that link.

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 20, 2023

Have you checked that it's indeed inverse to the cdf?

This should probably be part of the tests, too, note to self.

@Alex-JG3
Copy link
Contributor Author

Alex-JG3 commented Aug 20, 2023

Have you checked that it's indeed inverse to the cdf?

Here is an example showing the output of ppf is the inverse of cdf.

>>> import pandas as pd
>>> from sktime.proba.t import TDistribution
>>> n = TDistribution(mu=0, sigma=1, df=[[1, 1], [1, 1], [1, 1]])
>>> p = pd.DataFrame({"col1": [0.1, 0.3, 0.5], "col2": [0.7, 0.9, 1.0]})
>>> n.cdf(n.ppf(p))
   col1  col2
0   0.1   0.7
1   0.3   0.9
2   0.5   NaN

@Alex-JG3
Copy link
Contributor Author

@fkiraly, I am unsure where to start with the energy function. I see the section for student's t-distribution in A.2 of "evaluating forecasts with scoringRules" but, since there isn't a flashing neon sign saying "this is the energy function!" I am unsure of what needs to be implemented.

I assume the continuous ranked probability score (CRPS) is related to energy but it is not obvious to me how.

Would it be possible to get a review and perhaps merge in what is currently implemented and leave energy function for a different PR unless you want to take a look at energy on this PR?

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 21, 2023

Would it be possible to get a review and perhaps merge in what is currently implemented and leave energy function for a different PR unless you want to take a look at energy on this PR?

Sure!

Have you checked why the tests are failing?
There's something going on with indexing. Happy to have a more detailed look if you're not sure what is wrong here.

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 21, 2023

I assume the continuous ranked probability score (CRPS) is related to energy but it is not obvious to me how.

Yes, the general formula is CRPS(d.cdf, y) = d.energy(y) - 0.5 * d.energy(). Unfortunately the reference only gives CRPS, but the scaling behaviour of the terms (d.energy() does not depend on y and is proportional to d.std()) makes me think that the positive terms are d.energy(y), and then negative term is - 0.5 * d.energy(). If not, d.energy(y) should be the positive terms minus a constant multiple of the negative term (the coefficient should be "simple" and guessable by comparison with the Monte Carlo estimate).

As said, we can leave it for now - no worries. It's unpleasant as I alluded to above (and in addition, unfortunately, not 100% clear from the reference).

@fkiraly
Copy link
Collaborator

fkiraly commented Aug 21, 2023

Btw @Alex-JG3, thanks a lot!
What would also be useful is feedback on the implementer experience - e.g., is there boilerplate that is unclear, or interface elements that you think would be better changed?

sktime/proba/t.py Outdated Show resolved Hide resolved
sktime/proba/t.py Outdated Show resolved Hide resolved
@Alex-JG3 Alex-JG3 marked this pull request as ready for review August 23, 2023 16:19
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, really nice!

I haven't checked all the formulae, but I did spot checks and it looked good.

Would you be so kind and also open a PR on skpro?
See sktime/skpro#22
Just copy-paste this in the appropriate folder and change the BaseDistribution import location, the test framework will automatically do the rest.

If you're interested, you could:

(happy to chat on discord for planning!)

@fkiraly fkiraly merged commit 1826772 into sktime:main Aug 24, 2023
24 checks passed
fkiraly pushed a commit to sktime/skpro that referenced this pull request Aug 27, 2023
<!--
Thanks for contributing a pull request! Please ensure you have taken a
look
at our contribution guide:
https://skbase.readthedocs.io/en/latest/contribute.html
-->

#### Reference Issues/PRs
<!--
Example: Fixes #1234. See also #3456.

Please use keywords (e.g., Fixes) to create link to the issues or pull
requests
you resolved, so that they will automatically be closed when your pull
request
is merged. See
https://github.com/blog/1506-closing-issues-via-pull-requests
-->

Mirror of `sktime` sktime/sktime#5050. Towards #22


#### What does this implement/fix? Explain your changes.
<!--
A clear and concise description of what you have implemented. Remember
to implement
unit tests and docstrings if your pull request commits code to the
repository.
-->

Add student's t-distribution.
yarnabrina added a commit to yarnabrina/sktime-fork that referenced this pull request Aug 27, 2023
…encies

* origin/main:
  [BUG] Fix tag to indicate support of exogenous features by `NaiveForecaster` (sktime#5162)
  [BUG] fix check causing exception in `ConformalIntervals` in `_predict` (sktime#5134)
  [ENH] empirical distribution (sktime#5094)
  [MNT] Update versions of pre commit hooks and fix `E721` issues pointed out by `flake8` (sktime#5163)
  [ENH] `tslearn` distances and kernels including adapter (sktime#5039)
  [ENH] Student's t-distribution (sktime#5050)
  [MNT] bound `statsforecast<1.6.0` due to recent failures (sktime#5149)
  [ENH] widen scope of conditional test execution (sktime#5135)
  [MNT] [Dependabot](deps-dev): Update sphinx-gallery requirement from <0.14.0 to <0.15.0 (sktime#5124)
  [MNT] upgrade CI runners to latest stable images (sktime#5031)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality implementing algorithms Implementing algorithms, estimators, objects native to sktime module:probability&simulation probability distributions and simulators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants