
Implement betaincinv and gammainc[c]inv functions #502

Merged
merged 18 commits into pymc-devs:main from beta_gamma_incinv on Jan 2, 2024

Conversation

amyoshino (Member)

Motivation for these changes

Closes #413

Implement betaincinv, gammaincinv and gammainccinv functions.
The implementation of these functions will allow us to implement quantile/ICDF functions for distributions such as Beta, Gamma, ChiSquared and StudentT, and is fundamental to pymc-devs/pymc#6845.

Implementation details

Following the implementations of other scalar.math functions, the code is extended to call the corresponding functions from scipy.special.
Tests were added to check that the results match the SciPy implementation.
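For reference, a minimal round-trip illustration of what these SciPy routines compute (pure SciPy, independent of the PyTensor wrappers added here; the new tests compare the Ops against these functions):

import numpy as np
from scipy import special

a, b, q = 2.0, 3.0, 0.7

# gammaincinv / gammainccinv invert the regularized lower / upper incomplete gamma
x = special.gammaincinv(a, q)
assert np.isclose(special.gammainc(a, x), q)
x_c = special.gammainccinv(a, q)
assert np.isclose(special.gammaincc(a, x_c), q)

# betaincinv inverts the regularized incomplete beta
y = special.betaincinv(a, b, q)
assert np.isclose(special.betainc(a, b, y), q)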

Checklist

Major / Breaking Changes

  • ...

New features

  • ...

Bugfixes

  • ...

Documentation

  • ...

Maintenance

  • ...

@amyoshino (Member Author) commented Nov 16, 2023

I was finally able to get back to this; apologies for resuming it so late.

I have only implemented the "impl" method so far; gradients are not implemented, and I am not sure whether they are needed for these inverse regularized functions.

Let me know what can be improved here.

@codecov-commenter commented Nov 16, 2023

Codecov Report

Merging #502 (753d150) into main (230a808) will increase coverage by 0.00%.
The diff coverage is 88.23%.

Additional details and impacted files


@@           Coverage Diff           @@
##             main     #502   +/-   ##
=======================================
  Coverage   80.77%   80.77%           
=======================================
  Files         162      162           
  Lines       46105    46122   +17     
  Branches    11266    11267    +1     
=======================================
+ Hits        37240    37255   +15     
- Misses       6638     6640    +2     
  Partials     2227     2227           
Files Coverage Δ
pytensor/scalar/math.py 89.22% <88.23%> (-0.02%) ⬇️

@amyoshino changed the title from "Beta gamma incinv" to "Implement betaincinv and gammaincinv functions" on Nov 16, 2023
@ricardoV94 (Member) left a review comment

Thanks for picking it up!

Inline review comments on tests/scalar/test_math.py and pytensor/scalar/math.py (resolved).
@ricardoV94 (Member)

@amyoshino here is a more typical PR to add new scalar/elemwise Ops that you can use as a template: 2dc7591

@ricardoV94 (Member)

For JAX dispatching, it seems like these exist in tensorflow-probability:

https://www.tensorflow.org/probability/api_docs/python/tfp/substrates/jax/math/betaincinv
https://www.tensorflow.org/probability/api_docs/python/tfp/substrates/jax/math/igammainv

We can do something like https://github.com/pymc-devs/pytensor/pull/403/files to provide them in the JAX backend
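A rough sketch of what such a dispatch could look like, following that pattern (the Op name BetaIncInv and its location in pytensor.scalar.math are assumptions here; the tfp substrate call matches the docs linked above):

from pytensor.link.jax.dispatch import jax_funcify
from pytensor.scalar.math import BetaIncInv  # assumed name of the new scalar Op
from tensorflow_probability.substrates import jax as tfp_jax


@jax_funcify.register(BetaIncInv)
def jax_funcify_BetaIncInv(op, **kwargs):
    # Delegate to tensorflow-probability's JAX substrate implementation
    def betaincinv(a, b, x):
        return tfp_jax.math.betaincinv(a, b, x)

    return betaincinv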

@amyoshino (Member Author) commented Nov 16, 2023

@amyoshino here is a more typical PR to add new scalar/elemwise Ops that you can use as a template: 2dc7591

I see I am missing quite a few things, thanks for the guidance!

@amyoshino (Member Author) commented Nov 26, 2023

I have added the first derivatives for betaincinv, gammaincinv (lower) and gammainccinv (upper) and tried to set up some Elemwise tests.

I have not set up the JAX dispatching yet, but plan to do so in the next push.

Thanks for all your guidance, and I am hoping to improve it soon!

@amyoshino (Member Author) commented Dec 19, 2023

@ricardoV94 Phew, what a journey getting all the tests to pass for this PR. I believe it is now ready for review.
Again, sorry for the slow pace on this issue.

Let me use this comment to summarize all the steps and assumptions used in this PR; I hope it helps the review process.

Overall, I implemented wrappers for scipy.special.gammaincinv, scipy.special.gammainccinv and scipy.special.betaincinv with:

  • First derivatives implemented in the grad function, with respect to the arguments for which I could find closed-form solutions (spelled out below); for the remaining arguments I used grad_not_implemented.
  • JAX dispatching included where available.
  • Tests covering outputs and broadcasting.

In order to write the derivative of scipy.special.betaincinv w.r.t. x, I needed to add scipy.special.beta (the Beta function), and with it its derivatives w.r.t. a and b and the required tests. I wasn't able to find a JAX version of this function, so I haven't added that either.
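For reference, the implemented closed forms follow from the inverse function theorem (restated here, not copied from the diff). Writing $x = P^{-1}(a, q)$ for gammaincinv, where $P(a, x)$ is the regularized lower incomplete gamma function, and $y = I^{-1}_{a,b}(q)$ for betaincinv, where $I_y(a, b)$ is the regularized incomplete beta function:

$$\frac{\partial x}{\partial q} = \frac{1}{\partial P(a, x) / \partial x} = \Gamma(a)\, e^{x}\, x^{1-a}, \qquad \frac{\partial y}{\partial q} = \frac{1}{\partial I_y(a, b) / \partial y} = B(a, b)\, y^{1-a} (1 - y)^{1-b}$$

This is where the Beta function B(a, b) enters the betaincinv gradient; the gammainccinv case is analogous up to a sign, since Q(a, x) = 1 − P(a, x).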

References

Derivative of:

I wasn't able to implement the derivatives of the inverse regularized gamma function with respect to its first argument k, or of the inverse regularized beta function with respect to its first and second arguments a and b, because the closed forms require the generalized hypergeometric function with arguments not implemented even in SciPy; see for example https://www.wolframalpha.com/input?i=D%5BInverseGammaRegularized%5Ba%2C+s%5D%2C+a%5D

This PR will help with pymc-devs/pymc#6845, which will benefit from the implemented functions for the icdf formulas.
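As a quick illustration of that connection (pure SciPy here, but these are the same identities the icdf implementations would rely on):

import numpy as np
from scipy import special, stats

q, a, b, df = 0.3, 2.5, 4.0, 7.0

# Beta icdf is betaincinv directly
assert np.isclose(stats.beta.ppf(q, a, b), special.betaincinv(a, b, q))

# Gamma icdf (unit scale) is gammaincinv
assert np.isclose(stats.gamma.ppf(q, a), special.gammaincinv(a, q))

# Chi-squared is a scaled Gamma, so its icdf follows as well
assert np.isclose(stats.chi2.ppf(q, df), 2 * special.gammaincinv(df / 2, q))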

@ricardoV94 (Member) commented Dec 22, 2023

@amyoshino sorry, I still didn't have time to review; just this comment caught my attention:

I needed to add scipy.special.beta (the Beta function), and with it its derivatives w.r.t. a and b and the required tests. I wasn't able to find a JAX version of this function, so I haven't added that either.

Instead of implementing a new Op, could we try to use this form instead?

def beta(a, b):
    return (gamma(a) * gamma(b)) / gamma(a + b)

If it proves too unstable, we can try to use the log-form with gammaln and exponentiate the output.
If that also proves too unstable we can reassess :)

This saves us from having to maintain another Op and its grads, plus missing C/JAX implementations.

We can also add a user-facing helper in tensor.math (the scalar version used in the grads wouldn't be user-facing). Here is an example where we do something like that:

def poch(z, m):
    """
    Pochhammer symbol (rising factorial) function.
    """
    return gamma(z + m) / gamma(z)
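For concreteness, the log-form fallback mentioned above could look roughly like this (a sketch using the tensor-level exp and gammaln; the scalar-level version used in the grads would be analogous):

from pytensor.tensor import exp, gammaln


def beta_from_gammaln(a, b):
    # B(a, b) = exp(lgamma(a) + lgamma(b) - lgamma(a + b)), avoiding the overflow of gamma
    return exp(gammaln(a) + gammaln(b) - gammaln(a + b))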

@amyoshino (Member Author)

@ricardoV94 enjoy your holiday time, no worries about it. I delayed the progress of this PR for so long that it would make no sense for me to expect a quick response here 🥲

Nice suggestions! I will make the changes in the code shortly, and for safety I'll check whether the proposed solution is unstable in any way. I will keep you posted on the progress.

@amyoshino (Member Author) commented Dec 23, 2023

@ricardoV94, I investigated the stability of using the proposed form of the beta function, and here are my findings so far.

Since the connection between the beta and gamma functions is valid only when x > 0 and y > 0 (reference), the approximation works well under this condition, as expected, and implementing it with gammaln is preferred over gamma because of numerical instability for large values:

The gamma-based form is unstable when the arguments have "high" values.

The gammaln-based approximation is great (only one tiny difference found over a relatively large range of arguments).
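A quick NumPy/SciPy check along the lines of the comparison above (illustrative values, not the exact ranges originally tested):

import numpy as np
from scipy.special import beta, gamma, gammaln

a, b = 180.0, 200.0

naive = gamma(a) * gamma(b) / gamma(a + b)                 # gamma overflows: inf / inf -> nan
stable = np.exp(gammaln(a) + gammaln(b) - gammaln(a + b))  # finite and matches scipy.special.beta

print(naive, stable, beta(a, b))
assert np.isnan(naive)
assert np.isclose(stable, beta(a, b), rtol=1e-10)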

When x < 0 and/or y < 0

Things start to go wrong if we want to keep the results consistent with SciPy's beta function when x < 0 and/or y < 0. Checking these cases reveals some inconsistencies. Just to illustrate:

The results can change sign depending on the argument value, and the difference can also be large when the values are high (only when one or more arguments are < 0).

Given this information, should we go ahead and use your suggestion? I have already worked on the changes, and whatever path you want to take is fine with me. Let me know and I can push it at any time. 😄

@ricardoV94 (Member) commented Dec 24, 2023

That first sign flip doesn't seem too bad; the answer is pretty much 0. I didn't look carefully at the examples that followed.

We should think about the stability and range in the context that first motivated these Ops: use in the gradients of this PR.

Do the gradients remain stable in a reasonable parameter space of the original functions?

Also, the form of beta we use for the gradients need not have anything to do with the one we offer users in the tensor module. For the user-facing one I would probably offer the naive form with gamma and also offer betaln via gammaln.

@ricardoV94 (Member) commented Dec 24, 2023

Aren't negative values some special analytic extension? Or is it standard to support it?

Also, for assessing closeness it's common to look not only at the absolute error but also at the relative error. Those 1e+16 values may not be so unreasonable if they correspond to a small relative error.

@ricardoV94 (Member)

Given this information, should we go ahead and use your suggestion?

My gut feeling is we should use exp(beta_ln) in the scalar grads and offer beta and betaln to users in tensor.math.

Let's check that the gradients don't blow up for reasonable values of the original functions. You can test locally with verify_grad.
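For instance, a local check could look roughly like this (a sketch assuming the tensor-level gammaincinv added in this PR, and that verify_grad accepts a NumPy RandomState for its rng argument; only the second argument is perturbed, since the gradient w.r.t. the shape parameter is not implemented):

import numpy as np
import pytensor.tensor as pt
from pytensor.gradient import verify_grad

k = 3.0  # fixed shape parameter
verify_grad(
    lambda q: pt.gammaincinv(k, q),
    [np.array([0.1, 0.5, 0.9])],
    rng=np.random.RandomState(42),
)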

@amyoshino (Member Author)

Thanks for the suggestions and pointers on where to look! I will evaluate all the points mentioned.

We should think about the stability and range in the context that first motivated these Ops: use in the gradients of this PR.

Do the gradients remain stable in a reasonable parameter space of the original functions?

Good call, I will check it and be mindful about the objective of the PR.

Aren't negative values some special analytic extension? Or is it standard to support it?

I will research it further. Since the function is well defined only for positive values, I imagine it is a somewhat special use, but let me find out in which circumstances negative values are used before stating anything.

Also, for assessing closeness it's common to look not only at the absolute error but also at the relative error. Those 1e+16 values may not be so unreasonable if they correspond to a small relative error.

Most of the errors are sign changes, so the error is close to zero. In a few cases the errors are large, but that only happens when the values are high as well, so I will evaluate whether this is really a concern or whether the large errors happen only where the functions are themselves unusable.

Let's check that the gradients don't blow up for reasonable values of the original functions. You can test locally with verify_grad.

Thank you so much for the hint! I will need some time to check the gradients and become more familiar with how the gradient computation is traced; having a tip on where to start is very valuable 😄

@amyoshino (Member Author)

@ricardoV94, after checking the review comments, here is my suggestion:

Do the gradients remain stable in a reasonable parameter space of the original functions?

Given the context of the PR, the use of the beta function in the gradient computation for the inverse incomplete beta will not present us with negative arguments (the parameters of the beta distribution must be positive and real-valued). That said, I believe we should go with the approximation you suggested.

Moreover, I could not find use cases for negative-valued arguments for the beta function. So, if any use case comes up in the future, we can revise the user-facing helper in tensor.special and implement the Op for the beta function.

Let's check that the gradients don't blow up for reasonable values of the original functions. You can test locally with verify_grad.

For safety and consistency with the previously implemented betainc_grad() function, I used the betaln approach, since having alpha or beta with a value around 180 (taking the blown-up example above) is not unlikely.

We can also add a user-facing helper in tensor.math (the scalar version used in the grads wouldn't be user-facing)

I have done that as well! Thanks for suggesting it.

I just pushed the latest code. I hope everything is good, but if not, do not hesitate to point out improvements; I never tire of improving things and keeping the codebase as neat as possible in a single pass. 😄

@ricardoV94 (Member) commented Jan 2, 2024

@amyoshino this looks great, I left a tiny tiny comment above. Otherwise it seems good to merge!

amyoshino and others added 2 commits January 2, 2024 08:57
Default pre-commit to multi-line

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
@amyoshino (Member Author)

@ricardoV94 As always, thanks for guiding me through this PR. I learned a lot about the codebase and the tests with this one.

@ricardoV94 (Member)

My pleasure, and thanks for taking this one on. It will be really useful for the icdf methods!

@ricardoV94 changed the title from "Implement betaincinv and gammaincinv functions" to "Implement betaincinv and gammainc[c]inv functions" on Jan 2, 2024
@ricardoV94 added the "enhancement" (New feature or request) label on Jan 2, 2024
@ricardoV94 merged commit f951743 into pymc-devs:main on Jan 2, 2024 (51 of 53 checks passed)
@amyoshino deleted the beta_gamma_incinv branch on January 2, 2024 14:09