Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented log(BesselK) for fractional orders #1121

Conversation

@martinmodrak
Copy link
Contributor

commented Feb 18, 2019

Summary

Implemented logarithm of the modified Bessel function of the second kind (Bessel K) for fractional orders, supporting differentiation with respect to both variables.

Log(BesselK) is useful for computing the lpdf of some distributions, such as Generalized inverse Gaussian or SICHEL

The computation is based on https://github.com/stan-dev/stan/wiki/Stan-Development-Meeting-Agenda/0ca4e1be9f7fc800658bfbd97331e800a4f50011 (but modified to allow for stan::math::var arguments). Thanks to @bgoodri for providing it, I would not have been able to find this formula on my own.

The code snippet linked above is in turn based on Equation 26 of Rothwell: Computation of the
logarithm of Bessel functions of complex argument and fractional order
https://scholar.google.com/scholar?cluster=2908870453394922596&hl=en&as_sdt=5,33&sciodt=0,33

Both derivatives are computed by auto-diffing over the 1d integrator. It should be possible to get an explicit formula for the derivative with respect to v similarly to the way it is computed in modified_bessel_second_kind.

The name of the main function (log_modified_bessel_second_kind_frac) is provisional. The function is now in rev/arr/fun. This is IMHO weird, but I include rev/arr/functor/integrate_1d so the linter complained when the function was in rev/scal/

In addition, I tried to improve the error messages from integrate_1d and its documentation, as following the current documentation led me astray. This could be moved to a separate pull request if desired.

Tests

A grid of values of the function and its gradients was computed in Mathematica (code is part of the comments in the test file). The relative error between Mathematica and this implementation is <1e-7 for all values and gradients tested.

Side Effects

The error messages of integrate_1d have been modified to include the error threshold.

Checklist

  • Math issue #1112

  • Copyright holder: Institute of Microbiology of the Czech Academy of Sciences

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2019

CRAN won't allow those sprintf things

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2019

I think you should complex-step the derivative with respect to the order when it is a var but that is going to require the Boost 1.7.0 PR to get merged so that you can use complex integrands.

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2019

The derivative with respect to the primary argument has many formulas.

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2019

Also, these tests against Mathematica are not very strong in my opinion. It is much better to test with identities, such as at http://functions.wolfram.com/Bessel-TypeFunctions/BesselK/17/ , that hold for all input values. Usually you can just divide the left-hand side of an identity by the right-hand side and check that the ratio is near one and 1 + the derivative is near one.

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2019

Also, it may be worth thinking about whether we need a specialized integration routine for log_foo functions that utilizes log_sum_exp logic.

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Feb 18, 2019

I think you can ultimately avoid using stan::math::integrate_1d and just use boost::math::quadrature::tanh_sinh, in which case you could move the functions back to the scal directories if they do not all get flattened first.

@martinmodrak

This comment has been minimized.

Copy link
Contributor Author

commented Feb 19, 2019

Thanks for the feedback - I assume that there is no major known reason to not include this function in the library and it is therefore worthwhile for me to spend time working on it. I have tried to further improve the code.

In particular, I have added analytical derivative wrt. z, but the derivative wrt. v is still from autodiff. There is a trick I use to combine those in the <var,var> case which I am not sure is great, but it works:

  if(is_var<T_v>::value) {
    operands.push_back(value);
    gradients.push_back(1);
  }
  operands.push_back(z);
  gradients.push_back(gradient_dz);

  return precomputed_gradients(value_of(value), operands, gradients);

Where value contains the result of autodiffed computation when v is var (and is double when v is double).

CRAN won't allow those sprintf things

I updated to snprintf - is that still an issue? Would you suggest using std::ostringstream instead? Or do you believe the improved messages are not worth the hassle?

I think you can ultimately avoid using stan::math::integrate_1d and just use boost::math::quadrature::tanh_sinh, in which case you could move the functions back to the scal directories if they do not all get flattened first.

That's what I have done. I have also tried getting analytical derivative of the inner integral wrt. v, to avoid using nested autodiff, but I could not get it to compute stably. I now have a custom nested autodiff code for this which is just simplified gradient_of_f.

However, I use std::vector to be able to use precomputed_gradients which means the linter complains that the function should not be in scal :-(

It is much better to test with identities

Currently looking into this

@bob-carpenter

This comment has been minimized.

Copy link
Contributor

commented Feb 20, 2019

@syclik

This comment has been minimized.

Copy link
Member

commented Feb 21, 2019

I have no idea about utility.

But all the analytic derivatives are great for anything we do implement.

I'll cosign. If you think it'll help you, then feel free to continue to implement. If it's written well, we can include it in the Math library for your use. ("well" is a very loosely defined term, but essentially, if there's very little maintenance burden for the future, then it's fine.)

There may be a bug in the linter. std::vector is OK to use in the scal directory inside functions. What we don't want is to have functions that apply to std vectors defined in scal. Those should go in arr.

@bob-carpenter, that's not correct. The difference in scal vs arr was that scal does not include any std::vector. In retrospect, that's probably not really a useful distinction and we're trying to flatten the structure, but for the time being, it would have to be in arr since it depends on std::vector.

@martinmodrak, will you put a comment here when it's ready for review?

@bgoodri, would you be able to review once it's ready?

@bob-carpenter

This comment has been minimized.

Copy link
Contributor

commented Feb 21, 2019

I recall the decision going the other way, but it's no big deal as long as @syclik can lay out an acceptable workaround until the scal / arr / mat distinction is collapsed.

@syclik: The question is where the definition of real foo(real) should go if there is astd::vector in the implementation? Is it OK to put it in arr?

@martinmodrak

This comment has been minimized.

Copy link
Contributor Author

commented Mar 29, 2019

Just an informative update if anyone's following:

Complex step
So it turns out that complex-stepping the integral has almost exactly the same error as the integral-of-the-nested-autodiff approach. This is weird - maybe the complex-step magic (which I totally don't understand) is actually somehow related to the way complex-step works (I can see how action on imaginary component might be similar to action on autodiff value)? As the complex-step has shorter code, I will keep it for now.

Asymptotics
The asymptotic formula for small arguments relative to order unfortunately does not work for very small arguments. (the log transform probably messes it up) It does however improve for a few cases where both Rothwell and other asymptotics failed so I've kept it, but I am on the search for a better one.

@bob-carpenter

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

I would expect both autodiff and complex step to be about the same in terms of accuracy but autodiff requires memory allocations. Also, you can probably move more stuff to prim now.

To get more accuracy, you probably have to use long double or complex<long double>. For relatively small orders, there might be some sort of approximation based on a pertubation of K_0(z). Also, in principle, it might be possible to use log_sum_exp logic with quadrature since we are taking the logarithm of a sum that approximates an integral of a positive function. But we would have to figure out where the integrand reaches is maximum.

template <class F>
// Complex step differentiation as described at:
// https://blogs.mathworks.com/cleve/2013/10/14/complex-step-differentiation/
double complex_step(F f, double x) {

This comment has been minimized.

Copy link
@bgoodri
const T_v &v, const T_z &z) {
typedef typename boost::math::tools::promote_args<T_v, T_z>::type T_Ret;

using boost::math::lgamma;

This comment has been minimized.

Copy link
@bgoodri

bgoodri Mar 29, 2019

Contributor

We are using std::lgamma these days

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

@bob-carpenter It is explained at https://blogs.mathworks.com/cleve/2013/10/14/complex-step-differentiation/ which is what @martinmodrak is using in the case the order is a var.

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

... which would seem to require derivatives wrt the order as it goes to zero. According to
http://functions.wolfram.com/Bessel-TypeFunctions/BesselK/20/01/01/0003/
the first derivative is zero for all values of the argument, but I'm not sure about the higher derivatives.

@martinmodrak

This comment has been minimized.

Copy link
Contributor Author

commented Apr 2, 2019

@bgoodri Thanks for the hints. My previous comment however contained an error as the problem is with small argument (z), not order (sorry, I edited the comment now, but it is my fault). The series at https://www.wolframalpha.com/input/?i=series+Log%5BBesselK%5Bv,+z%5D%5D+as+z+-%3E+0 is much less promising as it is a log of sum where some terms are negative.

I am also convinced an important part of the problem is the way I compute my checks. Testing against Mathematica for small z gives relative errors for gradient wrt. v of at most 1e-4 and all other errors <1e-8. Also the errors on the plots above start to show up for v <= 1 which is where the method of evaluation changes.

This change is necessary as for v in [0,1] the recursive formulas (http://functions.wolfram.com/Bessel-TypeFunctions/BesselK/17/01/01/) can no longer be evaluated with log_sum_exp, and I need log_diff_exp instead. I however cannot find formulas that would let me avoid log_diff_exp for evaluation.

@martinmodrak

This comment has been minimized.

Copy link
Contributor Author

commented Apr 2, 2019

Actually, I've just figured out a different formula for the checks and the error stops being inflated - thanks for rubber ducking!

Anyway, here is the current plot of errors - now the maximal relative error is 3.6e-02 for v = 50 and z = 11 and the most problematic region is for v and z comparable but not small.

image

@bgoodri

This comment has been minimized.

Copy link
Contributor

commented Apr 3, 2019

I think you gotta try long double and complex<long double> internally.

@syclik syclik requested a review from SteveBronder Jun 20, 2019

@SteveBronder
Copy link
Contributor

left a comment

Sorry this took so long to review and thanks for the contribution!

I only had time to look over the code tonight. Docs would make this a bit easier to review.

return abs(c) > 0;
}

inline bool non_zero(const double &c) { return c > 0; }

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

I think you can use is_positive here from /prim/scale/err/is_positive.hpp

}

template <typename _Tp>
inline bool non_zero(const std::complex<_Tp> &c) {

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

I think we can just add a complex type version of is_positive to /prim/scale/err/

namespace math {

template <typename _Tp>
inline int is_inf(const std::complex<_Tp> &c) {

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

Can you add a method for complex types to is_scal_finite in /prim/scal/err/is_scal_finite instead of defining one here?


inline bool non_zero(const double &c) { return c > 0; }

namespace besselk_internal {

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

This can just be internal

// https://github.com/stan-dev/stan/wiki/Stan-Development-Meeting-Agenda/0ca4e1be9f7fc800658bfbd97331e800a4f50011
// Which is in turn based on Equation 26 of Rothwell: Computation of the
// logarithm of Bessel functions of complex argument and fractional order
// https://scholar.google.com/scholar?cluster=2908870453394922596&hl=en&as_sdt=5,33&sciodt=0,33

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

Can this all be a multi-line comment?

complex_func, stan::math::value_of(v));

return var(new precomp_v_vari(value, v.vi_, d_dv));
}

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

Can you add docs here on why you need this specialization? I think you could get away without it

// UTILITY FUNCTIONS //
////////////////////////////////////////////////////////////////

void check_params(const double &v, const double &z) {

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

I think you can use our internal stuff instead of this

}

T_v v_ = fabs(v);
switch (choose_computation_type(value_of(v_), value_of(z))) {

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

I prefer if statements instead of switch because to the fallthrough gotchas

}

template <typename T_v>
var log_modified_bessel_second_kind_frac(const T_v &v, const var &z) {

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

I think if you clean up the above you won't need the template specialization

#include <stan/math/rev/scal/fun/log_modified_bessel_second_kind_frac.hpp>
#include <stan/math/rev/scal/fun/modified_bessel_second_kind.hpp>
#include <stan/math/rev/core/set_zero_all_adjoints.hpp>
#include <gtest/gtest.h>

This comment has been minimized.

Copy link
@SteveBronder

SteveBronder Jul 5, 2019

Contributor

Move gtest below stan includes

@martinmodrak

This comment has been minimized.

Copy link
Contributor Author

commented Jul 8, 2019

Thanks very much @SteveBronder for going through the code. Most of the suggestions are very sensible and once I resume working on this, I will implement them and will explain where I differ in opinion.

Unfortunately however, the feature is far from complete as the function still has excessive errors for some parameters and does not pass tests (see the previous post: #1121 (comment) for some details). I am unsure about the timeline of resolving this as it seems to require some additional math insights (which I do not have and don't have more ideas where to look) and also because the project that drove this feature is now on the back burner as we encountered some other problems.

Should have probably made the status of PR more prominent, sorry if the feature does not make it in the end and your effort ends up not improving Stan codebase.

@SteveBronder

This comment has been minimized.

Copy link
Contributor

commented Jul 8, 2019

No worries man! if this is still a WIP then I'm going to close the PR for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.