
Implemented log(BesselK) for fractional orders #1121

Conversation

martinmodrak
Contributor

@martinmodrak martinmodrak commented Feb 18, 2019

Summary

Implemented logarithm of the modified Bessel function of the second kind (Bessel K) for fractional orders, supporting differentiation with respect to both variables.

Log(BesselK) is useful for computing the lpdf of some distributions, such as the generalized inverse Gaussian or the Sichel distribution.

The computation is based on https://github.com/stan-dev/stan/wiki/Stan-Development-Meeting-Agenda/0ca4e1be9f7fc800658bfbd97331e800a4f50011 (but modified to allow for stan::math::var arguments). Thanks to @bgoodri for providing it; I would not have been able to find this formula on my own.

The code snippet linked above is in turn based on Equation 26 of Rothwell, "Computation of the logarithm of Bessel functions of complex argument and fractional order": https://scholar.google.com/scholar?cluster=2908870453394922596&hl=en&as_sdt=5,33&sciodt=0,33

Both derivatives are computed by auto-diffing over the 1d integrator. It should be possible to get an explicit formula for the derivative with respect to v similarly to the way it is computed in modified_bessel_second_kind.
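For the derivative with respect to z, there is in fact a standard explicit form via the recurrence for Bessel K (this is the textbook identity, not necessarily what the code currently uses):

```latex
\frac{\partial}{\partial z} K_v(z) \;=\; -\tfrac{1}{2}\bigl(K_{v-1}(z) + K_{v+1}(z)\bigr)
\quad\Longrightarrow\quad
\frac{\partial}{\partial z} \log K_v(z) \;=\; -\,\frac{K_{v-1}(z) + K_{v+1}(z)}{2\,K_v(z)} .
```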

The name of the main function (log_modified_bessel_second_kind_frac) is provisional. The function is now in rev/arr/fun. This is IMHO weird, but I include rev/arr/functor/integrate_1d, so the linter complained when the function was in rev/scal/.

In addition, I tried to improve the error messages from integrate_1d and its documentation, as following the current documentation led me astray. This could be moved to a separate pull request if desired.

Tests

A grid of values of the function and its gradients was computed in Mathematica (code is part of the comments in the test file). The relative error between Mathematica and this implementation is <1e-7 for all values and gradients tested.

Side Effects

The error messages of integrate_1d have been modified to include the error threshold.

Checklist

  • Math issue BesselK continuous function #1112

  • Copyright holder: Institute of Microbiology of the Czech Academy of Sciences

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass (make test-headers)
    • docs build (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@bgoodri
Contributor

bgoodri commented Feb 18, 2019

CRAN won't allow those sprintf things

@bgoodri
Contributor

bgoodri commented Feb 18, 2019

I think you should complex-step the derivative with respect to the order when it is a var, but that is going to require the Boost 1.70 PR to get merged so that you can use complex integrands.

@bgoodri
Contributor

bgoodri commented Feb 18, 2019

The derivative with respect to the primary argument has many formulas.

@bgoodri
Contributor

bgoodri commented Feb 18, 2019

Also, these tests against Mathematica are not very strong in my opinion. It is much better to test with identities, such as at http://functions.wolfram.com/Bessel-TypeFunctions/BesselK/17/ , that hold for all input values. Usually you can just divide the left-hand side of an identity by the right-hand side and check that the ratio is near one and 1 + the derivative is near one.

@bgoodri
Contributor

bgoodri commented Feb 18, 2019

Also, it may be worth thinking about whether we need a specialized integration routine for log_foo functions that utilizes log_sum_exp logic.

@bgoodri
Contributor

bgoodri commented Feb 18, 2019

I think you can ultimately avoid using stan::math::integrate_1d and just use boost::math::quadrature::tanh_sinh, in which case you could move the functions back to the scal directories if they do not all get flattened first.

@martinmodrak
Contributor Author

Thanks for the feedback - I assume that there is no major known reason not to include this function in the library, and that it is therefore worthwhile for me to spend time working on it. I have tried to further improve the code.

In particular, I have added an analytical derivative wrt. z, but the derivative wrt. v still comes from autodiff. There is a trick I use to combine those in the <var,var> case which I am not sure is great, but it works:

  if(is_var<T_v>::value) {
    operands.push_back(value);
    gradients.push_back(1);
  }
  operands.push_back(z);
  gradients.push_back(gradient_dz);

  return precomputed_gradients(value_of(value), operands, gradients);

Here value contains the result of the autodiffed computation when v is a var (and is a double when v is a double).

CRAN won't allow those sprintf things

I updated to snprintf - is that still an issue? Would you suggest using std::ostringstream instead? Or do you believe the improved messages are not worth the hassle?

I think you can ultimately avoid using stan::math::integrate_1d and just use boost::math::quadrature::tanh_sinh, in which case you could move the functions back to the scal directories if they do not all get flattened first.

That's what I have done. I have also tried getting an analytical derivative of the inner integral wrt. v, to avoid using nested autodiff, but I could not get it to compute stably. I now have custom nested autodiff code for this, which is just a simplified gradient_of_f.

However, I use std::vector to be able to use precomputed_gradients, which means the linter complains that the function should not be in scal :-(

It is much better to test with identities

Currently looking into this.

@bob-carpenter
Contributor

bob-carpenter commented Feb 20, 2019 via email

@syclik
Member

syclik commented Feb 21, 2019

I have no idea about utility.

But all the analytic derivatives are great for anything we do implement.

I'll cosign. If you think it'll help you, then feel free to continue to implement. If it's written well, we can include it in the Math library for your use. ("well" is a very loosely defined term, but essentially, if there's very little maintenance burden for the future, then it's fine.)

There may be a bug in the linter. std::vector is OK to use in the scal directory inside functions. What we don't want is to have functions that apply to std vectors defined in scal. Those should go in arr.

@bob-carpenter, that's not correct. The difference in scal vs arr was that scal does not include any std::vector. In retrospect, that's probably not really a useful distinction and we're trying to flatten the structure, but for the time being, it would have to be in arr since it depends on std::vector.

@martinmodrak, will you put a comment here when it's ready for review?

@bgoodri, would you be able to review once it's ready?

@bob-carpenter
Contributor

I recall the decision going the other way, but it's no big deal as long as @syclik can lay out an acceptable workaround until the scal / arr / mat distinction is collapsed.

@syclik: The question is where the definition of real foo(real) should go if there is a std::vector in the implementation? Is it OK to put it in arr?

@martinmodrak
Contributor Author

Thanks very much @SteveBronder for going through the code. Most of the suggestions are very sensible, and once I resume working on this I will implement them and explain where I differ in opinion.

Unfortunately, however, the feature is far from complete, as the function still has excessive errors for some parameters and does not pass tests (see the previous post: #1121 (comment) for some details). I am unsure about the timeline for resolving this, as it seems to require some additional math insights (which I do not have, and I don't have more ideas about where to look), and also because the project that drove this feature is now on the back burner as we encountered some other problems.

I should probably have made the status of the PR more prominent; sorry if the feature does not make it in the end and your effort ends up not improving the Stan codebase.

@SteveBronder
Collaborator

No worries man! If this is still a WIP, then I'm going to close the PR for now.

@bgoodri
Contributor

bgoodri commented Nov 24, 2019

@martinmodrak Did you try equation (24) of Rothwell in the remaining problematic region? Rothwell says equation (24) is "difficult to compute when |z| is small", but I think it might be accurate enough when z is not small but close to v. For z = 11 and v = 50, I get function values of
58.0217473497870557 from the log of equation (24) with long doubles
58.02174734978707484 from WolframAlpha
and complex-step derivatives with respect to v of
2.20960269175438162 from the log of equation (24) with long doubles
2.209602691754382164 from WolframAlpha
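For reference, equation (24) is, I believe, the integral representation in DLMF 10.32.8 (valid for v > -1/2), which is exactly what the code below computes on the log scale:

```latex
K_v(x) \;=\; \sqrt{\frac{\pi}{2x}}\,
\frac{e^{-x}}{\Gamma\!\left(v+\tfrac12\right)}
\int_0^\infty e^{-t}\, t^{\,v-\frac12}
\left(1+\frac{t}{2x}\right)^{v-\frac12} dt .
```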

Here is my Rcpp version for the log of equation (24) in Rothwell:

// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/math/quadrature/exp_sinh.hpp>
#include <boost/math/special_functions/digamma.hpp>
#include <cmath>
#include <complex>
#include <limits>

// [[Rcpp::export]]
std::complex<double> log_besselK(double x, double nu, double h = 1e-9) {
  using std::exp;
  using std::log;
  using std::log1p;
  
  const std::complex<long double> vmhalf(nu - 0.5, h);

  boost::math::quadrature::exp_sinh<long double> integrator;
  auto log_Q = log(integrator.integrate([&](const long double t) {
    return exp(-t + vmhalf * (log(t) + log1p(0.5 * t / x)));
  }, std::sqrt(std::numeric_limits<long double>::epsilon())));
  std::complex<long double> log_K(0.5 * log(0.5 * M_PI / x) 
                                  - x - std::lgamma(nu + 0.5) + log_Q.real(),
                                  - h * boost::math::digamma(nu + 0.5) 
                                  + log_Q.imag());
  return static_cast<std::complex<double> >(log_K);
}

@martinmodrak
Contributor Author

martinmodrak commented Nov 27, 2019

@bgoodri Thanks for the hint. I actually tried the formula given at the question https://math.stackexchange.com/questions/1960778/approximating-the-log-of-the-modified-bessel-function-of-the-second-kind (the answer is mine), which helps in a bit of the parameter space but does not work toward the larger values. The formula 24 in Rothwell is actually a bit less stable than that: I haven't found parameter values for which the Rothwell 24 integral does not blow up while the "gamma integral" from the SE question does blow up.

Also using long double is not of much help as it increases the range where the integrands don't blow up only very slightly (unsurprisingly - due to the exponential behavior).

One thing that might help a lot would be a reimplementation of the exph_sinh integrator using the log_sum_exp logic, which Bob hinted is a possibility, allowing me to stay on the log scale. (for the problematic parameters the maximum log of the integrand can crawl over several hundred)

Here is the current state of affairs.
[image: error plot]

@bgoodri
Contributor

bgoodri commented Nov 27, 2019

Here is a version of equation (24) from Rothwell with a log-sum-exp thing, which I would guess works better when v is large.

// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/math/quadrature/exp_sinh.hpp>
#include <boost/math/special_functions/digamma.hpp>
#include <cmath>
#include <complex>
#include <limits>

// [[Rcpp::export]]
std::complex<double> log_besselK(double x, double nu, double h = 1e-9) {
  using std::exp;
  using std::log;
  using std::log1p;
  
  const std::complex<long double> v(nu, h);
  const std::complex<long double> half(0.5);
  const std::complex<long double> x_(x);
  auto maximum = std::sqrt(v * v - v + x_ * x_ + half * half) + v - x_ - half;

  auto vmhalf = v - half;
  boost::math::quadrature::exp_sinh<long double> integrator;
  auto log_Q = maximum.real() > 0 
             ? maximum + log(integrator.integrate([&](const long double t) {
                 auto t_max = t + maximum;
                 return exp(-t_max + vmhalf * (log(t) + log1p(0.5 * t / x)));
               }, std::sqrt(std::numeric_limits<long double>::epsilon())))
             : log(integrator.integrate([&](const long double t) {
                 return exp(-t + vmhalf * (log(t) + log1p(0.5 * t / x)));
             }, std::sqrt(std::numeric_limits<long double>::epsilon())));
  std::complex<long double> log_K(0.5 * log(0.5 * M_PI / x)
                                  - x - std::lgamma(nu + 0.5) + log_Q.real(),
                                  - h * boost::math::digamma(nu + 0.5)
                                  + log_Q.imag());
  return static_cast<std::complex<double> >(log_K);
}

@martinmodrak
Contributor Author

martinmodrak commented Nov 27, 2019

EDIT: Sorry, I misread your suggestion. I didn't try moving the maximum out. That sounds reasonable; I will try it!

@martinmodrak
Contributor Author

Thanks @bgoodri , moving the maximum out helped a lot. This is the current error plot (and I think I can still tweak it a bit, so maybe I am almost there):

[image: error plot]

@bgoodri
Contributor

bgoodri commented Nov 28, 2019 via email

@bgoodri
Contributor

bgoodri commented Nov 28, 2019 via email

@bgoodri
Contributor

bgoodri commented Nov 28, 2019

@martinmodrak
Contributor Author

martinmodrak commented Nov 29, 2019

Thanks for the help, I am finally moving forward. Just to report the current status: it turns out the trapezoid rule on the cosh formula from the arXiv paper (on the log scale, via log_sum_exp) works great for v ~= z, but only for the actual value and d/dz. Autodiffing through the trapezoid rule yields bad values for d/dv.

[image: error plot]

(There are more test points, as I added testing for the boundaries where I choose different formulas.)

But d/dv was computed OK by the previous algorithms, so my next plan is to combine the value from the trapezoid rule with derivatives from the asymptotic expressions. After that, the derivatives for very small v and z are still bad (error around 1e-3), which I still don't know how to tackle (the other integrals, including the trapezoid, give even worse results than Rothwell there).

@bgoodri
Contributor

bgoodri commented Nov 29, 2019 via email

@bgoodri
Contributor

bgoodri commented Nov 29, 2019

Here is a rough version of equation 1.1, with root finding to get the mode, which is then used with the log-sum-exp logic. Unfortunately, Boost only added Newton root-finding of complex functions in version 1.70, so we can't (easily) use the complex-step method to get the sensitivity to v.

// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/math/quadrature/exp_sinh.hpp>
#include <boost/math/tools/roots.hpp>
#include <cmath>
#include <iostream>
#include <limits>
#include <tuple>  // for std::tuple and std::make_tuple


// [[Rcpp::export]]
double log_besselK(double x, double nu) {
  using std::exp;
  using std::log;
  using std::log1p;
  using std::cosh;
  using std::sinh;
  using std::tanh;
  
  double ub = 1.0;
  
  while ( -x * sinh(ub) + nu * tanh(ub * nu) > 0 ) {
    ub *= 2.0;
  }
  double guess = 0.75 * ub;
  
  boost::uintmax_t maxit = 20;
  using boost::math::tools::newton_raphson_iterate;
  
  auto maximum = newton_raphson_iterate(
    [&](double t){
      auto sinh_t = sinh(t);
      auto v_t = nu * t;
      auto tanh_vt = tanh(v_t);
      auto cosh_vt = cosh(v_t);
      auto x_sinh_t = x * sinh_t;
      auto value = nu * tanh_vt - x_sinh_t;
      auto ratio = nu / cosh_vt;
      auto ratio2 = ratio * ratio;
      auto D1 = ratio2 - x * cosh(t);
      return std::make_tuple(value, D1);
    }, guess, ub > 1.0 ? 0.5 * ub : 0, ub, 8, maxit);
  
  double error_estimate;
  double L1;

  boost::math::quadrature::exp_sinh<double> integrator;
  double termination = sqrt(std::numeric_limits<double>::epsilon());
  
  auto log_K = maximum + log(integrator.integrate(
    [&](double t) {
      auto nut = nu * t;
      // log(cosh(x)) == x + log1p(exp(-2 * x)) - log(2)
      return exp(-maximum + nut + log1p(exp(-2.0 * nut)) 
                 - log(2.0) - x * cosh(t));
    }, termination, &error_estimate, &L1));

  std::cout << "error_estimate = " << error_estimate << std::endl 
            << "L1 = " << L1 << std::endl
            << "maximum = " << maximum << std::endl;
  return log_K;
}

@bgoodri
Contributor

bgoodri commented Nov 29, 2019

Sorry. I was doing all this Newton stuff to get the value of t that maximizes the integrand, but then shifting by that argmax instead of shifting by the maximal log-value. This version should actually be good (for the value; we still have to deal with the autodiff), because when shifted properly there is relatively little area under the curve:

// [[Rcpp::plugins(cpp14)]]
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/math/quadrature/exp_sinh.hpp>
#include <boost/math/tools/roots.hpp>
#include <cmath>
#include <iostream>
#include <limits>
#include <tuple>  // for std::tuple and std::make_tuple

// [[Rcpp::export]]
double log_besselK(double x, double nu) {
  using std::exp;
  using std::log;
  using std::log1p;
  using std::cosh;
  using std::sinh;
  using std::tanh;
  
  double ub = 1.0;
  
  while ( -x * sinh(ub) + nu * tanh(ub * nu) > 0 ) {
    ub *= 2.0;
  }
  double guess = 0.75 * ub;
  
  boost::uintmax_t maxit = 20;
  using boost::math::tools::newton_raphson_iterate;
  
  auto arg_max = newton_raphson_iterate(
    [&](double t){
      auto sinh_t = sinh(t);
      auto v_t = nu * t;
      auto tanh_vt = tanh(v_t);
      auto cosh_vt = cosh(v_t);
      auto x_sinh_t = x * sinh_t;
      auto value = nu * tanh_vt - x_sinh_t;
      auto ratio = nu / cosh_vt;
      auto ratio2 = ratio * ratio;
      auto D1 = ratio2 - x * cosh(t);
      return std::make_tuple(value, D1);
    }, guess, ub > 1.0 ? 0.5 * ub : 0, ub, 8, maxit);
  
  const double log2 = log(2.0);
  // log(cosh(x)) == x + log1p(exp(-2 * x)) - log(2)
  auto pivot = nu * arg_max + log1p(exp(-2.0 * nu * arg_max)) - log2
             - x * cosh(arg_max);
  
  double error_estimate;
  double L1;

  boost::math::quadrature::exp_sinh<double> integrator;
  double termination = sqrt(std::numeric_limits<double>::epsilon());
  
  auto log_K = pivot + log(integrator.integrate(
    [&](double t) {
      auto nut = nu * t;
      return exp(-pivot + nut + log1p(exp(-2.0 * nut)) - log2 
                 - x * cosh(t));
    }, termination, &error_estimate, &L1));

  std::cout << "error_estimate = " << error_estimate << std::endl 
            << "L1 = " << L1 << std::endl
            << "arg_max = " << arg_max << std::endl
            << "pivot = " << pivot << std::endl;
  return log_K;
}

@jacob-hjortlund

I don't have much to add; I just wanted to express my gratitude to you guys for working on this problem, as I've been having quite a bit of trouble tackling the asymptotic behavior of the modified Bessel function with nowhere near the knowledge you guys have. Thanks and keep up the great work @martinmodrak @bgoodri !

@martinmodrak
Contributor Author

martinmodrak commented Nov 30, 2019

I am not sure finding the maximum is important for offsetting the log_sum_exp; everywhere I look, the maximum is at most 1, usually much lower. However, I use the maximum to find the step size to integrate with. Luckily, for v*x > 17 you get log(cosh(v*x)) ~= abs(v*x) - log(2). With this substitution you can analytically find the arg max of the integrand as asinh(v/z).

For large v, the integrand decays very quickly past the maximum.
The behaviour at low v and z seems to be more tricky, and I have yet to figure out if the method would be useful there.
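To spell out the arg max claim above (a sketch, using the large-argument logcosh approximation just mentioned): the log-integrand is

```latex
g(t) \;=\; \log\cosh(vt) - z\cosh t \;\approx\; vt - \log 2 - z\cosh t ,
\qquad
g'(t) \;=\; v - z\sinh t \;=\; 0
\;\;\Longrightarrow\;\;
t^{*} = \operatorname{asinh}(v/z).
```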

Here's the code I used to create the latest plot:

template <typename T>
T logcosh(const T& x) {
  // Taken from the limma R package; maybe I should check it works as well
  // as I expect it to? Credited in limma to Gordon Smyth, 8 April 2007.
  using std::abs;
  const T x_ = abs(x);
  if (x_ < 1e-4) {
    return 0.5 * x * x;
  } else if (x_ < 17) {
    return log(cosh(x));
  } else {
    return x_ - std::log(2);
  }
}

template <typename T_v>
T_v trapezoid_cosh(const T_v &v, const double &z) {
  const double approximate_maximum = std::asinh(value_of(v) / z);
  const int max_steps = 500;
  const double h = approximate_maximum / (0.5 * max_steps);
  std::vector<T_v> terms;
  terms.reserve(max_steps);
  for (int n = 0; n < max_steps; n++) {
    const double x = n * h;
    terms.push_back(logcosh(v * x) - z * cosh(x));
    // TODO: create some sensible stopping criterion
  }
  return log_sum_exp(terms) + std::log(h);
}

Also, it seems like we are duplicating some work here. I keep almost-up-to-date code at https://github.com/martinmodrak/math/blob/feature/issue-1112-continuous-besselk/stan/math/rev/scal/fun/log_modified_bessel_second_kind_frac.hpp for you to check out. Maybe I should open a new PR / reopen this one, so that my commits are shown in context? Also, if you have the time, it might be sensible for you to try out your implementations with the tests I've developed...

Finally, I've had mostly unsatisfying experiences with the complex step: on the (two) occasions I used it, it produced almost exactly the same results as Stan's autodiff and was slower to compute. Do you have some specific reason to believe it would help here?

@jacob-hjortlund I don't think I have much knowledge. Before I started working on this, I didn't even know what Bessel functions are. I am learning it all as I go, and it hurts :-) Also, I wouldn't have gotten anywhere near where I am without the help of @bgoodri . Just wanted to let you know this before you assume my opinions have some large weight :-)

@martinmodrak
Contributor Author

I just tried to include your implementation using exp_sinh to integrate, and it runs into problems for values like v = nu = 37634, z = x = 4236, where the logarithm of the integrand swings between -4236 and +70535, so no amount of pivoting can help. However, using your implementation of logcosh(x) = x + log1p_exp(-2 * x) - log(2) makes the stupid trapezoid rule also resolve the derivatives wrt. v. And some of the problems can be ameliorated by increasing the resolution. With:

template <typename T>
T logcosh(const T& x) {
  // By bgoodri; stable for x >= 0, which is all we need here
  return x + log1p_exp(-2 * x) - log(2);
}

template <typename T_v>
T_v trapezoid_cosh(const T_v &v, const double &z) {
  const double approximate_maximum = std::asinh(value_of(v) / z);
  const int max_steps = 5000;
  const double h = approximate_maximum / (0.1 * max_steps);
  std::vector<T_v> terms;
  terms.reserve(max_steps);
  for (int n = 0; n < max_steps; n++) {
    const double x = n * h;
    terms.push_back(logcosh(v * x) - z * cosh(x));
    // TODO: create some sensible stopping criterion
  }
  return log_sum_exp(terms) + std::log(h);
}

I get:
[image: error plot]

That leaves Rothwell for small v and z as the biggest problem. However, with max_steps = 5000 this is heavily inefficient, and I need to find good ways to determine h and a stopping criterion.

@bgoodri
Contributor

bgoodri commented Nov 30, 2019

The trapezoid rule function in Boost

https://www.boost.org/doc/libs/1_71_0/libs/math/doc/html/math_toolkit/trapezoidal.html

has an adaptive stopping criterion but requires a finite upper bound to integrate to. I suppose we could use the largest floating-point number, but maybe there is a way to use something smaller.

@martinmodrak
Contributor Author

Thanks, I'll look into it. Note that I can't use the Boost implementation directly, because I need to keep the log_sum_exp logic. But I might be able to reimplement it. Also, sorry for bothering you on the weekend - you are in no way obliged to respond at this time :-)

@bgoodri
Contributor

bgoodri commented Nov 30, 2019

No worries; it is a holiday here. You can utilize the log-sum-exp logic without a fixed number of max steps, as I did in

https://github.com/stan-dev/math/blob/develop/stan/math/prim/scal/fun/log_modified_bessel_first_kind.hpp#L208

which in this case would be something like

template <typename T_v>
T_v trapezoid_cosh(const T_v &v, const double &z) {
  using stan::math::value_of;
  using stan::math::log_sum_exp;

  const double approximate_maximum = std::asinh(value_of(v) / z);
  const double h = approximate_maximum / (0.1 * 5000);
  T_v s = 0.0;
  T_v old_s = 1.0;
  int n = 0;
  while (s > old_s || old_s > s) {
    const double x = n * h;
    const T_v last_term = logcosh(v * x) - z * cosh(x);
    old_s = s;
    s = log_sum_exp(s, last_term);
    n++;
  }
  return s + std::log(h);
}

But it is not super-accurate yet. Now we need a better procedure to choose h.

@bgoodri
Contributor

bgoodri commented Nov 30, 2019

OK, I have some ideas, but need to eat before I attempt to implement the rest of them.

  1. I think it would be best if we find the actual arg_max of the integrand with Newton-Raphson but your approximation is probably a better starting point than the starting point I was using.
  2. The while loop will terminate quicker if we initialize s = logcosh(v * arg_max) - z * cosh(arg_max), then set n = 1, and then evaluate the while loop with both x = arg_max + n * h and x = arg_max - n * h. Although then you have to return s + std::log(0.5 * h).
  3. The h should be set based on the curvature of the integrand at arg_max or maybe we can set h adaptively based on the curvature at the last point we evaluated at.
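One possible way to make point 3 concrete (a sketch under the same large-argument logcosh approximation discussed earlier, not something from the PR): the curvature at the arg max gives a Laplace-style width for the integrand,

```latex
g(t) \approx vt - \log 2 - z\cosh t, \qquad
g''(t^{*}) \;=\; -z\cosh(t^{*}) \;=\; -\sqrt{v^{2}+z^{2}}
\quad \text{at } t^{*}=\operatorname{asinh}(v/z),
```

so the integrand has width roughly (v^2 + z^2)^{-1/4}, and one could try h = c * (v^2 + z^2)^{-1/4} for some small constant c.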

@bgoodri
Contributor

bgoodri commented Dec 3, 2019

I am still having trouble when the order is 37634 and the argument is 4236. WolframAlpha says it is "undefined"

https://www.wolframalpha.com/input/?i=Log%5BBesselK%5B37634%2C+4236.0%5D%5D

but arb says it is 70531.3573385249910109, which is fairly consistent with the asymptotic expansion for large order and fixed argument at

https://dlmf.nist.gov/10.41.E2

which evaluates to 0.5 * log(0.5 * pi / v) - v * (1 + log(z) - log(2 * v)), which is about 70650.37. With some more tricks I can get 70522.99, which I suppose is OK in a relative-error sense, but I think we're going to have to wait another week for Boost 1.72 to be released.

@bgoodri
Contributor

bgoodri commented Feb 18, 2020

I think we can get a pretty good version of log(besselK) with the new Boost. I just need to pull some things together.

@bbbales2
Member

@martinmodrak, @bgoodri just scrolling through old pulls and found this. Ping!

@bgoodri
Contributor

bgoodri commented Jul 7, 2021

@martinmodrak I recently read at https://functions.wolfram.com/Bessel-TypeFunctions/BesselK/17/02/01/ that

besselK(v, z) = z / (2 * v) * [besselK(v + 1, z) - besselK(v - 1, z)]

so we can avoid the numerically problematic case where v < 1/2 by computing

log(z) - log(2 * v) + log_diff_exp(log_besselK(v + 1, z), log_besselK(fabs(v - 1), z))

although that might incur too much cancellation if v is too close to zero. That looks like twice as many calls to achieve more numerical stability, but we have to compute log_besselK(v + 1, z) anyway in order to take the derivative wrt z.

Do you want to re-open this PR or open a similar one? I think numerically stable logic would be like

  1. If v is very close to zero, do an asymptotic approximation
  2. Otherwise, if v < 1/2, utilize the above identity to shift the orders to both be greater than 1/2
  3. If v is very large, log an asymptotic approximation (possibly based on a Laplace approximation)
  4. If z is very large, log an asymptotic approximation (possibly based on a Laplace approximation)
  5. If z is much smaller than sqrt(1 + v), log an asymptotic approximation
  6. Otherwise, log Rothwell's integral using the log_sum_exp logic to pull out the peak, also splitting Rothwell's integral into two parts at the mode in case all the area is in the neighborhood of the mode

There might be another case that we have to deal with when z and v are similar but I never understood what drove the numerical inaccuracy in that situation.

@bgoodri
Contributor

bgoodri commented Jul 13, 2021

Upon further review, I think that an approach based on the logarithm of

https://functions.wolfram.com/Bessel-TypeFunctions/BesselK/07/01/01/0006/MainEq1.L.gif

may work well. All the overflow / underflow stuff is attributable to the x^v in the denominator, which is well-handled by logarithms. The integral is oscillatory, but Boost has a customized function for integrals like that

https://www.boost.org/doc/libs/1_76_0/libs/math/doc/html/math_toolkit/fourier_integrals.html

that is said to have rapid convergence when the poles of the f function are not too close to the real axis. The underlying Ooura and Mori (1999) paper has a fast-converging example in equation 4.13 that is similar to our case in that the poles are at plus or minus the imaginary unit.

In other words, the main logic of a Stan Math implementation could look like this simple Rcpp one:

// [[Rcpp::plugins(cpp17)]]
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#define BOOST_MATH_INSTRUMENT_OOURA
#include <boost/math/quadrature/ooura_fourier_integrals.hpp>
#include <cmath>
#include <iostream>

// [[Rcpp::export]]
double log_besselK(const double x, const double nu) {
  
  using std::pow;
  using std::log;
  using std::lgamma;
  
  auto integrator = boost::math::quadrature::ooura_fourier_cos<double>();  
  
  auto f = [&nu](double t) {
    return pow(t * t + 1, -nu - 0.5);
  };

  auto [result, relative_error] = integrator.integrate(f, x);
  std::cout << "Integral = " << result << 
    ", relative error estimate " << relative_error << std::endl;
  
  return nu * (log(2.0) - log(x)) - 0.5 * log(M_PI) + lgamma(nu + 0.5) 
         + log(result);
}

It does not seem to be very accurate for large x and small v, but we could deal with that case separately.

@bgoodri
Contributor

bgoodri commented Mar 6, 2022

Leaving this here

https://arxiv.org/abs/2201.00090

which sounds like a good approach too

@spinkney
Collaborator

There's this recent implementation: https://github.com/tk2lab/logbesselk

@spinkney
Collaborator

spinkney commented Jul 21, 2022

@bgoodri here's my attempt at coding up the algorithm in R, following the paper description:

https://gist.github.com/spinkney/56d8b716248dae8545e5049f4dec33b1
