Embedded Laplace approximation #755

charlesm93 · 2018-02-07T14:49:46Z

Summary:

Want to create a Laplace approximation for the following distributions:

Normal
Poisson
Binomial
Negative binomial

Description:

Doing a Laplace approximation with the current algebraic solver (which uses a dogleg solver) is prohibitively slow. Instead we could use a Newton line-search method, coupled with some analytical derivatives. This line-search method would be specialized and therefore not exposed to the Stan language, but the approximator would be. Aki and Dan have a prototype in Stan, which uses autodiff. The C++ implementation would naturally be more optimized.

Current Version:

v2.17.0

bob-carpenter · 2018-02-07T17:12:15Z

Is the only use case you anticipate through the Stan language? Is the idea to have approximate distributions, like poisson_approx or something?

This issue eventually needs to specify what the signatures are of any functions that are being proposed for the math library.

@charlesm93---I assigned to you for v3.0.0 (that's where I'm putting things we definitely want to do but not necessarily for the next release). It helps to add labels to classify the issue.

avehtari · 2018-02-11T13:55:49Z

This is part of the path towards INLA speed for Gaussian latent variable models in Stan. Anticipated first use case is compound functions for GLMs with Laplace integration over weights. Issue #493 describes compound function

y ~ logistic_regression(x, beta);

where speed-up is obtained with analytic gradients. With Laplace approximation it would be possible to add something like

y ~ logistic_regression_la(x, sigma);

which would assume Gaussian prior on beta with sd sigma, and would use Laplace approximation to integrate over beta. Instead of logistic, the first ones to implement would be Normal, Poisson, Binomial and Negative-Binomial. Eventually there will be different structured Gaussian priors like hierarchical, Markov random field and Gaussian process. Ping @dpsimpson

wds15 · 2018-02-11T18:51:29Z

@avehtari this sounds great...shouldn't we have a different sigma for the intercept and all other regressors? I would appreciate that as I don't like that much data-driven priors which you essentially end-up if I can only give a single sigma and hence have to transform my data. Maybe too early for these comments, but relevant from my perspective.

avehtari · 2018-02-11T19:41:39Z

shouldn't we have a different sigma for the intercept and all other regressors?

Sure. I didn't say that sigma would be a scalar. We'll make a more detail proposal soon.

SteveBronder · 2020-09-27T19:01:19Z

copy pasta for email

One other little thing if you want to squeeze out the last little bits of speed. Note that stan math functions always return back an actual matrix, where one of the main benefits of eigen is to compose multiple expressions together so during compile time it writes them out very efficiently. If you are working with matrices of doubles you should use eigen methods for faster code. Here's a trick you can do with things like third_diff() etc

template <typename T>
Eigen::Matrix<T, Eigen::Dynamic, 1>
third_diff(const Eigen::Matrix<T, Eigen::Dynamic, 1>& theta) const {
  Eigen::VectorXd exp_theta = exp(theta);
  Eigen::VectorXd one = Eigen::VectorXd::Ones(theta.size());
  // Note the auto type!!
  auto nominator = exp_theta.cwiseProduct(exp_theta - one);
  auto denominator = (one + exp_theta).array().sqrt().matrix().cwiseProduct(one + exp_theta);
  return n_samples_.cwiseProduct((nominator.array() / denominator.array()).matrix());
}

So the main bit here is that numerator and denominator being "auto" means they are a wacky expression like ColWiseProd<CWiseSubtract<Matrix, Matrix>,Matrix> that is not actually evaluated at that line. So when you pass thos into the return line the expression for the result is generated fully. (you could also just slap the code for numerator and denominator into the return line but then it starts becoming a bit of a mouthful). But you should only do this in places where you use the expression once, otherwise it's better to eval it into a temporary like you are currently doing with one. Here's an example that shows on the rhs the weird expression type nominator, denominator, and the return type etc gets

https://godbolt.org/z/7bE5n1

SteveBronder · 2020-09-27T19:03:31Z

Also writing the laplace vari using the reverse_pass_callback() and var<matrix> for the covariance should be very efficient

https://github.com/stan-dev/math/blob/try-laplace_approximation2/stan/math/laplace/laplace_pseudo_target.hpp#L29

dpsimpson · 2020-09-27T19:03:57Z

This is so cool that this is possible now! But this is the wrong Laplace branch. The good one is here https://github.com/stan-dev/math/tree/try-laplace_approximation2/stan/math/laplace

…

On Sun, Sep 27, 2020 at 15:01 Steve Bronder ***@***.***> wrote: copy pasta for email One other little thing if you want to squeeze out the last little bits of speed. Note that stan math functions always return back an actual matrix, where one of the main benefits of eigen is to compose multiple expressions together so during compile time it writes them out very efficiently. If you are working with matrices of doubles you should use eigen methods for faster code. Here's a trick you can do with things like third_diff() etc template Eigen::Matrix<T, Eigen::Dynamic, 1> third_diff(const Eigen::Matrix<T, Eigen::Dynamic, 1>& theta) const { Eigen::VectorXd exp_theta = exp(theta); Eigen::VectorXd one = Eigen::VectorXd::Ones(theta.size()); // Note the auto type!! auto nominator = exp_theta.cwiseProduct(exp_theta - one); auto denominator = (one + exp_theta).array().sqrt().matrix().cwiseProduct(one + exp_theta); return n_samples_.cwiseProduct((nominator.array() / denominator.array()).matrix()); } So the main bit here is that numerator and denominator being "auto" means they are a wacky expression like ColWiseProd<CWiseSubtract<Matrix, Matrix>,Matrix> that is not actually evaluated at that line. So when you pass thos into the return line the expression for the result is generated fully. (you could also just slap the code for numerator and denominator into the return line but then it starts becoming a bit of a mouthful). But you should only do this in places where you use the expression once, otherwise it's better to eval it into a temporary like you are currently doing with one. Here's an example that shows on the rhs the weird expression type nominator, denominator, and the return type etc gets https://godbolt.org/z/7bE5n1 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#755 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADRICBRJKRPYPBEJMYHABEDSH6DYXANCNFSM4EPTKGSA> .

SteveBronder · 2020-09-27T20:24:13Z

Whups, good catch! I just searched for laplace issue and this was the nearest looking one

charlesm93 · 2020-09-28T13:10:01Z

Good catch @dpsimpson. The reason we have two branches is because the first branch was written when we submitted the manuscript on the adjoint-differentiated Laplace approximation (with code discussed in the paper). The second branch came after discussion on how to improve the code before StanCon, but we didn't want to break "backward compatibility" with the paper.

We get to update the manuscript before its publication, so I'll link to the new branch and delete the old one.

charlesm93 · 2021-12-06T00:40:22Z

I deleted a bunch of files which I used for my research but wouldn't make it into production code. The commit and the commit prior to those changes, should we need to retrieve those files, are

commit 031a81b8bcc9bfef3dd5c81f791895086db33765 (HEAD -> experimental/laplace, origin/experimental/laplace)
Author: Charles Margossian <charlesm@metrumrg.com>
Date:   Sun Dec 5 19:23:31 2021 -0500

    Delete research tests which are / will not be unit tests.

commit b2e4b2fdfc5a8a0b638eebacb075a7d2e89e572a
Author: Charles Margossian <charlesm@metrumrg.com>
Date:   Sun Dec 5 19:11:08 2021 -0500

    make laplace_marginal_bernoulli_logit_lpmf use general likelihood functor.

dated December 5th. One file we might want to pull back out at some point is the draft for the negative binomial likelihood.

Currently, every unit test under test/unit/math/laplace/ passes (even though some tests are incomplete). I haven't touched the rng tests yet. These files use the analytical derivatives for the likelihood that I worked out in early prototypes of the embedded Laplace. This will be replaced with automatic differentiation, which is a tad more efficient and requires much less coding for every new likelihood we add. There is also a unit test for the user-specified likelihood, even though this feature is more experimental. I have already switched to autodiff for the function that computes the approximate marginal likelihood. Once the rng files are updated, we can delete a good chunk of code under laplace_likelihood_bernoulli_logit and laplace_likelihood_poisson_log.

Since we might want to further compare autodiff and analytical derivatives, I think it's fine to keep this code in for now.

charlesm93 · 2021-12-06T00:42:57Z

For any future modifications, the unit tests that currently pass should be:

laplace_marginal_poisson_log_lpmf_test.cpp
laplace_marginal_bernoulli_logit_lpmf_test.cpp
laplace_marginal_lpdf_test.cpp

All these tests for consistency between (first-order reverse mode) autodiff and finite diff. For most tests, I have benchmarks from GPstuff but where indicated in the code a benchmark is missing.

charlesm93 · 2021-12-06T00:46:58Z

Todo list for myself:

write a primer design doc (per @SteveBronder 's request)
complete unit tests for existing functions using benchmarks from GPstuff (demo for poisson log likelihood)
work out unit tests for rng functions (demo for poisson log likelihood)

EDIT: updated 01/16.

dpsimpson · 2021-12-06T00:48:07Z

Let me know if you need help!

…

On Mon, Dec 6, 2021 at 11:47 Charles Margossian ***@***.***> wrote: Todo list for myself: [] write a primer design doc (per @SteveBronder <https://github.com/SteveBronder> 's request) [] complete unit tests for existing functions using benchmarks from GPstuff [] work out unit tests for rng functions. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#755 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADRICBS7JNU5BXPDFCJMBWLUPQBY3ANCNFSM4EPTKGSA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

charlesm93 · 2021-12-06T00:52:57Z

What would be the best way to test the rng? I can try to come up with limiting cases for each example, but I'm wondering if there is a general approach. How was this done for INLA and GPstuff?

dpsimpson · 2021-12-06T07:05:01Z

It's hard to work out how to test this as it currently is - the function does 3 things: competes Laplace approximation (tested elsewhere), computes the mean and covariance matrix (not tested), and does the multi normal rng (function tested, this specific call not). My temptation (which might not be very smart) would be to separate that computation of the mean and the covariance into an inline function and test it separately. Then you can test the rest of it in the same way that, say multi_normal_prec_rng is tested.

wds15 · 2021-12-06T07:44:36Z

Ripping code apart into testable things is not at all not smart from what I learned.

dpsimpson · 2021-12-06T08:01:04Z

The other option is to do a probabilistic test that runs the rng 1k+++ times and computes the mean and covariances, but that feels like it would be a very expensive test. (and it would occasionally fail) I can't think of another way to test a multivariate distribution without direct access to that mean and covariance matrix.

dpsimpson · 2021-12-06T08:04:14Z

Actually, there is one other possibility, that depends on how the rng argument works. If you can control its state, you could compare the output from laplace_base_rng to the multi_normal_rng with analytically computed mean and covariance. That would probably be the sensible thing because they should be (floating point) identical.

bob-carpenter · 2021-12-06T16:00:35Z

Ripping code apart into testable things is not at all not smart from what I learned.

There are at least three concerns that need to be balanced,

efficiency
code readability and maintainability
testability

I think @wds15 just means that (1) is usually opposed to (2) and (3).

wds15 · 2021-12-06T16:19:18Z

One different approach I did for one of my R packages where I wanted to run SBC is to proxy the test results from the SBC run. So what I do is:

Run SBC on a cluster and store the final histograms needed to caclculate the chi-square statistic. This is stored with a time-stamp
The unit test is then doing
a) check that the time-stamp of the stored SBC run is no older than 0.5y as compared to the runtime of the test
b) test for uniformity via a chi-square test; allowing some failures according to what I would expect

This way I can have a massive SBC simulation from which the key results are part of the test evidence and I ensure that the SBC outputs are not too old. That's not ideal, but a compromise. Maybe a more sophisticated version of this is useful here as well?

dpsimpson · 2021-12-07T00:25:05Z

I think that’s too heavy for such a simple function. Equality of a matrix and a vector OR equality of a draw made with the same random seed is enough to show correctness. (Actually this has to happen twice because there’s a factored and non-factored version)

…

On Tue, Dec 7, 2021 at 03:19 wds15 ***@***.***> wrote: One different approach I did for one of my R packages where I wanted to run SBC is to proxy the test results from the SBC run. So what I do is: 1. Run SBC on a cluster and store the final histograms needed to caclculate the chi-square statistic. This is stored with a time-stamp 2. The unit test is then doing a) check that the time-stamp of the stored SBC run is no older than 0.5y as compared to the runtime of the test b) test for uniformity via a chi-square test; allowing some failures according to what I would expect This way I can have a massive SBC simulation from which the key results are part of the test evidence and I ensure that the SBC outputs are not too old. That's not ideal, but a compromise. Maybe a more sophisticated version of this is useful here as well? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#755 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADRICBQXCNO2LSTRY3DDXTTUPTPBDANCNFSM4EPTKGSA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

charlesm93 · 2022-01-13T11:26:31Z

I ended up running with one of @dpsimpson's suggestions and implementing it in laplace_poisson_log_rng_test.cpp. I took a limiting case with a diagonal prior covariance and worked out the derivative of the objective function. You then pass this to algebra_solver to obtain the mean. The covariance of the Laplace approximation can be calculated by hand (as a function of the mode).

I then ran a bunch of simulations (1e6) and checked estimators of the mean, variance and covariance. On a two group problem, this test takes 8 seconds to run. It's not ideal but it's not horrendous.

I'm not sure how to control the rng state. Both laplace_poisson_log_rng and multinormal_rng admit an rng argument with the type

boost::random::mt19937

but I couldn't fix this state. I'm assuming it's possible, in which case, we can implement Dan's second proposition. It's a bit of work to get the analytical results, but only a bit and I think it's good to have unit tests which rely on analytical calculations.

charlesm93 · 2022-01-16T10:05:56Z

Thanks @SteveBronder for showing me how to control the random seed. Using this, I was able to produce the same sample using the embedded Laplace and multi_normal_rng. The mean and covariance for the latter are calculated (semi-)analytically.

  boost::random::mt19937 rng;
  rng.seed(1954);
  Eigen::MatrixXd theta_pred
    = laplace_poisson_log_rng(sums, n_samples, covariance_function, phi,
                              x_dummy, d0, di0, theta_0, rng);

  // Compute exact mean and covariance.
  Eigen::VectorXd theta_root
    = algebra_solver(stationary_point(), theta_0, phi, d0, di0);
  Eigen::MatrixXd K_laplace
    = laplace_covariance(theta_root, phi);

  rng.seed(1954);
  Eigen::MatrixXd theta_benchmark
    = multi_normal_rng(theta_root, K_laplace, rng);

  double tol = 1e-3;
  EXPECT_NEAR(theta_benchmark(0), theta_pred(0), tol);
  EXPECT_NEAR(theta_benchmark(1), theta_pred(1), tol);

Is this enough? I think this checks the boxes, i.e. making sure the rng function generates a sample from the correct approximating normal distribution.

SteveBronder · 2022-01-17T19:24:27Z

Seems good! Is this all up on a branch somewhere?

charlesm93 · 2022-01-18T09:37:27Z

Yes. experimental/laplace.

SteveBronder · 2022-02-18T21:06:33Z

Just making a note we removed an experimental feature to flip between diagonal and non-diagonal covariance estimates in commit 6b97f1f. We also remove the do_line_search argument there (and just use max_line_search > 0 to decide the line search)

charlesm93 · 2022-05-18T15:20:10Z

I'm going to work on the following update:

constrain hessian_block_size > 0. Right now, hessian_block_size = 0 is used to store W in a vector, as opposed to a sparse Eigen matrix, which is what you would get with hessian_block_size = 1. I find using a vector more efficient than using a sparse matrix. But having the 0 and 1 cases is confusing. So if hessian_block_size = 1, it will use a vector and 0 will no longer be an acceptable value.

Update: ... and job done.

Next up: update the notebook and the examples therein (and done!)

charlesm93 · 2022-05-18T15:31:24Z

Up next, I want to tackle the sparse covariance case. Of interest is the scenario where the Hessian and the covariance are both sparse, as is the case in a model from our BLS colleagues. In this scenario the B-matrix (the one matrix on which we perform a Cholesky decomposition) also becomes sparse. If B is diagonal all operations become O(n), as opposed to O(n^3). More generally, if m is the largest block size between the Hessian and the covariance, the complexity of the method would be O(nm^2).

Before implementing this at a production level, I want to build a proof of concept. My idea would be to add an optional argument:

cov_block_size, which defaults to n.
As a first step, only handle the special case where both hessian_block_size = 1 and cov_block_size = 1 in the Newton solver, and test this. If this works, update the autodiff method.
As a test, I'll start with the disease map model but using a diagonal covariance.

charlesm93 · 2024-05-09T16:59:40Z

Moving this to #3065

bob-carpenter added this to the Future milestone Feb 7, 2018

bob-carpenter added feature algorithm new function labels Feb 7, 2018

bob-carpenter assigned charlesm93 Feb 7, 2018

bob-carpenter modified the milestones: Future, v3.0.0 Feb 7, 2018

charlesm93 mentioned this issue May 28, 2018

Expose Laplace Approximation stan-dev/stan#2522

Open

syclik removed the algorithm label Feb 23, 2019

alashworth mentioned this issue Mar 12, 2019

Expose Laplace Approximation alashworth/test-issue-import#195

Open

serban-nicusor-toptal modified the milestones: 3.2.0, 3.2.0++ Apr 22, 2020

serban-nicusor-toptal modified the milestones: 3.2.0++, 3.3.0++ Jul 28, 2020

serban-nicusor-toptal modified the milestones: 3.3.0++, 3.4.0++ Nov 3, 2020

charlesm93 changed the title ~~Laplace approximation~~ Embedded Laplace approximation Dec 6, 2021

charlesm93 mentioned this issue May 9, 2024

Integrated Laplace approximation #3065

Open

charlesm93 closed this as completed May 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Embedded Laplace approximation #755

Embedded Laplace approximation #755

charlesm93 commented Feb 7, 2018

bob-carpenter commented Feb 7, 2018

avehtari commented Feb 11, 2018

wds15 commented Feb 11, 2018

avehtari commented Feb 11, 2018

SteveBronder commented Sep 27, 2020 •

edited

Loading

SteveBronder commented Sep 27, 2020 •

edited

Loading

dpsimpson commented Sep 27, 2020 via email

SteveBronder commented Sep 27, 2020

charlesm93 commented Sep 28, 2020

charlesm93 commented Dec 6, 2021

charlesm93 commented Dec 6, 2021

charlesm93 commented Dec 6, 2021 •

edited

Loading

dpsimpson commented Dec 6, 2021 via email

charlesm93 commented Dec 6, 2021

dpsimpson commented Dec 6, 2021

wds15 commented Dec 6, 2021

dpsimpson commented Dec 6, 2021

dpsimpson commented Dec 6, 2021

bob-carpenter commented Dec 6, 2021

wds15 commented Dec 6, 2021

dpsimpson commented Dec 7, 2021 via email

charlesm93 commented Jan 13, 2022

charlesm93 commented Jan 16, 2022

SteveBronder commented Jan 17, 2022

charlesm93 commented Jan 18, 2022

SteveBronder commented Feb 18, 2022 •

edited

Loading

charlesm93 commented May 18, 2022 •

edited

Loading

charlesm93 commented May 18, 2022

charlesm93 commented May 9, 2024

Embedded Laplace approximation #755

Embedded Laplace approximation #755

Comments

charlesm93 commented Feb 7, 2018

Summary:

Description:

Current Version:

bob-carpenter commented Feb 7, 2018

avehtari commented Feb 11, 2018

wds15 commented Feb 11, 2018

avehtari commented Feb 11, 2018

SteveBronder commented Sep 27, 2020 • edited Loading

SteveBronder commented Sep 27, 2020 • edited Loading

dpsimpson commented Sep 27, 2020 via email

SteveBronder commented Sep 27, 2020

charlesm93 commented Sep 28, 2020

charlesm93 commented Dec 6, 2021

charlesm93 commented Dec 6, 2021

charlesm93 commented Dec 6, 2021 • edited Loading

dpsimpson commented Dec 6, 2021 via email

charlesm93 commented Dec 6, 2021

dpsimpson commented Dec 6, 2021

wds15 commented Dec 6, 2021

dpsimpson commented Dec 6, 2021

dpsimpson commented Dec 6, 2021

bob-carpenter commented Dec 6, 2021

wds15 commented Dec 6, 2021

dpsimpson commented Dec 7, 2021 via email

charlesm93 commented Jan 13, 2022

charlesm93 commented Jan 16, 2022

SteveBronder commented Jan 17, 2022

charlesm93 commented Jan 18, 2022

SteveBronder commented Feb 18, 2022 • edited Loading

charlesm93 commented May 18, 2022 • edited Loading

charlesm93 commented May 18, 2022

charlesm93 commented May 9, 2024

SteveBronder commented Sep 27, 2020 •

edited

Loading

SteveBronder commented Sep 27, 2020 •

edited

Loading

charlesm93 commented Dec 6, 2021 •

edited

Loading

SteveBronder commented Feb 18, 2022 •

edited

Loading

charlesm93 commented May 18, 2022 •

edited

Loading