# bivariate normal cdf #2356

Open
opened this Issue Jul 14, 2017 · 15 comments

Projects
None yet
4 participants
Contributor

### bob-carpenter commented Jul 14, 2017

#### Summary:

Add a function to compute the bivariate normal cdf.

#### Description:

For size 2 vectors `y` and `mu` and 2 x 2 covariance matrix `Sigma`, compute

``````bivar_normal_cdf(y, mu, Sigma)

= INT_{u < y}  multinormal_pdf(u, mu, Sigma) d.u
``````

where `u` is also a 2-vector and `u < y` is defined componentwise as `u[1] < y[1] && u[2] < y[2]`.

There could also be a parameterization where covariance is given in terms of two scale parameters and a correlation parameter.

This paper is often cited for an algorithm:

• Boys, R.J., 1989. Algorithm AS R80: A remark on Algorithm AS 76: An integral useful in calculating noncentral t and bivariate normal probabilities. Journal of the Royal Statistical Society. Series C (Applied Statistics), 38(3), pp.580-582. https://www.jstor.org/stable/2347755

v2.16.0

Contributor

### bgoodri commented Jul 14, 2017

 If someone is going to implement the CDF, you might as well implement the PMF and other functions with scalar inputs rather than 2x2 matrices and 2-vectors. … On Fri, Jul 14, 2017 at 11:41 AM, Bob Carpenter ***@***.***> wrote: Summary: Add a function to compute the bivariate normal cdf. Description: For size 2 vectors y and mu and 2 x 2 covariance matrix Sigma, compute bivar_normal_cdf(y, mu, Sigma) = INT_{u < y} multinormal_pdf(u, mu, Sigma) d.u where u is also a 2-vector and u < y is defined componentwise as u[1] < y[1] && u[2] < y[2]. There could also be a parameterization where covariance is given in terms of two scale parameters and a correlation parameter. Additional Info: This paper is often cited for an algorithm: - Boys, R.J., 1989. Algorithm AS R80: A remark on Algorithm AS 76: An integral useful in calculating noncentral t and bivariate normal probabilities. Journal of the Royal Statistical Society. Series C (Applied Statistics), 38(3), pp.580-582. https://www.jstor.org/stable/2347755 Current Version: v2.16.0 — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#2356>, or mute the thread .
Contributor

### bgoodri commented Aug 14, 2017

 This article seems to have reversed the ratios in the numerators for the calculation of `a1` and `a2`. It also does not deal with the 0 / 0 case. Here is a correct Stan function for it `````` real binormal_cdf(real z1, real z2, real rho) { if (z1 != 0 || z2 != 0) { real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number(); real a1 = (z2 / z1 - rho) / denom; real a2 = (z1 / z2 - rho) / denom; real product = z1 * z2; real delta = product < 0 || (product == 0 && (z1 + z2) < 0); return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2); } return 0.25 + asin(rho) / (2 * pi()); } ``````
Contributor

### bgoodri commented Aug 14, 2017

 There is also https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2924071 , which might be better for the log-CDF.
Contributor

### bgoodri commented Aug 14, 2017

 For completeness, I perhaps should have added that if `rho = 1`, then the bivariate CDF is `min(Phi(z1), Phi(z2))` and if `rho = -1`, it is `Phi(z1) + Phi(z2) - 1`.

### khakieconomics commented Sep 26, 2017

 This is extremely handy. Thanks @bgoodri

Open

Open

Contributor

### rtrangucci commented Aug 22, 2018 • edited

 @bob-carpenter Per Ben's comment about doing scalar inputs rather than 2-vectors: If we're set on 2-vectors, I'd like to expose the `std_binormal_integral(real y1, real y2, real rho)` as a function that takes scalars. The usage pattern for the `std_binormal_lcdf` function really doesn't lend itself to a vector input, but rather two scalar inputs. We'll usually be using the `std_binormal_lcdf` in something like a bivariate probit, where `[y1, y2]'` is a vector of vars. So the Stan code would need to do something like: ``````data { int N; int K1; int K2; matrix[N, K1] X1; matrix[N, K1] X2; } parameters { real rho; vector[K1] beta1; vector[K2] beta2; } model { vector[N] mu1 = X1 * beta1; vector[N] mu2 = X2 * beta2; for (n in 1:N) target += std_binormal_lcdf([mu1[n], mu2[n]]' | rho); ... `````` where we lose the advantage of the vectorization. In order to get the benefit of vectorization (i.e. caching the `rho` value that is repeated for every observation), we'd need to instead code the model like: ``````model { ... vector[2] model_mu[N]; for (n in 1:N) { model_mu[n] = [mu1[n], mu2[n]]'; } target += std_binormal_lcdf(model_mu | rho); ... `````` This looks funny to me, and seems inefficient, because we won't be respecting the memory locality in `mu1` and `mu2` and we need an extra loop. Here's the way it would work with the way I've coded it now: ``````model { ... target += std_binormal_lcdf(mu1 | mu2, rho); ... `````` That also looks strange to me given the placement of the conditioning bar. Maybe allowing for matrix inputs would make sense? something like ``````model { matrix[N, 2] mu; mu[,1] = X1 * beta1; mu[,2] = X2 * beta2; target += std_binormal_lcdf(mu | rho); ... `````` That I could get behind, and it would perhaps speed up internal computations.
Contributor

### bob-carpenter commented Aug 22, 2018

 Thanks. That makes the problem clear. Are you really going to have different number of predictors for the two entries so you can't just make it matrix multiplication? Anyway, it's the same problem either way. Even understanding the problem, I'm still opposed to violating statistical convention with something like this: ``````std_binormal_lcdf(mu1 | mu2, rho); `````` I'd be happier with something like this: ``````std_binormal_lcdf({ mu1, mu2 } | rho); `````` but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work. We've decided in the past that having matrixes pack up vectorizations would be too confusing, or we could do something like this: ``````std_binormal_lcdf(append_col(mu1, mu2) | rho); `````` or it's transpose `append_row(mu1', mu2')`. I could live with this: ``````std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho)); `````` The memory locality's not an issue as you're just walking down two vectors, so everything's local on both sides. It is some minimal overhead to allocate and copy. Or I want to generalize our input syntax to allow ``````std_binormal_lcdf(mu1, mu2 | rho); `````` where both `mu1` and `mu2` are containers. But that's a ton of work.
Contributor

### bgoodri commented Aug 22, 2018

 matrix with 2 columns? … On Wed, Aug 22, 2018 at 10:26 AM Bob Carpenter ***@***.***> wrote: Thanks. That makes the problem clear. Are you really going to have different number of predictors for the two entries so you can't just make it matrix multiplication? Anyway, it's the same problem either way. Even understanding the problem, I'm still opposed to violating statistical convention with something like this: std_binormal_lcdf(mu1 | mu2, rho); I'd be happier with something like this: std_binormal_lcdf({ mu1, mu2 } | rho); but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work. We've decided in the past that having matrixes pack up vectorizations would be too confusing, or we could do something like this: std_binormal_lcdf(append_col(mu1, mu2) | rho); or it's transpose append_row(mu1', mu2'). I could live with this: std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho)); The memory locality's not an issue as you're just walking down two vectors, so everything's local on both sides. It is some minimal overhead to allocate and copy. Or I want to generalize our input syntax to allow std_binormal_lcdf(mu1, mu2 | rho); where both mu1 and mu2 are containers. But that's a ton of work. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2356 (comment)>, or mute the thread .
Contributor

### rtrangucci commented Aug 22, 2018

 Yeah, I agree with you that generalizing the syntax doesn't make sense for this case. In the case we've settled on: ``````std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho)); `````` are you proposing we add a function `to_array_of_2vectors(reals y1, reals y2)` to the language? Won't that be the same internally as the Stan code: ``````vector[2] mu[N]; for (n in 1:N) mu[n] = [mu1[n], mu2[n]]'; `````` ? If so, that's fine. I'll move forward with updating the lcdf and lpdf to accept 2-vectors as the argument and we can open a separate issue for `to_array_of_2vectors`.
Contributor

### rtrangucci commented Aug 22, 2018

 One further point, then I'll go work on the function. All of the math internal to the `std_binormal_lpdf` and `std_binormal_lcdf` function (the log-density/probability and the gradients) operates on scalars) so we'll be converting to 2-vectors only to convert to scalars internally.
Contributor

### bob-carpenter commented Aug 22, 2018

 Yes. I'm proposing adding: ``````vector[] to_array_of_2vectors(vector x, vector y); `````` where the two vectors have the same size and ``````to_array_of_2vectors(x, y)[i, 1] == x[i] to_array_of_2vectors(x, y)[i, 2] == y[i] `````` But it's not very elegant, to say the least. I could be convinced that the matrix-with-two-columns approach is the way to go, as that'd just be ``````append_col(x, y) ~ ... `````` It's just that we decided that it'd be confusing to read vectorization out of rows or columns of matrices.
Contributor

### bob-carpenter commented Aug 22, 2018

 I'm only concerned about the external client interface, not how it's implemented. But having said that, if it's all scalars under the hood, how will vectorization help? I'd like to hear from @mitzimorris and @seantalts and @VMatthijs on this whole discussion.
Contributor

### rtrangucci commented Aug 22, 2018 • edited

 Both `std_binormal_lpdf` and `std_binormal_lcdf` are implemented nearly identically to `normal_lpdf`: https://github.com/stan-dev/math/blob/feature/issue-962-bivar-norm/stan/math/prim/scal/prob/std_binormal_lcdf.hpp
Contributor

### bob-carpenter commented Aug 22, 2018

 I take it you meant `binormal` and not `normal` in those first two!
Contributor

### rtrangucci commented Aug 22, 2018

 Yes, you're right! Good catch, and I've edited the comment. Upon a bit of reflection, it probably won't be much of a performance hit to just take in a list of 2-vectors given that we have to pull out. I'll use the vector_seq_view metaprogram instead of scalar_seq_view, and just pull the scalar elements out of each the N 2-vectors.