New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bivariate normal cdf #2356

Open
bob-carpenter opened this Issue Jul 14, 2017 · 15 comments

Comments

Projects
None yet
4 participants
@bob-carpenter
Contributor

bob-carpenter commented Jul 14, 2017

Summary:

Add a function to compute the bivariate normal cdf.

Description:

For size 2 vectors y and mu and 2 x 2 covariance matrix Sigma, compute

bivar_normal_cdf(y, mu, Sigma)

  = INT_{u < y}  multinormal_pdf(u, mu, Sigma) d.u

where u is also a 2-vector and u < y is defined componentwise as u[1] < y[1] && u[2] < y[2].

There could also be a parameterization where covariance is given in terms of two scale parameters and a correlation parameter.

Additional Info:

This paper is often cited for an algorithm:

  • Boys, R.J., 1989. Algorithm AS R80: A remark on Algorithm AS 76: An integral useful in calculating noncentral t and bivariate normal probabilities. Journal of the Royal Statistical Society. Series C (Applied Statistics), 38(3), pp.580-582. https://www.jstor.org/stable/2347755

Current Version:

v2.16.0

@bob-carpenter bob-carpenter added this to the Future milestone Jul 14, 2017

@bob-carpenter bob-carpenter changed the title from bivariate normal cdfs to bivariate normal cdf Jul 14, 2017

@bgoodri

This comment has been minimized.

Show comment
Hide comment
@bgoodri

bgoodri Jul 14, 2017

Contributor
Contributor

bgoodri commented Jul 14, 2017

@bgoodri

This comment has been minimized.

Show comment
Hide comment
@bgoodri

bgoodri Aug 14, 2017

Contributor

This article seems to have reversed the ratios in the numerators for the calculation of a1 and a2. It also does not deal with the 0 / 0 case. Here is a correct Stan function for it

  real binormal_cdf(real z1, real z2, real rho) {
    if (z1 != 0 || z2 != 0) {
      real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
      real a1 = (z2 / z1 - rho) / denom;
      real a2 = (z1 / z2 - rho) / denom;
      real product = z1 * z2;
      real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
      return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
    }
    return 0.25 + asin(rho) / (2 * pi());
  }
Contributor

bgoodri commented Aug 14, 2017

This article seems to have reversed the ratios in the numerators for the calculation of a1 and a2. It also does not deal with the 0 / 0 case. Here is a correct Stan function for it

  real binormal_cdf(real z1, real z2, real rho) {
    if (z1 != 0 || z2 != 0) {
      real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
      real a1 = (z2 / z1 - rho) / denom;
      real a2 = (z1 / z2 - rho) / denom;
      real product = z1 * z2;
      real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
      return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
    }
    return 0.25 + asin(rho) / (2 * pi());
  }
@bgoodri

This comment has been minimized.

Show comment
Hide comment
@bgoodri

bgoodri Aug 14, 2017

Contributor

There is also https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2924071 , which might be better for the log-CDF.

Contributor

bgoodri commented Aug 14, 2017

There is also https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2924071 , which might be better for the log-CDF.

@bgoodri

This comment has been minimized.

Show comment
Hide comment
@bgoodri

bgoodri Aug 14, 2017

Contributor

For completeness, I perhaps should have added that if rho = 1, then the bivariate CDF is min(Phi(z1), Phi(z2)) and if rho = -1, it is Phi(z1) + Phi(z2) - 1.

Contributor

bgoodri commented Aug 14, 2017

For completeness, I perhaps should have added that if rho = 1, then the bivariate CDF is min(Phi(z1), Phi(z2)) and if rho = -1, it is Phi(z1) + Phi(z2) - 1.

@khakieconomics

This comment has been minimized.

Show comment
Hide comment
@khakieconomics

khakieconomics Sep 26, 2017

This is extremely handy. Thanks @bgoodri

khakieconomics commented Sep 26, 2017

This is extremely handy. Thanks @bgoodri

@rtrangucci

This comment has been minimized.

Show comment
Hide comment
@rtrangucci

rtrangucci Aug 22, 2018

Contributor

@bob-carpenter Per Ben's comment about doing scalar inputs rather than 2-vectors:

If we're set on 2-vectors, I'd like to expose the std_binormal_integral(real y1, real y2, real rho) as a function that takes scalars. The usage pattern for the std_binormal_lcdf function really doesn't lend itself to a vector input, but rather two scalar inputs. We'll usually be using the std_binormal_lcdf in something like a bivariate probit, where [y1, y2]' is a vector of vars. So the Stan code would need to do something like:

data {
  int N;
  int K1;
  int K2;
  matrix[N, K1] X1;
  matrix[N, K1] X2;
}
parameters {
  real<lower=-1, upper=1> rho;
  vector[K1] beta1;
  vector[K2] beta2;
}
model {
  vector[N] mu1 = X1 * beta1;
  vector[N] mu2 = X2 * beta2;
  for (n in 1:N)
    target += std_binormal_lcdf([mu1[n], mu2[n]]' | rho);
  ...

where we lose the advantage of the vectorization. In order to get the benefit of vectorization (i.e. caching the rho value that is repeated for every observation), we'd need to instead code the model like:

model {
...
  vector[2] model_mu[N];
  for (n in 1:N) {
    model_mu[n] = [mu1[n], mu2[n]]';
  }
  target += std_binormal_lcdf(model_mu | rho);
  ...

This looks funny to me, and seems inefficient, because we won't be respecting the memory locality in mu1 and mu2 and we need an extra loop. Here's the way it would work with the way I've coded it now:

model {
...
  target += std_binormal_lcdf(mu1 | mu2, rho);
  ...

That also looks strange to me given the placement of the conditioning bar. Maybe allowing for matrix inputs would make sense? something like

model {
  matrix[N, 2] mu;
  mu[,1] = X1 * beta1;
  mu[,2] = X2 * beta2;
  target += std_binormal_lcdf(mu | rho);
  ...

That I could get behind, and it would perhaps speed up internal computations.

Contributor

rtrangucci commented Aug 22, 2018

@bob-carpenter Per Ben's comment about doing scalar inputs rather than 2-vectors:

If we're set on 2-vectors, I'd like to expose the std_binormal_integral(real y1, real y2, real rho) as a function that takes scalars. The usage pattern for the std_binormal_lcdf function really doesn't lend itself to a vector input, but rather two scalar inputs. We'll usually be using the std_binormal_lcdf in something like a bivariate probit, where [y1, y2]' is a vector of vars. So the Stan code would need to do something like:

data {
  int N;
  int K1;
  int K2;
  matrix[N, K1] X1;
  matrix[N, K1] X2;
}
parameters {
  real<lower=-1, upper=1> rho;
  vector[K1] beta1;
  vector[K2] beta2;
}
model {
  vector[N] mu1 = X1 * beta1;
  vector[N] mu2 = X2 * beta2;
  for (n in 1:N)
    target += std_binormal_lcdf([mu1[n], mu2[n]]' | rho);
  ...

where we lose the advantage of the vectorization. In order to get the benefit of vectorization (i.e. caching the rho value that is repeated for every observation), we'd need to instead code the model like:

model {
...
  vector[2] model_mu[N];
  for (n in 1:N) {
    model_mu[n] = [mu1[n], mu2[n]]';
  }
  target += std_binormal_lcdf(model_mu | rho);
  ...

This looks funny to me, and seems inefficient, because we won't be respecting the memory locality in mu1 and mu2 and we need an extra loop. Here's the way it would work with the way I've coded it now:

model {
...
  target += std_binormal_lcdf(mu1 | mu2, rho);
  ...

That also looks strange to me given the placement of the conditioning bar. Maybe allowing for matrix inputs would make sense? something like

model {
  matrix[N, 2] mu;
  mu[,1] = X1 * beta1;
  mu[,2] = X2 * beta2;
  target += std_binormal_lcdf(mu | rho);
  ...

That I could get behind, and it would perhaps speed up internal computations.

@bob-carpenter

This comment has been minimized.

Show comment
Hide comment
@bob-carpenter

bob-carpenter Aug 22, 2018

Contributor

Thanks. That makes the problem clear. Are you really going to have different number of predictors for the two entries so you can't just make it matrix multiplication? Anyway, it's the same problem either way.

Even understanding the problem, I'm still opposed to violating statistical convention with something like this:

std_binormal_lcdf(mu1 | mu2, rho);

I'd be happier with something like this:

std_binormal_lcdf({ mu1, mu2 } | rho);

but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work.

We've decided in the past that having matrixes pack up vectorizations would be too confusing, or we could do something like this:

std_binormal_lcdf(append_col(mu1, mu2) | rho);

or it's transpose append_row(mu1', mu2').

I could live with this:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

The memory locality's not an issue as you're just walking down two vectors, so everything's local on both sides. It is some minimal overhead to allocate and copy.

Or I want to generalize our input syntax to allow

std_binormal_lcdf(mu1, mu2 | rho);

where both mu1 and mu2 are containers. But that's a ton of work.

Contributor

bob-carpenter commented Aug 22, 2018

Thanks. That makes the problem clear. Are you really going to have different number of predictors for the two entries so you can't just make it matrix multiplication? Anyway, it's the same problem either way.

Even understanding the problem, I'm still opposed to violating statistical convention with something like this:

std_binormal_lcdf(mu1 | mu2, rho);

I'd be happier with something like this:

std_binormal_lcdf({ mu1, mu2 } | rho);

but that's a 2-array of K-vectors rather than the K-array of 2-vectors that it should be to match the rest of our vectorizations. So that won't work.

We've decided in the past that having matrixes pack up vectorizations would be too confusing, or we could do something like this:

std_binormal_lcdf(append_col(mu1, mu2) | rho);

or it's transpose append_row(mu1', mu2').

I could live with this:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

The memory locality's not an issue as you're just walking down two vectors, so everything's local on both sides. It is some minimal overhead to allocate and copy.

Or I want to generalize our input syntax to allow

std_binormal_lcdf(mu1, mu2 | rho);

where both mu1 and mu2 are containers. But that's a ton of work.

@bgoodri

This comment has been minimized.

Show comment
Hide comment
@bgoodri

bgoodri Aug 22, 2018

Contributor
Contributor

bgoodri commented Aug 22, 2018

@rtrangucci

This comment has been minimized.

Show comment
Hide comment
@rtrangucci

rtrangucci Aug 22, 2018

Contributor

Yeah, I agree with you that generalizing the syntax doesn't make sense for this case. In the case we've settled on:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

are you proposing we add a function to_array_of_2vectors(reals y1, reals y2) to the language? Won't that be the same internally as the Stan code:

vector[2] mu[N];
for (n in 1:N)
  mu[n] = [mu1[n], mu2[n]]';

?

If so, that's fine. I'll move forward with updating the lcdf and lpdf to accept 2-vectors as the argument and we can open a separate issue for to_array_of_2vectors.

Contributor

rtrangucci commented Aug 22, 2018

Yeah, I agree with you that generalizing the syntax doesn't make sense for this case. In the case we've settled on:

std_binomial_lcdf(to_array_of_2vectors(mu1, mu2) | rho));

are you proposing we add a function to_array_of_2vectors(reals y1, reals y2) to the language? Won't that be the same internally as the Stan code:

vector[2] mu[N];
for (n in 1:N)
  mu[n] = [mu1[n], mu2[n]]';

?

If so, that's fine. I'll move forward with updating the lcdf and lpdf to accept 2-vectors as the argument and we can open a separate issue for to_array_of_2vectors.

@rtrangucci

This comment has been minimized.

Show comment
Hide comment
@rtrangucci

rtrangucci Aug 22, 2018

Contributor

One further point, then I'll go work on the function. All of the math internal to the std_binormal_lpdf and std_binormal_lcdf function (the log-density/probability and the gradients) operates on scalars) so we'll be converting to 2-vectors only to convert to scalars internally.

Contributor

rtrangucci commented Aug 22, 2018

One further point, then I'll go work on the function. All of the math internal to the std_binormal_lpdf and std_binormal_lcdf function (the log-density/probability and the gradients) operates on scalars) so we'll be converting to 2-vectors only to convert to scalars internally.

@bob-carpenter

This comment has been minimized.

Show comment
Hide comment
@bob-carpenter

bob-carpenter Aug 22, 2018

Contributor

Yes. I'm proposing adding:

vector[] to_array_of_2vectors(vector x, vector y);

where the two vectors have the same size and

to_array_of_2vectors(x, y)[i, 1] == x[i]
to_array_of_2vectors(x, y)[i, 2] == y[i]

But it's not very elegant, to say the least.

I could be convinced that the matrix-with-two-columns approach is the way to go, as that'd just be

append_col(x, y) ~ ...

It's just that we decided that it'd be confusing to read vectorization out of rows or columns of matrices.

Contributor

bob-carpenter commented Aug 22, 2018

Yes. I'm proposing adding:

vector[] to_array_of_2vectors(vector x, vector y);

where the two vectors have the same size and

to_array_of_2vectors(x, y)[i, 1] == x[i]
to_array_of_2vectors(x, y)[i, 2] == y[i]

But it's not very elegant, to say the least.

I could be convinced that the matrix-with-two-columns approach is the way to go, as that'd just be

append_col(x, y) ~ ...

It's just that we decided that it'd be confusing to read vectorization out of rows or columns of matrices.

@bob-carpenter

This comment has been minimized.

Show comment
Hide comment
@bob-carpenter

bob-carpenter Aug 22, 2018

Contributor

I'm only concerned about the external client interface, not how it's implemented.

But having said that, if it's all scalars under the hood, how will vectorization help?

I'd like to hear from @mitzimorris and @seantalts and @VMatthijs on this whole discussion.

Contributor

bob-carpenter commented Aug 22, 2018

I'm only concerned about the external client interface, not how it's implemented.

But having said that, if it's all scalars under the hood, how will vectorization help?

I'd like to hear from @mitzimorris and @seantalts and @VMatthijs on this whole discussion.

@rtrangucci

This comment has been minimized.

Show comment
Hide comment
@rtrangucci

rtrangucci Aug 22, 2018

Contributor

Both std_binormal_lpdf and std_binormal_lcdf are implemented nearly identically to normal_lpdf: https://github.com/stan-dev/math/blob/feature/issue-962-bivar-norm/stan/math/prim/scal/prob/std_binormal_lcdf.hpp

Contributor

rtrangucci commented Aug 22, 2018

Both std_binormal_lpdf and std_binormal_lcdf are implemented nearly identically to normal_lpdf: https://github.com/stan-dev/math/blob/feature/issue-962-bivar-norm/stan/math/prim/scal/prob/std_binormal_lcdf.hpp

@bob-carpenter

This comment has been minimized.

Show comment
Hide comment
@bob-carpenter

bob-carpenter Aug 22, 2018

Contributor

I take it you meant binormal and not normal in those first two!

Contributor

bob-carpenter commented Aug 22, 2018

I take it you meant binormal and not normal in those first two!

@rtrangucci

This comment has been minimized.

Show comment
Hide comment
@rtrangucci

rtrangucci Aug 22, 2018

Contributor

Yes, you're right! Good catch, and I've edited the comment.

Upon a bit of reflection, it probably won't be much of a performance hit to just take in a list of 2-vectors given that we have to pull out. I'll use the vector_seq_view metaprogram instead of scalar_seq_view, and just pull the scalar elements out of each the N 2-vectors.

Contributor

rtrangucci commented Aug 22, 2018

Yes, you're right! Good catch, and I've edited the comment.

Upon a bit of reflection, it probably won't be much of a performance hit to just take in a list of 2-vectors given that we have to pull out. I'll use the vector_seq_view metaprogram instead of scalar_seq_view, and just pull the scalar elements out of each the N 2-vectors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment