Update Gaussian Process Models in Stan Reference Manual #16

Closed
drezap opened this issue Aug 14, 2018 · 2 comments
drezap commented Aug 14, 2018

Summary:

There are some GP models in the Example Models/Gaussian Processes section of the Reference Manual that are invalid, and the section also needs to be updated for some of the new kernels.

Description:

I will make a list of things I've noticed. This is in no way meant to be comprehensive, and I will update this post as I go:

  1. The joint distribution for the unobserved y is contained in the total covariance matrix. On pages 259/260, we have something like the following:
transformed data {
  real delta = 1e-9;
  int<lower=1> N = N1 + N2;
  real x[N];
  for (n1 in 1:N1) x[n1] = x1[n1];
  for (n2 in 1:N2) x[N1 + n2] = x2[n2];
}

This does not make sense. We simply need two x vectors: x and x_pred, where x_pred contains the out-of-sample prediction points. If we take

generated quantities {
  vector[N2] y2;
  for (n2 in 1:N2)
    y2[n2] = normal_rng(f[N1 + n2], sigma);
}

then we generate predictions for indices greater than N1 that are essentially just normal random variates, and we are incorporating nothing we've approximated in the model. Another note: since we have a Gaussian likelihood, we do not need the latent f and can instead use y directly. We only need the latent f in generated quantities when the likelihood is non-Gaussian.

Instead, we use matrix algebra (i.e., the posterior predictive mean function and posterior predictive variance), and then the data and generated quantities blocks can look the same for all models, something like this (for ARD/separate length scales):

data {
  int<lower=1> N;
  int<lower=1> D;
  vector[D] x[N];
  int<lower=0,upper=1> y[N];

  int<lower=1> N_pred;
  vector[D] x_pred[N_pred];
}
parameters {
  real<lower=0> magnitude;
  real<lower=0> length_scale[D];
  vector[N] eta;
}

assuming we generate the posterior predictive correctly; there is an example below.
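As a cross-check of the matrix algebra being proposed (not Stan code from the manual — just a numpy sketch under the assumptions of a squared-exponential kernel, 1-D inputs, and Gaussian noise; the helper names are hypothetical):

```python
import numpy as np

def sq_exp_cov(x1, x2, magnitude, length_scale):
    """Squared-exponential kernel: m^2 * exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return magnitude**2 * np.exp(-0.5 * (d / length_scale)**2)

def gp_posterior(x, y, x_pred, magnitude, length_scale, sigma):
    """Posterior predictive mean and covariance for a GP with Gaussian noise."""
    K = sq_exp_cov(x, x, magnitude, length_scale) + sigma**2 * np.eye(len(x))
    k_star = sq_exp_cov(x, x_pred, magnitude, length_scale)   # K(x, x*)
    k_ss = sq_exp_cov(x_pred, x_pred, magnitude, length_scale)
    mean = k_star.T @ np.linalg.solve(K, y)                   # K*' K^-1 y
    cov = k_ss - k_star.T @ np.linalg.solve(K, k_star)        # K** - K*' K^-1 K*
    return mean, cov

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 1.5])
# Predicting at an observed input: the mean is pulled toward the observation
# and the variance collapses, which draws from normal_rng(f, sigma) alone
# would not reproduce for new inputs.
mean, cov = gp_posterior(x, y, np.array([1.0]), 1.0, 1.0, 0.1)
```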

  2. I'm also keen on generating out-of-sample and in-sample predictions in my generated quantities block. For a binary classifier, assuming we've generated the latent f* properly (using f*, following GPML notation), this is as follows:
generated quantities {
  vector[N_pred] f_pred = gp_pred_rng(x_pred, f, x, magnitude, length_scale);
  int y_pred[N_pred];
  int y_pred_in[N];
  
  for (n in 1:N) y_pred_in[n] = bernoulli_logit_rng(f[n]); // in sample prediction
  for (n in 1:N_pred) y_pred[n] = bernoulli_logit_rng(f_pred[n]); // out of sample predictions
}
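The step from the latent f to class draws is just an inverse logit followed by a Bernoulli draw; a minimal numpy sketch of what bernoulli_logit_rng does per element (the f values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_logit_rng(f):
    """Draw y ~ Bernoulli(inv_logit(f)) elementwise."""
    p = 1.0 / (1.0 + np.exp(-np.asarray(f)))  # inverse logit
    return (rng.random(p.shape) < p).astype(int)

f = np.array([-10.0, 0.0, 10.0])
draws = bernoulli_logit_rng(f)
# f = -10 gives p near 0, f = 10 gives p near 1
```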
  3. We also need to note that the posterior predictive depends on the likelihood or noise model we're assuming, and also on the covariance function. For example, in the binary classifier (logit) example, we only need the mean function. (Also note, I'm using the mean function without noisy predictions):
functions {
  vector gp_pred_rng(vector[] x_pred,
                     vector y1, vector[] x,
                     real magnitude, real[] length_scale) {
    int N = rows(y1);
    int N_pred = size(x_pred);
    vector[N_pred] f2;
    {
      matrix[N, N] K = gp_exp_quad_cov(x, magnitude, length_scale)
                       + diag_matrix(rep_vector(1e-9, N)); // jitter, as with delta above
      matrix[N, N] L_K = cholesky_decompose(K);
      vector[N] L_K_div_y1 = mdivide_left_tri_low(L_K, y1);
      vector[N] K_div_y1 = mdivide_right_tri_low(L_K_div_y1', L_K)';
      matrix[N, N_pred] k_x_x_pred = gp_exp_quad_cov(x, x_pred, magnitude, length_scale);
      f2 = (k_x_x_pred' * K_div_y1);
    }
    return f2;
  }
}
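The two triangular solves in the function above are just a numerically stable way of forming K^-1 y1; a quick numpy check of that identity on a generic SPD matrix (values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
K = A @ A.T + 4.0 * np.eye(4)        # a generic SPD matrix
y = rng.standard_normal(4)

L = np.linalg.cholesky(K)            # K = L L'
# mdivide_left_tri_low(L_K, y1) computes L \ y; the second solve applies L' \ .
a = np.linalg.solve(L, y)            # L \ y
K_div_y = np.linalg.solve(L.T, a)    # L' \ (L \ y)  ==  K^-1 y
```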

This wasn't as organized as I'd hoped, but it hits on some points.

Reproducible Steps:

If you copy and paste some of the notation in the Stan manual and plot the in sample predictive distribution, you will see what I'm talking about.

Current Version:

v2.18.0


drezap commented Aug 14, 2018

Hi -

I'm trying to locate Example Models section 18 Gaussian Process Models. I've gone through some of the files in stan/src/docs/reference-manual/ sequentially, but I've had no luck. Where can I find this section so I can do a pull request? Thanks!


rtrangucci commented Aug 16, 2018

@drezap great point on points 2 and 3 for the prediction function, we should add this code to the manual. Just a note, the code in the user guide for the predictions using the latent functions isn't wrong, it's just not as efficient as it could be, as you rightly point out. Re the form for Gaussian models see pages 152 to 154 in the new guide. I wrote the section in the user guide to be more pedagogical, but I can see an argument for not including any inefficient code in the manual, even if it's used as a building block for later more efficient code.

@mitzimorris mitzimorris transferred this issue from stan-dev/stan Dec 23, 2018
@mitzimorris mitzimorris added this to the 2.18.++ milestone Jan 25, 2019
@drezap drezap closed this as completed Jul 3, 2019