Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add orthogonal parameterization of gamma distribution #3040

Open
spinkney opened this issue Mar 26, 2024 · 0 comments
Open

Add orthogonal parameterization of gamma distribution #3040

spinkney opened this issue Mar 26, 2024 · 0 comments

Comments

@spinkney
Copy link
Collaborator

Description

The current parameterization of the gamma distribution results in correlated parameters that bias estimates when doing gamma regression. An alternative parameterization was discussed on the forums at https://discourse.mc-stan.org/t/posterior-estimates-of-rate-and-shape-of-gamma-distribution-are-dependent/3220/14?u=spinkney.

I decided to look at 3 of the parameterizations using this problem on the forums, https://discourse.mc-stan.org/t/gamma-regression-in-stan-vs-frequentist-approach/16274/11.

  1. Current Stan parameterization
  2. The parameterization suggested by Max Mantei in the 2nd thread above
  3. The parameterization suggested by JohnnyZoom4H in the 1st thread above

Frequentist Fit for comparison given in the forums

                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.1812      0.005     33.899      0.000       0.171       0.192
x2             0.0485      0.005      9.632      0.000       0.039       0.058
x3            -0.0600      0.006     -9.760      0.000      -0.072      -0.048
x4            -0.0218      0.008     -2.753      0.006      -0.037      -0.006
const         13.7931      0.019    719.882      0.000      13.756      13.831

All models were run on a mac m1promax using cmdstanr with 4 parallel chains and everything else default.

Model 1: Current Stan parameterization
Notes: Lots of warnings during warmup about Shape parameter[2] is inf
Total execution time: 1.0 seconds

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  vector[N] y; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real<lower=0> phi; //the variance parameter
}
transformed parameters {
  vector[N] mu = exp(X * betas); //the expected values (linear predictor)
  vector[N] alpha = mu .* mu / phi; //shape parameter for the gamma distribution
  vector[N] beta = mu / phi; //rate parameter for the gamma distribution
}
model {  
  betas ~ normal(0, 4); 

  y ~ gamma(alpha,beta);
}

Fit of betas

# A tibble: 5 × 10
  variable    mean  median      sd     mad      q5     q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>    <dbl>    <dbl>
1 betas[1]  0.177   0.177  0.00418 0.00412  0.171   0.184   1.00    1888.    2375.
2 betas[2]  0.0469  0.0469 0.00385 0.00387  0.0404  0.0534  1.00    4099.    3191.
3 betas[3] -0.0692 -0.0691 0.00452 0.00461 -0.0766 -0.0620  1.00    4077.    2936.
4 betas[4] -0.0228 -0.0228 0.00462 0.00463 -0.0303 -0.0151  1.00    3902.    2627.
5 betas[5] 13.8    13.8    0.0202  0.0205  13.8    13.8     1.00    1774.    1851.

Model 2: Max Mantei's suggested parameterization
Notes: Lots of warnings during wamup about shape parameter is 0, but must be positive finite.
Total execution time: 0.6 seconds

data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  vector[N] y; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real<lower=0> invphi; //the variance parameter
}
transformed parameters {
  vector[N] mu = exp(X * betas); //the expected values (linear predictor)
  vector[N] beta = invphi / mu ; //shape parameter for the gamma distribution
}
model {  
  betas ~ normal(0, 4); //prior for the intercept following Gelman 2008
  invphi ~ exponential(1);
  y ~ gamma(invphi,beta);
}

Fit of betas

# A tibble: 5 × 10
  variable    mean  median      sd     mad      q5      q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <dbl>    <dbl>    <dbl>
1 betas[1]  0.181   0.181  0.00543 0.00551  0.173   0.190    1.00    5011.    3282.
2 betas[2]  0.0483  0.0483 0.00520 0.00531  0.0397  0.0566   1.00    4159.    3110.
3 betas[3] -0.0599 -0.0601 0.00598 0.00595 -0.0697 -0.0501   1.00    5119.    3138.
4 betas[4] -0.0218 -0.0218 0.00843 0.00854 -0.0358 -0.00810  1.00    4824.    3090.
5 betas[5] 13.8    13.8    0.0198  0.0197  13.8    13.8      1.00    4208.    2571.

Model 3: JohnnyZoom4H's suggested parameterization
Notes: No warnings!
Total execution time: 0.4 seconds

functions {
  real johnnys_gamma_lpdf(vector x, real tau, vector mu) {
    int N = num_elements(x);
    
    return (tau - 1) * sum(log(x)) + tau * sum(log(tau) - log(mu)) - N * lgamma(tau) - sum(x * tau ./ mu);
  }
}
data {
  int N; //the number of observations
  int K; //the number of columns in the model matrix
  vector[N] y; //the response
  matrix[N,K] X; //the model matrix
}
parameters {
  vector[K] betas; //the regression parameters
  real<lower=0> tau; //the variance parameter
}
transformed parameters {
  vector[N] mu = exp(X * betas); //the expected values (linear predictor)
}
model {  
  betas ~ normal(0, 4); //prior for the intercept following Gelman 2008
  tau ~ exponential(1);
  y ~ johnnys_gamma(tau, mu);
}

Fit of betas

  variable    mean  median      sd     mad      q5      q95  rhat ess_bulk ess_tail
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <dbl>    <dbl>    <dbl>
1 betas[1]  0.181   0.181  0.00552 0.00554  0.172   0.190    1.00    4753.    3146.
2 betas[2]  0.0484  0.0483 0.00514 0.00518  0.0400  0.0569   1.00    4772.    3333.
3 betas[3] -0.0599 -0.0600 0.00619 0.00612 -0.0701 -0.0497   1.00    5290.    3082.
4 betas[4] -0.0216 -0.0216 0.00845 0.00831 -0.0356 -0.00794  1.00    5227.    3273.
5 betas[5] 13.8    13.8    0.0198  0.0197  13.8    13.8      1.00    4511.    3077.

Summary

Clearly, parameterizations 2 and 3 are superior for this regression. I believe 3 is the best and when coded in cpp and given derivatives may increase efficiency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant