Recasts to a non-linear model #42

Merged · seabbs merged 7 commits into main from non-linear on Nov 30, 2021
Conversation

@seabbs (Contributor) commented on Jun 24, 2021

This PR recasts the model as a non-linear model and adds uncertainty for Rt by assuming the estimates are normally distributed. Not sure this is strictly better than what was there previously, but it is perhaps interesting, as there is lots of scope for using this for other problems (i.e. this is close to an Rt model implemented on top of brms). I also looked at adding uncertainty on the SGTF proportion using a dummy standard error, but this caused some fairly major fitting issues (maybe this could be done with a separate equation fit to SGTF-positive tests; something like the measurement-error term sketched below).
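A rough sketch of what such a dummy-standard-error term could look like, using brms's me() measurement-error term (prop_sgtf_se is a hypothetical name for a standard-error column, not one in the data here):

  # treat prop_sgtf as noisily observed with known SD prop_sgtf_se
  var ~ 0 + me(prop_sgtf, prop_sgtf_se)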

Noticed the normalize argument, which is new to Stan 2.25 and allows dropping normalizing constants (i.e. what I think the ~ sampling-statement approach does); a sketch of using it is below. The .devcontainer stuff is the automated Docker setup from VS Code that I am trying out, but if this is being merged it can be gitignored.
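A minimal sketch of using it, reusing the model, priors, and data defined in the code below (normalize is an argument of brm(); availability depends on the brms version):

fit <- brm(
  model,
  data = utla_rt_with_covariates %>%
    rename(rt_mean = rt_mean_long_gt,
           rt_sd = rt_sd_long_gt),
  prior = priors,
  # drop additive constants from the log density, as the ~ syntax does;
  # requires Stan >= 2.25
  normalize = FALSE,
  control = list(adapt_delta = 0.95))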

Not entirely clear if this branch is up to date and/or what the most recent data is, so the model fit is largely for show.

Model formula:

library(brms)
library(dplyr)

model <- bf(
  rt_mean | se(rt_sd, sigma = TRUE) ~ (1 + var) * exp(logRt),
  var ~ 0 + prop_sgtf,
  logRt ~ 1,
  family = student(),
  nl = TRUE
  # this is the change that allows future fancy stuff
  # but as it stands tries to do a dot product :( (may need to open an issue)
  #loop = FALSE
)

priors <- c(prior(normal(0, 0.5), nlpar = "var"),
            prior(student_t(3, 0, 0.5), nlpar = "logRt"))

fit <- brm(
  model,
  data = utla_rt_with_covariates %>%
    rename(rt_mean = rt_mean_long_gt,
           rt_sd = rt_sd_long_gt),
  prior = priors,
  control = list(adapt_delta = 0.95))
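Reading the non-linear formula, the fitted mean is Rt = (1 + beta * prop_sgtf) * exp(logRt), so the var coefficient (var_prop_sgtf in the fit below) should be interpretable as the proportional increase in Rt when moving from 0% to 100% SGTF.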

Generated Stan code:

// generated with brms 2.15.0
functions {
  /* compute the logm1 link 
   * Args: 
   *   y: a positive scalar
   * Returns: 
   *   a scalar in (-Inf, Inf)
   */ 
   real logm1(real y) { 
     return log(y - 1);
   }
  /* compute the inverse of the logm1 link 
   * Args: 
   *   y: a scalar in (-Inf, Inf)
   * Returns: 
   *   a positive scalar
   */ 
   real expp1(real y) { 
     return exp(y) + 1;
   }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  vector<lower=0>[N] se;  // known sampling error
  int<lower=1> K_var;  // number of population-level effects
  matrix[N, K_var] X_var;  // population-level design matrix
  int<lower=1> K_logRt;  // number of population-level effects
  matrix[N, K_logRt] X_logRt;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  vector<lower=0>[N] se2 = square(se);
}
parameters {
  vector[K_var] b_var;  // population-level effects
  vector[K_logRt] b_logRt;  // population-level effects
  real<lower=0> sigma;  // residual SD
  real<lower=1> nu;  // degrees of freedom or shape
}
transformed parameters {
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] nlp_var = X_var * b_var;
    // initialize linear predictor term
    vector[N] nlp_logRt = X_logRt * b_logRt;
    // initialize non-linear predictor term
    vector[N] mu;
    for (n in 1:N) {
      // compute non-linear predictor values
      mu[n] = (1 + nlp_var[n]) * exp(nlp_logRt[n]);
    }
    target += student_t_lpdf(Y | nu, mu, sqrt(square(sigma) + se2));
  }
  // priors including constants
  target += normal_lpdf(b_var | 0, 0.5);
  target += student_t_lpdf(b_logRt | 3, 0, 0.5);
  target += student_t_lpdf(sigma | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += gamma_lpdf(nu | 2, 0.1)
    - 1 * gamma_lccdf(1 | 2, 0.1);
}
generated quantities {
}
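(For reference, this Stan code can be regenerated without fitting via brms::make_stancode(model, data, prior = priors), or extracted from the fitted object with brms::stancode(fit).)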

Basic fit with no predictors:

 Family: student 
  Links: mu = identity; sigma = identity; nu = identity 
Formula: rt_mean | se(rt_sd, sigma = TRUE) ~ (1 + var) * exp(logRt) 
         var ~ 0 + prop_sgtf
         logRt ~ 1
   Data: utla_rt_with_covariates %>% rename(rt_mean = rt_me (Number of observations: 2194) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
var_prop_sgtf       0.11      0.01     0.09     0.14 1.00     2184     2383
logRt_Intercept     0.03      0.01     0.02     0.04 1.00     2240     2600

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.21      0.00     0.21     0.22 1.00     2603     2722
nu       41.44     15.14    20.58    79.98 1.00     2575     2642

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

@seabbs (Contributor, Author) commented on Jun 25, 2021

I went a bit extra (closing the stable door after the horse has bolted) and extended this to include the proportion SGTF as a latent variable (learning some interesting brms hacks along the way). It has quite a big impact on run time, but is still within a feasible range I think. I am fairly confident this is the correct approach, but it's certainly not something people seem to be doing much of (except here: paul-buerkner/brms#304).

Code:
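A sketch consistent with the formula and generated Stan code below (the column names samples, sgtf_samples, and sgtf, and the constant(1) priors implied by the fixed parameters in the transformed parameters block, are inferred rather than copied, so treat this as approximate):

library(brms)

model <- bf(sgtf_samples | trials(samples) ~ 0 + mi(sgtf), family = binomial()) +
  bf(sgtf | mi() ~ 1, family = Beta()) +
  bf(rt_mean | se(rt_sd, sigma = TRUE) ~ (1 + var) * exp(logRt),
     var ~ 0 + mi(sgtf),
     logRt ~ 1,
     family = student(),
     nl = TRUE)

# fix the measurement-model parameters to constants, matching the
# generated Stan code (an inferred detail)
priors <- c(prior(normal(0, 0.5), nlpar = "var", resp = "rtmean"),
            prior(student_t(3, 0, 0.5), nlpar = "logRt", resp = "rtmean"),
            prior(constant(1), class = "b", resp = "sgtfsamples"),
            prior(constant(1), class = "Intercept", resp = "sgtf"),
            prior(constant(1), class = "phi", resp = "sgtf"))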

Generated Stan code:

// generated with brms 2.15.0
functions {
  /* compute the logm1 link 
   * Args: 
   *   y: a positive scalar
   * Returns: 
   *   a scalar in (-Inf, Inf)
   */ 
   real logm1(real y) { 
     return log(y - 1);
   }
  /* compute the inverse of the logm1 link 
   * Args: 
   *   y: a scalar in (-Inf, Inf)
   * Returns: 
   *   a positive scalar
   */ 
   real expp1(real y) { 
     return exp(y) + 1;
   }
}
data {
  int<lower=1> N;  // total number of observations
  int<lower=1> N_sgtfsamples;  // number of observations
  int Y_sgtfsamples[N_sgtfsamples];  // response variable
  int trials_sgtfsamples[N_sgtfsamples];  // number of trials
  int<lower=1> Ksp_sgtfsamples;  // number of special effects terms
  int<lower=1> N_sgtf;  // number of observations
  vector[N_sgtf] Y_sgtf;  // response variable
  int<lower=0> Nmi_sgtf;  // number of missings
  int<lower=1> Jmi_sgtf[Nmi_sgtf];  // positions of missings
  int<lower=1> N_rtmean;  // number of observations
  vector[N_rtmean] Y_rtmean;  // response variable
  vector<lower=0>[N_rtmean] se_rtmean;  // known sampling error
  int<lower=1> Ksp_rtmean_var;  // number of special effects terms
  int<lower=1> K_rtmean_logRt;  // number of population-level effects
  matrix[N_rtmean, K_rtmean_logRt] X_rtmean_logRt;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  vector<lower=0>[N_rtmean] se2_rtmean = square(se_rtmean);
}
parameters {
  vector<lower=0,upper=1>[Nmi_sgtf] Ymi_sgtf;  // estimated missings
  vector[Ksp_rtmean_var] bsp_rtmean_var;  // special effects coefficients
  vector[K_rtmean_logRt] b_rtmean_logRt;  // population-level effects
  real<lower=0> sigma_rtmean;  // residual SD
  real<lower=1> nu_rtmean;  // degrees of freedom or shape
}
transformed parameters {
  vector[Ksp_sgtfsamples] bsp_sgtfsamples;  // special effects coefficients
  real Intercept_sgtf;  // temporary intercept for centered predictors
  real<lower=0> phi_sgtf;  // precision parameter
  bsp_sgtfsamples = rep_vector(1, rows(bsp_sgtfsamples));
  Intercept_sgtf = 1;
  phi_sgtf = 1;
}
model {
  // likelihood including constants
  if (!prior_only) {
    // vector combining observed and missing responses
    vector[N_sgtf] Yl_sgtf = Y_sgtf;
    // initialize linear predictor term
    vector[N_sgtfsamples] mu_sgtfsamples = rep_vector(0.0, N_sgtfsamples);
    // initialize linear predictor term
    vector[N_sgtf] mu_sgtf = Intercept_sgtf + rep_vector(0.0, N_sgtf);
    // initialize linear predictor term
    vector[N_rtmean] nlp_rtmean_var = rep_vector(0.0, N_rtmean);
    // initialize linear predictor term
    vector[N_rtmean] nlp_rtmean_logRt = X_rtmean_logRt * b_rtmean_logRt;
    // initialize non-linear predictor term
    vector[N_rtmean] mu_rtmean;
    Yl_sgtf[Jmi_sgtf] = Ymi_sgtf;
    for (n in 1:N_sgtfsamples) {
      // add more terms to the linear predictor
      mu_sgtfsamples[n] += (bsp_sgtfsamples[1]) * Yl_sgtf[n];
    }
    for (n in 1:N_rtmean) {
      // add more terms to the linear predictor
      nlp_rtmean_var[n] += (bsp_rtmean_var[1]) * Yl_sgtf[n];
    }
    for (n in 1:N_sgtf) {
      // apply the inverse link function
      mu_sgtf[n] = inv_logit(mu_sgtf[n]);
    }
    for (n in 1:N_rtmean) {
      // compute non-linear predictor values
      mu_rtmean[n] = (1 + nlp_rtmean_var[n]) * exp(nlp_rtmean_logRt[n]);
    }
    target += binomial_lpmf(Y_sgtfsamples | trials_sgtfsamples, mu_sgtfsamples);
    target += beta_lpdf(Yl_sgtf | mu_sgtf * phi_sgtf, (1 - mu_sgtf) * phi_sgtf);
    target += student_t_lpdf(Y_rtmean | nu_rtmean, mu_rtmean, sqrt(square(sigma_rtmean) + se2_rtmean));
  }
  // priors including constants
  target += normal_lpdf(bsp_rtmean_var | 0, 0.5);
  target += student_t_lpdf(b_rtmean_logRt | 3, 0, 0.5);
  target += student_t_lpdf(sigma_rtmean | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += gamma_lpdf(nu_rtmean | 2, 0.1)
    - 1 * gamma_lccdf(1 | 2, 0.1);
}
generated quantities {
  // actual population-level intercept
  real b_sgtf_Intercept = Intercept_sgtf;
}

Output (no covariates and winter 2020 data):

 Family: MV(binomial, beta, student) 
  Links: mu = identity
         mu = logit; phi = identity
         mu = identity; sigma = identity; nu = identity 
Formula: sgtf_samples | trials(samples) ~ 0 + mi(sgtf) 
         sgtf | mi() ~ 1 
         rt_mean | se(rt_sd, sigma = TRUE) ~ (1 + var) * exp(logRt) 
         var ~ 0 + mi(sgtf)
         logRt ~ 1
   Data: structure(list(week_infection = structure(c(18540, (Number of observations: 2194) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sgtf_Intercept             1.00      0.00     1.00     1.00 1.00     4000     4000
rtmean_logRt_Intercept     0.03      0.01     0.02     0.04 1.00     6830     2933
rtmean_var_misgtf          0.12      0.02     0.09     0.15 1.00     7278     2577
sgtfsamples_misgtf         1.00      0.00     1.00     1.00 1.00     4000     4000

Family Specific Parameters: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi_sgtf         1.00      0.00     1.00     1.00 1.00     4000     4000
sigma_rtmean     0.21      0.00     0.21     0.22 1.00     7000     3206
nu_rtmean       41.28     14.69    20.39    76.94 1.00     6992     3434

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

One of the cool things with this approach is that you can prototype using the point estimates, and then set them to NA in your data to model their uncertainty for the final model, as sketched below.
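A minimal sketch of that workflow (assuming the data frame is utla_rt_with_covariates with an sgtf column, as in the formula above):

library(dplyr)

# prototype: keep the observed sgtf point estimates, which mi() uses directly
prototype_data <- utla_rt_with_covariates

# final model: blank out sgtf so mi() treats every value as latent and
# estimates it from the binomial sgtf_samples | trials(samples) submodel
final_data <- mutate(utla_rt_with_covariates, sgtf = NA_real_)

Same fit without uncertainty (and 10% of the runtime):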

 Family: MV(binomial, beta, student) 
  Links: mu = identity
         mu = logit; phi = identity
         mu = identity; sigma = identity; nu = identity 
Formula: sgtf_samples | trials(samples) ~ 0 + mi(sgtf) 
         sgtf | mi() ~ 1 
         rt_mean | se(rt_sd, sigma = TRUE) ~ (1 + var) * exp(logRt) 
         var ~ 0 + mi(sgtf)
         logRt ~ 1
   Data: structure(list(week_infection = structure(c(18540, (Number of observations: 2194) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sgtf_Intercept             1.00      0.00     1.00     1.00 1.00     4000     4000
rtmean_logRt_Intercept     0.03      0.01     0.02     0.04 1.00     2787     2774
rtmean_var_misgtf          0.11      0.01     0.08     0.14 1.00     2687     2192
sgtfsamples_misgtf         1.00      0.00     1.00     1.00 1.00     4000     4000

Family Specific Parameters: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi_sgtf         1.00      0.00     1.00     1.00 1.00     4000     4000
sigma_rtmean     0.21      0.00     0.21     0.22 1.00     3192     2444
nu_rtmean       40.97     14.26    20.72    76.13 1.00     2773     2913

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Base automatically changed from may-update to main July 27, 2021 15:42
@seabbs (Contributor, Author) left a comment:

LGTM (self review)

@seabbs seabbs merged commit 8ab1189 into main Nov 30, 2021
@seabbs seabbs deleted the non-linear branch November 30, 2021 17:32