Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing a known number of individuals from a specific compartment? #79

Closed
gavincotterill opened this issue Nov 8, 2018 · 3 comments
Closed
Assignees

Comments

@gavincotterill
Copy link

gavincotterill commented Nov 8, 2018

*edited: Found the answer to secondary issue of assigning start values for mif2 in 'pomp2'.

I have a 4-compartment model for a wildlife disease. All individuals are born susceptible (S), can become infected and infectious (I), then recover but remain test-positive (R1), and can eventually serorevert with retained immunity (R2). This subpopulation which I'm modeling had some individuals that were tested and killed. I've simplified the problem somewhat here, but this is the context for the question.

I'm treating births as known each year, and it would be convenient if I could treat these removals from I similarly. The issue is that the population is relatively small, and the number of removals can be high enough to deplete I, leading to negative or NAN values, which makes the MLE search impossible.

I'm using a discrete time model and trying to fit both a disease trend and the population trend simultaneously. I'm also modeling seroprevalence (rather than something like case number) which is the proportion of individuals in I and R1 from the total population. My data for this are the number of positive tests, pos, from the total number of tests, tot.

I'm working in pomp2.

library(pomp2)
library(tidyverse)
set.seed(594709947L)
# sample data = dat
   dyear pos   apparent dpop
1   2005   9 0.30000000  242
2   2006  58 0.33918129  263
3   2007  15 0.18518519  228
4   2008  20 0.12658228  249
5   2009   8 0.07017544  301
6   2010   7 0.04964539  253
7   2011   4 0.05555556  274
8   2012   9 0.12857143  317
9   2013   9 0.16363636  319
10  2014  12 0.16901408  332
11  2015  12 0.21428571  301
12  2016  22 0.32352941  369
13  2017  20 0.25641026  392
14  2018   4 0.44444444  308
# sample covariates = covar
   cyear         MJ tot pop births remI
1   2004  0.5419351  15 325     36    0
2   2005  0.7719222  30 242     42    0
3   2006  1.6173053 171 263     51   31
4   2007  1.1820159  81 228     38   10
5   2008  0.4223871 158 249     34   13
6   2009  0.9824686 114 301     57    5
7   2010  0.8365436 141 253     32    5
8   2011  0.3372473  72 274     48    0
9   2012  7.0426718  70 317     63    0
10  2013  0.7907661  55 319     65    0
11  2014  0.4765223  71 332     48    0
12  2015  1.5915086  56 301     47    0
13  2016  0.4123216  68 369     57    0
14  2017  0.5733078  78 392     59    0
15  2018 10.6978306   9 308     45    0

My model:

rproc <- Csnippet("
                  double foi, mu;
                  double rate[7], trans[7];

                  // expected force of infection
                  foi = (beta + (bp * MJ)) * (I + iota) / pow(pop, theta);

                  // time-varying death rates
                  if(t <= 2004 || t >= 2017) {
                    mu = mu2;
                  } else {
                    mu = mu1;
                  };

                  rate[0] = foi * dt;       // force of infection
                  rate[1] = mu;             // S death
                  rate[2] = sigma;         // rate of recovery
                  rate[3] = mu;             // I death
                  rate[4] = gamma;      // seroreversion rate
                  rate[5] = mu;             // R1 death
                  rate[6] = mu3;           // R2 death
    
                  // transitions between compartments
                  reulermultinom(2, S, &rate[0], dt, &trans[0]);
                  reulermultinom(3, I, &rate[2], dt, &trans[2]);
                  reulermultinom(2, R1, &rate[4], dt, &trans[4]);
                  reulermultinom(1, R2, &rate[6], dt, &trans[6]);

                  S += births   - trans[0] - trans[1];
                  I += trans[0] - trans[2] - trans[3] - remI;
                  R1 += trans[2] - trans[4] - trans[5];
                  R2 += trans[4] - trans[6]; 
                  N = S + I + R1 + R2;
                  truesp = (I + R1) / N;               // true seroprevalence
                  ")

initlz <- Csnippet("
                   double m = pop / (S_0 + I_0 + R1_0 + R2_0);
                   S = nearbyint(m * S_0);
                   I = nearbyint(m * I_0);
                   R1 = nearbyint(m * R1_0);
                   R2 = nearbyint(m * R2_0);
                   truesp = (I + R1) / pop;
                   ")

dmeas <- Csnippet("if (ISNA(pos)) {
                    lik = (give_log) ? 0 : 1;
                  } else {
                  lik = dbinom(pos, tot, truesp, give_log) +
                  dnorm(dpop, N, 20, give_log); 
                  }")

rmeas <- Csnippet("dpop = rnorm(N, 20);
                  pos = rbinom(tot, truesp);
                  apparent = pos/tot;")

# transform parameters for use with unconstrained optimizer
log_params <- c("beta", "bp", "iota")         # params > 0
logit_params <- c("mu1", "mu2", "mu3","sigma", "gamma", "theta",
                              "S_0", "I_0", "R1_0", "R2_0")     # 1 > params > 0

m2 <- pomp(dat, 
           time = "dyear",
           t0 = 2004,
           rprocess = discrete_time(rproc, delta.t = 1), 
           rinit = initlz,
           dmeasure = dmeas,
           rmeasure = rmeas,
           covar = covariate_table(covar, times = "cyear"),
           statenames = c("S", "I", "R1", "R2", "truesp", "N"),
           paramnames = c("beta", "bp","mu1", "mu2","mu3","sigma","gamma","theta","iota",
                          "S_0","I_0","R1_0","R2_0"),
           partrans = parameter_trans(log = log_params, logit = logit_params))

I can simulate from this model, and some runs (as expected) deplete I and 'break'.

paramvals <- c(beta = 0.12,  # some baseline transmission rate
               bp = 0.5,    # beta prime
               mu1 = 1/7,   # death rate 2005 - 2016
               mu2 = 1/4,   # death rate <2005 and >2016
               mu3 = 1/3,   # R2 death rate
               sigma = 1/3, # recovery I to R1
               gamma = 1/7, # recovery R1 to R2
               theta = 0.9, # degree of freq. v dens. dep.
               iota = 3,    # mean imported infections 
               S_0 = .4, I_0 = .15, R1_0 = .15, R2_0 = .3)

sims <- simulate(m2, 
                 params = paramvals,
                 nsim = 10, 
                 format = "data.frame", include.data = TRUE)

These parameter values at least seem to be getting us in the ballpark. I've omitted some other data,
and expect a bit of noise, so to me this doesn't look too awful.

# disease trend
ggplot(sims, mapping = aes(x = dyear, y = apparent, group = .id, color = .id == "data"))+
  geom_line()+
  guides(color=FALSE)

# population trend
ggplot(sims, mapping = aes(x = dyear, y = dpop, group = .id, color = .id == "data"))+
  geom_line()+
  guides(color=FALSE)

pfilter should fail:

pf <- pfilter(m2, Np = 100,
              params = c(paramvals))

Ultimately I want to be able to conduct the MLE search with mif2, so I've attached code below for that in case it is useful. One idea I tried is to replace the transition equations in rproc to include an ifelse statement... basically if I gets depleted, assign it a zero value:

 if((I += trans[0] - trans[2] - trans[3] - remI) < 1){
     I = 0;
 } else {
     I += trans[0] - trans[2] - trans[3] - remI;
 };

Surprisingly, this doesn't throw errors, but from the simulations it's clear that it's not doing what I intended... actually I can't tell what, if anything, it is doing.

Any suggestions? What options do I have?

This has been fixed, leaving in case it's useful for others:
This is an aside, so maybe it belongs in a separate question, but I've been getting this error in pomp2 which didn't occur in the previous pomp version I had installed. I'm not sure what is going on. Possibly my foreach code is incorrect. If I don't run this line:
coef(m2) <- paramvals don't run this! this assigns a point estimate
then the following call to stew returns an error message. I think you should be able to replicate the error here, even though once it's fixed stew should fail and spit out the parameters when I gets depleted. (I have this same issue for models of other populations that don't include removals.)

Here's the error:
Error in { : task 1 failed - "in 'mif2': the following parameter(s), given random walks in 'rw.sd', are not present in 'start': 'beta','bp','mu1','mu2','mu3','sigma','gamma','theta','iota','S_0','I_0','R1_0','R2_0'."

Here's the stew code block:

library(foreach)
library(doParallel)
registerDoParallel(detectCores() - 1)
getDoParWorkers()
set.seed(998468235L, kind="L'Ecuyer")

# a local search of the likelihood surface
start_time <- Sys.time()
stew(file = "./local_search_7Nov18.rda",{
  t_local_mif <- system.time({
    mifs_local <- foreach(i=1:20,
                          .packages='pomp2',
                          .combine=c,
                          .options.multicore=list(set.seed = TRUE),
                          .export=c("m2", "paramvals")
    ) %dopar%
    {
      mif2(
        m2,
        #start=paramvals, #start is deprecated, use params= instead
        params=paramvals,
        Np=1e3,
        Nmif=200,
        cooling.type="geometric",
        cooling.fraction.50=0.25,
        #transform=TRUE, #transform is deprecated, already assigned in pomp object
        rw.sd=rw.sd(beta=0.02,
                    bp=0.02,
                    mu1=ifelse(time >= 2005 & time <= 2016, 0.02, 0),
                    mu2=ifelse(time <= 2004 | time >= 2017, 0.02, 0),
                    mu3=0.02,
                    sigma=0.005,
                    gamma=0.005,
                    theta=0.02,
                    iota=0.02,
                    S_0=ivp(0.02,lag=1), 
                    I_0=ivp(0.02,lag=1), 
                    R1_0=ivp(0.02,lag=1), 
                    R2_0=ivp(0.02,lag=1))
      )
    }
  })
}, seed = 482947940, kind = "L'Ecuyer")
end_time <- Sys.time()
end_time - start_time 

Since all of those 'start' parameters are included in paramvals, which is included in .export=, I thought this should run. Assigning paramvals to coef(m2) gets rid of the error. What's happening?

@kingaa
Copy link
Owner

kingaa commented Feb 17, 2019

@gxvxn: apologies for being so very slow to reply. You may have resolved this issue already, or moved on entirely, but your main question, about how to remove a known number of individuals from a stochastic compartment, is an interesting one.

In a certain sense, there is no inference problem here: the model is doing what it is supposed to do and negative numbers of infectives is one of the things it is asked to handle. The question really is a modeling question: what kind of model is appropriate in this situation?

In this context, recall that a model is nothing other than a probability distribution, the hypothetical probability distribution of the data. In your case, this distribution is conditional on the number of removals. You could try to work out the proper form of the simulator conditional on these removals. That would appear to be difficult (at least to me, at this moment early on a Sunday morning) and unlikely to scale well.

Are there other approaches you might consider? What about treating the removals as data? In the POMP context, you would have to allow for some (possibly trivial) measurement error on the removals. You would also have to model the removal process itself, which might be annoying and might or might not yet be worthwhile.

If your limited goal is to estimate and maximize likelihoods using pfilter and mif2, you might try the following, variants of which I have used in the past. Create a new state variable called, say, error. Let error be initialized at zero and set to something other than zero whenever the number of removals exceeds the pool available. If error!=0, have the process-model simulator leave it alone, i.e., it should never be reset to zero under any circumstances. Then, in the measurement model, assign a likelihood of zero whenever error is nonzero, regardless of anything else. This is correct, since the probability of having more removals than there are individuals to remove is zero, aside from anything else. Thus, the particle filter will return the correct likelihood and iterated filtering should be able to handle the situation.

If the model is a good one, in the sense that it is capable of accounting for the data in a way that is compatible with the known removals, then you should, in principle have no problem. What I mean by this is that simulations of the model at the MLE, for example, will rarely if ever have fewer infectives than removals. Of course, it is also possible that, even at the MLE, a fair fraction of the simulations run afoul of this constraint. What are you to make of the simulations then, i.e., simulations that routinely produce error!=0?

@gavincotterill
Copy link
Author

Hi Dr. King,

Thanks for your reply. I like your suggestion of treating the removals as data, and I think I can see how that would be worked in. I have moved on from this issue, although I may still have occasion to revisit this. The interim work-around I used (which seems to suit my purposes enough) was to treat removals as a rate, and from simulations at the MLE, double check that the median estimates using that rate fell reasonably close to the number removed (and they do). It's a little round-about, and it might be sacrificing a bit of information that could otherwise help inform parameter estimation, but there's additional ambiguity with these data beyond what's described above, so I think this is a reasonable approach. Thank you for your help!

@kingaa
Copy link
Owner

kingaa commented Feb 25, 2019

Sounds like a sensible compromise. Closing this issue now, but feel free to reopen if necessary.

@kingaa kingaa closed this as completed Feb 25, 2019
@kingaa kingaa assigned kingaa and unassigned kingaa Jul 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants