Replace Poisson likelihood for catch with negative binomial #3

Open · marksorel8 opened this issue Aug 21, 2019 · 58 comments
Labels: enhancement (New feature or request)

@marksorel8 (Owner) commented Aug 21, 2019

> I also invite @ebuhle to do this and do the comparison with LOO of which you speak ;)

I leave the exercise to the reader...errr, dissertator.

The tricky part will be making sure you're using a parameterization of the NB that is closed under addition, so that this still holds:

M_hat_cumsum[i] = sum(M_hat_t[(trap_day[i] - elapsed_time[i] + 1):trap_day[i], trap_year[i]]);

As you probably know, there are several parameterizations in common use, and even the "standard" one in Stan differs from the one on Wikipedia, for example. Further, the most common and useful parameterization in ecology (mean and overdispersion) is yet another variant. The catch is that closure under addition requires that all the NB RVs being summed have the same p (in the standard parameterization), so you can't simply sum the expectations like we do with the Poisson.
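As a quick numerical illustration of the closure property (made-up numbers in R, not model code; the three parameterizations are related as noted above):

## The three parameterizations, for reference:
##   standard:           X ~ NB(r, p), mean = r * (1 - p) / p
##   Stan neg_binomial:  r = alpha, p = beta / (beta + 1), mean = alpha / beta
##   ecology (NB2):      mean mu, var = mu + mu^2 / phi, with phi playing the role of r
## Sums of independent NBs stay NB only if p is shared; then the r's add.
set.seed(1)
p <- 0.3
r <- c(5, 8, 2)
x_sum <- rnbinom(1e5, size = r[1], prob = p) +
         rnbinom(1e5, size = r[2], prob = p) +
         rnbinom(1e5, size = r[3], prob = p)
y <- rnbinom(1e5, size = sum(r), prob = p)
c(mean(x_sum), mean(y))   # both approx. sum(r) * (1 - p) / p
c(var(x_sum), var(y))     # both approx. sum(r) * (1 - p) / p^2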

Then it will also be slightly tricky to figure out the binomial thinning:

C_hat[i] = M_hat_cumsum[i] * p[trap_year[i],trap_week[i]];

since thinning the NB (like the Poisson) scales the expectation. So you'll have to switch between parameterizations (i.e., solve one set of parameters for the other) at least once.
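A companion sketch of the thinning step (again purely illustrative R with made-up numbers): thinning leaves r unchanged and scales the mean, so it is the p (equivalently Stan's beta) that has to be re-solved.

set.seed(1)
r <- 6; p <- 0.25; pi_keep <- 0.4            # pi_keep plays the role of trap efficiency
x <- rnbinom(1e5, size = r, prob = p)        # "true" counts
y <- rbinom(1e5, size = x, prob = pi_keep)   # binomially thinned counts

mu <- r * (1 - p) / p
c(mean(y), pi_keep * mu)                     # thinned mean = pi_keep * mu

## Via the gamma-Poisson construction, the thinned counts are NB with the same r
## but a new probability (Stan's beta gets divided by pi_keep):
beta_stan <- p / (1 - p)
p_new <- (beta_stan / pi_keep) / (beta_stan / pi_keep + 1)
c(var(y), r * (1 - p_new) / p_new^2)         # matches the NB(r, p_new) variance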

I would have to sit down with pencil and paper to figure this out. Sounds like a good grad student project, amirite??

Originally posted by @ebuhle in #1 (comment)

@ebuhle changed the title from "> I also invite @ebuhle to do this and do the comparison with LOO of which you speak ;)" to "Replace Poisson likelihood for catch with negative binomial" on Aug 21, 2019
@mdscheuerell added the enhancement (New feature or request) label on Aug 21, 2019
@ebuhle (Collaborator) commented Aug 21, 2019

My contribution is to make the issue title informative.

@mdscheuerell (Collaborator)

I agree that it would be good for @marksorel8 to take a crack at the NB formulation.

@mdscheuerell (Collaborator)

> My contribution is to make the issue title informative.

And I just added an informative label to the issue!

@ebuhle (Collaborator) commented Aug 21, 2019

Good job, team!

@marksorel8 (Owner, Author) commented Aug 22, 2019

Alright, here goes.

> closure under addition requires that all the NB RVs being summed have the same p (in the standard parameterization)

The Stan function neg_binomial_lpmf(y | alpha, beta) is similar to the standard parameterization provided here, except that the binomial coefficient is different and beta / (beta + 1) replaces p. The mean of the Stan distribution is alpha / beta, and the variance is alpha * (beta + 1) / beta^2.
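A one-line check of those moments in R, with arbitrary alpha and beta: R's rnbinom(size, prob) corresponds to Stan's neg_binomial via size = alpha and prob = beta / (beta + 1).

alpha <- 12; beta <- 0.5                      # arbitrary values
x <- rnbinom(1e5, size = alpha, prob = beta / (beta + 1))
c(mean(x), alpha / beta)                      # ~24 for both
c(var(x), alpha * (beta + 1) / beta^2)        # ~72 for both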

Since the Stan parameterization is similar to the standard parameterization in the part with p, would it then be closed under addition as long as we fit a single beta (and therefore p) parameter for each year?

alpha = expected value * beta. So, can we just replace the current Poisson statement for C

with C ~ neg_binomial(C_hat * beta, beta)?

@mdscheuerell (Collaborator)

If you want C ~ neg_binomial(alpha, beta) then I think you need

beta <- alpha / C_hat

@marksorel8 (Owner, Author)

@mdscheuerell, would that make the distribution not closed under addition, because the betas are different?

The closure-under-addition issue isn't a problem in the Wenatchee because they are legally obligated to check the traps daily, but it seems important to crack, especially since it will be necessary in other places like the Tucannon.

@mdscheuerell (Collaborator)

I don't understand the issue re: alpha and beta in the negative binomial complicating the estimation. The mean (and variance) for the Poisson (C_hat) is indeed a function of p[trap_year,trap_week], but that calculation is done prior to its inclusion in the likelihood. The negative binomial can also be parameterized in terms of the mean and variance, which means using the same estimate of C_hat.

@ebuhle (Collaborator) commented Aug 26, 2019

There are two distinct steps in the likelihood calculation where going from the Poisson to the NB presents complications.

The one @marksorel8 is referring to is the summation of "true" migrants over multiple days, if the trap isn't emptied daily (though apparently that's a non-issue in the Wenatchee). This is implicitly a summation of discrete-valued latent states, but when they're Poisson-distributed it's trivial b/c you can just sum the expectations and it's still Poisson. With the NB, this doesn't work -- the sum of NBs is NB iff they have the same p, in which case you sum the r's (in the standard parameterization). So you'll have to (1) declare p (or a one-to-one transformation of it) as a primitive parameter, which is less than ideal for interpretation and for characterizing the overdispersion; (2) after calculating the daily expectation M_hat_t, use the fixed p to solve for r[t]; (3) sum the r[t]'s across days; and then (4) convert back to the expectation M_hat_cumsum for subsequent likelihood calcs.

The second issue is the binomial thinning in calculating C_hat. If we were using the (mean, overdispersion) parameterization, this would be simple: thinning scales the NB mean just like it does the Poisson. Here, though, we'll have to either solve for the overdispersion to use the mean directly, or solve for r again (after thinning) to use the standard parameterization.
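Roughly, in R (illustrative values and names only, not the actual Stan variables), those two conversions look like this:

p_nb    <- 0.2                     # (1) primitive NB probability, shared across days
M_hat_t <- c(120, 95, 150)         # daily expected migrants in one accumulation window
p_trap  <- 0.08                    # weekly trap capture efficiency (the "other" p)

r_t   <- M_hat_t * p_nb / (1 - p_nb)       # (2) solve for r[t] from the mean and fixed p
r_sum <- sum(r_t)                          # (3) sum the r[t]'s (valid because p is shared)
M_hat_cumsum <- r_sum * (1 - p_nb) / p_nb  # (4) back to the expectation (= sum(M_hat_t))

## Thinning by the trap scales the mean and leaves r alone, so the NB probability
## for the catch has to be re-solved:
C_hat    <- M_hat_cumsum * p_trap
p_nb_new <- r_sum / (r_sum + C_hat)        # from mean = r * (1 - p) / p = C_hat
## The catch would then be NB with size r_sum and probability p_nb_new.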

@mdscheuerell (Collaborator)

I had made the assumption that we had a distributional form for the expected catch for each trap, which was initially Poisson. The expectation (and variance) for the Poisson comes from a Gaussian AR(1) model for log-migrants, which is then thinned at some interval based upon when the trap is sampled. I had thought p was indeed constant within an interval, and hence the NB should be OK because we could parameterize it in terms of a mean (fixed) and r (unknown).

@ebuhle (Collaborator) commented Aug 26, 2019

> I had made the assumption that we had a distributional form for the expected catch for each trap, which was initially Poisson.

What I had in mind in formulating the model was that it's actually the daily arrivals at the trap that are Poisson. The daily outmigrants available for capture are a Poisson-log AR(1) process, which are then possibly accumulated over multiple days and thinned by binomial sampling with trap efficiency p (uhh, the other p...that's confusing). In JAGS, you would model the daily arrivals explicitly:

M[t] ~ dpois(M_hat[t])

We can't do that in Stan, but luckily the Poisson is closed under addition and binomial thinning, so everything works out and we get to have our discrete-latent-state cake and eat it too (in generated quantities).
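For contrast, a quick R check (made-up numbers) of the Poisson case: summing and thinning keep everything Poisson, which is what the current likelihood relies on.

set.seed(1)
lambda <- c(40, 55, 25); p_trap <- 0.1
M <- rpois(1e5, lambda[1]) + rpois(1e5, lambda[2]) + rpois(1e5, lambda[3])
C <- rbinom(1e5, size = M, prob = p_trap)
c(mean(C), var(C), sum(lambda) * p_trap)   # mean = variance = sum(lambda) * p_trap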

Things aren't quite as elegant if the daily arrivals are NB. It's still doable as outlined above, but I suspect having to parameterize in terms of p (vs. overdispersion) induces a wonkier prior that's harder to interpret (since p is involved in both the mean and overdispersion).

@mdscheuerell (Collaborator)

> I suspect having to parameterize in terms of p (vs. overdispersion) induces a wonkier prior that's harder to interpret (since p is involved in both the mean and overdispersion).

I agree with this.

@mdscheuerell (Collaborator)

FYI, I did something similar for the San Nicolas data wherein the expectation for the NB was based on the log-abundance from a MAR(1) model, but the observations weren't nearly as sparse (ie, NA's were very rare), so no summations were necessary and solving for r wasn't such a hassle.

@ebuhle (Collaborator) commented Aug 26, 2019

@marksorel8, you could explore the difference b/w the two parameterizations with the Wenatchee data, since summation over missing days isn't an issue.

@marksorel8 (Owner, Author)

Hi @mdscheuerell @ebuhle, I have started adding the ability to use the NB distribution for the observations in the single year Stan model. I am using the mu and phi parameterization of the NB.

C ~ neg_binomial_2(C_hat,phi_obs[1]);

Would you be able to check the changes that I made in juv_trap.stan in the last commit and let me know if I am on the right track? What would an appropriate prior on the phi parameter in the NB be?

I am not getting radically different results with the NB than the Poisson at this point. Should I use LOO to compare them once I have a decent prior on phi? How would I go about that?

Finally, I found an e-book that might be of use in this project: Weisse- An Introduction to Discrete-Valued Time Series.pdf
The first chapter, and specifically section 2.1.3 on pg 22, seem relevant, and there is an accompanying Rcodes.zip. The password to access the R codes is "DiscrValTS17". NB-INAR1.r might have some useful reparameterization algebra.

@mdscheuerell (Collaborator)

@marksorel8 The changes you made are not sufficient because you're using the primitive parameter phi_obs directly in the NB likelihood without using p to solve for r prior to summing r over t. Here's the pseudo-code from @ebuhle

  1. declare p as a primitive parameter (right now it's in transformed parameters);
  2. after calculating the daily expectation M_hat_t, use the fixed p to solve for r[t];
  3. sum r[t] across days; and
  4. convert back to the expectation M_hat_cumsum.

@mdscheuerell (Collaborator)

Re: models for discrete-valued data, you could opt to model the true, but unknown mean number of migrants as an AR(1) process for discrete values, but it's just as reasonable to assume the log-mean is actually real-valued (Gaussian), which is just fine for the Poisson or NB.

@ebuhle (Collaborator) commented Sep 11, 2019

@mdscheuerell pretty much covered it. The key point is that you can't use the neg_binomial_2 parameterization as long as you need to sum expected migrants over multiple days,

for(i in 1:N_trap)
  M_hat_cumsum[i] = sum(M_hat[(trap_day[i] - elapsed_time[i] + 1):trap_day[i]]);

because the NB is only closed under addition if all the terms are distributed with the same p (i.e., in Stan's neg_binomial version, the same beta) and the only way to enforce this is to declare p (i.e., beta) as a primitive parameter.

However, if you don't do the summation (which you can get away with in the Wenatchee case) then either parameterization is fine, and this would allow you to compare the two. As for a prior on the overdispersion, I'd suggest a half-normal or half-Cauchy, diffuse enough to cover plausible values (recalling that smaller phi means more overdispersion).
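As a reminder of what phi does in the NB2 parameterization, and what a half-normal prior on it might look like (R sketch; the scale of 5 is arbitrary, not a recommendation):

mu  <- 100
phi <- c(0.5, 5, 50, 500)
data.frame(phi = phi, variance = mu + mu^2 / phi)   # smaller phi = more overdispersion
## a diffuse half-normal prior on phi can be inspected directly:
curve(2 * dnorm(x, 0, 5), from = 0, to = 20, xlab = "phi", ylab = "half-normal(0, 5) density")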

@ebuhle (Collaborator) commented Sep 11, 2019

Oh, and just to clarify:

> 1. declare `p` as a primitive parameter (right now it's in `transformed parameters`);

That's a different p -- the weekly capture probability of the trap, not the binomial probability underlying the "number of failures before first success" interpretation of the NB -- and it's fine as is. Not confusing at all...

@marksorel8 (Owner, Author)

Thanks @mdscheuerell and @ebuhle . I added a prior on the dispersion parameter (actually, I put the prior on the inverse of the square root of the dispersion parameter) and played around with fitting a few different years of the Chiwawa data with the NB and the Poisson.

For one thing, I am getting some divergences with the NB.

Beyond that, the degree of overdispersion depends on the year. In some years it is negligible and in others it is more substantial. The year shown below is one with more overdispersion.

Negative Binomial
[plot omitted]

Poisson
[plot omitted]

@marksorel8 (Owner, Author)

Here is an example where the NB seems much more appropriate. The overdispersion seems important for fry, which are logically observed with more error.

Poisson
[plot omitted]

Negative Binomial
[plot omitted]

@mdscheuerell (Collaborator)

Correct me if I'm wrong, @marksorel8, but it looks to me like you still haven't changed the Stan code to address the proper thinning of the NB per our discussion above (i.e., you are still summing over weeks with different p's).

@marksorel8 (Owner, Author)

You are correct that I haven't changed the Stan code, @mdscheuerell, but I am not summing over weeks with different p's (in the Wenatchee example). There is no summing over weeks or days in the Wenatchee data because each catch observation represents a single day. So, the NB Stan model is currently appropriate for the Wenatchee data but not the Tucannon data (where the trap is fished for consecutive days without checking).

Addressing the proper thinning of the NB is the next step, but I wanted to compare the NB and the Poisson with the Wenatchee data first because it was simple. Looks like the NB is a big improvement for sure. So, good idea!

@ebuhle (Collaborator) commented Sep 13, 2019

> So, the NB Stan model is currently appropriate for the Wenatchee data but not the Tucannon data (where the trap is fished for consecutive days without checking).

Except the code still includes the summation, which the Wenatchee data doesn't need (b/c elapsed_time[i] is always equal to 1). In general, it's not advisable to write code that produces an invalid model unless the data meet certain conditions, without checking or warning about violations of those conditions -- i.e., it fails silently.

The simplest solution in this case would be to make a separate version of juv_trap.stan that ditches the summation and uses the neg_binomial_2 parameterization. That would also facilitate model comparison.

@marksorel8 (Owner, Author)

Here I tried to implement the first parameterization of the NB in Stan (y ~ neg_binomial(alpha, beta)), which would theoretically be closed under addition, but I'm getting very different answers than when I used the other (mu and phi) parameterization, as well as some menacing warning messages. Do you see the problem(s)? I'm stumped.

I'll work on reverting the original single-year Poisson model and adding a separate neg_binomial_2 version without the summation. What is the easiest way to revert to an earlier commit in GitHub?

@mdscheuerell (Collaborator)

It sounds like you want git reset --mixed BLAH where BLAH should be replaced by the SHA of the commit you'd like to return everything to. If you are unsure about this, I'd suggest making a copy of the file(s) and placing them elsewhere before resetting.

@marksorel8 (Owner, Author)

This is great @ebuhle, thank you! I agree that there should be different .stan models for each distribution, and will work on that after AFS. Will edit the neg_binomial model per your suggestions too.

@marksorel8 (Owner, Author)

We now have three different single-year Stan models using the three different observation likelihoods: Poisson, Negative Binomial (alpha, beta), and Negative Binomial (mu, phi). I have lots of questions moving forward :)

  1. How do we want to do model comparison/selection?
  2. Should we revive the discussion of what process model(s) to consider (in a new thread)?
  3. Would we ever consider using different observation models for different years?
  4. Does this model seem like it is worth a stand-alone publication in a journal like CJFAS? or better as a component of a larger paper integrating it with other data (e.g. redd counts) and telling more of an ecological story?

Fits with the different models to an example Chiwawa year
[plot omitted]

@ebuhle (Collaborator) commented Oct 8, 2019

A few quick thoughts:

> 1. How do we want to do model comparison/selection?

LOO!

> 2. Should we revive the discussion of what process model(s) to consider (in a new thread)?

Sure, pending answer to (1).

> 3. Would we ever consider using different observation models for different years?

My vote is no, especially if we're still contemplating sharing some observation-related parameters among years.

> 4. Does this model seem like it is worth a stand-alone publication in a journal like CJFAS? or better as a component of a larger paper integrating it with other data (e.g. redd counts) and telling more of an ecological story?

Seems like the main thrust of a methods-oriented paper would be to compare this to existing approaches, in particular BTSPAS. Whether that's worth doing depends largely on your level of interest, but also I don't think our model is quite up to the task yet; BTSPAS does several things (esp. related to the efficiency trials) that ours doesn't.

@marksorel8 (Owner, Author) commented Oct 8, 2019

> LOO!

sounds good...but looks hard. Are you available to sit down and talk about how to implement this? Or do you have an example?

Would we do LOO for the catch data only?

@ebuhle (Collaborator) commented Oct 9, 2019

> sounds good...but looks hard.

Really, even after reading the vignettes?

The theory is subtle (though not that much worse than AIC / DIC / *IC, except for the computational part) but the loo package makes it dead simple in practice. The only change needed to the Stan code would be a few lines in generated quantities to calculate the pointwise log-likelihoods for catch and recaptures, so that you can monitor them and pass the result to loo(). If you get stuck on this, I can help out.

> Would we do LOO for the catch data only?

While that's not necessarily wrong (there are situations where you're only interested in the predictive skill of one part of a model, or for one type of data), here I'd recommend using the total likelihood.

@marksorel8 (Owner, Author)

Ok, I'll give it a try, since it sounds like you think I can do it. I got thrown off by some of the more complicated examples in the vignette.

> a few lines in generated quantities to calculate the pointwise log-likelihoods for catch and recaptures

This sounds like a log_lik vector with a pointwise log-likelihood value for each data point (so length of data set #1 + length of data set # 2). Let me know if I'm on the wrong track here. I'll give this a shot though :)

@ebuhle (Collaborator) commented Oct 9, 2019

> This sounds like a log_lik vector with a pointwise log-likelihood value for each data point (so length of data set #1 + length of data set # 2). Let me know if I'm on the wrong track here. I'll give this a shot though :)

Ultimately, yeah. For readability's sake I'd probably declare two vectors,

generated quantities {
  vector[N_MR] LL_MR;
  vector[N_trap] LL_trap;

  for(i in 1:N_MR) {
    // evaluate mark-recap log-likelihood for obs i
  }

  for(i in 1:N_trap) {
    // evaluate catch log-likelihood for obs i
  }
}

Then you could either concatenate them into one long vector, or monitor them both and cbind() them together. The latter might make it easier to keep track of which likelihood component is which.
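On the R side, once those two vectors are monitored, the loo call could look something like this (a sketch only; `fit` is a placeholder for the stanfit object, and the concatenation step is one of several ways to combine the components):

library(rstan)
library(loo)

## merge_chains = FALSE keeps the chain dimension that relative_eff() needs
LL_MR   <- extract_log_lik(fit, parameter_name = "LL_MR",   merge_chains = FALSE)
LL_trap <- extract_log_lik(fit, parameter_name = "LL_trap", merge_chains = FALSE)

## concatenate the two likelihood components along the observation dimension
log_lik <- array(c(LL_MR, LL_trap),
                 dim = c(dim(LL_MR)[1:2], dim(LL_MR)[3] + dim(LL_trap)[3]))

r_eff   <- relative_eff(exp(log_lik))
loo_fit <- loo(log_lik, r_eff = r_eff)
print(loo_fit)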

@marksorel8 (Owner, Author)

I attempted to compare the Poisson and NB for the 2001 Tucannon data. I got the following warnings after running the NB model.

Warning messages:
1: In validityMethod(object) :
The following variables have undefined values: M[1], M[2], M[3], ..., M[18], ... [... truncated]
2: There were 7 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
3: Examine the pairs() plot to diagnose sampling problems
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess

Not sure what the deal with the undefined values is. The low ESS's were for the process error sigma and the NB overdispersion p, which would logically be negatively correlated. R-hats were still 1.01 for those parameters after 1500 iterations though.

When I ran loo for the Poisson model I got the following warnings:

1: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
 
2: In log(z) : NaNs produced
3: In log(z) : NaNs produced

and this output:


         Estimate   SE
elpd_loo   -591.6 23.9
p_loo       136.9  7.3
looic      1183.2 47.7
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     47    21.0%   325       
 (0.5, 0.7]   (ok)       66    29.5%   88        
   (0.7, 1]   (bad)      93    41.5%   13        
   (1, Inf)   (very bad) 18     8.0%   5         
See help('pareto-k-diagnostic') for details.

They were better for the NB model, but still not great.

> print(loo_NB)

Computed from 3000 by 224 log-likelihood matrix

         Estimate   SE
elpd_loo   -664.9 28.2
p_loo        96.9  6.8
looic      1329.8 56.4
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     103   46.0%   132       
 (0.5, 0.7]   (ok)        71   31.7%   17        
   (0.7, 1]   (bad)       45   20.1%   12        
   (1, Inf)   (very bad)   5    2.2%   3         
See help('pareto-k-diagnostic') for details.

@mdscheuerell (Collaborator)

I took a quick glance at your code and you're not asking for the pointwise likelihoods of the data (catch), but rather an estimated state (migrants), so these comparisons via loo() aren't really germane anyway.

@marksorel8 (Owner, Author) commented Oct 10, 2019

Hi @mdscheuerell, I forgot to push the changes that I made to the code last night. Doh! Sorry. I am asking for the pointwise log-likelihood of the data with the following code, right?

vector[N_MR] LL_MR;     // pointwise log-likelihood values for mark-recapture data
vector[N_trap] LL_trap; // pointwise log-likelihood values for trap catch data

for(i in 1:N_MR) {
  LL_MR[i] = binomial_lpmf(recap[i] | mark[i], p[MR_week[i]]); // evaluate mark-recap log-likelihood for obs i
}

for(i in 1:N_trap) {
  LL_trap[i] = neg_binomial_lpmf(C[i] | C_hat[i], phi_obs); // evaluate catch log-likelihood for obs i
}

apologies for not pushing the code I was referencing last night.

@marksorel8 (Owner, Author)

Might we need to do some of the fancy stuff described here for the catch time series data, @ebuhle ?

@ebuhle (Collaborator) commented Oct 14, 2019

> Might we need to do some of the fancy stuff described here for the catch time series data, @ebuhle ?

Good Q, but that is only necessary if you're trying to do step-ahead forecasting, which is not our scenario.

I still need to dig into the Tucannon results you posted above and understand where those numerical errors in the generated quantities M[t] are coming from. Presumably the issue is under- / overflow either in (1) calculating b (per my proposed notation), which you're still calling beta_NB, or (2) evaluating the PMF for the neg_binomial_rng function. When I tried it, I didn't get those errors, so it depends on what regions of parameter space the chains visit.

I'm also not clear on why you were using Tucannon data for the model comparison, since that prohibits the use of the NB2 model. I thought the idea was to investigate the differences b/w the NB (awkward and nonintuitive, but usable with gappy data) and NB2 (more appealing, but unusable when gaps are present) parameterizations using data that can accommodate both.

@marksorel8 (Owner, Author)

Yep, not sure which of these two reasons caused the errors, but I agree those seem like the likely culprits. I see that for neg_binomial_rng the expected value must be less than 2^29, so maybe the sampler occasionally visits values that high? I also only get those errors sometimes, so maybe it's not a big deal?

Also, good Q why I fit the Tucannon data. I guess I thought it might be interesting because I assumed it had MR data for most weeks. I've been playing around with the Chiwawa data too. Lots of bad Pareto k diagnostic values, but if loo_compare is to be believed, it appears to prefer the Poisson to either NB for 2013. I need to read up and try to better understand what LOO is doing so I can assess how serious those bad diagnostics are.

@marksorel8 (Owner, Author)

Perhaps we would have better luck with K-fold than leave one out?

@ebuhle (Collaborator) commented Oct 15, 2019

> I need to read up and try to better understand what LOO is doing so I can assess how serious those bad diagnostics are.

Good idea. You will learn that those high Pareto-k values are (very) bad. Basically they mean that the corresponding pointwise likelihoods are so long-tailed that their importance sampling distribution cannot be fit by the Pareto smoothing distribution -- they may not have finite variance or even finite mean -- and therefore the approximation to the leave-one-out predictive density underlying LOO is invalid. This isn't too surprising, TBH. I expect @mdscheuerell would agree that ecological models in the wild have bad (or very bad) Pareto ks more often than not. The loo package provides some nice diagnostic plots that can help you understand which observations or data types (e.g., catch vs. recaptures) have these extremely skewed likelihoods.
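For example, with the loo objects you already have (names as in your output above), the built-in diagnostics would be something like:

library(loo)
plot(loo_NB, label_points = TRUE)       # Pareto k for each observation
pareto_k_table(loo_NB)                  # counts of good / ok / bad / very bad
pareto_k_ids(loo_NB, threshold = 0.7)   # indices of the problematic observations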

In principle, the solution is to do brute-force cross-validation for the problematic observations. In practice, and depending on how much you care about the answer, re-fitting the model that many times may or may not be worth it. An alternative, as you suggest, is to abandon LOO altogether and do K-fold CV, which is still a heckuva lot more work than LOO. (Just to be clear, brute-force leave-one-out is a special case of brute-force K-fold CV with K = N. What you've been doing so far is neither of these: LOO is an approximation to the leave-one-out posterior predictive density. In that narrow sense it is analogous to AIC, DIC, WAIC, etc.)

In any event, I think we have to root out what is giving those numerical errors when converting from (mu,q) to (a,b) (again, using my proposed notation). If the NB RNG is throwing errors, presumably the likelihood is unstable as well, which could certainly have something to do w/ the extreme skewness.

I'm also not crazy about the prior on the inverse square-root of the overdispersion parameter in the NB2 formulation, but that's another post.

@marksorel8 (Owner, Author)

> we have to root out what is giving those numerical errors when converting from (mu,q) to (a,b) (again, using my proposed notation). If the NB RNG is throwing errors, presumably the likelihood is unstable as well, which could certainly have something to do w/ the extreme skewness.

From the info on neg_binomial_rng: "alpha / beta must be less than 2^29". Is it possible to extract all of the alphas and betas that the chain visited and see if alpha/beta exceeded this threshold in any iteration?

What do we do if the likelihood is unstable? Can we constrain it with priors? Or is there something about the model itself that would need to be changed?

Thanks!

@ebuhle (Collaborator) commented Oct 25, 2019

for(t in 1:max_day){
  M[t] = neg_binomial_rng(M_hat[t]*beta_NB, beta_NB);
}

Assuming you're already monitoring beta_NB (that is, b in my notation) and/or p_NB (q), you would just need to also monitor M_hat to figure out what's going on. Depending on what it is, it's possible that more constraining priors might help, yes. Although hopefully not on q, because the simplicity of putting a uniform prior on the probability is one of the nice things about this parameterization.
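Something like this on the R side would do it (a sketch; assumes both M_hat and beta_NB are monitored in a stanfit called fit). Note that since alpha = M_hat * beta_NB and beta = beta_NB, the alpha / beta ratio the RNG complains about is just M_hat itself.

draws <- rstan::extract(fit, pars = c("M_hat", "beta_NB"))
range(draws$M_hat)            # did any draw get anywhere near 2^29?
mean(draws$M_hat >= 2^29)     # fraction of (iteration, day) draws over the limit
range(draws$beta_NB)          # tiny beta_NB should coincide with the huge M_hat draws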

Like I mentioned yesterday, if you're going to experiment with this I'd suggest saving the random seed so we can reproduce the errors.

@marksorel8 (Owner, Author)

Ok, just added a reproducible example, with the seed set for the Stan models. I also added a model using the neg_binomial_2_log distribution, which uses the log(mean), thinking it might be less susceptible to over- or underflow. I think it was probably underflow causing the warnings. This parameterization does seem to eliminate those warnings (undefined values of M[x]).

Are there tests that we should conduct to evaluate the stability of the likelihood (e.g. evaluating the effect of seed on result)?

For comparing observation models (Poisson vs. NB1 vs. NB2), it occurs to me that we should probably use the full multi-year model. I envision this model having independent M_hat trajectories but some shared parameters in the model of trap capture efficiency. Does that make sense?

@mdscheuerell (Collaborator) commented Oct 29, 2019

@marksorel8 Can you produce some violin plots of the pointwise likelihoods? IME, they will give you an indication of where, exactly, the long tails are arising and giving the lousy Pareto-k estimates. For example, here is a case from our Skagit steelhead model where tmp_lp is a num_mcmc_samples by num_likelihood_elements matrix.

## base plot
library(ggplot2)
tidy_lp <- tidyr::gather(data.frame(tmp_lp), key = time, value = lp)
pp <- ggplot(tidy_lp, aes(x = time, y = lp)) +
  labs(title = "", x = "Data element", y = "Pointwise likelihood") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
## violin plot
p1 <- pp + geom_violin()
p1

[violin plot screenshot omitted]

It's pretty obvious that there are 2 different contributions to the overall likelihood, and that the first (age composition) is behaving pretty badly compared to the second (escapement).

@marksorel8 (Owner, Author)

Hi @mdscheuerell , the first plot below is of the pointwise likelihoods of the 15 mark-recapture efficiency trials for the Chiwawa in 2013 and the second plot is for 50 of the 252 catch observations. Looks like there are long tails in the pointwise likelihoods for all the data points.

[plot omitted: mark-recapture pointwise likelihoods]

[plot omitted: catch pointwise likelihoods]

For the actual analysis I will use a multi-year model, which combines information from mark-recapture trials across years. It would be great if the multi-year model behaved better, but that may be wishful thinking?

@ebuhle (Collaborator) commented Oct 29, 2019

FYI, @marksorel8, I'm currently investigating your reprex on the numerical errors in M under the juv_trap_NB.stan model. Diagnosing the problem is challenging b/c when the errors occur during sampling as opposed to warmup, and M and/or M_tot is being monitored, the summary stats throw an error and no result is returned. Further complicating things, seemingly irrelevant changes to the model in an effort to circumvent this catch-22 evidently change the state of the RNG so that the errors go away. (Not entirely sure what's up w/ that, but I suspect it has to do with floating-point weirdness, which I first learned of thanks to a frustrating Stan situation a couple years ago.) Even more baffling, simply removing M and M_tot from the list of parameters to be monitored causes the errors to go away. No idea what's going on there -- this seems to defy the basic programmatic logic of how Stan works.

> From the info on neg_binomial_rng: "alpha / beta must be less than 2^29". Is it possible to extract all of the alphas and betas that the chain visited and see if alpha/beta exceeded this threshold in any iteration?

The actual errors printed in the Viewer pane are like this:

Exception: neg_binomial_rng: Random number that came from gamma distribution is 1.72119e+015, but must be less than 1.07374e+009 (in 'model2cf869f0741_juv_trap_NB' at line 103)

This refers to the method of generating pseudo-random NB variates using the gamma-Poisson mixture. AFAICT, either of the NB parameters (M_hat * beta_NB or beta_NB) could be responsible for these huge gamma RVs. We need a way to monitor these parameters when the errors are occurring to see if a pattern emerges in the "good" vs. "bad" iterations, and that in turn requires tricking the sampler as described above.

Stay tuned...

@ebuhle (Collaborator) commented Oct 30, 2019

Aha! I think at least part of the reason I'm having such a hard time reproducing your reprex is because you didn't set the RNG seed for R, thus the inits will be different from one call to the next. I must've gotten "lucky" the first time or two yesterday. Will just have to continue with trial-and-error (as in, keep trying until I get an error).

@marksorel8 (Owner, Author)

Oh man, I'm sorry @ebuhle. At first I only set the RNG in R and that didn't work, so I switched to setting it in Stan only, which I thought was enough. SMH. I think my R seed may have been 10403.

@ebuhle (Collaborator) commented Oct 31, 2019

See #3 (comment); I found one that "works". Better still, these seeds give two chains that are fine and one (chain 3) that appears to throw the error at the initial values and every iteration thereafter. Nothing obvious pops out from comparing the inits, though. Will have to mess around with a print statement in generated quantitites.

@mdscheuerell (Collaborator)

An aside, but perhaps worth mentioning. Once upon a time, I had a JAGS model wherein a particular seed had the rather undesirable effect of allowing the MCMC chains to get X steps along (like 75%) before barfing every time at the same iteration. No reboot, update, downdate(?), etc made a difference.

In this particular case, however, the behavior is much more pathological in that there are multiple pathways to the problem. Don't ask me how I know this, but setting really bad bounds on priors (eg, setting the lower bound on an SD prior to be much greater than the true value for the simulated data) can do wonders for the actual fitting process (eg, eliminates all divergent transitions, rapidly decreases run times), even if the answers are obviously wrong.

So, perhaps it's worth changing the prior bound(s) for p_NB (ie, the scale param for the NB) away from one or both of the current bounds to see if we can eliminate the problem? Right now the range on p_NB is [0,1].

real<lower=0,upper=1> p_NB; //scale parameter for the Negative Binomial observation model for catch.

Perhaps we should try [0.2,0.8] (or something else that's biased and more precise than the current range)?

@ebuhle (Collaborator) commented Oct 31, 2019

> Once upon a time, I had a JAGS model wherein a particular seed had the rather undesirable effect of allowing the MCMC chains to get X steps along (like 75%) before barfing every time at the same iteration. No reboot, update, downdate(?), etc made a difference.

Haha, even I remember this; that's how annoying it was!

> So, perhaps it's worth changing the prior bound(s) for p_NB (ie, the scale param for the NB) away from one or both of the current bounds to see if we can eliminate the problem? Right now the range on p_NB is [0,1].

Not a bad idea, although I'd suggest something without the un-Gelmanian hard endpoints inside the feasible domain, like Beta(2,2). I do have the sneaking suspicion that p_NB is the culprit, but I was hoping to get some more decisive evidence. I didn't think it would be this difficult, though.

@ebuhle (Collaborator) commented Oct 31, 2019

OK, so a bit of print-statement sleuthing suggests that the problem may be in both M_hat and beta_NB.

for(t in 1:max_day){
  print("M_hat[", t, "] = ", M_hat[t], " beta_NB = ", beta_NB);
  M[t] = neg_binomial_rng(M_hat[t]*beta_NB, beta_NB);
}

Typical output in cases that produce the error:

Chain 3: M_hat[1] = 1.31254e+014 beta_NB = 9.99201e-016
M_hat[2] = 1.31131e+014 beta_NB = 9.99201e-016

Chain 3: Exception: neg_binomial_rng: Random number that came from gamma distribution is 3.83462e+011, but must be less than 1.07374e+009 (in 'model2a1471a922e3_juv_trap_NB' at line 102)

Compare that to, for example:

M_hat[53] = 1058.57 beta_NB = 0.311853
M_hat[54] = 1047.58 beta_NB = 0.311853
M_hat[55] = 1316 beta_NB = 0.311853
M_hat[56] = 312.761 beta_NB = 0.311853

It's too bad the "canonical" priors that would regularize p_NB away from 0 and 1 (like the aforementioned Beta(a,b) where a and b are slightly greater than 1) are also fairly informative about the bulk. We could use something like the Subbotin distribution, as we do in salmonIPM, but it's hacky.

It's less obvious how to prevent M_hat from blowing up, but maybe tighter priors on the log-scale regression coefs could help.

Anyway, I'm just leaving this update here for now; I'm out of time to spend on this today.

@marksorel8 (Owner, Author)

Thank you so so much for looking into this @ebuhle. I will try some different priors and report back!
