
Suggest using probability-integral-transform (PIT) residuals for delta/Tweedie models #168

Closed
James-Thorson-NOAA opened this issue May 1, 2020 · 9 comments · Fixed by #169

Comments

@James-Thorson-NOAA

Florian and whoever may be interested,

I'm exploring using DHARMa to automate packaging, plotting, and testing residuals for the R package VAST. However, the DHARMa calculation for quantile residuals performs poorly for the delta-models used in that package, and I think an easy solution (which would also reduce to existing practices in other cases, plus have additional theoretical support) would be to use probability-integral-transform (PIT) residuals. I am writing to suggest easy changes to implement this in DHARMa (to push the change up my stack of code), and am happy to do some back-and-forth or provide code.

DHARMa appears to calculate residuals by first detecting whether there are duplicates (in observations or predictions) and, if there are, proceeding as if the response is discrete-valued (Poisson or binomial, etc.). In this case, DHARMa apparently jitters response and simulated values by +/- 0.5. However, a delta- or Tweedie-model has mass at 0 and then continuous values (representing e.g. a biomass response) above this. In these cases, DHARMa detects an integer response and jitters values. However, the amount of jitter is always 0.5, whereas the scale of positive values in a delta-model depends upon the measurement units, where residuals should be scale invariant. E.g., changing measurement units from grams to tons shrinks the distribution of positive values, so they are increasingly "smeared" across the zeros, and DHARMa then typically states that the model fits better.

The solution is to calculate PIT residuals for every observation by:

  1. detecting the proportion of simulated values less than the observation ("Min")
  2. detecting the proportion of simulated values less than or equal to the observation ("Max")
  3. simulating a uniform distribution between Min and Max.

For a continuous-valued response, this should be identical to what DHARMa is already calculating (although I haven't tested this). For a discrete-valued response, it should be similar to the current jittering approach. For a delta-valued response it obviously fixes the issue I flagged. (I haven't tested the continuous and discrete-valued cases, but have tested the improvement for a delta-model.) So it subsumes existing use-cases and is easy to implement in approx. three lines of R code. Presumably Dave Warton could provide a detailed explanation for the theoretical benefits of PIT residuals, e.g., explored here for other purposes.
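The three steps above take only a few lines; as a language-neutral illustration, here is a minimal Python sketch of the algorithm (assuming simulated replicates for one observation are already available; the actual R version would be analogous):

```python
import numpy as np

def pit_residual(y_obs, y_sim, rng=None):
    """PIT residual for one observation, given simulated replicates y_sim."""
    if rng is None:
        rng = np.random.default_rng()
    y_sim = np.asarray(y_sim)
    p_min = np.mean(y_sim < y_obs)   # step 1: proportion strictly below y_obs ("Min")
    p_max = np.mean(y_sim <= y_obs)  # step 2: proportion at or below y_obs ("Max")
    return rng.uniform(p_min, p_max) # step 3: uniform draw between Min and Max
```

For a continuous response, p_min and p_max coincide (almost surely), so no randomization occurs; for a discrete or delta response, the gap between them equals the simulated probability mass at y_obs.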

@florianhartig

florianhartig commented May 3, 2020

Hi James, thanks for this suggestion, I'm happy to give this a go.

I find it hard to wrap my head around the concept of the PIT residuals in general though, and specifically to decide if and when the PIT residuals would be different from DHARMa residuals. I wonder if you have any thoughts about the following points:

  1. The min/max calculation sounds maybe more general, but is there any important case where the (theoretical) difference between min / max wouldn't be one?

  2. More generally, I'm a bit concerned about whether simulations are able to properly determine the appropriate amount of jitter (= randomization). If I simulate from a discrete distribution, it could easily happen (with 250 simulations) that I get for a Binomial, for example, only values of 1 (with observed = 1), and theoretically it could also happen that I skip an intermediate step, e.g. that I observe a 3 in a Poisson, and I simulate values of 1 and 3, 4. In the first case, it seems as if the PIT method doesn't produce any jitter; in the second, it would jitter with a uniform of width 2, which seems inefficient, because it should jitter with U(0,1). In general, I don't really see the advantage of determining the appropriate "width" of the uniform by simulation, as opposed to determining it from the distribution directly, as is currently done in DHARMa (except for the point that this would allow determining the jitter for each data point independently, but that could also be solved differently).

  3. For the specific case of the Tweedie: I was aware of the problem, and we had an issue open about this before (I think this actually led to me implementing this test for duplicates in continuous distributions). I had considered that you could put the jitter only on the "discrete" = zero values, which is still an option, but had at that point decided that there is not much lost by jittering everything, and it seemed safer / easier to me. However, if I applied the PIT procedure to the Tweedie, wouldn't that fail? Let's say I observe zero, and then the Tweedie will basically simulate a lot of zeros, plus larger values - how would the algorithm that you propose above create proper residuals for that? There are no values simulated < 0, so how do you set min?

Best,
Florian

@James-Thorson-NOAA

Florian,

Thanks for your interest and quick response! I realize maintaining a package is a ton of work, and also appreciate your attention in carefully reviewing any proposed changes.

Based on your response, I think we might be talking past each other about the algorithm for calculating PIT residuals. So first let me explain the algorithm more carefully:

  1. For each observation i with value y(i), generate R replicated simulations yprime(i,r) conditional on fixed effects and some subset of random effects
  2. calculate the proportion pmin(i) of simulated values yprime(i,r) less than the observation y(i), where 0 <= pmin(i) <= 1
  3. calculate the proportion pmax(i) of simulated values yprime(i,r) less than or equal to the observation y(i), where 0 <= pmax(i) <= 1
  4. simulate the quantile-residual pvalue(i) for observation i as a uniform random number between pmin(i) and pmax(i).

This algorithm has a few use-cases and properties:

  • When dealing with a continuous distribution (PDF), all values in the set {y(i), yprime(i)} will be unique (or almost always unique for a machine double). So the difference between pmin(i) and pmax(i) is zero for all observations, such that the algorithm reduces to just defining pvalue(i) as the proportion of yprime(i,r) less than y(i)
  • When dealing with a discrete distribution (PMF), values in the set {y(i), yprime(i)} will often (but not always) contain duplicates. Whenever y(i) equals one or more values from yprime(i), pmin(i) doesn't equal pmax(i), and the difference between pmin(i) and pmax(i) is equal to the probability mass associated with simulating the value y(i). Therefore, when the model is correctly specified (i.e., the simulating and data-generating distributions are equal), stacking these uniform-random draws is expected to be uniform.
  • When dealing with a delta-distribution (like a Tweedie for biomass samples in fish/insect/pollen traps, or 0- and 1-inflated proportions in stomach samples), it combines the benefits of both examples above.
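The uniformity property claimed in the discrete-distribution case can be checked numerically. Here is an illustrative Python simulation sketch (with an arbitrary Poisson mean and sample sizes, not taken from the thread):

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_sim = 2000, 250

# Correctly specified model: observations and replicates share the same Poisson.
y = rng.poisson(3.0, size=n_obs)
y_sim = rng.poisson(3.0, size=(n_obs, n_sim))

p_min = (y_sim < y[:, None]).mean(axis=1)   # proportion strictly below y(i)
p_max = (y_sim <= y[:, None]).mean(axis=1)  # proportion at or below y(i)
pit = rng.uniform(p_min, p_max)             # one uniform draw per observation

# Stacked PIT residuals should be approximately Uniform(0, 1):
# mean near 0.5 and standard deviation near 1/sqrt(12) ~ 0.289.
print(round(pit.mean(), 2), round(pit.std(), 2))
```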

Conceptually, the point is that we want to compare quantiles from a cumulative distribution function with a uniform distribution, and this is not easy for a discrete-valued distribution (PMF) because the cumulative function is discontinuous and it's not clear what to do at these discontinuities. It appears that DHARMa resolves this by jittering the x-axis of the PMF so that yprime(i,r) converges on a continuous distribution (with a well-behaved cumulative distribution) as the number of samples R -> Inf. A PIT residual instead resolves this by jittering the y-axis values of the PMF. Both randomize to deal with discontinuities, but do so differently.

Returning to your questions:

The min/max calculation sounds maybe more general, but is there any important case where the (theoretical) difference between min / max wouldn't be one?

As I hope my notation clarifies, the difference between pmin and pmax is zero for a continuous distribution (or when y is in the continuous portion of a delta-distribution) and between 0 and 1 for a discrete distribution.

More generally, I'm a bit concerned about whether simulations are able to properly determine the appropriate amount of jitter (= randomization). If I simulate from a discrete distribution, it could easily happen (with 250 simulations) that I get for a Binomial, for example, only values of 1 (with observed = 1), and theoretically it could also happen that I skip an intermediate step, e.g. that I observe a 3 in a Poisson, and I simulate values of 1 and 3, 4. In the first case, it seems as if the PIT method doesn't produce any jitter; in the second, it would jitter with a uniform of width 2, which seems inefficient, because it should jitter with U(0,1). In general, I don't really see the advantage of determining the appropriate "width" of the uniform by simulation, as opposed to determining it from the distribution directly, as is currently done in DHARMa (except for the point that this would allow determining the jitter for each data point independently, but that could also be solved differently).

PIT residuals eliminate the need to jitter the observed and simulated values (y(i) and yprime(i,r)) used to define the x-axis of the cumulative function, and instead jitter on the y-axis of the (potentially discontinuous) cumulative function. PIT residuals use the same algorithm for discrete, continuous, and mixture (delta) distributions, and therefore eliminate the need to detect which type of distribution is occurring.

For the specific case of the Tweedie: I was aware of the problem, and we had an issue open about this before (I think this actually led to me implementing this test for duplicates in continuous distributions). I had considered that you could put the jitter only on the "discrete" = zero values, which is still an option, but had at that point decided that there is not much lost by jittering everything, and it seemed safer / easier to me. However, if I applied the PIT procedure to the Tweedie, wouldn't that fail? Let's say I observe zero, and then the Tweedie will basically simulate a lot of zeros, plus larger values - how would the algorithm that you propose above create proper residuals for that? There are no values simulated < 0, so how do you set min?

For the Tweedie, delta, or 0- and 1-inflated proportions, whenever y(i)=0 then pmin(i)=0 and pmax(i) is a Monte Carlo estimate of the predicted probability of observing a zero conditional on the model and estimates. So it draws from a uniform distribution with upper bound equal to the probability mass at zero in that distribution.
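That zero-handling can be made concrete with a toy zero-inflated example. The following Python sketch uses a hypothetical zero-mass/gamma mixture (not tweedie::rtweedie, and the 40% zero mass is an arbitrary illustration), just to show how the bounds come out when y(i)=0:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sim = 250

# Simulated replicates for one observation: ~40% exact zeros (hypothetical
# zero mass), continuous positive values otherwise -- a Tweedie-like shape.
is_zero = rng.random(n_sim) < 0.4
y_sim = np.where(is_zero, 0.0, rng.gamma(2.0, 1.0, n_sim))

y_obs = 0.0
p_min = np.mean(y_sim < y_obs)   # nothing is simulated below zero -> exactly 0
p_max = np.mean(y_sim <= y_obs)  # Monte Carlo estimate of P(Y = 0)
pit = rng.uniform(p_min, p_max)  # uniform on [0, P(Y = 0)], as described above
```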

Anyhoo, sorry that this is a long response. I'd be happy to explore examples or sample code.

@florianhartig

Hi Jim,

thanks for your time / explanations. Somehow I didn't really get the idea until I implemented it, but now it's quite clear, and I agree, it seems very elegant and possibly a more robust solution than what I had before.

I have made a test in the branch called "PIT-Residuals" - you can install DHARMa with this variant via

devtools::install_github(repo = "florianhartig/DHARMa", subdir = "DHARMa", 
                         ref = "PIT-Residuals", dependencies = T, build_vignettes = T)

The PIT implementation is here

scaledResiduals = rep(NA, n)
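For readers following along, the logic around that line can be paraphrased roughly as follows (an illustrative Python sketch of a per-observation PIT loop, not DHARMa's actual getQuantile R code):

```python
import numpy as np

def pit_residuals(observed, simulated, rng=None):
    """observed: length-n vector; simulated: n x n_sim matrix of replicates."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(observed)
    scaled_residuals = np.full(n, np.nan)  # mirrors scaledResiduals = rep(NA, n)
    for i in range(n):
        p_min = np.mean(simulated[i] < observed[i])   # proportion strictly below
        p_max = np.mean(simulated[i] <= observed[i])  # proportion at or below
        scaled_residuals[i] = rng.uniform(p_min, p_max)
    return scaled_residuals
```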

@florianhartig

p.s. - the PIT is set as default in this branch, so if you run your model, it should (ideally) work now.

@James-Thorson-NOAA

Great! Yes, I just tried the version on the "PIT-residuals" branch and it gives results that are identical "by eye" to my previous implementation of PIT residuals. Specifically, I compared a nontrivial case of a delta-model, with some zeros and many positive values <<1, where the previous jittering of observed and simulated values resulted in smearing together the zeros and the positive values; the PIT residuals work as intended in this case. I also read the code in getQuantile(.) and confirm that it does what I was intending. However, please note that I can't claim to understand the intent of all choices made in DHARMa.ecdf, so I can't vouch for what might be lost by not using that S3 class when calculating PIT residuals.

Do you have other integration or unit tests to run, or any plan for when this change will be added to the main branch with a new release number?

Once it is in the main branch, I will add that semantic-version release number as a dependency of my package VAST (to ensure that users have installed a version of DHARMa that does PIT residuals). Please tell me if I can help further. Finally, thanks again for creating and maintaining DHARMa! I think it will be very useful for my package, and obviously it already is for many others as well.

@florianhartig

Hi Jim,

thanks for your feedback! I will merge this into master now, and will probably roll this out with the 0.3.1 update of DHARMa. Will let you know when this happens, maybe in a week or two.

The changes already pass all my unit tests, but it would be helpful to create a few non-trivial examples with Tweedie or other similar data. If you have code ready for simulating / fitting such data, would you mind sharing it with me here or via email, so that I can consider whether it should be added to the unit tests?

Best
Florian

florianhartig added a commit that referenced this issue May 4, 2020
@florianhartig

Hi Jim, one other question - are you planning to use DHARMa in VAST for internal calculations, or would it be possible to support VAST from the DHARMa side, so that a fitted VAST model could be treated like any other supported regression model? As I haven't worked with VAST, I don't know how much sense this makes, but feel free to comment here #170

@James-Thorson-NOAA

James-Thorson-NOAA commented May 4, 2020

Florian,

I don't have an example application for applying PIT residuals to a Tweedie distribution (or other continuous-delta mixture model) that would be suitable as a unit-test; the VAST application requires a bunch of dependencies (including TMB and INLA) that can be a bit more finicky than you'd want for a unit-test.

I could try to find time to make an artificially simple example using known parameter values and the tweedie::rtweedie function: would that be helpful, or facilitate maintenance of PIT residuals in DHARMa?

And I envision using DHARMa when calling summary on the S3 class generated by the user-interface functions that fit VAST; see here for now. In the future I'd like to learn more about how glmmTMB interfaces with DHARMa, in particular to define unconditional simulations and to use the TMB functionality for oneStepPredict residuals. But I doubt I'll have time to explore that for a bit. I would of course welcome any thoughts from you or anyone on residuals, so please email me if you're curious to learn more about VAST (it combines features of kriging, joint SDMs and generalized linear latent-variable models, empirical orthogonal function analysis, and glmmTMB).

@florianhartig

Hi Jim,

no worries about the example, I can also create one myself.

About the interface: in general, for DHARMa to interface with the fitted model, what is mostly needed is a simulate function. Details are described here https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA. If you would provide those functions, I could interface directly.

If you do this, it would be ideal if both predict and simulate would allow specifying which model structures (in particular REs) to condition on. glmmTMB's predict and simulate functions were initially only unconditional, which was a problem. Now at least one can switch from unconditional to conditional. I have been discussing this with the glmmTMB crew, and they plan to support the re.form argument of lme4 in the future, but I think that is not fully implemented yet.

I see you already return a DHARMa object invisibly though ... I guess for most users this will be sufficient, if one shows in the help how this object can be further tested / plotted with DHARMa functions. The only thing that would be improved by a direct DHARMa interface is that the user would have tighter control over the simulation settings, and could possibly also use the refit options.
