Suggest using probability-integral-transform (PIT) residuals for delta/Tweedie models #168
Comments
Hi James, thanks for this suggestion, I'm happy to give this a go. I find it hard to wrap my head around the concept of the PIT residuals in general though, and specifically to decide if and when the PIT Residuals would be different from DHARMa residuals. I wonder if you have any thoughts about the following points:
Best,
Florian, thanks for your interest and quick response! I realize maintaining a package is a ton of work, and I also appreciate your attention in carefully reviewing any proposed changes. Based on your response, I think we might be talking past each other about the algorithm for calculating PIT residuals. So first, let me explain the algorithm more carefully:
This algorithm has a few use-cases and properties:
Conceptually, the point is that we want to compare quantiles from a cumulative distribution function with a uniform distribution, and this is not easy for a discrete-valued distribution (PMF) because the cumulative function is discontinuous and it's not clear what to do at these discontinuities. It appears that DHARMa resolves this by jittering the x-axis of the PMF so that
Returning to your questions:
As I hope my notation clarifies, the difference between
PIT residuals eliminate the need to jitter observed and simulated (
For the Tweedie, delta, or 0- and 1-inflated proportions, whenever
Anyhoo, sorry that this is a long response. I'd be happy to explore examples or sample code.
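As a rough sketch of the point about discontinuities, assuming a Poisson response with a known mean: instead of jittering the data, one can draw the residual uniformly between the CDF just below the observation and the CDF at the observation. This is written in Python purely for illustration, and the helper names (`poisson_cdf`, `randomized_pit`, `draw_poisson`) are hypothetical, not part of DHARMa.

```python
import math
import random

def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam); 0 for k < 0."""
    if k < 0:
        return 0.0
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def randomized_pit(y, lam, rng):
    """Draw uniformly across the jump of the CDF at the observed count y."""
    lower = poisson_cdf(y - 1, lam)  # CDF just below the observation
    upper = poisson_cdf(y, lam)      # CDF at the observation
    return rng.uniform(lower, upper)

def draw_poisson(lam, rng):
    """Knuth's multiplication method, adequate for small lam."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

# If the data really come from the assumed model, the residuals should look uniform on [0, 1].
rng = random.Random(123)
lam = 3.0  # assumed true mean, chosen arbitrarily
residuals = [randomized_pit(draw_poisson(lam, rng), lam, rng) for _ in range(2000)]
mean_resid = sum(residuals) / len(residuals)  # should be near 0.5
```

The width of each uniform draw equals the probability mass at the observed value, so the randomness is only as large as the discontinuity requires and vanishes for a continuous distribution.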
Hi Jim, thanks for your time / explanations. Somehow I didn't really get the idea until I implemented it, but now it's quite clear, and I agree, it seems very elegant and possibly a more robust solution than what I had before. I have made a test in the branch called "PIT-Residuals" - you can install DHARMa with this variant via
The PIT implementation is here Line 60 in b4a828b
p.s. - the PIT is set as default in this branch, so if you run your model, it should (ideally) work now.
Great! Yes, I just tried the version on the "PIT-residuals" branch and it gives results that are identical "by eye" to my previous implementation of PIT residuals. Specifically, I compared a nontrivial case of a delta-model, with some zeros and many positive values <<1, where the previous jittering of observed and simulated values smeared together the zeros and the positive values; the PIT residuals work as intended in this case. I also read the code in
Do you have other integration or unit tests to run, or any plan for when this change will be added to the main branch with a new release number? Once it is on the main branch, I will add that semantic-version release number as a dependency of my package VAST (to ensure that users have installed a version of DHARMa that does PIT residuals). Please tell me if I can help further. Finally, thanks again for creating and maintaining DHARMa! I think it will be very useful for my package, and it obviously is for many others as well.
Hi Jim, thanks for your feedback! I will merge this into master now, and will probably roll this out with the 0.3.1 update of DHARMa. Will let you know when this happens, maybe in a week or two. The changes already pass all my unit tests, but it would be helpful to create a few non-trivial examples with Tweedie or other similar data. If you have code ready for simulating / fitting this data, would you mind sharing it with me here or via email, so that I could consider if it should be added to the unit tests? Best
Hi Jim, one other question - are you planning to use DHARMa in VAST for internal calculations, or would it be possible to support VAST from the DHARMa side, so that a fitted VAST model could be treated like any other supported regression model? As I haven't worked with VAST, I don't know how much sense this makes, but feel free to comment here #170
Florian, I don't have an example application for applying PIT residuals to a Tweedie distribution (or other continuous-delta mixture model) that would be suitable as a unit-test; the VAST application requires a bunch of dependencies (including TMB and INLA) that can be a bit more finicky than you'd want for a unit-test. I could look to find time to make one using an artificially simple example using known parameter values and the
And I envision using DHARMa when calling
Hi Jim, no worries about the example, I can also create one myself. About the interface: in general, for DHARMa to interface with the fitted model, what is mostly needed is a simulate function. Details are described here https://github.com/florianhartig/DHARMa/wiki/Adding-new-R-packages-to-DHARMA. If you would provide those functions, I could interface directly. If you do this, it would be ideal if both predict and simulate allowed specifying which model structures (in particular REs) to condition on. glmmTMB's predict and simulate functions were in the beginning only unconditional, which was a problem. Now at least one can switch from unconditional to conditional. I have been discussing this with the glmmTMB crew, and they plan to support the re.form argument of lme4 in the future, but I think that is not fully implemented yet. I see you already return a DHARMa object invisibly though ... I guess for most users, this will be sufficient if the help shows how this object can be further tested / plotted with DHARMa functions. The only thing that would be improved by a direct DHARMa interface is that the user would have tighter control over the simulation settings, and could possibly also use the refit options.
Florian and whomever may be interested,
I'm exploring using DHARMa to automate packaging, plotting, and testing residuals for R package VAST. However, the DHARMa calculation for quantile residuals performs poorly for the delta-models used in that package, and I think an easy solution (which would also reduce to existing practices in other cases, plus have additional theoretical support) would be to use probability-integral-transform PIT residuals. I am writing to suggest easy changes to implement this in DHARMa (to push the change up my stack of code), and am happy to do some back-and-forth or provide code.
DHARMa appears to calculate residuals by first detecting whether there are duplicates (in observations or predictions) and, if there are, proceeding as if the response is discrete-valued (Poisson or binomial, etc.). In this case, DHARMa apparently jitters response and simulated values by +/- 0.5. However, a delta- or Tweedie-model has mass at 0 and then continuous values (representing e.g. a biomass response) above this. In these cases, DHARMa detects an integer response and jitters values. However, the amount of jitter is always 0.5, whereas the scale of positive values in a delta-model depends upon the measurement units, and residuals should be scale invariant. E.g., changing the measurement units from grams to tons shrinks the positive values, so that the jitter increasingly "smears" them across the zeros, which typically leads DHARMa to indicate that the model fits better.
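To make the unit dependence concrete, here is a small sketch in Python (chosen only for illustration; DHARMa itself is R). It assumes a simplified jitter, Uniform(-0.5, 0.5) added to the observation and to every simulated value, which is a stand-in for the behaviour described above and not DHARMa's actual code. The data and function names are hypothetical.

```python
import random

def jittered_residual(observed, simulated, rng):
    """Quantile residual after adding Uniform(-0.5, 0.5) noise to every value."""
    obs_j = observed + rng.uniform(-0.5, 0.5)
    sims_j = [s + rng.uniform(-0.5, 0.5) for s in simulated]
    return sum(s < obs_j for s in sims_j) / len(sims_j)

def mean_residual(observed, simulated, reps=200, seed=1):
    """Average the jittered residual over repeated jitter draws."""
    rng = random.Random(seed)
    return sum(jittered_residual(observed, simulated, rng) for _ in range(reps)) / reps

# Hypothetical delta-model simulations in grams: half zeros, half positive values.
sims_grams = [0.0] * 50 + [1000.0 + 140.0 * i for i in range(50)]  # 1000 to 7860 g
obs_grams = 9000.0  # larger than every simulated value

# The same data expressed in tonnes (1 t = 1e6 g).
sims_tonnes = [s / 1e6 for s in sims_grams]
obs_tonnes = obs_grams / 1e6

res_grams = mean_residual(obs_grams, sims_grams)
res_tonnes = mean_residual(obs_tonnes, sims_tonnes)
```

In grams the jitter cannot reorder anything, so the observation stays above all simulations and `res_grams` is 1. In tonnes every value lies well inside the jitter width, and the average residual drifts toward roughly 0.5, which hides an observation that is more extreme than every simulation.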
The solution is to calculate PIT residuals for every observation by:
For a continuous-valued response, this should be identical to what DHARMa is already calculating (although I haven't tested this). For a discrete-valued response, it should be similar to the current jittering approach. For a delta-valued response, it obviously fixes the issue I flagged. (I haven't tested the continuous and discrete-valued cases, but have tested the improvement for a delta-model.) So it contains existing use-cases and is easy to implement in approx. three lines of R code. Presumably Dave Warton could provide a detailed explanation for the theoretical benefits of PIT residuals, e.g., explored here for other purposes.
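A minimal sketch of a simulation-based PIT residual, written in Python for illustration rather than R. It assumes the common randomized form: draw uniformly between the share of simulations strictly below the observation and the share at or below it. The function name `pit_residual` and the toy data are hypothetical and are not taken from DHARMa or VAST.

```python
import random

def pit_residual(observed, simulated, rng):
    """Randomized PIT residual of one observation against its simulated values."""
    n = len(simulated)
    lower = sum(s < observed for s in simulated) / n   # share strictly below
    upper = sum(s <= observed for s in simulated) / n  # share at or below
    # Without ties lower == upper and the draw is trivial; at a point mass
    # (e.g. zeros) the draw spreads the residual across the tied share.
    return rng.uniform(lower, upper)

# Hypothetical delta-model simulations: half zeros, half positive values (grams).
sims_grams = [0.0, 0.0, 0.0, 0.0, 500.0, 1500.0, 2500.0, 3500.0]
sims_tonnes = [s / 1e6 for s in sims_grams]

# A zero observation is tied with half of the simulations, so it lands in [0, 0.5].
zero_grams = pit_residual(0.0, sims_grams, random.Random(42))
zero_tonnes = pit_residual(0.0, sims_tonnes, random.Random(42))

# A positive observation has no ties, so the residual is just its rank share.
pos_grams = pit_residual(2000.0, sims_grams, random.Random(42))
pos_tonnes = pit_residual(2000.0 / 1e6, sims_tonnes, random.Random(42))
```

Only the ordering of observed and simulated values enters the calculation, so with the same random seed the residuals in grams and in tonnes agree, which is the scale invariance the jittering approach lacks.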