-
Notifications
You must be signed in to change notification settings - Fork 0
/
zeroinfl.Rmd
108 lines (93 loc) · 4.41 KB
/
zeroinfl.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
---
title: "Zero-inflated discrete"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
### Zero-Inflation regression models {.tabset .tabset-pills}
`resid_zeroinfl()` is used for calculating the DPIT residuals for regression models with zero-inflation outcomes and drawing corresponding QQ-plots. The suitable model objects are as follows:
* Zero-Inflated Poisson, `pscl::zeroinfl(dist = "poisson")`
* Zero-Inflated negative binomial, `pscl::zeroinfl(dist = "negbin")`
#### Zero-Inflation Poisson
We simulate data using a zero-inflated Poisson model. The probability of excess
zeros is modeled with $\mathrm{logit}(p_0) = \beta_{00} + \beta_{10}X_1$,
and the Poisson component has a mean $\lambda = \exp(\beta_0 +\beta_1X_1 +\beta_2X_2)$,
where $X_1 \sim N(0,1)$ and $X_2$ is a dummy variable with a probability of 1 equal to 0.7,
and $( \beta_{00} ,\beta_{10}, \beta_0, \beta_1, \beta_2) = (-2, 2, -2, 2, 1)$.
```{r zeroinfl poi, message=FALSE}
## Zero-Inflated Poisson
library(assessor)
library(pscl)
n <- 500
set.seed(1234)
# Covariates
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.7)
# Coefficients
beta0 <- -2
beta1 <- 2
beta2 <- 1
beta00 <- -2
beta10 <- 2
# Mean of Poisson part
lambda1 <- exp(beta0 + beta1 * x1 + beta2 * x2)
# Excess zero probability
p0 <- 1 / (1 + exp(-(beta00 + beta10 * x1)))
## simulate outcomes
y0 <- rbinom(n, size = 1, prob = 1 - p0)
y1 <- rpois(n, lambda1)
y <- ifelse(y0 == 0, 0, y1)
```
```{r 2, fig.align='center', fig.width=10}
par(mfrow=c(1,2))
## True model
modelzero1 <- zeroinfl(y ~ x1 + x2 | x1, dist = "poisson", link = "logit")
resid1 <- resid_zeroinfl(modelzero1, plot = TRUE, scale = "uniform")
## Zero inflation
modelzero2 <- glm(y ~ x1 + x2, family = poisson(link = "log"))
resid2 <- resid_disc(modelzero2, plot = TRUE, scale = "normal")
```
The figure above illustrates QQ plots corresponding to `modelzero1` and `modelzero2`. Given that the true underlying distribution of `y` follows a zero-inflated Poisson distribution,
we anticipate observing deviations from the diagonal line in the QQ plot of `modelzero2`.
As expected, the left QQ plot aligns closely along the diagonal line. However, in the right panel, both the left and right tails of the QQ plot for `modelzero2` deviate from the diagonal line, indicating that the assumption of a Poisson distribution may not be accurate.
Moreover, a noticeable S-shaped pattern arises due to overdispersion.
The discrepancies observed in the QQ plot of `modelzero2` suggest that the Poisson distribution assumption is not well-supported by the data. This finding emphasizes the importance of considering alternative distributional assumptions, such as the zero-inflated Poisson distribution, which may better capture the characteristics of the simulated data.
#### Zero-Inflation Negative Binomial
```{r zero bn, message=FALSE, fig.align='center', fig.width=10}
## Zero-inflated Negative Binomial
library(assessor)
library(pscl)
library(MASS)
n <- 500
set.seed(1234)
# Covariates
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.7)
# Coefficients
beta0 <- -2
beta1 <- 2
beta2 <- 1
beta00 <- -2
beta10 <- 2
size1 <- 2
# Mean of Poisson part
lambda1 <- exp(beta0 + beta1 * x1 + beta2 * x2)
# Excess zero probability
p0 <- 1 / (1 + exp(-(beta00 + beta10 * x1)))
## simulate outcomes
y0 <- rbinom(n, size = 1, prob = 1 - p0)
y1 <- rnbinom(n, size=size1, mu= lambda1)
y <- ifelse(y0 == 0, 0, y1)
```
Similarly, we generate a zero-inflated negative binomial random variable denoted as `y`, which is dependent on the variables `x1` and `x2`. The simulation involves modeling the occurrence of zeros in the variable `y` through a zero-inflated negative binomial distribution.
```{r zero2 nb}
model.negbin1 <- zeroinfl(y ~ x1 + x2 | x1, dist = "negbin")
model.negbin2 <- glm.nb(y ~ x1 + x2)
par(mfrow=c(1,2))
resid.zero1 <- resid_zeroinfl(model.negbin1, plot = TRUE, scale = "uniform")
resid.zero2 <- resid_disc(model.negbin2, plot = TRUE, scale = "normal")
```
The figure also presents QQ plots corresponding to `model.negbin1` and `model.negbin2`. In contrast to our simulation setting, `model.negbin2` assumes a negative binomial distribution rather than a zero-inflated negative binomial distribution. Consequently, the right panel of the figure displays deviations at both the left and right tails. While these deviations may not be substantial, a comparison with the left panel suggests that the QQ plots lean towards supporting the zero-inflated negative binomial regression.