-
Notifications
You must be signed in to change notification settings - Fork 0
/
semiconti.Rmd
111 lines (90 loc) · 3.99 KB
/
semiconti.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
---
title: "Semicontinuous"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
### Semicontinuous outcome regression models {.tabset .tabset-pills}
`resid_semiconti()` is used for calculating the DPIT residuals for regression models with semicontinuous outcomes and constructing corresponding QQ-plots. Specifically, a Tobit regression and a Tweedie regression model are suitable models for `resid_semiconti()`. The suitable model objects are as follows:
* Tweedie, `glm(family= tweedie())`
* Tobit(VGAM), `VGAM::vglm()`
* Tobit(AER), `AER::tobit()`
#### Tweedie
We simulate `y1` to follow Tweedie distribution depending on covariates `x11` and `x12`.
```{r tweedie1}
## Tweedie model
library(assessor)
library(tweedie)
library(statmod)
n <- 500
x11 <- rnorm(n)
x12 <- rnorm(n)
beta0 <- 5
beta1 <- 1
beta2 <- 1
lambda1 <- exp(beta0 + beta1 * x11 + beta2 * x12)
y1 <- rtweedie(n, mu = lambda1, xi = 1.6, phi = 10)
```
In constructing `model2`, the intentional omission of the covariate `x12` was aimed at facilitating a direct comparison with the true model, `model1`. As expected, the QQ plot in the right panel corresponding to `model2` exhibits a substantial deviation from the diagonal line, attributing this deviation to the deliberate omission. Conversely, the left panel shows a closer alignment along the diagonal, implying a better fit when including both covariates, `x11` and `x12`. This outcome strongly suggests that incorporating all covariates results in a more appropriate and improved model.
```{r tweedie2, fig.align='center', fig.width=10}
# True model
model1 <-
glm(y1 ~ x11 + x12,
family = tweedie(var.power = 1.6, link.power = 0)
)
# missing covariate
model2 <- glm(y1 ~ x11 ,
family = tweedie(var.power = 1.6, link.power = 0)
)
par(mfrow=c(1,2))
resid1 <- resid_semiconti(model1)
resid2 <- resid_semiconti(model2)
```
#### Tobit
`resid_semiconti()` function supports calculating DPIT residuals for a Tobit regression from both `VGAM::vglm` and `AER::tobit` packages.
In this example,
we assume that the latent variable $Y^*$ follows a normal distribution with a mean given by
$$\mu=\beta_0+\beta_1X_1+\beta_2X_2, $$ where $X_1, X_2 \sim Unif (0, 1)$ independently, and $\beta_0 = 1, \beta_1 = -3,
\beta_2 = 3$.
We observe $Y=0$ if $Y^*<0$.
These variables will be employed as inputs for Tobit regression analyses provided by the `VGAM` and `AER` packages.
```{r tobit vgam1, message=FALSE}
## Tobit regression model
library(VGAM)
beta13 <- 1
beta14 <- -3
beta15 <- 3
set.seed(123)
x11 <- runif(n)
x12 <- runif(n)
lambda1 <- beta13 + beta14 * x11 + beta15 * x12
sd0 <- 0.3
yun <- rnorm(n, mean = lambda1, sd = sd0)
y <- ifelse(yun >= 0, yun, 0)
```
The model `fit1miss`, corresponding to the right QQ plot, intentionally omits the covariate x12. This omission leads to a deviation from the diagonal line in the QQ plot, observed in the right panel. Thus, it can be interpreted as the model misspecification. In contrast, the left panel, corresponding to the model including all covariates, aligns closely along the diagonal line.
```{r tobit vgam2, fig.align='center', fig.width=10}
# Using VGAM package
# True model
fit1 <- vglm(formula = y ~ x11 + x12, tobit(Upper = Inf, Lower = 0, lmu = "identitylink"))
# Missing covariate
fit1miss <- vglm(formula = y ~ x11, tobit(Upper = Inf, Lower = 0, lmu = "identitylink"))
par(mfrow=c(1,2))
resid1 <- resid_semiconti(fit1, plot = TRUE)
resid2 <- resid_semiconti(fit1miss, plot = TRUE)
```
The interpretation remains the same as the `VGAM` example. Note that the results from the `AER` are exactly the same as those from the `VGAM` example.
```{r tobit aer, fig.align='center', fig.width=10, message=FALSE}
# Using AER package
library(AER)
# True model
fit2 <- tobit(y ~ x11 + x12, left = 0, right = Inf, dist = "gaussian")
# Missing covariate
par(mfrow=c(1,2))
fit2miss <- tobit(y ~ x11, left = 0, right = Inf, dist = "gaussian")
reisd1 <- resid_semiconti(fit2, plot = TRUE)
resid2 <- resid_semiconti(fit2miss, plot = TRUE)
```