-
Notifications
You must be signed in to change notification settings - Fork 0
/
ord.Rmd
39 lines (31 loc) · 2.25 KB
/
ord.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
---
title: "Ordered curve"
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
### Assessing the mean structure
`ord_curve()`creates a plot to assess the mean structure of regression models. The plot compares the cumulative sum of the response variable and its hypothesized value. Deviation from the diagonal suggests the possibility that the mean structure of the model is incorrect.
Ordered curve method is not restricted to a discrete outcome regression model. This function also supports a continuous outcome regression such as `lm` object as well as `glm`, `glm.nb`, or `polr`.
In the example below, the underlying model is a logistic regression with the probability of 1 as $\mathrm{logit}^{-1}(\beta_0+\beta_1 X_1+\beta_2 X_2+\beta_3X_1 X_2)$, where $(\beta_0,\beta_1,\beta_2,\beta_3)=(-5,2,1,3)$, $X_1\sim N(1,1)$, and $X_2$ is a dummy variable with a probability of one equal to 0.7. For the misspecified model, the binary covariate and the interaction term are omitted.
#### Example
```{r ord curve poisson, fig.align='center', fig.width=10}
library(assessor)
## Binary example of ordered curve
n <- 500
set.seed(1234)
x1 <-rnorm(n,1,1); x2 <- rbinom(n,1,0.7)
beta0 <- -5; beta1 <- 2; beta2<- 1; beta3 <- 3
q1 <-1/(1+exp(beta0+beta1*x1+beta2*x2+beta3*x1*x2))
y1 <- rbinom(n,size=1,prob = 1-q1)
par(mfrow=c(1,2))
model0 <- glm(y1~x1*x2,family =binomial(link = "logit") )
ord_curve(model0,thr=model0$fitted.values)
model1 <- glm(y1~x1,family =binomial(link = "logit") )
ord_curve(model1,thr=x2)
```
The figures above illustrate ordered curves plots corresponding to `model0` and `model1`. In the left panel, the curve closely aligns with the diagonal line, indicating that the mean structure of `model0` is correctly specified. On the contrary, `model1` exhibits a deviation from the diagonal line due to the omission of the variable $x_2$. This misspecification, coupled with the choice of the threshold value as $x_2$, results in the observed discrepancy in the ordered curve in the right panel.
The substantial disparity between the observed curve in `model1` and the diagonal line strongly suggests the necessity of including the variable $x_2$ in the model. This inclusion is crucial for accurately capturing the underlying mean structure.