---
title: "Recoding Variables"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Recoding Variables}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(knitr)
library(kableExtra)
```
```{r setup, echo=FALSE}
library(qacr)
df <- data.frame(sex=c(1,2,1,2,2,2),
                 race=c("b", "w", "a", "b", "w", "h"),
                 outcome=c("better", "worse", "same", "same", "better", "worse"),
                 Q1=c(20, 30, 44, 15, 50, 99),
                 Q2=c(15, 23, 18, 86, 99, 35),
                 age=c(12, 20, 33, 55, 30, 100),
                 rating=c(1,2,5,3,4,5))
```
The `recodes()` function makes it easy to recode one or more variables in your data frame. The format is <br><br>
**`newdata <- recodes(olddata, variables, from values, to values)`**
### Original dataset
Consider the data set below. Let's make the following changes.
```{r, echo=FALSE}
kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
#### sex
For `sex`, set 1 to "Male" and 2 to "Female".
```{r}
df <- recodes(data=df, vars="sex",
              from=c(1,2), to=c("Male", "Female"))
```
```{r, echo=FALSE}
kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
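For comparison, a value-to-value recode like this one can be sketched in base R with a named lookup vector. This is only an illustration of the idea, not how `recodes()` is implemented; the names `sex_codes`, `lookup`, and `sex_labels` are made up for the example.

```r
# Base R sketch of a value-to-value recode (NOT the recodes() implementation).
sex_codes <- c(1, 2, 1, 2, 2, 2)            # same codes as the sex column above

# Names are the old values (as text); elements are the new values.
lookup <- c("1" = "Male", "2" = "Female")

# Index the lookup by the old values, then drop the leftover names.
sex_labels <- unname(lookup[as.character(sex_codes)])
sex_labels
```

The lookup approach handles one variable at a time, whereas `recodes()` takes the data frame and the variable names together.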
#### race
Recode `race` to "White" vs. "Other".
```{r}
df <- recodes(data=df, vars="race",
              from=c("w", "b", "a", "h"),
              to=c("White", "Other", "Other", "Other"))
```
```{r, echo=FALSE}
kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
#### outcome
Recode `outcome` to 1 (better) vs. 0 (not better).
```{r}
df <- recodes(data=df, vars="outcome",
              from=c("better", "same", "worse"),
              to=c(1, 0, 0))
```
```{r, echo=FALSE}
kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
#### Q1 and Q2
For `Q1` and `Q2` set values of 86 and 99 to missing.
```{r}
df <- recodes(data=df, vars=c("Q1", "Q2"),
              from=c(86, 99), to=NA)
```
```{r, echo=FALSE}
kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
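To see what applying one scheme to several variables amounts to, here is a rough base R equivalent. It is a sketch under the assumption that 86 and 99 are the only missing-value codes; `survey` and `missing_codes` are hypothetical names, and this is not the code inside `recodes()`.

```r
# Base R sketch: set the same missing-value codes to NA in several columns.
survey <- data.frame(Q1 = c(20, 30, 44, 15, 50, 99),
                     Q2 = c(15, 23, 18, 86, 99, 35))
missing_codes <- c(86, 99)

# Apply the same replacement to each listed column and write the result back.
survey[c("Q1", "Q2")] <- lapply(survey[c("Q1", "Q2")], function(x) {
  x[x %in% missing_codes] <- NA
  x
})
survey
```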
#### age
For `age`, set values

* less than 20 or greater than 90 to missing,
* 20 <= age <= 30 to "Younger",
* 30 < age <= 50 to "Middle Aged", and
* 50 < age <= 90 to "Older".

You can use expressions in your `from` fields. When an expression is `TRUE`, the corresponding `to` value is applied. We will use the dollar sign (**\$**) to represent the variable (age in this case). The symbols **\|** and **\&** mean **OR** and **AND**, respectively.
```{r}
df <- recodes(data=df, vars="age",
              from=c("$ < 20 | $ > 90",
                     "$ >= 20 & $ <= 30",
                     "$ > 30 & $ <= 50",
                     "$ > 50 & $ <= 90"),
              to=c(NA, "Younger", "Middle Aged", "Older"))
```
We can also write this as
```{r eval=FALSE}
df <- recodes(data=df, vars="age",
              from=c("$ < 20", "$ <= 30", "$ <= 50", "$ <= 90", "$ > 90"),
              to=c(NA, "Younger", "Middle Aged", "Older", NA))
```
This works because the criteria are checked from left to right: once the age value for an observation meets a criterion (the expression is `TRUE`), it is recoded. It isn't changed again by later criteria in the same `recodes` statement.
```{r, echo=FALSE}
kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
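The left-to-right, first-match-wins behaviour described above can be mimicked in base R. The loop below is a minimal sketch, assuming the `$` placeholder is simply replaced by the variable name before each expression is evaluated; it is not the actual `recodes()` source, and `conditions`, `new_values`, `recoded`, and `done` are names invented for the example.

```r
# Base R sketch of first-match-wins recoding (NOT the recodes() implementation).
age <- c(12, 20, 33, 55, 30, 100)
conditions <- c("$ < 20", "$ <= 30", "$ <= 50", "$ <= 90", "$ > 90")
new_values <- c(NA, "Younger", "Middle Aged", "Older", NA)

recoded <- rep(NA_character_, length(age))
done <- rep(FALSE, length(age))      # TRUE once an observation has been recoded

for (i in seq_along(conditions)) {
  # Swap the "$" placeholder for the variable name, e.g. "age < 20".
  expr <- gsub("$", "age", conditions[i], fixed = TRUE)
  # Only observations not matched by an earlier condition may be recoded.
  hit <- eval(parse(text = expr)) & !done
  recoded[hit] <- new_values[i]
  done <- done | hit
}
recoded
```

Because an age of 20 is already marked as done after matching `$ <= 30`, the later conditions `$ <= 50` and `$ <= 90` leave it alone even though they are also `TRUE`.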
#### rating
Finally, for the `rating` variable, reverse the scoring so that 1 to 5 becomes
5 to 1.
```{r}
df <- recodes(data=df, vars="rating", from=1:5, to=5:1)
```
```{r, echo=FALSE}
kbl(df) %>% kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
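For a numeric scale, the same reversal can also be written as arithmetic: subtract each score from the sum of the lowest and highest possible values. The sketch below assumes the scale runs from 1 to 5, as in this example; `rating` and `reversed` are standalone vectors, not columns of `df`.

```r
# Reverse a 1-to-5 scale arithmetically: new = (min + max) - old.
rating <- c(1, 2, 5, 3, 4, 5)
reversed <- (1 + 5) - rating
reversed
```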
### Note
Remember that `recodes` returns a data frame, not a variable.

* `df <- recodes(data=df, vars="rating", from=1:5, to=5:1)` **is correct**.
* `df$rating <- recodes(data=df, vars="rating", from=1:5, to=5:1)` **is not**.

This allows you to apply the same recoding scheme to more than one variable at a time (e.g., Q1 and Q2 above).
**<span style="color: red;">And that's it (APPLAUSE, APPLAUSE)!</span>**