generated from jtr13/cctemplate
-
Notifications
You must be signed in to change notification settings - Fork 57
/
forest_plots.Rmd
200 lines (155 loc) · 8.82 KB
/
forest_plots.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
# Forest Plots
Shreya Rao
```{r, include=F}
library(tidyverse)
library(forestplot)
```
## Introduction
- Forest Plots provide tabular and graphical information about estimates of comparisons or associations, corresponding precision, and statistical significance.
- A forest plot arrays point estimates (e.g., mean) and confidence intervals (e.g., 95% CI) represented by whiskers for multiple studies and/or multiple findings within a study in a horizontal orientation.
- Explanatory text displayed next to these points and whisker diagrams provide details regarding the data being presented.
- A vertical reference line is typically plotted at the null hypothesis, with the statistical significance of an individual point and whiskers compared to that reference line.
- In cases where the data being compared are difference between means, the null is zero (0) and the x scales are normal. When a ratio (e.g., odds ratio) is being compared, the null has a value of 1 and the scales are logarithmic.
**Example 1:**
![](resources/forest_plot/SA_Cognitive.JPG)
**Example 2:**
![](resources/forest_plot/wiki.PNG)
## Data
Assume a study conducted over a year from May 2020 to June 2021 collected the mortality rates of people along with socio-demographic and medical information. Researchers want to see if factors such as Sex, Socioeconomic condition, Period of the pandemic and Preexisting conditions influenced the mortality rate. To do so, they calculated the odds ratio of mortality rates of the factors with respect of a base factor within the four different categories.
After this analysis, the data looks like this:
```{r}
or.df <- data.frame(
factors = c("Female (ref)", "Male",
"High (ref)", "Low",
"Period 1 (05/17/2020-09/30/2020) (ref)", "Period 2 (10/01/2020-01/31/2021)",
"Period 3 (02/01/2021-05/14/2021)", "Period 4 (05/14/2021-06/14/2021)",
"No comorbidity (ref)", "Diabetes", "Hypertension", "Chronic respiratory illness", "Chronic renal disease", "Chronic liver disease", "Other"),
groups = c("Sex", "Sex",
"Socio-demographic factors", "Socio-demographic factors",
"Period of the pandemic", "Period of the pandemic",
"Period of the pandemic", "Period of the pandemic",
"Preexisting conditions", "Preexisting conditions", "Preexisting conditions",
"Preexisting conditions", "Preexisting conditions", "Preexisting conditions", "Preexisting conditions"),
or = c(NA, 1.15, NA, 1.76, NA, 0.90, 0.84, 6.09, NA, 1.05, 1.13, 1.05, 2.33, 2.08, 1.44),
ll = c(NA, 1.09, NA, 1.56, NA, 0.85, 0.77, 4.23, NA, 0.97, 1.04, 0.92, 2.04, 1.57, 1.29),
ul = c(NA, 1.22, NA, 1.98, NA, 0.95, 0.91, 8.23, NA, 1.15, 1.23, 1.20, 2.67, 2.74, 1.61))
or.df
```
Note that this dataset has the columns - factors, groups/categoies to which the factors belong to, odds ratio, lower limits of confidence interval and upper limits of confidence intervals of the odds ratio. The factors columns also indicates the reference factor of every group. The reference groups within each category has NA values for the odds ratio, lower and upper confidence level limits.
Just looking at the values in the table, we can not fully understand the nuances. To better understand this analysis, we need to make a forest plot. We will use the library **forestplot** to plot it.
## Formatting
The variables need to be formatted in a specific way to utilize the *forestplot()* function. Since we want to display the confidence intervals and or values, they can also be formatted appropriately.
```{r}
#formating confidence intervals as (ll, ul) and adding to dataset
or.df$ci <- ifelse(!is.na(or.df$or), paste("(", as.character(or.df$ll), ", ", as.character(or.df$ul), ")", sep = ""), "")
ci <- c("95% CI",
"", or.df$ci[1:2], "", #sex
"", or.df$ci[3:4], "", #socio-demographic
"", or.df$ci[5:8], "", #period
"", or.df$ci[9:15] #pre-existing condition
)
```
To make the plot more presentable, we want to separates the factor categories from each other by adding a blank line after each category. To do so in *forestplot()*, we add an empty string to character vectors and NA values to numeric vectors.
```{r}
#adding NA values and factor categories to or, upper limit and lower limit vectors
#formatting OR values that are displayed on graph
or.char <- c("OR",
"", as.character(or.df$or[1:2]), "", #sex
"", as.character(or.df$or[3:4]), "", #socio-demographic
"", as.character(or.df$or[5:8]), "", #period
"", as.character(or.df$or[9:15]) #pre-existing condition
)
#formatting OR values forestplot uses
or <- c(NA,
NA, or.df$or[1:2], NA,#sex
NA, or.df$or[3:4], NA, #socio-demographic
NA, or.df$or[5:8], NA, #period
NA, or.df$or[9:15] #pre-existing condition
)
#formatting lower limit values forestplot uses
ll <- c(NA,
NA, or.df$ll[1:2], NA, #sex
NA, or.df$ll[3:4], NA, #socio-demographic
NA, or.df$ll[5:8], NA, #period
NA, or.df$ll[9:15] #pre-existing condition
)
#formatting upper limit values forestplot uses
ul <- c(NA,
NA, or.df$ul[1:2], NA, #sex
NA, or.df$ul[3:4], NA, #socio-economic
NA, or.df$ul[5:8], NA, #period
NA, or.df$ul[9:15] #pre-existing condition
)
#formatting the categories and factor names
factors <- c("",
"Sex", or.df$factors[1:2], "", #sex
"Socioeconomic condition", or.df$factors[3:4], "", #socio-demographic
"Period of pandemic", or.df$factors[5:8], "", #period
"Preexisting conditions", or.df$factors[9:15] #pre-existing condition
)
```
Again, to make the plot more readable, we can specify which rows of the forest plot are bolded. We do so by creating a summary vector, which specifies the rows of the plot we want to highlight. In this case, we want to make the text on the headings and category titles stand out, so we create a vector of TRUE/FALSE that has value TRUE when we want bold text and FALSE otherwise.
```{r}
#specifying bolded text
summary <- c(rep(TRUE, 2), rep(FALSE, 3), TRUE, rep(FALSE, 3), TRUE, rep(FALSE, 5), TRUE, rep(FALSE, 7))
```
## Forest Plot
*forestplot()* takes a specific type of dataframe. We need to specify the mean (OR values), lower (lower limits of confidence intervals), upper (upper limits of confidence intervals), study (the factors), confint (the formatted confidence intervals *forestplot()* displays) and OR (the OR values *forestplot()* displays).
```{r}
plot.data <- data.frame(mean = or,
lower = ll,
upper = ul,
study = factors,
confint = ci,
or = or.char)
```
```{r}
forestplot(plot.data, labeltext = c(study, or, confint),
is.summary = summary,
new_page = T,
col = fpColors(box = "royalblue",
line = "darkblue"),
txt_gp = fpTxtGp(cex = 0.8, label = gpar(fontfamily = "serif"),
ticks=gpar(fontsize=5, cex=1.4)),
graph.pos = 2,
boxsize = 0.2,
graphwidth = unit(45, 'mm'),
title = "Forest Plot of Mortality Rates")
```
This plot looks fine, but the x-axis scale can be better because the confidence intervals range is not fully apparent. This can be fixed by changing the scale to logarithmic.
```{r}
forestplot(plot.data, labeltext = c(study, or, confint),
xlog = TRUE,
is.summary = summary,
new_page = T,
col = fpColors(box = "royalblue",
line = "darkblue"),
txt_gp = fpTxtGp(cex = 0.8, label = gpar(fontfamily = "serif"),
ticks=gpar(fontsize=5, cex=1.4)),
graph.pos = 2,
boxsize = 0.2,
graphwidth = unit(45, 'mm'),
title = "Forest Plot of Mortality Rates")
```
This is better, but to make it even more clear, we can specify the x-axis ticks (using xticks) and clip the conidence interval range (using clip).
```{r}
forestplot(plot.data, labeltext = c(study, or, confint),
clip = c(0.5, 5),
xticks = c(0.5, 1, 2, 4, 8),
xlog = TRUE,
is.summary = summary,
new_page = T,
col = fpColors(box = "royalblue",
line = "darkblue"),
txt_gp = fpTxtGp(cex = 0.8, label = gpar(fontfamily = "serif"),
ticks=gpar(fontsize=5, cex=1.4)),
graph.pos = 2,
boxsize = 0.2,
graphwidth = unit(45, 'mm'),
title = "Forest Plot of Mortality Rates")
```
## Understanding forestplot() better
![](resources/forest_plot/visual.JPG)
## References
1. https://www.sciencedirect.com/topics/medicine-and-dentistry/forest-plot
2. https://en.wikipedia.org/wiki/Forest_plot