04_MetaAnalysis.Rmd
```{r MEsummary_Libraries2, message=FALSE, include=FALSE}
if (!require("pacman"))
install.packages("pacman")
pacman::p_load(tidyverse, kableExtra, bomrang, lme4, RColorBrewer, metafor, netmeta)
if (!require("theme.usq"))
devtools::install_github("adamhsparks/theme.usq")
library(theme.usq)
theme_set(theme_usq())
# Data
slimmer_PM_dat <- read.csv("cache/slimmer_PM_clusterdat.csv")
```
# Meta-analysis
## Grain yield meta-analysis
Let's get started with the analysis by first finding the best model fit that answers our research question.
> Which spray management scenario provides the greatest yield protection from powdery mildew?
- Grain yield is our response variable.
- Trial, which resolves combinations of the categorical variables year, location, row spacing and cultivar, is set as a random intercept.
- We will test spray management (our treatment) as a fixed effect and as a random slope within trial.
```{r LME_slimmer_cluster_disease_mod}
m8 <- lmer(grain_yield.t.ha * 1000 ~ factor(spray_management) +
(factor(spray_management) | trial_ref),
data = slimmer_PM_dat)
m9 <- lmer(log(grain_yield.t.ha * 1000) ~ factor(spray_management) +
(factor(spray_management) | trial_ref),
data = slimmer_PM_dat)
m10 <- lmer(log(grain_yield.t.ha * 1000) ~
(factor(spray_management) | trial_ref),
data = slimmer_PM_dat)
m11 <-
lmer(log(grain_yield.t.ha * 1000) ~ factor(spray_management) +
(1 | trial_ref),
data = slimmer_PM_dat)
anova(m8, m9) # m9 is significantly better model
anova(m9, m10) # m9 is significantly better model
anova(m9, m11) # m11 is a simpler model which is no different to m9
```
The best model of the four above is `m9`, with the lowest AIC of -29.357.
```{r}
summary(m9)
```
This linear mixed effects model indicates:
- A single early spray before the first sign of powdery mildew is not likely to increase yields.
- A single spray, or a spray with one or more follow-up sprays, starting at the `Recommended` timing (within three days of the first sign of powdery mildew) is likely to produce significantly higher grain yields than the no spray control.
- The `Late_plus` treatment, which has one or more follow-up sprays after first sign, is likely to increase the mean grain yield.
- `Recommended_plus` spray treatments showed the highest mean grain yield, significantly higher than the no spray control; however, they showed no difference to either the `Recommended` or `Late_plus` treatments.
### Imputing sample variances
We need to impute the variances, which are missing for a few of the trials. These trials `r unique(slimmer_PM_dat[is.na(slimmer_PM_dat$yield_error),c("location","year", "trial_ref")])$trial_ref` were analysed and reported no significant difference between the treatments within each trial; therefore the imputed variances must not introduce a significant difference.
There are `r sum(is.na(slimmer_PM_dat$yield_error))` treatments without yield error in our data, and `r sum(!is.na(slimmer_PM_dat$yield_error))` where yield error was recorded.
Plotting a histogram of the variances shows that the yield error is not normally distributed. A log transformation, however, shows an approximately normal distribution. We can use the mean and standard deviation of log(V) to sample variances for the treatments where V is missing.
```{r variance_histograms}
hist_usq(slimmer_PM_dat$yield_error) # yield error is not normally distributed
hist_usq(log(slimmer_PM_dat$yield_error)) # a log transformation shows it is close to normally distributed
```
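The log-normal sampling scheme described above can be sketched with simulated data (all values below are hypothetical, not taken from the trial data):

```r
# Sample replacement variances for missing yield errors from a log-normal
# distribution fitted to the observed errors (simulated, hypothetical values)
set.seed(42)
v <- c(0.12, 0.30, 0.08, NA, 0.22, NA, 0.15)

# Fit a normal distribution to log(V) of the observed values
mu    <- mean(log(v), na.rm = TRUE)
sigma <- sd(log(v), na.rm = TRUE)

# Draw on the log scale, then back-transform so imputed values stay positive
v[is.na(v)] <- exp(rnorm(sum(is.na(v)), mean = mu, sd = sigma))
v
```

Back-transforming with `exp()` guarantees the imputed variances are strictly positive, which a draw on the raw scale would not.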
We will evaluate whether the imputed variances confer a significant difference to the recorded means. I created a distance matrix comparing two standard errors above each mean against two standard errors below each other mean. If any intervals don't overlap, implying a significant difference, the imputed variances are discarded and re-sampled until all treatment variances show no significant differences.
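The `impute_yield_err()` helper sourced below is not reproduced in this document; conceptually, the two-standard-error overlap check might look like this (means and standard errors here are hypothetical):

```r
# Two-standard-error intervals around two treatment means; if the intervals
# overlap, no significant difference is implied by the (imputed) errors
m  <- c(2.1, 2.4)   # treatment means (t/ha), hypothetical
se <- c(0.15, 0.18) # standard errors, hypothetical

lo <- m - 2 * se
hi <- m + 2 * se

# The intervals overlap when each lower bound sits below the other's upper bound
overlap <- lo[1] <= hi[2] && lo[2] <= hi[1]
overlap
```

In the real function this check would run over every pairwise combination of treatments within a trial, re-sampling until no pair is separated.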
```{r imputing_variances}
source("./R/impute_yield_err.R")
imputed_error_list <- impute_yield_err(dat = slimmer_PM_dat,
xbar = "grain_yield.t.ha",
Var = "yield_error",
reps = "n")
slimmer_PM_dat <- imputed_error_list[["new_dataFrame"]]
```
The imputed variances calculated above don't show the expected positive correlation between grain yield and variance, which reflects the inherent variability of the data. Therefore we will equalise the variance by using the trial mean squares to calculate the variance for each sample.
#### Imputing sample variances from Mean squares
```{r imputing_MSE_dataSpread}
slimmer_PM_dat <- read.csv("cache/slimmer_PM_clusterdat.csv")
hist_usq(unique(slimmer_PM_dat$Y_Msquare))
hist_usq(log(unique(slimmer_PM_dat$Y_Msquare))) # log mean square has a more normal distribution
```
Imputation on the log scale is required. Which trials need variance imputation?
```{r imputing_MSE}
TrialMSQ <- slimmer_PM_dat %>%
  group_by(trial_ref, location, year) %>%
  summarise(unique(Y_Msquare))
TrialMSQ[is.na(TrialMSQ$`unique(Y_Msquare)`), c("trial_ref", "location","year")]
```
```{r imputation_MSE}
for (i in TrialMSQ[is.na(TrialMSQ$`unique(Y_Msquare)`), ]$trial_ref) {
  slimmer_PM_dat[slimmer_PM_dat$trial_ref == i, "Y_Msquare"] <-
    exp(rnorm(
      n = 1,
      mean = mean(log(TrialMSQ$`unique(Y_Msquare)`), na.rm = TRUE),
      sd = sd(log(TrialMSQ$`unique(Y_Msquare)`), na.rm = TRUE)
    ))
}
```
Before the analysis, let's have a look at the trimmed-down, modified data.
```{r treatment_means_plot}
# class spray_management as a factor and reorder the levels for the plot
slimmer_PM_dat$spray_management <-
factor(slimmer_PM_dat$spray_management,
levels(factor(slimmer_PM_dat$spray_management))[rev(c(1, 2, 4, 5, 3))])
slimmer_PM_dat %>%
ggplot(aes(y = grain_yield.t.ha, x = spray_management)) +
geom_boxplot() +
#geom_point(position = "jitter", alpha = 1/5)+
geom_jitter(width = 0.1, alpha = 1 / 5) +
labs(x = "Spray management variable",
y = "Grain yield (t/ha)",
title = "Mean grain yield from each treatment \n categorised by spray management scenario") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype = 2) +
coord_flip()
```
There seems to be no difference between the treatments, with the exception of Late_plus. Let's make this plot again, but use the proportion of yield increase compared to the no spray control as the response variable.
It is also important to visualise how well the data are represented across the trial years.
```{r table_of_nTreatments}
kable(table(slimmer_PM_dat$spray_management, slimmer_PM_dat$year),
align = rep('c', 8),
caption = "Which treatments and how many treatments are represented in each year") %>%
kable_styling(
"striped",
fixed_thead = TRUE,
full_width = FALSE,
position = "center"
)
```
Treatments Late_plus and Early are not well represented for comparison with the other treatments.
```{r plot_yieldProportion}
slimmer_PM_dat$spray_management <-
factor(slimmer_PM_dat$spray_management, rev(
c(
"control",
"Early",
"Recommended",
"Recommended_plus",
"Late_plus"
)
))
slimmer_PM_dat %>%
ggplot(aes(y = prop_yield_gain, x = spray_management)) +
geom_boxplot() +
#geom_point(position = "jitter", alpha = 1/5)+
geom_jitter(width = 0.1, alpha = 1 / 5) +
labs(x = "Spray management variable",
y = "Proportional yield gain relative to the no spray control",
title = "Mean grain yield difference to the control for each treatment \n categorised by spray management scenario") +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, linetype = 2) +
coord_flip()
```
## Meta-analysis
### metafor package
Let's load the metafor package we are using to analyse the data, then rearrange the factors we want to examine by placing the control treatment first. This way all treatments will be compared to the no spray controls.
Next we are log transforming the grain yield and calculating the variance from the trial mean squares.
Finally we will assign factor classes to the main variables in the meta-analysis. Note that variable trial is a combination of:
* Trial identifier
* Trial year
* Trial location
* Host genotype
* Trial row spacing
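The sampling variance used below follows from the delta method: for a trial mean $\bar{y}$ with $\mathrm{Var}(\bar{y}) = MSE/n$, the variance of $\log \bar{y}$ is approximately $\mathrm{Var}(\bar{y})/\bar{y}^2 = MSE/(n\,\bar{y}^2)$. A quick numerical check (all values hypothetical):

```r
# Delta-method variance of log(mean yield): Var(log m) ~ (MSE / n) / m^2
mse <- 0.35 # trial mean square error, hypothetical
n   <- 4    # pooled replicates, hypothetical
m   <- 2.5  # mean grain yield (t/ha), hypothetical

vi <- mse / (n * m^2)
vi # 0.014
```

This is the same expression as the `vi` calculation in the chunk that follows, applied to one observation.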
```{r metafor_organisation}
slimmer_PM_dat$spray_management <-
factor(
slimmer_PM_dat$spray_management,
c(
"control",
"Early",
"Recommended",
"Recommended_plus",
"Late_plus"
)
)
slimmer_PM_dat$yi <- log(slimmer_PM_dat$grain_yield.t.ha)
slimmer_PM_dat$vi <-
  slimmer_PM_dat$Y_Msquare / (slimmer_PM_dat$n * slimmer_PM_dat$grain_yield.t.ha^2)
slimmer_PM_dat$spray_management <-
factor(slimmer_PM_dat$spray_management)
slimmer_PM_dat$trial_ref <- factor(slimmer_PM_dat$trial_ref)
slimmer_PM_dat$trial <- factor(slimmer_PM_dat$trial)
```
```{r Metafor-analysis}
PM_mv_AI <- rma.mv(
yi,
vi,
mods = ~ spray_management,
method = "ML",
random = list(~ spray_management | trial),
struct = "UN",
data = slimmer_PM_dat
)
summary(PM_mv_AI)
```
```{r ReRunMetafor}
dat1 <- data.frame(
iteration = 0,
mods = names(coef(PM_mv_AI)),
coefs = coef(PM_mv_AI),
se = PM_mv_AI$se,
zval = PM_mv_AI$zval,
pval = PM_mv_AI$pval,
ci.lb = PM_mv_AI$ci.lb,
ci.ub = PM_mv_AI$ci.ub,
row.names = NULL
)
# Each time below is run dat1 will add the extra iterations to the bottom of dat1
for(i2 in (max(dat1$iteration) +1):(max(dat1$iteration) +3)){
# Read in data again
slimmer_PM_dat <- read.csv("cache/slimmer_PM_clusterdat.csv")
# Identify trials without variance
TrialMSQ <- slimmer_PM_dat %>%
  group_by(trial_ref, location, year) %>%
  summarise(unique(Y_Msquare))
# impute variance
for (i in TrialMSQ[is.na(TrialMSQ$`unique(Y_Msquare)`), ]$trial_ref) {
  slimmer_PM_dat[slimmer_PM_dat$trial_ref == i, "Y_Msquare"] <-
    exp(rnorm(
      n = 1,
      mean = mean(log(TrialMSQ$`unique(Y_Msquare)`), na.rm = TRUE),
      sd = sd(log(TrialMSQ$`unique(Y_Msquare)`), na.rm = TRUE)
    ))
}
#######
#reformat data
slimmer_PM_dat$yi <- log(slimmer_PM_dat$grain_yield.t.ha)
slimmer_PM_dat$vi <-
  slimmer_PM_dat$Y_Msquare / (slimmer_PM_dat$n * slimmer_PM_dat$grain_yield.t.ha^2)
slimmer_PM_dat$spray_management <-
factor(slimmer_PM_dat$spray_management)
slimmer_PM_dat$trial_ref <- factor(slimmer_PM_dat$trial_ref)
slimmer_PM_dat$trial <- factor(slimmer_PM_dat$trial)
# Run meta-analysis
PM_mv_AI <- rma.mv(
yi,
vi,
mods = ~ spray_management,
method = "ML",
random = list(~ spray_management | trial),
struct = "UN",
data = slimmer_PM_dat
)
dat2 <- data.frame(
iteration = i2,
mods = names(coef(PM_mv_AI)),
coefs = coef(PM_mv_AI),
se = PM_mv_AI$se,
zval = PM_mv_AI$zval,
pval = PM_mv_AI$pval,
ci.lb = PM_mv_AI$ci.lb,
ci.ub = PM_mv_AI$ci.ub,
row.names = NULL
)
dat1 <- rbind(dat1,dat2)
}
dat1
```
```{r coef_iteration}
dat1 %>%
ggplot(aes(x = iteration, y = coefs)) +
geom_ribbon(aes(ymax = ci.ub, ymin = ci.lb, x = iteration), fill = "grey70")+
geom_line()+
facet_grid(cols = vars(mods))
```
```{r pval_iteration}
dat1 %>%
ggplot(aes(x = iteration, y = pval)) +
geom_line()+
geom_hline(yintercept = 0.05, colour = "grey50", linetype = 2)+
facet_grid(cols = vars(mods))
```
The first table in this output shows the tau^2 (variance) of each random effect and the number of occurrences of each treatment in the analysis. The second table is in two parts (left and right). The left part shows the estimated rho (correlation) between treatment effects. All comparisons were estimable except between `Early` and `Late_plus` (rho of `0.000`); the `Early` and `Late_plus` treatments never occurred within the same trial, which is indicated on the right side of the table.
The fixed effects, in the last table, show that yields in single early spray treatments are not significantly different to the no spray control.
Commencing spray schedules at the first sign of disease (Recommended), or between 7 and 19 days after first sign (Late), produced significantly higher yields than the no spray control.
On average, a spray schedule with two or more applications starting late (7 - 19 days after the first sign of powdery mildew) produced the highest yields.
When we compare these results to those of the linear mixed effect model there is little difference in the outcome.
```{r compare_meta&lmer}
lme_PM <-
lmer(yi ~ spray_management + (spray_management |
trial), data = slimmer_PM_dat)
summary(lme_PM)
```
*****
To make it easier to compare each of the treatments we can compute the meta-analysis contrasts.
```{r metafor_contrasts}
anova(PM_mv_AI, L = rbind(
c(0, 1, -1, 0, 0),
# early vs Recommended
c(0, 1, 0, -1, 0),
# early vs recommended plus
c(0, 1, 0, 0, -1),
# early vs Late plus
c(0, 0, -1, 1, 0),
# Recommended plus vs recommended
c(0, 0, -1, 0, 1),
# Late_Plus vs Recommended
c(0, 0, 0, -1, 1)
)) # Late_plus vs Recommended plus
```
Results show that, excluding the no spray control, none of the treatments are significantly different; however, early application treatments on average produced lower yields. The contrast assessing whether Recommended_plus saves more yield than Recommended was also close to significant, probably because there were more within-trial comparisons between these two treatments, allowing a more certain outcome.
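Because the model is fitted on the log scale, coefficients back-transform to multiplicative yield ratios relative to the no spray control. For example (the coefficient value here is hypothetical, not a model estimate):

```r
# A log-scale coefficient back-transforms to a multiplicative yield ratio
b <- 0.095         # hypothetical log-scale coefficient
ratio    <- exp(b) # multiplicative effect relative to the control
pct_gain <- (ratio - 1) * 100
round(pct_gain, 1) # 10
```

So a coefficient of about 0.095 corresponds to roughly a 10% yield gain over the control, which is why the results are exponentiated before plotting below.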
Let's view these comparisons in a plot.
First we will format the results into a data frame and back-transform the log values using the exponential function.
```{r metafor_results}
results_AI <- data.frame(cbind(exp(PM_mv_AI$b),
                               exp(PM_mv_AI$ci.lb),
                               exp(PM_mv_AI$ci.ub)))
efficacy <- as_tibble(results_AI) # tbl_df() is deprecated in favour of as_tibble()
efficacy$Treatment <-
factor(
c(
"control",
"Early",
"Recommended",
"Recommended_plus",
"Late_plus"
)
)
efficacy$se <- PM_mv_AI$se
colnames(efficacy) <-
c("Mean", "CIs_lower", "CI_upper", "Treatment", "SE")
efficacy
```
```{r metafor_plot}
efficacy %>%
ggplot(aes(Treatment, Mean)) +
geom_hline(
yintercept = c(0.8, 1, 1.2, 1.4),
color = "grey80",
linetype = 3
) +
geom_point(aes(size = 1 / SE), shape = 15) +
geom_linerange(aes(ymin = CIs_lower, ymax = CI_upper)) +
coord_flip()
```
What is interesting here is that the variance in the control has increased while that in the treatments has decreased, despite the control being the best represented treatment across all experiments. It could also be because the control has a low number of pooled reps per trial. Let's look at how well the treatments compare to each other; we can use the netmeta package for a graphical representation of this.
### netmeta package
Let's analyse the data again using a different statistical approach to see if our outcome with the `metafor` package was robust. The `netmeta` package uses a frequentist approach to the analysis and focuses on the pairwise comparisons between treatments.
```{r netmeta-analysis}
datPM3 <- slimmer_PM_dat %>%
group_by(trial, spray_management, n) %>%
summarize(yi_mean = mean(yi),
vi_mean = mean(vi)) %>%
ungroup()
PM_con <- pairwise(
treat = spray_management,
n = n,
mean = yi_mean,
sd = sqrt(vi_mean),
studlab = trial,
data = datPM3,
sm = "MD"
)
net_con <- netmeta(TE,
seTE,
treat1,
treat2,
studlab,
data = PM_con,
sm = "MD")
summary(net_con)
```
Now let's visualise this as a forest plot
```{r netmeta-forest}
# Note: this is not working as expected; I will need to follow up on this
forest(
net_con,
reference.group = 4,
rightcols = c("effect", "ci", "Pscore"),
rightlabs = "P-Score",
small.values = "bad"
)
```
The `netmeta` analysis suggests spray schedules commencing early are no different to any other treatment, including the no spray control, with a mean estimate very similar to the Recommended treatment. The Recommended_plus and Late_plus treatments show higher mean estimates, though these are not significantly different from the Early estimate.
```{r netgraphGW}
netgraph(
net_con,
plastic = FALSE,
col = "orange",
thickness = "number.of.studies",
points = FALSE,
col.points = "black",
cex.points = 1,
number.of.studies = TRUE,
cex.number.of.studies = 1,
col.number.of.studies = "black",
bg.number.of.studies = "orange",
multiarm = FALSE,
col.multiarm = "lightblue",
pos.number.of.studies = 0.5
)
```
```{r}
netleague(net_con)
decomp.design(net_con)
netsplit(net_con)
nm1 <- netmeasures(net_con)
plot(
nm1$meanpath,
nm1$minpar,
pch = "",
xlab = "Mean path length",
ylab = "Minimal parallelism"
)
text(nm1$meanpath, nm1$minpar, names(nm1$meanpath), cex = 0.8)
```
```{r Save_meta_data, eval=FALSE}
write.csv(slimmer_PM_dat, file = "data/GYmeta_data.csv")
```