-
Notifications
You must be signed in to change notification settings - Fork 12
/
resistance_predict.Rmd
executable file
·131 lines (99 loc) · 4.97 KB
/
resistance_predict.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
---
title: "How to predict antimicrobial resistance"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{How to predict antimicrobial resistance}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r setup, include = FALSE, results = 'markup'}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 7.5,
fig.height = 4.75
)
```
## Needed R packages
As with many uses in R, we need some additional packages for AMR data analysis. Our package works closely together with the [tidyverse packages](https://www.tidyverse.org) [`dplyr`](https://dplyr.tidyverse.org/) and [`ggplot2`](https://ggplot2.tidyverse.org). The tidyverse tremendously improves the way we conduct data science - it allows for a very natural way of writing syntaxes and creating beautiful plots in R.
Our `AMR` package depends on these packages and even extends their use and functions.
```{r lib packages, message = FALSE}
library(dplyr)
library(ggplot2)
library(AMR)
# (if not yet installed, install with:)
# install.packages(c("tidyverse", "AMR"))
```
## Prediction analysis
Our package contains a function `resistance_predict()`, which takes the same input as functions for [other AMR data analysis](./AMR.html). Based on a date column, it calculates cases per year and uses a regression model to predict antimicrobial resistance.
It is basically as easy as:
```{r, eval = FALSE}
# resistance prediction of piperacillin/tazobactam (TZP):
resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
# or:
example_isolates %>%
resistance_predict(
col_ab = "TZP",
model = "binomial"
)
# to bind it to object 'predict_TZP' for example:
predict_TZP <- example_isolates %>%
resistance_predict(
col_ab = "TZP",
model = "binomial"
)
```
The function will look for a date column itself if `col_date` is not set.
When running any of these commands, a summary of the regression model will be printed unless using `resistance_predict(..., info = FALSE)`.
```{r, echo = FALSE, message = FALSE}
predict_TZP <- example_isolates %>%
resistance_predict(col_ab = "TZP", model = "binomial")
```
This text is only a printed summary - the actual result (output) of the function is a `data.frame` containing for each year: the number of observations, the actual observed resistance, the estimated resistance and the standard error below and above the estimation:
```{r}
predict_TZP
```
The function `plot` is available in base R, and can be extended by other packages to depend the output based on the type of input. We extended its function to cope with resistance predictions:
```{r, fig.height = 5.5}
plot(predict_TZP)
```
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
We also support the `ggplot2` package with our custom function `ggplot_sir_predict()` to create more appealing plots:
```{r}
ggplot_sir_predict(predict_TZP)
# choose for error bars instead of a ribbon
ggplot_sir_predict(predict_TZP, ribbon = FALSE)
```
### Choosing the right model
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
```{r}
example_isolates %>%
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
ggplot_sir_predict()
```
Vancomycin resistance could be 100% in ten years, but might remain very low.
You can define the model with the `model` parameter. The model chosen above is a generalised linear regression model using a binomial distribution, assuming that a period of zero resistance was followed by a period of increasing resistance leading slowly to more and more resistance.
Valid values are:
| Input values | Function used by R | Type of model |
|----------------------------------------|-------------------------------|-----------------------------------------------------|
| `"binomial"` or `"binom"` or `"logit"` | `glm(..., family = binomial)` | Generalised linear model with binomial distribution |
| `"loglin"` or `"poisson"` | `glm(..., family = poisson)` | Generalised linear model with poisson distribution |
| `"lin"` or `"linear"` | `lm()` | Linear model |
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate:
```{r}
example_isolates %>%
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
ggplot_sir_predict()
```
The model itself is also available from the object, as an `attribute`:
```{r}
model <- attributes(predict_TZP)$model
summary(model)$family
summary(model)$coefficients
```