-
Notifications
You must be signed in to change notification settings - Fork 0
/
quick_start.Rmd
129 lines (98 loc) · 4.31 KB
/
quick_start.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
---
title: "Quick start example analysis"
subtitle: "`r paste0('metabolyseR v',packageVersion('metabolyseR'))`"
author: "Jasen Finch"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
prettydoc::html_pretty:
toc: true
highlight: github
theme: tactile
vignette: >
%\VignetteIndexEntry{Quick start example analysis}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.align = 'center',
message = FALSE
)
```
This example analysis will use the `abr1` data set from the [metaboData](https://aberhrml.github.io/metaboData/) package.
It is nominal mass flow-injection mass spectrometry (FI-MS) fingerprinting data from a plant-pathogen infection time course experiment.
The analysis will also include use of the pipe `%>%` from the [magrittr](https://magrittr.tidyverse.org/) package.
First load the necessary packages.
```{r setup}
library(metabolyseR)
library(metaboData)
```
For this example we will use only the negative acquisition mode data (`abr1$neg`) and sample meta-information (`abr1$fact`).
Create an `AnalysisData` class object using the following:
```{r analysis_data}
d <- analysisData(abr1$neg,abr1$fact)
```
The data includes `r nSamples(d)` samples and `r nFeatures(d)` mass spectral features as shown below.
```{r print_analysis_data}
d
```
The `clsAvailable()` function can be used to identify the columns available in our meta-information table.
```{r}
clsAvailable(d)
```
For this analysis, we will be using the infection time course class information contained in the `day` column.
This can be extracted and the class frequencies tabulated using the following:
```{r}
d %>%
clsExtract(cls = 'day') %>%
table()
```
As can be seen above, the experiment is made up of six infection time point classes that includes a healthy control class (`H`) and five day infection time points (`1-5`), each with 20 replicates.
For data pre-treatment prior to statistical analysis, a two-thirds maximum class occupancy filter can be applied.
Features where the maximum proportion of non-missing data per class is above two-thirds are retained.
A total ion count normalisation will also be applied.
```{r pre_treat}
d <- d %>%
occupancyMaximum(cls = 'day', occupancy = 2/3) %>%
transformTICnorm() %>%
transformLog10()
```
```{r pre_treat_result}
d
```
This has reduced the data set to `r nFeatures(d)` relevant features.
The structure of the data can be visualised using both unsupervised and supervised methods. For instance, the first two principle components from a principle component analysis (PCA) of the data with the sample points coloured by infection class can be plotted using:
```{r pca}
plotPCA(d,cls = 'day',xAxis = 'PC1',yAxis = 'PC2')
```
And similarly, multidimensional scaling (MDS) of sample proximity values from a supervised random forest classification model along with receiver operator characteristic (ROC) curves.
```{r supervised_RF}
plotSupervisedRF(d,cls = 'day')
```
A progression can clearly be seen from the earliest to latest infected time points.
For feature selection, one-way analysis of variance (ANOVA) can be performed for each feature to identify features significantly explanatory for the infection time point.
```{r anova}
anova_results <- d %>%
anova(cls = 'day')
```
A table of the significantly explanatory features can be extracted with a bonferroni correction adjusted p value < 0.05 using:
```{r explanatoty_features_extract}
explan_feat <- explanatoryFeatures(anova_results,threshold = 0.05)
```
```{r,explanatory_features}
explan_feat
```
The ANOVA has identified `r nrow(explan_feat)` features significantly explanatory over the infection time course.
A heat map of the mean relative intensity for each class of these explanatory features can be plotted to visualise their trends between the infection time point classes.
```{r rf_heatmap,fig.height=10,fig.width=5}
plotExplanatoryHeatmap(anova_results,
threshold = 0.05,
featureNames = FALSE)
```
Many of the explanatory features can be seen to be most highly abundant in the final infection time point `5`.
Finally, box plots of the trends of individual features can be plotted, such as the `N341` feature below.
```{r feature_plot}
plotFeature(anova_results,feature = 'N341',cls = 'day')
```