/
Intromi4p.Rmd
198 lines (158 loc) · 6.74 KB
/
Intromi4p.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
---
title: "Multiple imputation for proteomics"
shorttitle: "mi4p: multiple imputation for proteomics"
author:
- Marie Chion, Université de Strasbourg et CNRS, IRMA, labex IRMIA, LSMBO, IPHC, marie.chion@protonmail.fr
- Christine Carapito, Université de Strasbourg et CNRS, LSMBO, IPHC, ccarapito@unistra.fr
- Frédéric Bertrand, Université de Strasbourg et CNRS, IRMA, labex IRMIA, Université de technologie de Troyes, LIST3N, frederic.bertrand@utt.fr
date: "`r Sys.Date()`"
output:
prettydoc::html_pretty:
theme: cayman
highlight: github
vignette: >
%\VignetteIndexEntry{Multiple imputation for proteomics}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
<table align="center">
<tr>
<td><img src="../man/figures/logo.png" align="center" width="200" style="margin:0 0 0 100px;"/></td>
</tr>
</table>
```{r setup, include = FALSE}
LOCAL <- identical(Sys.getenv("LOCAL"), "TRUE")
#LOCAL=FALSE
knitr::opts_chunk$set(purl = LOCAL)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width=5
)
```
# mi4p: multiple imputation for proteomics
This repository contains the R code and package for the _mi4p_ methodology (Multiple Imputation for Proteomics), proposed by Marie Chion, Christine Carapito and Frédéric Bertrand (2021) in *Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics*, [https://arxiv.org/abs/2108.07086](https://arxiv.org/abs/2108.07086).
The following material is available on the Github repository of the package [https://github.com/mariechion/mi4p/](https://github.com/mariechion/mi4p/).
1. The `Functions` folder contains all the functions used for the workflow.
2. The `Simulation-1`, `Simulation-2` and `Simulation-3` folders contain all the R scripts and data used to conduct simulated experiments and evaluate our methodology.
3. The `Arabidopsis_UPS` and `Yeast_UPS` folders contain all the R scripts and data used to challenge our methodology on real proteomics datasets. Raw experimental data were deposited with the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD003841 and PXD027800.
<!--You can insert a graphic here, just modify that syntax ![Example Graphs.](IntroPatterns_files/figure-html/Fresults-1.png) -->
This website and these examples were created by M. Chion, C. Carapito and F. Bertrand.
## Installation
You can install the released version of mi4p from [CRAN](https://CRAN.R-project.org) with:
```r
install.packages("mi4p")
```
You can install the development version of mi4p from [github](https://github.com) with:
```{r, eval = FALSE}
devtools::install_github("mariechion/mi4p")
```
## Examples
### First section
```{r}
library(mi4p)
```
```{r}
set.seed(4619)
datasim <- protdatasim()
str(datasim)
```
```{r}
attr(datasim, "metadata")
```
## AMPUTATION
<!-- Use `eval=LOCAL` to prevent testing too long computation during CRAN checks.-->
```{r, cache=TRUE, eval=LOCAL}
MV1pct.NA.data <- MVgen(dataset = datasim[,-1], prop_NA = 0.01)
MV1pct.NA.data
```
## IMPUTATION
```{r, cache=TRUE, eval=LOCAL}
MV1pct.impMLE <- multi.impute(data = MV1pct.NA.data, conditions = attr(datasim,"metadata")$Condition, method = "MLE", parallel = FALSE)
```
## ESTIMATION
```{r, cache=TRUE, eval=LOCAL}
print(paste(Sys.time(), "Dataset", 1, "out of", 1))
MV1pct.impMLE.VarRubin.Mat <- rubin2.all(data = MV1pct.impMLE, metacond = attr(datasim, "metadata")$Condition)
```
## PROJECTION
```{r, cache=TRUE, eval=LOCAL}
print(paste("Dataset", 1, "out of",1, Sys.time()))
MV1pct.impMLE.VarRubin.S2 <- as.numeric(lapply(MV1pct.impMLE.VarRubin.Mat, function(aaa){
DesMat = mi4p::make.design(attr(datasim, "metadata"))
return(max(diag(aaa)%*%t(DesMat)%*%DesMat))
}))
```
## MODERATED T-TEST
```{r, cache=TRUE, eval=LOCAL}
MV1pct.impMLE.mi4limma.res <- mi4limma(qData = apply(MV1pct.impMLE,1:2,mean),
sTab = attr(datasim, "metadata"),
VarRubin = sqrt(MV1pct.impMLE.VarRubin.S2))
MV1pct.impMLE.mi4limma.res
(simplify2array(MV1pct.impMLE.mi4limma.res)$P_Value.A_vs_B_pval)[1:10]
(simplify2array(MV1pct.impMLE.mi4limma.res)$P_Value.A_vs_B_pval)[11:200]<=0.05
```
True positive rate
```{r, cache=TRUE, eval=LOCAL}
sum((simplify2array(MV1pct.impMLE.mi4limma.res)$P_Value.A_vs_B_pval)[1:10]<=0.05)/10
```
False positive rate
```{r, cache=TRUE, eval=LOCAL}
sum((simplify2array(MV1pct.impMLE.mi4limma.res)$P_Value.A_vs_B_pval)[11:200]<=0.05)/190
```
```{r, cache=TRUE, eval=LOCAL}
MV1pct.impMLE.dapar.res <-limmaCompleteTest.mod(qData = apply(MV1pct.impMLE,1:2,mean), sTab = attr(datasim, "metadata"))
MV1pct.impMLE.dapar.res
```
Simulate a list of 100 datasets.
```{r}
set.seed(4619)
norm.200.m100.sd1.vs.m200.sd1.list <- lapply(1:100, protdatasim)
metadata <- attr(norm.200.m100.sd1.vs.m200.sd1.list[[1]],"metadata")
```
100 datasets with parallel comuting support. Quite long to run even with parallel computing support.
```{r, eval=FALSE}
library(foreach)
doParallel::registerDoParallel(cores=NULL)
requireNamespace("foreach",quietly = TRUE)
```
## AMPUTATION
```{r, eval=FALSE}
MV1pct.NA.data <- foreach::foreach(iforeach = norm.200.m100.sd1.vs.m200.sd1.list,
.errorhandling = 'stop', .verbose = T) %dopar%
MVgen(dataset = iforeach[,-1], prop_NA = 0.01)
```
## IMPUTATION
```{r, eval=FALSE}
MV1pct.impMLE <- foreach::foreach(iforeach = MV1pct.NA.data,
.errorhandling = 'stop', .verbose = F) %dopar%
multi.impute(data = iforeach, conditions = metadata$Condition,
method = "MLE", parallel = F)
```
## ESTIMATION
```{r, eval=FALSE}
MV1pct.impMLE.VarRubin.Mat <- lapply(1:length(MV1pct.impMLE), function(index){
print(paste(Sys.time(), "Dataset", index, "out of", length(MV1pct.impMLE)))
rubin2.all(data = MV1pct.impMLE[[index]], metacond = metadata$Condition)
})
```
## PROJECTION
```{r, eval=FALSE}
MV1pct.impMLE.VarRubin.S2 <- lapply(1:length(MV1pct.impMLE.VarRubin.Mat), function(id.dataset){
print(paste("Dataset", id.dataset, "out of",length(MV1pct.impMLE.VarRubin.Mat), Sys.time()))
as.numeric(lapply(MV1pct.impMLE.VarRubin.Mat[[id.dataset]], function(aaa){
DesMat = mi4p::make.design(metadata)
return(max(diag(aaa)%*%t(DesMat)%*%DesMat))
}))
})
```
## MODERATED T-TEST
```{r, eval=FALSE}
MV1pct.impMLE.mi4limma.res <- foreach(iforeach = 1:100, .errorhandling = 'stop', .verbose = T) %dopar%
mi4limma(qData = apply(MV1pct.impMLE[[iforeach]],1:2,mean),
sTab = metadata,
VarRubin = sqrt(MV1pct.impMLE.VarRubin.S2[[iforeach]]))
MV1pct.impMLE.dapar.res <- foreach(iforeach = 1:100, .errorhandling = 'stop', .verbose = T) %dopar%
limmaCompleteTest.mod(qData = apply(MV1pct.impMLE[[iforeach]],1:2,mean),
sTab = metadata)
```