---
title: "Data Generation for topdownr"
author:
- name: Pavel V. Shliaha
  affiliation: Department of Biochemistry and Molecular Biology,
    University of Southern Denmark, Denmark.
- name: Sebastian Gibb
  affiliation: Department of Anesthesiology and Intensive Care,
    University Medicine Greifswald, Germany.
- name: Ole Nørregaard Jensen
  affiliation: Department of Biochemistry and Molecular Biology,
    University of Southern Denmark, Denmark.
package: topdownr
abstract: >
  This vignette describes the setup and the data preparation needed to create
  the input files for the analysis with the functionality of the `topdownr`
  package.
output:
  BiocStyle::html_document:
    toc_float: TRUE
    tidy: TRUE
bibliography: topdownr.bib
vignette: >
  %\VignetteIndexEntry{Data Generation for topdownr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteKeywords{Mass Spectrometry, Proteomics, Bioinformatics}
  %\VignetteEncoding{UTF-8}
  %\VignettePackage{topdownr}
---
```{r environment, echo=FALSE, message=FALSE, warning=FALSE}
library("topdownr")
library("BiocStyle")
```
# Foreword {-}
`r BiocStyle::Biocpkg("topdownr")` is free and
open-source software. If you use it, please support the project by
citing it in publications:
```{r citation, echo=FALSE, results="asis"}
ct <- format(citation("topdownr"), "textVersion")
cat(gsub("DOI: *(.*)$", "DOI: [\\1](https://doi.org/\\1)", ct), "\n")
```
# Questions and bugs {-}
For bugs, typos, suggestions or other questions, please file an issue
in our tracking system (https://github.com/sgibb/topdownr/issues)
providing as much information as possible, a reproducible example and
the output of `sessionInfo()`.
If you don't have a GitHub account or wish to reach a broader audience
for general questions about proteomics analysis using R, you may want
to use the Bioconductor support site: https://support.bioconductor.org/.
# Introduction
## The `topdownr` Data Generation Workflow
<embed src="images/workflow/data-generation.mmd.svg" type="image/svg+xml" />
# Installation of Additional Software
## Setup the Thermo Software
To create methods, the user first has to install and modify the Orbitrap Fusion
LUMOS workstation:

1. Request `TribridSeriesWorkstationSetup-v3.2.exe` from Thermo Scientific.
2. Install the workstation by running `TribridSeriesWorkstationSetup-v3.2.exe`.
## Setup XMLMethodChanger
*XMLMethodChanger* is needed to convert the XML methods into `.meth` files. It
can be found at https://github.com/thermofisherlsms/meth-modifications

The user has to download and compile it themselves (or request it from Thermo
Scientific as well). You need at least the *3.2 beta* version.
## Setup Operating System
In order to use *XMLMethodChanger* the operating system has to use the `.` (dot)
as decimal mark and the `,` (comma) as digit group separator (one thousand point
two should be formatted as `1,000.2`).
In *Windows 7* the settings are located at
`Windows Control Panel > Region and Language > Formats`.
Choose *English (USA)* here or use the *Additional settings* button to change it
manually.
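
As a quick sanity check, the separators can also be inspected from within `R`.
The following is a minimal sketch and not part of `topdownr`:
`hasUsNumberFormat` is a hypothetical helper, and reading the per-user settings
from the registry values `sDecimal` and `sThousand` of the
`Control Panel\International` key is an assumption about the Windows
configuration.

```{r checkSeparators, eval=FALSE}
## Hypothetical helper (not part of topdownr): TRUE if the decimal mark is
## a dot and the digit group separator is a comma.
hasUsNumberFormat <- function(decimalMark, groupSeparator) {
    identical(decimalMark, ".") && identical(groupSeparator, ",")
}

## Assumption: on Windows the per-user regional settings are stored in the
## registry values sDecimal and sThousand of this key.
if (.Platform$OS.type == "windows") {
    intl <- utils::readRegistry("Control Panel\\International", hive="HCU")
    hasUsNumberFormat(intl$sDecimal, intl$sThousand)
}

## The number format XMLMethodChanger expects, reproduced in R:
format(1000.2, big.mark=",")
```

If the helper returns `FALSE`, the format should be changed as described above
before running *XMLMethodChanger*.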
## Setup ScanHeadsman
After data acquisition `topdownr` needs the header information from the
`.raw` files.
Therefore the *ScanHeadsman* software is used. It can be
downloaded from https://bitbucket.org/caetera/scanheadsman

It requires Microsoft **.NET 4.5** or later (it is often preinstalled on a
typical modern Windows installation or can be found in Microsoft's Download
Center, e.g. https://www.microsoft.com/en-us/download/details.aspx?id=30653).

Additionally you need Thermo's *MS File Reader*, which can be downloaded
free of charge (registration required) from the Thermo FlexNet website:
https://thermo.flexnetoperations.com/

*ScanHeadsman* was created by Vladimir Gorshkov <vgor@bmb.sdu.dk>.
# Creating Methods
Importantly, *XMLMethodChanger* does not create methods *de novo*, but modifies
pre-existing methods (supplied with *XMLMethodChanger*) using modifications
described in XML files. Thus the whole process of creating user-specified
methods consists of two parts:

1. Construction of XML files with all possible combinations of fragmentation
   parameters (see `topdownr::createExperimentsFragmentOptimisation`
   and `topdownr::writeMethodXmls` below).
2. Submitting the constructed XML files together with a template
   `.meth` file to *XMLMethodChanger*.

We choose to use targeted MS2 scans (TMS2) as a way to store the
fragmentation parameters.
Each TMS2 is stored in a separate experiment. Experiments do not overlap.
![Method Editor](images/methodeditor-exp-tms.png)
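
To illustrate what "all possible combinations" means, the sketch below builds a
full factorial table with base `R`'s `expand.grid`. It is only an illustration
of the idea and makes no claim about how `topdownr` builds its experiments
internally; the column names and values are borrowed from the CID settings used
in the next section.

```{r combinationSketch, eval=FALSE}
## Full factorial combination of a few fragmentation parameters.
conditions <- expand.grid(
    TargetMz=c(560.6, 700.5, 933.7),
    AgcTarget=c(1e5, 5e5, 1e6),
    CIDCollisionEnergy=seq(7, 35, 7)
)

## 3 target m/z values x 3 AGC targets x 5 collision energies
nrow(conditions)
head(conditions)
```

Every additional parameter multiplies the number of conditions, so the number
of TMS2 experiments grows quickly with the number of values per parameter.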
# Data preparation with `topdownr`
Shown below is the process of creating XML files and using them to modify the
*TMS2IndependentTemplateForTD.meth* template file.
```{r writeMethodXml, eval=FALSE}
library("topdownr")

## Create MS1 settings
ms1 <- expandMs1Conditions(
    FirstMass=400,
    LastMass=1200,
    Microscans=as.integer(10)
)

## Set TargetMass
targetMz <- cbind(mz=c(560.6, 700.5, 933.7), z=rep(1, 3))

## Set common settings
common <- list(
    OrbitrapResolution="R120K",
    IsolationWindow=1,
    MaxITTimeInMS=200,
    Microscans=as.integer(40),
    AgcTarget=c(1e5, 5e5, 1e6)
)

## Create settings for different fragmentation conditions
cid <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="CID",
    CIDCollisionEnergy=seq(7, 35, 7)
)
hcd <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="HCD",
    HCDCollisionEnergy=seq(7, 35, 7)
)
etd <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="ETD",
    ETDReactionTime=as.double(1:2)
)
etcid <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="ETD",
    ETDReactionTime=as.double(1:2),
    ETDSupplementalActivation="ETciD",
    ETDSupplementalActivationEnergy=as.double(1:2)
)
uvpd <- expandTms2Conditions(
    MassList=targetMz,
    common,
    ActivationType="UVPD"
)

## Create experiments with all combinations of the above settings
## for fragment optimisation
exps <- createExperimentsFragmentOptimisation(
    ms1=ms1, cid, hcd, etd, etcid, uvpd,
    groupBy=c("AgcTarget", "replication"), nMs2perMs1=10, scanDuration=0.5,
    replications=2, randomise=TRUE
)

## Write experiments to xml files
writeMethodXmls(exps=exps)

## Run XMLMethodChanger
runXmlMethodChanger(
    modificationXml=list.files(pattern="^method.*\\.xml$"),
    templateMeth="TMS2IndependentTemplateForTD.meth",
    executable="path\\to\\XmlMethodChanger.exe"
)
```
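
After *XMLMethodChanger* has finished it can be useful to verify that every
XML file resulted in a method file. The sketch below uses only base `R`;
`missingMethFiles` is a hypothetical helper, and it assumes that each `.meth`
file is written next to its XML file with the same base name.

```{r checkMethFiles, eval=FALSE}
## Hypothetical helper (not part of topdownr): returns the XML files for
## which no .meth file with the same base name exists.
## Assumption: "method-x.xml" is converted into "method-x.meth".
missingMethFiles <- function(xmlFiles) {
    methFiles <- sub("\\.xml$", ".meth", xmlFiles)
    xmlFiles[!file.exists(methFiles)]
}

xmlFiles <- list.files(pattern="^method.*\\.xml$")
missingMethFiles(xmlFiles)
```

An empty result means that a `.meth` file was found for every XML file.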
# Data Acquisition
After setting up direct infusion, make sure that the MS1 spectrum yields the
expected protein mass after deconvolution by *Xtract*.
Shown below is a deconvoluted MS1 spectrum for myoglobin.
The dominant mass corresponds to myoglobin with the initiator Met removed.
![Xtract myoglobin](images/xtract-myo.png)
# Data Preparation
Prior to `R` analysis of protein fragmentation data we have to convert the
`.raw` files.
## Extracting Header Information
Some of the information
(SpectrumId, Ion Injection Time (ms), Orbitrap Resolution, targeted Mz,
ETD reaction time, CID activation and HCD activation) is stored in the scan
headers, while other information (ETD reagent target and AGC target) is only
available in the method table.

You can run *ScanHeadsman* from the command line
(`ScanHeadsman.exe --noMS --methods:CSV`) or use the function provided by
`topdownr`:

```{r ScanHeadsman, eval=FALSE}
runScanHeadsman(
    path="path\\to\\raw-files",
    executable="path\\to\\ScanHeadsman.exe"
)
```
*ScanHeadsman* will generate a `.txt` (scan header table) and a `.csv` (method
table) file for each `.raw` file.
## Convert .raw files into mzML
The spectra have to be charge state deconvoluted with the *Xtract* node in
*Proteome Discoverer 2.1*. The software returns the deconvoluted spectra in
mzML format.
![Proteome Discoverer](images/proteomediscoverer.png)
Once a `.csv`, a `.txt` and a `.mzML` file have been produced for each `.raw`
file, we can start the analysis using `topdownr`.
Please see the *analysis* vignette (`vignette("analysis", package="topdownr")`)
for an example.
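
Before starting the analysis it may help to confirm that the set of input files
is complete. The following base `R` sketch is not part of `topdownr`:
`checkInputFiles` is a hypothetical helper, and it assumes that all derived
files sit in the same directory as the `.raw` files and are named after the
`.raw` base name plus a suffix. The suffixes are an argument because the actual
file names depend on the naming conventions of the conversion tools.

```{r checkInputFiles, eval=FALSE}
## Hypothetical helper (not part of topdownr): for every .raw file in `path`
## report which of the derived files exist.
## Assumption: derived files are named <raw base name><suffix>.
checkInputFiles <- function(path, suffixes=c(".txt", ".csv", ".mzML")) {
    raw <- list.files(path, pattern="\\.raw$")
    base <- sub("\\.raw$", "", raw)
    ## one row per .raw file, one column per expected suffix
    files <- outer(base, suffixes, paste0)
    present <- file.exists(file.path(path, files))
    dim(present) <- dim(files)
    dimnames(present) <- list(raw, suffixes)
    present
}

checkInputFiles("path\\to\\raw-files")
```

Rows containing `FALSE` point to `.raw` files for which a conversion step is
still missing.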
# Session Info
```{r sessioninfo}
sessionInfo()
```
# References