-
Notifications
You must be signed in to change notification settings - Fork 6
/
align_visualize_quantify.html
453 lines (414 loc) · 16.4 KB
/
align_visualize_quantify.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="generator" content="AsciiDoc 8.2.6" />
<style type="text/css">
/* Debug borders */
p, li, dt, dd, div, pre, h1, h2, h3, h4, h5, h6 {
/*
border: 1px solid red;
*/
}
body {
margin: 1em 5% 1em 5%;
}
a {
color: blue;
text-decoration: underline;
}
a:visited {
color: fuchsia;
}
em {
font-style: italic;
color: navy;
}
strong {
font-weight: bold;
color: #083194;
}
tt {
color: navy;
}
h1, h2, h3, h4, h5, h6 {
color: #527bbd;
font-family: sans-serif;
margin-top: 1.2em;
margin-bottom: 0.5em;
line-height: 1.3;
}
h1, h2, h3 {
border-bottom: 2px solid silver;
}
h2 {
padding-top: 0.5em;
}
h3 {
float: left;
}
h3 + * {
clear: left;
}
div.sectionbody {
font-family: serif;
margin-left: 0;
}
hr {
border: 1px solid silver;
}
p {
margin-top: 0.5em;
margin-bottom: 0.5em;
}
ul, ol, li > p {
margin-top: 0;
}
pre {
padding: 0;
margin: 0;
}
span#author {
color: #527bbd;
font-family: sans-serif;
font-weight: bold;
font-size: 1.1em;
}
span#email {
}
span#revision {
font-family: sans-serif;
}
div#footer {
font-family: sans-serif;
font-size: small;
border-top: 2px solid silver;
padding-top: 0.5em;
margin-top: 4.0em;
}
div#footer-text {
float: left;
padding-bottom: 0.5em;
}
div#footer-badges {
float: right;
padding-bottom: 0.5em;
}
div#preamble,
div.tableblock, div.imageblock, div.exampleblock, div.verseblock,
div.quoteblock, div.literalblock, div.listingblock, div.sidebarblock,
div.admonitionblock {
margin-right: 10%;
margin-top: 1.5em;
margin-bottom: 1.5em;
}
div.admonitionblock {
margin-top: 2.5em;
margin-bottom: 2.5em;
}
div.content { /* Block element content. */
padding: 0;
}
/* Block element titles. */
div.title, caption.title {
color: #527bbd;
font-family: sans-serif;
font-weight: bold;
text-align: left;
margin-top: 1.0em;
margin-bottom: 0.5em;
}
div.title + * {
margin-top: 0;
}
td div.title:first-child {
margin-top: 0.0em;
}
div.content div.title:first-child {
margin-top: 0.0em;
}
div.content + div.title {
margin-top: 0.0em;
}
div.sidebarblock > div.content {
background: #ffffee;
border: 1px solid silver;
padding: 0.5em;
}
div.listingblock {
margin-right: 0%;
}
div.listingblock > div.content {
border: 1px solid silver;
background: #f4f4f4;
padding: 0.5em;
}
div.quoteblock > div.content {
padding-left: 2.0em;
}
div.attribution {
text-align: right;
}
div.verseblock + div.attribution {
text-align: left;
}
div.admonitionblock .icon {
vertical-align: top;
font-size: 1.1em;
font-weight: bold;
text-decoration: underline;
color: #527bbd;
padding-right: 0.5em;
}
div.admonitionblock td.content {
padding-left: 0.5em;
border-left: 2px solid silver;
}
div.exampleblock > div.content {
border-left: 2px solid silver;
padding: 0.5em;
}
div.verseblock div.content {
white-space: pre;
}
div.imageblock div.content { padding-left: 0; }
div.imageblock img { border: 1px solid silver; }
span.image img { border-style: none; }
dl {
margin-top: 0.8em;
margin-bottom: 0.8em;
}
dt {
margin-top: 0.5em;
margin-bottom: 0;
font-style: normal;
}
dd > *:first-child {
margin-top: 0.1em;
}
ul, ol {
list-style-position: outside;
}
div.olist > ol {
list-style-type: decimal;
}
div.olist2 > ol {
list-style-type: lower-alpha;
}
div.tableblock > table {
border: 3px solid #527bbd;
}
thead {
font-family: sans-serif;
font-weight: bold;
}
tfoot {
font-weight: bold;
}
div.hlist {
margin-top: 0.8em;
margin-bottom: 0.8em;
}
div.hlist td {
padding-bottom: 15px;
}
td.hlist1 {
vertical-align: top;
font-style: normal;
padding-right: 0.8em;
}
td.hlist2 {
vertical-align: top;
}
@media print {
div#footer-badges { display: none; }
}
div#toctitle {
color: #527bbd;
font-family: sans-serif;
font-size: 1.1em;
font-weight: bold;
margin-top: 1.0em;
margin-bottom: 0.1em;
}
div.toclevel1, div.toclevel2, div.toclevel3, div.toclevel4 {
margin-top: 0;
margin-bottom: 0;
}
div.toclevel2 {
margin-left: 2em;
font-size: 0.9em;
}
div.toclevel3 {
margin-left: 4em;
font-size: 0.9em;
}
div.toclevel4 {
margin-left: 6em;
font-size: 0.9em;
}
/* Workarounds for IE6's broken and incomplete CSS2. */
div.sidebar-content {
background: #ffffee;
border: 1px solid silver;
padding: 0.5em;
}
div.sidebar-title, div.image-title {
color: #527bbd;
font-family: sans-serif;
font-weight: bold;
margin-top: 0.0em;
margin-bottom: 0.5em;
}
div.listingblock div.content {
border: 1px solid silver;
background: #f4f4f4;
padding: 0.5em;
}
div.quoteblock-content {
padding-left: 2.0em;
}
div.exampleblock-content {
border-left: 2px solid silver;
padding-left: 0.5em;
}
/* IE6 sets dynamically generated links as visited. */
div#toc a:visited { color: blue; }
/* Because IE6 child selector is broken. */
div.olist2 ol {
list-style-type: lower-alpha;
}
div.olist2 div.olist ol {
list-style-type: decimal;
}
</style>
<title>Trinity: Supported Processes for Read Alignment, Visualization, and Abundance Estimation</title>
</head>
<body>
<div id="header">
<h1>Trinity: Supported Processes for Read Alignment, Visualization, and Abundance Estimation</h1>
</div>
<div id="preamble">
<div class="sectionbody">
<div class="admonitionblock">
<table><tr>
<td class="icon">
<div class="title">Note</div>
</td>
<td class="content">Processes described below are under rapid development, subject to change, and should be pursued with caution.</td>
</tr></table>
</div>
<div class="para"><p>The following describes methods available, mostly through third-party tools, for generating alignments of RNA-Seq reads to the Trinity transcripts, visualizing the read mappings and coverage information, and estimating abundance values of transcripts (a prerequisite for <a href="diff_expression_analysis.html">studies of differential expression</a>).</p></div>
</div>
</div>
<h2 id="_read_alignment">Read Alignment</h2>
<div class="sectionbody">
<div class="para"><p>Align reads to the Trinity transcripts using the <em>util/alignReads.pl</em> script, which can leverage Bowtie, BLAT, or BWA as the aligner. In the case of BLAT and Bowtie, you must install these tools separately. A specially modified version of BWA is included in the latest Trinity distribution which reports multiply-mapped reads at each of their locations, rather than a single random placement.</p></div>
<div class="para"><p>Caution should be taken in using this wrapper and the modified tools, because there are advantages and disadvantages to each, as described below:</p></div>
<div class="ilist"><ul>
<li>
<p>
Bowtie: Abundance estimation using RSEM (as described below) currently leverages Bowtie gap-free alignments. Running bowtie (original, not the newer bowtie 2…still investigating) with paired fragment reads will exclude alignments where only one of the mate pairs aligns. Since Trinity doesn't perform scaffolding across sequencing gaps yet, there will be cases (moreso in fragmented transcripts corresponding to lowly expressed transcripts) where only one of the mate-pairs aligns. The alignReads.pl script operates similarly to TopHat in that it runs Bowtie to align each of the paired fragment reads separately, and then groups them into pairs afterwards. We capture both the paired and the unpaired fragment read alignments from Bowtie for visualization and examining read support for the transcript assemblies. The properly-mapped pairs are further extracted and can be used as a substrate for RSEM-based abundance estimation (see below).
</p>
</li>
<li>
<p>
BLAT: we've found BLAT to be particularly useful in generating spliced short-read alignments to targets where short introns exist, but GSNAP is superior. We include BLAT here only for exploratory purposes.
</p>
</li>
<li>
<p>
BWA: the modified version of BWA provides SAM entries for each of the multiply mapped reads alternative mappings, but grouping of pairs is performed by the alignReads.pl script, and the total number of alignments reported tends to be substantially less than running the latest version of BWA in paired mode without having the multiply mapped individual reads. BWA is recommended specifically for SNP-calling exercises, and we're continuing to explore the various options available, including further tweaks here.
</p>
</li>
</ul></div>
<div class="para"><p>Our <em>standard</em> practice for aligning reads to the Trinity transcripts for downstream analyses, including visualization and abundance estimation, is to run alignReads.pl with the —bowtie option, like so:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>TRINITY_RNASEQ_ROOT/util/alignReads.pl --left left.fq --right right.fq --seqType fq --target Trinity.fasta --aligner bowtie</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>(if your data are strand-specific, be sure to set --SS_lib_type as done with Trinity.pl)</tt></pre>
</div></div>
<div class="para"><p>Note, this alignment process generates lots of output files, ex. for paired strand-specific data:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>-rw-rw-r-- 1 bhaas broad 33644239 Oct 29 09:51 bowtie_out.coordSorted.sam
-rw-rw-r-- 1 bhaas broad 5761928 Oct 29 09:51 bowtie_out.coordSorted.bam
-rw-rw-r-- 1 bhaas broad 33644239 Oct 29 09:51 bowtie_out.nameSorted.sam
-rw-rw-r-- 1 bhaas broad 4256476 Oct 29 09:51 bowtie_out.nameSorted.bam
-rw-rw-r-- 1 bhaas broad 4416 Oct 29 09:51 bowtie_out.coordSorted.bam.bai
-rw-rw-r-- 1 bhaas broad 33634652 Oct 29 09:51 bowtie_out.coordSorted.sam.+.sam
-rw-rw-r-- 1 bhaas broad 9587 Oct 29 09:51 bowtie_out.coordSorted.sam.-.sam
-rw-rw-r-- 1 bhaas broad 33634652 Oct 29 09:51 bowtie_out.nameSorted.sam.+.sam
-rw-rw-r-- 1 bhaas broad 9587 Oct 29 09:51 bowtie_out.nameSorted.sam.-.sam
-rw-rw-r-- 1 bhaas broad 5759999 Oct 29 09:51 bowtie_out.coordSorted.sam.+.bam
-rw-rw-r-- 1 bhaas broad 4416 Oct 29 09:51 bowtie_out.coordSorted.sam.+.bam.bai
-rw-rw-r-- 1 bhaas broad 1836 Oct 29 09:51 bowtie_out.coordSorted.sam.-.bam
-rw-rw-r-- 1 bhaas broad 1680 Oct 29 09:51 bowtie_out.coordSorted.sam.-.bam.bai
-rw-rw-r-- 1 bhaas broad 4255371 Oct 29 09:51 bowtie_out.nameSorted.sam.+.bam
-rw-rw-r-- 1 bhaas broad 1843 Oct 29 09:51 bowtie_out.nameSorted.sam.-.bam
-rw-rw-r-- 1 bhaas broad 3880361 Oct 29 09:51 bowtie_out.nameSorted.sam.+.sam.PropMapPairsForRSEM.bam</tt></pre>
</div></div>
<div class="para"><p>If you do not have strand-specific reads, then you'll not have the (+) and (-) versions of the files as above.</p></div>
<div class="para"><p>The bowtie_out.coordSorted.bam file contains both properly-mapped pairs and single unpaired fragment reads. This file can be used for visualizing the alignments and coverage data using IGV (below).</p></div>
<div class="para"><p>The *nameSorted*PropMapPairsForRsem.bam contains only the properly-mapped pairs for use with the RSEM software (see below). If you have single-stranded RNA-Seq data, then use the bowtie_out.nameSorted.bam file directly (or strand-specific version).</p></div>
</div>
<h2 id="Visualization">Visualization</h2>
<div class="sectionbody">
<div class="para"><p>The Trinity Transcripts and read alignments can be visualized using the <a href="http://www.broadinstitute.org/igv/">Integrated Genomics Viewer</a>.</p></div>
<div class="para"><p>Just import the Trinity.fasta file as a <em>genome</em>, and load up the bam file containing the aligned reads. A screenshot below shows how the data are displayed:</p></div>
<div class="para"><p><span class="image">
<img src="../images/IGV_Trinity_screenshot.png" alt="Trinity_in_IGV" title="Trinity_in_IGV" />
</span></p></div>
</div>
<h2 id="RSEM">Abundance Estimation Using RSEM</h2>
<div class="sectionbody">
<div class="para"><p>We've found RSEM to be enormously useful for abundance estimation in the context of transcriptome assemblies.</p></div>
<div class="para"><p>Note, for the most accurate abundance estimation, we recommend that you obtain the latest RSEM software from <a href="http://deweylab.biostat.wisc.edu/rsem/">the Dewey lab</a> and have it do all the work, including running bowtie to generate alignments, and the downstream quantitation.</p></div>
<div class="para"><p>However, we currently include a slightly modified version of RSEM with Trinity that will leverage the alignments generated by util/AlignReads.pl with Bowtie as described above. Why do this? Mostly for pragmatic reasons: RSEM, by design, prefers to run bowtie with liberal alignment settings, whereas we are interested in visualizing and studying the higher quality alignments. Also, we prefer generating a single set of alignments that can be leveraged for both visualization and abundance estimation. This whole process is likely to change in the near future so that a single set of alignments can be generated optimally for both purposes, leveraging the most current version of RSEM and applying a set of alignment filters to extract those that are most useful for visualization (Work in progress).</p></div>
<div class="para"><p>To run the included version of RSEM, execute the following:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>TRINITY_RNASEQ_ROOT/util/RSEM_util/run_RSEM.pl --transcripts Trinity.fasta --name_sorted_bam bowtie_out.nameSorted.sam.+.sam.PropMapPairsForRSEM.bam --paired</tt></pre>
</div></div>
<div class="para"><p>which will create output files:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>RSEM.isoforms.results : EM read counts per Trinity transcript
RSEM.genes.results : EM read counts on a per-Trinity-component (aka... gene) basis, 'gene' used loosely here.</tt></pre>
</div></div>
<div class="para"><p>You can compute FPKM values and examine transcripts in the context of the percent expressed within a given component (<em>gene</em>) by running:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>TRINITY_RNASEQ_ROOT/util/RSEM_util/summarize_RSEM_fpkm.pl --transcripts trinity_out_dir/Trinity.fasta --RSEM RSEM.isoforms.results --fragment_length 300 --group_by_component | tee Trinity.RSEM.fpkm</tt></pre>
</div></div>
<div class="para"><p>The output looks like follows:</p></div>
<div class="literalblock">
<div class="content">
<pre><tt>#Total fragments mapped to transcriptome: 24114.01
transcript length eff_length count fraction fpkm %comp_fpkm
comp20_c0_seq1 349 50 3.00 5.67e-03 2488.18 100.00</tt></pre>
</div></div>
<div class="literalblock">
<div class="content">
<pre><tt>comp11_c0_seq1 5495 5196 977.15 2.47e-02 7798.71 76.86
comp11_c0_seq2 5457 5158 0.00 5.22e-62 0.00 0.00
comp11_c0_seq3 5436 5137 290.85 7.45e-03 2347.96 23.14
comp11_c0_seq4 5398 5099 0.00 2.85e-63 0.00 0.00</tt></pre>
</div></div>
<div class="para"><p>If you want to filter out the likely transcript artifacts and lowly expressed transcripts, you might consider retaining only those that represent at least 1% of the per-component expression level. Because Trinity transcripts are not currently scaffolded across sequencing gaps, there will be cases where smaller transcript fragments may lack enough properly-paired read support to show up as <em>expressed</em>, but are still otherwise supported by the read data. Therefore, filter cautiously and we don't recommend discarding such lowly expressed (or seemingly unexpressed) transcripts, but rather putting them aside for further study.</p></div>
</div>
<h2 id="_sample_data">Sample Data</h2>
<div class="sectionbody">
<div class="para"><p>Under <em>TRINITY_RNASEQ_ROOT/sample_data/test_Trinity_Assembly</em>, execute <em>runMe.sh 1</em> to build Trinity transcript assemblies using the sample data, and then run through the downstream alignment and abundance estimation steps.</p></div>
</div>
<div id="footer">
<div id="footer-text">
Last updated 2012-03-17 17:14:16 EDT
</div>
</div>
</body>
</html>