/
mg-workflow-2.Rmd
251 lines (188 loc) · 10.8 KB
/
mg-workflow-2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
---
title: "No 4. Assembly & Annotation Summary"
description: |
In this section of the workflow we summarize: **1**) QC & assembly results. **2**) Kraken short read taxonomy. **3**) Mapping results. **4**) Contig classification.
bibliography: assets/cite.bib
---
<details markdown="1">
<summary>Show setup information.</summary>
```{r setup, message = FALSE, warning = FALSE, results='hide'}
knitr::opts_chunk$set(collapse = TRUE)
pacman::p_load(kableExtra, DT, htmlwidgets, htmltools, magick,
install = FALSE, update = FALSE)
```
</details>
# Data Availability
All files generated in this workflow can be downloaded from figshare.
<iframe src="https://widgets.figshare.com/articles/12808502/embed?show_title=1" width="568" height="301" frameborder="0"></iframe>
File names and descriptions:
- **cc-kraken.html**: Short-read Kraken Krona plots for site Coral Caye (cc).
- **cr-kraken.html**: Short-read Kraken Krona plots for site Cayo Roldan Caye (cr).
- **water_kaiju_mar.html**: Contig Kaiju Krona plots against the mar database.
- **water_kaiju_nr.html**: Contig Kaiju Krona plots against the nr database.
# QC & Contig Stats
## QC Results
The first thing we should do is look at the results of the initial QC step. For each sample, anvi’o spits out individual quality control reports. Thankfully anvi’o also concatenates those files into one table. This table contains information like the number of pairs analyzed, the total pairs passed, etc.
```{r qc_table, echo=FALSE, layout="l-body-outset"}
qc <- read.table("tables/mg-workflow-2/qc-report.txt", sep = "\t", header = TRUE)
datatable(
qc, rownames = FALSE, autoHideNavigation=TRUE, width = "100%",
caption = htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"QC results"),
extensions = c("Buttons", "FixedColumns"),
options = list(
columnDefs = list(list(className = "dt-left", targets = 0)),
dom = "Btir", buttons = c("csv", "copy", I('colvis')),
scrollX = TRUE, scrollCollapse = TRUE, scrollY="300px", paging=FALSE,
fixedColumns = list(leftColumns = 1, rightColumns = 0)))
```
## Assembly Results
Next we can look at the results of the co-assembly, the number of HMM hits, and the estimated number of genomes. These data not only give us a general idea of assembly quality but will also help us decide parameters for automatic clustering down the road.
We can use anvi'o to generate a simple table of contig stats for this assembly.
```bash
anvi-display-contigs-stats 03_CONTIGS/WATER-contigs.db -o 03_CONTIGS/WATER-contig-stats -o contig-stats.txt
```
<br/>
```{r contig_table, echo=FALSE}
contig <- read.table("tables/mg-workflow-2/contig-stats.txt", sep = "\t", header = TRUE)
datatable(
contig, rownames = FALSE, width = "50%", colnames = c("Stat", "value"),
caption = htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Table 8.2: Assembly statistics"),
extensions = "Buttons",
options = list(
columnDefs = list(list(className = "dt-left", targets = 0)),
dom = "Blrtip", buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = TRUE, scrollY=TRUE,
scroller=TRUE, lengthMenu = c(5, 25)))
```
You can download a text version of the table using the table buttons.
# Krona Plots Explained
Next let’s take a look at the taxonomic breakdown from the KrakenUniq classification of short reads. There is a lot of data here, so we decided to present these as standalone HTML pages that contain separate [Krona](https://github.com/marbl/Krona/wiki) [@ondov2011krona] plots for each sample. In brief, a Krona plot allow hierarchical data to be explored with multi-layered pie charts. These charts are interactive. We will use these on a few occasions, so it is worth explaining in a little more detail.
Below is an example of a Kraken taxonomy plot. The plot on the left is the *top level* view and the plot on the right is the *expanded* view. Inner rings represent higher taxonomic ranks.
For example, the two most inner rings are *cellular organisms* and *viruses*. Taxonomic ranks decrease towards the outside. When a group is expanded the inner ring becomes the highest rank. For example, in the plot of the right we expanded the *Proteobacteria* (which is a phylum) so the classes form the inner rings.
:::l-body-outset
<figure>
<a href="figures/mg-workflow-2/krona-demo.png">
<img src="figures/mg-workflow-2/krona-demo.png" class="thumbnail">
</a>
<figcaption><strong>Example of a Krona plot</strong>. The **left panel** controls aspects of the plot. Search by taxon name, select a specific sample, control font size, etc. If you were to click once over a taxa on the left plot, the group is highlighted. Double-click to expand the group as seen on the right. Upper right corner provides a summary of the expanded group. </figcaption>
</figure>
:::
# Short-read Taxonomy
Since the Kraken classification was performed BEFORE the assembly we can look at the Krona plots for each individual sample. Here samples are separated by site.
<div class="row">
<div class="column">
<div class="krona" style="max-width:500px;">
<a href="figures/mg-workflow-2/cr-kraken.html" target="_blank">
<img src="figures/mg-workflow-2/cr-kraken.png" alt="HTML tutorial" style="width:90%">
<div class="container-krona">
<p>CAYO ROLDAN<br/><small>Hypoxic & Normoxic</small> </p>
</div>
</a>
</div>
</div>
<div class="column">
<div class="krona" style="max-width:500px;">
<a href="figures/mg-workflow-2/cc-kraken.html" target="_blank">
<img src="figures/mg-workflow-2/cc-kraken.png" alt="HTML tutorial" style="width:90%">
<div class="container-krona">
<p>CRAWL CAYE<br/><small>Normoxic </small> </p>
</div>
</a>
</div>
</div>
</div>
> Click on an image to explore the diversity plots.
Since these plots are web hosted, I decided to keep only the first 10 ranks from the classification files to make the files smaller*. To do this we can use the following command, which pulls out only the ranks we are interested in.
```bash
cut -f 1-10 WROL_1914-kraken.krona > WROL_1914-kraken-TRIM.krona
```
If we want to create a taxonomic summary table for the samples we can easily do that in anvio by accessing the `layer_additional_data` table from the merged profile database.
```bash
anvi-export-table 06_MERGED/WATER/PROFILE.db --table layer_additional_data -o water-layer_additional_data.txt
```
And then we simply parse out the class data and make a table.
<br/>
```{r class_table, echo=FALSE}
class <- read.table("tables/mg-workflow-2/kraken-class.txt", sep = "\t", header = TRUE)
datatable(
class, rownames = FALSE, autoHideNavigation=TRUE, width = "100%",
caption = htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Table 8.3: Number of reads by Class"),
extensions = "Buttons",
options = list(
columnDefs = list(list(className = "dt-left", targets = 0)),
dom = "Blfrtip", lengthMenu = c(10, 25, 55),
buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = FALSE, scrollY=FALSE,
scroller=TRUE, colReorder=TRUE))
```
You can download a text version of the table using the table buttons.
# Mapping Results
Let’s go ahead and look at the mapping results. This took a little swindling but in the end this crude approach worked. First we needed all the `bowtie.log` files from the `00_LOGS` directory and grab the first line of each file, which has the total number of *individual reads that were paired* (after QC) in each sample.
Just like we did above, we needed to grab the `layer_additional_data` from the merged profile databases. This table contains `total_reads_mapped`, `num_SNVs_reported`, and the taxonomic info from the short-read Kraken annotations.
```bash
anvi-export-table 06_MERGED/WATER/PROFILE.db --table layer_additional_data -o water-layer_additional_data.txt
```
What we want is a table with `sample`, `total_reads`, `total_reads_mapped`, and `num_SNVs_reported`. After a little fancy grep work in [BBEdit](https://www.barebones.com/products/bbedit/) (the free mode) we have a table the looks like this...
<br/>
```{r mapping_table, echo=FALSE}
mapping <- read.table("tables/mg-workflow-2/mapping-results.txt", sep = "\t", header = TRUE)
datatable(
mapping, rownames = FALSE, autoHideNavigation=TRUE, width = "100%",
caption = htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;","Table 8.4: Mapping results "),
extensions = "Buttons",
options = list(
columnDefs = list(list(className = "dt-left", targets = 0)),
dom = "Brti", buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = FALSE, scrollY=FALSE,
scroller=TRUE, colReorder=TRUE))
```
You can download a text version of the table using the table buttons.
# Contig Classification
Now we move on to the classification of contigs from the assembly. This means that we cannot look at individual samples because we co-assembled all of the data. We start with the Kaiju classification, again using the Krona plots. Remember we classified the contigs against both the `nr` and `mar` [databases](https://github.com/bioinformatics-centre/kaiju#creating-the-reference-database-and-index). Use the panels below to access the classifications for the assembly.
<div class="row">
<div class="column">
<div class="krona" style="max-width:500px;">
<a href="figures/mg-workflow-2/water_kaiju_mar.html" target="_blank">
<img src="figures/mg-workflow-2/kaiju-mar.png" alt="HTML tutorial" style="width:90%">
<div class="container-krona">
<p>Kaiju classification<br/><small>mar database</small> </p>
</div>
</a>
</div>
</div>
<div class="column">
<div class="krona" style="max-width:500px;">
<a href="figures/mg-workflow-2/water_kaiju_nr.html" target="_blank">
<img src="figures/mg-workflow-2/kaiju-nr.png" alt="HTML tutorial" style="width:90%">
<div class="container-krona">
<p>Kaiju classification<br/><small>nr database</small> </p>
</div>
</a>
</div>
</div>
</div>
> Click on an image to explore the diversity plots.
We will get into much more of the annotation data later, including the various functional annotations. For now though we have a pretty good idea of the shape and nature of these datasets.
<div class="post-nav">
<div class="post-nav-item">
<div class="meta-nav">Previous</div>
<a href="mg-workflow-1.html" rel="next">N<sup><u>o</u></sup> 3. Assembly & Annotations</a>
</div>
</div>
<div class="post-nav">
<div class="post-nav-item">
<div class="meta-nav">Next</div>
<a href="mg-binning.html" rel="prev">N<sup><u>o</u></sup> 5. Automated Binning</a>
</div>
</div>
## Source Code {.appendix}
The source code for this page can be accessed on GitHub by [clicking this link](https://github.com/hypocolypse/web/blob/master/mg-workflow-2.Rmd).
## Data Availability {.appendix}
Full versions of the Krona plots linked above can be downloaded directly from figshare at [doi:10.25573/data.12808502](https://doi.org/10.25573/data.12808502). Files are in html format and can be opened directly in a web browser.