___
# Jovian analysis report
___

### Instructions:  
**- You can generate the report by pressing `Cell` in the toolbar and then selecting `Run All`. This can take a couple of minutes (depending on the size of your dataset). As long as the circle in the top-right corner is filled (i.e. not empty), the data is still being loaded.**  
<br>
**- If you want to show or hide the programming code, press the button with the eye in it on the toolbar above.**  
<br>
**- N.B. The total number of reads in this report will not add up to the total number of reads that were supplied as input. This is because 1) human reads are removed and 2) PCR-duplicate reads are removed.**  
<br>
You can view the graph displaying all of Jovian's separate processes by clicking [here](https://github.com/DennisSchmitz/Jovian/blob/assets/images/rulegraph_Jovian.png?raw=true).
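
The note on read totals above amounts to simple bookkeeping. A minimal sketch, assuming the two removed groups are counted separately; the function name and all numbers are hypothetical and are not taken from Jovian's code or output:

```python
def reads_in_report(input_reads: int, human_reads: int, pcr_duplicates: int) -> int:
    """Return the reads left after removing human and PCR-duplicate reads."""
    remaining = input_reads - human_reads - pcr_duplicates
    if remaining < 0:
        raise ValueError("removed reads exceed the number of input reads")
    return remaining


# Hypothetical example values, for illustration only.
print(reads_in_report(input_reads=1_000_000, human_reads=150_000, pcr_duplicates=50_000))
```

The difference between the input total and the total shown in this report should therefore correspond to the reads removed in these two steps.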

In [None]:
%%bash
bin/scripts/set_symlink.sh

In [None]:
######################################
# Required packages for this script  #
######################################
import pandas as pd
import qgrid
import glob
import os

grid_options = {
    'fullWidthRows': True,
    'syncColumnCellResize': True,
    'forceFitColumns': False,
    'defaultColumnWidth': 100,
    'rowHeight': 23,
    'enableColumnReorder': True,
    'enableTextSelectionOnCells': True,
    'editable': True,
    'autoEdit': False,
    'explicitInitialization': True,
    'maxVisibleRows': 20,
    'minVisibleRows': 8,
    'sortable': True,
    'filterable': True,
    'highlightSelectedCell': True,
    'highlightSelectedRow': True
}

___
## Quality control metrics report (MultiQC):
[Open the MultiQC graph in a separate tab by clicking here](results/multiqc.html)  
___

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/multiqc.html" width=100% height=980></iframe>
</div>

### Read-based composition of analyzed samples:
[Open the bar chart in a separate tab by clicking here](results/Sample_composition_graph.html)  
<br>
**Low-quality** reads are those that did not meet the stringency settings as specified in the config file.  
**Unclassified** reads are those that could not be assigned to a taxon.  
**Remaining** reads are those that could not be assembled into contigs longer than the user-specified minimum contig length. 

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/Sample_composition_graph.html" width=100% height=980></iframe>
</div>

___
## Metagenomics:
___


### Interactive metagenomics overview (Krona):
[Open the Krona graph in a separate tab by clicking here](results/krona.html)  

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/krona.html" width=100% height=980></iframe>
</div>

### Heatmaps:


#### Superkingdom heatmap
Open superkingdoms heatmap in a separate tab by clicking [here](results/heatmaps/Superkingdoms_heatmap.html).  

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/heatmaps/Superkingdoms_heatmap.html" width=100% height=700></iframe>
</div>

#### Virus heatmaps
Open virus heatmap in a separate tab by clicking [here](results/heatmaps/Virus_heatmap.html).  

**Please note, many viruses have no "`order`" taxonomic rank so always check the "`family`" taxonomic rank.**
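
The rank fallback recommended in the note above can be sketched as follows. The record layout, the field names `order` and `family`, and the placeholder taxon names are assumptions for illustration; they are not taken from Jovian's output files:

```python
from typing import Optional


def grouping_rank(record: dict) -> Optional[str]:
    """Return the order if one is assigned, otherwise fall back to the family."""
    return record.get("order") or record.get("family")


# Hypothetical records; the second one has no order assigned.
records = [
    {"order": "Order_A", "family": "Family_A"},
    {"order": None, "family": "Family_B"},
]
for rec in records:
    print(grouping_rank(rec))
```

Grouping on the family whenever the order is missing keeps such viruses visible instead of lumping them together under an empty rank.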

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/heatmaps/Virus_heatmap.html" width=100% height=700></iframe>
</div>

#### Phage heatmaps
Open phage heatmap in a separate tab by clicking [here](results/heatmaps/Phage_heatmap.html).  

**Please note, many viruses have no "`order`" taxonomic rank so always check the "`family`" taxonomic rank.**

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/heatmaps/Phage_heatmap.html" width=100% height=700></iframe>
</div>

#### Bacteria heatmaps
Open bacteria heatmap in a separate tab by clicking [here](results/heatmaps/Bacteria_heatmap.html).  

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/heatmaps/Bacteria_heatmap.html" width=100% height=700></iframe>
</div>

### Classified scaffolds:

In [None]:
if os.path.exists("results/all_taxClassified.tsv"):
    ClassifiedScaffolds_df = pd.read_csv("results/all_taxClassified.tsv" , sep = "\t")
else:
    print("The file \"results/all_taxClassified.tsv\" does not exist. Either no scaffolds were classified, or something went wrong; please double-check the log files below:")
    print("\t\"logs/Merge_all_metrics_into_single_tsv_[sample_name].log\"")
    print("\t\"logs/Concat_files.log\"")
    ClassifiedScaffolds_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(ClassifiedScaffolds_df, show_toolbar=False, grid_options=grid_options)

### Unclassified scaffolds ("Dark Matter"):

In [None]:
if os.path.exists("results/all_taxUnclassified.tsv"):
    UnclassifiedScaffolds_df = pd.read_csv("results/all_taxUnclassified.tsv" , sep = "\t")
else:
    print("The file \"results/all_taxUnclassified.tsv\" does not exist. Either no scaffolds were unclassified, or something went wrong; please double-check the log files below:")
    print("\t\"logs/Merge_all_metrics_into_single_tsv_[sample_name].log\"")
    print("\t\"logs/Concat_files.log\"")
    UnclassifiedScaffolds_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(UnclassifiedScaffolds_df, show_toolbar=False, grid_options=grid_options)

___
## Predicted virus hosts:
___

In [None]:
if os.path.exists("results/all_virusHost.tsv"):
    virusHost_df = pd.read_csv("results/all_virusHost.tsv" , sep = "\t")
else:
    print("The file \"results/all_virusHost.tsv\" does not exist. Either no viral scaffolds had host information, or something went wrong; please double-check the log files below:")
    print("\t\"logs/Merge_all_metrics_into_single_tsv_[sample_name].log\"")
    print("\t\"logs/Concat_files.log\"")
    virusHost_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(virusHost_df, show_toolbar=False, grid_options=grid_options)

___
## Virus typing results:
___

The virus typing outputs are currently not generated automatically by Jovian, because doing so overloaded and crashed the web-service. A long-term solution is being worked on; as a work-around, we've included scripts to generate these data separately. However, we kindly ask you to <u>**use this sparingly**</u> so as not to overload and break the web-service. Instructions on how to generate these data can be found on the [website](https://github.com/DennisSchmitz/Jovian), [here](https://github.com/DennisSchmitz/Jovian/issues/29) and [here](https://github.com/DennisSchmitz/Jovian/issues/51).
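
Because the typing outputs are generated separately, each typing-tool cell in this section has to distinguish three situations: the output file has content, it exists but is empty, or it is missing. A minimal sketch of that check, using a hypothetical helper name `typing_output_state` that is not part of Jovian:

```python
import os


def typing_output_state(path: str) -> str:
    """Classify a typing-tool output file as 'results', 'empty' or 'missing'."""
    if not os.path.exists(path):
        return "missing"  # no output file; see the linked instructions to generate it
    if os.path.getsize(path) == 0:
        return "empty"    # file was written, but no matching scaffolds were found
    return "results"      # file has content and can be loaded into a table


print(typing_output_state("results/all_nov-TT.csv"))
```

Checking for existence first avoids the error that `os.path.getsize` raises on a missing file.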


### Norovirus typing tool output:  
[Link to the norovirus typing tool](https://www.rivm.nl/mpf/typingtool/norovirus/)  

In [None]:
if os.path.exists("results/all_nov-TT.csv") and os.path.getsize("results/all_nov-TT.csv") > 0:
    NoV_TT_df = pd.read_csv("results/all_nov-TT.csv" , sep = ",")
elif os.path.exists("results/all_nov-TT.csv") and os.path.getsize("results/all_nov-TT.csv") == 0:
    print("No viral scaffolds with species equal to \"Norwalk virus\" were found in this dataset.")
    NoV_TT_df = pd.DataFrame({'NA' : ["No", "Norwalk virus", "species", "scaffolds", "found"]})
else:
    print("The file \"results/all_nov-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    NoV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(NoV_TT_df, show_toolbar=False, grid_options=grid_options)

### Rotavirus A typing tool output:  
[Link to the Rotavirus A typing tool](https://www.rivm.nl/mpf/typingtool/rotavirusa/)  

In [None]:
if os.path.exists("results/all_rva-TT.csv") and os.path.getsize("results/all_rva-TT.csv") > 0:
    RVA_TT_df = pd.read_csv("results/all_rva-TT.csv" , sep = ",")
elif os.path.exists("results/all_rva-TT.csv") and os.path.getsize("results/all_rva-TT.csv") == 0:
    print("No viral scaffolds with genus equal to \"Rotavirus\" were found in this dataset.")
    RVA_TT_df = pd.DataFrame({'NA' : ["No", "Rotavirus", "genus", "scaffolds", "found"]})
else:
    print("The file \"results/all_rva-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    RVA_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(RVA_TT_df, show_toolbar=False, grid_options=grid_options)

### Enterovirus typing tool output:  
[Link to the enterovirus typing tool](https://www.rivm.nl/mpf/typingtool/enterovirus/)  

In [None]:
if os.path.exists("results/all_ev-TT.csv") and os.path.getsize("results/all_ev-TT.csv") > 0:
    EV_TT_df = pd.read_csv("results/all_ev-TT.csv" , sep = ",")
elif os.path.exists("results/all_ev-TT.csv") and os.path.getsize("results/all_ev-TT.csv") == 0:
    print("No viral scaffolds with family equal to \"Picornaviridae\" were found in this dataset.")
    EV_TT_df = pd.DataFrame({'NA' : ["No", "Picornaviridae", "family", "scaffolds", "found"]})
else:
    print("The file \"results/all_ev-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    EV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(EV_TT_df, show_toolbar=False, grid_options=grid_options)

### Hepatitis A typing tool output:  
[Link to the hepatitis A typing tool](https://www.rivm.nl/mpf/typingtool/hav/)  

In [None]:
if os.path.exists("results/all_hav-TT.csv") and os.path.getsize("results/all_hav-TT.csv") > 0:
    HAV_TT_df = pd.read_csv("results/all_hav-TT.csv" , sep = ",")
elif os.path.exists("results/all_hav-TT.csv") and os.path.getsize("results/all_hav-TT.csv") == 0:
    print("No viral scaffolds with genus equal to \"Hepatovirus\" were found in this dataset.")
    HAV_TT_df = pd.DataFrame({'NA' : ["No", "Hepatovirus", "genus", "scaffolds", "found"]})
else:
    print("The file \"results/all_hav-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    HAV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(HAV_TT_df, show_toolbar=False, grid_options=grid_options)

### Hepatitis E typing tool output:  
[Link to the hepatitis E typing tool](https://www.rivm.nl/mpf/typingtool/hev/)  

In [None]:
if os.path.exists("results/all_hev-TT.csv") and os.path.getsize("results/all_hev-TT.csv") > 0:
    HEV_TT_df = pd.read_csv("results/all_hev-TT.csv" , sep = ",")
elif os.path.exists("results/all_hev-TT.csv") and os.path.getsize("results/all_hev-TT.csv") == 0:
    print("No viral scaffolds with genus equal to \"Orthohepevirus\" were found in this dataset.")
    HEV_TT_df = pd.DataFrame({'NA' : ["No", "Orthohepevirus", "genus", "scaffolds", "found"]})
else:
    print("The file \"results/all_hev-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    HEV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(HEV_TT_df, show_toolbar=False, grid_options=grid_options)

### Human Papillomavirus typing tool output:  
[Link to the HPV typing tool](https://www.rivm.nl/mpf/typingtool/papillomavirus/)  

In [None]:
if os.path.exists("results/all_pv-TT.csv") and os.path.getsize("results/all_pv-TT.csv") > 0:
    PV_TT_df = pd.read_csv("results/all_pv-TT.csv" , sep = ",")
elif os.path.exists("results/all_pv-TT.csv") and os.path.getsize("results/all_pv-TT.csv") == 0:
    print("No viral scaffolds with family equal to \"Papillomaviridae\" were found in this dataset.")
    PV_TT_df = pd.DataFrame({'NA' : ["No", "Papillomaviridae", "family", "scaffolds", "found"]})
else:
    print("The file \"results/all_pv-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    PV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(PV_TT_df, show_toolbar=False, grid_options=grid_options)

### Flavivirus typing tool output:  
[Link to the flavivirus typing tool](https://www.rivm.nl/mpf/typingtool/flavivirus/)  

In [None]:
if os.path.exists("results/all_flavi-TT.csv") and os.path.getsize("results/all_flavi-TT.csv") > 0:
    Flavi_TT_df = pd.read_csv("results/all_flavi-TT.csv" , sep = ",")
elif os.path.exists("results/all_flavi-TT.csv") and os.path.getsize("results/all_flavi-TT.csv") == 0:
    print("No viral scaffolds with family equal to \"Flaviviridae\" were found in this dataset.")
    Flavi_TT_df = pd.DataFrame({'NA' : ["No", "Flaviviridae", "family", "scaffolds", "found"]})
else:
    print("The file \"results/all_flavi-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    Flavi_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(Flavi_TT_df, show_toolbar=False, grid_options=grid_options)

___
## Scaffold viewer:
**Containing: SNPs and minority variants (quasispecies), predicted ORFs, depth of coverage graph, GC contents graph**
___
N.B. Depending on the depth of coverage of the selected contig, the viewer can be <b>(very) slow, or even crash your browser</b>. This is a <b>client-side</b> problem, meaning your computer isn't powerful enough.  

Open the scaffold viewer in a separate tab by clicking [here](results/IGVjs_index.html).  

In [None]:
!bash bin/includes/Start_nginx start

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/igv.html" width=100% height=980></iframe>
</div>

___
## Minority variant table:
___

In [None]:
if os.path.exists("results/all_filtered_SNPs.tsv"):
    filtered_VCF_df = pd.read_csv("results/all_filtered_SNPs.tsv" , sep = "\t")
else:
    print("The file \"results/all_filtered_SNPs.tsv\" does not exist. Either no SNPs were found, possibly because the minimum allele-frequency is set too high, or something went wrong; please double-check the log files below:")
    print("\t\"logs/SNP_calling_[sample_name].log\"")
    print("\t\"logs/Concat_filtered_SNPs.log\"")
    filtered_VCF_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

qgrid.show_grid(filtered_VCF_df, show_toolbar=False, grid_options=grid_options)

___
# Logging and audit-trail: 
___

### Sample sheet

In [None]:
%%bash
cat sample_sheet.yaml

### Snakemake summary statistics
[Open the Snakemake summary statistics in a separate tab by clicking here](results/snakemake_report.html#stats)

In [None]:
%%HTML
<div style="text-align: center">
    <iframe src="results/snakemake_report.html" width=100% height=980></iframe>
</div>

### All log-files:

In [None]:
%%HTML
<script>
function goBack() {
    window.history.back()
}
</script>

<button onclick="goBack()">Click this button to go back</button>

<div style="text-align: center">
    <iframe src="results/logfiles_index.html" width=100% height=980></iframe>
</div>

### Full software list in "Jovian_master" environment:

In [None]:
%%bash
cat results/log_conda.txt

<br>  
### Database versions:

In [None]:
%%bash
cat results/log_db.txt

<br>  
### Unique methodological "fingerprint":

In [None]:
%%bash
cat results/log_git.txt

<br>  
### Snakemake config files:

In [None]:
%%bash
cat results/log_config.txt

___
# Acknowledgements:
___

|Name |Publication|Website|
|:---|:---|:---|
|BBtools|NA|https://jgi.doe.gov/data-and-tools/bbtools/|
|BEDtools|Quinlan, A.R. and I.M.J.B. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. 2010. 26(6): p. 841-842.|https://bedtools.readthedocs.io/en/latest/|
|BLAST|Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997. 25(17): p. 3389-3402.|https://www.ncbi.nlm.nih.gov/books/NBK279690/|
|BWA|Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.|https://github.com/lh3/bwa|
|BioConda|Grüning, B., et al., Bioconda: sustainable and comprehensive software distribution for the life sciences. 2018. 15(7): p. 475.|https://bioconda.github.io/|
|Biopython|Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., ... & De Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422-1423.|https://biopython.org/|
|Bokeh|Bokeh Development Team (2018). Bokeh: Python library for interactive visualization.|https://bokeh.pydata.org/en/latest/|
|Bowtie2|Langmead, B. and S.L.J.N.m. Salzberg, Fast gapped-read alignment with Bowtie 2. 2012. 9(4): p. 357.|http://bowtie-bio.sourceforge.net/bowtie2/index.shtml|
|Conda|NA|https://conda.io/|
|DRMAA|NA|http://drmaa-python.github.io/|
|FastQC|Andrews, S., FastQC: a quality control tool for high throughput sequence data. 2010.|https://www.bioinformatics.babraham.ac.uk/projects/fastqc/|
|gawk|NA|https://www.gnu.org/software/gawk/|
|GNU Parallel|O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.|https://www.gnu.org/software/parallel/|
|Git|NA|https://git-scm.com/|
|igvtools|NA|https://software.broadinstitute.org/software/igv/igvtools|
|Jupyter Notebook|Kluyver, Thomas, et al. "Jupyter Notebooks-a publishing format for reproducible computational workflows." ELPUB. 2016.|https://jupyter.org/|
|Jupyter_contrib_nbextension|NA|https://github.com/ipython-contrib/jupyter_contrib_nbextensions|
|Jupyterthemes|NA|https://github.com/dunovank/jupyter-themes|
|Krona|Ondov, B.D., N.H. Bergman, and A.M. Phillippy, Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 2011. 12: p. 385.|https://github.com/marbl/Krona/wiki|
|Lofreq|Wilm, A., et al., LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. 2012. 40(22): p. 11189-11201.|http://csb5.github.io/lofreq/|
|Minimap2|Li, H., Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 2018.|https://github.com/lh3/minimap2|
|MultiQC|Ewels, P., et al., MultiQC: summarize analysis results for multiple tools and samples in a single report. 2016. 32(19): p. 3047-3048.|https://multiqc.info/|
|Nb_conda|NA|https://github.com/Anaconda-Platform/nb_conda|
|Nb_conda_kernels|NA|https://github.com/Anaconda-Platform/nb_conda_kernels|
|Nginx|NA|https://www.nginx.com/|
|Numpy|Walt, S. V. D., Colbert, S. C., & Varoquaux, G. (2011). The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), 22-30.|http://www.numpy.org/|
|Pandas|McKinney, W. Data structures for statistical computing in python. in Proceedings of the 9th Python in Science Conference. 2010. Austin, TX.|https://pandas.pydata.org/|
|Picard|NA|https://broadinstitute.github.io/picard/|
|Prodigal|Hyatt, D., et al., Prodigal: prokaryotic gene recognition and translation initiation site identification. 2010. 11(1): p. 119.|https://github.com/hyattpd/Prodigal/wiki/Introduction|
|Python|G. van Rossum, Python tutorial, Technical Report CS-R9526, Centrum voor Wiskunde en Informatica (CWI), Amsterdam, May 1995.|https://www.python.org/|
|Qgrid|NA|https://github.com/quantopian/qgrid|
|SAMtools|Li, H., et al., The sequence alignment/map format and SAMtools. 2009. 25(16): p. 2078-2079.|http://www.htslib.org/|
|SPAdes|Nurk, S., et al., metaSPAdes: a new versatile metagenomic assembler. Genome Res, 2017. 27(5): p. 824-834.|http://cab.spbu.ru/software/spades/|
|Seqtk|NA|https://github.com/lh3/seqtk|
|Snakemake|Köster, J. and S.J.B. Rahmann, Snakemake—a scalable bioinformatics workflow engine. 2012. 28(19): p. 2520-2522.|https://snakemake.readthedocs.io/en/stable/|
|Tabix|NA|http://www.htslib.org/doc/tabix.html|
|tree|NA|http://mama.indstate.edu/users/ice/tree/|
|Trimmomatic|Bolger, A.M., M. Lohse, and B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014. 30(15): p. 2114-20.|http://www.usadellab.org/cms/?page=trimmomatic|
|Virus-Host Database|Mihara, T., Nishimura, Y., Shimizu, Y., Nishiyama, H., Yoshikawa, G., Uehara, H., ... & Ogata, H. (2016). Linking virus genomes with host taxonomy. Viruses, 8(3), 66.|http://www.genome.jp/virushostdb/note.html|
|Virus-typing tools|Kroneman, A., Vennema, H., Deforche, K., Avoort, H. V. D., Penaranda, S., Oberste, M. S., ... & Koopmans, M. (2011). An automated genotyping tool for enteroviruses and noroviruses. Journal of Clinical Virology, 51(2), 121-125.|https://www.ncbi.nlm.nih.gov/pubmed/21514213|

#### Authors:
- Dennis Schmitz ([RIVM](https://www.rivm.nl/en) and [EMC](https://www6.erasmusmc.nl/viroscience/))  
- Sam Nooij ([RIVM](https://www.rivm.nl/en) and [EMC](https://www6.erasmusmc.nl/viroscience/))  
- Robert Verhagen ([RIVM](https://www.rivm.nl/en))  
- Thierry Janssens ([RIVM](https://www.rivm.nl/en))  
- Jeroen Cremer ([RIVM](https://www.rivm.nl/en))  
- Florian Zwagemaker ([RIVM](https://www.rivm.nl/en))  
- Mark Kroon ([RIVM](https://www.rivm.nl/en))  
- Erwin van Wieringen ([RIVM](https://www.rivm.nl/en))  
- Harry Vennema ([RIVM](https://www.rivm.nl/en))  
- Annelies Kroneman ([RIVM](https://www.rivm.nl/en))  
- Marion Koopmans ([EMC](https://www6.erasmusmc.nl/viroscience/)) 

___
Jovian is available on [GitHub](https://github.com/DennisSchmitz/Jovian) under an [AGPL license](https://www.gnu.org/licenses/agpl-3.0). The virus-typing tools are public services hosted by the [RIVM](https://www.rivm.nl/en) and developed independently of Jovian.
___