# Jovian analysis report

In [1]:
######################################
# Required packages for this script  #
######################################
import pandas as pd
import qgrid
import glob
import os
from IPython.display import display, HTML, IFrame

grid_options = {
    'fullWidthRows': True,
    'syncColumnCellResize': True,
    'forceFitColumns': False,
    'defaultColumnWidth': 100,
    'rowHeight': 23,
    'enableColumnReorder': True,
    'enableTextSelectionOnCells': True,
    'editable': True,
    'autoEdit': False,
    'explicitInitialization': True,
    'maxVisibleRows': 20,
    'minVisibleRows': 8,
    'sortable': True,
    'filterable': True,
    'highlightSelectedCell': True,
    'highlightSelectedRow': True
}

## Quality control metrics report (MultiQC):
[Open the MultiQC report in a separate tab by clicking here](results/multiqc.html)  

In [35]:
%%HTML
<div style="text-align: center">
    <iframe src="results/multiqc.html" width=100% height=980></iframe>
</div>
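
The cell above embeds the report unconditionally, so a missing file shows up as an empty frame. A guarded variant, mirroring the `os.path.exists` checks this report uses for its TSV tables, could look like the sketch below. The helper name `report_iframe_html` and its `height` parameter are hypothetical and not part of Jovian.

```python
import os

def report_iframe_html(path, height=980):
    """Return centred iframe markup for `path`, or a short notice if the file is missing.

    Hypothetical helper: it only reproduces the markup used in this report.
    """
    if not os.path.exists(path):
        return f'<p>The file "{path}" does not exist; please check the pipeline logs.</p>'
    return (
        '<div style="text-align: center">\n'
        f'    <iframe src="{path}" width="100%" height="{height}"></iframe>\n'
        '</div>'
    )

# Prints either the iframe markup or the notice, depending on whether the file exists.
print(report_iframe_html("results/multiqc.html"))
```

In a notebook cell the returned string can be rendered with `display(HTML(report_iframe_html("results/multiqc.html")))`, using the `display` and `HTML` names imported in the first cell.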

### Read-based composition of analyzed samples:
[Open the bar chart in a separate tab by clicking here](results/Sample_composition_graph.html)  
<br>
**Low-quality** reads are those that did not meet the stringency settings specified in the config file.  
**Unclassified** reads are those that could not be assigned to a taxon.  
**Remaining** reads are those that could not be assembled into contigs longer than the user-specified minimum contig length. 
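
As a rough illustration of how a read-based composition is derived, the sketch below converts per-category read counts into percentages of the sample total. The counts are made up, and the extra `Classified` category is an assumption added only so that the example sums to a whole sample; neither comes from Jovian output.

```python
def composition_percentages(read_counts):
    """Convert per-category read counts into percentages of the sample total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {category: 100.0 * count / total for category, count in read_counts.items()}

# Hypothetical counts for one sample; not taken from any real run.
example_counts = {
    "Low-quality": 1_000,
    "Unclassified": 500,
    "Remaining": 1_500,
    "Classified": 7_000,  # assumed category, for illustration only
}

percentages = composition_percentages(example_counts)
for category, pct in percentages.items():
    print(f"{category:<13}{pct:6.1f}%")
```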

In [36]:
%%HTML
<div style="text-align: center">
    <iframe src="results/Sample_composition_graph.html" width=100% height=980></iframe>
</div>

## Metagenomics:


### Interactive metagenomics overview (Krona):
[Open the Krona graph in a separate tab by clicking here](results/krona.html)  

In [37]:
%%HTML
<div style="text-align: center">
    <iframe src="results/krona.html" width=100% height=980></iframe>
</div>

### Heatmaps:

#### Superkingdom heatmap
Open the superkingdom heatmap in a separate tab by clicking [here](results/heatmaps/Superkingdoms_heatmap.html).  

In [38]:
%%HTML
<div style="text-align: center">
    <iframe src="results/heatmaps/Superkingdoms_heatmap.html" width=100% height=700></iframe>
</div>

#### Virus heatmap
Open the virus heatmap in a separate tab by clicking [here](results/heatmaps/Virus_heatmap.html).  

**Please note, many viruses have no "`order`" taxonomic rank so always check the "`family`" taxonomic rank.**
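
When working with the tabular results rather than the heatmap, the same advice can be applied programmatically by falling back to `family` whenever `order` is empty. The sketch below uses plain Python and made-up rows; only the column names `family` and `order` come from the classified-scaffolds table, and the helper `best_available_rank` is hypothetical.

```python
import csv
import io

# Hypothetical rows mimicking the rank columns of results/all_taxClassified.tsv.
# An empty field means that rank is missing for the taxon.
example_tsv = (
    "scaffold_name\tfamily\torder\n"
    "scaffold_A\tFamilyOne\tOrderOne\n"
    "scaffold_B\tFamilyTwo\t\n"
    "scaffold_C\t\t\n"
)

def best_available_rank(row, ranks=("order", "family")):
    """Return (rank, name) for the first rank in `ranks` that is filled in, else None."""
    for rank in ranks:
        name = (row.get(rank) or "").strip()
        if name:
            return rank, name
    return None

rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
labels = {row["scaffold_name"]: best_available_rank(row) for row in rows}

for scaffold, label in labels.items():
    print(scaffold, label)
```

Scaffolds for which neither rank is available map to `None`, which makes them easy to list separately.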

In [39]:
%%HTML
<div style="text-align: center">
    <iframe src="results/heatmaps/Virus_heatmap.html" width=100% height=700></iframe>
</div>

#### Phage heatmap
Open the phage heatmap in a separate tab by clicking [here](results/heatmaps/Phage_heatmap.html).  

**Please note, many viruses have no "`order`" taxonomic rank so always check the "`family`" taxonomic rank.**

In [40]:
%%HTML
<div style="text-align: center">
    <iframe src="results/heatmaps/Phage_heatmap.html" width=100% height=700></iframe>
</div>

#### Bacteria heatmap
Open the bacteria heatmap in a separate tab by clicking [here](results/heatmaps/Bacteria_heatmap.html).  

In [41]:
%%HTML
<div style="text-align: center">
    <iframe src="results/heatmaps/Bacteria_heatmap.html" width=100% height=700></iframe>
</div>

### Classified scaffolds:

In [18]:
if os.path.exists("results/all_taxClassified.tsv"):
    ClassifiedScaffolds_df = pd.read_csv("results/all_taxClassified.tsv", sep="\t")
else:
    # Explain why the table is missing and point to the logs of the merge steps.
    print('The file "results/all_taxClassified.tsv" does not exist. Either no scaffolds were classified or something went wrong; please double-check the log files below:')
    print('\t"logs/Merge_all_metrics_into_single_tsv_[sample_name].log"')
    print('\t"logs/Concat_files.log"')
    # Placeholder table so that the display call below still renders something.
    ClassifiedScaffolds_df = pd.DataFrame({'Error': ["Please", "see", "error", "message", "above"]})
display(HTML(ClassifiedScaffolds_df.to_html()))

Unnamed: 0,Sample_name,scaffold_name,taxID,tax_name,Avg._log_e-value,species,genus,family,order,class,phylum,kingdom,superkingdom,Avg_fold,Length,Ref_GC,Nr_ORFs,Covered_percent,Covered_bases,Plus_reads,Minus_reads,Read_GC,Median_fold,Std_Dev,scaffold_seq
0,ERR3482180,NODE_1_length_3152_cov_102.364959,146827.0,Corynebacterium simulans,-450.0,,Corynebacterium,Corynebacteriaceae,Corynebacteriales,Actinobacteria,Actinobacteria,,Bacteri,6526.8141,3152,0.5276,2.0,100.0,3152,120574,106885,0.5352,6570,3318.88,GGACGCAAAGCACAACAAAAATTAATCTCTCACGAAAAACACCAGCCAAAAACCGGTATGTTGTAGTAGTTCGGTGAATTAGTACCAGTCACCTCCACGCCTTACAACGCTTCCAGATCTGGCCTATCAACCCCATAGTCTCTAGGGCACCTAATAACGAAACCTCATCTCAAAACAGGCTTCCCGCTTAGATGCTTTCAGCGGTTATCCCTTCTACACGTAGCCAACCAGCCATGCCACGGGCGTGACAACTGGCACACTAGAGGTGTATCCATCCCGGTCCTCTCGTACTAGGGACAGCCTTTTTCAAGTTTCTACGCGCGCGGCGGATAGAGACCGAACTGTCTCACGACGTTCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCTACTCCAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCATCCCGTCGATATGGACTCTTGGGGAAGATCAGCCTGTTATCCCCGGGGTACCTTTTATCCGTTGAGCGACACCACATCCACAAGTAGGTGCCGGATCACTAGTCCCGACTTTCGTCCCTGCTCGAGCAGTAACTCTCACAGTCAAGCTTCCTTGTGCACTTACACTCAAAACCTGATTGCCAACCAGGCTGAGGAAACCTTTGGGCGCCTCCGTTACTCTTTAGGAGGCAACCGCCCCAGTTAAACTACCCACCAGGCACTGTCCCCAACCCAGATCATGGGCCAAGGTTAAGGTATCCGATCAAGCCAGAGTGGTATTTCAACAACGACTCCACACACACTAGCGTGCATGCCTCACAGTCTCCCACCTATCCTACACAAGCCAAACCGAACACCAATACCAAGCTATAGTGAAGGTCCCGGGGTCTTTTCGTCCTGCCGCGCGTAACGAGCATCTTTACTCGTACTGCAATTTCACCGGGCCTGTGGTTGAGACAGCAGGGGTCTCGTTACGCCATTCGTGCAGGTCGGAACTTACCCGACAAGGAATTTCGCTACCTTAGGATGGTTATAGTTACCACCGCCGTTTACTGGGGCTTAAATTCTCCGCTTCGACCCAAAAGATCTAACAGGTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCAGTCCATATACATCAACTTAACGTTTTCGCATGGACCTGTGTTTTTGATAAACAGTCGGACCCCTCTATTCTCTGCGGCCCCACCAAGCAAAACCCGGTGGGGCTCCCCTTCTCCCGAAGTTACGGGGACATTTTGCCGAGTTCCTTAACCACAGTTCACCCGAACGCCTTAGTATTCTCAACCTGACTACCTGTGTCGGTTTAGGGTACGGGCCGAATATGCACTCGCTAGAGGCTTTTCTCGACAGCATAAGCACACCACATTCACCCAAAAAGGGCTACGCATCACGCCTCAGCAAACAGTGCCCGGATTTACCAATGGCACCAACCTCAACGCTTACACCAACAATCCAATAAGCGGCGTGGCTGCCCCACTGCGTCACCCCATCACTTGGACCACAGATCAGGCCCCACAAACTCAACCCATCAACACACTCAAAGAGCATAATCAATGAGTCTCACACGTGGTTAGTATCACTGCTTTACCATTGACGCACACACTCGGGTACCAGAATATCAACTGGTTATCCATCGACTACGCCTGTCGGCCTCGCCTTAGGTCCCGACTCACCCTGGGAAGACGAACTTGACCCAGGAACCCTTAGTCATCCGGCGGTAAGGATTCT
CACCTTACAATTCGTTACTCATGCCTGCATTCTCACTCGCACACAATCCACACGTGTTCACACTCATGCTTCACCTCATGCACGACGCTCCCCTACCCAACTAAAAAATAGTTGCCGCGGCTTCGGCGGTGTACTTGAGCCCCACTACATTGTCGGCGCAGAACCACTCGACCAGTGAGCTATTACGCACTCTTTCAAGGATGGCTGCTTCTAAGCCAACCTCCTGGCTGTCATCGCGATCCCACATCCTTTTCCACTTAGTACACCCTTAGGGGCCTTAGCCGGCGATCTGGGCTGTTTCCCTCTCGACTATGAAGCTTATCCCCCACAGTCTCACTGCCGTAGAACACTTCAATGGCATTCGGAGTTTGTCTGATGTTGCTAAGATGATAGTCCCGCTCAACCAAACAGTAGCTCTACCTCCACCAAGCTAACCTACGACGCTGCACCTAAATGCATTTCGGGGAGAACCAGCTATCACGGAGTTTGATTGGCCTTTCACCCCTACCCACAGCTCATCCCCGCAGTTTTTAACCTACGTGGGTTCGCGCCTCCACAACCTCTTACAGCTGCTTCACACTGGCCATGGGTAGATCACCCCGCTTCGGGTCCAGGACATGCCACTACAACACCCTAATTAGGATTCGCTTTCGCTACGACTACCCCACACGGGTTAACCTCGCGACATGCCGCTGACTCGCAGGCTCATTCTTCAAAAGGCACGCCATCACACAACAAAAGGTGCTCTGACGGATTGTAAGCACATGGTTTCAGGAACTATTTCACTCCCCTCCCGGGGTACTTTTCACCATTCCCTCACGGTACTATCCGCTATCGGTCACACTAAGTATTTAGGCTTACCGGGTGGTCCCGGCAGATTCACGGCAGATTCCACGAGCCCGCCGCTACTCGGGAAACAACAACCAACACGCACATGCCTTCAACTACAGGACTCTCACCTACTCCGGTGGGCCATTCCAAACCACTTCACCTAACACGCACGCACCAGCACAGCCATGGTAGTGACTGAACAATTGCTCCCAACAACCCCACACATGCAACCCCTACCAGGTATCACACACATGCAGTTTAGCCAACATTCGCGTTCGCTCGCCGCTACTAACGAAATCACTATTGTTTCCTTCTCCTACGGGTACTGAGATGTTTCACTTCCCCGCGTAACCCCCATACTGACTATAAATTCACCAGCAGGTGACTGCGAATAACCACAGCCGGGTTTCCCCATTCGGACATCCTCGGATCAACGCTCGATTGACAACTCCCCGAGGCATAACGCAGCCTTCCACGTCCTTCATCGGCTTAGCATGCCAAGGCATCCACCATGTGCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGCATACATCTCGTATGCCG
1,ERR3482180,NODE_2_length_2930_cov_239.887977,1313.0,Streptococcus pneumoniae,-450.0,,Streptococcus,Streptococcaceae,Lactobacillales,Bacilli,Firmicutes,,Bacteri,14200.8263,2930,0.4969,2.0,100.0,2930,221729,221624,0.5017,13413,7497.25,ATTCAATTTAAGAAAACCGTTCGCTTTCATATTCTCAGTTACTTTGGATAAGTCCTCGAGCTATTAGTATTAGTCCGCTACATGTGTCACCACACTTCCACTTCTAACCTATCTACCTGATCATCTCTCAGGGCTCTTACTGATATATAATCATGGGAAATCTCATCTTGAGGTGGGTTTCACACTTAGATGCTTTCAGCGTTTATCCCTTCCCTACATAGCTACCCAGCGATGCCTTTGGCAAGACAACTGGTACACCAGCGGTAAGTCCACTCTGGTCCTCTCGTACTAGGAGCAGATCCTCTCAAATTTCCTACGCCCGCGACGGATAGGGACCGAACTGTCTCACGACGTTCTGAACCCAGCTCGCGTGCCGCTTTAATGGGCGAACAGCCCAACCCTTGGGACCGACTACAGCCCCAGGATGCGACGAGCCGACATCGAGGTGCCAAACCTCCCCGTCGATGTGAACTCTTGGGGGAGATAAGCCTGTTATCCCCAGGGTAGCTTTTATCCGTTGAGCGATGGCCCTTCCATACGGAACCACCGGATCACTAAGCCCGACTTTCGTCCCTGCTCGAGTTGTAGCTCTCGCAGTCAAGCTCCCTTATACCTTTACACTCTGCGAATGATTTCCAACCATTCTGAGGGAACCTTTGGGCGCCTCCGTTACCTTTTAGGAGGCGACCGCCCCAGTCAAACTGCCCGTCAGACACTGTCTCCGATAGGGATCACCTATCTGGGTTAGAGTGGCCATAACACAAGGGTAGTATCCCAACATCGTCTCCTTCGAAACTGGCGTCCCGATCTCGTAGACTCCTACCTATCCTGTACATGTGGTACAGACACTCAATATCAAACTGCAGTAAAGCTCCATGGGGTCTTTCCGTCCTGTCGCGGGTAACCTGCATCTTCACAGGTACTAAAATTTCACCGAGTCTCTCGTTGAGACAGTGCCCAAATCATTACGCCTTTCGTGCGGGTCGGAACTTACCCGACAAGGAATTTCGCTACCTTAGGACCGTTATTGTTACGGCCGCCGTTTACTGGGGCTTCAATTCATACCTTCGCTTACGCTAAGCACTCCTCTTAACCTTCCAGCACCGGGCAGGCGTCACCCCCTATACATCATCTTACGATTTAGCAGAGAGCTGTGTTTTTGATAAACAGTTGCTTGGGCCTATTCACTGCGGCTGACTTTAAGTCAGCACCCCTTCTCCCGAAGTTACGGGGTCATTTTGCCGAGTTCCTTAACGAGAGTTCTCTCGCTCACCTGAGGCTACTCGCCTCGACTACCTGTGTCGGTTTGCGGTACGGGTAGAGTATGTTTAAACGCTAGAAGCTTTTCTTGGCAGTGTGACGTCACTAACTTCGCTACTAAACTTCGCTCCCCATCACAGCTCAATGTTATAGAATTAAGCATTTGACTCAATTCACACCTCACTGCTTAGACAGACTCTTCCAATCGTCTGCTTTAGTTAGCCTACTGCGTCCCTCCATCACTACATACTCTAGTACAGGAATATCAACCTGTTGTCCATCGGATACACCTTTCGGTCTCTCCTTAGGTCCCGACTAACCCAGGGCGGACGAGCCTTCCCCTGGAAACCTTAGTCTTACGGTGGACAGGATTCTCACCTGTCTTTCGCTACTCATACCGGCATTCTCACTTCTATGCGTTCCAGCACTCCTCACGGTATACCTTCATCACACATAGAACGCTCTCCTACCATACCTATAAAGGTATCCACAGCTTCGGTAAATT
GTTTTAGCCCCGGTACATTTTCGGCGCAGGGTCACTCGACTAGTGAGCTATTACGCACTCTTTGAATGAATAGCTGCTTCTAAGCTAACATCCTAGTTGTCTGTGCAACCCCACATCCTTTTCCACTTAACAATTATTTTGGGACCTTAGCTGGTGGTCTGGGCTGTTTCCCTTTCGACTACGGATCTTAGCACTCGCAGTCTGACTGCCGACCATAATTCATTGGCATTCGGAGTTTATCTGAGATTGGTAATCCGGGATGGACCCCTCACCCAAACAGTGCTCTACCTCCAAGAATCTCTAATGTCGACGCTAGCCCTAAAGCTATTTCGGAGAGAACCAGCTATCTCCAAGTTCGTTTGGAATTTCTCCGCTACCCACAAGTCATCCAAGCACTTTTTAACGTGCCCTGGTTCGGTCCTCCAGTGCGTCTTACCGCACCTTCAACCTGCTCATGGGTAGGTCACATGGTTTCGGGTCTACGTCATGATACTAAGGCGCCCTATTCAGACTCGGTTTCCCTACGGCTCCGTCTCTTCAACTTAACCTCGCATCATAACGTAACTCGCCGGTTCATTCTACAAAAGGCACGCTCTCACCCATTAACGGGCTCGAACTTGTTGTAGGCACACGGTTTCAGGTTCTATTTCACTCCCCTCCCGGGGTGCTTTTCACCTTTCCCTCACGGTACTGGTTCACTATCGGTCACTAGGGAGTATTTAGGGTTGGGAGATGGTCCTCCCAGATTCCGACGGGATTTCACGTGTCCCGCCGTACTCAGGATACTGCTAGGTACAAAGACTATTTTAAATACGAGGCTATTACTCTCTTTGGCTGATCTTCCCAAATCATTCTTCTATAATCTTTGAGTCCACATTGCAGTCCTACAACCCCGAAGAGTAAACTCTTCGGTTTGCCCTTCTGCCGTTTCGCTCGCCGCTACTAAGGCAATCGCTTTTGCTTTCTCTTCCTGCAGCTACTTAGATGTTTCAGTTCACTGCGTCTTCCTCCTCACATCCTTAACAGATGCGGGTAACAGGTAGTACCTGTTGGGTTCCCCCATTCGGAAATCCCTGGATCATCGCTTACTTACAGCTACCCAAGGCATATCGTCGTTTGTCACGTCCTTCTTCGGCTCCTAGTGCCAAGGCATCCACCGTGCGCC
2,ERR3482180,NODE_3_length_2912_cov_2857.685099,480.0,Moraxella catarrhalis,-450.0,,Moraxella,Moraxellaceae,Pseudomonadales,Gammaproteobacteria,Proteobacteria,,Bacteri,157955.9001,2912,0.5055,1.0,100.0,2912,2443114,2450709,0.5121,65535,11840.58,ATACGAGATGTATGCTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATGAAGTGCACATGGTGGATGCCTTGGCAGTCAGAGGCGATGAAAGACGTGATAGCCTGCGATAAGCGTCGGTGAGGTGGCAATATCCTGTGACCCGGCGATTTCTGAATGGGGAAACCCAACCAACACAAGTTGGTTATTACACAGTTTACTGTGTAAGGCAAACCGGGAGAAGTGAAACATCTCAGTACCCCGAGGAAAAGACATCAAATGAGATTCCGTAAGTAGCGGCGAGCGAACACGGAGGAGCCGATCAATTTTACAGTAGCAAAATGGCCTGGGAAAGCCAACCATAGTAGGTGATAGTCCTGTATGCGAAACTGTTTAAGCGACATATTAAGTAGGGCGGAACACGAGAAATTCTGTCTGAAGATGGGGGGACCATCCTCCAAGGCTAAATACTCCTGACTGACCGATAGTGAACCAGTACCGTGAGGGAAAGGCGAAAAGAACCCCTGTTAGGGGAGTGAAATAGAACCTGAAACCGTGTGCATACAAGCAGTCGGAGCGGACTTGTTCCGTGACGGCGTACCTTTTGTATAATGGGTCAGCGACTTATATTCTGTAGCAAGGTTAACCGAATAGGGGAGCCGTAGGGAAACCGAGTCTTAATAGGGCGAATGAGTTGCAGGGTATAGACCCGAAACCGAGTGATCTATCCATGAGCAGGTTGAAAGTGCCGTAACAGGCACCGGAGGACCGAACCCACTGTCTTGAAAAGCCAGGGGATGACTTGTGGATAGGGGTGAAAGGCTAATCAAACTCGGTGATAGCTGGTTCTCCCCGAAAGCTATTTAGGTAGCGCCTCGGACGAACACCATTGGGGGTAGAGCACTGTTTCGGCTAGGGGGTCATCCCGACTTACCAAACCGATGCAAACTCCGAATACCGATGAGTGATATCCGGGAGACAGACGGCGGGTGCTAACGTCCGTCGTCAAGAGGGAAACAACCCAGACCGCCAGCTAAGGCCCCAAATTCCTAGTTAAGTGGGAAACGATGTGGGAAGGCACAGACAGCTAGGAGGTTGGCTTAGAAGCAGCCACCCTTTAAAGAAAGCGTAATAGCTCACTAGTCGAGTCGGCCTGCGCGGAAGATGTAACGGGGCTCAAACTAGGAGCCGAAGCTGCGGATTTAATTGTTTCAATTAAGTGGTAGGGGAGCGTTGTGTAAGCCTGTGAAGGTGCACTGTAAGGTGTGCTGGAGGTATCACAAGAGCGAATGCTGACGTGAGTAACGACAAAACGGGTGAAAAGCCCGTTCGCCGGAAGACCAAGGGTTCCAGTCCAACGTTAATCGGGGCTGGGTGAGTCGACCCCTAAGGCGAGGCCGAAAGGCGTAGTCGATGGGAAATCGGTTAATATTCCGATACTTGTTTATGATGCGATGGAGGGACGGAGAAGGTTATGCCAGCCTGGCGATGGTTGTCCAGGTGGAAGGATGTAGTTAGACTGAGTAGGCAAATCCGCTCGGTTATTAATGAGATCTGATAGCAAGCCAGTTTACTGGCGAAGTGGCAAATACCCTGCTTCCAGGAAAAGCTTCTAAGCGATAGTCATAAACGAATCGTACCCGAAACCGACACAGGTGGTCAGGTAGAGAATACCAAGGCGCTTGAGAGAACTCTGCTGAAGGAACTAGGCAAAATGGTACCGTAACTTCGGGAGAAGGTACGCTGCCGATGGTGATGAAACTTGCTTTCTGA
GCTGTTGGCAGTCGCAGATACCAGGCTGCTGCAACTGTTTATTAAAAACACAGCACTCTGCAAACACGAAAGTGGACGTATAGGGTGTGATGCCTGCCCGGTGCTGGAAGGTTAATTGATGGGGTTAGCGTAAGCGAAGCTCTTGATCGAAGCCCCAGTAAACGGCGGCCGTAACAATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGCACGAATGGCATAATGATGGCAGCGCTGTCTCCAGCAGAGACTCAGTGAAATCGAAATCGCAGTGAAGATGCTGTGTACCCGCGGCTAGACGGAAAGACCCCGTGAACCTTTACTACAGCTTTACATTGAACTTTGACCTAACTTGTGTAGGATAGGTGGGAGGCTTTGAAGCAGATACGCCAGTATCTGTGGAGCCAACCTTGAAATACCACCCTGGTTATGTTGGGGTTCTAACTTGGATATAACAGTATCAAGGACAATGTATGGTGGGTAGTTTGACTGGGGCGGTCTCCTCCTAAAGAGTAACGGAGGAGTACGAAGGTGCGCTCAGAACGGTCGGAAATCGTTCAAAGAGTATAAAGGCAAAAGCGCGCTTAACTGCGAGACCCACAAGTCGAGCAGGTACGAAAGTAGGTCTTAGTGATCCGGTGGTTCTGTATGGAAGGGCCATCGCTCAACGGATAAAAGGTACTCTGGGGATAACAGGCTGATACCGCCCAAGAGTTCATATCGACGGCGGTGTTTGGCACCTCGATGTCGGCTCATCTCATCCTGGGGCTGAAGCAGGTCCCAAGGGTATGGCTGTTCGCCATTTAAAGAGGTACGCGAGCTGGGTTTAGAACGTCGTGAGACAGTTCGGTCCCTATCTACCGTGGGCGTTGGAAATTTGAGAGGATCTGCTCCTAGTACGAGAGGACCAGAGTGGACGAACCTCTGGTGTTTCGGTTGTCACGCCAGTGGCATTGCCGAGTAGCTACGTTCGGATGGGATAACCGCTGAAAGCATCTAAGCGGGAAGCCCACCTCAAGATAAGATTTCCCTAAGAGCCGTTGTAGACTACGACGTTGATAGGTTGGGTGTGGAAGTGTAGTGATACATGTAGCTAACCAATACTAATTGCTCGTTTGGCTTGACCAGTGGCATTGCCGAG
3,ERR3482180,NODE_4_length_2461_cov_0.218081,480.0,Moraxella catarrhalis,-450.0,,Moraxella,Moraxellaceae,Pseudomonadales,Gammaproteobacteria,Proteobacteria,,Bacteri,10.9033,2461,0.4397,2.0,94.0675,2315,150,147,0.4446,8,8.14,TGGCAAGTTTTTCTACAAATGATTTTAAAGCAGGATTGAAAGTAATGTTGGATAGGAATCCCTGTTCCATCCTTGAAAATGAATATGTAAAACCGGGTAAAGGTCAAGCTTTTAACCGTGTTAAGTTGCGTAATCTACGCACGGGCAAAGTGCTTGAGCAGACTTTTAAGTCAGGCGATTCACTTGAAGGTGCTGATGTGGTGGATGTGGAAATGAACTATCTGTACAACGATGGCGAATTTTGGCATTTTATGCACCCTGAAACCTTTGAGCAAATGCAAGCTGACAAAACGGCGATGGGCGATACAGTGCAATGGCTAAAAGAGAACTCAAATGCTCGTTGCACCATTACCTTGTTTAACGGCTCACCGCTGTCGGTTACTCCGCCGAACTTTGTGGAACTTGAAATCGTTGAAACCGACCCTGGCGTGCGTGGCGATACGTCAGGCGGCGGCGGTAAACCTGCCAAATTAGAAACAGGGGCAGTGGTACGTGTACCACTATTTGTCCAACAAAATGAAGTGGTCAAAGTTGATACTCGCACGGGTGAATATCTGTCTCGTGTGTGATGGTGTGATGAATGCCAAAAAACAGTCCATATATATGGGCTGTTTTTTTGTGCTGATTTAGGTATGAAAAAACCCAACCTTCAAGCGTGAAAGTTGGGTGTGGAGAAAAGCACGATATTTTAAAAGTCAGTTGGCTCATCAGCCAAATCAGTAAGTAAAGTTCACGCCAACACGGAAATCACGACCCGCCCCAGGTAGGGTATCTATGGAAGCACGCTGAGTGTGCGGACGATAGTTTTTATTAAAGACATTATTCACCGCAAAGTTCACATTCACCTTATCGTTGCCATAAGGCTTCCAGTTGGCATAGATGTCGCTGACATTATAGCCTTCACGGTTAAGAATAGCACCATCTTTATCATCAAATTTACCGCTCATGATGGAGGTGTCTTTGGCATCTACCCCTTCAACCAATGTGTGTCTCACACCAAGCTCCACGCTTGGGTTTGGCAGGCGATAGGCGACATCTGCTGTCCATGTACGACCTGTACGCACGGCATATTCAGGGTTAGCTTTAAATTTGACATTGTGCATGGTTGGTTTGCTGTGAGCAACGCCAGCACGCGCACGCCACGCCCCATCAGCATAGCCTACGCCCAGCTCATAACCGTGGTTGGTCTGATGACCTTGGTTACCAAGATAGGTAATCTGAGCTGTATTATCATTGGGGTGTTGTATAGCATCAGCAGCAGCTCTGGCATTATTAACACGCTGCCAAAAATAGCTACCAAAGGCGGTATATGGCCCATTATTATAGTTAAAACCGATCTCGGTATTGCGTGCTTTTTCTGCTTTTGCGTTATCAGCAATGCTGATCACACCATCACGGAAGCCACGACTTAGGATGGCATCAGCAAAGCGGGGGCTGCGAGTGGCATAGATTAGGTTACCGCTGACGCTTAGGTTAGGATTGACATCATAAATCACCCCAAAGCTTGGGTTGATGTCGGTTTTGCCAACCTTTTTACCGTCCATGGCTTTAAAGTCAAAACGGTCAACACGCACACCTGTATTGATGGTAAAGTCATTGATTTGGTTAATCGCTTCTAGATAAATGCCAGCATCGGTTTTTTCTTGATGCACCATACCAGGTTTGAGACTTCTGGGCGGTACGGCTTCTTGATGGCGATAGTTGATACCATATTTTAATAAAGTATGGTCAACACCAAAGCCTTTTAATAAAAGGTGGTTAAACTCAGTGTCAAAG
TTTAGATTTGCCCCTTTGGCTACCACATGCATGCTTGATGGTGTATCAACATTATCTATTAACTCCTTGGGTACATCTTTGGTGATATAGGTGTTACCTTTATCGCTGGAAGAGTTACGTCCATGTTCTAACTTATAGACATTGGCAGTAACTTCACTTGCAAAGCCAAGGTTTTTAGCTGTCCATTCAAGGTTGGTTAACTTCTGGTAGGTTTCACGCTGAGTTGGGTTGTTCCTATCTAAATCAGCCACCAATTTACCATTGGCATCAACCACATTATACTTTTTACCATTTGGTGTTTTACTGCCAAGTTGGTAACCCTTGCCTGCATATTTTTTATCTGTATTCAATATTGCCTGCAATTCCGCTTCGGTATACTTTTGCTTAGTTTCTTTATCTTTTTCTATATCTAGTGTCAAGACACGATTGGCGAAGTCAAACTCTTCACGCACACCACGAATGCCTTTATGCGCTTCATTTAGATGGCTGACCACAAATCGATGATCGCCAGCGGTTAGCCCTGCTTTGACAAGGTAACTGGATTTATCCAAGGCACTTCTTGTAACCACGTCATTGCCCAACAGATTGGTGTACCCTTTACCGCCTTTATAATCACCATCATTTACTTGATTATAAGAGACAAGGGCATCTACCTGACCAAATCCTGTCTGTGCTTTACCAAAGG
4,ERR3482180,NODE_5_length_2322_cov_0.174487,1313.0,Streptococcus pneumoniae,-450.0,,Streptococcus,Streptococcaceae,Lactobacillales,Bacilli,Firmicutes,,Bacteri,8.2205,2322,0.363,4.0,100.0,2322,100,101,0.3736,8,3.92,AAACCAGCATTTTCATGGTTAATCAAGACATCAATCTCTGAATCTGACTCATACAGCTCTTGTAGTTTTTCACCCGTACCGTTATTAGAAAAATTATCAATAATAACGATCTGCTTTGTATTGGAATTGCCTTCTTTTATTGATTTAACACAAGAAATAGTTTCCTCTAAGACTTTGTAATGCAAAATAATATAACAAAACAAATTATTGTCCTCCTAAATGCGATTTCTTTTCATCATCACTAGGCTCTTCGTGAAGATAGTTTTCTCCGTAAAGTCTTTTCAGATAGCGGTCATACTCCGTGGGAACTTTAAAGAAGGCATCCTCAAAAGGCTTCTCTACTCCTTCTCCCCAAATCTCTACTGGCATGACTTCTTTGGCAAACAGAGATGCATTAATTGTCCCAGCATACTTCTGCTTTTTCCAATCATACTTTTTATAGAGATTATCCAAGCGTCTGTACACATAATTTTGAGGGAATAGTTTTGCAAATCCTAGCTTTTTAAAGAAACCAATGATTAGTTTTTGCTTAGCGGAATGCATATCCACATGGTCTCCAACATAGGTAGTTCCTAAACTTGCTAAAAAACGATACCAGTATACTTTACCGAAGTAAAGCTTTCTTAAAAATGAATGATTTGGGGCACCATCCAAAGGAATGATATCAATCAAATGCAAACCTAAATTGGTATTTCGTGGCAATCCCAAACGTTTTCTTTCATCTTCTAAAAGGAATAAGCGAGGAAAGTAGCAATGTAAAGTATCGCAATATTGATAAGCAAGAACCTGATATTTGCCTGCGATAATTCTATCCTTAAAGATACCTGGAAGTTTATCGTAGCCTTCACGAGGCACAGCAATATCCATGTCATCATCCCATGGAATAAAGCCATCATATTTGACTGCCCCAAGTACACTCCCCCCACGAAGGAAGAAATCTATATTGTTCTCTTTACAAATTTTTTGAAACTCTTTTATAGCATCCAATTCAACTTTCTGAATCAATTTAACTATTTCTGTTTGGTTCAAAGTTTATATCCCCTTTTCGCTCTTAAAACTTTTCTTATAAATGAAACATTCCCTACAAAATAACGCTTAAAGAGACGTTTAGGCTCATTTGCCACACGAAACAACCACTCTAGGTGAGCATTTTGCATCCATAATGGAGCTCGTTTGATATGCCCTGATAGTACATCAAAACTTCCACCAACCCCCATAAACACAGCATTAATTCCATTATCCATGAATGATTGGATAATATATTCTTTCTTAGGAGAAGTAATTCCTACAAACACAAAATCTGGTTTTTTCTCTTTAATATCTTCCTGAATATGTTTTTCATCTTCAGGCGAAAAATATCCATTTCTATAGCCAACAACATTTAGAGCTGGATATCTTTCTTCAAAAATTTTTAACATATCTTGTAAAACGTCTTCTTTTGCTCCAAAAAAGTAGACCGAGTATCCTTTTTCATTAGACAACTCTAGTAAATGTTTCATCAAGTCTATTCCAGCAACTCGCTCAGGAACTGGTGTTCCTAAAAACTTACTCGCTAAAACTACTGAAGCACCGTCGGCATTAATTATTCCTGATTCATTAACGATTTTTTTGATTTTCTCATCTTCATAGCACTGATTAATTTTATCCGCGTTAACTCCCATCAAGTGCAAAGGGTGCTTCTTTAAAACATACTGTTCAACAGCATCCACTGTTTCTTTCATCGTTAATGGATCTATAGTAATTCCTAATATTTGAATTCTCTCATTCATATGAACTATTTCGCTC
CATCCTTCATCAATACAACCTTAACTGTCTTCAATAAAATTTCAATATCTTTCCAGATTGTCCAACCATCAATATAGGCCACATCTAATTTGACAACTTCATCGAAATTTTTGATTTCACTTCGTCCGCTGACCTGCCATAAACCTGTTATGCCAGGTTTAAAACTTAGGCGACGTTTTTGTTCTGGGGTATAGTGCTCATACTCGTCCACTGTTGGTGGTCGTGTACCTACCAAACTCATATCTCCCTTTAGAACATTATAAAACTGTGGTAGCTCGTCCAAGCTCGTCTTCCGTATAAAATGACCAATTTTCGTGATACGAGGATCATCGTCCACCTTAAACATTCCACCCTGCATGGTATTTTGTTCCATGAGTTCTCTTTTTTTCTCCTCAGCATCCACACACATAGAGCGAAACTTATAAAAAGTAAAATGGCGGCCATTTTTTCCTATACGCGTCTGAGCAAAAAAAGCAGAACCCCCATCCTTTCGAATCAAAGGAACCAGCACAATACTGAGATCGGAAGAGCACACGT
5,ERR3482180,NODE_6_length_2092_cov_0.139949,480.0,Moraxella catarrhalis,-450.0,,Moraxella,Moraxellaceae,Pseudomonadales,Gammaproteobacteria,Proteobacteria,,Bacteri,6.7299,2092,0.4493,1.0,93.7859,1962,77,79,0.4554,6,4.46,AGGATTACCATTTTCATCATATGGCATATCCTCAACTGGCATGATGCGTGATACAACCCCTTTGTTACCATGGCGGCCTGCCATCTTGTCACCTGGCTGGATACGACGCTTAACCGCCAAATACACCTTAACGATCTTCTGAACGCCATGTGCTAAGTCATCACCTTGAGTTAATTTGCGTTTTTTCTCGGTAAATTTATTATCGATATCTTTTTGCTTATCACTTAAAAACTCAGCAATCTGACCCAAACGCTCCGAAACTTCCTCTTCAGCAGGCTGAATATCTAGCAAGCTTTCTAGTGGCAAATTTTCGATGTCTTTAGTGCTTAGAATCATACCCGCTCTAAAGCCTGCACCACCGCTGACTTTTTTACCATCAAGTAAGCTTAAAATACGACCTTTTGCAGCTGCTTCAAAAATCACCAGCTCTTCTTTTAAATCTTTACGGTAATCATCAAGCATGGCTTTTTCAATAGATTTGGCACGGCTGTCTTTTTCAAGACCGTCACGCGTAAAGACTTGCACATCAATGACCGTACCTTTGGTTGATGATGGTACGCGCAAAGAGGTATCTTTAACATCCGCCGCTTTTTCACCAAAAATTGCACGAAGCAGTTTTTCTTCTGGAGTAAGCTGACTTTCACCTTTTGGTGTTACTTTACCAACCAAAATATCACCAGCATCGACCTCAGCACCGATATAAACAATACCTGCTTCATCAAGATTTGCCAACGCAGCCTCACCGACATTTGGAATATCGCCTGTAATCTCTTCTGGCCCAAGTTTGGTATCACGAGCCACACAAGTTAACTCTTGAATGTGAATCGTGGTGAAACGGTCTTCTTGAACCACTCGCTCAGAGAGCAAAATAGAGTCCTCAAAGTTATACCCATTCCATGGCATAAACGCCACACGCATGTTTTGACCCAAAGCAAGTTCACCCAAATCAGTAGATGGGCCATCAGCCAAGATATCACCACGAGCAATCACATCGCCAGCATTGACAATGATACGTTGGTTAATACAGGTGTTTTGGTTGGAGCGAGTGTATTTGATGAGGTTATAAATATCAATACCTGCCTCACCTGCGATCATTTCATCATCATTGACACGTACGATAATACGACTGGCATCAACCTCTTCAATCACACCGCCACGTTTGGCGACAACACACACGCCACTGTCACGAGCAACATGGCGTTCCATGCCTGTACCCACAAGCGGTTTATCAGCACGCAAGGTTGGTACTGCTTGACGTTGCATGTTTGAACCCATCAAAGCACGGTTGGCATCATCATGCTCCAAAAATGGAATCAAACTTGCTGCCACTGACACCACTTGGCGTGGCGAGACGTCCATATGTGTTACCTTATCAGGACTCATTCGTACCGATTCACCATGATGGCGTACCATAACCATCTCTTCGGTCAGTTCGCCATGTTCGTTCATGGGTGAATCTGCTTGGGCAACAACAGCCCCAACCTCTTCAATGGCAGACATGTATTCAATGTCATCGGTTACCTTACCATCAACAACACGACGGTAAGGCGTTTCTAAAAACCCAAAGTTATTGGTCTTGGCAAAAACCGCCAATGAGTTAATCAAGCCAATGTTTGGACCCTCTGGCGTCTCAATCGGACACACACGACCGTAGTGCGTGGTGTGTACATCACGAACCTCAAAGCCTGCACGCTCACGAGTTAAGCCACCAGGACCTAAGGCAGAGACACGACGCTTATGGGTAATCTCTGATAATGGGTTGTTTTGATCCATAAATTG
CGACAGTTGCGATGCACCAAAAAATTCTTTGATGGAAGCTGCCACAGGCTTAGAGTTGATCAAATCTTGGGGAGACAGATTGTCAGATTCGGCAACAGTCAAACGCTCTTTGACAGCACGCTCTACTCTTGCAAGTCCAATGCGGAATTGATTCTCGGTCATCTCACCAACCGAGCGGATACGACGGTTACCCAAATGGTCAATATCATCAACCTCACCACGACCATTACGAATTTCAATCAACTCTTTTAGTACATCAACAATGTCTTGATTGGATAATACACCCTGTTCACGTTGCACATCAATACTGTCG
6,ERR3482180,NODE_7_length_2061_cov_0.183040,476.0,Moraxella bovis,-450.0,,Moraxella,Moraxellaceae,Pseudomonadales,Gammaproteobacteria,Proteobacteria,,Bacteri,6.3993,2061,0.4668,1.0,100.0,2061,80,70,0.4646,6,3.88,GATCTAGCGTGAAGAACCTGTGGTGTGGGACATGCTTGCAAGCGTCATTCGTGAACATCCTGTGCTACTTAACCGTGCCCCAACGCTTCACCGTCTGGGCTTGCAGGCTTTTGAACCTGTACTTATTGAGGGTAAGGCCATTCAGCTGCATCCTTTGGTATGTGCAGCGTTTAACGCCGACTTTGACGGTGACCAAATGGCGGTTCACGTGCCATTAACCTTAGAGGCTCAGCTTGAAGCTCGTGCCTTGATGATGTCTACCAATAACGTTTTATCGCCTGCCAACGGTGACCCGATCATTGTGCCATCACAAGATGTGGTACTTGGTCTGTACTATATCAGTCGCAGTCATGTCAATGCCAAGGGCGAAAATATGACATTTAGCACCGTCAATGAGGCTTTACGTGCCATTGGCTCTGATGATTTGTCGGTTAATGCCAAGGTCAAAGTGCGTGTTACTGAGACGATTTTGGATGAAAAAACGGGTGAGAAAACCACCAAAACCGAAATCAAAGACACAGTTGCAGGGCGTTTGCTTATTTGGAATATCATGCCAGAGGGTATGGCTTTTGAAGAATGCAACAAAGAGATGACCAAGAAAAATATCTCTAAGCTACTAAATTCATGTTATCGTCAATTGGGCGTCAAAGAATCCGTTCTGTTTGCTGACCATTTGATGTATTTGGGTTTTGCACAAGCAACTTTATCTGGCATCTCAATTGGTATGGAAGATATGGTGATTCCACCAAACAAAAAAGAGATTGTCGATGCTGCGGACGCCGAGGTACGTGAGATTGAACAGCAGTTTGAGGGCGGTTTTGTCACCGCAGGTGAACGCTATAACAAAGTGGTTGATATTTGGTCTCGTACATCTGATAAAATCGCCAATGCGATGATGGATAACTTATCGACTGATGAGGTGATAAACAGTCAAGGTGAGATTGAGAGGGAAAAATCCTTTAACTCTATCTATATGATGGCAGATTCTGGGGCTCGTGGTAGTGCCACTCAGATTCGTCAGCTTGCTGGTATGCGTGGTTTGATGGCAAAACCAGATGGCTCTATCATTGAGACACCCATCAAGGCGAACTTCCGTGAAGGTTTGACGGTACTCCAATACTTTATCTCAACGCACGGTGCTCGTAAAGGTCTTGCTGATACCGCCCTAAAAACTGCCAACTCAGGTTATCTAACTCGTCGTTTGGTGGACGTGGCACAGGATTTGGTGATTACCCATGATGACTGTGGTACTGATGCAGGACAGCTTATAACCCCTGTGATTAAGGGTGGTGATATTGTTGAGCGTTTAGGCGATCGTGTGTTGGGTCGTGTAACTGCCAAAGATGTCGTCAATCACGACAGCGAGGTGATTATTCTTGCAGGTACGCTCATTGATGAACGTTTGGTTGAAGTTTTAGATAACAACGCCATCGATGAGGTGTATGTACGTTCAGTCATTACCTGTGAAGCAACGCAAGGTGTGTGTGCTAAGTGTTATGGTCGTGATTTGGCTCGTGGTCATTTGGTTAATATTGGCGAGTCGGTTGGTGTTATGGCAGCACAGTCTATTGGTGAACCTGGTACACAGCTGACCATGCGTACCTTCCACGTCGGTGGTGCAGCAAGTGCCGCATCAGTGGACAACAGTGTGTCTGTGGGTAATAACGGTACGGTACGTTTTCATAACATGAAGACCGTAGAGCATACAGATGGTCATTTGGTGGTTGTATCACGTTCAGCTGAAATTGGCGTGACAGACTCCCAAGGTCGTGAGCGTGAACGTTATA
AAGTGCCATATGGTTCTAACATTTTGGTGCGTGATGGTGATGAGGTGAGTGCAGGCGATGTTATTGCCAAATGGGATCCGCACACCCATCCAATCATTACCGAATTTGCTGGTACGGCACGGTTTAGCGATATTGCAGATGGCATGACCGCAACTGTCAAGGTAGATGAGACCACAGGTATGAGCTCCTTTGAGGTGTTGGCAGGTAAAGACCGTCCAAGCCTATCCAAGGATTTGCGTCCTGCCATCATCTTAGAGATCGGAAGAGCGTCGTG
7,ERR3482180,NODE_8_length_1971_cov_0.188178,480.0,Moraxella catarrhalis,-450.0,,Moraxella,Moraxellaceae,Pseudomonadales,Gammaproteobacteria,Proteobacteria,,Bacteri,8.5535,1971,0.4845,1.0,99.0868,1953,90,92,0.4884,9,4.67,TCCGATCTCGCCCTGACATTCCGCTACATGCCAACGCCATGCTAAAAAATGAGCGTGAGGGCATTCATCCAGAAAAAAATGGCGAGGTTGGCCCTTTGACCCAAATCCAAGAGCTACTTGACAGAGGTCATCCTGTTGCCTATGTGGGTGATGTTGTTGGTACAGGTTCTAGCCGTAAATCAGCAACCAACTCTGTATTGTGGTTCTTTGGTGAAGAGATTCCCCACATTCCAAACAAAAAAGATGGCGGTGTGTGTTTGGGCGGTAAAATTGCCCCCATTTTCTTTAATACCATGGAAGACGCTGGGGCGTTGCCTGTTGAGATTGATGTAACCGAGATGAACATGGGTGATGAGATTACCCTAAAAATTGACCACGCAAATGCAACTGTTACCGCCTTTAAAGATGGCACACAAATTGCCAAATCTAAGCTAAAAACCCCTGTACTGCTAGATGAGGTGCGTGCTGGTGGTCGCATTAACCTTATCATCGGTCGCAACTTAACAAGCAAGGCTCGTGAGTCCTTGGGTCTTGAACCATCCACTCTATTTCGTACCCCAGAGCAACCTGCCGATACTGGCAAAGGCTTTACCTTAGCCCAAAAAATGGTGGGTCGTGCCTGTGGTTTGCCAGAGGGGCAAGGCATTCGCCCAGATACCTACTGTGAACCGAAAATGACAACCGTAGGCTCACAAGACACCACAGGCCCGATGACCCGTGATGAACTCAAAGATTTGGCATGTCTGGGCTTCTCATCTGACCTTGTCATGCAGTCGTTTTGTCATACCGCCGCTTATCCAAAGCCGATTGATGTGGTAACGCATCACACCCTACCTGATTTTATCATGAACCGTGGCGGTGTCAGCTTACGCCCAGGGGATGGTGTGATTCATTCATGGCTAAACCGTATGCTACTGCCAGACACCGTTGGTACTGGTGGCGACTCGCACACCCGCTTCCCGATTGGTATCTCATTTCCAGCAGGTTCAGGTCTTGTGGCATTCGCTGCTGCCACTGGTGTGATGCCACTTGACATGCCAGAGTCTGTACTTGTCAAATTCAAAGGCAAAATGCAGCCTGGTATTACTTTGCGTGATTTGGTTCATGCCATTCCTTACTATGCCATCCAAGCGGGTGATTTGACGGTTGAGAAAAAGGGCAAGAAAAATATCTTCTCTGGTCGCATCTTAGAGATTGATTTGACCGAGATGGAAAATGACTTGACTGTTGAACAAGCGTTTGAATTATCCGATGCTTCTGCTGAACGCTCTGCCGCAGGCTGTGCCATCACGGTATCAGAAGAAAAAGTGGCAGAATACTTACGCTCTAACATCGTCATGCTAAAATGGATGATTGCCGAAGGTTATGGCGATGCTCGTACACTGGCTCGCCGTGCTGAAAATATGCAAAAATGGCTTGACAACCCAAGCCTGCTAAAAGCAGATGCTGATGCTGAATACGCCAAAGTATACGAGATTGACCTTGCTGACATCAAAGAGCCAATCCTATGCTGTCCAAATGACCCTGATGATGCCAAACTGCTCTCAGAGGTTGCAGGCGATAAGATTGATGAGGTATTCATCGGTTCGTGCATGACCAACATTGGGCATTTTCGTGCCGCAGGTAAGCTGTTGCAAGAAGTACCAGCAGGCAGCTTAACCACTCGCTTGTGGATTGCACCACCCACCAAAATGGATGAACGCCAACTGATGGATGAGGGTTATTACAACACTTATGCCCAAGCAGGAGCAAGAACCGAAATGCCTGGCTGTTCACTGT
GCATGGGCAACCAAGCACGCATCGCACCCAACTGTACCGCTGTTTCAACATCCACCCATATACTCTTCAACAGTTGGCAGTTTGCCCAGTACAGATGCCACAGAGGCAAGTTCAGCAGATGCCAGATAGACATTTGCCCCTTGACCTAAGCGGTTCGGGAAGTTACGGGTGGATGTTGAAAAGATCGGAAAG
8,ERR3482180,NODE_9_length_1883_cov_0.316629,476.0,Moraxella bovis,-450.0,,Moraxella,Moraxellaceae,Pseudomonadales,Gammaproteobacteria,Proteobacteria,,Bacteri,13.8322,1883,0.4514,2.0,100.0,1883,142,137,0.4649,14,7.88,ATTATCACACAAGGTATCACCTGTGGTTACGTCTTTTAAGCCCACCAATGCCACGATATCGCCTGCACGAACCTCTTCAACTTCATTCTGGCTGTTGGCATGCATCTCAACGATACGACCGATGCGTTCACGTTTCATCTTAACAGGGTTATAAACGCTGTCACCTTGTTTTGCTACACCTGAGTAAACACGCACGAAAGTCAAATTACCTACATACTTATCGTTCATGATTTTAAATGCCAAAGCGGCAAATGGTGCATCGTCAGAAGACTCACGACTGCCTTCGGTTTCGTCTTTGTCATCTAGGATGCCTTTGATGGCTTCAACGTCAGTTGGGGCGGGCAAAAACTCAATGACCGCATCAAGCATACGCTGTACACCCTTGTTTTTAAAGGCAGTACCACAAAGCATCGGCTGGATTTCACATGCCAGTGTACGAGCACGAAGACCTGCTATGATATCCTCACGAGACAAGTCGCCCTCTTCTAGGTACTTATCCATTAATTCTTCTGATGACTCAGCAGCTGCTTCAATCATGTTATTACGCCACTCCTCAGCGACGTCCACCAAATCAGCAGGAATCTCGCCATACTCAAACTTCATGCCCTGAGATGCCACATCCCAAATGATGGATTTCATTTCAAGTAGGTCAACAACACCTTCAAAGTTATCTTCTGCACCAATTGGTATAACCACAGGTATGGGGTTACCGCCTAGGCGAGTTTTAACTTGTTCAACCACACGGAAAAAGTTAGCGCCTGTTCTGTCCATTTTATTGACGAAAGCAAGACGAGGAACTTTGTATTTGTTGGCTTGACGCCATACAGTCTCTGACTGAGGCTGTACACCACCAACTGCACAATACACCATACACGCACCGTCAAGTACACGCATAGAACGCTCAACTTCAATGGTAAAGTCAACGTGGCCGGGGGTGTCAATTAGGTTGATACGGTGTTCTGGAAATTGCTCAGACATGCCTGACCAAAAACACGTGGTCGCAGCAGAGGTGATGGTGATGCCACGCTCTTGCTCTTGCTCCATCCAGTCCATGGTAGCAGCACCTTCATGCACTTCGCCAATTTTGTGGCTTTTACCTGTATAGAACAAAATACGCTCTGAGGTCGTGGTTTTACCTGCATCAATGTGAGCAGAAATACCGATATTACGGTATCTATTTAAGGGGGTTTTACGTGCCATAATTTTTCCTAGAAAAATTGCGAGTAATTGGACGTCGGTTTGCCATGACTAAAATACAATCATAACCTAACCATGACAAACGACTGATATTATTTGGCAGACCATATTATTTGGCAAACCAAATACGCTTTTAATAAAATTAGAAGCGAAAGTGTGAGAATGCTTTGTTAGCATCAGCCATACGGTGAACATCATCACGTTTTTTGATGGCAGCACCTTTACCTTCAGCGGCGTCATTTAGCTCACCTGCTAAGCGTAGAGCCATAGACTTTTCAGAACGCTTAGCGGCAGCATCAGCCAACCAACGCATGGCTAGGGCAGTACGACGGGAGGGACGAACTTCCATAGGCACTTGGTAGGTAGCACCACCAACACGGCGAGCTTTAACCTCTACCGTTGGGCGAACACTTTCTAGCACTTCTTCAAAGAAAGCCACTGGGTCTTCGATCTTACGTTTGGCAGCAACACTCTCTAACGCACCATAAACGATTTTTTCAGCGACTGATTTTTTACCATCAACCATCACATGATTGATGAATTTGGCGATGGTTTGGCTGCCAAACTTAGGGTCTGGTAGAATTTC
ACGGGCAGCAACGACACGACGTCTTGGCATAATAGTTCCTTATGTCTTCAGGGTAAATCTGGCATATTATCTTAAAACAAATCAAGCATTTAGCCAGATC
9,ERR3482180,NODE_10_length_1877_cov_10.696571,1660.0,Schaalia odontolytica,-450.0,,Schaalia,Actinomycetaceae,Actinomycetales,Actinobacteria,Actinobacteria,,Bacteri,889.4923,1877,0.5594,2.0,99.6803,1871,9311,9438,0.5579,871,476.47,TCCGATCTGTCGGATCCTTCTGCCGGGGGGTCGCGCGTGTTGTTGCTGGGAGTTGAACAGTCTGGGATGGCTGACCGTAGAGCGTGAGAGTCGTGTAAGCGGACCGGTGGCAGGGCGTGTGGGTGTGGTACCCGAGTAGCGCGGGACTCGTGAAATCTCGTGTGAATCTGCCAAGACCACTTGGTAAGCCTAAATACTACTTGATGACCGATAGTGCATAGTACCGTGAGGGAATGGTGAAAAGTACCCCGGGAGGGGAGTGAAATAGTACCTGAAACCGTGTGCCTGTAAGCCGTCAGAGCCTTGTGGGGTGATGGCGTGCCTTTTGAAGAATGAGCCTGCGAGTTAGTGATACGTGGCGTGCTTAACCCGTGTGGGGTATGCGTAGCGAAAGCGAGTCTGAAAGATGGCGTTTGTCGCGTGTTCTAGACCCGAAGCGGGGTGATCTACCCATGGTCAGGTTGAAGCAGAGGTAAGACTGTGTGGAGGACCGAACCCACCAGGGTTGAAAACCTGGGGGATGAACTGTGGGTAGGGGTGAAAGGCCAATCAAACTCCGTGATAGCTGGTTCTCCCCGAAATGCATTTAGGTGCAGCGTCGCGTGGTCCCTGGTGGAGGTAGAGCTACTGGATGGCCGATGGGCCCTATGGGTTACTGACGTCAGCCAAACTCCGAATGCCATTAGGTGTAGCGCGGCAGTGAGACTGCGGGGGATAAGCTTCGTAGTCGAGAGGGAAACAGCCCAGATCATCAGCTAAGGCCCCTAAGCGTACGCTAAGTGGGAAAGGATGTGGAGTCGCGGTGACAACCAGGAGGTTGGCTTAGAAGCAGCCATCCTTGAAAGAGTGCGTAATAGCTCACTGGTCAAGTGGTTCTGCGCCGACAATGTAGCGGGGCTCAAGCGTACCGCCGAAGCTGTGGCAACAACACGCAAAGCTGGCGCCAGGAACTGGAGTGCTGGTGTGTTGTTGGGTAGGGGAGCGTCCTGCACGAGGTGAAGCCTGGAGGGAACTTCTGGTGGATTGTGTGGGAGTGAGAATGCAGGCATGAGTAACGAATCTGCGGTGAGAATCCGCAGCGCCGATTGACTAAGGGTTCCAGGGCTAGGTTTATCCGCCCTGGGTTAGTCGGGTCCTAAGGCGAGGCCGACAGGCGTAGTCGATGGACAACCAGTTGATATTCTGGTACCGCGTTATGACCGCCCATGCTGAAGTCATTGTGCTAACCAATCTGCTCACGCCGCCACTAAGCCTTCGGGTTTGTTGGTGTGTGTGGGTCTTGGGGCCCCGGTGATTGTAGGCAAGCGTGTTAACAGGGGTGACGCAGACAGGTAGCTGCAGTGCGCCGATGGTTGTGCGTATTTAAGGTTGTGGCCCGTTCCTCAGGTAAATCCGGGGGACACTAGGGGTGAGAACTGATGAGGGACACACGCGTGTGTGGACTTGTGGTGATCCTATGCTGCCGAGAAAAGCCTCGACGTGAGGGCATAACGCGCCCGTACCCTAAACCGACTCAGGTGGTCAGGTAGAGTATACCGAGGCGTTCGAGTGAATCGTGGTTAAGGAACTCGGCAAAATTCCTCCGTAACTTCGGGATAAGGAGGACCCCGTGACTTCCCCGCGCCTTGCGCGCGGTGTGGGGTTGTGGGGTCGCAGAGAATAGGGAGAAGCGACTGTTTATCAAAAACACAGGTGCGTGCGAAGCCGTAAGGCGATGTATACGCACTGACGCCTGCCCGGTGCTGGAAGGTTAAGAGGAACCGTCAACA
CTCCTTGCGGGTGTGAAGCGGTGAATTTAAGCCCCAGTAAACGGCGGTGGTAACTATAACCATCCTAAGGTAGCGAAATTCCTTGTCGAGATCGGAAGAGCGTCGTGT
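
A table like the one above is easier to read once it is condensed per taxon. The sketch below is a minimal illustration in plain Python: it counts scaffolds and sums their lengths for each `tax_name`, using the `scaffold_name`, `tax_name`, `Avg_fold` and `Length` values of the first five rows shown above. The function name `summarise_by_taxon` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# First five rows of the table above, reduced to four of its columns.
example_tsv = (
    "scaffold_name\ttax_name\tAvg_fold\tLength\n"
    "NODE_1_length_3152_cov_102.364959\tCorynebacterium simulans\t6526.8141\t3152\n"
    "NODE_2_length_2930_cov_239.887977\tStreptococcus pneumoniae\t14200.8263\t2930\n"
    "NODE_3_length_2912_cov_2857.685099\tMoraxella catarrhalis\t157955.9001\t2912\n"
    "NODE_4_length_2461_cov_0.218081\tMoraxella catarrhalis\t10.9033\t2461\n"
    "NODE_5_length_2322_cov_0.174487\tStreptococcus pneumoniae\t8.2205\t2322\n"
)

def summarise_by_taxon(rows):
    """Count scaffolds and sum their lengths for each tax_name."""
    summary = defaultdict(lambda: {"scaffolds": 0, "total_length": 0})
    for row in rows:
        entry = summary[row["tax_name"]]
        entry["scaffolds"] += 1
        entry["total_length"] += int(row["Length"])
    return dict(summary)

rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
summary = summarise_by_taxon(rows)

# Print taxa from the largest to the smallest total scaffold length.
for taxon, stats in sorted(summary.items(), key=lambda item: -item[1]["total_length"]):
    print(f'{taxon}: {stats["scaffolds"]} scaffold(s), {stats["total_length"]} bp')
```

With the full table loaded in pandas, the equivalent grouping would be done on `ClassifiedScaffolds_df` instead of the inline example rows.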


### Unclassified scaffolds ("Dark Matter"):

In [19]:
if os.path.exists("results/all_taxUnclassified.tsv"):
    UnclassifiedScaffolds_df = pd.read_csv("results/all_taxUnclassified.tsv", sep="\t")
else:
    # Explain why the table is missing and point to the logs of the merge steps.
    print('The file "results/all_taxUnclassified.tsv" does not exist. Either no scaffolds were unclassified or something went wrong; please double-check the log files below:')
    print('\t"logs/Merge_all_metrics_into_single_tsv_[sample_name].log"')
    print('\t"logs/Concat_files.log"')
    # Placeholder table so that the display call below still renders something.
    UnclassifiedScaffolds_df = pd.DataFrame({'Error': ["Please", "see", "error", "message", "above"]})

display(HTML(UnclassifiedScaffolds_df.to_html()))

Unnamed: 0,Sample_name,scaffold_name,Avg_fold,Length,Ref_GC,Nr_ORFs,Covered_percent,Covered_bases,Plus_reads,Minus_reads,Read_GC,Median_fold,Std_Dev,scaffold_seq
0,ERR3482180,NODE_154_length_992_cov_0.121387,4.126,992,0.4214,1.0,100.0,992,22,21,0.438,4,1.84,CCTAAAGCAGTTCAAGACAAAGCAGGTAATGACGCTGCCAGCTTTACTGATACTGATGTGACTAATAGTAGTACGGTTCAAGATAAACCCAATAGCATCTCATCAGTCACACTGGTCGATAATTTGACAGATGAAACACGAAATGCTCAAAATTACTATCCAAATAAAAATACAGGTCCGTATGTCGGTAATGTCGGCAATGCCAATACTGCAAATTTTGATACCGCAGTAAACTTAAGATCTGGCTTGACTAACGACAAAACACCGACCTTGAATTTCACCTTGGATAGAGCGGTAGCGACTGGTGAGTCAGTTGAGGTGATTCGTCATACGATGGTCAATGGTCATACCGCAAGATCAGAGAATGTGACGAATAAGGTAAGCCTAAAACAAGGCAGCGATAAGAGCTATTCATTGACCGACAGCTTGACAGATACTTATGGCACTGATTATAAGTATACTGTGCGTGTTAAAAAGGGTAATAAAGTTGTCAATGAAAAATCGCACGCATTCCGCCTAGATACGACCGTTGAGTCGATGGAGATTACCAATGCTGATTTTAATGATAAAACAAAATCAGCAAAAATCTCTCTGAAATCTCGTGGTAACTCAGAGGTCGGTGCAACGGTAACCTATAAGTACATGGAAAATGGCCGTGAAGTTACAGGTAAAGTGACCGATAGCGGCAAAGGCGTGTATAATTTGAATTTACAAGGCTTTGACCGTAAAGACACCAGAGGCCTGACGCTAGAGACTGTGGATGCTGCAGGTAATATCACTACCCAAAAAATCAACTTCCTGCGTAACTTATTCTCTGAATATAATACCGAAAGAGGGGTTGATGTCACCAAAGCAAGTATCGATGATAATGCTCGTGTTGGGGTTATTATGGGTGGTGCTACAGGTCGTCAAGCGACAACAACAGGTAGTAATGCCATCAAAGCCACAAATGGTAACGATACCCTTGTGGTTGGTTTAGAGCAGTTTGGGGG
1,ERR3482180,NODE_187_length_942_cov_0.109202,4.7282,942,0.4257,1.0,99.8938,941,24,23,0.4352,4,3.2,GGATGTCTCTGATGGTGATGGTTTGAATGATGACAAAGAAGATAAAGATGATAAAAAAGATGGTCAAAAGAAGGTAGATAATCAAGTTGATGCCAAACAAAATCACTCTACAACCAATCCCTCTGATTATTATGATGTCGGTGGCATTAGTGGACTGCTTACCAAGGCAGGTGCTTATTTAGGGCGTAATTATCAAGATAAGAACTACCAAGGCGTGGATTTGCATGACAGCACCACAACCAAACATCCTTTTTATATCCAAAGTTCAACACTTGGGGGGAGTTTTGTCAAATCCATCTTTGGTAAAAAAGTTACGGTAGCCCATCATTTGGCAGATAAATTCATCTCTGATGATAAGTTAAATGCCATCTCACAGAGCGTGTATGAGAAAATCGCTGAATTGGCTCGTGCATGGGCACTTAGAACTTTACCTGTTGATCCAAAAACACTCAGCTCCACTCAAAAAGATGAACTGGCTCAAAGCCTTGCCAATCAAAATCGAGCATTGGCAACTGCTGGCGGTGTAACGGGCTTTTTTGGGCTGACAGGAGTGGTCATGGATACAGCGTGGCTGCTGCTTGTGGCACTGCGGACAGTGTATCAGCTGTCAGCGGTTTATGGTGTGCCATTGACGAAAAAAGAAGGGATTAAGATGGCATACAGCGTGCTGTCAGGGGCGGATTTGGATAAAATGCAAGAAAAGCAAATCATTCTCACAGCACTTGCTCTTGCCAAAAAAACACTGGTTTATGCAGGCGAGAATGGCTTAAAAGAAGAACTGATTAAGTTAAGCTCATCCAACATCAACATCGATGACTTTGATGGACTGCTAAAATTCACTCATTTGGATAAATTGGTTGAAAAATATGGCATAGACATCAATGGGCTAAACACCCATTGGTCACATCGTTTGGTTGCTGTTTCGGCGGTTGGTGTGGCAGC
2,ERR3482180,NODE_212_length_901_cov_0.054264,1.7503,901,0.404,1.0,99.778,899,8,8,0.4033,2,0.6,TGCCAGATTTTCAAGTTCTTTGATGAATAAGTGTGAAAATGCGTGCAAAAATAAAAATCTGGCATTGACTACCATCTCTTTTTGATTTAGTGTATAAGCATGAATACTGCCCAACAGTCGCTTAGCAAAAACATCTGCTCGTTTTTGGATTCGCTCATCTTGTTGCCATTGCCTAATGATGCGTTCATCAAAACTGATAAAAATCCCTTCACCATAGAGCTCCAAAGCAGGTAACCAATCACTAACACCCATCACATCAGGCGGTATGATTCTATCGTTTTTGTCCTTTTGATGGGCAGTACTTGGACTGTGGTTGTCAGATTCTATGCTGTTGGTACTTTTCTTTACACTTATTTGAATCTCATCTGATGATAAGTCGCCACCACGAGTAAAGCCCTTGAACACCAAAATCTCTTTGAGTTTACTTACCGCCACAACTTTATCAATGAGTCTGTCAAGTCTGATAAATTCATCTTGATTCATGCCTTGCAAAGTATGATGAAACTCATCACTGACATGATGGGTAACAAAATCTTCCTCATCATCAAAATCTTTGTGTGTTGTTAGAGCCTCATACTCCAACTCATGTAATGGCTTTTGTTGAAACCCCCCATCAAAGTTGGGATAACCCTCACGCAACGATTTAATCGCCTCCTCTAAGGATGTGGCATCACAATCAAATTCACCTTTTAGACGATTTTTAAGTGCTCTTTGTTCATCAAAGTCAATATCATCTCTTAAAAGCTCATCTTTCATCTCATGGCAATACAACCTATCAACCAGCGTACCACGAGCAATGCGAGACTCTGGGGGTATGACCAGTGCCTCTTTGGTGATGGGTGTATACACCCGTGTGTCATTAACCTCCATGATGTAGGCTGTGGGCTGTTGTTTTAATGCG
3,ERR3482180,NODE_216_length_896_cov_0.107932,2.7511,896,0.4096,2.0,100.0,896,13,11,0.4155,3,1.25,CGTCATCAGCATTTGCCTGTGGTGTGGGTTGGACATCAAATGACAAGTCAAAATCACCAAAGTCTGATACATCAGCTTGCGATGGTGTGGCAGGTTGTGCAGATGTGGATGGTTGGTTGATATCAAAGGCAATGCCATCCAGTGCTAAATCATCAGTAGGTTCTACTTTGGCGGTGGGTTTGGGTTCATCTGCCAAGTCAAAATCCAAATATTCAAATGTTGACTGTGGATTTGATGTTGACTGTGGATTTGATGCTAAAGCTGTTGTTTTGGCCATCTCTTCTTCTAGCAGAGATTTACAAAAATCAGCTTCACGAATGGTCTCATCATCACCGATTTCATGAATTTTTTGATGGAGCTGATTAAAAGCGGCGAACTGTTTGGTGATGCCATAAGTTTGTAGTAGTTTAAGCAAAGCACCATTGTGACGTGGATTGGCCATCAATACACGTTTGAGTTCAGCCATCGCTTCTTGGTAGTTTTGATTGCGTATGTGTGCATCGGCGGTAGCAAGCTCATCGGCAGTATTTGGTGTAGGTGCCACAGGTGTTACAGAATCTTGCTGTGTGGGCAAGTGGGGTTGTGGATTTGTGGGTAATTGTAATTTCTGGGCTTGTAATTTTTTTAATACAAAAAATGCCACAATACCAATCACAACAACAGCGCCAATGGCAATAATCATGGTTTGCATAATAACTCCAATTTGTTAAATCTGATTCTAAGTAGACTTGGGACGTTACCCTTGTTAATTATTGTTTGTCTTTGAGATTTTTTAGACGATTTTCAAGTTCAGCCAATCTTTGGTTTTGTAATTGAACTTTTCTAGCAGCAGATGACAGCTCTTCATTTAAGCTATTGACACGTTTGGCATTGGTGGCGGTTTGCTGACGTGTATT
4,ERR3482180,NODE_287_length_798_cov_0.093890,4.8722,798,0.4361,1.0,94.4862,754,20,20,0.4585,5,2.56,TTTTGGTTGAGCAAATACTGAATTTTAGCTAAGCTCTTTTAGGGAGAAAATATGTCGATTTCTGCGTTTAAAATTTTTGCGGTGAGTATTTCAGCTGCAGCCCTTGCTGCTTGTAGCAGCAGCTCTTCTCCAAGTCCTGCGCCAAGCCCTACGACGCCAACTATTAAAGCTCCATCAACTGGCTCAAACAATACTAAGGCTACAACGCCGAGTGAGCCTACCAGTCCTGCTATCAAAGGCAATCAAATCAACTTTAACGCCAATAAGGTTGCTGGTGCTAATCAAGCAACTGCCAACACCAGTGATGTCGCTGTAGTGACCATTGATGGCAAGGCAGTGAGTTTTGATTTACCGGGATTTAATATTGGTAAGGTGGTCAATATCTCAGCATCAAATATGGCGCGCTTTGGCTCTGGAAACGGGTATCTTGATAGTTCTCGTTTTGGTTACATTAAAGAAGGTGTCAGTGGCACGCCTGCTTTATTTTCTCAGGGTACAGTGACAAACAATATGCCAACCACTGGTAAGGCAACTTATACAGGCTTTGCTGCCCATGTTGCTAATGGTCAAGTAACCACCCCTGATGCGAAATTTATCGTTGATTATGGCAATAAAACCGTTGCAGGCACTATTGGTAATAATGTTGAGCTTACAGGAGCCATTACAGGCAATCAGTTTAGTGGCACAAAAAATGGTTTTAGCACCAATGGTTATTTTTATGGTAATGGCGCCAAAGAACTTGGCGGGACTTATAAAAACGCAGCGGGTACTGTCAGTGGTGCCTATGGTGCAAAAAAG
5,ERR3482180,NODE_290_length_797_cov_0.056716,4.3977,797,0.4115,1.0,100.0,797,20,19,0.4217,4,2.21,ATTGTTATTGTTGCTACCACCATTAACACTAAATTTATTATTAGCATTAGTGCCATTACTATCAAGAGTGGCGGTATTCACGCCAATGGTATAAGTTTGGGGGGTAATATTGCCATTACTATTATTAACTGTTAGTGTACCTGTTTTAGGTGTAATACTGATACCAGCACCTGCGGTTAAAATAGGCTGGGCGTCTTTGAGCTGTTTGATGGTAACGGCGTCTTTATCGTTACTGCCTGCCGTTAGGTTGGTGATTTTGGTGTCTTGGGCGTCATAGCCTGTGGCGGTTAAGTGTGGTAAGTCTTTTTTAACTCTGCCATCAGTACCACTAAACCCAACGGTATTTTTATCAATGCGTACGGTTTGGTGTAGGGGTGTATTATCGGCCTCAGTGAATTTTAGCCCATCTGTGCCATAGATGGTTTTACCTGCGTTTGCACCTGTTGTTTGGGTAAAGGTTAAACCGCTAGTTTGTAGCTGGGCGTTGGTATTGGTATTACCTACGGTAACTGAATTTGTAGCGGTTAGGGTTTGAGTATTAACCGATGTCAGTTCTGTTAGCTCTTTGGCAAGTTTGACTTTTAGACCATTACTATCTTTTACCACACCGATGTTCTTACCATTGGTTAGCTTGTCCGCTGTATTCTCACCACCTGTGATGTTTAAGGTTCCGCCCAGTTTTGTATTAACGCCACCATTTATATCCCCTGTAAAAGTAATGCCTTGGCTGACTTTATTAATATCGGCAGCAGTGATATTAACACCTTTAAAATTATTGATGACTTTTTTGGCATCAG
6,ERR3482180,NODE_303_length_790_cov_0.036199,5.1886,790,0.3937,1.0,100.0,790,24,25,0.406,4,3.39,ACAAGCCGTTGGGTATCTGAAATCGCCTATAATGACAAAACAACTGCCATAGATATTACTTTTGCCCCTAGCGTTGTGCCACTTATTACAATGCTTGAACGCCACTTTACTAGTTATAATTTAGAGCAAGTAGCAGGTTTAAACAGTAAATACGCCATACGCCTTTATGAAATCGTCATCGCTTGGAAAAGTAATGGCAAAACCAATCAAATCGGTTTAGAACAACTTAGAGATAGATTAGGTGTTTTTGATGATGAATATCAAAGAATGGAATTATTTAAGCGTAAAGTACTAGATAAGTCAGTAGCAGAAATCAACGAAAAAACCGAAATTAATTTATCCTATGAGCAACACAAACAAGGGCGTAAAATTGTTGGCTTCACTTTCACAGTCAAACAAAAATCCAAGCCTAAAAAATTAAAGACACAAGAACAAGAACGAGACCCAAAAACCCCTGATATTTTTGACAACCTAACCGACAAAGAACGAGAAATCATCGCCCAAAAGAACGCCTATGCCGACCAAATCGGAGCAACAGAGCAACACCGCCAAAACTTGATTAAAAAAGCCTTAGAACAGCACAGACAAGCCGAGCAGAGCGAGCAAGAACGCAAACAGCGTGAAAAAGCCGAACGATTGGCACAGGAGCAACAAGACAAAGAGCGGTTAGAGCTTGCAAAGCGACAATTTGAGCAGATTTTGGCAAGTGATGGACTGATTAACGCTTACATTGCCAACAACATCGTTGCAAAGTATCTATCAGGTTTGCAAAAAATACGATATGAGCAAG
7,ERR3482180,NODE_342_length_771_cov_0.100932,4.511,771,0.4812,1.0,100.0,771,19,18,0.486,4,2.35,GCTTTTTGGTTCATCGCCCAGATGAAATTATCATAACCGCCCGGATAGGTACGAGCAAGGGCTTTGGCGGCGGGATTTTCTGATTTCATAAGCATCAAGAGCATGAACTCGCTGCGACTCATGCGATCGCCTGCACGCAAATTGGAACTTGCTTTTTTGGCAGCGATAAAGTCTGACGGCTCAATGACAATCTCCTCACCCATATCAAGCCCTGCATCCATGAGCACCATCGCTGTCATGACCTTGGTTACCGAAGCGATAGAACGCACGGCGTCGGCATTTTTTTGGTAGATGGGCTGACCTGTTTGTAAATCTACCACAATGGCAGCAGCAGAGCGAGTATTTACAGGCAACTGTTCAATGTAGCCTGTACCATAAGCTGTGGATTGTCCATAGGGCTGTATGGGAAATGGCTGAGAGGTCGTCGTACTGGCAGGTATGGGCACTCTGGTAACCTGAGTGTGCGTGTAGCCTTGGTTGTAGGTGTTGGTGGTGGGGGCTGTGTTATTATTGAGGTACGCTGTGTTATTATTGAGATATGTGGTGGTTGTTTGCTGAACGGTTGTGGTCGCACTCGGTATCTCTTGAACGGTATGTGCCTGCATGTTTTTGGCATTGCCTTTGCCAATATAGACCGTCTTTTGTTGGCTGTTATCCACCAAAAAAGCATGGGCAGAGGTTGTCATCATGGAGATGACAGACAGGCTTAGCACGTAGCTTAGGCGTGCTCGGTTGATTTTTTTAATATTAACTCGCTTCATAAGTTCCGCT
8,ERR3482180,NODE_347_length_768_cov_0.082683,4.5768,768,0.4049,2.0,100.0,768,18,18,0.3902,5,2.42,GGCGTTTAAGATTATAATTGTGATTTTGCTGACTTTGGACAGCATCAAGCACCGCCAATATCCCCACTGTTGCCACCACTCCTTTCGTTAGCAAGGCAAATATAGATGTTTTGCCTTCTGTGGTAATTTTAGTCTCAATCACCTTACCTATGTCGGCAGATTCTAATTTATTCAAATCAAATTGAGCGATAAACTCCATTTGATGTTGATGAGATAATTTTGAAAATTCAATGGCTTCTTTATGAAACTTATCACCCACACCCAAGATGAGTTTTTGAAAATTAGCCGAAACTGCCGTTAGCAACTCTTTGTCAGATTCACTCCATTCGGCAGGGGGTTGTGATAAATATTCTTGGATAATCGGCAAATGTTTGGTGATTTGTTTGGCAGTGGTGGCGGTAGCAACGGTTCGTAAAACTGGCAAAACGGTAACAGGTATCATGACATATCCTTGGCTAAGTAGGCGATTTATGCACTTTAACCACTTTTGACTTGTCAAAACAGAAAAAATGCCTTAGTATTATAATCTATCCCTAATATTACACAACCAACCCTAAGTAACACCAAAAACCCACTTTTAGGGATACACACAGGACAAGTAAGCCATGCCATTACATGACAACATTCGTAAATTTAGAGAGCAAAAACAGTGGTCGCAAGAATACATGGCAGAACAGCTTGGACTGTCCAAAAACGGCTATGCCAAGATAGAGCGTGGCGAGAGTCGCCCAAGCCTTGATAGATTGGAAAGATCGGAAGAGCGTCGTG
9,ERR3482180,NODE_383_length_746_cov_0.006462,2.9196,746,0.4142,1.0,89.9464,671,13,11,0.4334,3,2.05,GATGATTTAGATGACTTCAATACCTTTTTGAATAAAGACATGGGTATTCTCATTACTGCCATCCCAATAGGCATCATAACCCTCAGGAGCTGTTCCTTCTCGAATATCTTTGACAAATCCACCTAATTTGTCTCCTCTTCTTGGGGTACCATTAGCTCCCAAATCAACCTTATCTTGATTGTTACCATTGATAAACAGACCTTGATAGGATTGACCATTGATGGTGGTGGCAGTATCAGCATTATTTAACACATCAGATATCTTGACGTTTAAGGTATTTCCACCAGTGCCTGCTAGGTTAATGACCTCCATGCCAGTCATGACACCCAGGTCTTGATTAACGCCCTTACCCGTGAAGTTGAGAGTATCAAAGCCCGCACCACCGTTGATCCTTGCTCCACTTGAGATACCAGCATAAACCGAAATTATATCATCACCACGACCTAAGTCAATCACACCTCCACTCATTGAGTTACCTAATTCGGTATGAGGTTTAGCCTGTAATGTGGTATCACCGATGACAACACTATCATCACCATCTCCCATCAATACTTTTGTGCGGTCATTCCTGATACCATATATATTATTACCAACACGCAATGTGTCATTACCAGCACCCATATCGACTTCTACACTACCTATAACAGCAGTTAATTTGCTGTACGCAATGATGGTATCATCACCACCTTTTGTATCAACTTTAACTTTGCTAGCCAAAGAAGATGAAATATCACCATAGCCACCCG
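
The scaffold names in the table above appear to follow the assembler's `NODE_<id>_length_<length>_cov_<coverage>` naming pattern, and the length embedded in each name matches a numeric column in the same row (for example 992 for `NODE_154`). As a minimal sketch, assuming that pattern holds for every scaffold, the name can be split into its parts for sorting or filtering. The function name `parse_scaffold_name` is hypothetical and not part of Jovian.

```python
import re

# Assumed naming pattern: NODE_<id>_length_<length>_cov_<coverage>
SCAFFOLD_RE = re.compile(r"^NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)$")


def parse_scaffold_name(name):
    """Split a scaffold name into node id, length and coverage (hypothetical helper)."""
    match = SCAFFOLD_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected scaffold name: {name!r}")
    return {
        "node": int(match.group(1)),
        "length": int(match.group(2)),
        "coverage": float(match.group(3)),
    }


# Scaffold name copied from the first row of the table above.
print(parse_scaffold_name("NODE_154_length_992_cov_0.121387"))
```

Names that do not match the assumed pattern raise a `ValueError` rather than being silently skipped, so a change in naming would be noticed.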


## Predicted virus hosts:

In [21]:
if os.path.exists("results/all_virusHost.tsv"):
    virusHost_df = pd.read_csv("results/all_virusHost.tsv" , sep = "\t")
else:
    print("The file \"results/all_virusHost.tsv\" does not exist. Either no viral scaffolds had host information, or something went wrong; please double-check the log files below:")
    print("\t\"logs/Merge_all_metrics_into_single_tsv_[sample_name].log\"")
    print("\t\"logs/Concat_files.log\"")
    virusHost_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(virusHost_df.to_html()))

Unnamed: 0,Sample_name,scaffold_name,taxID,NCBI_potential_hosts,virus_name,virus_lineage,refseq_id,KEGG_GENOME,KEGG_DISEASE,DISEASE,host_tax_id,host_name,host_lineage,pmid,evidence,sample_type,source_organism


## Virus typing results:

The virus typing outputs are currently not generated automatically by Jovian, because doing so overloaded and crashed the web service. A long-term solution is being worked on; as a workaround, we have included scripts to generate these data separately. However, we kindly ask you to <u>**use this sparingly**</u> so as not to overload and break the web service. Instructions on how to generate these data can be found on the [website](https://github.com/DennisSchmitz/Jovian), [here](https://github.com/DennisSchmitz/Jovian/issues/29) and [here](https://github.com/DennisSchmitz/Jovian/issues/51).


### Norovirus typing tool output:  
[Link to the norovirus typing tool](https://www.rivm.nl/mpf/typingtool/norovirus/)  

In [23]:
if os.path.exists("results/all_NoV-TT.csv") and os.path.getsize("results/all_NoV-TT.csv") > 0:
    NoV_TT_df = pd.read_csv("results/all_NoV-TT.csv" , sep = ",")
elif os.path.exists("results/all_NoV-TT.csv") and os.path.getsize("results/all_NoV-TT.csv") == 0:
    print("No viral scaffolds with species equal to \"Norwalk virus\" were found in this dataset.")
    NoV_TT_df = pd.DataFrame({'NA' : ["No", "Norwalk virus", "species", "scaffolds", "found"]})
else:
    print("The file \"results/all_NoV-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    NoV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(NoV_TT_df.to_html()))

The file "results/all_NoV-TT.csv" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.


Unnamed: 0,Error
0,Please
1,see
2,error
3,message
4,above


### Rotavirus A typing tool output:  
[Link to the Rotavirus A typing tool](https://www.rivm.nl/mpf/typingtool/rotavirusa/)  

In [24]:
if os.path.exists("results/all_RVA-TT.csv") and os.path.getsize("results/all_RVA-TT.csv") > 0:
    RVA_TT_df = pd.read_csv("results/all_RVA-TT.csv" , sep = ",")
elif os.path.exists("results/all_RVA-TT.csv") and os.path.getsize("results/all_RVA-TT.csv") == 0:
    print("No viral scaffolds with genus equal to \"Rotavirus\" were found in this dataset.")
    RVA_TT_df = pd.DataFrame({'NA' : ["No", "Rotavirus", "genus", "scaffolds", "found"]})
else:
    print("The file \"results/all_RVA-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    RVA_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(RVA_TT_df.to_html()))

The file "results/all_RVA-TT.csv" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.


Unnamed: 0,Error
0,Please
1,see
2,error
3,message
4,above


### Enterovirus typing tool output:  
[Link to the enterovirus typing tool](https://www.rivm.nl/mpf/typingtool/enterovirus/)  

In [25]:
if os.path.exists("results/all_EV-TT.csv") and os.path.getsize("results/all_EV-TT.csv") > 0:
    EV_TT_df = pd.read_csv("results/all_EV-TT.csv" , sep = ",")
elif os.path.exists("results/all_EV-TT.csv") and os.path.getsize("results/all_EV-TT.csv") == 0:
    print("No viral scaffolds with family equal to \"Picornaviridae\" were found in this dataset.")
    EV_TT_df = pd.DataFrame({'NA' : ["No", "Picornaviridae", "family", "scaffolds", "found"]})
else:
    print("The file \"results/all_EV-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    EV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(EV_TT_df.to_html()))

The file "results/all_EV-TT.csv" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.


Unnamed: 0,Error
0,Please
1,see
2,error
3,message
4,above


### Hepatitis A typing tool output:  
[Link to the hepatitis A typing tool](https://www.rivm.nl/mpf/typingtool/hav/)  

In [26]:
if os.path.exists("results/all_HAV-TT.csv") and os.path.getsize("results/all_HAV-TT.csv") > 0:
    HAV_TT_df = pd.read_csv("results/all_HAV-TT.csv" , sep = ",")
elif os.path.exists("results/all_HAV-TT.csv") and os.path.getsize("results/all_HAV-TT.csv") == 0:
    print("No viral scaffolds with genus equal to \"Hepatovirus\" were found in this dataset.")
    HAV_TT_df = pd.DataFrame({'NA' : ["No", "Hepatovirus", "genus", "scaffolds", "found"]})
else:
    print("The file \"results/all_HAV-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    HAV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(HAV_TT_df.to_html()))

The file "results/all_HAV-TT.csv" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.


Unnamed: 0,Error
0,Please
1,see
2,error
3,message
4,above


### Hepatitis E typing tool output:  
[Link to the hepatitis E typing tool](https://www.rivm.nl/mpf/typingtool/hev/)  

In [27]:
if os.path.exists("results/all_HEV-TT.csv") and os.path.getsize("results/all_HEV-TT.csv") > 0:
    HEV_TT_df = pd.read_csv("results/all_HEV-TT.csv" , sep = ",")
elif os.path.exists("results/all_HEV-TT.csv") and os.path.getsize("results/all_HEV-TT.csv") == 0:
    print("No viral scaffolds with genus equal to \"Orthohepevirus\" were found in this dataset.")
    HEV_TT_df = pd.DataFrame({'NA' : ["No", "Orthohepevirus", "genus", "scaffolds", "found"]})
else:
    print("The file \"results/all_HEV-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    HEV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(HEV_TT_df.to_html()))

The file "results/all_HEV-TT.csv" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.


Unnamed: 0,Error
0,Please
1,see
2,error
3,message
4,above


### Human Papillomavirus typing tool output:  
[Link to the HPV typing tool](https://www.rivm.nl/mpf/typingtool/papillomavirus/)  

In [28]:
if os.path.exists("results/all_PV-TT.csv") and os.path.getsize("results/all_PV-TT.csv") > 0:
    PV_TT_df = pd.read_csv("results/all_PV-TT.csv" , sep = ",")
elif os.path.exists("results/all_PV-TT.csv") and os.path.getsize("results/all_PV-TT.csv") == 0:
    print("No viral scaffolds with family equal to \"Papillomaviridae\" were found in this dataset.")
    PV_TT_df = pd.DataFrame({'NA' : ["No", "Papillomaviridae", "family", "scaffolds", "found"]})
else:
    print("The file \"results/all_PV-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    PV_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(PV_TT_df.to_html()))

The file "results/all_PV-TT.csv" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.


Unnamed: 0,Error
0,Please
1,see
2,error
3,message
4,above


### Flavivirus typing tool output:  
[Link to the flavivirus typing tool](https://www.rivm.nl/mpf/typingtool/flavivirus/)  

In [29]:
if os.path.exists("results/all_Flavi-TT.csv") and os.path.getsize("results/all_Flavi-TT.csv") > 0:
    Flavi_TT_df = pd.read_csv("results/all_Flavi-TT.csv" , sep = ",")
elif os.path.exists("results/all_Flavi-TT.csv") and os.path.getsize("results/all_Flavi-TT.csv") == 0:
    print("No viral scaffolds with family equal to \"Flaviviridae\" were found in this dataset.")
    Flavi_TT_df = pd.DataFrame({'NA' : ["No", "Flaviviridae", "family", "scaffolds", "found"]})
else:
    print("The file \"results/all_Flavi-TT.csv\" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.")
    Flavi_TT_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(Flavi_TT_df.to_html()))

The file "results/all_Flavi-TT.csv" does not exist. If you want this information, please see https://github.com/DennisSchmitz/Jovian/issues/29 for instructions.


Unnamed: 0,Error
0,Please
1,see
2,error
3,message
4,above
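
Each typing-tool cell above repeats the same three-way check: the result file exists and has content, exists but is empty, or does not exist. As a minimal sketch of how that check could be factored out, the hypothetical helper `classify_result_file` (not part of Jovian) returns one of three labels that a cell could then act on.

```python
import os


def classify_result_file(path):
    """Return 'ok', 'empty' or 'missing' for a result file (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"


# Report the state of every typing-tool output file used in this notebook.
for prefix in ["NoV", "RVA", "EV", "HAV", "HEV", "PV", "Flavi"]:
    path = f"results/all_{prefix}-TT.csv"
    print(path, "->", classify_result_file(path))
```

A cell would then call `pd.read_csv` only for the `'ok'` state and print the matching message for the other two, which keeps the per-virus cells down to the file name and the taxon mentioned in the message.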


## Minority variant table:

In [30]:
if os.path.exists("results/all_filtered_SNPs.tsv"):
    filtered_VCF_df = pd.read_csv("results/all_filtered_SNPs.tsv" , sep = "\t")
else:
    print("The file \"results/all_filtered_SNPs.tsv\" does not exist. Either no SNPs were identified (perhaps the minimum allele frequency is set too high), or something went wrong; please double-check the log files below:")
    print("\t\"logs/SNP_calling_[sample_name].log\"")
    print("\t\"logs/Concat_filtered_SNPs.log\"")
    filtered_VCF_df = pd.DataFrame({'Error' : ["Please", "see", "error", "message", "above"]})

display(HTML(filtered_VCF_df.to_html()))

Unnamed: 0,Sample_name,Contig_name,Position,Identifier,Reference_base,Alternative_base,Quality,Filter_status,Total_depth_of_coverage,Allele_frequency,Strand_bias,DoC_forward_ref_allele,DoC_reverse_ref_allele,DoC_forward_non-ref_allele,DoC_reverse_non-ref_allele
0,ERR3482180_filtered.vcf,NODE_1_length_3152_cov_102.364959,377,.,G,A,10240,PASS,5909,0.111356,23,2056,3166,297,361
1,ERR3482180_filtered.vcf,NODE_1_length_3152_cov_102.364959,412,.,A,T,7934,PASS,3842,0.124414,8,658,2665,109,369
2,ERR3482180_filtered.vcf,NODE_1_length_3152_cov_102.364959,429,.,C,T,5980,PASS,4182,0.093257,8,837,2938,74,316
3,ERR3482180_filtered.vcf,NODE_1_length_3152_cov_102.364959,824,.,A,G,591,PASS,1554,0.063063,3,1194,239,85,13
4,ERR3482180_filtered.vcf,NODE_1_length_3152_cov_102.364959,2280,.,T,C,2385,PASS,3750,0.052533,0,1639,1901,90,107
5,ERR3482180_filtered.vcf,NODE_2_length_2930_cov_239.887977,29,.,A,G,113,PASS,117,0.119658,0,88,15,12,2
6,ERR3482180_filtered.vcf,NODE_2_length_2930_cov_239.887977,59,.,A,G,1414,PASS,1607,0.060361,23,1253,255,91,6
7,ERR3482180_filtered.vcf,NODE_2_length_2930_cov_239.887977,80,.,A,C,4836,PASS,3354,0.076625,10,2575,517,225,32
8,ERR3482180_filtered.vcf,NODE_2_length_2930_cov_239.887977,84,.,G,T,4164,PASS,3488,0.06422,11,2698,561,196,28
9,ERR3482180_filtered.vcf,NODE_2_length_2930_cov_239.887977,85,.,T,G,4045,PASS,3554,0.06359,11,2754,572,198,28
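
For the rows shown above, the reported `Allele_frequency` agrees with the non-reference read depth (forward plus reverse) divided by `Total_depth_of_coverage`. The sketch below recomputes that ratio for three rows copied from the table. It is a consistency check on the displayed values only, not a statement about how the variant caller derives the frequency, and `alt_fraction` is a hypothetical helper name.

```python
# Rows copied from the table above:
# (Position, Total_depth_of_coverage, Allele_frequency,
#  DoC_forward_non-ref_allele, DoC_reverse_non-ref_allele)
rows = [
    (377, 5909, 0.111356, 297, 361),
    (412, 3842, 0.124414, 109, 369),
    (824, 1554, 0.063063, 85, 13),
]


def alt_fraction(total_depth, fwd_alt, rev_alt):
    """Fraction of the total depth that supports the non-reference allele."""
    return (fwd_alt + rev_alt) / total_depth


for position, depth, reported_af, fwd_alt, rev_alt in rows:
    recomputed = alt_fraction(depth, fwd_alt, rev_alt)
    print(position, reported_af, round(recomputed, 6))
```

Comparing the forward and reverse non-reference depths in the same way gives a quick impression of how evenly a variant is supported on both strands, for example 85 forward versus 13 reverse reads at position 824.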


# Logging and audit-trail: 

### Sample sheet

In [53]:
%%bash
cat sample_sheet.yaml

ERR3482180:
  R1: /raw_data/ERR3482180/ERR3482180_1.fastq.gz
  R2: /raw_data/ERR3482180/ERR3482180_2.fastq.gz


### Snakemake summary statistics
[Open Snakemake summary statistics in separate tab by clicking here](results/snakemake_report.html#stats)

In [54]:
%%HTML
<div style="text-align: center">
    <iframe src="results/snakemake_report.html" width=100% height=980></iframe>
</div>

### All log-files:

In [55]:
%%HTML
<script>
function goBack() {
    window.history.back()
}
</script>

<button onclick="goBack()">Click this button to go back</button>

<div style="text-align: center">
    <iframe src="results/logfiles_index.html" width=100% height=980></iframe>
</div>

### Full software list in "Jovian_master" environment:

In [56]:
%%bash
cat results/log_conda.txt

# packages in environment at /opt/conda/envs/Jovian_master:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
aioeasywebdav             2.4.0                 py37_1000    conda-forge
aiohttp                   3.6.2            py37h516909a_0    conda-forge
appdirs                   1.4.3                      py_1    conda-forge
async-timeout             3.0.1                   py_1000    conda-forge
attrs                     19.3.0                     py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
bcrypt                    3.1.7            py37h8f50634_1    conda-forge
bleach                    3.1.4              pyh9f0ad1d_0    conda-forge
boto3                     1.12.48            pyh9f0ad1d_0    conda-forge
botocore                  1.15.48            pyh9f0ad1d_0    conda-f
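
The listing above is whitespace-separated with `#` comment lines, so package versions can be pulled out with a few lines of Python when an audit needs to compare environments. This is a minimal sketch on three lines copied from the listing; `parse_conda_list` is a hypothetical helper and not part of Jovian.

```python
# A few lines copied from the listing above.
conda_list_text = """\
# packages in environment at /opt/conda/envs/Jovian_master:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
aiohttp                   3.6.2            py37h516909a_0    conda-forge
bleach                    3.1.4              pyh9f0ad1d_0    conda-forge
"""


def parse_conda_list(text):
    """Map package name to version, skipping comment and blank lines."""
    packages = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a "name version ..." line
        packages[fields[0]] = fields[1]
    return packages


print(parse_conda_list(conda_list_text))
```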

### Database versions:

In [57]:
%%bash
cat results/log_db.txt

==> User-specified background reference (default: Homo Sapiens NCBI GRch38 NO DECOY genome): <==
-rw-r--r--. 1 root root 3.0G Apr 22 08:10 /mnt/database/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/genome.fa


==> Virus-Host Interaction Database: <==
-rw-r--r--. 1 root root 4.0M Apr 22 12:14 /mnt/database/Virus-Host_interaction_db/virushostdb.tsv


==> Krona Taxonomy Database: <==
total 29G
drwxr-xr-x.  3 root root   83 Apr 22 12:14 .
drwxrwxrwx. 11 root root  205 Apr 24 13:28 ..
drwxr-xr-x.  2 root root    6 Apr 22 08:25 accession2taxid
-rw-r--r--.  1 root root  28G Apr 22 08:25 all.accession2taxid.sorted
-rw-r--r--.  1 root root 113M Apr 22 12:14 taxonomy.tab


==> NCBI new_taxdump Database: <==
-rw-r--r--. 1 root root 4.6M Apr 22 12:15 /mnt/database/new_taxdump/host.dmp.delim
-rw-r--r--. 1 root root 228M Apr 22 12:15 /mnt/database/new_taxdump/rankedlineage.dmp.delim


==> NCBI Databases as specified in ~/.ncbirc: <==
/mnt/database/NR_database:
total 212G
drwxr-xr-x.  2 root root  

### Unique methodological "fingerprint":

In [58]:
%%bash
cat results/log_git.txt

This is the link to the code used for this analysis:	https://github.com/DennisSchmitz/Jovian/tree/
This code with unique fingerprint  was committed by  at 


### Snakemake config files:

In [59]:
%%bash
cat results/log_config.txt


==> Contents of file profile/cluster_config.yaml: <==
__default__:
  threads: 1
  vmem: 4G
  nodes: 1
  time: 00:30:00

QC_raw_data:
  threads: 1
  vmem: 8G

Clean_the_data:
  threads: 4
  vmem: 16G
  time: 04:00:00

QC_clean_data:
  threads: 1
  vmem: 8G

HuGo_removal_pt1_alignment:
  threads: 4
  vmem: 4G
  time: 12:00:00

HuGo_removal_pt2_extract_paired_unmapped_reads:
  threads: 4
  vmem: 2G
  time: 02:00:00

HuGo_removal_pt3_extract_unpaired_unmapped_reads:
  threads: 4
  vmem: 2G
  time: 02:00:00

De_novo_assembly:
  threads: 12
  vmem: 10G
  time: 24:00:00

Fragment_length_analysis:
  threads: 4
  vmem: 8G
  time: 04:00:00

SNP_calling:
  threads: 12
  vmem: 6G
  time: 04:00:00

ORF_analysis:
  threads: 1
  vmem: 16G
  time: 04:00:00

Generate_contigs_metrics:
  threads: 1
  vmem: 64G
  time: 04:00:00

Determine_GC_content:
  threads: 1
  vmem: 24G
  time: 04:00:00

Generate_IGVjs_html_file:
  threads: 1
  vmem: 2G
  time: 02:00:00

MultiQC_report:
  threads: 1
  vmem: 4G
  tim
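
In the cluster configuration above, most rules list only some of the four keys found under `__default__`. The sketch below assumes the usual Snakemake cluster-config convention, in which rule-specific values override `__default__` and missing keys fall back to it. The dictionary holds three entries copied from the listing, and `resolve_rule_resources` is a hypothetical helper name.

```python
# Three entries copied from profile/cluster_config.yaml as shown above.
cluster_config = {
    "__default__": {"threads": 1, "vmem": "4G", "nodes": 1, "time": "00:30:00"},
    "QC_raw_data": {"threads": 1, "vmem": "8G"},
    "De_novo_assembly": {"threads": 12, "vmem": "10G", "time": "24:00:00"},
}


def resolve_rule_resources(config, rule):
    """Start from the defaults, then apply any rule-specific overrides."""
    resolved = dict(config["__default__"])
    resolved.update(config.get(rule, {}))
    return resolved


for rule in ["QC_raw_data", "De_novo_assembly"]:
    print(rule, resolve_rule_resources(cluster_config, rule))
```

Under that assumption `QC_raw_data` would run with 8G of memory but keep the default 30-minute time limit, while `De_novo_assembly` overrides threads, memory and time and inherits only `nodes`.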

# Acknowledgements:

|Name |Publication|Website|
|:---|:---|:---|
|BBtools|NA|https://jgi.doe.gov/data-and-tools/bbtools/|
|BEDtools|Quinlan, A.R. and I.M. Hall, BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010. 26(6): p. 841-842.|https://bedtools.readthedocs.io/en/latest/|
|BLAST|Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997. 25(17): p. 3389-3402.|https://www.ncbi.nlm.nih.gov/books/NBK279690/|
|BWA|Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.|https://github.com/lh3/bwa|
|BioConda|Grüning, B., et al., Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 2018. 15(7): p. 475.|https://bioconda.github.io/|
|Biopython|Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., ... & De Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422-1423.|https://biopython.org/|
|Bokeh|Bokeh Development Team (2018). Bokeh: Python library for interactive visualization.|https://bokeh.pydata.org/en/latest/|
|Bowtie2|Langmead, B. and S.L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nature Methods, 2012. 9(4): p. 357.|http://bowtie-bio.sourceforge.net/bowtie2/index.shtml|
|Conda|NA|https://conda.io/|
|DRMAA|NA|http://drmaa-python.github.io/|
|FastQC|Andrews, S., FastQC: a quality control tool for high throughput sequence data. 2010.|https://www.bioinformatics.babraham.ac.uk/projects/fastqc/|
|gawk|NA|https://www.gnu.org/software/gawk/|
|GNU Parallel|O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.|https://www.gnu.org/software/parallel/|
|Git|NA|https://git-scm.com/|
|igvtools|NA|https://software.broadinstitute.org/software/igv/igvtools|
|Jupyter Notebook|Kluyver, Thomas, et al. "Jupyter Notebooks-a publishing format for reproducible computational workflows." ELPUB. 2016.|https://jupyter.org/|
|Jupyter_contrib_nbextension|NA|https://github.com/ipython-contrib/jupyter_contrib_nbextensions|
|Jupyterthemes|NA|https://github.com/dunovank/jupyter-themes|
|Krona|Ondov, B.D., N.H. Bergman, and A.M. Phillippy, Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 2011. 12: p. 385.|https://github.com/marbl/Krona/wiki|
|Lofreq|Wilm, A., et al., LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Research, 2012. 40(22): p. 11189-11201.|http://csb5.github.io/lofreq/|
|Minimap2|Li, H., Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 2018.|https://github.com/lh3/minimap2|
|MultiQC|Ewels, P., et al., MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 2016. 32(19): p. 3047-3048.|https://multiqc.info/|
|Nb_conda|NA|https://github.com/Anaconda-Platform/nb_conda|
|Nb_conda_kernels|NA|https://github.com/Anaconda-Platform/nb_conda_kernels|
|Nginx|NA|https://www.nginx.com/|
|Numpy|Walt, S. V. D., Colbert, S. C., & Varoquaux, G. (2011). The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), 22-30.|http://www.numpy.org/|
|Pandas|McKinney, W. Data structures for statistical computing in python. in Proceedings of the 9th Python in Science Conference. 2010. Austin, TX.|https://pandas.pydata.org/|
|Picard|NA|https://broadinstitute.github.io/picard/|
|Prodigal|Hyatt, D., et al., Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 2010. 11(1): p. 119.|https://github.com/hyattpd/Prodigal/wiki/Introduction|
|Python|G. van Rossum, Python tutorial, Technical Report CS-R9526, Centrum voor Wiskunde en Informatica (CWI), Amsterdam, May 1995.|https://www.python.org/|
|Qgrid|NA|https://github.com/quantopian/qgrid|
|SAMtools|Li, H., et al., The sequence alignment/map format and SAMtools. Bioinformatics, 2009. 25(16): p. 2078-2079.|http://www.htslib.org/|
|SPAdes|Nurk, S., et al., metaSPAdes: a new versatile metagenomic assembler. Genome Res, 2017. 27(5): p. 824-834.|http://cab.spbu.ru/software/spades/|
|Seqtk|NA|https://github.com/lh3/seqtk|
|Snakemake|Köster, J. and S. Rahmann, Snakemake—a scalable bioinformatics workflow engine. Bioinformatics, 2012. 28(19): p. 2520-2522.|https://snakemake.readthedocs.io/en/stable/|
|Tabix|NA|www.htslib.org/doc/tabix.html|
|tree|NA|http://mama.indstate.edu/users/ice/tree/|
|Trimmomatic|Bolger, A.M., M. Lohse, and B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014. 30(15): p. 2114-20.|www.usadellab.org/cms/?page=trimmomatic|
|Virus-Host Database|Mihara, T., Nishimura, Y., Shimizu, Y., Nishiyama, H., Yoshikawa, G., Uehara, H., ... & Ogata, H. (2016). Linking virus genomes with host taxonomy. Viruses, 8(3), 66.|http://www.genome.jp/virushostdb/note.html|
|Virus-typing tools|Kroneman, A., Vennema, H., Deforche, K., Avoort, H. V. D., Penaranda, S., Oberste, M. S., ... & Koopmans, M. (2011). An automated genotyping tool for enteroviruses and noroviruses. Journal of Clinical Virology, 51(2), 121-125.|https://www.ncbi.nlm.nih.gov/pubmed/21514213|

#### Authors:
- Dennis Schmitz ([RIVM](https://www.rivm.nl/en) and [EMC](https://www6.erasmusmc.nl/viroscience/))  
- Sam Nooij ([RIVM](https://www.rivm.nl/en) and [EMC](https://www6.erasmusmc.nl/viroscience/))  
- Robert Verhagen ([RIVM](https://www.rivm.nl/en))  
- Thierry Janssens ([RIVM](https://www.rivm.nl/en))  
- Jeroen Cremer ([RIVM](https://www.rivm.nl/en))  
- Florian Zwagemaker ([RIVM](https://www.rivm.nl/en))  
- Mark Kroon ([RIVM](https://www.rivm.nl/en))  
- Erwin van Wieringen ([RIVM](https://www.rivm.nl/en))  
- Harry Vennema ([RIVM](https://www.rivm.nl/en))  
- Annelies Kroneman ([RIVM](https://www.rivm.nl/en))  
- Marion Koopmans ([EMC](https://www6.erasmusmc.nl/viroscience/)) 

Jovian is available on [GitHub](https://github.com/DennisSchmitz/Jovian) under an [AGPL license](https://www.gnu.org/licenses/agpl-3.0). The virus-typing tools are public services hosted by the [RIVM](https://www.rivm.nl/en) and developed independently of Jovian.