# 11_pharokka_genome_summaries

This document summarizes the pharokka output for the 10 genomes selected in the 10_interesting_virus_host_pairs Jupyter notebook. pharokka was run on Galaxy. Each virus was run through pharokka separately because running multiple genomes at once produced errors. vMAGs were run in meta mode. All output files are uploaded to the Google Drive folder MH_project > prokka_output, with file names edited to include the virus name. There are separate Galaxy histories for vSAGs and vMAGs.

| virus name     | variable name | type | sample depth |
|----------------|---------------|------|--------------|
| vir_AM-654-B04 | vsag1         | vSAG | 80           |
| vir_AM-654-E17 | vsag2         | vSAG | 80           |
| vir_AM-656-P04 | vsag3         | vSAG | 95           |
| vir_AM-662-D22 | vsag4         | vSAG | 140          |
| vir_AM-666-P13 | vsag5         | vSAG | 400          |
| jv119_vMAG_29  | vmag1         | vMAG | 400          |
| jv119_vMAG_32  | vmag2         | vMAG | 400          |
| jv121_vMAG_31  | vmag3         | vMAG | 95           |
| jv154_vMAG_31  | vmag4         | vMAG | 140          |
| jv154_vMAG_44  | vmag5         | vMAG | 140          |

## Load packages and data

In [92]:
import pandas as pd
import os
import sys
import csv
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import glob
import seaborn as sns
from collections import Counter

pd.set_option('display.max_columns', None)

vsag1 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/vir_AM-654-B04/pharokka_cds_final_merged_output_vir_AM-654-B04.tsv", sep = '\t' )

vsag2 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/vir_AM-654-E17/pharokka_cds_final_merged_output_vir_AM-654-E17.tsv", sep = '\t' )

vsag3 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/vir_AM-656-P04/pharokka_cds_final_merged_output_vir_AM-656-P04.tsv", sep = '\t' )

vsag4 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/vir_AM-662-D22/pharokka_cds_final_merged_output_vir_AM-662-D22.tsv", sep = '\t' )

vsag5 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/vir_AM-666-P13/pharokka_cds_final_merged_output_vir_AM-666-P13.tsv", sep = '\t' )

vmag1 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/jv119_vMAG_29/pharokka_cds_final_merged_output_jv119_vMAG_29.tsv", sep = '\t' )

vmag2 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/jv119_vMAG_32/pharokka_cds_final_merged_output_jv119_vMAG_32.tsv", sep = '\t' )

vmag3 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/jv121_vMAG_31/pharokka_cds_final_merged_output_jv121_vMAG_31.tsv", sep = '\t' )

vmag4 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/jv154_vMAG_31/pharokka_cds_final_merged_output_jv154_vMAG_31.tsv", sep = '\t' )

vmag5 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output/jv154_vMAG_44/pharokka_cds_final_merged_output_jv154_vMAG_44.tsv", sep = '\t' )
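
The ten `read_csv` calls above differ only in the virus name, so the same loading step could be written as a loop. The following is a minimal sketch, not the code that was actually run: `GENOMES`, `pharokka_tsv_path`, and `load_pharokka_tables` are hypothetical names introduced here, and the directory layout is assumed to match the paths used above.

```python
from pathlib import Path

import pandas as pd

# Variable name -> virus name, copied from the table at the top of this notebook.
GENOMES = {
    "vsag1": "vir_AM-654-B04",
    "vsag2": "vir_AM-654-E17",
    "vsag3": "vir_AM-656-P04",
    "vsag4": "vir_AM-662-D22",
    "vsag5": "vir_AM-666-P13",
    "vmag1": "jv119_vMAG_29",
    "vmag2": "jv119_vMAG_32",
    "vmag3": "jv121_vMAG_31",
    "vmag4": "jv154_vMAG_31",
    "vmag5": "jv154_vMAG_44",
}


def pharokka_tsv_path(base_dir, virus_name):
    """Build the path to one genome's merged CDS table (hypothetical helper)."""
    file_name = f"pharokka_cds_final_merged_output_{virus_name}.tsv"
    return Path(base_dir).expanduser() / virus_name / file_name


def load_pharokka_tables(base_dir, genomes=GENOMES):
    """Read every merged CDS table into a dict keyed by variable name."""
    return {
        key: pd.read_csv(pharokka_tsv_path(base_dir, name), sep="\t")
        for key, name in genomes.items()
    }
```

With this sketch, `tables = load_pharokka_tables("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/pharokka_output")` would give `tables["vsag1"]` in place of the standalone `vsag1` variable. The cells below keep using the individual variables.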


## vSAGs

### vSAG1: vir_AM-654-B04

In [25]:
len(vsag1)

164

In [32]:
vsag1['category'].value_counts()

unknown function                                     107
DNA, RNA and nucleotide metabolism                    16
other                                                 13
moron, auxiliary metabolic gene and host takeover      9
head and packaging                                     8
tail                                                   8
connector                                              2
transcription regulation                               1
Name: category, dtype: int64

In [33]:
vsag1_AMG = vsag1[vsag1['category'] == 'moron, auxiliary metabolic gene and host takeover']
vsag1_AMG['annot'].value_counts()

glycosyltransferase                                   2
glucosyltransferase                                   1
phosphoheptose isomerase                              1
phosphoribosylaminoimidazole synthetase               1
5-phosphoribosylformylglycinamide amidotransferase    1
porphyrin biosynthesis                                1
Galactose-3-O-sulfotransferase                        1
ferredoxin                                            1
Name: annot, dtype: int64

In [34]:
len(vsag1_AMG)

9
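
The same filter on the `category` column is repeated for every genome that has hits in the moron/AMG/host takeover category. A small helper could wrap it. This is only a sketch: `AMG_CATEGORY` and `amg_annotations` are hypothetical names, and the toy table holds made-up rows that use the same `category` and `annot` columns as the pharokka tables.

```python
import pandas as pd

# Category label exactly as it appears in the pharokka output above.
AMG_CATEGORY = "moron, auxiliary metabolic gene and host takeover"


def amg_annotations(cds):
    """Count annotations among CDS in the moron/AMG/host takeover category."""
    subset = cds[cds["category"] == AMG_CATEGORY]
    return subset["annot"].value_counts()


# Made-up rows for illustration only; not taken from any of the ten genomes.
toy = pd.DataFrame(
    {
        "category": [AMG_CATEGORY, AMG_CATEGORY, "tail", AMG_CATEGORY],
        "annot": [
            "glycosyltransferase",
            "ferredoxin",
            "tail fiber protein",
            "glycosyltransferase",
        ],
    }
)

counts = amg_annotations(toy)
print(counts)
```

Calling `amg_annotations(vsag1)` should reproduce the annotation counts shown above, and `amg_annotations(vsag1).sum()` the total of 9.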

### vSAG2: vir_AM-654-E17

In [81]:
len(vsag2)

212

In [29]:
vsag2['category'].value_counts()

unknown function                                     152
tail                                                  18
DNA, RNA and nucleotide metabolism                    15
other                                                  8
head and packaging                                     7
moron, auxiliary metabolic gene and host takeover      5
connector                                              3
transcription regulation                               3
lysis                                                  1
Name: category, dtype: int64

In [35]:
vsag2_AMG = vsag2[vsag2['category'] == 'moron, auxiliary metabolic gene and host takeover']
vsag2_AMG['annot'].value_counts()

anti-restriction protein       1
porphyrin biosynthesis         1
2OG-Fe(II) oxygenase           1
5'-3' deoxyribonucleotidase    1
ribosomal protein S21          1
Name: annot, dtype: int64

In [36]:
len(vsag2_AMG)

5

### vSAG3: vir_AM-656-P04

In [39]:
len(vsag3)

51

In [40]:
vsag3['category'].value_counts()

unknown function                      29
DNA, RNA and nucleotide metabolism     8
head and packaging                     5
tail                                   5
other                                  2
connector                              1
lysis                                  1
Name: category, dtype: int64

### vSAG4: vir_AM-662-D22

In [44]:
len(vsag4)

60

In [47]:
vsag4['category'].value_counts()

unknown function                      46
head and packaging                     6
DNA, RNA and nucleotide metabolism     5
tail                                   3
Name: category, dtype: int64

### vSAG5: vir_AM-666-P13

In [49]:
len(vsag5)

194

In [50]:
vsag5['category'].value_counts()

unknown function                                     164
DNA, RNA and nucleotide metabolism                    11
other                                                  9
head and packaging                                     5
moron, auxiliary metabolic gene and host takeover      3
tail                                                   2
Name: category, dtype: int64

In [51]:
vsag5_AMG = vsag5[vsag5['category'] == 'moron, auxiliary metabolic gene and host takeover']
vsag5_AMG['annot'].value_counts()

2OG-Fe(II) oxygenase                               1
QueE-like  radical SAM domain                      1
QueD-like  6-pyruvoyl-tetrahydropterin synthase    1
Name: annot, dtype: int64

## vMAGs

### vMAG1: jv119_vMAG_29

In [77]:
len(vmag1)

101

In [78]:
vmag1['category'].value_counts()

unknown function                      91
integration and excision               3
DNA, RNA and nucleotide metabolism     3
other                                  2
transcription regulation               1
head and packaging                     1
Name: category, dtype: int64

### vMAG2: jv119_vMAG_32

In [82]:
len(vmag2)

40

In [83]:
vmag2['category'].value_counts()

unknown function                                     31
DNA, RNA and nucleotide metabolism                    5
head and packaging                                    2
other                                                 1
moron, auxiliary metabolic gene and host takeover     1
Name: category, dtype: int64

In [84]:
vmag2_AMG = vmag2[vmag2['category'] == 'moron, auxiliary metabolic gene and host takeover']
vmag2_AMG['annot'].value_counts()

QueC-like queuosine biosynthesis    1
Name: annot, dtype: int64

### vMAG3: jv121_vMAG_31

In [87]:
len(vmag3)

74

In [88]:
vmag3['category'].value_counts()

unknown function                      68
DNA, RNA and nucleotide metabolism     4
head and packaging                     1
other                                  1
Name: category, dtype: int64

### vMAG4: jv154_vMAG_31

In [90]:
len(vmag4)

51

In [91]:
vmag4['category'].value_counts()

unknown function            47
integration and excision     2
transcription regulation     1
head and packaging           1
Name: category, dtype: int64

### vMAG5: jv154_vMAG_44

In [93]:
len(vmag5)

109

In [94]:
vmag5['category'].value_counts()

unknown function                      95
DNA, RNA and nucleotide metabolism     7
head and packaging                     4
other                                  2
tail                                   1
Name: category, dtype: int64
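
To compare genomes side by side, the per-genome category counts could be combined into a single table. The sketch below is one possible way to do that, using the counts reported above for vsag3 and vmag4 as input. `summarize_categories`, `total_cds`, and `frac_unknown` are hypothetical names, and the function assumes at least one genome has an "unknown function" row.

```python
import pandas as pd


def summarize_categories(counts_by_genome):
    """Combine per-genome category counts into a genomes-by-categories table."""
    # Categories missing from a genome become NaN after alignment; treat them as 0.
    table = pd.DataFrame(counts_by_genome).T.fillna(0).astype(int)
    table["total_cds"] = table.sum(axis=1)
    table["frac_unknown"] = table["unknown function"] / table["total_cds"]
    return table


# Category counts copied from the vsag3 and vmag4 outputs above.
counts_by_genome = {
    "vsag3": pd.Series(
        {
            "unknown function": 29,
            "DNA, RNA and nucleotide metabolism": 8,
            "head and packaging": 5,
            "tail": 5,
            "other": 2,
            "connector": 1,
            "lysis": 1,
        }
    ),
    "vmag4": pd.Series(
        {
            "unknown function": 47,
            "integration and excision": 2,
            "transcription regulation": 1,
            "head and packaging": 1,
        }
    ),
}

summary = summarize_categories(counts_by_genome)
print(summary[["total_cds", "frac_unknown"]].round(2))
```

With the loaded tables, the input could instead be built as `{name: df["category"].value_counts() for name, df in ...}` over all ten genomes, so that `total_cds` matches each `len(...)` reported above.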