# 12_dramv_candidate_genome_summaries

This document includes example commands for running DRAM on host genomes and DRAM-v on both vMAGs and vSAGs, as well as summaries of the DRAM-v output for the 10 genomes selected in the 10_interesting_virus_host_pairs Jupyter notebook.


| virus name     | variable name | type | sample depth |
|----------------|---------------|------|--------------|
| vir_AM-654-B04 | vsag1         | vSAG | 80           |
| vir_AM-654-E17 | vsag2         | vSAG | 80           |
| vir_AM-656-P04 | vsag3         | vSAG | 95           |
| vir_AM-662-D22 | vsag4         | vSAG | 140          |
| vir_AM-666-P13 | vsag5         | vSAG | 400          |
| jv119_vMAG_29  | vmag1         | vMAG | 400          |
| jv119_vMAG_32  | vmag2         | vMAG | 400          |
| jv121_vMAG_31  | vmag3         | vMAG | 95           |
| jv154_vMAG_31  | vmag4         | vMAG | 140          |
| jv154_vMAG_44  | vmag5         | vMAG | 140          |

## DRAM command line code examples

In [None]:
# DRAM- for hosts

DRAM.py annotate \
-i 'input_file_path' \
-o annot_output_folder

DRAM.py distill \
-i annot_output_folder/annotations.tsv \
-o distill_output_folder_name \
--trna_path annot_output_folder/trnas.tsv \
--rrna_path annot_output_folder/rrnas.tsv

# example using a single host SAG file on my laptop

DRAM.py annotate \
-i "$HOME/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/genomes/sag_hosts/AM-656-B04_nvcontigs.fasta" \
-o annotations

DRAM.py distill \
-i annotations/annotations.tsv \
-o distill \
--trna_path annotations/trnas.tsv \
--rrna_path annotations/rrnas.tsv
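
For more than one host genome, the annotate and distill commands above can be generated per file from Python. This is a minimal sketch, assuming one FASTA per host and per-genome output folders named after the file stem; the helper `dram_host_commands` and that folder naming are hypothetical, and only the flags already used in the cell above appear. It expands `~` in Python because no shell is involved when an argument list is passed to `subprocess.run`.

```python
import shlex
from pathlib import Path


def dram_host_commands(fasta, out_root):
    """Build DRAM.py annotate + distill argument lists for one host genome.

    Hypothetical helper: output folders are named after the FASTA file stem.
    """
    fasta = Path(fasta).expanduser()  # expand '~' here; subprocess lists skip the shell
    annot_dir = Path(out_root) / f"{fasta.stem}_annotations"
    distill_dir = Path(out_root) / f"{fasta.stem}_distill"
    annotate = ["DRAM.py", "annotate", "-i", str(fasta), "-o", str(annot_dir)]
    distill = [
        "DRAM.py", "distill",
        "-i", str(annot_dir / "annotations.tsv"),
        "-o", str(distill_dir),
        "--trna_path", str(annot_dir / "trnas.tsv"),
        "--rrna_path", str(annot_dir / "rrnas.tsv"),
    ]
    return annotate, distill


if __name__ == "__main__":
    for cmd in dram_host_commands("hosts/AM-656-B04_nvcontigs.fasta", "dram_out"):
        # Print for review; replace with subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Building the argument lists separately from running them makes it easy to check the paths before launching a long DRAM job.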

In [None]:
# DRAM-v

# First, run the contigs through VirSorter to identify viral contigs, then concatenate the viral contigs into a single FASTA file.
# Run the concatenated FASTA file and the VIRSorter_affi-contigs.tab file through DRAM-v.

DRAM-v.py annotate \
-i input_viral_path \
-v VIRSorter_affi-contigs.tab \
-o annot_output_folder

DRAM-v.py distill \
-i annot_output_folder/annotations.tsv \
-o distill

# example using a single vSAG file on my laptop

DRAM-v.py annotate \
-i "$HOME/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/genomes/vsags/cv1_AM-656-B04.fasta" \
-v VIRSorter_affi-contigs.tab \
-o annotations
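
The comment at the top of the DRAM-v cell says to concatenate the viral contigs into a single file before running DRAM-v. A minimal standard-library sketch of that step, assuming one FASTA file per genome; `concat_fastas` is a hypothetical helper. It refuses duplicate sequence IDs, on the reasoning that two identical headers in one combined file would make it impossible to tell which genome a contig came from. It only handles the FASTA side; the matching VirSorter affi-contigs table is not touched here.

```python
def concat_fastas(fasta_paths, out_path):
    """Concatenate FASTA files into one, refusing duplicate sequence IDs.

    Returns the number of sequences written.
    """
    seen = set()
    n_seqs = 0
    with open(out_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        seq_id = line[1:].split()[0]  # ID = header text before first whitespace
                        if seq_id in seen:
                            raise ValueError(f"duplicate sequence ID {seq_id!r} in {path}")
                        seen.add(seq_id)
                        n_seqs += 1
                    if line.strip():  # skip blank lines
                        # make sure the last line of each file ends with a newline
                        out.write(line if line.endswith("\n") else line + "\n")
    return n_seqs
```

Usage, for a hypothetical `vsags/` folder: `concat_fastas(sorted(Path("vsags").glob("*.fasta")), "all_viral_contigs.fasta")`, with `from pathlib import Path`.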

## Load packages and data

In [55]:
import pandas as pd
import os
import sys
import csv
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import glob
import seaborn as sns
from collections import Counter

pd.set_option('display.max_columns', None)

vsag1 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-654-B04/annotations.tsv", sep = '\t').fillna('NA')
vsag2 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-654-E17/annotations.tsv", sep = '\t').fillna('NA')
vsag3 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-656-P04/annotations.tsv", sep = '\t').fillna('NA')
vsag4 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-662-D22/annotations.tsv", sep = '\t').fillna('NA')
vsag5 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-666-P13/annotations.tsv", sep = '\t').fillna('NA')

vmag1 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-119-vMAG_29/annotations.tsv", sep = '\t').fillna('NA')
vmag2 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-119-vMAG_32/annotations.tsv", sep = '\t').fillna('NA')
vmag3 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-121-vMAG_31/annotations.tsv", sep = '\t').fillna('NA')
vmag4 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-154-vMAG_31/annotations.tsv", sep = '\t').fillna('NA')
vmag5 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-154-vMAG_44/annotations.tsv", sep = '\t').fillna('NA')
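
The ten `read_csv` calls above differ only in the folder name. A sketch that builds the same DataFrames from a name-to-folder mapping; `load_dramv_annotations` and `DRAMV_FOLDERS` are hypothetical names, and the folder names are copied from the paths above (the on-disk folders use `jv-119-vMAG_29`-style names, while the summary table uses `jv119_vMAG_29`).

```python
from pathlib import Path

import pandas as pd

# Folder names copied from the read_csv paths in the cell above.
DRAMV_FOLDERS = {
    "vsag1": "cv1_AM-654-B04",
    "vsag2": "cv1_AM-654-E17",
    "vsag3": "cv1_AM-656-P04",
    "vsag4": "cv1_AM-662-D22",
    "vsag5": "cv1_AM-666-P13",
    "vmag1": "jv-119-vMAG_29",
    "vmag2": "jv-119-vMAG_32",
    "vmag3": "jv-121-vMAG_31",
    "vmag4": "jv-154-vMAG_31",
    "vmag5": "jv-154-vMAG_44",
}


def load_dramv_annotations(base_dir, folders=DRAMV_FOLDERS):
    """Read <base_dir>/<folder>/annotations.tsv for each genome, filling NaN with 'NA'."""
    base = Path(base_dir).expanduser()
    return {
        name: pd.read_csv(base / folder / "annotations.tsv", sep="\t").fillna("NA")
        for name, folder in folders.items()
    }
```

Usage: `genomes = load_dramv_annotations("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output")`, after which `genomes["vsag1"]` holds the same table as `vsag1`.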

## vSAGs

### vSAG1

In [2]:
# How many genes does DRAM-v recognize?
vsag1_annot = vsag1[vsag1['rank'] != 'E'] # E = no annotation
len(vsag1_annot)

63

In [9]:
# subset out AMGs
vsag1_amg = vsag1_annot[vsag1_annot['amg_flags'].str.contains('M')]
len(vsag1_amg)

14
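
The two filters above (drop rank `E`, then keep rows whose `amg_flags` contain `M`) are repeated for every genome in this notebook. A sketch of the same two filters as one function, plus a helper that collects the counts for several genomes into one table; `count_annotated_and_amgs` and `summary_table` are hypothetical names, and the sketch assumes the tables were loaded with `fillna('NA')` as above.

```python
import pandas as pd


def count_annotated_and_amgs(df):
    """Return (genes with rank != 'E', of those, genes whose amg_flags contain 'M')."""
    annotated = df[df["rank"] != "E"]  # same rank filter as the cells in this notebook
    # astype(str) keeps the .str accessor usable even if the column is not object dtype;
    # regex=False matches the letter 'M' literally
    is_amg = annotated["amg_flags"].astype(str).str.contains("M", regex=False)
    return len(annotated), int(is_amg.sum())


def summary_table(genomes):
    """Build one row of counts per genome from a dict of name -> annotations DataFrame."""
    rows = []
    for name, df in genomes.items():
        n_annot, n_amg = count_annotated_and_amgs(df)
        rows.append({"genome": name, "annotated_genes": n_annot, "amg_flagged": n_amg})
    return pd.DataFrame(rows).set_index("genome")
```

Usage: `summary_table({"vsag1": vsag1, "vsag2": vsag2, "vmag1": vmag1})`. Because the filters are the same as in the cells here, the vSAG1 row should show 63 and 14, matching the outputs above.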

In [10]:
vsag1_amg['kegg_hit'].value_counts()

NA                                                                      6
ferredoxin-nitrite reductase [EC:1.7.7.1]                               1
3-methyl-2-oxobutanoate hydroxymethyltransferase [EC:2.1.2.11]          1
GDP-D-mannose 3', 5'-epimerase [EC:5.1.3.18 5.1.3.-]                    1
ribonucleoside-diphosphate reductase beta chain [EC:1.17.4.1]           1
phosphoribosylformylglycinamidine cyclo-ligase [EC:6.3.3.1]             1
phosphoribosylformylglycinamidine synthase subunit PurS [EC:6.3.5.3]    1
cobaltochelatase CobS [EC:6.6.1.2]                                      1
ferredoxin                                                              1
Name: kegg_hit, dtype: int64

In [11]:
vsag1_amg['viral_hit'].value_counts()

NA                                                                                                                         1
YP_004324307.1 nucleotide-sugar epimerase [Synechococcus phage S-SSM7]                                                     1
YP_004322358.1 cysteine dioxygenase [Synechococcus phage S-SM2]                                                            1
YP_009810871.1 bifunctional heptose 7-phosphate kinase/heptose 1-phosphate adenyltransferase [Synechococcus phage S-T4]    1
YP_004061678.1 nucleotide-sugar epimerase [Ostreococcus lucimarinus virus 1]                                               1
YP_009172525.1 hypothetical protein APZ24_gp034 [Ostreococcus lucimarinus virus 2]                                         1
YP_004322339.1 nucleotide-sugar epimerase [Synechococcus phage S-SM2]                                                      1
YP_004322326.1 ribonucleoside diphosphate reductase small subunit [Synechococcus phage S-SM2]                              1


In [12]:
vsag1_amg['pfam_hits'].value_counts()

Nitrite and sulphite reductase 4Fe-4S domain [PF01077.25]; Nitrite/Sulfite reductase ferredoxin-like half domain [PF03460.20]                                                                                      1
NAD dependent epimerase/dehydratase family [PF01370.24]; GDP-mannose 4,6 dehydratase [PF16363.8]; RmlD substrate binding domain [PF04321.20]                                                                       1
Mannose-6-phosphate isomerase [PF01050.21]; Cupin [PF00190.25]; D-lyxose isomerase [PF07385.15]                                                                                                                    1
Cytidylyltransferase-like [PF01467.29]                                                                                                                                                                             1
NAD dependent epimerase/dehydratase family [PF01370.24]; GDP-mannose 4,6 dehydratase [PF16363.8]; RmlD substrate binding domain [PF04321.20]; 3-beta
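
`value_counts()` on `pfam_hits` treats each semicolon-separated combination as one category, so a domain shared by several genes is split across rows (the NAD dependent epimerase/dehydratase family, for example, appears in more than one combination above). A sketch that counts individual Pfam domains instead, assuming entries are separated by `;` as in the output above and that missing values were filled with `'NA'`; `count_pfam_domains` is a hypothetical name.

```python
from collections import Counter

import pandas as pd


def count_pfam_domains(pfam_hits):
    """Count individual Pfam domains in a Series of ';'-separated pfam_hits strings."""
    counts = Counter()
    for entry in pfam_hits:
        if entry == "NA":  # placeholder written by fillna('NA'), not a domain
            continue
        # a set, so a domain listed twice for one gene is still counted once for that gene
        domains = {dom.strip() for dom in str(entry).split(";") if dom.strip()}
        counts.update(domains)
    return pd.Series(counts, dtype="int64").sort_values(ascending=False)
```

Usage: `count_pfam_domains(vsag1_amg['pfam_hits'])` gives the number of putative AMGs carrying each domain.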

In [64]:
vsag1_amg

Unnamed: 0.1,Unnamed: 0,fasta,scaffold,gene_position,start_position,end_position,strandedness,rank,ko_id,kegg_hit,viral_id,viral_hit,viral_RBH,viral_identity,viral_bitScore,viral_eVal,peptidase_id,peptidase_family,peptidase_hit,peptidase_RBH,peptidase_identity,peptidase_bitScore,peptidase_eVal,pfam_hits,cazy_ids,cazy_hits,cazy_subfam_ec,cazy_best_hit,vogdb_id,vogdb_hits,vogdb_categories,heme_regulatory_motif_count,is_transposon,amg_flags
2,SCGC_AM-654-B04_contig1||full_3,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,3,506,2014,-1,C,K00366,ferredoxin-nitrite reductase [EC:1.7.7.1],,,,,,,,,,,,,,Nitrite and sulphite reductase 4Fe-4S domain [...,,,,,,,,0,False,MKF
11,SCGC_AM-654-B04_contig1||full_12,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,12,7487,8338,-1,D,,,YP_004324307.1,YP_004324307.1 nucleotide-sugar epimerase [Syn...,True,0.677,379.0,0.0,,,,,,,,NAD dependent epimerase/dehydratase family [PF...,,,,,VOG00055,sp|Q9SYM5|RHM1_ARATH Trifunctional UDP-glucose...,Xh,0,False,MK
12,SCGC_AM-654-B04_contig1||full_13,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,13,8340,8714,-1,D,,,YP_004322358.1,YP_004322358.1 cysteine dioxygenase [Synechoco...,False,0.817,213.0,0.0,,,,,,,,Mannose-6-phosphate isomerase [PF01050.21]; Cu...,,,,,VOG01945,sp|P07874|ALGA_PSEAE Alginate biosynthesis pro...,Xh,0,False,MK
15,SCGC_AM-654-B04_contig1||full_16,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,16,9729,10145,-1,D,,,YP_009810871.1,YP_009810871.1 bifunctional heptose 7-phosphat...,False,0.579,149.0,0.0,,,,,,,,Cytidylyltransferase-like [PF01467.29],,,,,,,,0,False,MKB
16,SCGC_AM-654-B04_contig1||full_17,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,17,10136,10996,-1,D,,,YP_004061678.1,YP_004061678.1 nucleotide-sugar epimerase [Ost...,False,0.299,104.0,0.0,,,,,,,,NAD dependent epimerase/dehydratase family [PF...,,,,,VOG00055,sp|Q9SYM5|RHM1_ARATH Trifunctional UDP-glucose...,Xh,0,False,MKB
17,SCGC_AM-654-B04_contig1||full_18,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,18,10977,11849,-1,C,K00606,3-methyl-2-oxobutanoate hydroxymethyltransfera...,YP_009172525.1,YP_009172525.1 hypothetical protein APZ24_gp03...,False,0.587,320.0,0.0,,,,,,,,Ketopantoate hydroxymethyltransferase [PF02548...,,,,,VOG19288,sp|A6TMH9|PANB_ALKMQ 3-methyl-2-oxobutanoate h...,Xh,0,False,MKB
26,SCGC_AM-654-B04_contig1||full_27,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,27,18883,19953,-1,C,K10046,"GDP-D-mannose 3', 5'-epimerase [EC:5.1.3.18 5....",YP_004322339.1,YP_004322339.1 nucleotide-sugar epimerase [Syn...,True,0.799,591.0,0.0,,,,,,,,"GDP-mannose 4,6 dehydratase [PF16363.8]; NAD d...",,,,,VOG00055,sp|Q9SYM5|RHM1_ARATH Trifunctional UDP-glucose...,Xh,0,False,MK
39,SCGC_AM-654-B04_contig1||full_40,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,40,24677,25828,-1,C,K00526,ribonucleoside-diphosphate reductase beta chai...,YP_004322326.1,YP_004322326.1 ribonucleoside diphosphate redu...,True,0.88,702.0,0.0,,,,,,,,"Ribonucleotide reductase, small chain [PF00268...",,,,,VOG00527,sp|P69924|RIR2_ECOLI Ribonucleoside-diphosphat...,Xh,0,False,MKE
40,SCGC_AM-654-B04_contig1||full_41,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,41,25818,28154,-1,D,,,YP_009791335.1,YP_009791335.1 ribonucleotide reductase large ...,True,0.841,1351.0,0.0,,,,,,,,"Ribonucleotide reductase, barrel domain [PF028...",,,,,VOG00122,sp|P00452|RIR1_ECOLI Ribonucleoside-diphosphat...,Xh,0,False,MKE
55,SCGC_AM-654-B04_contig1||full_56,cv1_AM-654-B04,SCGC_AM-654-B04_contig1||full,56,38300,39274,-1,C,K01933,phosphoribosylformylglycinamidine cyclo-ligase...,YP_009320653.1,YP_009320653.1 phosphoribosylaminoimidazole sy...,True,0.605,377.0,0.0,,,,,,,,"AIR synthase related protein, N-terminal domai...",,,,,VOG09964,sp|Q3AMJ2|PUR5_SYNSC Phosphoribosylformylglyci...,Xh,0,False,MK
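
Each putative AMG carries a string of single-letter flags in `amg_flags` (`MKF`, `MKB`, `MKE` and so on). A sketch that spells those letters out and tabulates them per gene. The meanings in `AMG_FLAG_MEANINGS` are our reading of the DRAM-v README, not something this notebook establishes; check them against the DRAM version you ran. The meaning of `B` is deliberately left unfilled. The `'NA'` placeholder from `fillna('NA')` is blanked first so its `N` and `A` are not read as flags. `decode_amg_flags` and `flag_table` are hypothetical names.

```python
import pandas as pd

# Assumed meanings, from our reading of the DRAM-v README; verify for your DRAM version.
AMG_FLAG_MEANINGS = {
    "V": "viral-like gene",
    "M": "metabolic gene",
    "K": "known AMG",
    "E": "experimentally verified AMG",
    "F": "near the end of the contig",
    "B": "meaning not confirmed here; see the DRAM-v README",
}


def decode_amg_flags(flags):
    """Expand a flag string such as 'MKF' into a list of descriptions."""
    if flags == "NA":  # missing value filled by fillna('NA')
        return []
    return [AMG_FLAG_MEANINGS.get(ch, f"unknown flag {ch!r}") for ch in flags]


def flag_table(amg_df):
    """One boolean column per flag letter seen in amg_df['amg_flags'], one row per gene."""
    # Blank out the 'NA' placeholder so its N and A are not read as flags
    flags = amg_df["amg_flags"].astype(str).replace("NA", "")
    letters = sorted(set("".join(flags)))
    return pd.DataFrame(
        {ch: flags.str.contains(ch, regex=False) for ch in letters},
        index=amg_df.index,
    )
```

Usage: `flag_table(vsag1_amg).sum()` counts how many putative AMGs carry each flag, and `vsag1_amg['amg_flags'].map(decode_amg_flags)` spells the flags out gene by gene.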


### vSAG2

In [15]:
# How many genes does DRAM-v recognize?
vsag2_annot = vsag2[vsag2['rank'] != 'E'] # E = no annotation
len(vsag2_annot)

60

In [16]:
# subset out AMGs
vsag2_amg = vsag2_annot[vsag2_annot['amg_flags'].str.contains('M')]
len(vsag2_amg)

4

In [17]:
vsag2_amg['kegg_hit'].value_counts()

cobaltochelatase CobS [EC:6.6.1.2]             1
NA                                             1
nitrite reductase (NO-forming) [EC:1.7.2.1]    1
small subunit ribosomal protein S21            1
Name: kegg_hit, dtype: int64

In [18]:
vsag2_amg['viral_hit'].value_counts()

NA                                                                    2
YP_009325124.1 porphyrin biosynthesis [Synechococcus phage S-WAM1]    1
YP_009810895.1 cytidyltransferase [Synechococcus phage S-T4]          1
Name: viral_hit, dtype: int64

In [19]:
vsag2_amg['pfam_hits'].value_counts()

AAA domain (dynein-related subfamily) [PF07728.17]; ATPase family associated with various cellular activities (AAA) [PF07726.14]; P-loop containing dynein motor region [PF12775.10]; ATPase family associated with various cellular activities (AAA) [PF00004.32]    1
Cytidylyltransferase-like [PF01467.29]                                                                                                                                                                                                                                1
Multicopper oxidase [PF07732.18]                                                                                                                                                                                                                                      1
Ribosomal protein S21 [PF01165.23]                                                                                                                                                                              

In [65]:
vsag2_amg

Unnamed: 0.1,Unnamed: 0,fasta,scaffold,gene_position,start_position,end_position,strandedness,rank,ko_id,kegg_hit,viral_id,viral_hit,viral_RBH,viral_identity,viral_bitScore,viral_eVal,peptidase_id,peptidase_family,peptidase_hit,peptidase_RBH,peptidase_identity,peptidase_bitScore,peptidase_eVal,pfam_hits,cazy_ids,cazy_hits,cazy_subfam_ec,cazy_best_hit,vogdb_id,vogdb_hits,vogdb_categories,heme_regulatory_motif_count,is_transposon,amg_flags
83,SCGC_AM-654-E17_contig3||full_8,cv1_AM-654-E17,SCGC_AM-654-E17_contig3||full,8,3428,4621,-1,C,K09882,cobaltochelatase CobS [EC:6.6.1.2],YP_009325124.1,YP_009325124.1 porphyrin biosynthesis [Synecho...,True,0.616,377.0,0.0,,,,,,,,AAA domain (dynein-related subfamily) [PF07728...,,,,,VOG02489,sp|P04526|LOADL_BPT4 Sliding-clamp-loader larg...,Xr;Xh,0,False,VMF
141,SCGC_AM-654-E17_contig4||full_23,cv1_AM-654-E17,SCGC_AM-654-E17_contig4||full,23,17813,18721,1,D,,,YP_009810895.1,YP_009810895.1 cytidyltransferase [Synechococc...,False,0.472,165.0,0.0,,,,,,,,Cytidylyltransferase-like [PF01467.29],,,,,,,,0,False,MK
159,SCGC_AM-654-E17_contig83||full_1,cv1_AM-654-E17,SCGC_AM-654-E17_contig83||full,1,1,1038,1,C,K00368,nitrite reductase (NO-forming) [EC:1.7.2.1],,,,,,,,,,,,,,Multicopper oxidase [PF07732.18],,,,,,,,0,False,MKF
182,SCGC_AM-654-E17_contig8||full_20,cv1_AM-654-E17,SCGC_AM-654-E17_contig8||full,20,8293,8559,1,C,K02970,small subunit ribosomal protein S21,,,,,,,,,,,,,,Ribosomal protein S21 [PF01165.23],,,,,VOG04372,sp|A4THT0|RS21_YERPP 30S ribosomal protein S21...,Xr,0,False,VM


### vSAG3

In [21]:
# How many genes does DRAM-v recognize?
vsag3_annot = vsag3[vsag3['rank'] != 'E'] # E = no annotation
len(vsag3_annot)

19

In [22]:
# subset out AMGs
vsag3_amg = vsag3_annot[vsag3_annot['amg_flags'].str.contains('M')]
len(vsag3_amg)

1

In [23]:
vsag3_amg['kegg_hit'].value_counts()

NA    1
Name: kegg_hit, dtype: int64

In [24]:
vsag3_amg['viral_hit'].value_counts()

YP_005087467.1 ribonucleotide reductase [Cyanophage NATL1A-7]    1
Name: viral_hit, dtype: int64

In [25]:
vsag3_amg['pfam_hits'].value_counts()

Ribonucleotide reductase, barrel domain [PF02867.18]    1
Name: pfam_hits, dtype: int64

In [66]:
vsag3_amg

Unnamed: 0.1,Unnamed: 0,fasta,scaffold,gene_position,start_position,end_position,strandedness,rank,ko_id,kegg_hit,viral_id,viral_hit,viral_RBH,viral_identity,viral_bitScore,viral_eVal,peptidase_id,peptidase_family,peptidase_hit,peptidase_RBH,peptidase_bitScore,peptidase_eVal,peptidase_identity,pfam_hits,vogdb_id,vogdb_hits,vogdb_categories,heme_regulatory_motif_count,is_transposon,amg_flags
11,SCGC_AM-656-P04_contig1||full_12,cv1_AM-656-P04,SCGC_AM-656-P04_contig1||full,12,8234,9616,1,D,,,YP_005087467.1,YP_005087467.1 ribonucleotide reductase [Cyano...,True,0.756,698.0,0.0,,,,,,,,"Ribonucleotide reductase, barrel domain [PF028...",VOG00122,sp|P00452|RIR1_ECOLI Ribonucleoside-diphosphat...,Xh,0,False,MKE


### vSAG4

In [27]:
# How many genes does DRAM-v recognize?
vsag4_annot = vsag4[vsag4['rank'] != 'E'] # E = no annotation
len(vsag4_annot)

10

In [28]:
# subset out AMGs
vsag4_amg = vsag4_annot[vsag4_annot['amg_flags'].str.contains('M')]
len(vsag4_amg)

1

In [29]:
vsag4_amg['kegg_hit'].value_counts()

NA    1
Name: kegg_hit, dtype: int64

In [30]:
vsag4_amg['viral_hit'].value_counts()

YP_009831807.1 ribonucleotide-diphosphate reductase [Streptomyces phage BRock]    1
Name: viral_hit, dtype: int64

In [31]:
vsag4_amg['pfam_hits'].value_counts()

Ribonucleotide reductase, barrel domain [PF02867.18]    1
Name: pfam_hits, dtype: int64

In [67]:
vsag4_amg

Unnamed: 0.1,Unnamed: 0,fasta,scaffold,gene_position,start_position,end_position,strandedness,rank,ko_id,kegg_hit,viral_id,viral_hit,viral_RBH,viral_identity,viral_bitScore,viral_eVal,pfam_hits,vogdb_id,vogdb_hits,vogdb_categories,heme_regulatory_motif_count,is_transposon,amg_flags
16,SCGC_AM-662-D22_contig3||full_17,cv1_AM-662-D22,SCGC_AM-662-D22_contig3||full,17,6294,7940,1,D,,,YP_009831807.1,YP_009831807.1 ribonucleotide-diphosphate redu...,True,0.464,454.0,0.0,"Ribonucleotide reductase, barrel domain [PF028...",VOG00122,sp|P00452|RIR1_ECOLI Ribonucleoside-diphosphat...,Xh,0,False,MKE


### vSAG5

In [33]:
# How many genes does DRAM-v recognize?
vsag5_annot = vsag5[vsag5['rank'] != 'E'] # E = no annotation
len(vsag5_annot)

52

In [34]:
# subset out AMGs
vsag5_amg = vsag5_annot[vsag5_annot['amg_flags'].str.contains('M')]
len(vsag5_amg)

11

In [35]:
vsag5_amg['kegg_hit'].value_counts()

NA                                                                                      5
small subunit ribosomal protein S21                                                     1
ribonucleoside-diphosphate reductase beta chain [EC:1.17.4.1]                           1
ribonucleoside-diphosphate reductase alpha chain [EC:1.17.4.1]                          1
GTP cyclohydrolase IA [EC:3.5.4.16]                                                     1
6-pyruvoyltetrahydropterin/6-carboxytetrahydropterin synthase [EC:4.2.3.12 4.1.2.50]    1
large subunit ribosomal protein L31                                                     1
Name: kegg_hit, dtype: int64

In [36]:
vsag5_amg['viral_hit'].value_counts()

NA                                                                                                              4
YP_010761232.1 S-adenosylmethionine decarboxylase proenzyme (TIGR03330) [uncultured phage_MedDCM-OCT-S35-C6]    1
YP_214387.1 2OG-Fe(II) oxygenase [Prochlorococcus phage P-SSM2]                                                 1
YP_009810888.1 ribonucleoside diphosphate reductase small subunit [Synechococcus phage S-T4]                    1
YP_009140943.1 ribonucleotide reductase large subunit [Synechococcus phage ACG-2014i]                           1
YP_010114473.1 GTP cyclohydrolase [Flavobacterium phage vB_FspM_immuto_2-6A]                                    1
YP_010678988.1 QueD-like 6-pyruvoyl-tetrahydropterin synthase [Pseudomonas phage vB_PaeM_PA5oct]                1
YP_007677334.1 cytidyltransferase [Synechococcus phage S-SSM4]                                                  1
Name: viral_hit, dtype: int64

In [37]:
vsag5_amg['pfam_hits'].value_counts()

S-adenosylmethionine decarboxylase [PF02675.18]                                                                                                1
Putative 2OG-Fe(II) oxygenase [PF13759.9]                                                                                                      1
Cadherin domain [PF00028.20]                                                                                                                   1
Sulfotransferase family [PF03567.17]                                                                                                           1
NA                                                                                                                                             1
Ribonucleotide reductase, small chain [PF00268.24]                                                                                             1
Ribonucleotide reductase, barrel domain [PF02867.18]; Ribonucleotide reductase, all-alpha domain [PF00317.24]; ATP cone domain [PF

In [68]:
vsag5_amg

Unnamed: 0.1,Unnamed: 0,fasta,scaffold,gene_position,start_position,end_position,strandedness,rank,ko_id,kegg_hit,viral_id,viral_hit,viral_RBH,viral_identity,viral_bitScore,viral_eVal,peptidase_id,peptidase_family,peptidase_hit,peptidase_RBH,peptidase_identity,peptidase_bitScore,peptidase_eVal,pfam_hits,vogdb_id,vogdb_hits,vogdb_categories,heme_regulatory_motif_count,is_transposon,amg_flags
4,SCGC_AM-666-P13_contig12||full_5,cv1_AM-666-P13,SCGC_AM-666-P13_contig12||full,5,3502,3885,-1,D,,,YP_010761232.1,YP_010761232.1 S-adenosylmethionine decarboxyl...,False,0.505,115.0,0.0,,,,,,,,S-adenosylmethionine decarboxylase [PF02675.18],VOG01568,sp|Q7V558|SPEH_PROMM S-adenosylmethionine deca...,Xh,0,False,MKF
56,SCGC_AM-666-P13_contig12||full_57,cv1_AM-666-P13,SCGC_AM-666-P13_contig12||full,57,33659,34366,1,D,,,YP_214387.1,YP_214387.1 2OG-Fe(II) oxygenase [Prochlorococ...,False,0.39,66.0,0.0,,,,,,,,Putative 2OG-Fe(II) oxygenase [PF13759.9],VOG00052,REFSEQ 2OG-Fe(II) oxygenase; Xu,Xu,0,False,MK
64,SCGC_AM-666-P13_contig14||full_1,cv1_AM-666-P13,SCGC_AM-666-P13_contig14||full,1,1,3672,1,D,,,,,,,,,MER0257111,U69,MER0257111 - family U69 unassigned peptidases ...,False,0.257,78.0,0.0,Cadherin domain [PF00028.20],,,,0,False,MF
66,SCGC_AM-666-P13_contig14||full_3,cv1_AM-666-P13,SCGC_AM-666-P13_contig14||full,3,4833,5429,1,D,,,,,,,,,,,,,,,,Sulfotransferase family [PF03567.17],,,,0,False,MKF
78,SCGC_AM-666-P13_contig14||full_15,cv1_AM-666-P13,SCGC_AM-666-P13_contig14||full,15,12620,12877,-1,C,K02970,small subunit ribosomal protein S21,,,,,,,,,,,,,,,,,,0,False,M
90,SCGC_AM-666-P13_contig14||full_27,cv1_AM-666-P13,SCGC_AM-666-P13_contig14||full,27,20205,21302,-1,C,K00526,ribonucleoside-diphosphate reductase beta chai...,YP_009810888.1,YP_009810888.1 ribonucleoside diphosphate redu...,True,0.676,506.0,0.0,,,,,,,,"Ribonucleotide reductase, small chain [PF00268...",VOG00527,sp|P69924|RIR2_ECOLI Ribonucleoside-diphosphat...,Xh,0,False,MKE
91,SCGC_AM-666-P13_contig14||full_28,cv1_AM-666-P13,SCGC_AM-666-P13_contig14||full,28,21340,23682,-1,C,K00525,ribonucleoside-diphosphate reductase alpha cha...,YP_009140943.1,YP_009140943.1 ribonucleotide reductase large ...,True,0.728,1127.0,0.0,,,,,,,,"Ribonucleotide reductase, barrel domain [PF028...",VOG00122,sp|P00452|RIR1_ECOLI Ribonucleoside-diphosphat...,Xh,0,False,MKE
114,SCGC_AM-666-P13_contig30||full_14,cv1_AM-666-P13,SCGC_AM-666-P13_contig30||full,14,5650,6327,-1,C,K01495,GTP cyclohydrolase IA [EC:3.5.4.16],YP_010114473.1,YP_010114473.1 GTP cyclohydrolase [Flavobacter...,False,0.389,140.0,0.0,,,,,,,,GTP cyclohydrolase I [PF01227.25],VOG02430,sp|A4J5D0|GCH1_DESRM GTP cyclohydrolase 1; Xh,Xh,0,False,MK
121,SCGC_AM-666-P13_contig30||full_21,cv1_AM-666-P13,SCGC_AM-666-P13_contig30||full,21,8895,9374,-1,C,K01737,6-pyruvoyltetrahydropterin/6-carboxytetrahydro...,YP_010678988.1,YP_010678988.1 QueD-like 6-pyruvoyl-tetrahydro...,False,0.446,125.0,0.0,,,,,,,,6-pyruvoyl tetrahydropterin synthase [PF01242.22],VOG02682,"sp|Q1RHI6|QUED_RICBR 6-carboxy-5,6,7,8-tetrahy...",Xh,0,False,MK
147,SCGC_AM-666-P13_contig60||full_11,cv1_AM-666-P13,SCGC_AM-666-P13_contig60||full,11,4859,5083,-1,C,K02909,large subunit ribosomal protein L31,,,,,,,,,,,,,,Ribosomal protein L31 [PF01197.21],,,,0,False,MF


## vMAGs

### vMAG1

In [39]:
# How many genes does DRAM-v recognize?
vmag1_annot = vmag1[vmag1['rank'] != 'E'] # E = no annotation
len(vmag1_annot)

20

In [40]:
# subset out AMGs
vmag1_amg = vmag1_annot[vmag1_annot['amg_flags'].str.contains('M')]
len(vmag1_amg)

0

### vMAG2

In [45]:
# How many genes does DRAM-v recognize?
vmag2_annot = vmag2[vmag2['rank'] != 'E'] # E = no annotation
len(vmag2_annot)

9

In [46]:
# subset out AMGs
vmag2_amg = vmag2_annot[vmag2_annot['amg_flags'].str.contains('M')]
len(vmag2_amg)

2

In [48]:
vmag2_amg['kegg_hit'].value_counts()

GTP cyclohydrolase IA [EC:3.5.4.16]    1
NA                                     1
Name: kegg_hit, dtype: int64

In [49]:
vmag2_amg['viral_hit'].value_counts()

YP_008125321.1 GTP cyclohydrolase [Vibrio phage nt-1]    1
NA                                                       1
Name: viral_hit, dtype: int64

In [50]:
vmag2_amg['pfam_hits'].value_counts()

GTP cyclohydrolase I [PF01227.25]                   1
Queuosine biosynthesis protein QueC [PF06508.16]    1
Name: pfam_hits, dtype: int64

In [69]:
vmag2_amg

Unnamed: 0.1,Unnamed: 0,fasta,scaffold,gene_position,start_position,end_position,strandedness,rank,ko_id,kegg_hit,viral_id,viral_hit,viral_RBH,viral_identity,viral_bitScore,viral_eVal,peptidase_id,peptidase_family,peptidase_hit,peptidase_RBH,peptidase_bitScore,peptidase_eVal,peptidase_identity,pfam_hits,vogdb_id,vogdb_hits,vogdb_categories,heme_regulatory_motif_count,is_transposon,amg_flags
12,jv-119-k141_5050887_13,jv-119-vMAG_32,jv-119-k141_5050887,13,9434,10111,1,C,K01495,GTP cyclohydrolase IA [EC:3.5.4.16],YP_008125321.1,YP_008125321.1 GTP cyclohydrolase [Vibrio phag...,False,0.437,143.0,0.0,,,,,,,,GTP cyclohydrolase I [PF01227.25],VOG02430,sp|A4J5D0|GCH1_DESRM GTP cyclohydrolase 1; Xh,Xh,0,False,MKF
18,jv-119-k141_6904186_6,jv-119-vMAG_32,jv-119-k141_6904186,6,3791,4435,1,D,,,,,,,,,,,,,,,,Queuosine biosynthesis protein QueC [PF06508.16],,,,0,False,MKF


### vMAG3

In [52]:
# How many genes does DRAM-v recognize?
vmag3_annot = vmag3[vmag3['rank'] != 'E'] # E = no annotation
len(vmag3_annot)

3

In [53]:
# subset out AMGs
vmag3_amg = vmag3_annot[vmag3_annot['amg_flags'].str.contains('M')]
len(vmag3_amg)

0

### vMAG4

In [56]:
# How many genes does DRAM-v recognize?
vmag4_annot = vmag4[vmag4['rank'] != 'E'] # E = no annotation
len(vmag4_annot)

8

In [57]:
# subset out AMGs
vmag4_amg = vmag4_annot[vmag4_annot['amg_flags'].str.contains('M')]
len(vmag4_amg)

0

### vMAG5

In [58]:
# How many genes does DRAM-v recognize?
vmag5_annot = vmag5[vmag5['rank'] != 'E'] # E = no annotation
len(vmag5_annot)

19

In [59]:
# subset out AMGs
vmag5_amg = vmag5_annot[vmag5_annot['amg_flags'].str.contains('M')]
len(vmag5_amg)

2

In [60]:
vmag5_amg['kegg_hit'].value_counts()

NA    2
Name: kegg_hit, dtype: int64

In [61]:
vmag5_amg['viral_hit'].value_counts()

YP_009210858.1 carboxylate deaminase [Mycobacterium phage Vincenzo]    1
NA                                                                     1
Name: viral_hit, dtype: int64

In [62]:
vmag5_amg['pfam_hits'].value_counts()

Pyridoxal-phosphate dependent enzyme [PF00291.28]    1
Sulfotransferase family [PF03567.17]                 1
Name: pfam_hits, dtype: int64

In [70]:
vmag5_amg

Unnamed: 0.1,Unnamed: 0,fasta,scaffold,gene_position,start_position,end_position,strandedness,rank,ko_id,kegg_hit,viral_id,viral_hit,viral_RBH,viral_identity,viral_bitScore,viral_eVal,pfam_hits,vogdb_id,vogdb_hits,vogdb_categories,heme_regulatory_motif_count,is_transposon,amg_flags
28,jv-154-k141_1078564_29,jv-154-vMAG_44,jv-154-k141_1078564,29,17463,18254,1,D,,,YP_009210858.1,YP_009210858.1 carboxylate deaminase [Mycobact...,False,0.217,73.0,0.0,Pyridoxal-phosphate dependent enzyme [PF00291.28],,,,0,False,MK
86,jv-154-k141_2919407_3,jv-154-vMAG_44,jv-154-k141_2919407,3,2214,2882,-1,D,,,,,,,,,Sulfotransferase family [PF03567.17],,,,0,False,MKF
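
Several functions recur across genomes in the outputs above: ribonucleotide reductase in vSAG1, vSAG3, vSAG4 and vSAG5; GTP cyclohydrolase in vSAG5 and vMAG2; the sulfotransferase family in vSAG5 and vMAG5. A sketch that stacks the AMG-flagged rows of all genomes into one table, tagged with the virus name, type and sample depth from the table at the top of this notebook, so such recurrences can be tallied in one place. `stack_amgs`, `GENOME_INFO` and `KEEP` are hypothetical names; the values in `GENOME_INFO` are copied from that table, and the filters are the same two used throughout.

```python
import pandas as pd

# Copied from the summary table at the top of this notebook: (virus name, type, sample depth).
GENOME_INFO = {
    "vsag1": ("vir_AM-654-B04", "vSAG", 80),
    "vsag2": ("vir_AM-654-E17", "vSAG", 80),
    "vsag3": ("vir_AM-656-P04", "vSAG", 95),
    "vsag4": ("vir_AM-662-D22", "vSAG", 140),
    "vsag5": ("vir_AM-666-P13", "vSAG", 400),
    "vmag1": ("jv119_vMAG_29", "vMAG", 400),
    "vmag2": ("jv119_vMAG_32", "vMAG", 400),
    "vmag3": ("jv121_vMAG_31", "vMAG", 95),
    "vmag4": ("jv154_vMAG_31", "vMAG", 140),
    "vmag5": ("jv154_vMAG_44", "vMAG", 140),
}

# Annotation columns present in every DRAM-v table shown above.
KEEP = ["kegg_hit", "viral_hit", "pfam_hits", "amg_flags"]


def stack_amgs(genomes, info=GENOME_INFO):
    """Concatenate AMG-flagged rows (rank != 'E', 'M' in amg_flags) from each genome."""
    parts = []
    for key, df in genomes.items():
        is_amg = df["amg_flags"].astype(str).str.contains("M", regex=False)
        amg = df[(df["rank"] != "E") & is_amg]
        virus, vtype, depth = info[key]
        parts.append(amg[KEEP].assign(virus=virus, type=vtype, sample_depth=depth))
    return pd.concat(parts, ignore_index=True)
```

Usage in this notebook: `amgs = stack_amgs({name: globals()[name] for name in GENOME_INFO})`, then for example `amgs[amgs["pfam_hits"].str.contains("Ribonucleotide reductase", regex=False)]` lists the ribonucleotide reductase hits with their sample depths. Genomes with no AMG-flagged genes (vMAG1, vMAG3 and vMAG4 here) simply contribute no rows.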
