# 12_dramv_genome_summaries

This document gives example commands for running DRAM on host genomes and DRAM-v on both vMAGs and vSAGs, and summarizes the DRAM-v output for the 10 genomes selected in the 10_interesting_virus_host_pairs Jupyter notebook.


| virus name     | variable name | type | sample depth |
|----------------|---------------|------|--------------|
| vir_AM-654-B04 | vsag1         | vSAG | 80           |
| vir_AM-654-E17 | vsag2         | vSAG | 80           |
| vir_AM-656-P04 | vsag3         | vSAG | 95           |
| vir_AM-662-D22 | vsag4         | vSAG | 140          |
| vir_AM-666-P13 | vsag5         | vSAG | 400          |
| jv119_vMAG_29  | vmag1         | vMAG | 400          |
| jv119_vMAG_32  | vmag2         | vMAG | 400          |
| jv121_vMAG_31  | vmag3         | vMAG | 95           |
| jv154_vMAG_31  | vmag4         | vMAG | 140          |
| jv154_vMAG_44  | vmag5         | vMAG | 140          |

## DRAM command line code examples

In [None]:
# DRAM for hosts

DRAM.py annotate \
-i 'input_file_path' \
-o output_folder_name

DRAM.py distill \
-i annot_output_folder/annotations.tsv \
-o distill_output_folder_name \
--trna_path annot_output_folder/trnas.tsv \
--rrna_path annot_output_folder/rrnas.tsv

# example using a single host SAG file on my laptop
# (path left unquoted so the shell expands ~; inside single quotes it would stay literal)

DRAM.py annotate \
-i ~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/genomes/sag_hosts/AM-656-B04_nvcontigs.fasta \
-o annotations

DRAM.py distill \
-i annotations/annotations.tsv \
-o distill \
--trna_path annotations/trnas.tsv \
--rrna_path annotations/rrnas.tsv

In [None]:
# DRAM-v

# First, run the contigs through VirSorter to identify the viral contigs, then concatenate the viral contigs into a single fasta file.
# Pass the concatenated fasta file and VirSorter's VIRSorter_affi-contigs.tab file to DRAM-v.

DRAM-v.py annotate \
-i input_viral_path \
-v VIRSorter_affi-contigs.tab \
-o output_folder_name

DRAM-v.py distill \
-i annot_output_folder/annotations.tsv \
-o distill

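# example using a single vSAG file on my laptop
# (path left unquoted so the shell expands ~; annotate output goes to its own folder, not the distill folder)

DRAM-v.py annotate \
-i ~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/genomes/vsags/cv1_AM-656-B04.fasta \
-v VIRSorter_affi-contigs.tab \
-o annotations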
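
With ten viral genomes to process, it may be convenient to generate the annotate and distill commands from Python instead of typing them by hand. The sketch below only builds the argument lists, using the same `-i`, `-v`, and `-o` flags as the commands above. The function name `dramv_commands` and the `annotations`/`distill` folder layout are assumptions made for illustration, not part of DRAM-v itself.

```python
from pathlib import Path


def dramv_commands(fasta, affi_tab, out_root):
    """Build the DRAM-v annotate and distill commands for one viral fasta file.

    Hypothetical helper: it returns argument lists and does not run anything.
    """
    out_root = Path(out_root)
    annot_dir = out_root / "annotations"   # assumed folder name
    distill_dir = out_root / "distill"     # assumed folder name
    annotate = [
        "DRAM-v.py", "annotate",
        "-i", str(fasta),
        "-v", str(affi_tab),
        "-o", str(annot_dir),
    ]
    distill = [
        "DRAM-v.py", "distill",
        "-i", str(annot_dir / "annotations.tsv"),
        "-o", str(distill_dir),
    ]
    return [annotate, distill]


if __name__ == "__main__":
    # Print the commands for inspection; file names here are placeholders.
    for cmd in dramv_commands("viral_contigs.fasta", "VIRSorter_affi-contigs.tab", "vsag_example"):
        print(" ".join(cmd))
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` once the printed commands look right.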

## Load packages and data

In [55]:
import pandas as pd
import os
import sys
import csv
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import glob
import seaborn as sns
from collections import Counter

pd.set_option('display.max_columns', None)

vsag1 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-654-B04/annotations.tsv", sep = '\t').fillna('NA')
vsag2 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-654-E17/annotations.tsv", sep = '\t').fillna('NA')
vsag3 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-656-P04/annotations.tsv", sep = '\t').fillna('NA')
vsag4 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-662-D22/annotations.tsv", sep = '\t').fillna('NA')
vsag5 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/cv1_AM-666-P13/annotations.tsv", sep = '\t').fillna('NA')

vmag1 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-119-vMAG_29/annotations.tsv", sep = '\t').fillna('NA')
vmag2 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-119-vMAG_32/annotations.tsv", sep = '\t').fillna('NA')
vmag3 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-121-vMAG_31/annotations.tsv", sep = '\t').fillna('NA')
vmag4 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-154-vMAG_31/annotations.tsv", sep = '\t').fillna('NA')
vmag5 = pd.read_csv("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output/jv-154-vMAG_44/annotations.tsv", sep = '\t').fillna('NA')
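
The ten `read_csv` calls above differ only in the folder name, so the same loading step could be written as a loop. This is a minimal sketch, assuming every genome folder contains an `annotations.tsv`. The helper `load_annotations` and the `GENOME_DIRS` dictionary are hypothetical names, while the folder names are copied from the cell above.

```python
from pathlib import Path

import pandas as pd

# Variable name -> DRAM-v output folder, copied from the paths used above.
GENOME_DIRS = {
    "vsag1": "cv1_AM-654-B04",
    "vsag2": "cv1_AM-654-E17",
    "vsag3": "cv1_AM-656-P04",
    "vsag4": "cv1_AM-662-D22",
    "vsag5": "cv1_AM-666-P13",
    "vmag1": "jv-119-vMAG_29",
    "vmag2": "jv-119-vMAG_32",
    "vmag3": "jv-121-vMAG_31",
    "vmag4": "jv-154-vMAG_31",
    "vmag5": "jv-154-vMAG_44",
}


def load_annotations(base_dir, genome_dirs):
    """Read <base_dir>/<folder>/annotations.tsv for each genome.

    Returns a dict mapping variable name to DataFrame, with missing
    values replaced by the string 'NA' as in the cell above.
    """
    base = Path(base_dir).expanduser()  # expands a leading ~
    tables = {}
    for var_name, folder in genome_dirs.items():
        path = base / folder / "annotations.tsv"
        tables[var_name] = pd.read_csv(path, sep="\t").fillna("NA")
    return tables
```

A call such as `genomes = load_annotations("~/Documents/Bigelow/Virus_Project/OMZ_MH_Analysis/Data/dramv_output", GENOME_DIRS)` would then give `genomes["vsag1"]` in place of `vsag1`.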

## vSAGs

### vSAG1

In [2]:
# How many genes does DRAM-v recognize?
vsag1_annot = vsag1[vsag1['rank'] != 'E'] # E = no annotation
len(vsag1_annot)

63

In [9]:
# subset out AMGs
vsag1_amg = vsag1_annot[vsag1_annot['amg_flags'].str.contains('M')]
len(vsag1_amg)

14

In [10]:
vsag1_amg['kegg_hit'].value_counts()

NA                                                                      6
ferredoxin-nitrite reductase [EC:1.7.7.1]                               1
3-methyl-2-oxobutanoate hydroxymethyltransferase [EC:2.1.2.11]          1
GDP-D-mannose 3', 5'-epimerase [EC:5.1.3.18 5.1.3.-]                    1
ribonucleoside-diphosphate reductase beta chain [EC:1.17.4.1]           1
phosphoribosylformylglycinamidine cyclo-ligase [EC:6.3.3.1]             1
phosphoribosylformylglycinamidine synthase subunit PurS [EC:6.3.5.3]    1
cobaltochelatase CobS [EC:6.6.1.2]                                      1
ferredoxin                                                              1
Name: kegg_hit, dtype: int64

In [11]:
vsag1_amg['viral_hit'].value_counts()

NA                                                                                                                         1
YP_004324307.1 nucleotide-sugar epimerase [Synechococcus phage S-SSM7]                                                     1
YP_004322358.1 cysteine dioxygenase [Synechococcus phage S-SM2]                                                            1
YP_009810871.1 bifunctional heptose 7-phosphate kinase/heptose 1-phosphate adenyltransferase [Synechococcus phage S-T4]    1
YP_004061678.1 nucleotide-sugar epimerase [Ostreococcus lucimarinus virus 1]                                               1
YP_009172525.1 hypothetical protein APZ24_gp034 [Ostreococcus lucimarinus virus 2]                                         1
YP_004322339.1 nucleotide-sugar epimerase [Synechococcus phage S-SM2]                                                      1
YP_004322326.1 ribonucleoside diphosphate reductase small subunit [Synechococcus phage S-SM2]                              1


In [12]:
vsag1_amg['pfam_hits'].value_counts()

Nitrite and sulphite reductase 4Fe-4S domain [PF01077.25]; Nitrite/Sulfite reductase ferredoxin-like half domain [PF03460.20]                                                                                      1
NAD dependent epimerase/dehydratase family [PF01370.24]; GDP-mannose 4,6 dehydratase [PF16363.8]; RmlD substrate binding domain [PF04321.20]                                                                       1
Mannose-6-phosphate isomerase [PF01050.21]; Cupin [PF00190.25]; D-lyxose isomerase [PF07385.15]                                                                                                                    1
Cytidylyltransferase-like [PF01467.29]                                                                                                                                                                             1
NAD dependent epimerase/dehydratase family [PF01370.24]; GDP-mannose 4,6 dehydratase [PF16363.8]; RmlD substrate binding domain [PF04321.20]; 3-beta
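
Because `pfam_hits` packs several domains into one string separated by `; `, `value_counts()` on the raw column counts domain combinations instead of individual domains. The sketch below splits each entry before counting. It assumes that `; ` never appears inside a domain name and that missing values are the string `NA`, as produced by `fillna('NA')` above. `count_individual_hits` is a hypothetical helper name.

```python
import pandas as pd


def count_individual_hits(hits, sep="; ", missing="NA"):
    """Count each hit separately when one cell lists several hits."""
    hits = hits[hits != missing]           # drop genes without any hit
    split_hits = hits.str.split(sep)       # one list of hits per gene
    return split_hits.explode().str.strip().value_counts()


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the DRAM-v output.
    toy = pd.Series([
        "Domain A [PF00001.1]; Domain B [PF00002.1]",
        "Domain A [PF00001.1]",
        "NA",
    ])
    print(count_individual_hits(toy))
```

Applied to the notebook data, this would be `count_individual_hits(vsag1_amg['pfam_hits'])`.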

### vSAG2

In [15]:
# How many genes does DRAM-v recognize?
vsag2_annot = vsag2[vsag2['rank'] != 'E'] # E = no annotation
len(vsag2_annot)

60

In [16]:
# subset out AMGs
vsag2_amg = vsag2_annot[vsag2_annot['amg_flags'].str.contains('M')]
len(vsag2_amg)

4

In [17]:
vsag2_amg['kegg_hit'].value_counts()

cobaltochelatase CobS [EC:6.6.1.2]             1
NA                                             1
nitrite reductase (NO-forming) [EC:1.7.2.1]    1
small subunit ribosomal protein S21            1
Name: kegg_hit, dtype: int64

In [18]:
vsag2_amg['viral_hit'].value_counts()

NA                                                                    2
YP_009325124.1 porphyrin biosynthesis [Synechococcus phage S-WAM1]    1
YP_009810895.1 cytidyltransferase [Synechococcus phage S-T4]          1
Name: viral_hit, dtype: int64

In [19]:
vsag2_amg['pfam_hits'].value_counts()

AAA domain (dynein-related subfamily) [PF07728.17]; ATPase family associated with various cellular activities (AAA) [PF07726.14]; P-loop containing dynein motor region [PF12775.10]; ATPase family associated with various cellular activities (AAA) [PF00004.32]    1
Cytidylyltransferase-like [PF01467.29]                                                                                                                                                                                                                                1
Multicopper oxidase [PF07732.18]                                                                                                                                                                                                                                      1
Ribosomal protein S21 [PF01165.23]                                                                                                                                                                              

### vSAG3

In [21]:
# How many genes does DRAM-v recognize?
vsag3_annot = vsag3[vsag3['rank'] != 'E'] # E = no annotation
len(vsag3_annot)

19

In [22]:
# subset out AMGs
vsag3_amg = vsag3_annot[vsag3_annot['amg_flags'].str.contains('M')]
len(vsag3_amg)

1

In [23]:
vsag3_amg['kegg_hit'].value_counts()

NA    1
Name: kegg_hit, dtype: int64

In [24]:
vsag3_amg['viral_hit'].value_counts()

YP_005087467.1 ribonucleotide reductase [Cyanophage NATL1A-7]    1
Name: viral_hit, dtype: int64

In [25]:
vsag3_amg['pfam_hits'].value_counts()

Ribonucleotide reductase, barrel domain [PF02867.18]    1
Name: pfam_hits, dtype: int64

### vSAG4

In [27]:
# How many genes does DRAM-v recognize?
vsag4_annot = vsag4[vsag4['rank'] != 'E'] # E = no annotation
len(vsag4_annot)

10

In [28]:
# subset out AMGs
vsag4_amg = vsag4_annot[vsag4_annot['amg_flags'].str.contains('M')]
len(vsag4_amg)

1

In [29]:
vsag4_amg['kegg_hit'].value_counts()

NA    1
Name: kegg_hit, dtype: int64

In [30]:
vsag4_amg['viral_hit'].value_counts()

YP_009831807.1 ribonucleotide-diphosphate reductase [Streptomyces phage BRock]    1
Name: viral_hit, dtype: int64

In [31]:
vsag4_amg['pfam_hits'].value_counts()

Ribonucleotide reductase, barrel domain [PF02867.18]    1
Name: pfam_hits, dtype: int64

### vSAG5

In [33]:
# How many genes does DRAM-v recognize?
vsag5_annot = vsag5[vsag5['rank'] != 'E'] # E = no annotation
len(vsag5_annot)

52

In [34]:
# subset out AMGs
vsag5_amg = vsag5_annot[vsag5_annot['amg_flags'].str.contains('M')]
len(vsag5_amg)

11

In [35]:
vsag5_amg['kegg_hit'].value_counts()

NA                                                                                      5
small subunit ribosomal protein S21                                                     1
ribonucleoside-diphosphate reductase beta chain [EC:1.17.4.1]                           1
ribonucleoside-diphosphate reductase alpha chain [EC:1.17.4.1]                          1
GTP cyclohydrolase IA [EC:3.5.4.16]                                                     1
6-pyruvoyltetrahydropterin/6-carboxytetrahydropterin synthase [EC:4.2.3.12 4.1.2.50]    1
large subunit ribosomal protein L31                                                     1
Name: kegg_hit, dtype: int64

In [36]:
vsag5_amg['viral_hit'].value_counts()

NA                                                                                                              4
YP_010761232.1 S-adenosylmethionine decarboxylase proenzyme (TIGR03330) [uncultured phage_MedDCM-OCT-S35-C6]    1
YP_214387.1 2OG-Fe(II) oxygenase [Prochlorococcus phage P-SSM2]                                                 1
YP_009810888.1 ribonucleoside diphosphate reductase small subunit [Synechococcus phage S-T4]                    1
YP_009140943.1 ribonucleotide reductase large subunit [Synechococcus phage ACG-2014i]                           1
YP_010114473.1 GTP cyclohydrolase [Flavobacterium phage vB_FspM_immuto_2-6A]                                    1
YP_010678988.1 QueD-like 6-pyruvoyl-tetrahydropterin synthase [Pseudomonas phage vB_PaeM_PA5oct]                1
YP_007677334.1 cytidyltransferase [Synechococcus phage S-SSM4]                                                  1
Name: viral_hit, dtype: int64

In [37]:
vsag5_amg['pfam_hits'].value_counts()

S-adenosylmethionine decarboxylase [PF02675.18]                                                                                                1
Putative 2OG-Fe(II) oxygenase [PF13759.9]                                                                                                      1
Cadherin domain [PF00028.20]                                                                                                                   1
Sulfotransferase family [PF03567.17]                                                                                                           1
NA                                                                                                                                             1
Ribonucleotide reductase, small chain [PF00268.24]                                                                                             1
Ribonucleotide reductase, barrel domain [PF02867.18]; Ribonucleotide reductase, all-alpha domain [PF00317.24]; ATP cone domain [PF

## vMAGs

### vMAG1

In [39]:
# How many genes does DRAM-v recognize?
vmag1_annot = vmag1[vmag1['rank'] != 'E'] # E = no annotation
len(vmag1_annot)

20

In [40]:
# subset out AMGs
vmag1_amg = vmag1_annot[vmag1_annot['amg_flags'].str.contains('M')]
len(vmag1_amg)

0

### vMAG2

In [45]:
# How many genes does DRAM-v recognize?
vmag2_annot = vmag2[vmag2['rank'] != 'E'] # E = no annotation
len(vmag2_annot)

9

In [46]:
# subset out AMGs
vmag2_amg = vmag2_annot[vmag2_annot['amg_flags'].str.contains('M')]
len(vmag2_amg)

2

In [48]:
vmag2_amg['kegg_hit'].value_counts()

GTP cyclohydrolase IA [EC:3.5.4.16]    1
NA                                     1
Name: kegg_hit, dtype: int64

In [49]:
vmag2_amg['viral_hit'].value_counts()

YP_008125321.1 GTP cyclohydrolase [Vibrio phage nt-1]    1
NA                                                       1
Name: viral_hit, dtype: int64

In [50]:
vmag2_amg['pfam_hits'].value_counts()

GTP cyclohydrolase I [PF01227.25]                   1
Queuosine biosynthesis protein QueC [PF06508.16]    1
Name: pfam_hits, dtype: int64

### vMAG3

In [52]:
# How many genes does DRAM-v recognize?
vmag3_annot = vmag3[vmag3['rank'] != 'E'] # E = no annotation
len(vmag3_annot)

3

In [53]:
# subset out AMGs
vmag3_amg = vmag3_annot[vmag3_annot['amg_flags'].str.contains('M')]
len(vmag3_amg)

0

### vMAG4

In [56]:
# How many genes does DRAM-v recognize?
vmag4_annot = vmag4[vmag4['rank'] != 'E'] # E = no annotation
len(vmag4_annot)

8

In [57]:
# subset out AMGs
vmag4_amg = vmag4_annot[vmag4_annot['amg_flags'].str.contains('M')]
len(vmag4_amg)

0

### vMAG5

In [58]:
# How many genes does DRAM-v recognize?
vmag5_annot = vmag5[vmag5['rank'] != 'E'] # E = no annotation
len(vmag5_annot)

19

In [59]:
# subset out AMGs
vmag5_amg = vmag5_annot[vmag5_annot['amg_flags'].str.contains('M')]
len(vmag5_amg)

2

In [60]:
vmag5_amg['kegg_hit'].value_counts()

NA    2
Name: kegg_hit, dtype: int64

In [61]:
vmag5_amg['viral_hit'].value_counts()

YP_009210858.1 carboxylate deaminase [Mycobacterium phage Vincenzo]    1
NA                                                                     1
Name: viral_hit, dtype: int64
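
The two filtering steps repeated for every genome above (drop rank `E`, then keep rows whose `amg_flags` contain `M`) can be collected into one summary table. This is a minimal sketch using the same filters as the cells above. The function names `summarize_genome` and `summarize_genomes` are hypothetical, and the input tables are assumed to have `NA` in place of missing values.

```python
import pandas as pd


def summarize_genome(annotations):
    """Count annotated genes and putative AMGs in one annotations table."""
    annotated = annotations[annotations["rank"] != "E"]  # E = no annotation
    amgs = annotated[annotated["amg_flags"].str.contains("M")]
    return {"annotated_genes": len(annotated), "amgs": len(amgs)}


def summarize_genomes(genomes):
    """Build a table with one row per genome from a dict of name -> DataFrame."""
    rows = {name: summarize_genome(df) for name, df in genomes.items()}
    return pd.DataFrame.from_dict(rows, orient="index")


if __name__ == "__main__":
    # Toy table for illustration only, not real DRAM-v output.
    toy = pd.DataFrame({
        "rank": ["A", "C", "E", "D"],
        "amg_flags": ["MK", "NA", "M", "V"],
    })
    print(summarize_genomes({"toy": toy}))
```

With the tables loaded above, `summarize_genomes({"vsag1": vsag1, "vmag1": vmag1})` should reproduce the counts reported in the individual cells, for example 63 annotated genes and 14 AMGs for vsag1.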