## Genome Assembly Statistics

We ran assembly-stats on contigs.fasta and scaffolds.fasta with the following commands:

```assembly-stats contigs.fasta```  
```
stats for contigs.fasta
sum = 6678635, n = 176, ave = 37946.79, largest = 414008
N50 = 234690, n = 11
N60 = 202142, n = 14
N70 = 154232, n = 18
N80 = 103367, n = 23
N90 = 81286, n = 30
N100 = 128, n = 176
N_count = 0
Gaps = 0
```  
**Total Length of All Contigs:** 6678635 bp  
**Number of Contigs:** 176  
**N50:** 234690 bp  


```assembly-stats scaffolds.fasta```  
```
stats for scaffolds.fasta
sum = 6678655, n = 174, ave = 38383.07, largest = 414008
N50 = 234690, n = 11
N60 = 202142, n = 14
N70 = 154232, n = 18
N80 = 103367, n = 23
N90 = 81821, n = 30
N100 = 128, n = 174
N_count = 20
Gaps = 2
```  
**Total Length of All Scaffolds:** 6678655 bp  
**Number of Scaffolds:** 174   
**N50:** 234690 bp  


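The N50 statistic is a measure of the contiguity of a genome assembly. With the contigs (or scaffolds) sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches 50% of the total assembly length; equivalently, at least half of the assembled bases lie in contigs of this length or longer. We use N50 rather than the mean or median contig (or scaffold) length because it is weighted by sequence length, so it is not dominated by the many very short contigs in a typical assembly. For our data, scaffolding changed very little: the 176 contigs became 174 scaffolds (2 gaps, 20 N bases in total), and the N50 is identical at 234690 bp.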
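To make the N50 definition concrete, here is a minimal Python sketch of how an Nxx value can be computed from a list of sequence lengths. The function name `nxx` and the toy lengths are illustrative assumptions and are not part of assembly-stats; the sketch also assumes the common convention that the threshold is met once the running total is at least the chosen fraction of the total length.

```python
import statistics


def nxx(lengths, fraction=0.5):
    """Return (Nxx length, number of sequences needed) for a list of lengths.

    Hypothetical helper for illustration; fraction=0.5 gives N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Toy data (not from our assembly): a few long sequences and several short ones.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]

print(nxx(toy_lengths))  # → (70, 2)
print(statistics.mean(toy_lengths), statistics.median(toy_lengths))
```

On this toy set the mean is about 42.9 and the median is 40, both pulled down by the short sequences, while the N50 of 70 reflects where most of the bases actually sit.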

## 16S rRNA Genes

To isolate the 16S rRNA genes from our genome, we ran the ```rna_hmm3.py``` program on our assembled contigs as follows:  

```rna_hmm3.py -i contigs.fasta -o ~/16S_rRNA.gff```  

After removing all lines that did not contain 16S rRNA genes, we ran ```bedtools getfasta``` to extract nucleic acid sequences of the 16S rRNA genes from our assembly.  

```bedtools getfasta -fi contigs.fasta -bed 16S_rRNA.gff```

The ```bedtools getfasta``` command also generated an index file, contigs.fasta.fai; the extracted sequences themselves were printed to standard output. We used the Ribosomal Database Project’s SeqMatch tool to identify the genus each sequence originates from (to save space, we did not copy the sequences into this IPython notebook).

**Sequence 1:** *Pseudomonas*  
**Sequence 2:** *Pseudomonas*  
**Sequence 3:** *Pseudomonas*  
**Sequence 4:** *Pseudomonas*  
**Sequence 5:** *Pseudomonas*  
**Sequence 6:** *Pseudomonas*  

The data clearly indicate that the bacterium belongs to the genus *Pseudomonas*. We were not able to identify the species because each sequence matched the 16S rRNA of many different species (S_ab scores greater than 0.98), likely because 16S rRNA is so highly conserved.  
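The filtering step described earlier in this section (keeping only the 16S rRNA rows of the GFF file before running bedtools) can be done by hand or with a short script. The following is a minimal Python sketch, assuming the rRNA type label (for example `16S_rRNA`) appears in the ninth, tab-separated attribute column of each record. The function name `filter_gff_lines` and the example rows are hypothetical; they are not real output from our run.

```python
def filter_gff_lines(lines, label="16S_rRNA"):
    """Keep complete GFF records whose attribute column mentions `label`.

    Assumption: the rRNA type is written in the 9th (attribute) column.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comment/header lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GFF record
        if label in fields[8]:
            kept.append(line)
    return kept


# Made-up example rows, for illustration only.
example = [
    "##gff-version 2",
    "contig_1\trna_hmm3\trRNA\t100\t1630\t0\t+\t.\t16S_rRNA",
    "contig_1\trna_hmm3\trRNA\t2000\t4900\t0\t+\t.\t23S_rRNA",
    "contig_7\trna_hmm3\trRNA\t50\t165\t0\t-\t.\t5S_rRNA",
]

for row in filter_gff_lines(example):
    print(row)
```

To apply the same idea to a file, the lines of the GFF could be read with `open(path)` and the kept rows written back out before passing the result to bedtools.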

## Genome Annotation

We used BASys and RAST to annotate our newly assembled genome.

## Annotation Analysis

According to Silby et al. (2011), all *Pseudomonas* species share a versatile capacity for metabolic adaptation to a broad range of fluctuating environments and conditions. This claim is supported by our RAST analysis, which shows that this particular *Pseudomonas* specimen seems to be prototrophic for (able to synthesize) a number of amino acids, including histidine, methionine, lysine, phenylalanine, tyrosine, and threonine. In fact, 559/2673 (20.9%) of the annotated genes appear to be devoted to the biosynthesis and degradation of amino acids and their derivatives, which is ideal for any bacterial species that must adapt to resource scarcity.

*Pseudomonas aeruginosa* shows "outstanding capacity for developing antimicrobial resistance to nearly all available antipseudomonal agents through the selection of chromosomal mutations" (López-Causapé et al., 2018). It is likely that our *Pseudomonas* strain shares some or all of this resistance, given that it contains three genes that code for multidrug resistance efflux pumps, as well as several other genes that confer resistance to streptothricin, fluoroquinolones, and fosfomycin. Given these data, it is entirely possible that this bacterium is also resistant to common antibacterial agents such as penicillin, tetracycline, and chloramphenicol. However, it encodes no toxins, bacteriocins, or antibiotics of its own (though it has two genes for E2 bacteriocin tolerance).

Though the genome does not appear to contain its own virulence genes, there is some evidence of horizontal gene transfer (HGT) of the *Mycobacterium* virulence operon. As HGT is common among bacteria, this is not surprising, though it may indicate increased pathogenicity in our particular sample.




## Works Cited
López-Causapé, C., Cabot, G., del Barrio-Tofiño, E., & Oliver, A. (2018). The Versatile Mutational Resistome of *Pseudomonas aeruginosa*. *Frontiers in Microbiology*, 9, 685. http://doi.org/10.3389/fmicb.2018.00685

Silby, M. W., Winstanley, C., Godfrey, S. A. C., Levy, S. B., & Jackson, R. W. (2011). *Pseudomonas* genomes: diverse and adaptable. *FEMS Microbiology Reviews*, 35(4), 652–680. https://doi.org/10.1111/j.1574-6976.2011.00269.x