This Jupyter notebook contains the source code for reproducing the results of the 2013 paper "*Staphylococcus aureus* innate immune evasion is lineage-specific: A bioinformatics study" by Alex J. McCarthy and Jodi A. Lindsay, published in *Infection, Genetics and Evolution* (https://doi.org/10.1016/j.meegid.2013.06.012).

### Download of *S. aureus* isolate genomes for analysis

88 isolates are to be downloaded from the NCBI GenBank database.

In [236]:
pip install biopython

Note: you may need to restart the kernel to use updated packages.


In [237]:
from Bio.Blast import NCBIWWW
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC  # needs Biopython older than 1.78; Bio.Alphabet was removed in 1.78
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
from Bio.SeqUtils.ProtParam import ProteinAnalysis

In [238]:
from Bio import Entrez
Entrez.email = "dray@iastate.edu"

In [239]:
handle = Entrez.efetch(db="nucleotide", id="297250928", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
print(record)

ID: NZ_ADVP00000000.1
Name: NZ_ADVP01000000
Description: Staphylococcus aureus subsp. aureus ATCC 51811, whole genome shotgun sequencing project
Database cross-references: BioProject:PRJNA224116, BioSample:SAMN00139434, Assembly:GCF_000164715.1
Number of features: 1
/molecule_type=DNA
/topology=linear
/data_file_division=BCT
/date=18-SEP-2019
/accessions=['NZ_ADVP00000000']
/sequence_version=1
/keywords=['WGS', 'HIGH_QUALITY_DRAFT', 'RefSeq']
/source=Staphylococcus aureus subsp. aureus ATCC 51811
/organism=Staphylococcus aureus subsp. aureus ATCC 51811
/taxonomy=['Bacteria', 'Firmicutes', 'Bacilli', 'Bacillales', 'Staphylococcaceae', 'Staphylococcus']
/references=[Reference(title='Direct Submission', ...)]
/comment=REFSEQ INFORMATION: The reference sequence was derived from
ADVP00000000.
The Staphylococcus aureus subsp. aureus ATCC 51811 whole genome
shotgun (WGS) project has the project accession NZ_ADVP00000000.
This version of the project (01) has the accession number
NZ_ADVP0100000

In [240]:
handle.close()

In [241]:
handle = Entrez.efetch(db="nucleotide", id="337208", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
print(record)

ID: J01418.1
Name: HUMMTDL
Description: human mt (mitochondrial) d-loop region
Number of features: 1
/molecule_type=DNA
/topology=linear
/data_file_division=PRI
/date=03-AUG-1993
/accessions=['J01418']
/sequence_version=1
/keywords=['displacement loop']
/source=mitochondrion Homo sapiens (human)
/organism=Homo sapiens
/taxonomy=['Eukaryotae', 'mitochondrial eukaryotes', 'Metazoa', 'Chordata', 'Vertebrata', 'Eutheria', 'Primates', 'Catarrhini', 'Hominidae', 'Homo']
/references=[Reference(title='sequence and properties of the human kb cell and mouse l cell d-loop regions of mitochondrial dna', ...)]
Original source text: human mt dna from kb cells.
EMBL features not translated to GenBank features:
   key        from     to       description
   TRNA         <1     30       tRNA-Pro
   TRNA       1153  >1164       tRNA-Phe
   SITE        766    766       5' end of D-loop
   SITE        743    749       approx. 5' end of D-loop
   SITE        722    728       approx. 5' end of D-loop
   SIT

In [242]:
handle.close()
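
The single-record fetches above could be extended to a list of accessions. The following is a minimal sketch, not the authors' download procedure: `batches` and `fetch_records` are hypothetical helper names, the batch size of 20 is an arbitrary assumption, and the accession list contains only the three *S. aureus* accessions that appear in this notebook, since the full list of 88 isolates is not given here.

```python
from typing import Iterator, List, Sequence


def batches(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_records(ids: Sequence[str], email: str, batch_size: int = 20):
    """Fetch GenBank records for `ids` from NCBI, one request per batch."""
    # Imported here so the batching helper works without Biopython or network access.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address with each request
    records = []
    for batch in batches(ids, batch_size):
        handle = Entrez.efetch(db="nucleotide", id=",".join(batch),
                               rettype="gb", retmode="text")
        try:
            # SeqIO.parse handles several records per response (SeqIO.read expects exactly one)
            records.extend(SeqIO.parse(handle, "genbank"))
        finally:
            handle.close()
    return records


# Accessions seen in this notebook; the remaining isolates would be appended here.
accessions = ["NZ_ADVP00000000.1", "NC_017343.1", "NC_002953.3"]
print(list(batches(accessions, 2)))
```

Closing the handle in a `finally` block keeps the connection from leaking if parsing fails partway through a batch. Complete genomes are large, so a small batch size may be the safer choice.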

In [247]:
cat flpNC_017343.fasta

>NC_017343.1 Staphylococcus aureus subsp. aureus ECT-R 2 complete genome
ATTAATCCCAATAAATCGAGTCGATTTCACCGTTTTTAACAACTTTAATATTTTTTCTGT 
TCTCTTCTAAAGGACGAGTTAAGTCAAAAGTATAGTAATCTCTAGGACCACCATCTTTAA 
TTCTGACAACTGCTTTCTTCACATCACCTTGGCTTAATTTTTTAACAATAAGGTACAGAT 
CATTAACAGTTTCAGCTTTGTAAGGAGTTAGATTTTTATCAGACTCTTTCATTAACTTAT 
CAATACGTTCATCGTCCTTTTTAGCTTGATCTGCTAAATTTTTTGCGATTTCTAAACCTT 
TCCATTCATAACTAAAGAAAGCTTTAGCATCATTAGTTTGAGTTAAAAGACCTGCAGCGA 
TTACTGTAGATGCGATAATAGTTTTTGTGATATTTTTTTTCAT


In [248]:
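from Bio.Seq import Seq
from Bio import SeqIO
from Bio.Alphabet import IUPAC  # needs Biopython older than 1.78; Bio.Alphabet was removed in 1.78
# Read the single FASTA record as unambiguous DNA, then translate it with the bacterial codon table
flpNC_017343_AA = SeqIO.read("flpNC_017343.fasta", "fasta", IUPAC.unambiguous_dna)
flpNC_017343_AA.translate(table="Bacterial")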



SeqRecord(seq=Seq('INPNKSSRFHRF*QL*YFFCSLLKDELSQKYSNL*DHHL*F*QLLSSHHLGLIF...FFS', HasStopCodon(ExtendedIUPACProtein(), '*')), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])
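
The forward translation above contains several internal stop codons (`*`), and the FASTA excerpt ends in `CAT`, the reverse complement of the `ATG` start codon. One possible reading is that the gene lies on the reverse strand, in which case the sequence would need to be reverse-complemented before translation. The sketch below illustrates the idea in plain Python on a made-up 12-base fragment (not taken from the genome); `reverse_complement` and `translate` are hypothetical helpers standing in for Biopython's `Seq.reverse_complement()` and `Seq.translate(table="Bacterial")`, and the codon-to-residue mapping used is the standard one.

```python
from itertools import product

# Standard codon-to-residue mapping, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate in frame 1; trailing bases that do not fill a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    # Codons with ambiguous bases are reported as "X"
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


toy = "TTACATTTTCAT"  # made-up fragment that ends in CAT, like the excerpts here
print(translate(toy))                      # LHFH
print(translate(reverse_complement(toy)))  # MKM*
```

Whether the real gene is on the reverse strand should be checked against the genome annotation before changing the analysis.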

In [249]:
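from Bio.SeqUtils.ProtParam import ProteinAnalysis
# Caution: this string was pasted from the truncated repr above, so it contains a literal "..."
# and "*" stop symbols; they are not residues but still count toward the sequence length.
flpNC_017343_AA = "INPNKSSRFHRF*QL*YFFCSLLKDELSQKYSNL*DHHL*F*QLLSSHHLGLIF...FFS"
flpNC_017343_AA = ProteinAnalysis(flpNC_017343_AA)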

In [250]:
flpNC_017343_AA.count_amino_acids()

{'A': 0,
 'C': 1,
 'D': 2,
 'E': 1,
 'F': 8,
 'G': 1,
 'H': 5,
 'I': 2,
 'K': 3,
 'L': 10,
 'M': 0,
 'N': 3,
 'P': 1,
 'Q': 3,
 'R': 2,
 'S': 8,
 'T': 0,
 'V': 0,
 'W': 0,
 'Y': 2}

In [251]:
flpNC_017343_AA.secondary_structure_fraction()

(0.3666666666666667, 0.21666666666666667, 0.18333333333333332)

`secondary_structure_fraction()` calculates the fraction of amino acids that tend to be found in helix, turn, or sheet, and returns a tuple of three floats (Helix, Turn, Sheet).

- Amino acids in helix: V, I, Y, F, W, L.
- Amino acids in turn: N, P, G, S.
- Amino acids in sheet: E, M, A, L.
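
The tuple reported for the first protein can be reproduced by hand from the residue groups listed above. The sketch below is a plain-Python illustration, not Biopython's implementation: `structure_fractions` and its `residues_only` flag are hypothetical names. It shows that the recorded values correspond to dividing by the full string length of 60, which includes the `*` and `.` characters, and how the fractions shift when only the 52 amino-acid letters are counted.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Residue groups as listed in the note above: helix, turn, sheet
GROUPS = ("VIYFWL", "NPGS", "EMAL")


def structure_fractions(sequence: str, residues_only: bool = False):
    """Return (helix, turn, sheet) fractions for `sequence`.

    With residues_only=True, characters that are not one of the 20
    standard amino-acid letters are dropped before computing the length.
    """
    if residues_only:
        sequence = "".join(ch for ch in sequence if ch in AMINO_ACIDS)
    if not sequence:
        raise ValueError("sequence contains no characters to analyse")
    length = len(sequence)
    return tuple(sum(sequence.count(aa) for aa in group) / length for group in GROUPS)


pasted = "INPNKSSRFHRF*QL*YFFCSLLKDELSQKYSNL*DHHL*F*QLLSSHHLGLIF...FFS"

# Denominator 60: matches the tuple recorded above to four decimal places
print(tuple(round(x, 4) for x in structure_fractions(pasted)))
# (0.3667, 0.2167, 0.1833)

# Denominator 52: only the amino-acid letters are counted
print(tuple(round(x, 4) for x in structure_fractions(pasted, residues_only=True)))
# (0.4231, 0.25, 0.2115)
```

Because the pasted string is itself truncated at the `...`, neither tuple describes the full translated sequence.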
          

In [252]:
cat flpNC_002953.3.fasta

>NC_002953.3 Staphylococcus aureus strain MSSA476, complete genome
ATTAATTTATCAGCACGTTCATCGTCCTTTTTAGCTTGGTCTGCTAAATTTTTAGCGATT
TCTAAACCTTTCCATTCATAACTAAAGAACGCCTTAGCATCATTAGTTTGAGTTAGAAGA 
CCTGTAGCGATGACTGTAGATGCAATAATTACTTTTGTGATATTTTTTTTCAT


In [253]:
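from Bio.Seq import Seq
from Bio import SeqIO
from Bio.Alphabet import IUPAC  # needs Biopython older than 1.78; Bio.Alphabet was removed in 1.78
# Read the single FASTA record as unambiguous DNA, then translate it with the bacterial codon table
flpNC_002953_AA = SeqIO.read("flpNC_002953.3.fasta", "fasta", IUPAC.unambiguous_dna)
flpNC_002953_AA.translate(table="Bacterial")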



SeqRecord(seq=Seq('INLSARSSSFLAWSAKFLAISKPFHS*LKNALASLV*VRRPVAMTVDAIITFVIFFF', HasStopCodon(ExtendedIUPACProtein(), '*')), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])

In [254]:
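from Bio.SeqUtils.ProtParam import ProteinAnalysis
# Caution: the "*" stop symbols in this pasted string are not residues but still count toward the sequence length.
flpNC_002953_AA = "INLSARSSSFLAWSAKFLAISKPFHS*LKNALASLV*VRRPVAMTVDAIITFVIFFF"
flpNC_002953_AA = ProteinAnalysis(flpNC_002953_AA)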

In [255]:
flpNC_002953_AA.count_amino_acids()

{'A': 8,
 'C': 0,
 'D': 1,
 'E': 0,
 'F': 7,
 'G': 0,
 'H': 1,
 'I': 5,
 'K': 3,
 'L': 6,
 'M': 1,
 'N': 2,
 'P': 2,
 'Q': 0,
 'R': 3,
 'S': 8,
 'T': 2,
 'V': 5,
 'W': 1,
 'Y': 0}

In [256]:
flpNC_002953_AA.secondary_structure_fraction()

(0.42105263157894735, 0.21052631578947367, 0.2631578947368421)

          