Python Version

In [None]:
from platform import python_version
python_version()

'3.8.15'

# Biopython
A set of libraries for analysis of biological data.

## Installation

In [None]:
pip install biopython

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting biopython
  Downloading biopython-1.80-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
     |████████████████████████████████| 3.1 MB 5.1 MB/s 
Installing collected packages: biopython
Successfully installed biopython-1.80


In [None]:
# version installed
import Bio
print(Bio.__version__)

1.80


## Sequencing Data 

`Bio.SeqIO` can read, write, and index sequence data in multiple file formats. For more detail, see the [SeqIO wiki page](https://biopython.org/wiki/SeqIO).

* Provides a simple uniform interface to input and output assorted sequence file formats (including multiple sequence alignments)
* Will only deal with sequences as SeqRecord objects
* File formats:
  * `abi`
  * `cif-atoms`
  * `clustal`
  * `fasta`
  * `pdb-seqres`
  * `pdb-atom`
  * `swiss`

* Note: When using `Bio.SeqIO` to read alignments, make sure all the sequences are the same length (i.e. they must include gaps).
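
The equal-length requirement in the note above can be checked before sequences are passed to an alignment parser. The following is a minimal, dependency-free sketch; the function name `check_equal_length` and the toy sequences are assumptions made for illustration and are not part of Biopython.

```python
def check_equal_length(seqs):
    """Return True if every sequence in the mapping has the same length."""
    lengths = {len(s) for s in seqs.values()}
    return len(lengths) <= 1


# Hypothetical toy data: gaps ("-") pad the shorter sequence in `aligned`.
aligned = {"seq1": "MNTTNEYLK", "seq2": "MNT--EYLK"}
unaligned = {"seq1": "MNTTNEYLK", "seq2": "MNTEYLK"}

print(check_equal_length(aligned))
print(check_equal_length(unaligned))
```

A check like this only confirms that lengths match; it does not say whether the gaps are placed sensibly.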

### Drosophila X Virus

View the complete sequence for [Drosophila X Virus Segment A](https://www.ncbi.nlm.nih.gov/gene/993338) using Bio.SeqIO.

Note:
* Organism: Drosophila X virus
* Molecule type: genomic RNA
* Gene: 1 to 3099
* Locus tag: DxvsAgp1
* For more detail, click [here](https://www.ncbi.nlm.nih.gov/nuccore/NC_004177.1?report=genbank&from=108&to=3206)

In [243]:
from Bio import SeqIO

gene_record = SeqIO.read("/content/google_drive/MyDrive/vp4_protease_structure/drosophila_x_virus_segment_A_complete_sequence.fasta", "fasta")
print(gene_record.seq)

ATGAATACGACAAACGAATACTTGAAAACTCTTTTAAACCCAGCACAATTTATCTCAGACATTCCTGATGATATAATGATCCGACACGTAAACAGCGCCCAGACCATCACCTACAACTTGAAGTCAGGGGCCTCTGGCACCGGCCTGATCGTGGTCTATCCAAACACCCCGTCGAGTATTAGCGGCTTCCATTACATATGGGATTCCGCTACCTCGAATTGGGTGTTTGATCAGTACATCTACACAGCTCAGGAGTTGAAGGACTCATATGACTATGGCAGACTGATTTCAGGCTCGCTAAGCATTAAGTCCAGCACCTTACCTGCGGGTGTTTATGCACTGAATGGCACATTCAATGCAGTCTGGTTCCAAGGGACCTTGAGTGAAGTGTCTGACTACTCTTACGATAGGATCCTGTCAATAACATCCAATCCTCTGGATAAGGTTGGAAATGTGTTGGTTGGAGACGGCATAGAGGTTCTAAGCCTGCCGCAGGGGTTCAACAACCCCTACGTTAGGCTGGGTGACAAGTCACCGTCCACTCTATCCTCTCCAACCCACATAACCAACACTTCCCAGAACTTGGCTACGGGAGGTGCATACATGATCCCAGTAACCACAGTTCCTGGGCAAGGATTCCATAACAAGGAATTCAGCATTAATGTGGACTCCGTAGGGCCAGTTGACATCTTGTGGTCTGGTCAAATGACTATGCAGGACGAATGGACTGTAACTGCAAATTATCAACCATTGAACATCTCTGGCACGCTAATTGCAAACAGTCAGCGAACCCTAACATGGTCCAACACTGGTGTATCCAATGGCAGCCACTACATGAACATGAACAACCTTAATGTCTCCCTTTTCCATGAGAATCCACCACCTGAACCCGTTGCCGCCATAAAAATAAACATCAATTATGGAAACAACACCAATGGTGACAGCTCGTTCAGTGTGGACTCATCATTTACCATCAATGTCATTGGGGGCGCCACCATTG

In [244]:
print("Sequence length (bp)", len(gene_record))

Sequence length (bp) 3099


View the complete amino acid sequence for [Drosophila X Virus Polyprotein](https://www.ncbi.nlm.nih.gov/protein/1545998) using `Bio.SeqIO`

* Chromosome: Segment A
* Region 2 to 442: Birnavirus VP2 protein
* Region 44 to 702: Birnavirus VP4 protein
* Region 734 to 983: Birnavirus VP3 protein
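
The region coordinates listed above are 1-based and inclusive, whereas Python slices are 0-based and exclude the end index. The sketch below shows the conversion with a hypothetical helper, `extract_region`, applied to a short toy sequence rather than the real polyprotein.

```python
def extract_region(seq, start, end):
    """Return residues start..end (1-based, inclusive) from seq."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("region out of range")
    # Shift the start down by one; the exclusive slice end equals the inclusive end.
    return seq[start - 1:end]


toy = "MNTTNEYLKT"  # made-up example sequence
print(extract_region(toy, 2, 5))
```

The same slice syntax also applies to a Biopython `Seq` object, so a region could be cut from `polyprotein_record.seq` in the same way.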

In [245]:
polyprotein_record = SeqIO.read("/content/google_drive/MyDrive/vp4_protease_structure/polyprotein_Drosophila_X_virus_sequence.fasta", "fasta")
print(polyprotein_record.seq)

MNTTNEYLKTLLNPAQFISDIPDDIMIRHVNSAQTITYNLKSGASGTGLIVVYPNTPSSISGFHYIWDSATSNWVFDQYIYTAQELKDSYDYGRLISGSLSIKSSTLPAGVYALNGTFNAVWFQGTLSEVSDYSYDRILSITSNPLDKVGNVLVGDGIEVLSLPQGFNNPYVRLGDKSPSTLSSPTHITNTSQNLATGGAYMIPVTTVPGQGFHNKEFSINVDSVGPVDILWSGQMTMQDEWTVTANYQPLNISGTLIANSQRTLTWSNTGVSNGSHYMNMNNLNVSLFHENPPPEPVAAIKININYGNNTNGDSSFSVDSSFTINVIGGATIGVNSPTVGVGYQGVAEGTAITISGINNYELVPNPDLQKNLPMTYGTCDPHDLTYIKYILSNREQLGLRSVMTLADYNRMKMYMHVLTNYHVDEREASSFDFWQLLKQIKNVAVPLAATLAPQFAPIIGAADGLANAILGDSASGRPVGNSASGMPISMSRRLRNAYSADSPLGEEHWLPNENENFNKFDIIYDVSHSSMALFPVIMMEHDKVIPSDPEELYIAVSLTESLRKQIPNLNDMPYYEMGGHRVYNSVSSNVRSGNFLRSDYILLPCYQLLEGRLASSTSPNKVTGTSHQLAIYAADDLLKSGVLGKAPFAAFTGSVVGSSVGEVFGINLKLQLTDSLGIPLLGNSPGLVQVKTLTSLDKKIKDMGDVKRRTPKQTLPHWTAGSASMNPFMNTNPFLEELDQPIPSNAAKPISEETRDLFLSDGQTIPSSQEKIATIHEYLLEHKELEEAMFSLISQGRGRSLINMVVKSALNIETQSREVTGERRQRLERKLRNLENQGIYVDESKIMSRGRISKEDTELAMRIARKNQKDAKLRRIYSNNASIQESYTVDDFVSYWMEQESLPTGIQIAMWLKGDDWSQPIPPRVQRRHYDSYIMMLGPSPTQEQADAVKDLVDDIYDRNQGKGPSQEQARELSHAVRRLISHSLVNQPATAPRVPPRR

In [246]:
print("Sequence length (aa)", len(polyprotein_record))

Sequence length (aa) 1032


#### `SeqRecord` class

* The only class of object returned by `SeqIO`.
* The information that can be extracted from a `SeqRecord` object depends on the file format (e.g. FASTA, GenBank).
* View all the information held in this object via: `print(record)`
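
For FASTA input, the record printed in this section suggests that the ID is the first word of the header line and the description is the whole header. The sketch below mimics that split in plain Python; `parse_fasta_header` is a hypothetical helper for illustration, not Biopython's own parser.

```python
def parse_fasta_header(line):
    """Split a FASTA header line into (id, description)."""
    title = line.lstrip(">").strip()
    # The identifier is the first whitespace-delimited word of the title.
    identifier = title.split(None, 1)[0] if title else ""
    return identifier, title


rec_id, description = parse_fasta_header(">AAB16798.1 polyprotein [Drosophila X virus]")
print("ID:", rec_id)
print("Description:", description)
```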

In [248]:
from Bio import SeqIO
for polyprotein_record in SeqIO.parse("/content/google_drive/MyDrive/vp4_protease_structure/polyprotein_Drosophila_X_virus_sequence.fasta", "fasta"):
    print(polyprotein_record)

ID: AAB16798.1
Name: AAB16798.1
Description: AAB16798.1 polyprotein [Drosophila X virus]
Number of features: 0
Seq('MNTTNEYLKTLLNPAQFISDIPDDIMIRHVNSAQTITYNLKSGASGTGLIVVYP...DIV')


In [249]:
print("ID", polyprotein_record.id)
print("Name", polyprotein_record.name)
print("Description", polyprotein_record.description)
print("Sequence", polyprotein_record.seq)
print("Number of features", len(polyprotein_record.features)) # the feature list is empty

ID AAB16798.1
Name AAB16798.1
Description AAB16798.1 polyprotein [Drosophila X virus]
Sequence MNTTNEYLKTLLNPAQFISDIPDDIMIRHVNSAQTITYNLKSGASGTGLIVVYPNTPSSISGFHYIWDSATSNWVFDQYIYTAQELKDSYDYGRLISGSLSIKSSTLPAGVYALNGTFNAVWFQGTLSEVSDYSYDRILSITSNPLDKVGNVLVGDGIEVLSLPQGFNNPYVRLGDKSPSTLSSPTHITNTSQNLATGGAYMIPVTTVPGQGFHNKEFSINVDSVGPVDILWSGQMTMQDEWTVTANYQPLNISGTLIANSQRTLTWSNTGVSNGSHYMNMNNLNVSLFHENPPPEPVAAIKININYGNNTNGDSSFSVDSSFTINVIGGATIGVNSPTVGVGYQGVAEGTAITISGINNYELVPNPDLQKNLPMTYGTCDPHDLTYIKYILSNREQLGLRSVMTLADYNRMKMYMHVLTNYHVDEREASSFDFWQLLKQIKNVAVPLAATLAPQFAPIIGAADGLANAILGDSASGRPVGNSASGMPISMSRRLRNAYSADSPLGEEHWLPNENENFNKFDIIYDVSHSSMALFPVIMMEHDKVIPSDPEELYIAVSLTESLRKQIPNLNDMPYYEMGGHRVYNSVSSNVRSGNFLRSDYILLPCYQLLEGRLASSTSPNKVTGTSHQLAIYAADDLLKSGVLGKAPFAAFTGSVVGSSVGEVFGINLKLQLTDSLGIPLLGNSPGLVQVKTLTSLDKKIKDMGDVKRRTPKQTLPHWTAGSASMNPFMNTNPFLEELDQPIPSNAAKPISEETRDLFLSDGQTIPSSQEKIATIHEYLLEHKELEEAMFSLISQGRGRSLINMVVKSALNIETQSREVTGERRQRLERKLRNLENQGIYVDESKIMSRGRISKEDTELAMRIARKNQKDAKLRRIYSNNASIQESYTVDDFVSYWMEQESLPT

### Other applications of `Bio.SeqIO`:
* parse
* convert file format
* generate random subsequences
* filter by sequence length
* write sequence output
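
Two of the applications listed above, filtering by sequence length and writing sequence output, can be sketched without any dependencies. The helpers `read_fasta` and `write_fasta` below are hypothetical stand-ins written for this example; they are not Biopython functions, and the toy records are made up.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs, wrapping sequence lines at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Toy input with two records of different lengths.
fasta_text = ">short\nMNT\n>long\nMNTTNEYLKT\nLLNPA\n"
records = list(read_fasta(io.StringIO(fasta_text)))

# Keep only records with at least 10 residues.
kept = [(h, s) for h, s in records if len(s) >= 10]

out = io.StringIO()
write_fasta(kept, out)
print(out.getvalue())
```

With real files, the `io.StringIO` objects would be replaced by file handles opened with `open(...)`.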

### VP4 Protease

Parse the FASTA files for the four VP4 protease structures. The same approach works for other file formats.

In [None]:
from Bio import SeqIO
bsnv = SeqIO.read("/content/drive/MyDrive/vp4_protease_structure/vp4_protease_sequence/rcsb_pdb_2GEF.fasta", "fasta")
ipnv_tri = SeqIO.read("/content/drive/MyDrive/vp4_protease_structure/vp4_protease_sequence/rcsb_pdb_2PNL.fasta", "fasta")
ipnv_hex = SeqIO.read("/content/drive/MyDrive/vp4_protease_structure/vp4_protease_sequence/rcsb_pdb_2PNM.fasta", "fasta")
tv_1 = SeqIO.read("/content/drive/MyDrive/vp4_protease_structure/vp4_protease_sequence/rcsb_pdb_3P06.fasta", "fasta")

print(bsnv)
print(ipnv_tri)
print(ipnv_hex)
print(tv_1)

ID: 2GEF_1|Chains
Name: 2GEF_1|Chains
Description: 2GEF_1|Chains A, B|Protease VP4|Blotched snakehead virus (311176)
Number of features: 0
Seq('MADLPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGD...VTR')
ID: 2PNL_1|Chains
Name: 2PNL_1|Chains
Description: 2PNL_1|Chains A, B, C, D, E, F, G, H, I, J|Protease VP4|Infectious pancreatic necrosis virus (11002)
Number of features: 0
Seq('GKFSRALKNRLESANYEEVELPPPSKGVIVPVVHTVKSAPGEAFGSLAIIIPGE...QRA')
ID: 2PNM_1|Chain
Name: 2PNM_1|Chain
Description: 2PNM_1|Chain A|Protease VP4|Infectious pancreatic necrosis virus - Sp (11005)
Number of features: 0
Seq('LESANYEEVELPPPSKGVIVPVVHTVKSAPGEAFGSLAIIIPGEYPELLDANQQ...QRA')
ID: 3P06_1|Chain
Name: 3P06_1|Chain
Description: 3P06_1|Chain A|VP4 protein|Tellina virus 1 (321302)
Number of features: 0
Seq('NGVELSAVGVLLPVLMDSGRRISGGAFMAVKGDLSEHIKNPKNTRIAQTVAGGT...AQA')


In [136]:
print(tv_1.seq)

NGVELSAVGVLLPVLMDSGRRISGGAFMAVKGDLSEHIKNPKNTRIAQTVAGGTIYGLSEMVNIDEAEKLPIKGAITVLPVVQATATSILVPDNQPQLAFNSWEAAACAADTLESQQTPFLMVTGAVESGNLSPNLLAVQKQLLVAKPAGIGLAANSDRALKVVTLEQLRQVVGDKPWRKPMVTFSSGKNVAQA


View the type of information held in this object using the `dir()` function. Note: the `dir()` function returns the names of all properties and methods of the specified object, without their values:
> `dir(record)`

*Replace* `record` *with object*
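
Because `dir()` also returns many double-underscore names, it can help to filter the list down to public attributes. A minimal sketch, using a hypothetical `Record` class as a stand-in for a real record object:

```python
class Record:
    """Hypothetical stand-in for a record object with a few attributes."""

    def __init__(self):
        self.id = "example"
        self.seq = "MNT"
        self._cache = None  # private by convention, filtered out below

    def upper(self):
        return self.seq.upper()


rec = Record()
# Keep only names that do not start with an underscore.
public = [name for name in dir(rec) if not name.startswith("_")]
print(public)
```

The value behind any of these names can then be retrieved with `getattr(rec, name)`.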

In [None]:
dir(bsnv)

['__add__',
 '__bool__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__radd__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_per_letter_annotations',
 '_seq',
 '_set_per_letter_annotations',
 '_set_seq',
 'annotations',
 'count',
 'dbxrefs',
 'description',
 'features',
 'format',
 'id',
 'islower',
 'isupper',
 'letter_annotations',
 'lower',
 'name',
 'reverse_complement',
 'seq',
 'translate',
 'upper']

In [None]:
# Sequence length for bsnv
print("Sequence length (aa)", len(bsnv))

Sequence length (aa) 217


In [None]:
# Note: there are two chains (A and B); the FASTA record holds one sequence for both
from Bio.SeqUtils import molecular_weight
print("Molecular weight of Chain A:", molecular_weight(bsnv.seq, "protein"))

Molecular weight of Chain A: 23441.183599999982
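
One common way to estimate a peptide's molecular weight is to sum the masses of the free amino acids and subtract one water molecule per peptide bond. The sketch below illustrates that arithmetic only; the mass table is a rounded, three-residue subset chosen for the example, and whether Biopython uses exactly these values is not claimed here.

```python
# Approximate average masses (Da) of free amino acids; illustrative subset only.
AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER_MASS = 18.015  # lost once per peptide bond formed


def peptide_mass(seq):
    """Sum free amino acid masses and subtract one water per peptide bond."""
    if not seq:
        return 0.0
    total = sum(AMINO_ACID_MASS[aa] for aa in seq)
    return total - (len(seq) - 1) * WATER_MASS


print(round(peptide_mass("GAS"), 2))
```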


Convert DNA into RNA

In [None]:
def dna_to_rna(dna):
    """Return the RNA version of a DNA string by replacing T with U."""
    return dna.translate(str.maketrans("T", "U"))

dna_seq = "ATTTAGGGCC"
dna_to_rna(dna_seq)

'AUUUAGGGCC'


# PyRosetta


In [75]:
!pip install pyrosettacolabsetup
import pyrosettacolabsetup; pyrosettacolabsetup.install_pyrosetta()
import pyrosetta; pyrosetta.init()

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pyrosettacolabsetup
  Downloading pyrosettacolabsetup-1.0.6-py3-none-any.whl (4.7 kB)
Installing collected packages: pyrosettacolabsetup
Successfully installed pyrosettacolabsetup-1.0.6
Mounted at /content/google_drive
Looking for compatible PyRosetta wheel file at google-drive/PyRosetta/colab.bin/wheels...
To obtain PyRosetta license please visit https://www.rosettacommons.org/software/license-and-download
Please enter you RC license login:levinthal
Please enter you RC license password:paradox
Downloading PyRosetta package...

Resolving west.rosettacommons.org (west.rosettacommons.org)... 128.95.160.153

HTTP request sent, awaiting response... 401 Unauthorized
Authentication selected: Basic realm="Rosetta Repo"

HTTP request sent, awaiting response... 302 Found
Location: https://west.rosettacommons.org/pyrosetta/release/release/PyRosetta4.MinSizeRel.python38.ubuntu.wheel/pyros

## `Pose` class in PyRosetta
A `Pose` object holds the different types of information that describe a protein structure.

In [78]:
from pyrosetta import *
init()

PyRosetta-4 2022 [Rosetta PyRosetta4.MinSizeRel.python38.ubuntu 2022.47+release.d2aee95a6b7bf6ee70c5e2c7b29d0915e9112fa7 2022-11-23T13:33:36] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.MinSizeRel.python38.ubuntu r336 2022.47+release.d2aee95a6b7 d2aee95a6b7bf6ee70c5e2c7b29d0915e9112fa7 http://www.pyrosetta.org 2022-11-23T13:33:36
core.init: command: PyRosetta -ex1 -ex2aro -database /usr/local/lib/python3.8/dist-packages/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=-1065151969 seed_offset=0 real_seed=-1065151969
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=-1065151969 RG_type=mt19937


Load the PDB file

In [83]:
pose = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb")

# Alternatively, use pyrosetta.toolbox.rcsb.pose_from_rcsb("2GEF")
# Note: this requires internet access

core.import_pose.import_pose: File '/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb' automatically determined to be of type PDB
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading Selenium SE from MSE as SD from MET
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFr

In [84]:
# View the sequence
pose.sequence()

'DLPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGDHRDLMSPDVRDSYALTGDDHKVWGATHKFNVKTRTDLTILPVADVFWRADGSADVDVVWNDMPAVAGQSSSIALALASSLPFVPKAAYTGCLSGTNVQPVQFGNLKARAAHKIGLPLVGMTQDGGEDTRICTLDDAADHAFDSMESDLPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGDHRDLMSPDVRDSYALTGDDHKVWGATHLKFNVKTRTDLTILPVADVFWRADGSADVDVVWNDMPAVAGQSSSIALALASSLPFVPKAAYTGCLSGTNVQPVQFGNLKARAAHKIGLPLVGMTQDGGEDTRICTLDDAADHAFDSMESTVTR'

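Clean the PDB file to conform to the PyRosetta standard. `cleanATOM` writes the cleaned structure to a new file (`2gef.clean.pdb`) and leaves the original file unchanged.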

In [85]:
from pyrosetta.toolbox import cleanATOM
cleanATOM("/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb")

In [86]:
# Note: this loads the original 2gef.pdb again, not the 2gef.clean.pdb file written by cleanATOM
pose_clean = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb")

core.import_pose.import_pose: File '/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb' automatically determined to be of type PDB
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading Selenium SE from MSE as SD from MET
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFr

In [87]:
# Sequence after cleaning the PDB file
pose_clean.sequence()
# No differences seen, as expected: pose_clean was loaded from the same 2gef.pdb file as pose

'DLPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGDHRDLMSPDVRDSYALTGDDHKVWGATHKFNVKTRTDLTILPVADVFWRADGSADVDVVWNDMPAVAGQSSSIALALASSLPFVPKAAYTGCLSGTNVQPVQFGNLKARAAHKIGLPLVGMTQDGGEDTRICTLDDAADHAFDSMESDLPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGDHRDLMSPDVRDSYALTGDDHKVWGATHLKFNVKTRTDLTILPVADVFWRADGSADVDVVWNDMPAVAGQSSSIALALASSLPFVPKAAYTGCLSGTNVQPVQFGNLKARAAHKIGLPLVGMTQDGGEDTRICTLDDAADHAFDSMESTVTR'

In [88]:
pose.annotated_sequence()

'D[ASP:NtermProteinFull]LPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGDHRDLMSPDVRDSYALTGDDHKVWGATH[HIS_D]KFNVKTRTDLTILPVADVFWRADGSADVDVVWNDMPAVAGQSSSIALALASSLPFVPKAAYTGCLSGTNVQPVQFGNLKARAAHKIGLPLVGMTQDGGEDTRICTLDDAADHAFDSMES[SER:CtermProteinFull]D[ASP:NtermProteinFull]LPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGDHRDLMSPDVRDSYALTGDDHKVWGATHLKFNVKTRTDLTILPVADVFWRADGSADVDVVWNDMPAVAGQSSSIALALASSLPFVPKAAYTGCLSGTNVQPVQFGNLKARAAHKIGLPLVGMTQDGGEDTRICTLDDAADHAFDSMESTVTR[ARG:CtermProteinFull]'

In [89]:
pose_clean.annotated_sequence()

'D[ASP:NtermProteinFull]LPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGDHRDLMSPDVRDSYALTGDDHKVWGATH[HIS_D]KFNVKTRTDLTILPVADVFWRADGSADVDVVWNDMPAVAGQSSSIALALASSLPFVPKAAYTGCLSGTNVQPVQFGNLKARAAHKIGLPLVGMTQDGGEDTRICTLDDAADHAFDSMES[SER:CtermProteinFull]D[ASP:NtermProteinFull]LPISLLQTLAYKQPLGRNSRIVHFTDGALFPVVAFGDNHSTSELYIAVRGDHRDLMSPDVRDSYALTGDDHKVWGATHLKFNVKTRTDLTILPVADVFWRADGSADVDVVWNDMPAVAGQSSSIALALASSLPFVPKAAYTGCLSGTNVQPVQFGNLKARAAHKIGLPLVGMTQDGGEDTRICTLDDAADHAFDSMESTVTR[ARG:CtermProteinFull]'

In [90]:
print(pose.total_residue())
print(pose_clean.total_residue())
# Same number of residues

401
401


Catalytic residues

Blotched Snakehead Virus (BSNV), PDB 2GEF:
- Serine 692
- Lysine 729

Infectious Pancreatic Necrosis Virus (IPNV) - Triclinic form, PDB 2PNL:
- Serine 633
- Lysine 674

Infectious Pancreatic Necrosis Virus (IPNV) - Hexagonal form, PDB 2PNM:
- Serine 633
- Lysine 674

Tellina Virus 1, PDB 3P06:
- Serine 738
- Lysine 777
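
Since the same analysis is repeated for each structure below, the catalytic residues listed above can be kept in one lookup table. This is a minimal sketch; the name `CATALYTIC_RESIDUES` and the dictionary layout are assumptions made for illustration, while the residue numbers are copied from the list above.

```python
# PDB-numbered catalytic residues, copied from the list above.
CATALYTIC_RESIDUES = {
    "2GEF": {"virus": "BSNV", "ser": 692, "lys": 729},
    "2PNL": {"virus": "IPNV (triclinic)", "ser": 633, "lys": 674},
    "2PNM": {"virus": "IPNV (hexagonal)", "ser": 633, "lys": 674},
    "3P06": {"virus": "Tellina virus 1", "ser": 738, "lys": 777},
}

for pdb_id, info in CATALYTIC_RESIDUES.items():
    # Difference between the residue numbers of the two catalytic residues.
    gap = info["lys"] - info["ser"]
    print(pdb_id, info["virus"], "Ser", info["ser"], "Lys", info["lys"], "separation", gap)
```

A table like this makes it possible to loop over the structures instead of copying the same cell for each PDB entry.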

## BSNV

#### Pose

In [187]:
from pyrosetta.toolbox import cleanATOM

pose = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb")
# cleanATOM writes the cleaned structure to 2gef.clean.pdb; the input file is unchanged
cleanATOM("/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb")
# Note: this reloads the original 2gef.pdb, not the cleaned file
pose_clean = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb")
print(pose.total_residue())
print(pose_clean.total_residue())

core.import_pose.import_pose: File '/content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb' automatically determined to be of type PDB
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading Selenium SE from MSE as SD from MET
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFr

#### Angle

In [188]:
print(pose.pdb_info())

# Catalytic Serine
resid_ser = pose.pdb_info().pdb2pose('A', 692)
print(resid_ser)
# Catalytic Lysine
resid_lys = pose.pdb_info().pdb2pose('A', 729)
print(resid_lys)

PDB file name: /content/google_drive/MyDrive/vp4_protease_structure/2gef.pdb
 Pose Range  Chain    PDB Range  |   #Residues         #Atoms

0001 -- 0079    A 0559  -- 0637  |   0079 residues;    01222 atoms
0080 -- 0198    A 0651  -- 0769  |   0119 residues;    01747 atoms
0199 -- 0277    B 0559  -- 0637  |   0079 residues;    01222 atoms
0278 -- 0401    B 0650  -- 0773  |   0124 residues;    01834 atoms
                           TOTAL |   0401 residues;    06025 atoms

121
158
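
The two numbers printed above follow directly from the range table: within each contiguous segment, the pose index is the segment's first pose index plus the offset from the segment's first PDB residue number. The sketch below reproduces that arithmetic in plain Python. The helper `pdb_to_pose` is hypothetical (it is not the PyRosetta method), it assumes numbering is contiguous within each segment, and returning 0 for a residue that is not present is a convention of this sketch.

```python
# (pose_start, chain, pdb_start, pdb_end) rows copied from the 2GEF table above.
SEGMENTS_2GEF = [
    (1, "A", 559, 637),
    (80, "A", 651, 769),
    (199, "B", 559, 637),
    (278, "B", 650, 773),
]


def pdb_to_pose(segments, chain, pdb_resnum):
    """Map a (chain, PDB residue number) to a pose index; return 0 if absent."""
    for pose_start, seg_chain, pdb_start, pdb_end in segments:
        if seg_chain == chain and pdb_start <= pdb_resnum <= pdb_end:
            return pose_start + (pdb_resnum - pdb_start)
    return 0


print(pdb_to_pose(SEGMENTS_2GEF, "A", 692))  # catalytic serine
print(pdb_to_pose(SEGMENTS_2GEF, "A", 729))  # catalytic lysine
print(pdb_to_pose(SEGMENTS_2GEF, "A", 640))  # falls in the gap between segments
```

The gap between PDB residues 637 and 651 in chain A is why pose numbering cannot be obtained by subtracting a single constant from the PDB number.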


Angles for Serine residue

In [189]:
print("phi:", pose.phi(resid_ser))
print("psi:", pose.psi(resid_ser))
print("chi1:", pose.chi(1, resid_ser))

phi: -61.94625549372842
psi: -8.219627882218836
chi1: 59.15864216058228
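
Phi, psi, and chi1 are all dihedral (torsion) angles, each defined by four consecutive atoms: C(i-1), N, CA, C for phi; N, CA, C, N(i+1) for psi; and N, CA, CB plus the first gamma atom for chi1. The sketch below shows one standard way to compute a signed dihedral from four 3D points using only the standard library. The function `dihedral` and the toy coordinates are assumptions for illustration; this is not how PyRosetta is implemented internally.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Return the signed dihedral angle (degrees) defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    # Bond vectors between consecutive points.
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    # Normals of the two planes that share the central bond b2.
    n1, n2 = cross(b1, b2), cross(b2, b3)
    b2_len = math.sqrt(dot(b2, b2))
    # The atan2 form keeps the sign and stays stable near 0 and 180 degrees.
    y = b2_len * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: the last point is rotated a quarter turn about the z axis.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```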


Angles for Lysine residue

In [190]:
print("phi:", pose.phi(resid_lys))
print("psi:", pose.psi(resid_lys))
print("chi1:", pose.chi(1, resid_lys))

phi: -53.85899147051257
psi: -42.578411192963664
chi1: -73.50990034388595


## Tellina virus 1

#### Pose

In [191]:
from pyrosetta.toolbox import cleanATOM

pose = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/3p06.pdb")
# cleanATOM writes the cleaned structure to 3p06.clean.pdb; the input file is unchanged
cleanATOM("/content/google_drive/MyDrive/vp4_protease_structure/3p06.pdb")
# Note: this reloads the original 3p06.pdb, not the cleaned file
pose_clean = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/3p06.pdb")
print(pose.total_residue())
print(pose_clean.total_residue())

core.import_pose.import_pose: File '/content/google_drive/MyDrive/vp4_protease_structure/3p06.pdb' automatically determined to be of type PDB
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading Selenium SE from MSE as SD from MET
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFr

#### Angle

In [192]:
print(pose.pdb_info())

# Catalytic Serine
resid_ser = pose.pdb_info().pdb2pose('A', 738)
print(resid_ser)
# Catalytic Lysine
resid_lys = pose.pdb_info().pdb2pose('A', 777)
print(resid_lys)

PDB file name: /content/google_drive/MyDrive/vp4_protease_structure/3p06.pdb
 Pose Range  Chain    PDB Range  |   #Residues         #Atoms

0001 -- 0194    A 0637  -- 0830  |   0194 residues;    02916 atoms
0195 -- 0201    A 0100  -- 0106  |   0007 residues;    00068 atoms
                           TOTAL |   0201 residues;    02984 atoms

102
141


Angles for Serine residue

In [193]:
print("phi:", pose.phi(resid_ser))
print("psi:", pose.psi(resid_ser))
print("chi1:", pose.chi(1, resid_ser))

phi: -70.07070649863586
psi: -1.4033892575937037
chi1: 74.3788294187084


Angles for Lysine residue

In [194]:
print("phi:", pose.phi(resid_lys))
print("psi:", pose.psi(resid_lys))
print("chi1:", pose.chi(1, resid_lys))

phi: -72.8864124621807
psi: -33.751707324943396
chi1: -66.1803096072293


## IPNV Hexagonal form

#### Pose

In [199]:
from pyrosetta.toolbox import cleanATOM

pose = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/2pnm.pdb")
# cleanATOM writes the cleaned structure to 2pnm.clean.pdb; the input file is unchanged
cleanATOM("/content/google_drive/MyDrive/vp4_protease_structure/2pnm.pdb")
# Note: this reloads the original 2pnm.pdb, not the cleaned file
pose_clean = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/2pnm.pdb")
print(pose.total_residue())
print(pose_clean.total_residue())

core.import_pose.import_pose: File '/content/google_drive/MyDrive/vp4_protease_structure/2pnm.pdb' automatically determined to be of type PDB
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading Selenium SE from MSE as SD from MET
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFr

#### Angle

In [200]:
print(pose.pdb_info())

# Catalytic Serine
resid_ser = pose.pdb_info().pdb2pose('A', 633)
print(resid_ser)
# Catalytic lysine position: this residue is ALA rather than LYS in this structure, so chi1 cannot be obtained
resid_lys = pose.pdb_info().pdb2pose('A', 674)
print(resid_lys)

PDB file name: /content/google_drive/MyDrive/vp4_protease_structure/2pnm.pdb
 Pose Range  Chain    PDB Range  |   #Residues         #Atoms

0001 -- 0024    A 0524  -- 0547  |   0024 residues;    00372 atoms
0025 -- 0181    A 0555  -- 0711  |   0157 residues;    02333 atoms
                           TOTAL |   0181 residues;    02705 atoms

103
144


Angles for Serine residue

In [201]:
print("phi:", pose.phi(resid_ser))
print("psi:", pose.psi(resid_ser))
print("chi1:", pose.chi(1, resid_ser))

phi: -63.153077651620016
psi: -25.82048931582708
chi1: 69.11933415810839


Angles for Lysine residue

In [202]:
print("phi:", pose.phi(resid_lys))
print("psi:", pose.psi(resid_lys))
#print("chi1:", pose.chi(1, resid_lys))

phi: -54.20767301464966
psi: -41.86549466709202


## IPNV Triclinic form

#### Pose

In [195]:
from pyrosetta.toolbox import cleanATOM

pose = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/2pnl.pdb")
# cleanATOM writes the cleaned structure to 2pnl.clean.pdb; the input file is unchanged
cleanATOM("/content/google_drive/MyDrive/vp4_protease_structure/2pnl.pdb")
# Note: this reloads the original 2pnl.pdb, not the cleaned file
pose_clean = pose_from_pdb("/content/google_drive/MyDrive/vp4_protease_structure/2pnl.pdb")
print(pose.total_residue())
print(pose_clean.total_residue())

core.import_pose.import_pose: File '/content/google_drive/MyDrive/vp4_protease_structure/2pnl.pdb' automatically determined to be of type PDB
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading Selenium SE from MSE as SD from MET
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFromSFRBuilder: Reading MSE as MET!
core.io.pose_from_sfr.PoseFr

#### Angle

In [196]:
print(pose.pdb_info())

# Catalytic Serine
resid_ser = pose.pdb_info().pdb2pose('A', 633)
print(resid_ser)
# Catalytic Lysine
resid_lys = pose.pdb_info().pdb2pose('A', 674)
print(resid_lys)

PDB file name: /content/google_drive/MyDrive/vp4_protease_structure/2pnl.pdb
 Pose Range  Chain    PDB Range  |   #Residues         #Atoms

0001 -- 0203    A 0514  -- 0716  |   0203 residues;    03054 atoms
0204 -- 0204    A 2002  -- 2002  |   0001 residues;    00009 atoms
0205 -- 0205    A 2005  -- 2005  |   0001 residues;    00009 atoms
0206 -- 0408    B 0514  -- 0716  |   0203 residues;    03054 atoms
0409 -- 0410    B 2003  -- 2004  |   0002 residues;    00018 atoms
0411 -- 0411    B 2009  -- 2009  |   0001 residues;    00009 atoms
0412 -- 0614    C 0514  -- 0716  |   0203 residues;    03054 atoms
0615 -- 0615    C 2001  -- 2001  |   0001 residues;    00009 atoms
0616 -- 0616    C 2006  -- 2006  |   0001 residues;    00009 atoms
0617 -- 0617    C 2014  -- 2014  |   0001 residues;    00009 atoms
0618 -- 0820    D 0514  -- 0716  |   0203 residues;    03054 atoms
0821 -- 0821    D 2010  -- 2010  |   0001 residues;    00009 atoms
0822 -- 1024    E 0514  -- 0716  |   0203 residues;    0

Angles for Serine residue

In [197]:
print("phi:", pose.phi(resid_ser))
print("psi:", pose.psi(resid_ser))
print("chi1:", pose.chi(1, resid_ser))

phi: -82.0006549823259
psi: -8.189326289083393
chi1: 72.9717782540581


Angles for Lysine residue

In [198]:
print("phi:", pose.phi(resid_lys))
print("psi:", pose.psi(resid_lys))
#print("chi1:", pose.chi(1, resid_lys))

phi: -67.21068735227797
psi: -35.91268287475042


## Protein Geometry

Next stages:
- Obtain bond length
- Alter chi angles
- Observe change in protein geometry using PyMOL
- Energy score between residues
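
For the first of these next stages, a bond length is simply the Euclidean distance between the coordinates of two bonded atoms. A minimal sketch, assuming the coordinates are already available as `(x, y, z)` tuples; the helper name `bond_length` and the coordinates are made up for illustration.

```python
import math


def bond_length(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates, in the input units."""
    return math.dist(atom_a, atom_b)  # available from Python 3.8


# Hypothetical coordinates in angstroms, chosen only for illustration.
n_atom = (0.0, 0.0, 0.0)
ca_atom = (1.0, 1.0, 0.5)
print(round(bond_length(n_atom, ca_atom), 3))
```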

# py3Dmol

## Installation

In [203]:
!pip install py3Dmol

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting py3Dmol
  Downloading py3Dmol-1.8.1-py2.py3-none-any.whl (6.5 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-1.8.1


In [204]:
import py3Dmol

RDKit Integration

In [207]:
!wget -c https://repo.continuum.io/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
!chmod +x Miniconda3-py37_4.8.3-Linux-x86_64.sh
!time bash ./Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -f -p /usr/local
!time conda install -q -y -c conda-forge rdkit=2020.03.5

--2022-12-04 09:05:48--  https://repo.continuum.io/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.200.79, 104.18.201.79, 2606:4700::6812:c94f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.200.79|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh [following]
--2022-12-04 09:05:48--  https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.

PREFIX=/usr/local
Unpacking payload ...
Collecting package metadata (current_repodata.json): - \ done
Solving environment: / 

Import libraries

In [211]:
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

## View Protein Structure

In [234]:
# Two side-by-side viewers of PDB entry 2GEF; linked=False keeps them independent
view = py3Dmol.view(query='pdb:2gef', linked=False, viewergrid=(1, 2))
chA = {'chain': 'A'}
chB = {'chain': 'B'}

# Left viewer: chain A in red, chain B in blue, each with a translucent surface
view.setStyle(chA, {'cartoon': {'color': 'red'}}, viewer=(0, 0))
view.setStyle(chB, {'cartoon': {'color': 'blue'}}, viewer=(0, 0))
view.addSurface(py3Dmol.VDW, {'opacity': 0.7, 'color': 'white'}, chA, viewer=(0, 0))
view.addSurface(py3Dmol.VDW, {'opacity': 0.5, 'color': 'orange'}, chB, viewer=(0, 0))

# Right viewer: both chains as cartoons coloured with the 'spectrum' scheme
view.setStyle(chA, {'cartoon': {'color': 'spectrum'}}, viewer=(0, 1))
view.setStyle(chB, {'cartoon': {'color': 'spectrum'}}, viewer=(0, 1))

view.show()