#### Visualisation of COVID protein 

##### target students: TBD

##### references:
* https://github.com/lanadominkovic/12-days-of-biopython/blob/main/12_days_of_biopython/day_07/day_07-3d_visualisation_covid.ipynb
* https://www.youtube.com/watch?v=ocA2IMe7dpA

# Visualisation of COVID protein
In the previous lesson we ran a BLAST query on our COVID protein.

We can also retrieve the structure file for the same SARS-CoV-2 protein from another database, the PDB (Protein Data Bank). The PDB stores protein records that contain the coordinates of each atom, which we will use to visualize the SARS-CoV-2 protein.

In [1]:
# ID of the protein we are searching for (copied from the day 6 lecture)
seq_id = "pdb|6YYT|A"

In [2]:
pdb_id = seq_id.split("|")[1]  # extract the ID so we can download the PDB file from the Protein Data Bank; named pdb_id to avoid shadowing Python's built-in id()

In [3]:
pdb_id

'6YYT'

The Protein Data Bank (PDB) file format is a plain-text format that describes the three-dimensional structures of molecules held in the Protein Data Bank.
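
To make the format a bit more concrete: atom coordinates live in `ATOM` records, where each field sits at a fixed column position. The sketch below slices one such record by column. The record is built inside the code with made-up values (it is not taken from 6YYT.pdb), and `parse_atom_line` is a hypothetical helper written for this illustration, not a Biopython function. In practice Bio.PDB does this parsing for us.

```python
# Hypothetical ATOM record, assembled with format specs so that every field
# lands in its fixed column. The values are invented for illustration only.
example_line = (
    f"{'ATOM':<6}{1:>5} {' N':<4}{'':1}{'MET':>3} {'A':1}{1:>4}{'':1}   "
    f"{10.000:>8.3f}{20.500:>8.3f}{-3.250:>8.3f}{1.00:>6.2f}{50.00:>6.2f}"
    f"{'':10}{'N':>2}"
)


def parse_atom_line(line):
    """Slice the main fields out of one fixed-column ATOM record."""
    return {
        "record": line[0:6].strip(),      # columns 1-6: record type
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "atom_name": line[12:16].strip(), # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain_id": line[21],             # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38: x coordinate
        "y": float(line[38:46]),          # columns 39-46: y coordinate
        "z": float(line[46:54]),          # columns 47-54: z coordinate
        "element": line[76:78].strip(),   # columns 77-78: element symbol
    }


atom = parse_atom_line(example_line)
print(atom)
```

The chain identifier in column 22 is the same single letter that Biopython later exposes as `chain.id`.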

Download the PDB file with the wget command:

In [4]:
import os.path

if not os.path.isfile('./6YYT.pdb'):
    print('Need to download the PDB file')
    !wget https://files.rcsb.org/download/6YYT.pdb
else:
    print('file exists, no need to download')

file exists, no need to download
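
The `!wget` line only works inside IPython/Jupyter and requires wget to be installed. As a hedged alternative, the same check-then-download logic can be sketched with the standard library alone. `download_pdb` is a hypothetical helper name, and the URL pattern is assumed to follow the one used in the cell above.

```python
import os.path
import urllib.request


def download_pdb(pdb_id, directory="."):
    """Download <pdb_id>.pdb from RCSB unless the file is already present."""
    path = os.path.join(directory, f"{pdb_id}.pdb")
    if os.path.isfile(path):
        print("file exists, no need to download")
    else:
        print("Need to download the PDB file")
        # Assumption: same URL pattern as the wget command in the notebook.
        url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `download_pdb("6YYT")` would then return `./6YYT.pdb`, fetching the file only on the first run.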


### Reading PDB file with Biopython
Bio.PDB is a Biopython module that focuses on working with crystal structures of biological macromolecules. Among other things, Bio.PDB includes a PDBParser class that produces a Structure object, which can be used to access the atomic data in the file in a convenient manner. 

More about it in some later video :)

In [5]:
from Bio.PDB import PDBParser # PDBParser - parser for pdb files

In [6]:
parser = PDBParser()
structure = parser.get_structure('6YYT', '6YYT.pdb') # get_structure parses the file and returns a Structure object
structure



<Structure id=6YYT>
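
A Structure object is a nested container: iterating over it yields models, a model yields chains, a chain yields residues, and a residue yields atoms. The sketch below walks that hierarchy and counts each level. `count_entities` is a hypothetical helper that relies only on iteration, so it takes the structure as an argument rather than importing Biopython itself.

```python
def count_entities(structure):
    """Count models, chains, residues and atoms by walking the hierarchy."""
    counts = {"models": 0, "chains": 0, "residues": 0, "atoms": 0}
    for model in structure:
        counts["models"] += 1
        for chain in model:
            counts["chains"] += 1
            for residue in chain:
                counts["residues"] += 1
                for atom in residue:
                    counts["atoms"] += 1
    return counts
```

In the notebook this would be called as `count_entities(structure)`. Note that the residue count includes everything stored at the residue level of the file, such as waters and ligands, not only amino acids.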

#### Identify the number of chains
To see how many chains the 6YYT structure has, we iterate over the first model (`structure[0]`) and print the `id` attribute of each chain.

In [7]:
for chain in structure[0]:
    print(f'chain ID: {chain.id}')

chain ID: A
chain ID: B
chain ID: C
chain ID: D
chain ID: P
chain ID: Q
chain ID: T
chain ID: U


We see that this SARS-CoV-2 polymerase structure has 8 chains, each identified by a single letter. Not every chain is a protein: as the visualisation below shows, the complex also contains RNA strands.
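
Chain IDs alone do not say what kind of molecule each chain is. A rough sketch for telling RNA chains apart from the rest is to look at residue names: standard RNA residues are named A, C, G and U, while amino acids use three-letter codes. `classify_chains` and `RNA_RESIDUE_NAMES` are hypothetical names for this illustration, and the heuristic is deliberately simple (it ignores DNA and modified nucleotides).

```python
RNA_RESIDUE_NAMES = {"A", "C", "G", "U"}


def classify_chains(model):
    """Return {chain ID: (kind, standard residue count)} for one model."""
    summary = {}
    for chain in model:
        # Keep standard residues only: hetero residues and waters have a
        # non-blank first field in residue.id.
        names = [
            residue.get_resname().strip()
            for residue in chain
            if residue.id[0] == " "
        ]
        if names and all(name in RNA_RESIDUE_NAMES for name in names):
            kind = "RNA"
        else:
            kind = "protein or other"
        summary[chain.id] = (kind, len(names))
    return summary
```

In the notebook this would be called as `classify_chains(structure[0])`.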

It is finally time for the visualization. We will use **nglview**, an IPython/Jupyter widget for interactively viewing molecular structures and trajectories, to create an interactive visualization of the 6YYT SARS-CoV-2 protein.

In [8]:
import nglview as nv



In [9]:
nv.show_biopython(structure, gui=True)

NGLWidget()

This is what the 6YYT SARS-CoV-2 protein looks like.
- The two helical strands in different shades of blue are the RNA template strand and its product strand.
- The bulk of red ribbons is the polymerase, an enzyme (functional protein) that makes copies of the RNA chain. This polymerase is an attractive target for antiviral drugs against COVID-19.
- If we flip the molecule, we can see the yellow and orange ribbons, which are the viral proteins that help the polymerase stay on track and copy long portions of the RNA chain.
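
To make it easier to see which ribbon belongs to which chain, the default colouring can be replaced with one colour per chain. The sketch below assumes an nglview widget and uses its `clear_representations` and `add_representation` methods with an NGL chain selection such as `:A`. `color_chains` is a hypothetical helper, and the palette is an arbitrary choice that does not reproduce the default colours described above.

```python
def color_chains(view, chain_ids, palette):
    """Draw each chain as a cartoon in its own colour, cycling the palette."""
    view.clear_representations()
    for index, chain_id in enumerate(chain_ids):
        color = palette[index % len(palette)]
        # ":A" is the NGL selection syntax for chain A.
        view.add_representation("cartoon", selection=f":{chain_id}", color=color)
    return view
```

In the notebook this could be used as `view = nv.show_biopython(structure)`, followed by `color_chains(view, [chain.id for chain in structure[0]], ["red", "orange", "gold", "green", "blue", "purple"])`, with `view` as the last line of the cell to display the widget.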