# PROVAV DOCUMENTATION
**Authors**: Sandra Castro Labrador,  Mª Rocío Valderrama Palacios 

## Contents Index
* [1. Introduction](#sec_1)
* [2. Reading PDB files](#sec_2)
* [3. Measures of structural similarity](#sec_3)
    - [3.1 Root mean squared deviation](#sec_3_1)
    - [3.2 TM-score](#sec_3_2)
* [4. 3D protein visualization](#sec_4)
* [5. Resources](#sec_5)


# 1. INTRODUCTION <a name="sec_1"/>

# 2. READING PDB FILES <a name="sec_2"/>

# 3. MEASURES OF STRUCTURAL SIMILARITY <a name="sec_3"/>

We have chosen two similarity measures to get an idea of how similar the structures of the selected proteins are, and of the possible changes between them.
The objective is to compute the root mean squared deviation (RMSD) with a **BioPython** module, using the information in the proteins' **PDB** files. We also compute a second important value for comparing structures, the TM-score, with a different Python package that we installed. To check our results against reference values, we used the **PyMOL** software to obtain some RMSD values, and the online **TM-score** program to obtain TM-score values for specific protein structures.

## 3.1 Root Mean Squared Deviation <a name="sec_3_1"/>

**What is RMSD?**
RMSD is a commonly used measure of the average distance between the atoms of two superimposed structures. It is calculated as:

\begin{align}
\mathrm{RMSD} & = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}
\end{align}

where $N$ is the number of aligned atom pairs and $\delta_i$ is the distance between the $i$-th pair of equivalent atoms, so RMSD averages the squared differences between atomic positions. This measure is useful when comparing two conformations of the same molecule, or of parts of the same molecule. It requires the same number of atoms in both structures, so for different proteins the value depends on which atoms are selected.
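As a small worked illustration of the formula, the sketch below computes RMSD directly from two coordinate arrays that are already in the same reference frame, so no superposition is performed. The function name `rmsd` is our own, not part of BioPython.

```python
import numpy as np


def rmsd(coords_a, coords_b):
    """RMSD between two (N, 3) arrays of equivalent atom coordinates.

    The coordinates are compared as given; no superposition is performed.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both structures must contain the same number of atoms")
    # delta_i^2: squared distance between the i-th pair of equivalent atoms
    squared_distances = np.sum((a - b) ** 2, axis=1)
    # Mean over the N atom pairs, then square root
    return float(np.sqrt(squared_distances.mean()))


# One pair is 0 Å apart and the other 2 Å apart: sqrt((0 + 4) / 2)
print(rmsd([[0, 0, 0], [1, 0, 0]], [[0, 0, 0], [1, 0, 2]]))  # 1.4142135623730951
```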

When comparing structures this way, one set of atoms (the moving atoms) is translated and rotated with respect to the other (the reference atoms). The final orientation is the one that minimizes the RMSD between the two structures. The smaller the RMSD between two structures, the more similar they are.
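Finding the orientation that minimizes the RMSD has a closed-form solution, the Kabsch algorithm, which relies on a singular value decomposition (SVD) of the coordinate covariance matrix. The NumPy sketch below illustrates that technique; `kabsch_superimpose` is a hypothetical name, not a library function. It returns the moved coordinates and the RMSD after fitting.

```python
import numpy as np


def kabsch_superimpose(reference, moving):
    """Rotate and translate `moving` onto `reference` (both (N, 3) arrays).

    Returns (fitted coordinates, RMSD after superposition).
    """
    ref = np.asarray(reference, dtype=float)
    mov = np.asarray(moving, dtype=float)
    # 1. Remove the translation by centring both sets on their centroids
    ref_centroid = ref.mean(axis=0)
    mov_centroid = mov.mean(axis=0)
    p = mov - mov_centroid
    q = ref - ref_centroid
    # 2. SVD of the 3x3 covariance matrix
    u, _, vt = np.linalg.svd(p.T @ q)
    # 3. Flip the last axis if needed so the result is a rotation, not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # 4. Rotate the centred moving atoms and place them on the reference centroid
    fitted = p @ rotation.T + ref_centroid
    rms = float(np.sqrt(np.mean(np.sum((fitted - ref) ** 2, axis=1))))
    return fitted, rms


# Usage: a copy rotated 90 degrees about z and shifted is fitted back exactly
ref = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([5.0, -2.0, 3.0])
fitted, rms = kabsch_superimpose(ref, moved)
print(round(rms, 6))  # 0.0
```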

### Biopython module used

To compute this value we chose the **Bio.SVDSuperimposer** module from BioPython. We also tested other modules such as QCPSuperimposer; however, the selected class gave the most accurate values for the proteins we tested. The basic code is easy to apply: we extract the atom coordinates from the PDB files, and the *SVDSuperimposer()* class performs the superposition and returns the RMSD value. It also provides the initial RMSD, before superposition, to compare with the final result.
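A minimal sketch of this usage, assuming the coordinates have already been extracted into two (N, 3) arrays of the same length. `set()`, `run()`, `get_init_rms()` and `get_rms()` are methods of BioPython's `SVDSuperimposer`; the wrapper name `svd_superimposer_rmsd` is hypothetical.

```python
import numpy as np
from Bio.SVDSuperimposer import SVDSuperimposer


def svd_superimposer_rmsd(reference_coords, moving_coords):
    """Return (initial RMSD, RMSD after superposition) for two (N, 3) arrays."""
    sup = SVDSuperimposer()
    # set(reference_coords, coords): the second array is the one that is moved
    sup.set(np.asarray(reference_coords, dtype=float),
            np.asarray(moving_coords, dtype=float))
    sup.run()  # computes the optimal rotation and translation
    return sup.get_init_rms(), sup.get_rms()


reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
shifted = reference + np.array([3.0, 4.0, 0.0])  # pure translation of 5 Å
init_rms, final_rms = svd_superimposer_rmsd(reference, shifted)
print(init_rms, final_rms)
```

For a pure translation the initial RMSD equals the shift length (5 Å here) while the RMSD after superposition drops to approximately zero, which is the kind of before/after comparison described above.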

A PDB file can contain several models of a protein, each organized into chains, residues and atoms. To obtain the best value with this module, we wrote functions that compute the RMSD for all the models found in the structures and keep the minimum RMSD. We tested different kinds of proteins as well as protein isoforms, and in some cases the two structures do not have the same number of atoms. In those cases we matched the arrays so that both had the same size, and the results show some variation. For this reason, the method works best with protein isoforms.
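Our own functions are not reproduced here; the hypothetical `best_model_rmsd` below is a sketch of the same idea. It assumes each model has already been reduced to an (N, 3) coordinate array (for example the CA coordinates of one model), and it assumes that matching the arrays means truncating both to the shorter length, which only makes sense for closely related sequences such as isoforms.

```python
import itertools

import numpy as np
from Bio.SVDSuperimposer import SVDSuperimposer


def best_model_rmsd(models_a, models_b):
    """Minimum RMSD over all (model_a, model_b) pairs.

    models_a, models_b: lists of (N, 3) coordinate arrays, one per model.
    Returns (best_rmsd, index_in_models_a, index_in_models_b).
    """
    best = (float("inf"), -1, -1)
    sup = SVDSuperimposer()
    for (i, a), (j, b) in itertools.product(enumerate(models_a), enumerate(models_b)):
        # Assumption: truncate both arrays to the shorter length
        n = min(len(a), len(b))
        sup.set(np.asarray(a[:n], dtype=float), np.asarray(b[:n], dtype=float))
        sup.run()
        rms = sup.get_rms()
        if rms < best[0]:  # keep the minimum RMSD
            best = (rms, i, j)
    return best
```

Trying every pair of models costs one superposition per pair, which is cheap for the handful of models usually found in NMR entries.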

## 3.2 TM Score <a name="sec_3_2"/>

We complemented the RMSD with the TM-score, a scoring function that assesses the similarity of protein structures. We found a Python package, **tmscoring**, whose values are very close to those returned by the TM-score program.

The score is designed to be more sensitive to global fold similarity than to local structural variations, and it normalizes the distances with a length-dependent scale. TM-score takes values in (0,1], where 1 indicates a perfect match between two structures. Its interpretation is based on statistics:
- 0.0 < TM-score < 0.17 : random structural similarity                 
- 0.5 < TM-score < 1.00 : in about the same fold   

We obtained consistent results efficiently. This function takes the proteins' PDB files directly as parameters, so we did not make any atom selection, which may make the results more accurate.


# 4. 3D PROTEIN VISUALIZATION <a name="sec_4"/>

We used NGLview to display 3D protein structures. We combined the previous alignment with **Bio.PDB.Superimposer()** to show the superimposed proteins and see their differences clearly. We also used **Bio.PDB.PDBIO()** to write new PDB files with the aligned atom structures.
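A minimal sketch of the superimpose-and-save step, assuming two PDB files (or open file handles) of related proteins and using only the CA atoms of the first model. The function name `superimpose_and_save` is hypothetical, and truncating both CA lists to the shorter length is our simplifying assumption; `Superimposer.set_atoms()`, `Superimposer.apply()`, `Superimposer.rms` and `PDBIO` are BioPython API.

```python
from Bio.PDB import PDBIO, PDBParser, Superimposer


def superimpose_and_save(ref_file, mobile_file, out_file):
    """Superimpose `mobile_file` onto `ref_file` using CA atoms and save the
    moved structure to `out_file`. Returns the RMSD after superposition.

    Each argument may be a path or an open file handle.
    """
    parser = PDBParser(QUIET=True)
    ref = parser.get_structure("reference", ref_file)
    mobile = parser.get_structure("mobile", mobile_file)

    # CA atoms of the first model of each structure
    ref_ca = [atom for atom in ref[0].get_atoms() if atom.get_id() == "CA"]
    mob_ca = [atom for atom in mobile[0].get_atoms() if atom.get_id() == "CA"]
    # Assumption: pair atoms by position and drop the extra ones of the longer list
    n = min(len(ref_ca), len(mob_ca))

    sup = Superimposer()
    sup.set_atoms(ref_ca[:n], mob_ca[:n])  # (fixed atoms, moving atoms)
    # Apply the optimal rotation/translation to every atom of the mobile structure
    sup.apply(list(mobile.get_atoms()))

    writer = PDBIO()
    writer.set_structure(mobile)
    writer.save(out_file)
    return sup.rms
```

For example, `superimpose_and_save("2kpb.pdb", "2kpa.pdb", "aligned.pdb")` would move 2KPA onto 2KPB; the output file can then be loaded in NGLview together with the reference structure.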

To display 3D protein structures as Jupyter widgets, **NGLview** must be installed:
- `conda install -c conda-forge nglview` (or `conda upgrade nglview --force`)
- or `pip install nglview`
- you might also need: `jupyter-nbextension enable nglview --py --sys-prefix`


```python
# VISUALIZATION FUNCTION
import nglview


def visualizeNGLview(fileNameProtein):
    """Return an NGLview widget displaying the given structure file."""
    view = nglview.NGLWidget()
    view.add_component(fileNameProtein)
    return view
```


```python
# Read a PDB file into a Bio.PDB Structure (needed by some of the visualizations)
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Structure import Structure


def read_protein_pdb(file: str, proteinId: str) -> Structure:
    parser = PDBParser(PERMISSIVE=1)
    structure: Structure = parser.get_structure(proteinId, file)
    return structure
```
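Coordinate-based tools such as SVDSuperimposer need plain arrays rather than a Structure object. The hypothetical helper below (not part of the original notebook) is a sketch that parses a PDB file and returns one NumPy array of CA coordinates per model, using BioPython's `PDBParser`, `get_atoms()` and `Atom.coord`.

```python
import numpy as np
from Bio.PDB import PDBParser


def ca_coordinates_per_model(pdb_file, structure_id="protein"):
    """Return a list with one (N, 3) array of CA coordinates per model.

    pdb_file may be a path or an open file handle.
    """
    structure = PDBParser(QUIET=True).get_structure(structure_id, pdb_file)
    return [
        # Keep only alpha carbons, in file order
        np.array([atom.coord for atom in model.get_atoms() if atom.get_id() == "CA"])
        for model in structure
    ]
```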

### 2KPA & 2KPB (V-ATPase a2-subunit isoforms)
Aligning the atoms with Bio.PDB.Superimposer gives a display similar to the one obtained in PyMOL: we set the atoms, apply the superposition to one of the structures, save the result in a new PDB file and compare it with the other structure.

```python
import nglview

view = nglview.NGLWidget()
view.add_component("aligned_ver1.pdb")
view.add_component("2kpb.pdb")
view
```


### Creatine Kinase from Human Muscle & Creatine Kinase from Human Brain

```python
visualizeNGLview("G2_aligned.pdb")
```


These two proteins have a high RMSD, so their structures are less similar.

```python
visualizeNGLview("4i0p.pdb")
```


```python
visualizeNGLview("/Users/scl/Desktop/PAB/1hdm.pdb")
```


```python
# struct_1 = read_protein_pdb("G2_aligned.pdb", "G2_protein")
```

```python
struct = nglview.PdbIdStructure("3B6R")

initial_repr = [
    {"type": "cartoon", "params": {
        "sele": "protein", "color": "residueindex"
    }}
]
viewer = nglview.NGLWidget(struct, representations=initial_repr)
viewer
```

```python
del viewer
```

```python
view_2 = nglview.NGLWidget()
view_2.add_component("G2_aligned.pdb")
# view.add_surface(selection="protein", opacity=0.2)
view_2
```


```python
# view.add_representation(repr_type='cartoon', selection='protein')

# view.add_cartoon(selection="protein")
# view.add_surface(selection="protein", opacity=0.2)
# view_2 holds a single component, so its index is 0
view_2.add_surface(component=0, color='blue', wireframe=True, opacity=0.2, isolevel=3.)
view_2

# specify color
# view.add_cartoon(selection="protein", color='blue')

# specify residue
# view.add_licorice('ALA, GLU')
# clear representations
# view.clear_representations()
```


```python
del view_2
```

```python
import nglview

view = nglview.NGLWidget()
view.add_component("4i0p.pdb")
view
```


```python
view_2 = nglview.NGLWidget()
view_2.add_component("G2_aligned.pdb")

# add_representation() takes a representation type and its parameters,
# not a list of representation dictionaries
view_2.add_representation('line', selection='protein', color='residueindex')
# view.add_representation('licorice', selection='not hydrogen')
view_2

# struct = nv.adaptor.FileStructure(nv.datafiles.PDB)
# viewer = nv.NGLWidget(struct, representations=initial_repr)
# viewer
# view
```

```python
import nglview as nv

struct = nv.PdbIdStructure("2KPA")
# struct = nv.adaptor.FileStructure(nv.datafiles.PDB)
initial_repr = [
    {"type": "line", "params": {
        "sele": "protein", "color": "residueindex"
    }}
]

repr_2 = [
    {"type": "licorice", "params": {
        "sele": "protein", "color": "element"
    }}
]

viewer = nv.NGLWidget(struct, representations=repr_2)
viewer
```

```python
initial_repr = [
    {"type": "cartoon", "params": {
        "sele": "protein", "color": "sstruc"
    }}
]

# `struct` is the 2KPA structure created in the previous cell
view = nglview.NGLWidget(struct, representations=initial_repr)
view
```

```python
import pytraj as pt
import nglview as nv

# A PDB file carries its own topology, so no separate parameter file is needed
traj = pt.load('2kpa.pdb')
view = nv.show_pytraj(traj)
# view
```

```python
view.add_representation(repr_type='cartoon', selection='protein')

# or shorter
view.add_cartoon(selection="protein")
view.add_surface(selection="protein", opacity=0.3)

# specify color
view.add_cartoon(selection="protein", color='blue')

# specify residue
view.add_licorice('ALA, GLU')

# clear representations
view.clear_representations()
...
```

```python
# A PDB file carries its own topology, so no separate parameter file is needed
traj2 = pt.load('data/2kpa.pdb')

# superpose to 1st frame, using only CA atoms
traj2.superpose(ref=0, mask='@CA')

view2 = nv.show_pytraj(traj2)
view2.clear_representations()
view2.add_representation('cartoon')
view2.add_representation('licorice', selection='not hydrogen')
view2
```

```python
# load pdb file
traj3 = pt.load('data/3b6r.pdb')

# create view
view3 = nv.show_pytraj(traj3)

# center the camera on the structure and display
view3.center()
view3
```