# Assignment 2. MolVis, DSSP/STRIDE

## Task 4

For a given protein, determine which residues are exposed and which are buried: calculate the [relative accessible surface area (RSA)](https://en.wikipedia.org/wiki/Relative_accessible_surface_area) of each residue and, assuming a solvent-exposure threshold of 25%, assign the residue type (buried/exposed).

Submit a script that takes a protein structure as input and, for every residue, calculates RSA and assigns the solvent exposure type.

## Solution

> A measure of residue solvent exposure is the relative solvent accessibility (RSA), defined as the ratio between the accessible surface area (ASA) of a residue in a protein and the maximal accessible surface area for that amino acid (tabulated values):

> $$ RSA = \frac{ASA}{maxASA} $$
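
As a worked example of the formula, the sketch below computes RSA for a single threonine and compares it with the 25% threshold from the task. The ASA of 37 Å² and the maximal ASA of 146 Å² are taken from the DSSP output and the Miller table that appear later in this solution; the function name `relative_asa` is hypothetical.

```python
def relative_asa(asa, max_asa):
    """Return RSA = ASA / maxASA for one residue."""
    return asa / max_asa


# Thr26 of 2axi: ASA = 37 Å² (DSSP), maximal ASA = 146 Å² (Miller et al. 1987)
rsa_thr = relative_asa(37, 146)
print(f"RSA = {rsa_thr:.3f}, exposed: {rsa_thr >= 0.25}")
# prints: RSA = 0.253, exposed: True
```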

The task is to assign each residue an exposure level (buried or exposed) given an RSA threshold of 25%: print 'exposed' if the RSA is greater than or equal to 0.25, or 'buried' if the RSA is less than 0.25. There are two ways to approach the task using Biopython:

a) Run DSSP on a *pdb file* from Biopython using `DSSP()` and retrieve the RSA values from the per-residue tuples. Biopython calculates RSA for you (by default, the Sander maximal ASA values are used).

b) Retrieve the ASA value of each residue from a *dssp file* (the values are stored in the dictionary created by the `make_dssp_dict()` function) and calculate RSA by dividing ASA by the maximal ASA (first, create a dictionary of maximal ASA values yourself).

## a) From .pdb file using `DSSP()`

https://biopython.org/docs/1.75/api/Bio.PDB.DSSP.html

In [61]:
from Bio.PDB import PDBParser
from Bio.PDB.DSSP import DSSP, make_dssp_dict
from Bio.PDB.Polypeptide import three_to_one, one_to_three
p = PDBParser()

In [5]:
!wget https://files.rcsb.org/download/2axi.pdb -P ../pdb

--2020-11-27 11:18:37--  https://files.rcsb.org/download/2axi.pdb
Resolving files.rcsb.org (files.rcsb.org)... 128.6.158.49
Connecting to files.rcsb.org (files.rcsb.org)|128.6.158.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘../pdb/2axi.pdb’

2axi.pdb                [  <=>               ] 124.19K   336KB/s    in 0.4s    

2020-11-27 11:18:38 (336 KB/s) - ‘../pdb/2axi.pdb’ saved [127170]



In [71]:
# parse the pdb file and run DSSP on its first model
file = "../pdb/2axi.pdb"
structure = p.get_structure("2axi", file)
model = structure[0]
dssp = DSSP(model, file)



In [72]:
### EXPLANATION ### 

# DSSP() runs the dssp program installed on the cluster and stores the results in a dictionary-like object.
# Iterating through dssp yields one tuple per residue. The 4th item (index 3) is RSA.
for i in dssp:
    print(i)

(1, 'E', '-', 1.0, 360.0, 16.0, 0, 0.0, 2, -1.6, 0, 0.0, 3, -0.1)
(2, 'Q', '-', 0.32323232323232326, -89.0, 87.6, 1, -0.2, 27, -2.2, 25, -0.1, 28, -0.2)
(3, 'E', '-', 0.711340206185567, -102.8, 1.3, -2, -1.6, 89, -0.2, 25, -0.2, -1, -0.2)
(4, 'T', '-', 0.2605633802816901, -42.7, 123.5, 23, -0.1, 24, -2.0, -3, -0.1, 2, -0.4)
(5, 'L', 'E', 0.22560975609756098, -87.2, 128.7, 22, -0.2, 83, -2.2, 83, -0.2, 2, -0.3)
(6, 'V', 'E', 0.0, -140.1, 152.8, 20, -3.0, 20, -2.7, -2, -0.4, 81, -0.2)
(7, 'R', 'E', 0.4838709677419355, -120.2, 103.1, 79, -2.6, 79, -2.0, -2, -0.3, 18, -0.2)
(8, 'P', 'E', 0.08088235294117647, -58.3, 143.0, 0, 0.0, 77, -0.3, 0, 0.0, 74, -0.1)
(9, 'K', '-', 0.44878048780487806, -74.4, 171.5, 75, -2.8, 4, -2.8, 1, -0.1, 5, -0.3)
(10, 'P', 'H', 0.7867647058823529, -56.0, -43.4, 0, 0.0, 4, -1.9, 0, 0.0, 5, -0.1)
(11, 'L', 'H', 0.32926829268292684, -61.2, -52.7, 2, -0.2, 4, -1.7, 1, -0.2, 5, -0.1)
(12, 'L', 'H', 0.0, -58.4, -40.6, 72, -0.3, 4, -2.4, 1, -0.2, 5, -0.2)
(13, 'L', 'H
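
Since the meaning of each tuple position is easy to forget, here is a minimal sketch that wraps one of the tuples printed above in a `namedtuple`. The field order follows the tuple layout described in the Biopython `Bio.PDB.DSSP` documentation linked above; the class name `DsspRecord` and the field names are hypothetical.

```python
from collections import namedtuple

# Field order follows the Bio.PDB.DSSP documentation; the names are illustrative.
DsspRecord = namedtuple(
    "DsspRecord",
    [
        "dssp_index", "amino_acid", "secondary_structure", "relative_asa",
        "phi", "psi",
        "nh_o_1_relidx", "nh_o_1_energy", "o_nh_1_relidx", "o_nh_1_energy",
        "nh_o_2_relidx", "nh_o_2_energy", "o_nh_2_relidx", "o_nh_2_energy",
    ],
)

# One tuple copied from the output above (4th residue in the DSSP numbering).
raw = (4, 'T', '-', 0.2605633802816901, -42.7, 123.5,
       23, -0.1, 24, -2.0, -3, -0.1, 2, -0.4)

record = DsspRecord(*raw)
print(record.amino_acid, record.secondary_structure, round(record.relative_asa, 2))
# prints: T - 0.26
```

Named fields make `record.relative_asa` equivalent to `raw[3]` while being harder to mix up with the torsion angles that follow it.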

In [18]:
### EXPLANATION ### 

# keys store information on chain and pdb residue number
for i in dssp.keys():
    print(i)

('A', (' ', 23, ' '))
('A', (' ', 24, ' '))
('A', (' ', 25, ' '))
('A', (' ', 26, ' '))
('A', (' ', 27, ' '))
('A', (' ', 28, ' '))
('A', (' ', 29, ' '))
('A', (' ', 30, ' '))
('A', (' ', 31, ' '))
('A', (' ', 32, ' '))
('A', (' ', 33, ' '))
('A', (' ', 34, ' '))
('A', (' ', 35, ' '))
('A', (' ', 36, ' '))
('A', (' ', 37, ' '))
('A', (' ', 38, ' '))
('A', (' ', 39, ' '))
('A', (' ', 40, ' '))
('A', (' ', 41, ' '))
('A', (' ', 42, ' '))
('A', (' ', 43, ' '))
('A', (' ', 44, ' '))
('A', (' ', 45, ' '))
('A', (' ', 46, ' '))
('A', (' ', 47, ' '))
('A', (' ', 48, ' '))
('A', (' ', 49, ' '))
('A', (' ', 50, ' '))
('A', (' ', 51, ' '))
('A', (' ', 52, ' '))
('A', (' ', 53, ' '))
('A', (' ', 54, ' '))
('A', (' ', 55, ' '))
('A', (' ', 56, ' '))
('A', (' ', 57, ' '))
('A', (' ', 58, ' '))
('A', (' ', 59, ' '))
('A', (' ', 60, ' '))
('A', (' ', 61, ' '))
('A', (' ', 62, ' '))
('A', (' ', 63, ' '))
('A', (' ', 64, ' '))
('A', (' ', 65, ' '))
('A', (' ', 66, ' '))
('A', (' ', 67, ' '))
('A', (' '

In [26]:
### EXPLANATION ### 

# each residue can be accessed through its key:
dssp[('A', (' ', 26, ' '))]

(4,
 'T',
 '-',
 0.2605633802816901,
 -42.7,
 123.5,
 23,
 -0.1,
 24,
 -2.0,
 -3,
 -0.1,
 2,
 -0.4)

In [24]:
### EXPLANATION ### 

# more convenient to store keys in a variable
residues = dssp.keys()
residues[3]

('A', (' ', 26, ' '))

In [27]:
### EXPLANATION ### 

# chain
residues[3][0]

'A'

In [31]:
### EXPLANATION ### 

# residue number as in pdb file
residues[3][1][1]

26
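
The key layout shown above is a chain identifier followed by a three-part residue id (hetero flag, residue number, insertion code). A small sketch of unpacking it into a readable label, assuming that layout; the helper name `residue_label` is hypothetical.

```python
def residue_label(key, one_letter):
    """Build a label such as 'A:T26' from a DSSP key and a one-letter residue code."""
    chain_id, (hetero_flag, resseq, insertion_code) = key
    # The insertion code is a blank string for most residues, so strip it.
    return f"{chain_id}:{one_letter}{resseq}{insertion_code.strip()}"


key = ('A', (' ', 26, ' '))
print(residue_label(key, 'T'))
# prints: A:T26
```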

In [25]:
### EXPLANATION ### 

dssp[residues[3]]

(4,
 'T',
 '-',
 0.2605633802816901,
 -42.7,
 123.5,
 23,
 -0.1,
 24,
 -2.0,
 -3,
 -0.1,
 2,
 -0.4)

In [29]:
### EXPLANATION ### 

# retrieving residues for a certain chain
for residue in residues:
    if residue[0] == 'A':
        print(dssp[residue])

(1, 'E', '-', 1.0, 360.0, 16.0, 0, 0.0, 2, -1.6, 0, 0.0, 3, -0.1)
(2, 'Q', '-', 0.32323232323232326, -89.0, 87.6, 1, -0.2, 27, -2.2, 25, -0.1, 28, -0.2)
(3, 'E', '-', 0.711340206185567, -102.8, 1.3, -2, -1.6, 89, -0.2, 25, -0.2, -1, -0.2)
(4, 'T', '-', 0.2605633802816901, -42.7, 123.5, 23, -0.1, 24, -2.0, -3, -0.1, 2, -0.4)
(5, 'L', 'E', 0.22560975609756098, -87.2, 128.7, 22, -0.2, 83, -2.2, 83, -0.2, 2, -0.3)
(6, 'V', 'E', 0.0, -140.1, 152.8, 20, -3.0, 20, -2.7, -2, -0.4, 81, -0.2)
(7, 'R', 'E', 0.4838709677419355, -120.2, 103.1, 79, -2.6, 79, -2.0, -2, -0.3, 18, -0.2)
(8, 'P', 'E', 0.08088235294117647, -58.3, 143.0, 0, 0.0, 77, -0.3, 0, 0.0, 74, -0.1)
(9, 'K', '-', 0.44878048780487806, -74.4, 171.5, 75, -2.8, 4, -2.8, 1, -0.1, 5, -0.3)
(10, 'P', 'H', 0.7867647058823529, -56.0, -43.4, 0, 0.0, 4, -1.9, 0, 0.0, 5, -0.1)
(11, 'L', 'H', 0.32926829268292684, -61.2, -52.7, 2, -0.2, 4, -1.7, 1, -0.2, 5, -0.1)
(12, 'L', 'H', 0.0, -58.4, -40.6, 72, -0.3, 4, -2.4, 1, -0.2, 5, -0.2)
(13, 'L', 'H

In [65]:
# takes an RSA value and returns the solvent exposure: exposed or buried
def solvent_acc(rsa):
    # residues at or above the 25% threshold count as exposed
    if rsa >= 0.25:
        return 'is exposed'
    else:
        return 'is buried'

In [77]:
print(solvent_acc(0.18) + '\n' + solvent_acc(0.56))

is buried
is exposed
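
A slightly more defensive variant of `solvent_acc` is sketched below. It makes the threshold a parameter and returns 'unknown' for non-numeric input. The motivation is an assumption: Biopython's DSSP wrapper may report the string 'NA' instead of a number for residues without a tabulated maximal ASA, which is worth checking against your Biopython version. The function name `classify_exposure` is hypothetical.

```python
def classify_exposure(rsa, threshold=0.25):
    """Return 'exposed', 'buried', or 'unknown' for a relative ASA value."""
    # Non-numeric values (for example 'NA') cannot be compared with the threshold.
    if not isinstance(rsa, (int, float)):
        return "unknown"
    return "exposed" if rsa >= threshold else "buried"


for value in (0.18, 0.25, 0.56, "NA"):
    print(value, classify_exposure(value))
```

A value exactly at the threshold (0.25) is classified as exposed, matching the `>=` comparison in `solvent_acc`.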


In [70]:
# collect the amino acid (aa), residue number (resnum) and rsa for chain A only, and print the solvent exposure via the function
for residue in residues:
    if residue[0] == 'A':
        aa = dssp[residue][1]
        resnum = residue[1][1]
        rsa = dssp[residue][3]
        print(aa+str(resnum), solvent_acc(rsa) + ', RSA =', "%.2f" % rsa)

E23 is exposed, RSA = 1.00
Q24 is exposed, RSA = 0.32
E25 is exposed, RSA = 0.71
T26 is exposed, RSA = 0.26
L27 is buried, RSA = 0.23
V28 is buried, RSA = 0.00
R29 is exposed, RSA = 0.48
P30 is buried, RSA = 0.08
K31 is exposed, RSA = 0.45
P32 is exposed, RSA = 0.79
L33 is exposed, RSA = 0.33
L34 is buried, RSA = 0.00
L35 is buried, RSA = 0.18
K36 is exposed, RSA = 0.66
L37 is buried, RSA = 0.00
L38 is buried, RSA = 0.01
K39 is exposed, RSA = 0.26
S40 is exposed, RSA = 0.51
V41 is exposed, RSA = 0.25
G42 is exposed, RSA = 0.69
A43 is buried, RSA = 0.08
Q44 is exposed, RSA = 0.75
K45 is exposed, RSA = 0.45
D46 is exposed, RSA = 0.54
T47 is exposed, RSA = 0.35
Y48 is buried, RSA = 0.01
T49 is exposed, RSA = 0.26
M50 is buried, RSA = 0.00
K51 is exposed, RSA = 0.44
E52 is exposed, RSA = 0.31
V53 is buried, RSA = 0.00
L54 is buried, RSA = 0.00
F55 is exposed, RSA = 0.31
Y56 is exposed, RSA = 0.27
L57 is buried, RSA = 0.00
G58 is buried, RSA = 0.00
Q59 is buried, RSA = 0.24
Y60 is buried, R

## b) From .dssp file using `make_dssp_dict()`

https://biopython.org/docs/1.75/api/Bio.PDB.DSSP.html#Bio.PDB.DSSP.make_dssp_dict

In [41]:
!mkdssp -i ../pdb/2axi.pdb -o ../2axi.dssp

In [46]:
dssp_dict = make_dssp_dict('../2axi.dssp')
dssp_dict

({('A', (' ', 23, ' ')): ('E',
   '-',
   223,
   360.0,
   16.0,
   1,
   0,
   0.0,
   2,
   -1.6,
   0,
   0.0,
   3,
   -0.1),
  ('A', (' ', 24, ' ')): ('Q',
   '-',
   64,
   -89.0,
   87.6,
   2,
   1,
   -0.2,
   27,
   -2.2,
   25,
   -0.1,
   28,
   -0.2),
  ('A', (' ', 25, ' ')): ('E',
   '-',
   138,
   -102.8,
   1.3,
   3,
   -2,
   -1.6,
   89,
   -0.2,
   25,
   -0.2,
   -1,
   -0.2),
  ('A', (' ', 26, ' ')): ('T',
   '-',
   37,
   -42.7,
   123.5,
   4,
   23,
   -0.1,
   24,
   -2.0,
   -3,
   -0.1,
   2,
   -0.4),
  ('A', (' ', 27, ' ')): ('L',
   'E',
   37,
   -87.2,
   128.7,
   5,
   22,
   -0.2,
   83,
   -2.2,
   83,
   -0.2,
   2,
   -0.3),
  ('A', (' ', 28, ' ')): ('V',
   'E',
   0,
   -140.1,
   152.8,
   6,
   20,
   -3.0,
   20,
   -2.7,
   -2,
   -0.4,
   81,
   -0.2),
  ('A', (' ', 29, ' ')): ('R',
   'E',
   120,
   -120.2,
   103.1,
   7,
   79,
   -2.6,
   79,
   -2.0,
   -2,
   -0.3,
   18,
   -0.2),
  ('A', (' ', 30, ' ')): ('P',
   'E',
   11,
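
The output above shows that `make_dssp_dict()` returns a tuple whose first element is the per-residue dictionary, with the absolute ASA at index 2 of each value. The sketch below illustrates that layout on a hand-built stand-in with two entries copied from the output, so it runs without a dssp file. Treating the second element as a list of keys is an assumption based on the Biopython documentation.

```python
# A trimmed stand-in for the (dictionary, keys) pair returned by make_dssp_dict();
# both entries are copied from the output above.
dssp_dict = (
    {
        ('A', (' ', 23, ' ')): ('E', '-', 223, 360.0, 16.0, 1,
                                0, 0.0, 2, -1.6, 0, 0.0, 3, -0.1),
        ('A', (' ', 26, ' ')): ('T', '-', 37, -42.7, 123.5, 4,
                                23, -0.1, 24, -2.0, -3, -0.1, 2, -0.4),
    },
    [('A', (' ', 23, ' ')), ('A', (' ', 26, ' '))],
)

records, keys = dssp_dict  # unpack once instead of indexing with [0] and [1]

for key in keys:
    # first three fields: one-letter code, secondary structure, absolute ASA
    aa, ss, asa = records[key][:3]
    print(key[0], key[1][1], aa, ss, asa)
```

Note that, unlike the tuples produced by `DSSP()`, these values hold the absolute ASA rather than RSA, and the DSSP index sits at position 5 instead of position 0.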
   

In [49]:
# dictionary of maximal ASA values (in Å²) per residue type
# (despite its name, rsa_miller holds maximal ASA, not RSA)
# Miller et al. 1987 https://doi.org/10.1016/0022-2836(87)90038-6
rsa_miller = {
    "ALA": 113.0,
    "ARG": 241.0,
    "ASN": 158.0,
    "ASP": 151.0,
    "CYS": 140.0,
    "GLN": 189.0,
    "GLU": 183.0,
    "GLY": 85.0,
    "HIS": 194.0,
    "ILE": 182.0,
    "LEU": 180.0,
    "LYS": 211.0,
    "MET": 204.0,
    "PHE": 218.0,
    "PRO": 143.0,
    "SER": 122.0,
    "THR": 146.0,
    "TRP": 259.0,
    "TYR": 229.0,
    "VAL": 160.0,
}

In [58]:
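# function calculating RSA as ASA / maxASA
def rsa_calc(aa, asa):
    # look up the maximal ASA directly instead of scanning the dictionary
    max_asa = rsa_miller.get(aa)
    if max_asa is None:
        return None  # unknown residue type
    return asa / max_asa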

In [59]:
rsa_calc('TYR', 27)

0.11790393013100436

In [69]:
# 1. collect the amino acid (aa), residue number (resnum) and asa for chain A only
# 2. calculate rsa via the function
# 3. print solvent exposure via the function
for residue in dssp_dict[0].items():
    if residue[0][0] == 'A':
        resnum = residue[0][1][1]
        aa = residue[1][0]
        asa = residue[1][2]
        rsa = rsa_calc(one_to_three(aa), asa)
        print(f'{aa}{resnum} {solvent_acc(rsa)}, RSA = {rsa:.2f}')

E23 is exposed, RSA = 1.22
Q24 is exposed, RSA = 0.34
E25 is exposed, RSA = 0.75
T26 is exposed, RSA = 0.25
L27 is buried, RSA = 0.21
V28 is buried, RSA = 0.00
R29 is exposed, RSA = 0.50
P30 is buried, RSA = 0.08
K31 is exposed, RSA = 0.44
P32 is exposed, RSA = 0.75
L33 is exposed, RSA = 0.30
L34 is buried, RSA = 0.00
L35 is buried, RSA = 0.17
K36 is exposed, RSA = 0.64
L37 is buried, RSA = 0.00
L38 is buried, RSA = 0.01
K39 is exposed, RSA = 0.25
S40 is exposed, RSA = 0.54
V41 is buried, RSA = 0.23
G42 is exposed, RSA = 0.68
A43 is buried, RSA = 0.07
Q44 is exposed, RSA = 0.79
K45 is exposed, RSA = 0.44
D46 is exposed, RSA = 0.58
T47 is exposed, RSA = 0.34
Y48 is buried, RSA = 0.01
T49 is exposed, RSA = 0.25
M50 is buried, RSA = 0.00
K51 is exposed, RSA = 0.43
E52 is exposed, RSA = 0.33
V53 is buried, RSA = 0.00
L54 is buried, RSA = 0.00
F55 is exposed, RSA = 0.28
Y56 is exposed, RSA = 0.26
L57 is buried, RSA = 0.00
G58 is buried, RSA = 0.00
Q59 is exposed, RSA = 0.25
Y60 is buried, R
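
The two approaches give slightly different RSA values, and a few residues change class (V41 and Q59, for example). The sketch below reproduces the difference for four residues using the ASA values printed in part b). The Miller values come from the table above; the Sander values are an assumption, chosen because they are consistent with the RSA values printed in part a). Capping at 1.0 is likewise inferred from the 1.00 reported for E23 in part a), not taken from the Biopython source.

```python
# Maximal ASA (Å²) for a few residue types, keyed by one-letter code.
MAX_ASA = {
    "Sander": {"E": 194.0, "Q": 198.0, "T": 142.0, "L": 164.0},  # assumed values
    "Miller": {"E": 183.0, "Q": 189.0, "T": 146.0, "L": 180.0},  # from the table above
}


def rsa(asa, max_asa, cap=True):
    """Relative ASA, optionally capped at 1.0."""
    value = asa / max_asa
    return min(value, 1.0) if cap else value


# (label, one-letter code, absolute ASA) copied from the part b) output
samples = [("E23", "E", 223), ("Q24", "Q", 64), ("T26", "T", 37), ("L27", "L", 37)]

for label, aa, asa in samples:
    sander = rsa(asa, MAX_ASA["Sander"][aa], cap=True)    # mimics part a)
    miller = rsa(asa, MAX_ASA["Miller"][aa], cap=False)   # mimics part b)
    print(f"{label}: Sander (capped) = {sander:.2f}, Miller (uncapped) = {miller:.2f}")
```

Residues whose RSA lies close to 0.25 are the ones most likely to switch between buried and exposed when the reference scale changes, so it is worth stating which scale was used alongside the results.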