# Secondary structure: DSSP and STRIDE

## 0. Install necessary programs and download files

In [None]:
! wget https://files.rcsb.org/download/1A1L.pdb
! wget https://github.com/PDB-REDO/dssp/releases/download/v4.4.0/mkdssp-4.4.0-linux-x64
! chmod +x mkdssp-4.4.0-linux-x64
! pip install DSSPparser biopython

## 1. DSSP (Dictionary of Secondary Structure in Proteins)

DSSP annotates secondary structure. It also computes backbone torsion angles, solvent accessibility, hydrogen-bond energies, and more.

DSSP is available as a standalone program, a web server, and a database:
- Installation (Linux/Windows): https://github.com/PDB-REDO/dssp/releases/tag/v4.4.0
- The web server accepts a PDB ID or a custom structure as input: https://pdb-redo.eu/dssp

DSSP output is now available in the traditional `.dssp` format and in `.mmcif` format. The `.mmcif` format supports annotation of large structures. See the format description: https://pdb-redo.eu/dssp/about.

DSSP annotations are precomputed for all PDB structures. Both single-entry and whole-database downloads are available: \
`wget https://pdb-redo.eu/dssp/db/1csp/mmcif` - mmcif format \
`wget https://pdb-redo.eu/dssp/db/1csp/legacy` - traditional dssp format

**Secondary structure annotation in DSSP**

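`H` - Alpha helix (4-12) \
`B` - Isolated beta-bridge residue \
`E` - Strand \
`G` - 3-10 helix \
`I` - Pi helix \
`P` - Polyproline II (kappa) helix, assigned since DSSP 4.0 \
`T` - Turn \
`S` - Bend \
`-` - None (a blank in the `.dssp` file; Biopython reports it as `-`)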
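Many analyses collapse the DSSP classes into three states: helix, strand, and coil. One widely used convention maps `H`, `G`, and `I` to helix, maps `E` and `B` to strand, and maps everything else to coil. Other conventions exist (for example, counting only `H` and `E`), so treat the mapping below as an assumption to adjust. Here is a minimal sketch:

```python
# Assumed 8-to-3 state reduction (one common convention, not the only one):
# helices (H, G, I) -> "H", strands and bridges (E, B) -> "E", everything else -> "C".
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(ss: str) -> str:
    """Collapse a string of DSSP codes into H/E/C; any other code becomes coil."""
    return "".join(THREE_STATE.get(code, "C") for code in ss)


print(to_three_state("--HHHHGGGT-EEEESB-"))  # → CCHHHHHHHCCEEEECEC
```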

Citations:

*Joosten, R. P., te Beek, T. A. H., Krieger, E., Hekkelman, M. L., Hooft, R. W. W., Schneider, R., Sander, C., & Vriend, G. (2010). A series of PDB related databases for everyday needs. Nucleic Acids Research, 39(Database), D411–D419. https://doi.org/10.1093/nar/gkq1105*

*Kabsch, W., & Sander, C. (1983). Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22(12), 2577–2637. https://doi.org/10.1002/bip.360221211*

### Option 1: generate DSSP file with standalone program and parse it

In [None]:
# run DSSP
! ./mkdssp-4.4.0-linux-x64 1A1L.pdb --output-format dssp > 1A1L.dssp

In [None]:
# check file contents
! cat 1A1L.dssp

You can parse the DSSP file into a pandas DataFrame with `DSSPparser`:

In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)
from DSSPparser import parseDSSP

In [None]:
parser = parseDSSP('1A1L.dssp')
parser.parse()
pddict = parser.dictTodataframe()
pddict

Alternatively, you can parse the DSSP file into a dictionary with Biopython's `make_dssp_dict()`:

In [None]:
from Bio.PDB.DSSP import DSSP, dssp_dict_from_pdb_file, make_dssp_dict

In [None]:
dssp = make_dssp_dict('1A1L.dssp')
dssp = dssp[0]  # the function returns a tuple (dictionary, list of residue keys); keep the dictionary

for i in dssp.items():
    print(i)
    break

#### How the DSSP dictionary is organized

The dictionary holds the information for every residue. Both keys and values are tuples.

**Elements of keys:**

0 - Chain identifier \
1 - Residue ID tuple `(hetero flag, residue number as in PDB, insertion code)`

**Elements of values:**

0 - Amino acid \
1 - Secondary structure \
2 - Accessible surface area (absolute, Å²) \
3 - Phi \
4 - Psi \
5 - DSSP residue index (sequential number in the DSSP file) \
6 - NH–>O_1_relidx \
7 - NH–>O_1_energy \
8 - O–>NH_1_relidx \
9 - O–>NH_1_energy \
10 - NH–>O_2_relidx \
11 - NH–>O_2_energy \
12 - O–>NH_2_relidx \
13 - O–>NH_2_energy
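Indexing these tuples by position is error-prone. The minimal sketch below wraps each value in a `namedtuple`. The field names are chosen here for illustration and are not part of Biopython, but their order follows the list above. The example entry is made up and only mimics the layout of a `make_dssp_dict()` record.

```python
from collections import namedtuple

# Illustrative field names; the order follows the value layout
# of make_dssp_dict() records described above.
DSSPRecord = namedtuple(
    "DSSPRecord",
    [
        "aa", "ss", "asa", "phi", "psi", "dssp_index",
        "nh_o_1_relidx", "nh_o_1_energy", "o_nh_1_relidx", "o_nh_1_energy",
        "nh_o_2_relidx", "nh_o_2_energy", "o_nh_2_relidx", "o_nh_2_energy",
    ],
)

# Hypothetical entry: key = (chain, (hetero flag, residue number, insertion code))
dssp_example = {
    ("A", (" ", 10, " ")): ("L", "H", 35, -63.2, -41.5, 10,
                            -4, -2.1, 4, -1.9, -3, -0.4, 3, -0.3),
}

for (chain, (_hetero, resnum, _icode)), values in dssp_example.items():
    rec = DSSPRecord(*values)  # unpack the 14 positional fields by name
    print(chain, resnum, rec.aa, rec.ss, rec.asa, rec.phi, rec.psi)
# → A 10 L H 35 -63.2 -41.5
```

To use this with real data, iterate over the `dssp` dictionary from `make_dssp_dict()` instead of `dssp_example`.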

In [None]:
# print residues involved in a strand

for keys, values in dssp.items():
    chain = keys[0]
    res = keys[1][1]
    aa = values[0]
    ss = values[1]
    if ss == 'E':
        print('residue', aa, res, 'chain', chain)
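Printing residues one at a time hides how they group into strands. The sketch below uses `ss_segments`, a hypothetical helper that is not part of Biopython. It merges consecutive residues with the same code into segments, where consecutive means the same chain and a DSSP residue index (position 5 of the value) that increases by exactly one. It assumes a dictionary with the key/value layout described above. The example data are made up.

```python
def ss_segments(dssp_dict, code="E"):
    """Group residues with secondary structure `code` into contiguous segments.

    Returns a list of (chain, first_resnum, last_resnum) tuples, in dictionary order.
    """
    segments = []
    prev_chain, prev_index = None, None
    for (chain, res_id), values in dssp_dict.items():
        ss, dssp_index = values[1], values[5]
        if ss != code:
            # a residue with another code ends the current segment
            prev_chain, prev_index = None, None
            continue
        resnum = res_id[1]
        if chain == prev_chain and dssp_index == prev_index + 1:
            # extend the current segment to this residue
            segments[-1] = (chain, segments[-1][1], resnum)
        else:
            segments.append((chain, resnum, resnum))
        prev_chain, prev_index = chain, dssp_index
    return segments


def _mock(aa, ss, idx):
    # Hypothetical record: only aa, ss and the DSSP index (position 5) are meaningful
    return (aa, ss, 0, 360.0, 360.0, idx, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0)


example = {
    ("A", (" ", 1, " ")): _mock("K", "-", 1),
    ("A", (" ", 2, " ")): _mock("V", "E", 2),
    ("A", (" ", 3, " ")): _mock("I", "E", 3),
    ("A", (" ", 4, " ")): _mock("T", "E", 4),
    ("A", (" ", 5, " ")): _mock("G", "T", 5),
    ("A", (" ", 6, " ")): _mock("Y", "E", 6),
    ("B", (" ", 1, " ")): _mock("V", "E", 8),  # index 7 skipped, e.g. a chain break
}
print(ss_segments(example))  # → [('A', 2, 4), ('A', 6, 6), ('B', 1, 1)]
```

Applied to the `dssp` dictionary from `make_dssp_dict()`, `ss_segments(dssp)` should list the strand segments of 1A1L. Pass a different `code`, such as `"H"`, for other elements.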

### Option 2: generate DSSP dictionary object from PDB structure object in Biopython

Another approach is to first parse the PDB structure with Biopython and then generate the secondary structure annotation with `DSSP()`. In this case, the result is a `DSSP` object that behaves like a dictionary. However, it differs from the outputs above in two ways. First, the DSSP residue index is the first element of each value (not the sixth). Second, instead of absolute accessible surface area (ASA), it reports [relative ASA](https://en.wikipedia.org/wiki/Relative_accessible_surface_area): the absolute ASA divided by the maximum possible ASA for that residue type. By default, the `acc_array` parameter is `"Sander"`, i.e. the maximum ASA values from Sander & Rost (1994); the example below uses `"Miller"` instead. See the other options: https://biopython.org/docs/1.75/api/Bio.PDB.DSSP.html#Bio.PDB.DSSP.DSSP.
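Relative ASA is a simple ratio, sketched below. The maximum ASA must come from whichever scale you choose, such as Sander or Miller. Values above 1.0 are clipped to mirror the clipping that Biopython's `DSSP()` applies. The buried/exposed helper and its 0.25 cutoff are hypothetical; published cutoffs vary.

```python
def relative_asa(abs_asa: float, max_asa: float, cap: bool = True) -> float:
    """Relative ASA = absolute ASA / maximum ASA of the residue type.

    `max_asa` must come from the chosen scale (e.g. Sander & Rost 1994 or Miller).
    With cap=True, values above 1.0 are clipped to 1.0.
    """
    rsa = abs_asa / max_asa
    return min(rsa, 1.0) if cap else rsa


def is_exposed(rsa: float, threshold: float = 0.25) -> bool:
    """Hypothetical buried/exposed call; the 0.25 cutoff is an assumption."""
    return rsa >= threshold


print(relative_asa(53.0, 106.0))   # → 0.5
print(relative_asa(120.0, 100.0))  # → 1.0 (clipped)
print(is_exposed(0.1))             # → False
```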

In [None]:
from Bio.PDB import PDBParser
p = PDBParser(QUIET=True)  # suppress warnings

structure = p.get_structure("1A1L", "1A1L.pdb")
model = structure[0]

In [None]:
# relative ASA uses the Miller max-ASA scale instead of the default Sander scale
dssp = DSSP(model, "1A1L.pdb", acc_array='Miller', dssp='./mkdssp-4.4.0-linux-x64')

In [None]:
for i in dssp:
    print(i)
    break

### Option 3: generate DSSP dictionary object from PDB file in Biopython

With DSSP installed locally, `dssp_dict_from_pdb_file()` runs it on a PDB file and parses the output in one step. The result has the same layout as the output of `make_dssp_dict()`.

In [None]:
pdb_file = '1A1L.pdb'
dssp = dssp_dict_from_pdb_file(in_file=pdb_file, DSSP='./mkdssp-4.4.0-linux-x64', dssp_version='4.4.0')
# the function returns a tuple (dictionary, list of residue keys); keep the dictionary
dssp = dssp[0]

In [None]:
for k,v in dssp.items():
    print(k, v)
    break

## 2. STRIDE

STRIDE (**Str**uctural **ide**ntification) is a program that assigns secondary structure to a protein from its atomic coordinates. Its criteria are more complex than those of DSSP. In addition to hydrogen-bond energies, it uses backbone torsion angles (φ, ψ), with knowledge-based parameters tuned to agree with crystallographers' own assignments. STRIDE assigns the following codes:

`H` - Alpha helix \
`G` - 3-10 helix \
`I` - Pi helix \
`E` - Extended conformation \
`B` or `b` - Isolated bridge \
`T` - Turn \
`C` - Coil (none of the above)
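The STRIDE alphabet overlaps with the DSSP alphabet, so the two assignments can be compared residue by residue once the codes are normalized. This minimal sketch relies on an assumed normalization. STRIDE `C` is mapped to `-` and STRIDE `b` to `B`. DSSP `S` (bend) and `P` (polyproline II) have no STRIDE counterpart, so they are treated as `-`. Both strings must cover the same residues in the same order. The example strings are made up.

```python
# Assumed normalization to a common alphabet (a choice, not a standard):
STRIDE_TO_COMMON = {"C": "-", "b": "B"}
DSSP_TO_COMMON = {"S": "-", "P": "-", " ": "-"}


def agreement(dssp_ss: str, stride_ss: str) -> float:
    """Fraction of residues whose normalized DSSP and STRIDE codes are identical."""
    if len(dssp_ss) != len(stride_ss):
        raise ValueError("assignments must cover the same residues in the same order")
    if not dssp_ss:
        raise ValueError("empty assignment")
    d = [DSSP_TO_COMMON.get(c, c) for c in dssp_ss]
    s = [STRIDE_TO_COMMON.get(c, c) for c in stride_ss]
    matches = sum(a == b for a, b in zip(d, s))
    return matches / len(d)


print(agreement("-HHHHTS-EEE", "CHHHHTTCEEE"))  # 10 of 11 positions agree
```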

Citation:

*Frishman, D., & Argos, P. (1995). Knowledge-based protein secondary structure assignment. Proteins: Structure, Function, and Bioinformatics, 23(4), 566–579. https://doi.org/10.1002/prot.340230412*

In [None]:
! mkdir -p stride
! wget https://webclu.bio.wzw.tum.de/stride/stride.tar.gz -P stride/
! cd stride; tar -zxf stride.tar.gz; make

In [None]:
# run STRIDE
! ./stride/stride 1A1L.pdb > 1A1L.stride

In [None]:
# check file contents
! cat 1A1L.stride
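STRIDE writes its per-residue assignments in lines starting with `ASG`. The sketch below assumes these fields, separated by whitespace and in this order: residue name, chain, PDB residue number, ordinal number, one-letter code, full name, φ, ψ, and accessible area. It also assumes a non-blank chain column. Check these assumptions against your own `1A1L.stride` file. The sample lines are illustrative only, not real output for 1A1L.

```python
def parse_stride_asg(lines):
    """Return (chain, resnum, resname, ss_code, phi, psi) tuples from STRIDE ASG lines.

    Assumed whitespace-separated layout:
    ASG resname chain resnum ordinal code fullname phi psi area ...
    """
    records = []
    for line in lines:
        if not line.startswith("ASG"):
            continue  # skip header and summary records
        fields = line.split()
        resname, chain, resnum, code = fields[1], fields[2], fields[3], fields[5]
        phi, psi = float(fields[7]), float(fields[8])
        records.append((chain, resnum, resname, code, phi, psi))
    return records


sample = [
    "REM  Illustrative lines only, not real STRIDE output for 1A1L",
    "ASG  MET A    1    1    C          Coil    360.00    149.56     232.4",
    "ASG  ALA A    2    2    H    AlphaHelix    -62.10    -41.30      45.0",
]
for rec in parse_stride_asg(sample):
    print(rec)

# On the real file:
# with open("1A1L.stride") as fh:
#     stride_records = parse_stride_asg(fh)
#     stride_ss = "".join(r[3] for r in stride_records)
```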

## 3. Ramachandran plot

In [None]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.colors as mplcolors
from matplotlib import colors

In [None]:
# contour data for the general-case Ramachandran plot
! wget https://github.com/kluwik/structural-bioinformatics/raw/main/rama_general.npy
rama_general = np.load('rama_general.npy')

In [None]:
### <--- Ramachandran plot for general case 

# https://github.com/gerdos/PyRAMA/
cmap = mplcolors.ListedColormap(['#FFFFFF', '#C2E6E2', '#AECFCB'])

plt.figure(figsize=(6, 6))
plt.imshow(rama_general, cmap=cmap, extent=(-180, 180, 180, -180), 
           norm=colors.BoundaryNorm([0, 0.0005, 0.02, 1], cmap.N))

plt.xlim([-180, 180])
plt.ylim([-180, 180])
plt.xticks(np.arange(-180,181,90))
plt.yticks(np.arange(-180,181,90))
plt.plot([-180, 180], [0, 0], color="black")
plt.plot([0, 0], [-180, 180], color="black")
plt.xlabel(r'$\phi$')
plt.ylabel(r'$\psi$')
plt.grid(True)

### Ramachandran plot for general case --->


# Plot the dihedral angles of the D-amino acid helix below, in this cell, to map them onto the Ramachandran plot



plt.show()
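To place residues on the plot, you can take their φ/ψ angles from a DSSP dictionary, at positions 3 and 4 of each value in the `make_dssp_dict()` layout. DSSP reports 360.0 where an angle is undefined, such as φ of the first residue and ψ of the last residue of a chain. Drop those entries before plotting. In this minimal sketch, `phi_psi` is a hypothetical helper and the example dictionary is made up.

```python
def phi_psi(dssp_dict, ss_codes=None):
    """Collect (phi, psi) lists from a make_dssp_dict()-style dictionary.

    Skips residues where DSSP reports 360.0 (undefined angle) and, if
    `ss_codes` is given, keeps only residues whose code is in it.
    """
    phis, psis = [], []
    for values in dssp_dict.values():
        ss, phi, psi = values[1], values[3], values[4]
        if phi == 360.0 or psi == 360.0:
            continue
        if ss_codes is not None and ss not in ss_codes:
            continue
        phis.append(phi)
        psis.append(psi)
    return phis, psis


# Hypothetical records: (aa, ss, asa, phi, psi, index, 8 H-bond fields)
example = {
    ("A", (" ", 1, " ")): ("M", "-", 120, 360.0, 150.0, 1, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0),
    ("A", (" ", 2, " ")): ("A", "H", 40, -62.0, -41.0, 2, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0),
    ("A", (" ", 3, " ")): ("V", "E", 30, -120.0, 130.0, 3, 0, 0.0, 0, 0.0, 0, 0.0, 0, 0.0),
}
phis, psis = phi_psi(example)
print(phis, psis)  # → [-62.0, -120.0] [-41.0, 130.0]

# With real data, inside the plotting cell above and before plt.show():
# phis, psis = phi_psi(dssp, ss_codes="HGI")
# plt.scatter(phis, psis, s=10, color="red")
```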