# The Most Beautiful Disulfide Bond in the World
Eric G. Suchanek, PhD 2/14/24

I illustrate some of the features of proteusPy by analyzing the lowest-energy disulfide bond in the RCSB Protein Data Bank (PDB).


In [8]:
# General imports

import pandas as pd
import pyvista as pv
from pyvista import set_plot_theme

from proteusPy.Disulfide import *
from proteusPy.DisulfideList import DisulfideList
from proteusPy.DisulfideLoader import Load_PDB_SS

# pyvista setup for notebooks
pv.set_jupyter_backend('trame')

set_plot_theme('dark')
LIGHT = True

### Load the RCSB Disulfide Database
We load the database and get its properties as follows:

In [9]:
PDB_SS = Load_PDB_SS(verbose=True)
PDB_SS.describe()

-> load_PDB_SS(): Reading /Users/egs/miniforge3/envs/proteusPy/lib/python3.11/site-packages/proteusPy/data/PDB_SS_ALL_LOADER.pkl... 
-> load_PDB_SS(): Done reading /Users/egs/miniforge3/envs/proteusPy/lib/python3.11/site-packages/proteusPy/data/PDB_SS_ALL_LOADER.pkl... 
PDB IDs present:                    35818
Disulfides loaded:                  120494
Average structure resolution:       2.34 Å
Lowest Energy Disulfide:            2q7q_75D_140D
Highest Energy Disulfide:           1toz_456A_467A
Cα distance cutoff:                 8.00 Å
Total RAM Used:                     30.72 GB.


We see from the statistics above that disulfide 2q7q_75D_140D has the lowest energy, so let's extract it from the database and display it. 

A few notes about the display window: you might need to click into the window to refresh it. Click and drag to rotate the structures, and use the mouse wheel to zoom. The window titles display several parameters of the disulfide bonds, including their approximate torsional energy, their Cα-Cα distance, and their *torsion length*.

Formally, the torsion length is the Euclidean length (norm) of the five sidechain dihedral angles (Χ1-Χ5) treated as a five-dimensional vector. This sounds all mathy and complicated, but in essence it measures how 'long' that five-dimensional vector is. The package uses it to compare individual structures and gauge their structural similarity.
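As a quick check of this definition, here is a minimal NumPy sketch. It assumes the torsion length is the plain Euclidean norm of the five angles in degrees, with no wrapping or weighting. The helper name `torsion_length` is hypothetical and not part of proteusPy. The input angles are the Χ1-Χ5 values that `pprint()` reports for 2q7q_75D_140D in the next cell.

```python
import numpy as np


def torsion_length(chis):
    """Euclidean norm of the five sidechain dihedrals (degrees) treated as a 5-D vector."""
    return float(np.linalg.norm(np.asarray(chis, dtype=float)))


# Χ1-Χ5 for 2q7q_75D_140D, as printed by pprint() below
best_chis = [-59.36, -59.28, -83.66, -59.82, -59.91]
print(f"{torsion_length(best_chis):.2f}")  # 145.62
```

The result matches the 145.62 reported by the package for this bond. That agreement is consistent with the plain-norm reading, but it does not prove how the library computes the value internally.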

In [10]:
best_ss = PDB_SS['2q7q_75D_140D']
best_ss.pprint()
best_ss.display(style='sb', light=LIGHT)

<Disulfide 2q7q_75D_140D, Source: 2q7q, Resolution: 1.6 Å 
Χ1-Χ5: -59.36°, -59.28°, -83.66°, -59.82° -59.91°, -25.17°, 0.49 kcal/mol 
Cα Distance: 5.50 Å 
Torsion length: 145.62 deg>



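And that, gentle reader, is it. *The most beautiful disulfide bond in the world*! Look at it. It's gorgeous. Its sidechain dihedral angles (Χ1-Χ5: -59.36°, -59.28°, -83.66°, -59.82°, -59.91°) sit essentially *at* the idealized minimum-energy conformation (Χ1-Χ5: -60.00°, -60.00°, -90.00°, -60.00°, -60.00°; 0.60 kcal/mol). Its estimated energy (0.49 kcal/mol) is in fact slightly *below* that idealized value. This is because the 0.6·cos(3Χ3) term in the energy model shifts the Χ3 minimum from -90° to roughly -83°, where the energy function used at the end of this notebook bottoms out at about 0.49 kcal/mol. Let's have a look at the 'CPK' style rendering to see the structure's overall appearance: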


In [None]:
best_ss.display(style='cpk', light=LIGHT, shadows=False)
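To check the energies quoted above, here is a small sketch that evaluates the torsional energy model at both the observed and the idealized angles. It uses the same model as `ss_energy_function` near the end of this notebook. This assumes that model is the one proteusPy uses for the reported 0.49 kcal/mol. The names `ss_energy` and `_cos_deg` are hypothetical helpers.

```python
import numpy as np


def _cos_deg(n, chi):
    """cos(n * chi) for chi given in degrees."""
    return np.cos(np.deg2rad(n * chi))


def ss_energy(chi1, chi2, chi3, chi4, chi5):
    """Torsional energy (kcal/mol) with the same terms as ss_energy_function below."""
    energy = 2.0 * (_cos_deg(3.0, chi1) + _cos_deg(3.0, chi5))
    energy += _cos_deg(3.0, chi2) + _cos_deg(3.0, chi4)
    energy += 3.5 * _cos_deg(2.0, chi3) + 0.6 * _cos_deg(3.0, chi3) + 10.1
    return float(energy)


observed = ss_energy(-59.36, -59.28, -83.66, -59.82, -59.91)  # 2q7q_75D_140D
idealized = ss_energy(-60.0, -60.0, -90.0, -60.0, -60.0)
print(f"observed: {observed:.2f} kcal/mol, idealized: {idealized:.2f} kcal/mol")
# observed: 0.49 kcal/mol, idealized: 0.60 kcal/mol
```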

### Finding Potential Structural Relatives by Sequence Similarity
Working under the assumption that similar sequences yield similar structures, I ran a sequence-similarity query from the PDBe entry page for 2q7q (https://www.ebi.ac.uk/pdbe/entry/pdb/2q7q). The query returned the PDB IDs of structures with high sequence similarity as a `.csv` file, which we import below:

In [None]:
ss_df = pd.read_csv('2q7q_seqsim.csv')
ss_df.head(5)

Sadly, all of the nearest sequence neighbors are bacterial. Next, let's extract the unique IDs.

In [None]:
relative_list = ss_df['pdb_id'].unique()
relative_list

We now need to convert the list of PDB IDs into actual disulfides from the database. We do this with the `DisulfideLoader.build_ss_from_idlist()` method, and then print some relevant statistics.


In [None]:
# Build the list of all disulfides found in the related PDB entries
relatives = PDB_SS.build_ss_from_idlist(relative_list)

print(f'There are {relatives.length} related structures.\nAverage Energy: {relatives.Average_Energy:.2f} kcal/mol\nAverage Cα distance: {relatives.Average_Distance:.2f} Å')
print(f'Average resolution: {relatives.Average_Resolution:.2f} Å\nAverage torsion distance: {relatives.Average_Torsion_Distance:.2f}°')

In [None]:
ssmin, ssmax = relatives.minmax_energy()
duolist = DisulfideList([ssmin, ssmax], 'mM')
duolist.display(style='sb', light=LIGHT)

In [None]:
duolist.display_overlay(light=LIGHT)

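The two disulfides shown above are the lowest- and highest-energy structures among the sequence relatives. They are displayed first side by side and then superimposed.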

We can find disulfides that are conformationally related by using the `DisulfideList.nearest_neighbors()` function with a dihedral angle cutoff. This cutoff is a measure of angular similarity across all five sidechain dihedral angles.
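The exact metric behind `nearest_neighbors()` is not spelled out here. One plausible reading is a Euclidean distance in Χ1-Χ5 space, and the sketch below assumes exactly that. Each angular difference is wrapped into [-180°, 180°), so that, for example, 179° and -179° count as 2° apart. The function names (`angular_difference`, `torsion_distance`, `neighbors_within`) and the toy angle lists are hypothetical illustrations, not proteusPy's API or data.

```python
import numpy as np


def angular_difference(a, b):
    """Element-wise difference a - b in degrees, wrapped into [-180, 180)."""
    return (np.asarray(a, dtype=float) - np.asarray(b, dtype=float) + 180.0) % 360.0 - 180.0


def torsion_distance(chis_a, chis_b):
    """Euclidean distance between two 5-D dihedral vectors, using wrapped differences."""
    return float(np.linalg.norm(angular_difference(chis_a, chis_b)))


def neighbors_within(target, candidates, cutoff):
    """Return the candidates whose torsion distance to target is <= cutoff (degrees)."""
    return [c for c in candidates if torsion_distance(target, c) <= cutoff]


target = [-60.0, -60.0, -90.0, -60.0, -60.0]
candidates = [
    [-58.0, -62.0, -88.0, -61.0, -59.0],  # nearby: distance sqrt(14), about 3.7°
    [-60.0, -60.0, 90.0, -60.0, -60.0],   # opposite Χ3 handedness: distance 180°
    [179.0, -60.0, -90.0, -60.0, -60.0],  # different Χ1 rotamer: distance 121°
]
print(len(neighbors_within(target, candidates, 10.0)))  # 1
```

Wrapping matters because dihedral angles are periodic. Without it, a difference of 358° would be counted as enormous when it is really 2°.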

In [None]:
# Relatives within a dihedral-angle cutoff of 10.0 of the lowest-energy relative
close_neighbors = relatives.nearest_neighbors(ssmin.chi1, ssmin.chi2, ssmin.chi3, ssmin.chi4, ssmin.chi5, 10.0)
close_neighbors.length

In [None]:
close_neighbors.display_overlay(light=LIGHT)

In [None]:
# Search the entire database around the same conformation, with a tighter cutoff of 5.0
ssTotList = PDB_SS.SSList
global_neighbors = ssTotList.nearest_neighbors(ssmin.chi1, ssmin.chi2, ssmin.chi3, ssmin.chi4, ssmin.chi5, 5.0)
global_neighbors.length

In [None]:
global_neighbors.display_overlay(light=LIGHT)

In [None]:
from scipy.optimize import minimize
import numpy as np


def ss_energy_function(x):
    """Torsional energy (kcal/mol) of a disulfide with sidechain dihedrals x = (Χ1..Χ5) in degrees."""
    chi1, chi2, chi3, chi4, chi5 = x
    # Threefold terms for the Cα-Cβ (Χ1, Χ5) and Cβ-Sγ (Χ2, Χ4) torsions
    energy = 2.0 * (np.cos(np.deg2rad(3.0 * chi1)) + np.cos(np.deg2rad(3.0 * chi5)))
    energy += np.cos(np.deg2rad(3.0 * chi2)) + np.cos(np.deg2rad(3.0 * chi4))
    # Twofold plus threefold terms for the S-S torsion (Χ3), plus a constant offset
    energy += 3.5 * np.cos(np.deg2rad(2.0 * chi3)) + 0.6 * np.cos(np.deg2rad(3.0 * chi3)) + 10.1
    return energy


# Start the search from the idealized conformation (Χ1-Χ5: -60, -60, -90, -60, -60)
initial_guess = [-60.0, -60.0, -90.0, -60.0, -60.0]
result = minimize(ss_energy_function, initial_guess, method="Nelder-Mead")
minimum_energy = result.fun
inputs = result.x

print(f'Minimum Energy: {minimum_energy:.3f} kcal/mol for conformation: {", ".join(f"{x:.3f}" for x in inputs)}')
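The optimizer's answer can be cross-checked by brute force. Hold Χ1, Χ2, Χ4 and Χ5 at -60°, where each cos(3Χ) term is already at its minimum of -1. Together these four torsions then contribute -6 kcal/mol, so the energy reduces to a function of Χ3 alone: 4.1 + 3.5·cos(2Χ3) + 0.6·cos(3Χ3).

The sketch below scans only the left-handed range (-180° to 0°) on a 0.01° grid; the range and the grid spacing are arbitrary choices for illustration. The minimum should land near Χ3 ≈ -83° at about 0.489 kcal/mol, below the 0.60 kcal/mol found at Χ3 = -90°.

```python
import numpy as np

# With Χ1, Χ2, Χ4, Χ5 at -60°: 2*(-1 - 1) + (-1 - 1) = -6, and 10.1 - 6 = 4.1 remains
chi3 = np.arange(-180.0, 0.0, 0.01)  # left-handed Χ3 values, degrees
energy = 4.1 + 3.5 * np.cos(np.deg2rad(2.0 * chi3)) + 0.6 * np.cos(np.deg2rad(3.0 * chi3))

i = int(np.argmin(energy))  # index of the lowest-energy grid point
print(f"Χ3 minimum near {chi3[i]:.2f}°, energy {energy[i]:.3f} kcal/mol")
```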


## References
* *Application of Artificial Intelligence in Protein Design* - Doctoral Dissertation, EG Suchanek, 1987, Johns Hopkins Medical School
* https://doi.org/10.1021/bi00368a023
* https://doi.org/10.1021/bi00368a024
* https://doi.org/10.1016/0092-8674(92)90140-8
* https://doi.org/10.2174/092986708783330566
* https://doi.org/10.1021/bi0603064
* https://doi.org/10.1021/bi9826658
* https://pubmed.ncbi.nlm.nih.gov/22782563/