# The Most Beautiful Disulfide Bond in the World
Eric G. Suchanek, PhD 2/24/24

In this notebook we illustrate some of the features of `proteusPy` by analyzing the lowest-energy disulfide bond in the RCSB protein structure database. If you are not familiar with `proteusPy`, you can find the API documentation at: https://suchanek.githubio.com/proteusPy.html


In [1]:
import pyvista as pv
from pyvista import set_plot_theme

from proteusPy.Disulfide import Disulfide
from proteusPy.DisulfideList import DisulfideList
from proteusPy.DisulfideLoader import Load_PDB_SS

# pyvista setup for notebooks
pv.set_jupyter_backend("trame")

set_plot_theme("default")
LIGHT = True

### Load the RCSB Disulfide Database
We load the database and get its properties as follows:

In [2]:
PDB_SS = Load_PDB_SS(verbose=True)
PDB_SS.describe()

--> DisulfideLoader: Downloading Disulfide Database from Drive...


Downloading...
From (original): https://drive.google.com/uc?id=1igF-sppLPaNsBaUS7nkb13vtOGZZmsFp
From (redirected): https://drive.google.com/uc?id=1igF-sppLPaNsBaUS7nkb13vtOGZZmsFp&confirm=t&uuid=849c5738-7386-4a11-a235-48a82ec7fc94
To: /Users/egs/miniforge3/envs/ppydev/lib/python3.11/site-packages/proteusPy/data/PDB_SS_ALL_LOADER.pkl
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 340M/340M [00:44<00:00, 7.62MB/s]


--> DisulfideLoader: Downloading Disulfide Subset Database from Drive...


Downloading...
From: https://drive.google.com/uc?id=1puy9pxrClFks0KN9q5PPV_ONKvL-hg33
To: /Users/egs/miniforge3/envs/ppydev/lib/python3.11/site-packages/proteusPy/data/PDB_SS_SUBSET_LOADER.pkl
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.64M/9.64M [00:01<00:00, 8.68MB/s]


-> load_PDB_SS(): Reading /Users/egs/miniforge3/envs/ppydev/lib/python3.11/site-packages/proteusPy/data/PDB_SS_ALL_LOADER.pkl... 


ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'

We see from the statistics above that disulfide 2q7q_75D_140D has the lowest energy, so let's extract it from the database and display it. 

A few notes about the display window: you might need to click in the window to refresh it. Click and drag to rotate the structures, and use the mouse wheel to zoom. The window titles display several parameters for each disulfide bond, including its approximate torsional energy, its Cα-Cα distance, and its *torsion length*. 

Formally, the torsion length is the Euclidean length of the five sidechain dihedral angles when they are treated as a five-dimensional vector. This sounds mathy and complicated, but in essence it measures how 'long' that five-dimensional vector is. The package uses it to compare individual structures and gauge their structural similarity.
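
As a quick check of that definition, the sketch below computes the torsion length directly from the five dihedral angles that this notebook reports for 2q7q_75D_140D. The helper name `torsion_length` is hypothetical and not part of the proteusPy API; it assumes only the plain Euclidean norm described above.

```python
import math


def torsion_length(dihedrals):
    """Euclidean length of a dihedral-angle vector, in degrees."""
    return math.sqrt(sum(chi * chi for chi in dihedrals))


# Chi1-Chi5 for 2q7q_75D_140D, as reported in this notebook
chis = [-59.36, -59.28, -83.66, -59.82, -59.91]
print(f"Torsion length: {torsion_length(chis):.2f} deg")
```

The result agrees with the 145.62° torsion length that `pprint()` reports for this disulfide.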

In [None]:
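# Lowest- and highest-energy disulfides in the database
ssmin, ssmax = PDB_SS.SSList.minmax_energy
ssmin_energy = ssmin.energy

# Look up the lowest-energy disulfide by name, then print and render it
best_ss = PDB_SS["2q7q_75D_140D"]
best_dihedrals = best_ss.dihedrals
best_ss.pprint()
best_ss.display(style="sb", light=LIGHT)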

And that, gentle reader, is it. *The most beautiful disulfide bond in the world*! Look at it. This is the lowest-energy structure in the entire database. Its sidechain dihedral angles are (Χ1-Χ5: -59.36°, -59.28°, -83.66°, -59.82°, -59.91°), and its estimated energy is 0.49 kcal/mol. 

How does this compare to a modelled minimum? We can use the ``minimize`` function from ``scipy.optimize`` to check. We know from chemistry that a reasonable guess for a low-energy conformation has the dihedral angles (Χ1-Χ5: -60.00°, -60.00°, -90.00°, -60.00°, -60.00°), with an energy of 0.60 kcal/mol. Let's use this as the starting point and let scipy search for a minimum-energy conformation:


In [None]:
from scipy.optimize import minimize
from proteusPy.Disulfide import Disulfide_Energy_Function

initial_guess = [
    -60.0,
    -60.0,
    -90.0,
    -60.0,
    -60.0,
]  # initial guess for chi1, chi2, chi3, chi4, chi5
result = minimize(Disulfide_Energy_Function, initial_guess, method="Nelder-Mead")
minimum_energy = result.fun
min_conf = result.x
print(
    f'Modeled minimum energy: {minimum_energy:.3f} kcal/mol for conformation: {[f"{x:.3f}" for x in min_conf]}'
)
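
For readers without proteusPy installed, here is a minimal stand-alone sketch of the same idea. It assumes the energy is a sum of cosine torsional terms with the coefficients shown. That functional form and the name `torsional_energy` are assumptions, since this notebook does not show the body of `Disulfide_Energy_Function`; they were chosen because they reproduce the 0.60 kcal/mol quoted for the initial guess. Instead of Nelder-Mead, the sketch uses a simple grid scan over χ3 while holding the other four angles at -60°.

```python
import math


def torsional_energy(chis):
    """Approximate disulfide torsional energy in kcal/mol (assumed form)."""
    chi1, chi2, chi3, chi4, chi5 = (math.radians(c) for c in chis)
    energy = 2.0 * (1.0 + math.cos(3.0 * chi1)) + 2.0 * (1.0 + math.cos(3.0 * chi5))
    energy += 1.0 * (1.0 + math.cos(3.0 * chi2)) + 1.0 * (1.0 + math.cos(3.0 * chi4))
    energy += 3.5 * (1.0 + math.cos(2.0 * chi3)) + 0.6 * (1.0 + math.cos(3.0 * chi3))
    return energy


def with_chi3(chi3):
    """Conformation with chi3 varied and the other four angles fixed at -60."""
    return [-60.0, -60.0, chi3, -60.0, -60.0]


guess_energy = torsional_energy(with_chi3(-90.0))

# Scan chi3 from -100 to -70 degrees in 0.01 degree steps
candidates = (-100.0 + 0.01 * i for i in range(3001))
best_chi3 = min(candidates, key=lambda c3: torsional_energy(with_chi3(c3)))
best_energy = torsional_energy(with_chi3(best_chi3))

print(f"Initial guess energy: {guess_energy:.2f} kcal/mol")
print(f"Scan minimum: chi3 = {best_chi3:.2f} deg, energy = {best_energy:.3f} kcal/mol")
```

Under these assumptions the scan settles near χ3 ≈ -83° at roughly 0.489 kcal/mol. A one-dimensional scan is enough here only because the four outer terms are already at their minimum at -60°; a general search would need all five angles, as the `minimize` call does.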

So the computed minimum energy is an estimated *0.489* kcal/mol. The difference from the actual conformation is:

In [None]:
diff = minimum_energy - ssmin_energy
print(f"Modeled - actual energy difference is: {diff:.3f} kcal/mol")

The real structure is actually at a *lower* energy than the predicted minimum! What an amazing disulfide bond! Let's build a model of the minimized conformation. We do that by creating an empty disulfide and then using the `Disulfide.build_yourself` function.

In [None]:
modelled_min = Disulfide("model")
modelled_min.dihedrals = min_conf
modelled_min.build_yourself()

Now make a ``DisulfideList`` and put the real structure and the modelled structure into it.

In [None]:
minmax = DisulfideList([modelled_min, ssmin], "minmax")

Finally, display them in a common reference frame:

In [None]:
minmax.display_overlay()

The two structures overlap with an overall RMS error of 2.14 Å. Not bad, considering the modelled structure is built from idealized bond lengths and angles. This serves to remind us that nature is the ultimate modeling engine, and that we still have much to learn.
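
For readers who want to see what an RMS overlay error measures, here is a minimal sketch. It assumes the figure is a root-mean-square distance between corresponding atoms once both structures sit in a common frame. The function name `rms_distance` and the coordinates are made up for illustration, and proteusPy's own calculation may differ.

```python
import math


def rms_distance(coords_a, coords_b):
    """Root-mean-square distance between paired 3D points (same order, same length)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = [
        sum((a - b) ** 2 for a, b in zip(point_a, point_b))
        for point_a, point_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical coordinates in angstroms, not taken from 2q7q
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0)]
real = [(0.0, 0.0, 0.0), (1.5, 0.0, 2.0), (2.0, 1.4, 2.0)]
print(f"RMS distance: {rms_distance(model, real):.2f} A")
```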

## References
* *Application of Artificial Intelligence in Protein Design* - Doctoral Dissertation, EG Suchanek, 1987, Johns Hopkins Medical School
* https://doi.org/10.1021/bi00368a023


In [3]:
ss1 = PDB_SS["2q7q_75D_140D"]
ss1.pprint()

<Disulfide 2q7q_75D_140D, Source: 2q7q, Resolution: 1.6 Å 
Χ1-Χ5: -59.36°, -59.28°, -83.66°, -59.82° -59.91°, -25.17°, 0.49 kcal/mol 
Cα Distance: 5.50 Å 
Torsion length: 145.62 deg>


In [4]:
ss1

<Disulfide 2q7q_75D_140D, Source: 2q7q, Resolution: 1.6 Å>

In [6]:
str1 = ss1.pprint()

<Disulfide 2q7q_75D_140D, Source: 2q7q, Resolution: 1.6 Å 
Χ1-Χ5: -59.36°, -59.28°, -83.66°, -59.82° -59.91°, -25.17°, 0.49 kcal/mol 
Cα Distance: 5.50 Å 
Torsion length: 145.62 deg>


In [7]:
str1