# Finding Potential Structural Relatives by Sequence Similarity using proteusPy
Eric G. Suchanek, PhD 2/23/24

Working under the assumption that similar sequences imply similar structures, I ran a sequence-similarity query on 2q7q, the protein containing the lowest-energy disulfide bond in the RCSB database, to return the PDB IDs of structures with high sequence similarity. I then used several proteusPy functions to find structures with similar disulfide bonds.

In [None]:
#
import pandas as pd
import pyvista as pv
from pyvista import set_plot_theme

from proteusPy import Disulfide, DisulfideList, Load_PDB_SS

# pyvista setup for notebooks
pv.set_jupyter_backend("trame")

set_plot_theme("dark")
LIGHT = True

### Load the RCSB Disulfide Database
We load the database and get its properties as follows:

In [None]:
PDB_SS = Load_PDB_SS(verbose=True)
PDB_SS.describe()

In [None]:
best_ss = PDB_SS["2q7q_75D_140D"]
best_ss.pprint()
best_ss.display(style="sb", light=LIGHT)

I ran the query from https://www.ebi.ac.uk/pdbe/entry/pdb/2q7q to return PDB IDs for structures with high sequence similarity to 2q7q, the protein with the lowest-energy disulfide bond in the RCSB database. This yielded a `.csv` file, which we import below:

In [None]:
ss_df = pd.read_csv("2q7q_seqsim.csv")
ss_df.head(5)

Sadly, all of the nearest sequence neighbors are bacterial. Next, let's extract the unique PDB IDs.

In [None]:
relative_list = ss_df["pdb_id"].unique()
relative_list
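
IDs exported from a web query can differ in case or carry stray whitespace, so a plain uniqueness check may keep duplicates. The sketch below is one way to normalize before deduplicating. It is an illustration only: the helper name `unique_pdb_ids` is hypothetical, the lowercase convention is an assumption based on the database key `2q7q_75D_140D` used above, and `1abc` is a made-up placeholder ID.

```python
def unique_pdb_ids(raw_ids):
    """Return normalized PDB IDs in order of first appearance.

    Hypothetical helper: strips whitespace, lowercases (assumed convention),
    and skips None or empty entries.
    """
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for raw in raw_ids:
        if raw is None:
            continue
        pdb_id = str(raw).strip().lower()
        if pdb_id:
            seen.setdefault(pdb_id, None)
    return list(seen)


# "1abc" is a placeholder, not an ID taken from the query results
example_ids = unique_pdb_ids(["2Q7Q", " 2q7q ", "1abc", None, ""])
```

Like `unique()` on a pandas column, this keeps the order of first appearance, which makes the result easier to compare against the original `.csv`.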

We now need to convert the list of PDB IDs into actual disulfides from the database. We do this with the ``DisulfideLoader.build_ss_from_idlist()`` function, then print some summary statistics for the resulting list.


In [None]:
relatives = DisulfideList([], "relatives")
relatives = PDB_SS.build_ss_from_idlist(relative_list)

print(
    f"There are: {relatives.length} related structures.\nAverage Energy: {relatives.Average_Energy:.2f} kcal/mol\nAverage Ca distance: {relatives.Average_Distance:.2f} Å"
)
print(
    f"Average resolution: {relatives.Average_Resolution:.2f} Å \nAverage torsion distance: {relatives.Average_Torsion_Distance:.2f}°"
)

Now let's look at the lowest and highest energy structures in this list of relatives.

In [None]:
ssmin, ssmax = relatives.minmax_energy
duolist = DisulfideList([ssmin, ssmax], "mM")
# duolist.display(style='sb', light=LIGHT)

In [None]:
duolist.display_overlay(light=LIGHT)

The two Disulfides...

We can find conformationally related disulfides with the `DisulfideList.nearest_neighbors()` function and a dihedral angle cutoff. The cutoff is a measure of angular similarity across all five sidechain dihedral angles. Here we search the relatives for conformations within a cutoff of 10.0 of the lowest-energy relative.

In [None]:
close_neighbors = relatives.nearest_neighbors(
    ssmin.chi1, ssmin.chi2, ssmin.chi3, ssmin.chi4, ssmin.chi5, 10.0
)
close_neighbors.length
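
To make the idea of a dihedral cutoff concrete, here is a minimal sketch of one plausible distance between two five-angle conformations: a Euclidean distance in degrees, with each angle difference wrapped into [-180, 180). This is an assumption for illustration; the metric that `nearest_neighbors()` actually uses may differ (for example, it may not wrap angles), and the function names below are hypothetical.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference a - b between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_distance(p, q):
    """Euclidean distance between two 5-angle conformations, wrapping each angle."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(p, q)))


def within_cutoff(target, candidates, cutoff):
    """Return the candidate conformations no farther than `cutoff` from `target`."""
    return [c for c in candidates if torsion_distance(target, c) <= cutoff]


target = [-60.0, -60.0, -90.0, -60.0, -60.0]
candidates = [
    [-63.0, -56.0, -90.0, -60.0, -60.0],  # distance 5.0 from target
    [60.0, -60.0, -90.0, -60.0, -60.0],   # distance 120.0 from target
]
close = within_cutoff(target, candidates, 10.0)
```

Wrapping matters near the ±180° boundary: without it, 179° and -179° would appear 358° apart rather than 2°.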

In [None]:
# close_neighbors.display_overlay(light=LIGHT)

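We now have the 18 close neighbors of the lowest-energy structure among the relatives. Next we repeat the search across the entire database, using a tighter cutoff of 5.0.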

In [None]:
ssTotList = PDB_SS.SSList
global_neighbors = ssTotList.nearest_neighbors(
    ssmin.chi1, ssmin.chi2, ssmin.chi3, ssmin.chi4, ssmin.chi5, 5.0
)
global_neighbors.length

In [None]:
global_neighbors.display_overlay(light=LIGHT)
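
The minimization that follows needs an energy function of the five dihedral angles, named `ss_energy_function` in the call to `minimize`. Below is a minimal sketch of such a function, assuming a standard torsional form with threefold terms for chi1, chi2, chi4 and chi5 and a combined twofold and threefold term for chi3. The coefficients are an assumption and should be checked against the proteusPy source before relying on the numbers.

```python
import math


def ss_energy_function(x):
    """Approximate disulfide torsional energy for dihedrals given in degrees.

    x is a sequence [chi1, chi2, chi3, chi4, chi5]. The functional form and
    coefficients are assumed for illustration, not taken from this notebook.
    """
    chi1, chi2, chi3, chi4, chi5 = x

    def cos_deg(angle):
        return math.cos(math.radians(angle))

    # Threefold terms for the outer dihedrals
    energy = 2.0 * (cos_deg(3.0 * chi1) + cos_deg(3.0 * chi5))
    energy += cos_deg(3.0 * chi2) + cos_deg(3.0 * chi4)
    # Twofold plus threefold term for the S-S dihedral, and a constant offset
    energy += 3.5 * cos_deg(2.0 * chi3) + 0.6 * cos_deg(3.0 * chi3) + 10.1
    return energy


start_energy = ss_energy_function([-60.0, -60.0, -90.0, -60.0, -60.0])
```

Under these assumed coefficients the starting guess used for the optimizer evaluates to about 0.6, and the threefold chi3 term shifts the true minimum slightly away from -90°, which is why a numerical search is useful.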

In [None]:
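import numpy as np
from scipy.optimize import minimize

# NOTE: ss_energy_function must already be defined in the session before this
# cell runs. It takes the five dihedral angles and returns a scalar energy.

# Initial guess for chi1, chi2, chi3, chi4, chi5 (degrees)
initial_guess = [-60.0, -60.0, -90.0, -60.0, -60.0]

# Nelder-Mead is a derivative-free simplex search, so no gradient is required
result = minimize(ss_energy_function, initial_guess, method="Nelder-Mead")
minimum_energy = result.fun  # energy at the minimum found
inputs = result.x  # dihedral angles at that minimum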

print(
    f'Minimum Energy: {minimum_energy:.3f} for conformation: {[f"{x:.3f}" for x in inputs]}'
)