
# DisulfideLoader.ipynb
This notebook demonstrates the ways to access disulfide bonds using the proteusPy
package.

Author: Eric G. Suchanek, PhD.

Last Modification: 2025-02-15 15:00:50

In [None]:
import proteusPy as pp
from proteusPy.DisulfideBase import DisulfideList

pp.configure_master_logger("loadertest.log")
# Use the module's actual name rather than the literal string "__name__".
_logger = pp.create_logger(__name__, "WARNING")

Create an unfiltered ``DisulfideLoader`` with no distance constraints:

In [None]:
#pdb = pp.DisulfideLoader(verbose=True, subset=False, cutoff=-1, sg_cutoff=-1)
pdb = pp.Load_PDB_SS(subset=False, cutoff=-1, sg_cutoff=-1, verbose=True)

Access to the database is through the ``DisulfideLoader`` object. We can retrieve disulfides by PDB ID, by integer index or slice, by disulfide name, and by disulfide class. Each of these is illustrated below.

- Access by PDB ID:

In [None]:
pdb["6dmb"]

- Access by numerical index:

In [None]:
pdb[0]

- Access by slice:

In [None]:
pdb[:5]

- Access by Disulfide Name:

In [None]:
pdb["6dmb_203A_226A"]

- Access by Class Identifier (specify the base explicitly with an 'o' or 'b' suffix). If no suffix is included, then base 8 is assumed. To list the classes and their number of elements, use the ``DisulfideLoader.print_classes()`` method with the appropriate base:

In [None]:
pdb["11212"]

- Access by Class Identifier without a suffix (the octant class is assumed):

In [None]:
pdb.quiet = True
pdb.verbose = True

In [None]:
pdb["12212"]
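The suffix rule described above can be sketched in plain Python. The helper `split_class_id` is hypothetical and is not part of proteusPy; it only illustrates how a trailing 'b' or 'o' could select the base, with base 8 as the default when no suffix is given.

```python
def split_class_id(class_id: str) -> tuple[str, int]:
    """Split a class identifier into (digits, base).

    Hypothetical helper for illustration only; not a proteusPy function.
    """
    suffix_to_base = {"b": 2, "o": 8}
    last = class_id[-1:].lower()
    if last in suffix_to_base:
        return class_id[:-1], suffix_to_base[last]
    # No suffix: assume the octant (base 8) class, as described above.
    return class_id, 8


print(split_class_id("02202b"))  # ('02202', 2)
print(split_class_id("11212o"))  # ('11212', 8)
print(split_class_id("12212"))   # ('12212', 8)
```

How the loader actually resolves these keys internally may differ; the sketch only mirrors the behavior stated in this notebook.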

It's easy to display statistical information about the returned lists as follows:

- Show bond length and bond angle deviations for the first 1000 Disulfides:

In [None]:
subset = pdb[:1000]
_ = subset.plot_deviation_histograms()

- Show the Cα-Cα distances for the first 1000 Disulfides (pass `distance_type="sg"` for the Sγ-Sγ distances):

In [None]:
subset.plot_distances(distance_type="ca", cutoff=-1, theme="auto", log=True)

- Show the Cα-Cα distances for the entire database:

In [None]:
pdb.plot_distances(theme="auto", log=True, distance_type="ca")

As you can see, the unfiltered database contains a number of disulfides whose Cα-Cα distance exceeds the maximum possible value (~8 Å).
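To make the role of a distance cutoff concrete, here is a minimal sketch of such a filter in plain Python. The function `filter_by_distance` and the sample distances are invented for illustration; the only behavior taken from this notebook is that a cutoff of -1 means no distance constraint.

```python
def filter_by_distance(distances, cutoff=-1.0):
    """Keep distances at or below cutoff (in Å); a negative cutoff disables the filter.

    Illustrative only; not the proteusPy implementation.
    """
    if cutoff < 0:
        return list(distances)
    return [d for d in distances if d <= cutoff]


# Synthetic Cα-Cα distances in Å, invented for this example.
sample = [4.4, 5.6, 6.3, 8.9, 12.1]

print(filter_by_distance(sample))              # all five values
print(filter_by_distance(sample, cutoff=8.0))  # [4.4, 5.6, 6.3]
```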

We can also display the torsion statistics in several ways:

- Statistics for a slice of disulfides:

In [None]:
pdb[:10].display_torsion_statistics()

- Statistics for a class:
  - To list the binary classIDs use:

In [None]:
pdb.get_class_df(base=2)

Since there are only 32 binary classes, the above shows the overall distribution of Disulfides across all binary classes. The 8-fold (*octant*) class dataframe is *much* larger (9697 members) and can be shown with:

```python
    pdb.get_class_df(base=8)
```
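The class counts follow from the five dihedral angles of a disulfide: two sign states per angle give 2^5 = 32 binary classes, while eight octants per angle give 8^5 = 32768 possible octant strings, of which only some are populated. A short sketch of that arithmetic, assuming binary class strings use the digits 0 and 2 (as in the IDs shown in this notebook) and octant strings use the digits 1 to 8:

```python
from itertools import product

# Five dihedral angles per disulfide, one digit per angle.
N_ANGLES = 5

# Assumption: binary class strings use '0' and '2', as in "02202".
binary_classes = ["".join(p) for p in product("02", repeat=N_ANGLES)]

# Assumption: octant class strings use the digits '1' through '8'.
octant_classes = ["".join(p) for p in product("12345678", repeat=N_ANGLES)]

print(len(binary_classes))  # 32
print(len(octant_classes))  # 32768
print(binary_classes[:4])   # ['00000', '00002', '00020', '00022']
```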


Let's look at one of the binary classes. Note that creating long lists of Disulfides takes time; the next cell takes over 12 seconds on my M3 Max MacBook Pro:

In [None]:
pdb["02202b"].display_torsion_statistics(save=False, theme="auto")

The above shows quite large deviations for the dihedral angles, which suggests that the class is structurally very diverse. This is to be expected with a coarse structural filter, and it was the driving reason for developing the *octant* dihedral angle quantization method.
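As a rough illustration of the idea, the sketch below quantizes a dihedral angle by sign (binary) and by 45° sector (octant). Both functions are hypothetical, and the digit conventions (0/2 for sign, 1 to 8 counting up from 0°) are assumptions that may not match proteusPy's actual mapping.

```python
def binary_digit(angle_deg: float) -> str:
    """Coarse class: '0' for a negative angle, '2' otherwise (assumed convention)."""
    return "0" if angle_deg < 0 else "2"


def octant_digit(angle_deg: float) -> str:
    """Finer class: index of the 45-degree sector, '1'..'8' (assumed convention)."""
    sector = int((angle_deg % 360.0) // 45.0) % 8
    return str(sector + 1)


def class_string(angles, digit_fn) -> str:
    """Concatenate one digit per dihedral angle."""
    return "".join(digit_fn(a) for a in angles)


# Synthetic dihedral angles in degrees, invented for this example.
angles = [-60.0, -85.0, -90.0, -130.0, -10.0]

print(class_string(angles, binary_digit))  # 00000
print(class_string(angles, octant_digit))  # 77768
```

All five angles are negative, so they share a single binary class, yet they fall into three different octants. This is the sense in which the octant scheme separates structures that the binary scheme lumps together.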

In [None]:
pdb["11212"].display_torsion_statistics(save=False, theme="auto")

In [None]:
pdb["6dmb"].display_torsion_statistics(theme="auto")

Finally, we can readily display either individual Disulfides or lists of them as follows:

In [None]:
best_ss = pdb["2q7q_75D_140D"]
worst_ss = pdb["6vxk_801B_806B"]
duo = DisulfideList([best_ss, worst_ss], "bestworst")

In [None]:
best_ss.display(style="sb")

We can display the list as multiple panels:

In [None]:
duo.display(style="sb", light="dark")

Or we can display them overlaid onto a common coordinate frame:

In [None]:
duo.display_overlay()

## Class Plotting

In [None]:
# most prevalent
LHSpiral_neg = "00000"
RHSpiral_neg = "02220"

# Allosteric
RHStaple_neg = "00200"

In [None]:
pdb.plot_classes_vs_cutoff(0.04, steps=50, base=8, theme="auto", verbose=False)

In [None]:
pdb.plot_binary_to_eightclass_incidence(verbose=True, save=False, theme="auto")

In [None]:
pdb.plot_count_vs_class_df(
    RHStaple_neg, title="RHStaple_neg (Allosteric)", theme="auto", log=False
)

In [None]:
pdb.plot_count_vs_class_df(
    LHSpiral_neg,
    title="LHSpiral_neg (Most Common)",
    theme="auto",
    log=False,
)

In [None]:
pdb.plot_count_vs_class_df_paginated(LHSpiral_neg, title="LHSpiral_neg", theme="auto")

In [None]:
pdb.plot_classes(LHSpiral_neg)