
# DisulfideLoader.ipynb
This notebook demonstrates the ways to access disulfide bonds using the proteusPy
package.

Author: Eric G. Suchanek, PhD.

Last Modification: 2025-02-15 15:00:50

In [8]:
import proteusPy as pp
from proteusPy.DisulfideBase import DisulfideList

pp.configure_master_logger("loadertest.log")
_logger = pp.create_logger(__name__, "WARNING")

Create an unfiltered ``DisulfideLoader`` with no distance constraints:

In [9]:
#pdb = pp.DisulfideLoader(verbose=True, subset=False, cutoff=-1, sg_cutoff=-1)
pdb = pp.Load_PDB_SS(subset=False, cutoff=-1, sg_cutoff=-1, verbose=True)


proteusPy: INFO 2025-02-15 17:40:25,425 - proteusPy.DisulfideLoader.Load_PDB_SS - Reading disulfides from: /Users/egs/miniforge3/envs/ppydev/lib/python3.12/site-packages/proteusPy/data/PDB_SS_ALL_LOADER.pkl...
proteusPy: INFO 2025-02-15 17:40:35,763 - proteusPy.DisulfideLoader.Load_PDB_SS - Done reading disulfides from: /Users/egs/miniforge3/envs/ppydev/lib/python3.12/site-packages/proteusPy/data/PDB_SS_ALL_LOADER.pkl...


PDB IDs present:                 36968
Disulfides loaded:               175277
Average structure resolution:    2.19 Å
Lowest Energy Disulfide:         2q7q_75D_140D
Highest Energy Disulfide:        6vxk_801B_806B
Cα distance cutoff:              -1.00 Å
Sγ distance cutoff:              -1.00 Å
               ===== proteusPy: 0.99.2.dev1 =====


Access to the database is through the ``DisulfideLoader`` object. We can retrieve disulfides by PDB ID, by integer index or slice, by disulfide name, and by disulfide class. Each of these is illustrated below.

- Access by PDB ID:

In [10]:
pdb["6dmb"]

DisulfideList([<Disulfide 6dmb_203A_226A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 6dmb_234A_327A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 6dmb_296A_304A, Source: 6dmb, Resolution: 3.0 Å>])

- Access by numerical index:

In [11]:
pdb[0]

<Disulfide 6dmb_203A_226A, Source: 6dmb, Resolution: 3.0 Å>

- Access by slice:

In [12]:
pdb[:5]

DisulfideList([<Disulfide 6dmb_203A_226A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 6dmb_234A_327A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 6dmb_296A_304A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 3rik_4A_16A, Source: 3rik, Resolution: 2.0 Å>,
               <Disulfide 3rik_18A_23A, Source: 3rik, Resolution: 2.0 Å>])

- Access by Disulfide Name:

In [13]:
pdb["6dmb_203A_226A"]

<Disulfide 6dmb_203A_226A, Source: 6dmb, Resolution: 3.0 Å>

- Access by class identifier. Specify the base explicitly with an 'o' (octant, base 8) or 'b' (binary, base 2) suffix; if no suffix is included, base 8 is assumed. To list the classes and their number of elements, use the ``DisulfideLoader.print_classes()`` method with the appropriate base. An explicit octant lookup:

In [14]:
pdb["11212o"]

  0%|          | 0/5 [00:00<?, ?it/s]

DisulfideList([<Disulfide 3c34_202B_256B, Source: 3c34, Resolution: 1.0 Å>,
               <Disulfide 3c36_202B_256B, Source: 3c36, Resolution: 1.0 Å>,
               <Disulfide 4uip_195A_207A, Source: 4uip, Resolution: 2.0 Å>,
               <Disulfide 7nxd_561B_570B, Source: 7nxd, Resolution: 4.0 Å>,
               <Disulfide 7mqs_196E_207E, Source: 7mqs, Resolution: 4.0 Å>])

- Access by class identifier without a suffix (the octant class is assumed):

In [15]:
pdb["11212"]

  0%|          | 0/5 [00:00<?, ?it/s]

DisulfideList([<Disulfide 3c34_202B_256B, Source: 3c34, Resolution: 1.0 Å>,
               <Disulfide 3c36_202B_256B, Source: 3c36, Resolution: 1.0 Å>,
               <Disulfide 4uip_195A_207A, Source: 4uip, Resolution: 2.0 Å>,
               <Disulfide 7nxd_561B_570B, Source: 7nxd, Resolution: 4.0 Å>,
               <Disulfide 7mqs_196E_207E, Source: 7mqs, Resolution: 4.0 Å>])
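
The key forms used above (PDB ID, integer, slice, disulfide name, class string with optional suffix) can be told apart by type and shape alone. The following is a minimal sketch of such a dispatch, not proteusPy's actual implementation: the function `classify_key` and the three regular expressions are hypothetical and were inferred only from the example keys shown in this notebook.

```python
import re

# Hypothetical patterns, inferred only from the keys used in this notebook.
_NAME_RE = re.compile(r"^[0-9a-z]{4}_\d+[A-Za-z]_\d+[A-Za-z]$")  # e.g. 6dmb_203A_226A
_CLASS_RE = re.compile(r"^(\d{5})([ob]?)$")                      # e.g. 11212o, 02202b, 11212
_PDB_ID_RE = re.compile(r"^[0-9a-z]{4}$")                        # e.g. 6dmb


def classify_key(key):
    """Return a (kind, detail) tuple describing how a key could be resolved."""
    if isinstance(key, slice):
        return ("slice", key)
    if isinstance(key, int):
        return ("index", key)
    if not isinstance(key, str):
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    if _NAME_RE.match(key):
        return ("name", key)

    class_match = _CLASS_RE.match(key)
    if class_match:
        digits, suffix = class_match.groups()
        # 'b' selects the binary classes; 'o' or no suffix selects octant (base 8).
        base = 2 if suffix == "b" else 8
        return ("class", (digits, base))

    if _PDB_ID_RE.match(key):
        return ("pdb_id", key)

    raise KeyError(key)


for example in ("6dmb", 0, slice(None, 5), "6dmb_203A_226A", "11212o", "11212", "02202b"):
    print(example, "->", classify_key(example)[0])
```

Under these assumptions, `"11212o"` and `"11212"` resolve to the same (digits, base) pair, which matches the identical results returned by the two class lookups above.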

It's easy to display statistical information about the returned lists as follows:

- Show bond length and bond angle deviations for the first 1000 Disulfides:

In [16]:
subset = pdb[:1000]
_ = subset.plot_deviation_histograms()

                                                                   

- Show the Cα-Cα distances for the first 1000 Disulfides:

In [17]:
subset.plot_distances(distance_type="ca", cutoff=-1, theme="auto", log=True)

- Show the Cα-Cα distances for the entire database:

In [18]:
pdb.plot_distances(theme="auto", log=True, distance_type="ca")

As the plot shows, the unfiltered database contains a number of disulfides whose Cα-Cα distance exceeds the maximum possible value (~8 Å).
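
A distance cutoff is one way to exclude such entries. The sketch below illustrates the idea with plain Python and made-up coordinates; it is not proteusPy code. The helper names are hypothetical, and the convention that a negative cutoff disables filtering is an assumption based on the `cutoff=-1` call used to build the unfiltered loader above.

```python
import math


def ca_distance(ca1, ca2):
    """Euclidean distance between two Cα positions given as (x, y, z) in Å."""
    return math.dist(ca1, ca2)


def within_cutoff(distance, cutoff):
    """Assumed convention: a negative cutoff means 'no filtering'."""
    return cutoff < 0 or distance <= cutoff


# Made-up coordinates for illustration only, not taken from any structure.
pairs = {
    "plausible": ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),    # 5 Å apart
    "too_far": ((0.0, 0.0, 0.0), (0.0, 0.0, 12.0)),     # 12 Å apart
}

filtered = [name for name, (a, b) in pairs.items() if within_cutoff(ca_distance(a, b), 8.0)]
unfiltered = [name for name, (a, b) in pairs.items() if within_cutoff(ca_distance(a, b), -1)]

print(filtered)
print(unfiltered)
```

With an 8 Å cutoff only the first pair survives, while a cutoff of -1 keeps both.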

We can also display the torsion statistics in several ways:

- Statistics for a slice of disulfides:

In [19]:
pdb[:10].display_torsion_statistics()

- Statistics for a class:
  - To list the binary classIDs use:

In [20]:
pdb.get_class_df(base=2)

| Unnamed: 0 | class_id | count | incidence | percentage |
|---|---|---|---|---|
| 0 | 0 | 40943 | 0.23359 | 23.359026 |
| 1 | 2 | 9391 | 0.053578 | 5.357805 |
| 2 | 20 | 4844 | 0.027636 | 2.763626 |
| 3 | 22 | 2426 | 0.013841 | 1.384095 |
| 4 | 200 | 16146 | 0.092117 | 9.211705 |
| 5 | 202 | 1396 | 0.007965 | 0.796454 |
| 6 | 220 | 7238 | 0.041295 | 4.129464 |
| 7 | 222 | 6658 | 0.037986 | 3.798559 |
| 8 | 2000 | 7104 | 0.04053 | 4.053013 |
| 9 | 2002 | 8044 | 0.045893 | 4.589307 |


Since there are only 32 binary classes, the dataframe above gives the overall distribution of Disulfides across all of them. The 8-fold (*octant*) class dataframe is *much* larger (9697 members) and can be shown with:

```python
pdb.get_class_df(base=8)
```
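
The `incidence` and `percentage` columns in the binary class dataframe are consistent with each class count divided by the 175277 disulfides loaded. A minimal check in plain Python, using three rows copied from the output above (the helper name `incidence_rows` is hypothetical, and this is not how proteusPy necessarily computes the columns):

```python
# Counts copied from the binary class dataframe shown earlier in this notebook.
counts = {"0": 40943, "2": 9391, "20": 4844}

# "Disulfides loaded" from the loader summary.
total_disulfides = 175277


def incidence_rows(class_counts, total):
    """Return (class_id, count, incidence, percentage) tuples."""
    rows = []
    for class_id, count in class_counts.items():
        incidence = count / total
        rows.append((class_id, count, incidence, 100.0 * incidence))
    return rows


rows = incidence_rows(counts, total_disulfides)
for class_id, count, incidence, percentage in rows:
    print(f"{class_id:>5} {count:>6} {incidence:.6f} {percentage:.6f}")
```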


Let's look at one of the binary classes. Note that creating long lists of Disulfides takes time; the next cell takes over 12 seconds on my M3 Max MacBook Pro:

In [21]:
pdb["02202b"].display_torsion_statistics(save=False, theme="auto")

  0%|          | 0/1021 [00:00<?, ?it/s]

The above shows quite large deviations for the dihedral angles. This suggests that the class is very broad in structural diversity. This is to be expected with a coarse structural filter, and was the driving reason to develop the *octant* dihedral angle quantization method.
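
To make the difference between the two quantizations concrete, here is a small sketch that turns five dihedral angles into a class string. It is an illustration under stated assumptions, not proteusPy's implementation: the binary digit is assumed to record only the sign of each angle ('0' for negative, '2' for non-negative, matching the digits seen in the binary class IDs above), and the octant digit is assumed to number eight 45° sectors from 1 to 8 starting at 0°. The library's actual sector numbering may differ.

```python
def binary_digit(angle_deg):
    """Assumed convention: '0' for a negative angle, '2' otherwise."""
    return "2" if angle_deg >= 0 else "0"


def octant_digit(angle_deg):
    """Assumed convention: 45-degree sectors numbered 1..8 starting at 0 degrees."""
    sector = int((angle_deg % 360.0) // 45.0) % 8  # % 8 guards float edge cases
    return str(sector + 1)


def class_string(torsions, base=8):
    """Build a five-character class string from five dihedral angles in degrees."""
    if len(torsions) != 5:
        raise ValueError("expected exactly five dihedral angles")
    if base == 2:
        digit = binary_digit
    elif base == 8:
        digit = octant_digit
    else:
        raise ValueError("base must be 2 or 8")
    return "".join(digit(angle) for angle in torsions)


# Two made-up conformations that share a binary class but not an octant class.
conformation_a = (-60.0, -60.0, -90.0, -60.0, -60.0)
conformation_b = (-150.0, -60.0, -90.0, -60.0, -20.0)

print(class_string(conformation_a, base=2), class_string(conformation_b, base=2))
print(class_string(conformation_a, base=8), class_string(conformation_b, base=8))
print(2**5, 8**5)  # number of possible binary and octant class strings
```

Because a sign-only digit lumps every negative angle together, both conformations land in the same binary class even though their angles differ by up to 90°; the finer sectors separate them.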

In [22]:
pdb["11212"].display_torsion_statistics(save=False, theme="auto")

  0%|          | 0/5 [00:00<?, ?it/s]

In [23]:
pdb["6dmb"].display_torsion_statistics(theme="auto")

Finally, we can readily display either individual Disulfides or lists of them as follows:

In [24]:
best_ss = pdb["2q7q_75D_140D"]
worst_ss = pdb["6vxk_801B_806B"]
duo = DisulfideList([best_ss, worst_ss], "bestworst")

In [25]:
best_ss.display(style="sb")


We can display the list as multiple panels:

In [26]:
duo.display(style="sb", light="dark")


Or we can display them overlaid onto a common coordinate frame:

In [27]:
duo.display_overlay()


## Class Plotting

In [28]:
# most prevalent
LHSpiral_neg = "00000"
RHSpiral_neg = "02220"

# Allosteric
RHStaple_neg = "00200"

In [29]:
pdb.plot_classes_vs_cutoff(0.04, steps=50, base=8, theme="auto", verbose=False)

In [30]:
pdb.plot_binary_to_eightclass_incidence(verbose=True, save=False, theme="auto")

proteusPy: INFO 2025-02-15 17:41:09,910 - proteusPy.DisulfideVisualization.plot_binary_to_eightclass_incidence - Graph generation complete.


In [31]:
pdb.plot_count_vs_class_df(
    RHStaple_neg, title="RHStaple_neg (Allosteric)", theme="auto", log=False
)

In [33]:
pdb.plot_count_vs_class_df(
    LHSpiral_neg,
    title="LHSpiral_neg (Most Common)",
    theme="auto",
    log=False,
)

In [34]:
pdb.plot_count_vs_class_df_paginated(LHSpiral_neg, title="LHSpiral_neg", theme="auto")