
# DisulfideLoader.ipynb
This notebook demonstrates the ways to access disulfide bonds using the proteusPy
package.

Author: Eric G. Suchanek, PhD.

Last Modification: 2025-02-14 14:51:45

In [1]:
import proteusPy as pp
from proteusPy import DisulfideList

pp.configure_master_logger("loadertest.log")
_logger = pp.create_logger(__name__, "WARNING")
# pp.set_logger_level("loadertest", "WARNING")

Create an unfiltered ``DisulfideLoader`` with no distance constraints:

In [3]:
pdb = pp.DisulfideLoader(verbose=True, subset=False, cutoff=-1, sg_cutoff=-1)

proteusPy: INFO 2025-02-15 14:59:16,718 - proteusPy.DisulfideLoader.__post_init__ - Reading disulfides from: /Users/egs/miniforge3/envs/ppydev/lib/python3.12/site-packages/proteusPy/data/PDB_all_ss.pkl... 
proteusPy: INFO 2025-02-15 14:59:22,587 - proteusPy.DisulfideLoader.__post_init__ - Filtering with Cα cutoff -1.00: old: 175277, new: 175277
proteusPy: INFO 2025-02-15 14:59:22,589 - proteusPy.DisulfideLoader.__post_init__ - Filtering Sγ: cutoff -1.00: old: 175277, new: 175277
proteusPy: INFO 2025-02-15 14:59:44,299 - proteusPy.DisulfideClassManager.__init__ - Loading binary consensus structure list from SS_consensus_class_32.pkl
proteusPy: INFO 2025-02-15 14:59:44,302 - proteusPy.DisulfideClassManager.__init__ - Loading octant consensus structure list from SS_consensus_class_oct.pkl
proteusPy: INFO 2025-02-15 14:59:44,318 - proteusPy.DisulfideClassManager.build_classes - Creating binary SS classes...
proteusPy: INFO 2025-02-15 14:59:46,886 - proteusPy.DisulfideClassManager.build_cla

PDB IDs present:                 36968
Disulfides loaded:               175277
Average structure resolution:    2.19 Å
Lowest Energy Disulfide:         2q7q_75D_140D
Highest Energy Disulfide:        6vxk_801B_806B
Cα distance cutoff:              -1.00 Å
Sγ distance cutoff:              -1.00 Å
               ===== proteusPy: 0.99.2.dev1 =====
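
The log output above reports the same count before and after each filter (175277), which suggests that a negative cutoff disables that filter. A minimal sketch of this behavior, assuming a hypothetical helper `filter_by_cutoff` that works on plain distance values (this is not proteusPy's implementation, and the distances are made up):

```python
def filter_by_cutoff(distances, cutoff=-1.0):
    """Keep distances at or below `cutoff`; a negative cutoff disables filtering.

    Hypothetical helper for illustration only, not part of proteusPy.
    """
    if cutoff < 0:
        return list(distances)
    return [d for d in distances if d <= cutoff]


# Made-up Cα-Cα distances in Å, for illustration only.
ca_distances = [4.5, 5.8, 6.3, 9.1, 12.4]

unfiltered = filter_by_cutoff(ca_distances, cutoff=-1.0)  # nothing removed
filtered = filter_by_cutoff(ca_distances, cutoff=8.0)     # drops values above 8 Å
print(len(unfiltered), len(filtered))
```

Whether the real filter uses an inclusive or exclusive comparison is not shown in the log; the `<=` here is an assumption.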


Access to the database is through the ``DisulfideLoader`` object. We can retrieve disulfides by PDB ID, by integer index or slice, by disulfide name, and by disulfide class. Each of these is illustrated below.

- Access by PDB ID:

In [4]:
pdb["6dmb"]

DisulfideList([<Disulfide 6dmb_203A_226A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 6dmb_234A_327A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 6dmb_296A_304A, Source: 6dmb, Resolution: 3.0 Å>])

- Access by numerical index:

In [5]:
pdb[0]

<Disulfide 6dmb_203A_226A, Source: 6dmb, Resolution: 3.0 Å>

- Access by slice:

In [6]:
pdb[:5]

DisulfideList([<Disulfide 6dmb_203A_226A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 6dmb_234A_327A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 6dmb_296A_304A, Source: 6dmb, Resolution: 3.0 Å>,
               <Disulfide 3rik_4A_16A, Source: 3rik, Resolution: 2.0 Å>,
               <Disulfide 3rik_18A_23A, Source: 3rik, Resolution: 2.0 Å>])

- Access by Disulfide Name:

In [7]:
pdb["6dmb_203A_226A"]

<Disulfide 6dmb_203A_226A, Source: 6dmb, Resolution: 3.0 Å>

- Access by Class Identifier (specify the base explicitly with an 'o' or 'b' suffix). If no suffix is included, then base=8 is assumed. To list the classes and their number of elements, use the ``DisulfideLoader.print_classes()`` method with the appropriate base:

In [None]:
pdb["11212o"]

- Access by Class Identifier without a suffix. The octant class is assumed:

In [None]:
pdb["11212"]
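
The suffix rule described above can be sketched as a small parser. `parse_class_id` is a hypothetical helper, not a proteusPy function; it only mirrors the rule stated in the text ('b' selects base 2, 'o' selects base 8, and no suffix defaults to base 8):

```python
def parse_class_id(class_id: str) -> tuple[str, int]:
    """Split a class identifier into (digits, base).

    Hypothetical helper for illustration only, not part of proteusPy.
    """
    suffixes = {"b": 2, "o": 8}
    last = class_id[-1:].lower()  # slice, so an empty string does not raise
    if last in suffixes:
        return class_id[:-1], suffixes[last]
    # No suffix: the octant class (base 8) is assumed.
    return class_id, 8


print(parse_class_id("11212o"))
print(parse_class_id("11212"))
print(parse_class_id("02202b"))
```

Under this rule `"11212o"` and `"11212"` resolve to the same digits and base, which matches the two cells above requesting the same octant class.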

It's easy to display statistical information about the returned lists as follows:

- Show bond length and bond angle deviations for the first 1000 Disulfides:

In [None]:
subset = pdb[:1000]
_ = subset.plot_deviation_histograms()

- Show the Cα-Cα distances for the first 1000 Disulfides:

In [None]:
subset.plot_distances(distance_type="ca", cutoff=-1, theme="auto", log=True)

- Show the Cα-Cα distances for the entire database:

In [None]:
pdb.plot_distances(theme="auto", log=True, distance_type="ca")

As you can see, the unfiltered database contains a number of disulfides whose Cα-Cα distance exceeds the maximum possible distance (~8 Å).
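
To make the distance check concrete, here is a minimal sketch that computes a Cα-Cα distance from two 3D coordinates and compares it with the approximate 8 Å limit quoted above. The coordinates and the helper `ca_distance` are hypothetical and are not taken from the database or from proteusPy:

```python
import math

MAX_CA_DISTANCE = 8.0  # Å, approximate upper bound quoted in the text


def ca_distance(ca1, ca2):
    """Euclidean distance between two Cα positions given as (x, y, z) in Å."""
    return math.dist(ca1, ca2)


# Made-up coordinates, for illustration only.
ca_a = (0.0, 0.0, 0.0)
ca_b = (3.0, 4.0, 0.0)
ca_c = (6.0, 8.0, 0.0)

for label, other in (("a-b", ca_b), ("a-c", ca_c)):
    d = ca_distance(ca_a, other)
    status = "exceeds" if d > MAX_CA_DISTANCE else "is within"
    print(f"{label}: {d:.1f} Å {status} the ~8 Å limit")
```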

We can also display the torsion statistics in several ways:

- Statistics for a slice of disulfides:

In [None]:
pdb[:10].display_torsion_statistics()

- Statistics for a class:
  - To list the binary classIDs use:

In [None]:
pdb.get_class_df(base=2)

Since there are only 32 binary classes, the above shows the overall distribution of Disulfides across ALL binary classes. The 8-fold (*octant*) class dataframe is *much* larger (9697 members) and can be shown with:

```python
    pdb.get_class_df(base=8)
```


  Let's look at one of the binary classes. Note that creating long lists of Disulfides takes time: the next cell takes over 12 seconds on my M3 Max MacBook Pro.

In [None]:
pdb["02202b"].display_torsion_statistics(save=False, theme="auto")

The above shows quite large deviations for the dihedral angles, which suggests that the class is structurally very diverse. This is to be expected with a coarse structural filter, and was the driving reason to develop the *octant* dihedral angle quantization method.

In [None]:
pdb["11212"].display_torsion_statistics(save=False, theme="auto")
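
The difference between the coarse binary filter and the octant quantization can be sketched by cutting the 360° circle into `base` equal segments and recording which segment each of the five dihedral angles falls into. This is an illustrative sketch only: the helpers `quantize_angle` and `class_string` are hypothetical, the dihedral values are made up, and the segment numbering used here (1-based, counted from 0°) is an assumption that need not match the digits proteusPy prints in identifiers such as `02202b`.

```python
def quantize_angle(angle_deg: float, base: int) -> int:
    """Return the 0-based index of the segment containing `angle_deg`
    when the circle is divided into `base` equal segments."""
    segment = 360.0 / base
    # The final modulo guards against a float result of exactly 360.0.
    return int((angle_deg % 360.0) // segment) % base


def class_string(dihedrals, base: int) -> str:
    """Concatenate 1-based segment numbers for each dihedral angle."""
    return "".join(str(quantize_angle(a, base) + 1) for a in dihedrals)


# Two made-up sets of five dihedral angles, in degrees.
ss_a = [-60.0, -60.0, -85.0, -60.0, -60.0]
ss_b = [-80.0, -50.0, -100.0, -50.0, -80.0]

# Base 2 only distinguishes the two halves of the circle,
# so both examples land in the same class.
print(class_string(ss_a, 2), class_string(ss_b, 2))

# Base 8 uses 45-degree segments and separates them.
print(class_string(ss_a, 8), class_string(ss_b, 8))

# Number of possible classes for five angles at each resolution.
print(2**5, 8**5)
```

With five angles there are 2^5 = 32 possible binary classes but 8^5 = 32768 possible octant classes, so each octant class covers a much narrower range of conformations. The 9697 octant classes reported earlier are the subset that actually occurs in the database.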

- Statistics for a specific PDB ID:

In [None]:
pdb["6dmb"].display_torsion_statistics(theme="auto")

Finally, we can readily display either individual Disulfides or lists of them as follows:

In [None]:
best_ss = pdb["2q7q_75D_140D"]
worst_ss = pdb["6vxk_801B_806B"]
duo = DisulfideList([best_ss, worst_ss], "bestworst")

In [None]:
best_ss.display(style="sb")

We can display the list as multiple panels:

In [None]:
duo.display(style="sb", light="dark")

Or we can display them overlaid onto a common coordinate frame:

In [None]:
duo.display_overlay()