The module represents atomic positions and atomic numbers as NumPy arrays. Structures can be read with read_xyz, which returns an ensemble object that mimics cclib and exposes the "atomcoords" and "atomnos" properties. Here, atomcoords has shape (number of structures, number of atoms, 3).

In [1]:
from prism_pruner.utils import read_xyz

ensemble = read_xyz("ensemble.xyz")
ensemble.atomcoords.shape

(1086, 136, 3)

The three main pruning functions are described in turn. The first, and fastest, is based on principal moments of inertia (MOI): two structures are considered similar when all three principal MOI differ by less than a relative threshold (max_deviation).

In [None]:
from prism_pruner.pruner import prune_by_moment_of_inertia

pruned, mask = prune_by_moment_of_inertia(
    ensemble.atomcoords,
    ensemble.atomnos,
    max_deviation=0.01,  # 1% difference
    debugfunction=print,
)

pruned.shape

DEBUG: Pruner(moi) - k=50, rejected 428 (keeping 658/1086)
DEBUG: Pruner(moi) - k=20, rejected 116 (keeping 542/1086)
DEBUG: Pruner(moi) - k=10, rejected 37 (keeping 505/1086)
DEBUG: Pruner(moi) - k=5, rejected 34 (keeping 471/1086)
DEBUG: Pruner(moi) - k=2, rejected 49 (keeping 422/1086)
DEBUG: Pruner(moi) - k=1, rejected 15 (keeping 407/1086)
DEBUG: prune_by_moment_of_inertia - Used cached data 105197/207415 times, 50.72% of total calls


(407, 136, 3)
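To make the MOI criterion concrete, here is a minimal NumPy-only sketch of the comparison between two structures. It is not the prism_pruner implementation: the helper names (principal_moments, moi_similar), the use of explicit masses, and the choice of the larger moment as the denominator of the relative difference are all assumptions made for illustration. It also assumes a nonlinear molecule, so that no principal moment is zero.

```python
import numpy as np


def principal_moments(coords, masses):
    """Return the three principal moments of inertia, in ascending order."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    # Express positions relative to the center of mass
    r = coords - np.average(coords, axis=0, weights=masses)
    # Inertia tensor: sum_i m_i * (|r_i|^2 * identity - outer(r_i, r_i))
    r_squared = np.einsum("ij,ij->i", r, r)
    inertia = np.sum(masses * r_squared) * np.eye(3) - np.einsum(
        "i,ij,ik->jk", masses, r, r
    )
    # The tensor is symmetric, so eigvalsh applies and returns sorted eigenvalues
    return np.linalg.eigvalsh(inertia)


def moi_similar(coords_a, coords_b, masses, max_deviation=0.01):
    """True if all three principal MOI agree within a relative threshold.

    Assumption: the relative difference is taken with respect to the larger
    of the two moments; the library may define it differently.
    """
    moi_a = principal_moments(coords_a, masses)
    moi_b = principal_moments(coords_b, masses)
    relative = np.abs(moi_a - moi_b) / np.maximum(moi_a, moi_b)
    return bool(np.all(relative < max_deviation))


# Hypothetical test data: a water-like geometry (coordinates in Å)
water = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
masses = np.array([15.999, 1.008, 1.008])

# A rigid rotation plus translation leaves the principal moments unchanged
angle = np.deg2rad(40.0)
rotation = np.array(
    [
        [np.cos(angle), -np.sin(angle), 0.0],
        [np.sin(angle), np.cos(angle), 0.0],
        [0.0, 0.0, 1.0],
    ]
)
rotated = water @ rotation.T + 1.5

# Stretching one O-H bond changes the moments well beyond 1%
stretched = water.copy()
stretched[1, 0] = 1.44

print(moi_similar(water, rotated, masses))
print(moi_similar(water, stretched, masses))
```

Because the principal moments are invariant to rotation and translation, this test needs no alignment step, which is consistent with it being the fastest of the three modes.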

Second, a classic heavy-atom (non-H) RMSD. Two structures are considered similar when, after optimal alignment, both the RMSD and the maximum atomic deviation fall below the specified thresholds. If unspecified, the maximum deviation threshold (max_dev) defaults to 2*max_rmsd.

In [None]:
from prism_pruner.pruner import prune_by_rmsd

pruned, mask = prune_by_rmsd(
    ensemble.atomcoords,
    ensemble.atomnos,
    max_rmsd=1.0,  # Will reject below 1 Å RMSD and 2 Å max deviation (default max_dev = 2*max_rmsd)
    debugfunction=print,
)

pruned.shape

DEBUG: Pruner(rmsd) - k=50, rejected 294 (keeping 792/1086)
DEBUG: Pruner(rmsd) - k=20, rejected 120 (keeping 672/1086)
DEBUG: Pruner(rmsd) - k=10, rejected 41 (keeping 631/1086)
DEBUG: Pruner(rmsd) - k=5, rejected 52 (keeping 579/1086)
DEBUG: Pruner(rmsd) - k=2, rejected 38 (keeping 541/1086)
DEBUG: Pruner(rmsd) - k=1, rejected 10 (keeping 531/1086)
DEBUG: prune_by_rmsd - Used cached data 159042/320757 times, 49.58% of total calls


(531, 136, 3)
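The two-threshold RMSD criterion can be sketched for a single pair of structures as follows. This is an illustration only, not the library's code: the helper names (kabsch_align, rmsd_similar) are hypothetical, and the Kabsch algorithm is assumed here as the "optimal alignment", which the text above does not specify.

```python
import numpy as np


def kabsch_align(mobile, target):
    """Return `mobile` rigidly superimposed onto `target` (least-squares fit)."""
    mobile_centered = mobile - mobile.mean(axis=0)
    target_centered = target - target.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation
    u, _, vt = np.linalg.svd(mobile_centered.T @ target_centered)
    # Flip the last axis if needed, so the result is a proper rotation
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mobile_centered @ rotation + target.mean(axis=0)


def rmsd_similar(coords_a, coords_b, atomnos, max_rmsd=1.0, max_dev=None):
    """True if heavy-atom RMSD and maximum deviation are both under threshold."""
    if max_dev is None:
        max_dev = 2 * max_rmsd  # default stated in the text above
    heavy = np.asarray(atomnos) != 1  # ignore hydrogens
    a = np.asarray(coords_a, dtype=float)[heavy]
    b = np.asarray(coords_b, dtype=float)[heavy]
    deviations = np.linalg.norm(kabsch_align(a, b) - b, axis=1)
    rmsd = np.sqrt(np.mean(deviations**2))
    return bool(rmsd < max_rmsd and deviations.max() < max_dev)


# Hypothetical test data: four heavy atoms and one hydrogen (coordinates in Å)
atomnos = np.array([6, 6, 8, 7, 1])
coords = np.array(
    [
        [0.0, 0.0, 0.0],
        [1.5, 0.0, 0.0],
        [0.0, 1.5, 0.0],
        [0.0, 0.0, 1.5],
        [-0.6, -0.6, -0.6],
    ]
)

# Same structure, rotated by 90 degrees about z and translated
quarter_turn = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = coords @ quarter_turn.T + np.array([1.0, 2.0, 3.0])

# One heavy atom displaced by 8 Å
distorted = coords.copy()
distorted[3] = [0.0, 0.0, 9.5]

# Only the hydrogen displaced: invisible to a heavy-atom comparison
h_moved = coords.copy()
h_moved[4] += 5.0

print(rmsd_similar(coords, moved, atomnos))
print(rmsd_similar(coords, distorted, atomnos))
print(rmsd_similar(coords, h_moved, atomnos))
```

The maximum-deviation check complements the RMSD: a single large local displacement can be averaged out in the RMSD of a large molecule, but it still shows up as one large per-atom deviation.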

Lastly, a second RMSD-based implementation that applies rotational corrections, compensating for the artificially high RMSD values otherwise obtained for rotameric structures.

In [None]:
from prism_pruner.graph_manipulations import graphize
from prism_pruner.pruner import prune_by_rmsd_rot_corr

graph = graphize(ensemble.atomnos, ensemble.atomcoords[0])

pruned, mask = prune_by_rmsd_rot_corr(
    ensemble.atomcoords,
    ensemble.atomnos,
    graph,
    max_rmsd=1.0,
    logfunction=print,
    debugfunction=print,
)

pruned.shape


 >> Dihedrals considered for rotamer corrections:
 1  - [8, 10, 11, 12]       : CCCC : 2-fold
 2  - [116, 70, 14, 13]     : CCCC : 2-fold
 3  - [25, 26, 28, 29]      : CCCC : 3-fold
 4  - [19, 73, 111, 113]    : CCCF : 3-fold


DEBUG: Pruner(rmsd_rot_corr) - k=50, rejected 545 (keeping 541/1086)
DEBUG: Pruner(rmsd_rot_corr) - k=20, rejected 118 (keeping 423/1086)
DEBUG: Pruner(rmsd_rot_corr) - k=10, rejected 59 (keeping 364/1086)
DEBUG: Pruner(rmsd_rot_corr) - k=5, rejected 63 (keeping 301/1086)
DEBUG: Pruner(rmsd_rot_corr) - k=2, rejected 63 (keeping 238/1086)
DEBUG: Pruner(rmsd_rot_corr) - k=1, rejected 18 (keeping 220/1086)
DEBUG: prune_by_rmsd_rot_corr - Used cached data 51207/94488 times, 54.19% of total calls


(220, 136, 3)
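The idea behind a rotamer correction can be illustrated with a small NumPy sketch: for an n-fold dihedral, rotate the group on one side of the bond in steps of 360/n degrees and keep the smallest RMSD. This is a simplified illustration under stated assumptions, not what prune_by_rmsd_rot_corr does internally: the helper names are hypothetical, only one dihedral is treated, and the two structures are assumed to be already aligned, so no superposition step is included.

```python
import numpy as np


def rotate_about_axis(points, origin, axis, angle):
    """Rotate `points` by `angle` (radians) about the line through `origin`
    along `axis`, using Rodrigues' rotation formula."""
    k = axis / np.linalg.norm(axis)
    v = points - origin
    cos, sin = np.cos(angle), np.sin(angle)
    rotated = v * cos + np.cross(k, v) * sin + np.outer(v @ k, k) * (1.0 - cos)
    return rotated + origin


def plain_rmsd(coords_a, coords_b):
    """RMSD without any alignment (structures assumed pre-aligned)."""
    return np.sqrt(np.mean(np.sum((coords_a - coords_b) ** 2, axis=1)))


def min_rmsd_over_rotamers(coords_a, coords_b, bond, moving, n_fold):
    """Smallest RMSD obtained by rotating the `moving` atoms of `coords_a`
    about `bond` in steps of 360/n_fold degrees."""
    i, j = bond
    best = np.inf
    for step in range(n_fold):
        trial = coords_a.copy()
        trial[moving] = rotate_about_axis(
            coords_a[moving],
            coords_a[i],
            coords_a[j] - coords_a[i],
            2.0 * np.pi * step / n_fold,
        )
        best = min(best, plain_rmsd(trial, coords_b))
    return best


# Hypothetical test data: a bond along z (atoms 0 and 1) carrying a
# three-fold symmetric group (atoms 2, 3, 4), methyl-like
h = np.sqrt(3.0) / 2.0
coords_a = np.array(
    [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 1.5],
        [1.0, 0.0, 2.0],
        [-0.5, h, 2.0],
        [-0.5, -h, 2.0],
    ]
)
# Same geometry with the group rotated by 120 degrees: chemically equivalent,
# but atoms 2, 3 and 4 have swapped places
coords_b = coords_a.copy()
coords_b[2:] = [[-0.5, h, 2.0], [-0.5, -h, 2.0], [1.0, 0.0, 2.0]]

uncorrected = plain_rmsd(coords_a, coords_b)
corrected = min_rmsd_over_rotamers(
    coords_a, coords_b, bond=(0, 1), moving=[2, 3, 4], n_fold=3
)
print(uncorrected, corrected)
```

In this toy case the uncorrected RMSD is well above the 1 Å threshold used earlier, while the corrected value is essentially zero, which is why a rotamer-aware comparison rejects more structures than the plain RMSD mode.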