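The module represents atomic positions and atomic numbers as NumPy arrays. Structures can be read with ConformerEnsemble.from_xyz. The ensemble object mimics cclib and exposes the "atomcoords" and "atomnos" properties; the examples below use its coords and atoms attributes, where coords has shape (n_conformers, n_atoms, 3).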

In [1]:
from prism_pruner.conformer_ensemble import ConformerEnsemble

ensemble = ConformerEnsemble.from_xyz("ensemble.xyz")
ensemble.coords.shape

(1086, 136, 3)
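
To make the array layout concrete, here is a minimal sketch of how a multi-frame XYZ file maps onto an array of shape (n_conformers, n_atoms, 3). It is not the library's reader: read_xyz_frames is a hypothetical helper written for this illustration, and it assumes plain XYZ frames (atom count, comment line, then one "symbol x y z" line per atom) with the same atom order in every frame and no blank lines between frames.

```python
import io

import numpy as np


def read_xyz_frames(handle):
    """Parse concatenated XYZ frames into (symbols, coords).

    Hypothetical helper for illustration, not part of prism_pruner.
    """
    symbols, frames = [], []
    while True:
        header = handle.readline()
        if not header.strip():  # end of file (or a blank line) stops parsing
            break
        n_atoms = int(header)
        handle.readline()  # skip the comment line
        rows = [handle.readline().split() for _ in range(n_atoms)]
        symbols = [row[0] for row in rows]  # assumed identical in every frame
        frames.append([[float(x) for x in row[1:4]] for row in rows])
    return symbols, np.array(frames)


# Two toy water "conformers" stored back to back in one XYZ stream
two_waters = io.StringIO(
    "3\nframe 1\nO 0.00 0.00 0.00\nH 0.96 0.00 0.00\nH -0.24 0.93 0.00\n"
    "3\nframe 2\nO 0.00 0.00 0.00\nH 0.96 0.00 0.00\nH -0.24 -0.93 0.00\n"
)
symbols, coords = read_xyz_frames(two_waters)
print(symbols, coords.shape)
```

The first axis indexes conformers, so coords[0] is the geometry of the first structure, which is how the pruning examples slice the ensemble.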

There are three main pruning functions. The first, and the fastest, is based on the principal moments of inertia (MOI): two structures are considered similar when all three principal MOI differ by less than a relative threshold:

In [2]:
from prism_pruner.pruner import prune_by_moment_of_inertia

In [4]:
pruned, mask = prune_by_moment_of_inertia(
    ensemble.coords,
    ensemble.atoms,
    max_deviation=0.01,  # 1% difference
    debugfunction=print,
)

pruned.shape

DEBUG: MOIPrunerConfig - k=50, rejected 449 (keeping 637/1086), in 0.1 s
DEBUG: MOIPrunerConfig - k=20, rejected 109 (keeping 528/1086), in 0.0 s
DEBUG: MOIPrunerConfig - k=10, rejected 27 (keeping 501/1086), in 0.0 s
DEBUG: MOIPrunerConfig - k=5, rejected 28 (keeping 473/1086), in 0.1 s
DEBUG: MOIPrunerConfig - k=2, rejected 38 (keeping 435/1086), in 0.2 s
DEBUG: MOIPrunerConfig - k=1, rejected 10 (keeping 425/1086), in 0.3 s
DEBUG: MOIPrunerConfig - keeping 425/1086 (0.8 s)
DEBUG: MOIPrunerConfig - Used cached data 105595/211707 times, 49.88% of total calls


(425, 136, 3)
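
The moments-of-inertia criterion described above can be sketched in plain NumPy. This is an illustration of the idea, not the library's implementation: principal_moments and similar_by_moi are hypothetical names, the relative deviation is measured against the larger of the two moments (an assumption), and the masses are passed in explicitly.

```python
import numpy as np


def principal_moments(coords, masses):
    """Return the three principal moments of inertia in ascending order."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    # Work in the center-of-mass frame
    r = coords - np.average(coords, axis=0, weights=masses)
    # Inertia tensor: I = sum_i m_i * (|r_i|^2 * identity - r_i r_i^T)
    total = np.sum(masses * np.einsum("ij,ij->i", r, r))
    tensor = total * np.eye(3) - np.einsum("i,ij,ik->jk", masses, r, r)
    return np.linalg.eigvalsh(tensor)


def similar_by_moi(coords_a, coords_b, masses, max_deviation=0.01):
    """True if all three principal MOI agree within a relative threshold.

    The choice of denominator is an assumption made for this sketch.
    """
    moi_a = principal_moments(coords_a, masses)
    moi_b = principal_moments(coords_b, masses)
    scale = np.maximum(np.maximum(moi_a, moi_b), 1e-12)  # avoid dividing by zero
    return bool(np.all(np.abs(moi_a - moi_b) / scale < max_deviation))


# Toy water molecule: a rotated and shifted copy keeps the same moments,
# while a uniformly stretched copy does not.
masses = np.array([15.999, 1.008, 1.008])
water = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
angle = np.deg2rad(40.0)
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = water @ rot_z.T + np.array([1.0, -2.0, 0.5])
stretched = water * 1.1

print(similar_by_moi(water, moved, masses))
print(similar_by_moi(water, stretched, masses))
```

Because the principal moments do not change under rotation or translation, no alignment step is needed, which is consistent with this mode being the fastest of the three.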

The second is a classic heavy-atom (non-H) RMSD. Two structures are considered similar when both the RMSD and the maximum atomic deviation (after optimal alignment) fall below the specified thresholds. If unspecified, the maximum deviation threshold (max_dev) defaults to 2*max_rmsd.

In [4]:
from prism_pruner.pruner import prune_by_rmsd

pruned, mask = prune_by_rmsd(
    ensemble.coords,
    ensemble.atoms,
    max_rmsd=1.0,  # rejects below 1 Å RMSD and, by default, 2 Å max deviation
    debugfunction=print,
)

pruned.shape

DEBUG: RMSDPrunerConfig - k=50, rejected 389 (keeping 697/1086), in 1.7 s
DEBUG: RMSDPrunerConfig - k=20, rejected 115 (keeping 582/1086), in 1.2 s
DEBUG: RMSDPrunerConfig - k=10, rejected 48 (keeping 534/1086), in 1.6 s
DEBUG: RMSDPrunerConfig - k=5, rejected 66 (keeping 468/1086), in 2.9 s
DEBUG: RMSDPrunerConfig - k=2, rejected 63 (keeping 405/1086), in 5.9 s
DEBUG: RMSDPrunerConfig - k=1, rejected 32 (keeping 373/1086), in 5.0 s
DEBUG: RMSDPrunerConfig - keeping 373/1086 (18.3 s)
DEBUG: RMSDPrunerConfig - Used cached data 107904/207984 times, 51.88% of total calls


(373, 136, 3)
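
The two-threshold RMSD test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: kabsch_superimpose and similar_by_rmsd are hypothetical names, the optimal alignment is done with the standard Kabsch algorithm, and hydrogens are identified by atomic number 1.

```python
import numpy as np


def kabsch_superimpose(mobile, reference):
    """Center both point sets and rotate `mobile` onto `reference`."""
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return p @ rotation, q


def similar_by_rmsd(coords_a, coords_b, atomnos, max_rmsd, max_dev=None):
    """True if heavy-atom RMSD and max deviation are both under threshold."""
    if max_dev is None:
        max_dev = 2 * max_rmsd  # default stated in the text above
    heavy = np.asarray(atomnos) != 1  # drop hydrogens
    a, b = kabsch_superimpose(coords_a[heavy], coords_b[heavy])
    deviations = np.linalg.norm(a - b, axis=1)
    rmsd = np.sqrt(np.mean(deviations**2))
    return bool(rmsd < max_rmsd and deviations.max() < max_dev)


# Toy fragment: five heavy atoms plus one hydrogen (last row)
atomnos = np.array([6, 6, 6, 8, 7, 1])
ref = np.array([
    [0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0],
    [0.0, 0.0, 1.5], [1.5, 1.5, 0.0], [2.4, 1.5, 0.5],
])
angle = np.deg2rad(75.0)
rot_z = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = ref @ rot_z.T + np.array([3.0, -1.0, 2.0])
moved[5] += 3.0  # displaced hydrogen is ignored by the heavy-atom mask
expanded = ref * 2.0  # uniformly expanded copy, cannot be fixed by alignment

print(similar_by_rmsd(ref, moved, atomnos, max_rmsd=1.0))
print(similar_by_rmsd(ref, expanded, atomnos, max_rmsd=1.0))
```

The alignment step has to be repeated for every pair compared, which fits the longer run time of this mode relative to the MOI-based one in the logs above.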

The third is another RMSD-based implementation, this time rotationally corrected to compensate for the artificially high RMSD values obtained for rotameric structures. It additionally requires a molecular graph, built here with graphize.

In [2]:
from prism_pruner.graph_manipulations import graphize
from prism_pruner.pruner import prune_by_rmsd_rot_corr

graph = graphize(ensemble.atoms, ensemble.coords[0])

pruned, mask = prune_by_rmsd_rot_corr(
    ensemble.coords[0:100],
    ensemble.atoms,
    graph,
    max_rmsd=1.0,
    logfunction=print,
    debugfunction=print,
)

pruned.shape


 >> Dihedrals considered for rotamer corrections:
 1  - [ 8 10 11 12]         : CCCC : 2-fold
 2  - [116  70  14  13]     : CCCC : 2-fold
 3  - [25 26 28 29]         : CCCC : 3-fold
 4  - [ 19  73 111 113]     : CCCF : 3-fold


DEBUG: RMSDRotCorrPrunerConfig - k=2, rejected 96 (keeping 4/100), in 2.5 s
DEBUG: RMSDRotCorrPrunerConfig - k=1, rejected 1 (keeping 3/100), in 0.0 s
DEBUG: RMSDRotCorrPrunerConfig - keeping 3/100 (2.5 s)
DEBUG: RMSDRotCorrPrunerConfig - Used cached data 2/176 times, 1.14% of total calls


(3, 136, 3)
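
The idea behind the rotamer correction can be illustrated with a toy example: rotate a symmetric group by each of its n-fold equivalent angles and keep the lowest RMSD. This is only a sketch of the concept, not how prism_pruner implements it. The function names are hypothetical, the rotating atoms and bond are given by hand instead of being detected from a graph, and the alignment step is skipped because the two toy structures already share the same frame.

```python
import numpy as np


def rotate_about_axis(points, origin, axis, angle):
    """Rotate points by `angle` (radians) about an axis through `origin`."""
    k = axis / np.linalg.norm(axis)
    v = points - origin
    # Rodrigues' rotation formula, applied row by row
    rotated = (
        v * np.cos(angle)
        + np.cross(k, v) * np.sin(angle)
        + np.outer(v @ k, k) * (1.0 - np.cos(angle))
    )
    return rotated + origin


def rmsd(a, b):
    """Plain RMSD between two equally ordered coordinate sets (no alignment)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def rotamer_corrected_rmsd(coords_a, coords_b, bond, moving, n_fold):
    """Lowest RMSD over the n_fold symmetry-equivalent rotations of `moving`."""
    i, j = bond
    axis = coords_b[j] - coords_b[i]
    best = rmsd(coords_a, coords_b)
    for step in range(1, n_fold):
        trial = coords_b.copy()
        trial[moving] = rotate_about_axis(
            coords_b[moving], coords_b[j], axis, 2.0 * np.pi * step / n_fold
        )
        best = min(best, rmsd(coords_a, trial))
    return best


# Toy C-C-CF3 fragment: atoms 0-2 are carbons, atoms 3-5 are fluorines
angles = np.deg2rad([0.0, 120.0, 240.0])
fluorines = np.column_stack(
    [1.25 * np.cos(angles), 1.25 * np.sin(angles), np.full(3, 1.95)]
)
carbons = np.array([[1.4, 0.0, -0.5], [0.0, 0.0, 0.0], [0.0, 0.0, 1.5]])
reference = np.vstack([carbons, fluorines])

# Same geometry, but the fluorine labels are cyclically permuted
rotamer = reference.copy()
rotamer[[3, 4, 5]] = reference[[4, 5, 3]]

plain = rmsd(reference, rotamer)
corrected = rotamer_corrected_rmsd(
    reference, rotamer, bond=(1, 2), moving=[3, 4, 5], n_fold=3
)
print(f"plain RMSD: {plain:.3f}, rotamer-corrected RMSD: {corrected:.3f}")
```

In this toy case the plain RMSD is well above 1 Å even though the two structures are chemically identical, while the corrected value drops to essentially zero.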