The module uses numpy arrays for atomic positions and atomic numbers. You can read structures with ConformerEnsemble.from_xyz. The ensemble object uses ASE variable names, exposing the "coords" and "atoms" properties.

In [1]:
from prism_pruner.conformer_ensemble import ConformerEnsemble

ensemble = ConformerEnsemble.from_xyz("ensemble.xyz")
ensemble.coords.shape

(1086, 136, 3)

The simplest way to prune your ensemble is via the prune function, which chains the execution of the two main pruning routines with reasonable default thresholds.

In [None]:
from prism_pruner.pruner import prune

pruned, mask = prune(
    ensemble.coords,
    ensemble.atoms,
    # the third pruning routine can be
    # slow and it is off by default
    rot_corr_rmsd_pruning=False,
    debugfunction=print,
)

DEBUG: MOIPrunerConfig - k=50, rejected 449 (keeping 637/1086), in 0.1 s
DEBUG: MOIPrunerConfig - k=20, rejected 109 (keeping 528/1086), in 0.0 s
DEBUG: MOIPrunerConfig - k=10, rejected 27 (keeping 501/1086), in 0.1 s
DEBUG: MOIPrunerConfig - k=5, rejected 28 (keeping 473/1086), in 0.1 s
DEBUG: MOIPrunerConfig - k=2, rejected 38 (keeping 435/1086), in 0.2 s
DEBUG: MOIPrunerConfig - k=1, rejected 10 (keeping 425/1086), in 0.4 s
DEBUG: MOIPrunerConfig - keeping 425/1086 (1.0 s)
DEBUG: MOIPrunerConfig - Used cached data 105595/211707 times, 49.88% of total calls

DEBUG: RMSDPrunerConfig - k=20, rejected 13 (keeping 412/425), in 0.7 s
DEBUG: RMSDPrunerConfig - k=10, rejected 7 (keeping 405/425), in 0.9 s
DEBUG: RMSDPrunerConfig - k=5, rejected 9 (keeping 396/425), in 1.4 s
DEBUG: RMSDPrunerConfig - k=2, rejected 5 (keeping 391/425), in 4.5 s
DEBUG: RMSDPrunerConfig - k=1, rejected 4 (keeping 387/425), in 6.4 s
DEBUG: RMSDPrunerConfig - keeping 387/425 (14.0 s)
DEBUG: RMSDPrunerConfig - Use
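
The returned mask can be used to filter any other per-conformer data, such as energies. The sketch below assumes the mask is a boolean numpy array with one entry per input conformer (True for kept structures); this is an assumption about the return value, not something confirmed here. The arrays are synthetic stand-ins, not real prism_pruner output.

```python
import numpy as np

# Synthetic stand-ins: 5 conformers of a 4-atom molecule (not real data)
rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 4, 3))

# Hypothetical per-conformer energies, in the same order as coords
energies = np.array([0.0, 0.3, 0.3, 1.2, 2.5])

# Assumed form of the mask: True for conformers that were kept
mask = np.array([True, True, False, True, False])

# Boolean indexing keeps the rows where mask is True
kept_coords = coords[mask]
kept_energies = energies[mask]

# Indices of the kept conformers in the original ensemble
kept_indices = np.flatnonzero(mask)

print(kept_coords.shape, kept_indices)
```

Keeping the original indices around makes it easy to trace a pruned structure back to its position in the input file.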

There are three main pruning functions. The first, and fastest, is based on the principal moments of inertia (MOI). Two structures are considered similar when all three principal MOI differ by less than a relative threshold:

In [None]:
from prism_pruner.pruner import prune_by_moment_of_inertia

pruned, mask = prune_by_moment_of_inertia(
    ensemble.coords,
    ensemble.atoms,
    max_deviation=0.01,  # 1% relative difference
    debugfunction=print,
)

DEBUG: MOIPrunerConfig - k=50, rejected 449 (keeping 637/1086), in 0.1 s
DEBUG: MOIPrunerConfig - k=20, rejected 109 (keeping 528/1086), in 0.1 s
DEBUG: MOIPrunerConfig - k=10, rejected 27 (keeping 501/1086), in 0.2 s
DEBUG: MOIPrunerConfig - k=5, rejected 28 (keeping 473/1086), in 0.1 s
DEBUG: MOIPrunerConfig - k=2, rejected 38 (keeping 435/1086), in 0.2 s
DEBUG: MOIPrunerConfig - k=1, rejected 10 (keeping 425/1086), in 0.5 s
DEBUG: MOIPrunerConfig - keeping 425/1086 (1.2 s)
DEBUG: MOIPrunerConfig - Used cached data 105595/211707 times, 49.88% of total calls
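
To make the MOI criterion concrete, here is a minimal numpy sketch of the idea, not the library's implementation. The helper names principal_moments and similar_by_moi are hypothetical, and normalizing the difference by the first structure's moments is an assumption (it would fail for linear molecules, where one moment is zero).

```python
import numpy as np


def principal_moments(coords, masses):
    """Return the three principal moments of inertia, sorted ascending."""
    # Shift the structure so that its center of mass sits at the origin
    com = np.average(coords, axis=0, weights=masses)
    r = coords - com
    # Inertia tensor: I = sum_i m_i * (|r_i|^2 * identity - r_i r_i^T)
    r2 = np.einsum("ij,ij->i", r, r)
    tensor = np.einsum("i,i->", masses, r2) * np.eye(3)
    tensor -= np.einsum("i,ij,ik->jk", masses, r, r)
    # The tensor is symmetric, so eigvalsh gives real, ascending eigenvalues
    return np.linalg.eigvalsh(tensor)


def similar_by_moi(coords_a, coords_b, masses, max_deviation=0.01):
    """True if all three principal MOI differ by less than max_deviation."""
    moi_a = principal_moments(coords_a, masses)
    moi_b = principal_moments(coords_b, masses)
    # Assumption: relative difference taken with respect to structure a
    rel = np.abs(moi_a - moi_b) / moi_a
    return bool(np.all(rel < max_deviation))


# Water-like test geometry (in Å) and approximate atomic masses
water = np.array([[0.0, 0.0, 0.117], [0.0, 0.757, -0.469], [0.0, -0.757, -0.469]])
masses = np.array([15.999, 1.008, 1.008])

# A rotated and translated copy has the same principal moments
angle = np.radians(30.0)
rot = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = water @ rot.T + np.array([1.0, -2.0, 0.5])

# A copy stretched by 10% has moments about 21% larger
stretched = 1.1 * water

print(similar_by_moi(water, moved, masses))
print(similar_by_moi(water, stretched, masses))
```

Because the principal moments are invariant to rotation and translation, this comparison needs no alignment step, which is why it is cheap enough to run first.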


The second is a classic RMSD implementation (by default using only "heavy", non-H atoms). Two structures are considered similar when, after optimal alignment, both the RMSD and the maximum atomic deviation fall below the specified thresholds. If unspecified, the maximum deviation threshold (max_dev) is set to 2*max_rmsd.

In [14]:
from prism_pruner.pruner import prune_by_rmsd

pruned, mask = prune_by_rmsd(
    ensemble.coords,
    ensemble.atoms,
    max_rmsd=1.0,  # 1 Å
    debugfunction=print,
)

DEBUG: RMSDPrunerConfig - k=50, rejected 389 (keeping 697/1086), in 1.6 s
DEBUG: RMSDPrunerConfig - k=20, rejected 115 (keeping 582/1086), in 1.6 s
DEBUG: RMSDPrunerConfig - k=10, rejected 48 (keeping 534/1086), in 1.8 s
DEBUG: RMSDPrunerConfig - k=5, rejected 66 (keeping 468/1086), in 3.0 s
DEBUG: RMSDPrunerConfig - k=2, rejected 63 (keeping 405/1086), in 7.4 s
DEBUG: RMSDPrunerConfig - k=1, rejected 32 (keeping 373/1086), in 4.4 s
DEBUG: RMSDPrunerConfig - keeping 373/1086 (19.9 s)
DEBUG: RMSDPrunerConfig - Used cached data 107904/207984 times, 51.88% of total calls
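
The RMSD criterion can be sketched in plain numpy as follows. This is an illustration under stated assumptions, not the library's code: the helper names kabsch_align and similar_by_rmsd are hypothetical, the Kabsch algorithm is used here as one standard choice of optimal alignment, and hydrogens are identified by atomic number 1.

```python
import numpy as np


def kabsch_align(mobile, target):
    """Return `mobile` rotated and translated onto `target` (least-squares fit)."""
    mobile_c = mobile - mobile.mean(axis=0)
    target_c = target - target.mean(axis=0)
    # SVD of the covariance matrix between the two centered point sets
    u, _, vt = np.linalg.svd(mobile_c.T @ target_c)
    # Flip the last axis if needed so the result is a proper rotation
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return mobile_c @ rot + target.mean(axis=0)


def similar_by_rmsd(coords_a, coords_b, atoms, max_rmsd, max_dev=None, heavy_only=True):
    """True if both RMSD and maximum atomic deviation are under their thresholds."""
    if max_dev is None:
        max_dev = 2 * max_rmsd  # default described in the text
    if heavy_only:
        keep = atoms != 1  # assumption: atoms holds atomic numbers, H is 1
        coords_a, coords_b = coords_a[keep], coords_b[keep]
    aligned = kabsch_align(coords_a, coords_b)
    deviations = np.linalg.norm(aligned - coords_b, axis=1)
    rmsd = np.sqrt(np.mean(deviations**2))
    return bool(rmsd < max_rmsd and deviations.max() < max_dev)


# Six "carbon" atoms on the coordinate axes (synthetic geometry, in Å)
ref = np.array([
    [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0], [0.0, -2.0, 0.0],
    [0.0, 0.0, 3.0], [0.0, 0.0, -3.0],
])
atoms = np.full(6, 6)

# Rigidly moved copy: alignment brings the RMSD back to about zero
angle = np.radians(40.0)
rot = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
moved = ref @ rot.T + np.array([1.0, 2.0, 3.0])

# Copy scaled by 2: no rigid alignment can remove the difference
swollen = 2.0 * ref

print(similar_by_rmsd(moved, ref, atoms, max_rmsd=1.0))
print(similar_by_rmsd(ref, swollen, atoms, max_rmsd=1.0))
```

The extra maximum-deviation check guards against pairs where most atoms overlap well but a single group sits far away, a case that a low average RMSD alone can hide.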


The last is another RMSD-based implementation, which applies a rotational correction to remove the artificially high RMSD values obtained for rotameric structures.

In [15]:
from prism_pruner.graph_manipulations import graphize
from prism_pruner.pruner import prune_by_rmsd_rot_corr

graph = graphize(ensemble.atoms, ensemble.coords[0])

pruned, mask = prune_by_rmsd_rot_corr(
    ensemble.coords[0:100],
    ensemble.atoms,
    graph,
    max_rmsd=1.0,  # 1 Å
    logfunction=print,
    debugfunction=print,
)


 >> Dihedrals considered for rotamer corrections:
 1  - [ 8 10 11 12]         : CCCC : 2-fold
 2  - [116  70  14  13]     : CCCC : 2-fold
 3  - [25 26 28 29]         : CCCC : 3-fold
 4  - [ 19  73 111 113]     : CCCF : 3-fold


DEBUG: RMSDRotCorrPrunerConfig - k=2, rejected 97 (keeping 3/100), in 0.9 s
DEBUG: RMSDRotCorrPrunerConfig - k=1, rejected 2 (keeping 1/100), in 0.0 s
DEBUG: RMSDRotCorrPrunerConfig - keeping 1/100 (0.9 s)
DEBUG: RMSDRotCorrPrunerConfig - Used cached data 1/123 times, 0.81% of total calls
