Combined RMSD

This submodule calculates a multiple RMSD variant that does not superpose ('align') the molecules and that chooses which atoms to compare from a given pairing. By default, the pairing is which atoms were donors in Fragmenstein.

It requires RDKit but not pyrosetta and can be used independently of Fragmenstein core functionality.

  • mRMSD(followup: Chem.Mol, hits: Sequence[Chem.Mol], mappings: List[List[Tuple[int, int]]]): mappings is a list of length len(hits), where each element is a list of tuples of atom indices going from followup to hit.
  • mRMSD.from_annotated_mols(annotated_followup: Chem.Mol, hits: Sequence[Chem.Mol]): the annotated variant requires the atoms of the mol to have the _Origin Chem.Atom property.
  • mRMSD.from_unannotated_mols(moved_followup: Chem.Mol, hits: Sequence[Chem.Mol], placed_followup: Chem.Mol): the positional mapping is (re)calculated.
  • mRMSD.from_other_annotated_mols(followup: Chem.Mol, hits: Sequence[Chem.Mol], annotated: Chem.Mol): works like from_annotated_mols, but mRMSD.copy_origins(annotated, followup) is called first.
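To make the shape of the mappings argument concrete, here is a minimal sketch that uses plain Python tuples in place of RDKit molecules. The coordinates are made up and the helper `paired_coordinates` is hypothetical; neither is part of Fragmenstein.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Mapping = List[Tuple[int, int]]  # (followup atom idx, hit atom idx)


def paired_coordinates(followup: Sequence[Coord],
                       hit: Sequence[Coord],
                       mapping: Mapping) -> List[Tuple[Coord, Coord]]:
    """Hypothetical helper: look up the positions of each mapped atom pair."""
    return [(followup[f_idx], hit[h_idx]) for f_idx, h_idx in mapping]


# Made-up coordinates standing in for conformer positions.
followup_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
hits_xyz = [
    [(0.1, 0.0, 0.0), (1.4, 0.0, 0.0)],  # hit 0
    [(3.0, 0.2, 0.0)],                   # hit 1
]

# One inner list per hit, so len(mappings) == len(hits).
mappings = [
    [(0, 0), (1, 1)],  # followup atoms 0 and 1 are paired with hit 0
    [(2, 0)],          # followup atom 2 is paired with hit 1
]

assert len(mappings) == len(hits_xyz)
pairs_per_hit = [paired_coordinates(followup_xyz, hit, mapping)
                 for hit, mapping in zip(hits_xyz, mappings)]
```

Each hit contributes its own list of pairs, and the lists need not be the same length.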

It is a multiple RMSD, that is, basically a quadratic mean ("2-mean") of the per-hit RMSDs, weighted by the number of atom pairs in each.

To properly discuss this, it is best to recap some maths.

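A Euclidean distance (or 2-norm) between the vectors a and b, representing two atom positions, is the square root of the sum of the squared differences of each element/axis-position:

$$\lVert \mathbf{a} - \mathbf{b} \rVert_2 = \sqrt{(a_x - b_x)^2 + (a_y - b_y)^2 + (a_z - b_z)^2}$$

which can be written more compactly as

$$\lVert \mathbf{a} - \mathbf{b} \rVert_2 = \sqrt{\sum_{i} (a_i - b_i)^2}$$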
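As a quick numerical check of the two ways of writing the distance, the following sketch uses only the standard library; the two positions are arbitrary example values.

```python
import math

# Two arbitrary atom positions (x, y, z).
a = (1.0, 2.0, 3.0)
b = (4.0, 6.0, 3.0)

# Term-by-term form: one squared difference per axis.
explicit = math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2)

# Summation form: iterate over the axes i.
summed = math.sqrt(sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b)))

print(explicit, summed)  # both are 5.0 for these values
```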

An RMSD between matrices A and B, representing two arrays of atom positions, is the square root of the average of the squared Euclidean distances. If a single atom pair were compared, it would be nothing more than the Euclidean distance.

$$\mathrm{RMSD}(A, B) = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \sum_{i} (A_{n,i} - B_{n,i})^2}$$

where N is the number of atom pairs compared, n is the index of a given atom pair and i is simply a dimension in space (x, y, z).
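The RMSD described above, without any superposition step, can be sketched in plain Python as follows. The function name `rmsd` and the coordinates are illustrative only and are not the submodule's implementation.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(A: Sequence[Coord], B: Sequence[Coord]) -> float:
    """RMSD between paired positions, with no alignment of A onto B."""
    if len(A) != len(B) or not A:
        raise ValueError("A and B must be non-empty and of equal length")
    # Sum over atom pairs n and over axes i of the squared differences.
    squared = sum((a_i - b_i) ** 2
                  for a, b in zip(A, B)
                  for a_i, b_i in zip(a, b))
    return math.sqrt(squared / len(A))


A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]

two_pairs = rmsd(A, B)           # both pairs are 1 apart, so the RMSD is 1.0
single_pair = rmsd(A[:1], B[:1])  # one pair only: the Euclidean distance, 1.0
print(two_pairs, single_pair)
```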

So, to extend the RMSD to multiple hits, one can extend the average of the squared distances (taken before the square root) to include all atom pairs. One fudgey way of writing it is:

$$\mathrm{mRMSD} = \sqrt{\frac{1}{N \cdot H} \sum_{h=1}^{H} \sum_{n=1}^{N} \sum_{i} (A_{h,n,i} - B_{h,n,i})^2}$$

where H is the number of hits. Note that A and B are just fudges for example purposes: the atom pairs will differ in number from one hit to the next. Written properly, it would be the same as the regular RMSD, except that A and B are the concatenations of the matrices for each hit.
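Spelled out for unequal numbers of pairs, and writing $N_h$ for the number of atom pairs of hit $h$ (a symbol introduced here for illustration only), the concatenated form and its relation to the per-hit RMSDs could be written as:

$$
\mathrm{mRMSD}
= \sqrt{\frac{1}{\sum_{h=1}^{H} N_h} \sum_{h=1}^{H} \sum_{n=1}^{N_h} \sum_{i} (A_{h,n,i} - B_{h,n,i})^2}
= \sqrt{\frac{\sum_{h=1}^{H} N_h \, \mathrm{RMSD}_h^2}{\sum_{h=1}^{H} N_h}}
$$

The second equality follows because the inner double sum for a given hit equals $N_h \, \mathrm{RMSD}_h^2$, which is why the result is an N_atom weighted "2-mean" of the individual RMSDs.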

This means that atoms in the followup compound are re-used in the metric, as they can appear in multiple pairings.
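A minimal sketch of the pooled calculation, assuming the atom pairs have already been extracted as coordinate tuples, is given below. The function names and coordinates are hypothetical and not Fragmenstein's code. It also checks that pooling all pairs gives the same value as the pair-count weighted quadratic mean of the per-hit RMSDs.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Pairs = List[Tuple[Coord, Coord]]  # (followup position, hit position)


def squared_distance(a: Coord, b: Coord) -> float:
    return sum((a_i - b_i) ** 2 for a_i, b_i in zip(a, b))


def rmsd(pairs: Pairs) -> float:
    return math.sqrt(sum(squared_distance(a, b) for a, b in pairs) / len(pairs))


def combined_rmsd(pairs_per_hit: Sequence[Pairs]) -> float:
    """Pool every atom pair from every hit, then take a single RMSD."""
    pooled = [pair for pairs in pairs_per_hit for pair in pairs]
    return rmsd(pooled)


def weighted_quadratic_mean(pairs_per_hit: Sequence[Pairs]) -> float:
    """Per-hit RMSDs combined as a quadratic mean weighted by pair count."""
    total = sum(len(pairs) for pairs in pairs_per_hit)
    weighted = sum(len(pairs) * rmsd(pairs) ** 2 for pairs in pairs_per_hit)
    return math.sqrt(weighted / total)


followup_atom = (1.0, 0.0, 0.0)
pairs_per_hit = [
    [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)), (followup_atom, (1.0, 2.0, 0.0))],  # hit 0
    [(followup_atom, (1.0, 0.0, 2.0))],  # hit 1 re-uses the same followup atom
]

pooled_value = combined_rmsd(pairs_per_hit)
weighted_value = weighted_quadratic_mean(pairs_per_hit)
print(pooled_value, weighted_value)  # both are approximately 1.732, i.e. sqrt(3)
```

The same followup atom appears in a pair for each hit, so it contributes twice to the pooled sum.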