In [None]:
#@title Intro & Install

#@markdown # Aims
#@markdown This notebook takes you through the basics of molecular mechanics,
#@markdown as a primer for docking.

#@markdown # Step 1: Press play!

#@markdown ### Preface on Colab

#@markdown ← Press the button with the play icon to run a _cell_.
#@markdown In this cell: install requirements.

#@markdown This is a _Colab notebook_, a variant of a Jupyter notebook.
#@markdown If you are not in Colab press [this](https://colab.research.google.com/github/matteoferla/Fragment-hit-follow-up-chemistry/blob/main/colab/upload_prep.ipynb).

#@markdown Colab runs in Google's servers, hence why you will get asked
#@markdown to sign in if not done so already.
#@markdown Likewise it will ask if you trust the author (Matteo Ferla),
#@markdown if unsure about whether you should trust anything I do
#@markdown [click here for details](https://www.youtube.com/watch?v=dQw4w9WgXcQ).

#@markdown To inspect code press `show code` ↓

#@markdown Still confused about notebooks? Then see the [about_notebooks.md page in the repo](https://github.com/matteoferla/DTC-compchem-practical/blob/main/about_notebooks.md).

#@markdown You will NOT get a prize for pressing `Run all` and going for a coffee.
#@markdown Read the code, google up and use `help(🤖)` where 🤖 is the object.
#@markdown (No, you cannot use emoji as variables in vanilla Python).
#@markdown In this notebook, the imported module `dtc` provides `dtc.show_source(🤖)`, which shows the source code
#@markdown in a colourful way (it calls `inspect.getsource`, a handy function). Please use it.
#@markdown Some things are hidden in the module within the repo; you can pretend they work by magic.

#@markdown # Step 2: Basic concepts
#@markdown While we wait N minutes for conda to install things,
#@markdown we will go through the basics of molecular mechanics in the [theory.md page in the repo](https://github.com/matteoferla/DTC-compchem-practical/blob/main/theory.md).
#@markdown ![mm](https://upload.wikimedia.org/wikipedia/commons/thumb/5/5c/MM_PEF.png/1920px-MM_PEF.png)

#@markdown If you do not want to read anything (one of those days, eh?),
#@markdown then watch the [video in the repo](https://github.com/matteoferla/DTC-compchem-practical/raw/main/media/stefans_animation.mp4) that shows the fragment hits
#@markdown from screens of the Mac1 domain of the NSP3 protein of SARS-CoV-2 that were expanded into strong binders!
#@markdown In the next cell there are questions to answer based on the video though!

# ------------------------------------------------

import os

#source /etc/os-release && echo $PRETTY_NAME
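with open('/etc/os-release') as fh:
  for line in fh:
    line = line.strip()
    # skip blank lines, comments and anything that is not KEY=VALUE
    if not line or line.startswith('#') or '=' not in line:
      continue
    key, value = line.split('=', 1)
    os.environ[key] = value.strip('"')  # values are usually double-quoted
print(f'Running {os.environ["PRETTY_NAME"]}') # on {os.environ["HOST"]} as {os.environ["USER"]}')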

# ------------------------------------------------

#@title Installation
local_debug = False
if local_debug:
    raise Exception('CURRENTLY IN DEBUG MODE.... REMEMBER TO CLEAR ALL CELLS!')

## Install all requirements and get some goodies
!pip install git+https://github.com/matteoferla/DTC-compchem-practical.git
# this will be called as:
# import DTC_compchem_practical as dtc

## Jupyter lab? use `trident-chemwidgets`
!pip install git+https://github.com/matteoferla/JSME_notebook_hack.git
!pip install --upgrade plotly matplotlib

from google.colab import output  # noqa (It's a colaboratory specific repo)
output.enable_custom_widget_manager()

In [None]:
#@title Video summary
#@markdown This video (press play to run), from screens of the Mac1 domain of the NSP3 protein of SARS-CoV-2, summarises perfectly
#@markdown the challenges of a fragment based drug discovery screen.

from IPython.display import HTML, display
display(
    HTML("""
        <video alt="test" controls>
            <source src="https://github.com/matteoferla/DTC-compchem-practical/raw/main/media/stefans_animation.mp4" type="video/mp4">
        </video>
    """)
)

## Questions
So you chose not to read the theory page, eh? Well, let's see if you can answer these questions based on the video.

Please replace alien-monster emoji with your answers. Remember to save or pdf-print your notebook before leaving
to show your grandkids.

> This is an enzyme, which binds to a nucleotide modification on protein. Can you guess where its native substrate binds? Which fragment site would be good for a competitive inhibitor?

👾👾👾

> Do all fragments bind to the protein with the same number of contacts per atom? If not, what is ligand efficiency?

👾👾👾

>  Is the protein rigid? I.e. lock & key Vs. induced fit.

👾👾👾

> Towards the end a loop "everts", reflecting a native product bound state vs. a native substrate bound state. Would repacking sidechains model this change?

👾👾👾

> Do you reckon the choice of apo structure affects docking results?

👾👾👾

> Some compounds have labels like `ZINC922`. What is Zinc?

👾👾👾

> The theoretical merger has an ester bond, while the analogue in make-on-demand space has an amide. Can you guess why?

👾👾👾

> What makes a lead "drug-like"? (cf. [Lipinski's rule of five](https://en.wikipedia.org/wiki/Lipinski%27s_rule_of_five)) And is a carboxyl group good? What is a bioisostere?

👾👾👾

> Is synthetic accessibility really important, why?

👾👾👾

> Is chirality easy to make with _synthetic chemistry_ (not biocatalysis)?

👾👾👾

> Different isomers of racemic compounds bind in different poses within the same crystal. Do these ligands both bind together in the same protein macromolecule at the same time, sequentially in time or on different macromolecules in the crystal lattice?

👾👾👾

## Step 3: Playing with molecules in RDKit

Let's play with a molecule. We will use RDKit, a cheminformatics library.
It is the most popular cheminformatics library in Python.

Go to Wikipedia, search for a molecule and copy its SMILES from the infobox.

If you want a top searched molecule from Wikipedia, [here is an analysis](https://github.com/matteoferla/Wikipedian-compounds) with SMILES in the csv file. For SMILES of common placeholders see the [last section of this blog post](https://www.blopig.com/blog/2023/08/placeholder-compounds-distraction-vs-accuracy/).

In [None]:
#@markdown Input name and SMILES of molecule
mol_name: str =  '👾👾👾'   #@param {type:"string"}
smiles: str =  '👾👾👾'   #@param {type:"string"}
if local_debug:
    mol_name: str =  'caffeine'   #@param {type:"string"}
    smiles: str =  'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'   #@param {type:"string"}

In [None]:
## Let's load it into RDKit and display it
# paying attention to the code

from IPython.display import display
from rdkit import Chem

#RDKit can and will misbehave in Colab unless this line is called to activate it
from rdkit.Chem.Draw import IPythonConsole

mol: Chem.Mol = Chem.MolFromSmiles(smiles)
mol.SetProp('_Name', mol_name)
display(mol)

In [None]:
#@markdown Let's look at its partial charges (Gasteiger-Marsili).
#@markdown Partial charges are a way to model the electron density of a molecule.
#@markdown In molecular mechanics, as used in docking, they are generally precomputed before the run during parameterisation.

from rdkit.Chem.Draw import SimilarityMaps
from rdkit.Chem import AllChem, Draw
from PIL import Image
from io import BytesIO
#@markdown via `rdkit.Chem.AllChem.ComputeGasteigerCharges`
AllChem.ComputeGasteigerCharges(mol)
contribs = [a.GetDoubleProp('_GasteigerCharge') for a in mol.GetAtoms()]
# old way
# fig = SimilarityMaps.GetSimilarityMapFromWeights(mol, contribs, colorMap='jet', contourLines=10)
d = Draw.MolDraw2DCairo(400, 400)
SimilarityMaps.GetSimilarityMapFromWeights(mol, contribs, draw2d=d)
d.FinishDrawing()
Image.open( BytesIO(d.GetDrawingText()) )
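
To see why partial charges matter in molecular mechanics, here is a minimal sketch of the Coulomb term that a force field sums over pairs of atoms. It is not RDKit code: the helper `coulomb_energy`, the example charges and the constant of roughly 332.06 kcal·Å/(mol·e²) (a vacuum, i.e. relative permittivity of 1) are assumptions made for illustration only.

```python
# Hypothetical helper for illustration; not part of RDKit or of the `dtc` module.
COULOMB_CONSTANT = 332.06  # approx. kcal·Å/(mol·e²), assumed value for a vacuum


def coulomb_energy(q1: float, q2: float, distance: float, dielectric: float = 1.0) -> float:
    """Electrostatic energy (kcal/mol) of two point charges (in e) ``distance`` Å apart."""
    if distance <= 0:
        raise ValueError('distance must be positive')
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance)


# made-up partial charges, 3 Å apart
attractive = coulomb_energy(+0.4, -0.4, 3.0)  # opposite signs: negative (favourable)
repulsive = coulomb_energy(-0.4, -0.4, 3.0)   # same sign: positive (unfavourable)
print(f'attractive: {attractive:.2f} kcal/mol, repulsive: {repulsive:.2f} kcal/mol')
```

Try changing `dielectric` to a larger value to see how screening by a solvent would weaken the interaction.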

## Questions

> What is 'RDKit'?

👾👾👾

> What is a SMILES (or 'SMILEString')?

👾👾👾

> What does the function `AllChem.ComputeGasteigerCharges` do?

👾👾👾

> In addition to Marsili-Gasteiger partial charges, there is another form of partial charges, what is it?

👾👾👾

In [None]:
#@markdown As a reward for acing those questions, here is a fun function:
#@markdown adding meme gifs to your outputs, because ... ehr... science?
#@markdown Ask a demonstrator to reward you with the password to unlock the fun!
#@markdown Please do not tell the serious people of this cell or Matteo will get in trouble.

password = '👾👾👾' #@param {type:"string"}
token = b'gAAAAABjhNPjteatHENaWRl63ddoEY6VRJCegYVx1ROsLwDi-4NOIXJ3gr8pYffO42OGVzJDlr0qg_F_1Sbh1RP9W5jzyyW12diw58RDj2ydR96o0kIJVPGbl7JMWJy0BjOiowGnvPET'
import DTC_compchem_practical as dtc
import os
os.environ['GIPHY_API'] = dtc.decrypt(token, password)

from gist_import import GistImporter

retrieve_giphy = GistImporter('25a53d54d3ba65919610bb34b188ac67')['retrieve_giphy']

retrieve_giphy('protein')

In [None]:
#@markdown Let's alter the molecule and see how the partial charges change.
from jsme_hack.rdkit import JSMERDKit
import ipywidgets as widgets

button = widgets.Button(description="Calculate", icon='fire')
output = widgets.Output(layout={'border': '1px solid black'})
display(button, output)
jsme = JSMERDKit(mol)
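def on_click(_button=None):
    # ipywidgets passes the clicked button as the single argument
    alt_mol = jsme.mol
    AllChem.ComputeGasteigerCharges(alt_mol)
    contribs = [a.GetDoubleProp('_GasteigerCharge') for a in alt_mol.GetAtoms()]
    output.clear_output()
    with output:
        d = Draw.MolDraw2DCairo(400, 400)
        # draw the edited molecule, not the original `mol`
        SimilarityMaps.GetSimilarityMapFromWeights(alt_mol, contribs, draw2d=d)
        d.FinishDrawing()
        # inside a function the image is not shown automatically, so display it explicitly
        display(Image.open( BytesIO(d.GetDrawingText()) ))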

button.on_click(on_click)
on_click(False)

In [None]:
#@title Connectivity
#@markdown A molecule is represented as a graph. The atoms are the nodes and the bonds are the edges.
#@markdown How is your molecule connected?

from typing import Tuple, List, Iterator
import operator

for i, atom in enumerate(mol.GetAtoms()):  #: Tuple[int, Chem.Atom]
    # neighbours are atoms it is bonded to
    neigh: Tuple[Chem.Atom] = atom.GetNeighbors()
    neigh_i: Tuple[int] = tuple(map(Chem.Atom.GetIdx, neigh))
    bonds: Iterator[Chem.Bond] = (mol.GetBondBetweenAtoms(i, n) for n in neigh_i)
    # bondtype is an enum `Chem.Bond.GetBondType().name`
    bondtypes: Iterator[Chem.BondType] = map(Chem.Bond.GetBondType, bonds)
    bondtype_names: Tuple[str] = tuple(map(operator.attrgetter('name'), bondtypes))
    print(f'{atom.GetSymbol()} atom (index {i}) is connected to {neigh_i} via {bondtype_names}')
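
The graph idea does not depend on RDKit. As a minimal sketch, the block below stores the heavy atoms of ethanol as a list of nodes and a list of edges, then builds the adjacency list that the loop above prints for your molecule. The atom numbering and the helper `to_adjacency` are hand-written assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Ethanol heavy atoms, numbered by hand for illustration: C0-C1-O2
symbols: List[str] = ['C', 'C', 'O']
bonds: List[Tuple[int, int]] = [(0, 1), (1, 2)]


def to_adjacency(bonds: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Turn a list of undirected edges into an adjacency list."""
    adjacency = defaultdict(list)
    for a, b in bonds:
        adjacency[a].append(b)  # the bond is undirected,
        adjacency[b].append(a)  # so store it both ways
    return dict(adjacency)


adjacency = to_adjacency(bonds)
for idx, neighbours in sorted(adjacency.items()):
    print(f'{symbols[idx]} atom (index {idx}) is connected to {neighbours}')
```

Note that nothing here says where the atoms are in space: connectivity and coordinates are stored separately.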

In [None]:
#@title Conformer generation
#@markdown A molecule is not only its connectivity, but also its 3D shape, the conformer.
#@markdown Each node has an associated 3-dimensional vector in Cartesian space.

from rdkit.Chem import rdDistGeom, rdMolAlign, Draw

# Add Hs
hydromol = AllChem.AddHs(mol)

# Generate a single conformer
mol3d = Chem.Mol(hydromol) # making a copy
AllChem.EmbedMolecule(mol3d)
mol3d.SetProp('_Name', 'single-conf')

# Generate a 2D representation
mol2d = Chem.Mol(hydromol) # making a copy
AllChem.Compute2DCoords(mol2d)
mol2d.SetProp('_Name', '2D-repr')

# Generate multiple unique conformers
multimol = Chem.Mol(hydromol)
numconf: int = 10
param = rdDistGeom.ETKDGv2()
param.pruneRmsThresh = 0.1
cids = rdDistGeom.EmbedMultipleConfs(multimol, numconf, param)
mp = AllChem.MMFFGetMoleculeProperties(multimol, mmffVariant='MMFF94s')
AllChem.MMFFOptimizeMoleculeConfs(multimol, numThreads=0, mmffVariant='MMFF94s')
multimol.SetProp('_Name', 'multi-conf')
print(f'{multimol.GetNumConformers()} conformers made out of {numconf}')

mols = [mol2d, mol3d]
for i, conf in enumerate(multimol.GetConformers()):
    m = Chem.Mol(hydromol)
    m.AddConformer(conf)
    m.SetProp('_Name', f'conf_{i}')
    mols.append(m)
display(Draw.MolsToGridImage(mols,
                             legends=[m.GetProp('_Name') if m.HasProp('_Name') else '-' for m in mols],
                             subImgSize=(200,200), useSVG=True,
                             molsPerRow=4))

In [None]:
#@title Test: distort molecule and Constrained MMFF
import nglview as nv
from rdkit.Chem import rdMolAlign
from io import StringIO
from gist_import import GistImporter
from typing import Callable
align_mols: Callable[[Chem.Mol, Chem.Mol], None]= GistImporter('e9399caa07a9206d05330b0c5aca6ec1')['align_mols']


from rdkit.Geometry import Point3D

def shift_atom(mol: Chem.Mol, atom_idx: int = 0, x_offset: float=0, y_offset: float=0, z_offset: float=0):
    """
    Shifts the atom indexed ``atom_idx`` in the conformer of ``mol``
    by ``(x_offset, y_offset, z_offset)`` Å and flags it as 'Fixed'.
    """
    conf: Chem.Conformer = mol.GetConformer()
    p: Point3D = conf.GetAtomPosition(atom_idx)
    # build the shifted position from the requested offsets
    new_p: Point3D = Point3D(p.x + x_offset, p.y + y_offset, p.z + z_offset)
    conf.SetAtomPosition(atom_idx, new_p)
    mol.GetAtomWithIdx(atom_idx).SetBoolProp('Fixed', True)

# make 3D
mol2 = AllChem.AddHs(mol)
AllChem.EmbedMolecule(mol2)
## ----------------------------------------------------------------------------

#@markdown Run this a few times and change the constraints!
# Please tinker with these values:
shift_atom(mol2, atom_idx=0, x_offset=2, z_offset=5)
shift_atom(mol2, atom_idx=1, y_offset=-1)
mol2.GetAtomWithIdx(3).SetBoolProp('Fixed', True)
# in the next block `AllChem.MMFFGetMoleculeForceField.MMFFAddPositionConstraint` will be called
# constrain these.
## ----------------------------------------------------------------------------

Chem.SanitizeMol(mol2)

p = AllChem.MMFFGetMoleculeProperties(mol2, 'MMFF94')
if p is None:
    raise ValueError('MMFF cannot work on a molecule that has errors!')

ff = AllChem.MMFFGetMoleculeForceField(mol2, p)
# restrain
for atom in mol2.GetAtomsMatchingQuery(Chem.rdqueries.HasPropQueryAtom('Fixed', negate=False)):
    # Atom is free to move within 2 ångströms of its current position, beyond which it is penalised
    ff.MMFFAddPositionConstraint( atom.GetIdx(), 2, 1e5)
pre: float = ff.CalcEnergy()
outcomes = {-1: 'MMFF minimisation could not be started',
            0: 'MMFF minimisation was successful',
            1: 'MMFF minimisation was run, but did not converge'}
try:
    m: int = ff.Minimize()
    print(outcomes.get(m, "Iä! Iä! Cthulhu fhtagn! Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn"))
except RuntimeError as error:
    print(f'MMFF minimisation failed {error.__class__.__name__}: {error}')

post: float = ff.CalcEnergy()
print(f'∆G went from {pre} to {post} kcal/mol')
#rdMolAlign.AlignMol(mol2, mol)
#align_mols(mol2, mol)
view = nv.show_rdkit(mol2)
# fh = StringIO(Chem.MolToPDBBlock(mol))
# view.add_component(fh, ext='mol')

display(view)

In [None]:
#@title Get fingerprints
#@markdown `AllChem.GetMorganFingerprint` will generate the Morgan fingerprints of a molecule.
#@markdown see [Morgan fingerprinting](https://www.rdkit.org/docs/GettingStartedInPython.html#feature-definitions-used-in-the-morgan-fingerprints) for more.
#@markdown This is helpful for substructure searching, but also for clustering molecules.
#@markdown Chemical space is vast. Catalogue space and synthetically accessible space are a tiny fraction of it.
#@markdown So to navigate chemical space, we need an easy way to compare molecules.
#@markdown In docking, you cannot simply dock everything. You will need to filter out molecules that are too similar to each other.

from rdkit import Chem
from rdkit.Chem import AllChem

info = {}
morgan = AllChem.GetMorganFingerprintAsBitVect(mol,3,nBits=1024,bitInfo=info)
#@markdown We now have a series of 0s and 1s (1024 of them), a fingerprint, where each on bit flags a substructure of up to 3 bonds in radius around an atom in the molecule
print('fingerprint bitmap: ', morgan.ToBitString()[:20]+'...'+morgan.ToBitString()[-20:],
      'N bits: ', len(morgan.ToBitString()),
       'N non-zero elements: ', morgan.GetNumOnBits())
print('The substructures are: ', info)
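
Once molecules are bit vectors, comparing them becomes set arithmetic. The cell above does not name a similarity metric, so the sketch below assumes the commonly used Tanimoto coefficient (shared on bits divided by the union of on bits). The function `tanimoto` and the two tiny fingerprints are made up for illustration.

```python
from typing import Set


def tanimoto(on_bits_a: Set[int], on_bits_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = on_bits_a | on_bits_b
    if not union:
        return 0.0  # two empty fingerprints: define the similarity as zero
    return len(on_bits_a & on_bits_b) / len(union)


# made-up on-bit indices, not real Morgan fingerprints
fp_a = {1, 5, 9, 12}
fp_b = {1, 5, 7}
similarity = tanimoto(fp_a, fp_b)
print(f'Tanimoto similarity: {similarity:.2f}')
```

With real fingerprints, the on-bit indices can be read from the bit vector (e.g. `set(morgan.GetOnBits())`), and RDKit has its own implementation in `rdkit.DataStructs.TanimotoSimilarity`.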

## Questions

> Why is there a drop in Gibbs free energy (a potential) after minimisation? (Unsure? Next notebook will explain it further)

👾👾👾

> How does this relate to entropy of the system (think chelate effect)? (Unsure? Next notebook will explain it further)

👾👾👾

> What does `AllChem.EmbedMolecule` do?

👾👾👾

> Which command generates a 2D representation?

👾👾👾

> What is the difference between a `Chem.Mol` instance and its `Chem.Conformer(s)`? And do both store cartesian atomic positions?

👾👾👾

> Thinking of neural networks... Does the order of the atoms matter? Does rototranslating a whole ligand-protein complex change the binding energy? If not, would this rototranslation and order invariance have an effect in certain NNs? Are there ways to deal with this? (hint: Google equivariance)

👾👾👾

## Synthetic accessibility & make-on-demand space
Not all compounds can be made easily or are available from a vendor.
Fragments are not bought like large common chemicals from Sigma, Fisher, Alfa Aesar or German-Merck etc.
Instead they are bought in low mg quantities from Enamine, WuXi or ChemDiv among specialist vendors.
These can be divided into off-the-shelf or make-on-demand.
The former can be searched using, for example, [www.molport.com](https://www.molport.com/shop/find-chemicals).
The latter can be a very large space: the prime example of this is
Enamine REAL, which has 5 billion molecules, while their Store has 1.7 million.
But even 5e9 does not cover the whole of possible chemistry. The site [zinc.docking.org](https://zinc.docking.org/substances/home/) collates the former and some of the latter.

Go to [sw.docking.org](https://sw.docking.org/) and try drawing a molecule.
Also try picking a _natural compound_ from within say [Wiki category: Flavonoids](https://en.wikipedia.org/wiki/Category:Flavonoids)
and copy the SMILES from the infobox into Smallworld. Just because it exists does not mean you can buy it.

## Questions

> What is the SMILES of the compound you searched in SmallWorld? And what was the distance to a purchasable analogue?

👾👾👾

> Did you find a molecule that has a Wikipedia page but is not in EnamineREAL DB?

👾👾👾

## Rototranslations
Rototranslations are the 3D transformations that can be applied to a molecule.
A (4,4) matrix is used to represent them; this encodes all affine transformations,
including translations, rotations, reflections, scaling and shearing.
The latter three do not count for molecules!
Actually this means that we will be doing extra song-and-dance due to operations on this form of matrix, but it is worth it for didactic purposes...
We will also use radians to max out on the confusion.
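
Since a (4,4) matrix can also encode reflections, scaling and shearing, it can be handy to check that a matrix really is a rototranslation before applying it to a molecule. The sketch below is one way to do so, assuming the column-vector convention (translation in the last column): the upper-left 3x3 block must be orthonormal with determinant +1 and the bottom row must be `[0, 0, 0, 1]`. The function name `is_rototranslation` is hypothetical.

```python
import numpy as np


def is_rototranslation(matrix, tolerance: float = 1e-6) -> bool:
    """True if a 4x4 affine matrix only rotates and translates (no reflection, scaling or shear)."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.shape != (4, 4):
        return False
    rotation = matrix[:3, :3]
    # a rotation matrix is orthonormal: its transpose is its inverse
    orthonormal = np.allclose(rotation.T @ rotation, np.identity(3), atol=tolerance)
    # determinant +1 is a proper rotation, -1 would be a reflection
    proper = np.isclose(np.linalg.det(rotation), 1.0, atol=tolerance)
    # the bottom row of an affine matrix is always 0, 0, 0, 1
    affine = np.allclose(matrix[3], [0, 0, 0, 1], atol=tolerance)
    return bool(orthonormal and proper and affine)


translation = np.identity(4)
translation[:3, 3] = [1, 2, 3]
reflection = np.diag([1.0, 1.0, -1.0, 1.0])  # mirrors z: would invert chirality!
scaling = np.diag([2.0, 2.0, 2.0, 1.0])      # would stretch every bond
shear = np.identity(4)
shear[0, 1] = 0.5                            # would distort every angle

for name, m in [('translation', translation), ('reflection', reflection),
                ('scaling', scaling), ('shear', shear)]:
    print(name, is_rototranslation(m))
```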

Successive steps of rototranslations can be combined by matrix multiplication.
To rotate a molecule by 90° around the z-axis around a point,
one would first translate the molecule so that the point is at the origin,
then rotate it and then translate it back.
This is done by multiplying the matrices together.

For theory see this [helpful guide](https://www.euclideanspace.com/maths/geometry/affine/matrix4x4/index.htm).
Do note that there are two types of transformations: [active/alibi and passive/alias](https://en.wikipedia.org/wiki/Active_and_passive_transformation) and PyMol, NGL etc. will use one or the other —you might get stuck in the future by this, so hopefully this passing note will help future-you!
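
Two details are easy to trip on when combining matrices: the order of multiplication matters, and the homogeneous component decides whether a translation applies at all. The sketch below illustrates both with hand-written matrices, assuming the active, column-vector convention (matrix on the left, vector on the right), so the rightmost matrix is applied first. All variable names are made up for illustration.

```python
import numpy as np

# 90° anticlockwise rotation around the z-axis (written out by hand)
quarter_turn_z = np.array([[0.0, -1.0, 0.0, 0.0],
                           [1.0,  0.0, 0.0, 0.0],
                           [0.0,  0.0, 1.0, 0.0],
                           [0.0,  0.0, 0.0, 1.0]])
# translation by 1 unit along the x-axis
shift_x = np.identity(4)
shift_x[0, 3] = 1.0

point = np.array([1.0, 0.0, 0.0, 1.0])      # homogeneous component 1: a position
direction = np.array([1.0, 0.0, 0.0, 0.0])  # homogeneous component 0: a direction

# the rightmost matrix acts on the vector first
rotate_then_shift = shift_x @ quarter_turn_z
shift_then_rotate = quarter_turn_z @ shift_x

print('rotate, then shift:', rotate_then_shift @ point)
print('shift, then rotate:', shift_then_rotate @ point)
print('a direction ignores translations:', shift_x @ direction)
```

The two combined matrices send the same point to different places, which is why the order of the three matrices in the exercise further down matters.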

In [None]:
import numpy as np
import numpy.typing as npt
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
from rdkit import Geometry

def transform(conformer: Chem.Conformer, affine_matrix):
    """
    Applies a 4x4 affine matrix to a conformer.
    """
    for i in range(conformer.GetNumAtoms()):
        position: Geometry.Point3D = conformer.GetAtomPosition(i)
        # we make the 3D vector homogeneous by appending a 1, the homogeneous component:
        # one for a position, zero for a direction
        position_homogeneous: npt.NDArray[np.float64] = np.append(list(position), 1)
        transformed: npt.NDArray[np.float64] = np.dot(affine_matrix, position_homogeneous)
        # we no longer need the homogeneous component!
        transformed_position = transformed[:3]
        conformer.SetAtomPosition(i, transformed_position)

# Test molecule
# trans-Anethole is the main "aroma" component of anise oil and fennel oil
# don't like anise? Then pick a different one!
original = AllChem.AddHs(Chem.MolFromSmiles('CC=CC1=CC=C(C=C1)OC'))
AllChem.EmbedMolecule(original)

# easy peasy
translation_matrix = np.array([
    [1, 0, 0, 1],  # Translation along x-axis by 1 unit
    [0, 1, 0, 0],  # No translation along y-axis
    [0, 0, 1, 0],  # No translation along z-axis
    [0, 0, 0, 1]   # Homogeneous coordinate
])

mol = Chem.Mol(original)
transform(mol.GetConformer(), translation_matrix)

import py3Dmol
import DTC_compchem_practical as dtc

view: py3Dmol.view = dtc.get_mols_view(whiteCarbon=original, yellowCarbon=mol)
view.show()

In [None]:
# Rotation
# That was not complicated. Let's do a rotation.
from typing import Sequence
import numpy as np
import numpy.typing as npt

# first let's do some trigonometry
def create_rotation_matrix(axis: Sequence, angle: float):
    """
    Create a 4x4 affine matrix for rotation.

    :param axis: The axis of rotation (x, y, z).
    :param angle: The angle of rotation in radians.
    :return: 4x4 affine transformation matrix.
    """
    # Normalize the axis vector
    axis = np.asarray(axis)
    axis = axis / np.sqrt(np.dot(axis, axis))
    # Compute the cosine and sine of the angle
    c = np.cos(angle)
    s = np.sin(angle)
    C = 1 - c
    # Compute the components of the rotation matrix
    x, y, z = axis
    r11 = x*x*C + c
    r12 = x*y*C - z*s
    r13 = x*z*C + y*s
    r21 = y*x*C + z*s
    r22 = y*y*C + c
    r23 = y*z*C - x*s
    r31 = z*x*C - y*s
    r32 = z*y*C + x*s
    r33 = z*z*C + c
    # Construct the rotation matrix
    rotation_matrix = np.array([
        [r11, r12, r13, 0],
        [r21, r22, r23, 0],
        [r31, r32, r33, 0],
        [0,   0,   0,   1]
    ])
    return rotation_matrix

def create_translation_matrix(x: float=0, y: float=0, z: float=0):
    """
    Create a 4x4 affine matrix for translations.

    :param x: ...
    :param y: ...
    :param z: ...
    :return: 4x4 affine transformation matrix.
    """
    # identity_matrix
    v = np.array([x, y, z])
    translation_matrix = np.identity(4)
    translation_matrix[0:3, 3] = v
    return translation_matrix

def get_centroid(conf: Chem.Conformer) -> npt.NDArray[np.float64]:
    """
    geometric center of all the atoms' coordinates
    Not centre of mass
    """
    return sum(conf.GetPositions()) / conf.GetNumAtoms()

In the previous cell we defined a few functions:

* `cen: npt.NDArray[np.float64] = get_centroid(mol.GetConformer())`
* `translation_by_one: npt.NDArray[np.float64] = create_translation_matrix(x=1)`
* `rotation_by_90: npt.NDArray[np.float64] = create_rotation_matrix([1,0,0], angle=np.pi / 2)`

Let's use them to rotate the molecule by 90° around the x-axis around a point.
As mentioned, we will first translate the molecule so that the point is at the origin,
then rotate it and then translate it back.

But oh no! Matteo accidentally deleted the cell with the code to do it!
[Demonstrators: see answers.md]

In [None]:
# first let's move the molecule by 5 Å along the x-axis
transform(original.GetConformer(), create_translation_matrix(x=5) )
print(get_centroid(original.GetConformer()))

# then let's make a copy and rotate it by 90° around the x-axis
mol = Chem.Mol(original)

👾👾👾
👾👾👾
👾👾👾
# combine the affine transform matrices by matrix multiplication
rotation_on_spot: npt.NDArray[np.float64] = np.matmul(np.matmul(👾👾👾,👾👾👾), 👾👾👾)
transform(mol.GetConformer(), rotation_on_spot )

dtc.get_mols_view(whiteCarbon=original, yellowCarbon=mol).show()

## Questions

> What does a translation look like in a 4x4 matrix?

👾👾👾

> A dot product between a 3x1 vector and a 4x4 matrix is not possible, what is the missing detail?

👾👾👾

> How do you rotate a molecule around a point?

👾👾👾

## RMSD

The root mean square deviation (RMSD) is a measure of the average distance between the atoms of two molecules.
There are four things to note:

* Do you want the conformers to be superposed first, or compared in place? Always be careful of this!
* Is a mean of squares more or less sensitive to large values than, say, a regular arithmetic mean?
* Are isomorphisms taken into account? (Flip a benzene ring and it is the same molecule bar atom numbering and naming)
* Atom mappings are always important

A note on terminology:

* "align" --> overlay two sequences (of matched atoms) pairwise
* "superpose" --> overlay with rototranslations
* "superimpose" --> overlay with translations only (i.e. avoid using this word)
👾👾👾
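
For reference, the in-place RMSD is simple enough to write by hand. The sketch below uses plain Python on made-up coordinates and assumes the atoms are already mapped one-to-one by index (no symmetry correction, no superposition). It also prints the plain mean deviation, so you can compare how the two respond to a single displaced atom. The function name `rmsd_in_place` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd_in_place(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root mean square deviation between matched atoms, without any superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError('need two non-empty, equally long, index-matched coordinate lists')
    squared = [math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


# made-up coordinates (Å): only the last atom differs, by 3 Å
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]

rmsd = rmsd_in_place(reference, moved)
mean_deviation = sum(math.dist(a, b) for a, b in zip(reference, moved)) / len(reference)
print(f'RMSD: {rmsd:.3f} Å, mean deviation: {mean_deviation:.3f} Å')
```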

In [None]:
print('In place: ', AllChem.CalcRMS(original, mol))
# NB. GetBestRMS superposes the first molecule onto the second, moving its atoms
print('Superposed: ', AllChem.GetBestRMS(original, mol))

## Crude docking

Using Mac1 from the video, let's dock a fragment hit into the protein.
Normally a classic testbed is BRD4 but the video was fun.

Normally, data polishing does not happen by magic.
Template choice is important: generally the native substrate-bound structure stripped of the substrate is best.
Using the ligand-bound structure with the ligand removed for docking that same ligand is very much cheating,
as you normally don't have that structure...
This dataset was created for this [blog post](https://www.blopig.com/blog/2023/11/the-workings-of-fragmensteins-rdkit-neighbour-aware-minimisation/).

Gibbs free energy (G) is enthalpy (H) minus temperature times entropy (TS).
Enthalpy is internal energy (U) plus pressure times volume (PV).
The logarithm of binding as measured in K_D (pKD) is proportional to Gibbs free energy over Boltzmann's constant (k) times temperature (T).
Therefore, the internal energy (U) is not the same as K_D and there is entropy (S) to consider.
Yes, from this you will have spotted that K_D is temperature dependent, but please don't say it aloud or you will make cheminformaticians cry.
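
To put numbers on the relation between K_D and Gibbs free energy, here is a minimal sketch of ΔG = RT ln(K_D), taking K_D in mol/L relative to a 1 M standard state. It assumes molar units, so it uses the gas constant R (Boltzmann's constant times Avogadro's number) rather than k itself. The function name `binding_free_energy` and the example values are illustrative only.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol·K): Boltzmann's constant times Avogadro's number


def binding_free_energy(kd_molar: float, temperature: float = 298.15) -> float:
    """ΔG of binding (kcal/mol) from a dissociation constant in mol/L (1 M standard state)."""
    if kd_molar <= 0:
        raise ValueError('K_D must be positive')
    return GAS_CONSTANT * temperature * math.log(kd_molar)


dg_micromolar = binding_free_energy(1e-6)  # a 1 µM binder
dg_nanomolar = binding_free_energy(1e-9)   # a 1 nM binder
dg_micromolar_warm = binding_free_energy(1e-6, temperature=310.15)  # same K_D at body temperature
print(f'1 µM: {dg_micromolar:.2f} kcal/mol, 1 nM: {dg_nanomolar:.2f} kcal/mol')
print(f'1 µM at 37 °C: {dg_micromolar_warm:.2f} kcal/mol')
```

At 25 °C each ten-fold improvement in K_D is worth roughly 1.4 kcal/mol, which gives a feel for how small the energy differences being chased in docking are.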

Here we will simply extract the neighbouring residues in RDKit and "freeze" them.
A frozen particle in MM parlance is a particle that is not allowed to move.
Crystallographers call it constrained, while they call harmonically constrained particles restrained (Coot) or tethered.

In [None]:
import pkg_resources

# `resource_string` returns bytes, hence the decoding
template_block: str = pkg_resources.resource_string('DTC_compchem_practical', 'data/mac1-stripped.pdb').decode('utf-8')
hit_block: str = pkg_resources.resource_string('DTC_compchem_practical', 'data/QRU.mol').decode('utf-8')
hit: Chem.Mol = Chem.MolFromMolBlock(hit_block)

# as we are in a rush we will use RDKit's MMFF, in the afternoon we will use OpenMM.
# hack: we will use the neighbourhood
from fragmenstein import Monster
neighborhood = Monster.get_neighborhood(None, template_block, radius=10, mol=hit)

# let's see what the neighbourhood looks like
view = dtc.get_protein_view(template_block, resn='HOH')
dtc.add_mols(view, CyanCarbon=hit, whiteCarbon=neighborhood)
view.show()

### Get internal energy

RDKit has the MMFF94 forcefield (Merck Molecular ForceField).
The whole thing can be called to fix a molecule via `AllChem.MMFFOptimizeMolecule`,
but often one may want to play with the inner workings.
This system is not perfect, but does parameterisation under the hood, which is great for now.
Say one has

`atom_indices_to_freeze: List[int]` and `atom_indices_to_tether: List[int]`
and `system: Chem.Mol`, one can do the following:

```python
p: AllChem.ForceField.MMFFMolProperties = AllChem.MMFFGetMoleculeProperties(system, 'MMFF94')
ff: AllChem.ForceField.ForceField = AllChem.MMFFGetMoleculeForceField(system, p)
ff.Initialize()
for i in atom_indices_to_freeze:
      ff.AddFixedPoint(i)
for i in atom_indices_to_tether:
      ff.MMFFAddPositionConstraint(i, maxDispl=max_displacement, forceConstant=constraint)
outcome: int = ff.Minimize()
U: float = ff.CalcEnergy() if outcome == 0 else np.nan
```

Where `ff.MMFFAddPositionConstraint` adds a bounded harmonic constraint that is zero within `max_displacement`
and `constraint` times the square of the displacement outside of it.
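
The shape of such a bounded (flat-bottomed) harmonic penalty can be sketched in a few lines. This is an illustration of the description above, not RDKit's actual implementation: it reads "the displacement outside of it" as the distance travelled beyond the bound, and the exact prefactor used by RDKit may differ. The function name `flat_bottom_penalty` is hypothetical.

```python
def flat_bottom_penalty(displacement: float, max_displacement: float, force_constant: float) -> float:
    """Penalty that is zero within ``max_displacement`` and grows quadratically beyond it."""
    excess = displacement - max_displacement
    if excess <= 0:
        return 0.0  # inside the flat bottom: the atom moves freely
    return force_constant * excess ** 2  # assumed form, see the caveat above


# same numbers as used earlier in this notebook: 2 Å bound, force constant 1e5
for d in (0.0, 1.5, 2.0, 2.5, 3.0):
    print(f'displaced by {d} Å -> penalty {flat_bottom_penalty(d, 2.0, 1e5):.1f}')
```

With a force constant this large, even half an ångström beyond the bound dwarfs every other energy term, so the atom is effectively caged.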

So arming ourselves with

* the `neighborhood` hack from above
* the `ff.CalcEnergy()` function, so we can now calculate the internal energy of the system,
* the knowledge that a negative potential is good
* the function `AllChem.EmbedMultipleConfs(mol, numConfs=10)` to generate conformers

And cheating by using:

* the empirical conformer of `hit` from the crystal structure
* the function `AllChem.CalcRMS` to get the in place RMSD of the conformers

Let's do a crude docking. Doing the following:

* Devise a system to keep track of operations and results (e.g. a `dict` or a `dataclass`)
* Cheat by getting the centroid vector of `hit`
* Make a copy of `hit` (call it say `vc` for virtual compound)
* Generate _n_ conformers of `vc`
* Make them a `List[Chem.Mol]` (copy `vc`, call `copied.RemoveAllConformers()`, then add a conformer of `vc` with `copied.AddConformer`); this is a hack, but it is fine for now
* Move each conformer to the centroid of `hit`
* Combine each mol with `neighborhood` (`AllChem.CombineMols`)
* Get the internal energy and store the information
* Make copies and exhaustively rotate each in place in 90° steps around the x-axis, the y-axis and the z-axis
* See how the best compares to the crystal structure (in place RMSD)
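
As a starting point for the bookkeeping only (the chemistry is still yours to write), here is one possible scaffold: a `dataclass` per pose and an exhaustive list of quarter-turn angle triples. The names `PoseRecord` and `best_pose` are made up, and the energies are left as NaN until you compute them.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass
class PoseRecord:
    """One trial pose: which conformer, how it was rotated and how it scored."""
    conformer_id: int
    angles: Tuple[float, float, float]  # rotation around x, y and z, in radians
    energy: float = math.nan            # to be filled in with ff.CalcEnergy()
    rmsd: float = math.nan              # to be filled in with the in place RMSD


quarter_turns = [i * math.pi / 2 for i in range(4)]  # 0°, 90°, 180°, 270°
n_conformers = 2  # placeholder: use however many conformers you generated

records: List[PoseRecord] = [PoseRecord(conformer_id=c, angles=angles)
                             for c in range(n_conformers)
                             for angles in product(quarter_turns, repeat=3)]
print(f'{len(records)} poses to evaluate')


def best_pose(records: List[PoseRecord]) -> PoseRecord:
    """Lowest-energy pose among those that have been scored."""
    scored = [r for r in records if not math.isnan(r.energy)]
    if not scored:
        raise ValueError('no pose has been scored yet')
    return min(scored, key=lambda r: r.energy)
```

Several of the 64 angle triples per conformer give the same orientation (there are only 24 distinct ones), which is wasteful but harmless for a crude search.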

In [None]:
# First we need to double-check that MMFF works in place, and see what happens if we change the conformer after initialisation, which could save us time.
# Did the internal energy get worse (positive)?

from rdkit.Chem import AllChem
from rdkit import Chem, Geometry

mol = Chem.MolFromSmiles('CCO')

AllChem.EmbedMolecule(mol)

p: AllChem.ForceField.MMFFMolProperties = AllChem.MMFFGetMoleculeProperties(mol, 'MMFF94')
ff: AllChem.ForceField.ForceField = AllChem.MMFFGetMoleculeForceField(mol, p)
ff.Initialize()
print('Before... ', ff.CalcEnergy())
mol.GetConformer().SetAtomPosition(0, Geometry.Point3D(10,0,0))
print('After... ', ff.CalcEnergy())

Notice anything?

The energy was not negative. `AllChem.MMFFOptimizeMoleculeConfs` could be useful here on the initial compound.
(Don't do it on the combined neighbourhood as this defeats the purpose of the exercise)

RDKit complained that the molecule has no hydrogens and suggested `AllChem.AddHs` (rightfully so!). RDKit and MM in general will treat impossible chemistry, like a triple radical, as valid, but the partial charges will be wrong along with the atom type allocation.
If you know that RDKit is wrong, you can silence it:

```python
from rdkit import RDLogger
logger = RDLogger.logger()
logger.setLevel(RDLogger.CRITICAL)
```

# Let's do it!

Damn. Something went wrong and _you_ will have to do the manual docking.
Remember Google, StackOverflow and ChatGPT exist, but so does the `help` function,
and asking!
_Enjoy!_

In [None]:
👾👾👾
👾👾👾
👾👾👾

# Concluding question

The above was an exhaustive _crude_ search. How would you do a more intelligent search?
(cryptic hint: what would Andrey Markov do in a Monégasque casino?)

👾👾👾