For the analysis, we will use fragment screen data from [Fragalysis](https://fragalysis.diamond.ac.uk/),
the web app that provides an interface to the various datasets from XChem, Prof. Frank von Delft's group at Diamond.
To see which dataset is which, consult [this table](https://github.com/matteoferla/munged-Fragalysis-targets/blob/main/targets.md).
In this practical we will only use it as a data source, but you are welcome to explore it;
you will be shown it properly during the Diamond visit.
Additionally, a key idea is that fragment binding sites are by no means equally important to a researcher:
for example, designing an inhibitor for an enzyme requires knowing where and how catalysis occurs.
This is also beyond the scope of this practical, but worth keeping in mind.

## Jupyter basics
A notebook is composed of 'cells'.
This is a Markdown cell, which is for text; the next one is for code.
Code cells can be run by pressing the cell's play button or with Shift+Enter.
You can freely edit any cell, as you will need to do in the cells below.

You have several varieties of notebooks:

* (original, a.k.a. vanilla) Jupyter Notebook, runs on your machine
* JupyterLab, a newer interface to the above
* Colab, runs on Google Cloud
* JupyterHub, the multi-user version of Jupyter

Not sure how to use a Python object? Make a new cell and type `help(🤖)`, where 🤖 is the object.
(No, you cannot use emoji as variable names in Python.)
In this notebook, the imported module `dtc` has a function, `dtc.show_source(🤖)`, that shows the source code of 🤖 in colour.
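`dtc.show_source` is this practical's own helper and its internals are not shown here. As a rough plain-text equivalent, the standard library offers `pydoc.render_doc`, which returns the same text that `help()` prints, and `inspect.getsource`, which returns the source of pure-Python objects. A minimal sketch:

```python
import inspect
import pydoc

# help(len) prints this text; render_doc returns it as a string instead
doc_text: str = pydoc.render_doc(len, renderer=pydoc.plaintext)
print(doc_text)

# inspect.getsource only works for objects written in Python:
# it raises TypeError for built-ins such as len, which are implemented in C
source: str = inspect.getsource(pydoc.render_doc)
print(source)
```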

In [None]:
#@title Installation
#@markdown Press the play button on the top right hand side of this cell
#@markdown once you have checked the settings.
#@markdown You will be notified that this notebook is not from Google, that is normal.

## Install all requirements and get some goodies
!pip install git+https://github.com/matteoferla/DTC-compchem-practical.git
# this will be called as:
# import DTC_compchem_practical as dtc

## In JupyterLab, use `trident-chemwidgets` instead of this JSME hack
!pip install git+https://github.com/matteoferla/JSME_notebook_hack.git
!pip install --upgrade plotly

from google.colab import output  # noqa (Colab-specific module)
# allow third-party widgets (e.g. the NGLView 3D viewer) to render in Colab
output.enable_custom_widget_manager()

In [None]:
#@title RDKit
#@markdown Let's play with a molecule.
#@markdown Go to Wikipedia,
#@markdown search for a molecule and copy its SMILES string from the infobox.

mol_name: str =  '👾👾👾'   #@param {type:"string"}
smiles: str =  '👾👾👾'   #@param {type:"string"}
#@markdown **Q**: what is 'RDKit'?
#@markdown **Q**: what is a 'SMILES string'?
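Two behaviours of RDKit's SMILES parser are worth knowing before the next cell. The sketch below uses aspirin's SMILES as an arbitrary example: hydrogens are implicit, so `GetNumAtoms()` counts heavy atoms only, and `Chem.MolFromSmiles` returns `None` instead of raising an error when a SMILES cannot be parsed (so a forgotten 👾👾👾 placeholder shows up later as an `AttributeError`).

```python
from rdkit import Chem

aspirin = Chem.MolFromSmiles('CC(=O)Oc1ccccc1C(=O)O')
# Hydrogens are implicit, so only the 13 heavy atoms (9 C + 4 O) are counted
print(aspirin.GetNumAtoms())
print(Chem.AddHs(aspirin).GetNumAtoms())  # 21 once the 8 hydrogens are made explicit
# Canonical SMILES: for a given RDKit version, the same molecule gives the same string
print(Chem.MolToSmiles(aspirin))

# Ring bond '1' is opened but never closed: RDKit logs an error and returns None
broken = Chem.MolFromSmiles('C1CC')
print(broken is None)
```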

In [None]:
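#@markdown Let's load it into RDKit:
from IPython.display import display
from rdkit import Chem

# Importing IPythonConsole enables inline drawing of molecules; RDKit can misbehave in Colab without it
from rdkit.Chem.Draw import IPythonConsole

mol: Chem.Mol = Chem.MolFromSmiles(smiles)
if mol is None:
    # MolFromSmiles returns None (rather than raising) when the SMILES is invalid
    raise ValueError(f'Could not parse the SMILES {smiles!r}: did you replace the 👾👾👾 placeholder?')
mol.SetProp('_Name', mol_name)
display(mol)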

In [None]:
#@markdown Let's look at its partial charges
from rdkit.Chem.Draw import SimilarityMaps
from rdkit.Chem import AllChem
#@markdown via `rdkit.Chem.AllChem.ComputeGasteigerCharges`
AllChem.ComputeGasteigerCharges(mol)
contribs = [a.GetDoubleProp('_GasteigerCharge') for a in mol.GetAtoms()]
#@markdown PS. If you dislike the colours,
#@markdown choose a different [colorMap](https://matplotlib.org/stable/tutorials/colors/colormaps.html)
fig = SimilarityMaps.GetSimilarityMapFromWeights(mol, contribs, colorMap='jet', contourLines=10)

display(fig)
#@markdown **Q**: In addition to Marsili-Gasteiger partial charges, there is another form of partial charges, what is it?
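A quick sanity check on Gasteiger charges, using acetic acid as an arbitrary example. Hydrogens are added explicitly here so that each atom carries its own charge (with implicit hydrogens, RDKit stores their share on the heavy atom under `_GasteigerHCharge`, which the cell above does not read). Because the scheme only moves charge between bonded atoms, the charges of a neutral molecule should sum to roughly zero.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Explicit hydrogens, so each H carries its own Gasteiger charge
acetic_acid = Chem.AddHs(Chem.MolFromSmiles('CC(=O)O'))
AllChem.ComputeGasteigerCharges(acetic_acid)

# label each atom as element + index, e.g. 'O2'
charges = {f'{atom.GetSymbol()}{atom.GetIdx()}': atom.GetDoubleProp('_GasteigerCharge')
           for atom in acetic_acid.GetAtoms()}
for label, q in charges.items():
    print(f'{label:>4}: {q:+.3f}')

# Charge is only shifted between bonded atoms, so a neutral molecule sums to about zero
total = sum(charges.values())
print(f'total: {total:+.6f}')
```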

In [None]:
## Conformer generation
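Conformer generation embeds 3D coordinates into a molecule. In RDKit a single `Chem.Mol` can hold several `Chem.Conformer` objects for the same atoms. The sketch below, with butane as an arbitrary flexible example and a fixed `randomSeed` for reproducibility, embeds five of them with `AllChem.EmbedMultipleConfs`.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Butane is flexible, so its conformers genuinely differ (C-C-C-C torsion)
butane = Chem.AddHs(Chem.MolFromSmiles('CCCC'))
# randomSeed makes the embedding reproducible; each id indexes one Chem.Conformer
conf_ids = list(AllChem.EmbedMultipleConfs(butane, numConfs=5, randomSeed=42))

print(butane.GetNumAtoms(), 'atoms,', butane.GetNumConformers(), 'conformers')
for cid in conf_ids:
    # the same atom (index 0) has different coordinates in each conformer
    pos = butane.GetConformer(cid).GetAtomPosition(0)
    print(cid, f'({pos.x:.2f}, {pos.y:.2f}, {pos.z:.2f})')
```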

In [None]:
#@markdown Test: distort a molecule, then minimise it with a constrained MMFF force field
import nglview as nv
from io import StringIO
from rdkit.Chem import rdMolAlign
from rdkit.Geometry import Point3D


def shift_atom_x(conf: Chem.Conformer, atom_idx: int = 0, x_offset: float = 0, y_offset: float = 0, z_offset: float = 0):
    """
    Shifts the atom indexed ``atom_idx`` in ``conf`` by ``(x_offset, y_offset, z_offset)`` Å
    (despite the name, it can shift along any axis).
    """
    p: Point3D = conf.GetAtomPosition(atom_idx)
    new_p: Point3D = Point3D(p.x + x_offset, p.y + y_offset, p.z + z_offset)
    conf.SetAtomPosition(atom_idx, new_p)

# make 3D: add explicit hydrogens and embed 3D coordinates
mol2 = AllChem.AddHs(mol)
if AllChem.EmbedMolecule(mol2) == -1:
    raise ValueError('Could not embed the molecule in 3D')
# keep an undistorted copy to compare against at the end
ref = Chem.Mol(mol2)
conf = mol2.GetConformer()

#@markdown Run this cell a few times, changing the shifts and the constraints!
# Please tinker with these values:
shift_atom_x(conf, atom_idx=0, x_offset=2)
shift_atom_x(conf, atom_idx=1, y_offset=1)
# atoms flagged 'Fixed' get a position restraint below (here: atom 0, around its shifted position)
for idx in (0,):
    mol2.GetAtomWithIdx(idx).SetBoolProp('Fixed', True)


## ----------------------------------------------------------------------------

Chem.SanitizeMol(mol2)

p = AllChem.MMFFGetMoleculeProperties(mol2, 'MMFF94')
if p is None:
    raise ValueError('MMFF cannot work on a molecule that has errors!')

ff = AllChem.MMFFGetMoleculeForceField(mol2, p)
# restrain the atoms of mol2 (not mol, which has no 3D coordinates) flagged as 'Fixed'
from rdkit.Chem import rdqueries
for atom in mol2.GetAtomsMatchingQuery(rdqueries.HasPropQueryAtom('Fixed', negate=False)):
    i = atom.GetIdx()
    # flat-bottomed restraint: free within 2 Å, force constant 10 kcal/(mol·Å²) beyond that
    ff.MMFFAddPositionConstraint(i, 2, 10)
pre: float = ff.CalcEnergy()
outcomes = {-1: 'MMFF minimisation could not be started',
            0: 'MMFF minimisation converged',
            1: 'MMFF minimisation was run, but did not converge (more iterations needed)'}
try:
    m: int = ff.Minimize()
    print(outcomes.get(m, "Iä! Iä! Cthulhu fhtagn! Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn"))
except RuntimeError as error:
    print(f'MMFF minimisation failed {error.__class__.__name__}: {error}')

post: float = ff.CalcEnergy()
print(f'MMFF energy went from {pre:.1f} to {post:.1f} kcal/mol')
# superpose the minimised molecule onto the undistorted copy (same atoms, same order)
rdMolAlign.AlignMol(mol2, ref)
view = nv.show_rdkit(ref)
fh = StringIO(Chem.MolToMolBlock(mol2))
view.add_component(fh, ext='mol')

display(view)
#@markdown **Q**: Why is there a drop in Gibbs free energy (a potential)?
#@markdown **Q**: How does this relate to entropy of the system (think chelate effect)?
#@markdown **Q**: What does `AllChem.EmbedMolecule` do?
#@markdown **Q**: What is the difference between a Chem.Mol instance and its Chem.Conformer(s)? Why does the latter have atomic positions?
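The cell above builds the MMFF force field by hand so that restraints can be added. For plain, unrestrained minimisation RDKit also has one-call helpers, `AllChem.MMFFOptimizeMolecule` (one conformer; returns 0 if converged, 1 if more iterations are needed, -1 if the force field could not be set up) and `AllChem.MMFFOptimizeMoleculeConfs` (all conformers). The sketch below uses pentane as an arbitrary example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

pentane = Chem.AddHs(Chem.MolFromSmiles('CCCCC'))
AllChem.EmbedMultipleConfs(pentane, numConfs=5, randomSeed=42)

# Single conformer: returns a status code rather than an energy
status = AllChem.MMFFOptimizeMolecule(pentane, mmffVariant='MMFF94', maxIters=500, confId=0)
print('conformer 0 status:', status)

# Whole ensemble: one (not_converged, energy in kcal/mol) tuple per conformer
results = AllChem.MMFFOptimizeMoleculeConfs(pentane, maxIters=500)
for conf_id, (not_converged, energy) in enumerate(results):
    print(conf_id, 'not converged' if not_converged else 'converged', f'{energy:.2f} kcal/mol')
```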

In [None]:
#@title Download off Fragalysis
#@markdown Choose a target
target_name = '👾👾👾'   #@param {type:"string"}

from rdkit import Chem
from IPython.display import display
from typing import Dict
import DTC_compchem_practical as dtc

#@markdown This will add the variables `pdb_filename`, `metadata_filename` and `sdf_filename`.
filenames: Dict[str, str] = dtc.download_fragalysis(target_name, 'input')
pdb_filename: str = filenames['reference.pdb']
metadata_filename: str = filenames['metadata.csv']
sdf_filename: str = filenames['combined.sdf']

In [None]:
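import io
import nglview as nv
import numpy as np
from rdkit import Chem
from rdkit.Chem import PandasTools

# Assumed: the combined SDF holds one record per fragment pose.
# LoadSDF puts the molecules in the 'ROMol' column and SD tags in further columns.
mol_df = PandasTools.LoadSDF(sdf_filename)
with open(pdb_filename) as pdb_fh:
    pdb_block: str = pdb_fh.read()

max_show = 100
df = mol_df
#df = mol_df.drop_duplicates('site_name')  # filter if wanted (needs a 'site_name' column)

view = nv.NGLWidget()

view.add_component(io.StringIO(pdb_block), ext='pdb')
for frag in df.ROMol.head(max_show):
    fh = io.StringIO(Chem.MolToMolBlock(frag))
    view.add_component(fh, ext='mol')
#view.control.zoom(1.)
view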

In [None]:
import numpy as np
import plotly.graph_objects as go
from rdkit.Chem import rdmolops
from misc_funs import calc_distances

# filter if wanted
df = mol_df
#df = mol_df.drop_duplicates('site_name')

# misc_funs.distance_heatmap(df) is currently broken,
# so the heatmap is drawn directly with Plotly below
#from misc_funs import distance_heatmap
#distance_heatmap( df )

# addCoords=True places the new hydrogens sensibly instead of at the origin
mols = df.ROMol.apply(lambda m: rdmolops.AddHs(m, addCoords=True))
names = mols.apply(lambda m: m.GetProp('_Name')).tolist()
matrix: np.ndarray = calc_distances(mols)
go.Figure(data=go.Heatmap(x=names,
                          y=names,
                          z=matrix,
                          colorscale='hot'
                          )
          )
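What `misc_funs.calc_distances` measures is not shown in this notebook. As a hypothetical stand-in, the sketch below builds a symmetric matrix of RDKit shape Tanimoto distances (`rdShapeHelpers.ShapeTanimotoDist`), which compares the volumes of two molecules in their current coordinates without re-aligning them. That is only meaningful for Fragalysis poses under the assumption that they share a common reference frame; `shape_distance_matrix` is an illustrative name, not part of the practical's code.

```python
from typing import Callable, Sequence

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, rdShapeHelpers


def shape_distance_matrix(
    mols: Sequence[Chem.Mol],
    dist_fn: Callable[[Chem.Mol, Chem.Mol], float] = rdShapeHelpers.ShapeTanimotoDist,
) -> np.ndarray:
    """Hypothetical helper: symmetric matrix of pairwise distances between 3D molecules."""
    mols = list(mols)  # accept a pandas Series whatever its index
    n = len(mols)
    matrix = np.zeros((n, n))  # the diagonal stays 0: a pose is identical to itself
    for i in range(n):
        for j in range(i + 1, n):
            # compute each pair once and mirror it, so the matrix is exactly symmetric
            matrix[i, j] = matrix[j, i] = dist_fn(mols[i], mols[j])
    return matrix


# Toy usage: two small molecules embedded in 3D (not Fragalysis poses)
toy_mols = []
for smi in ('CCO', 'c1ccccc1'):
    m = Chem.AddHs(Chem.MolFromSmiles(smi))
    AllChem.EmbedMolecule(m, randomSeed=42)
    toy_mols.append(m)
toy_matrix = shape_distance_matrix(toy_mols)
print(toy_matrix)
```

With the poses loaded above, `shape_distance_matrix(mols)` could be passed as `z` to the Plotly heatmap in place of `matrix`.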



## Questions

* How many small molecules are there?
* How many sites?
* If you have a dimer, what do you see as a problem?
* What data would you like to see in the above table and why?
* The above simply gets the molecular replacement template as the target. Is that wise?
* The table has a `site_name` column. What would be a good approach to choose what sites to focus on?
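For the first two questions, a minimal pandas sketch, assuming the loaded table (e.g. `mol_df`) has the `site_name` column mentioned above. `summarise_sites` is a hypothetical helper and the toy table is for illustration only, not real Fragalysis data. Note that it counts rows (poses), which is not necessarily the number of distinct molecules.

```python
import pandas as pd


def summarise_sites(df: pd.DataFrame, site_col: str = 'site_name') -> pd.Series:
    """Hypothetical helper: number of poses per site, most populated site first."""
    # value_counts() drops missing values and sorts in descending order by default
    return df[site_col].value_counts()


# Toy table for illustration only (not real Fragalysis data)
toy = pd.DataFrame({'ID': ['x0001', 'x0002', 'x0003', 'x0004'],
                    'site_name': ['site_A', 'site_A', 'site_B', 'site_A']})
counts = summarise_sites(toy)
print(f'{len(toy)} poses across {counts.size} sites')
print(counts)
```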