For the analysis, we will use fragment screen data from [Fragalysis](https://fragalysis.diamond.ac.uk/),
the web app that provides an interface to the various datasets from XChem, Prof. Frank von Delft's group at Diamond.
To find out which target is which, consult [this table](https://github.com/matteoferla/munged-Fragalysis-targets/blob/main/targets.md).
In this practical we will use Fragalysis only as a source of data, but you are welcome to explore it.
You will be shown it properly during the Diamond visit.
Additionally, a key idea is that fragment binding sites are in no way of equal importance to a researcher:
for example, designing an inhibitor for an enzyme requires knowledge of where and how catalysis occurs.
This is also beyond the scope of this practical, but worth keeping in mind.

## Jupyter basics
A notebook is composed of 'cells'.
This is a markdown cell; it is for text.
Code cells can be run by pressing the play button at the top or with Shift+Enter.
You can freely change the code, as you will have to in the next cell.

In [None]:
# What target did you pick?
target_name = 'MID2A'

# =====================================================================
from typing import Dict
from rdkit import Chem
from rdkit.Chem import PandasTools
import pandas as pd
from misc_funs import download_fragalysis  # misc_funs is the local module misc_funs.py

# download all molecules of that target
filenames: Dict[str, str] = download_fragalysis(target_name, 'input')
pdb_filename: str = filenames['reference.pdb']
metadata_filename: str = filenames['metadata.csv']
sdf_filename: str = filenames['combined.sdf']
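# make a combined table
# Fragalysis does not store attributes in the SDF entries: they are in metadata.csv instead.
# The SDF `ID` and the metadata `crystal_name` are used as the shared index,
# so concatenating column-wise (axis=1) lines up the rows of the two tables.

mol_df = pd.concat([PandasTools.LoadSDF(sdf_filename).set_index('ID'),
                    pd.read_csv(metadata_filename, index_col=0).set_index('crystal_name')
                    ], axis=1)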
mol_df.to_pickle(f'input/{target_name}_df.p')
mol_df
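
The column-wise concatenation above relies on pandas aligning rows by index label. A minimal sketch on made-up data shows what happens when the two tables do not list exactly the same crystals. The names `sdf_like`, `meta_like` and the crystal identifiers are hypothetical stand-ins, not real Fragalysis data.

```python
import pandas as pd

# Hypothetical stand-in for the table loaded from the SDF file, indexed by ID
sdf_like = pd.DataFrame({'ID': ['x0001_0A', 'x0002_0A', 'x0003_0A'],
                         'ROMol': ['mol1', 'mol2', 'mol3']}).set_index('ID')

# Hypothetical stand-in for metadata.csv, indexed by crystal_name
meta_like = pd.DataFrame({'crystal_name': ['x0001_0A', 'x0002_0A', 'x0004_0A'],
                          'site_name': ['active site', 'allosteric', 'surface']}
                         ).set_index('crystal_name')

# axis=1 aligns rows on the index; labels missing from one table get NaN there
combined = pd.concat([sdf_like, meta_like], axis=1)

# rows that exist in only one of the two tables
unmatched = combined[combined.isna().any(axis=1)]
print(unmatched)
```

Running the same `isna()` check on `mol_df` is a quick way to see whether any molecule lacks metadata or vice versa.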

In [None]:
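with open(pdb_filename) as fh:
    pdb_block: str = fh.read()

# drop heteroatom records (ligands, waters, ions), keeping only lines
# whose record name is not HETATM
pdb_block = '\n'.join(line for line in pdb_block.split('\n')
                      if not line.startswith('HETATM'))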

with open(f'input/{target_name}_reference.clean.pdb', 'w') as fh:
    fh.write(pdb_block)
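
To see what this cleaning step does, here is a minimal sketch on a made-up PDB block. The coordinates and the helper name `strip_hetatm` are hypothetical. Matching on the record name at the start of the line, rather than anywhere in the line, keeps lines such as a REMARK that merely mentions HETATM.

```python
# Toy PDB block (illustrative values only, not a real structure)
toy_pdb = '\n'.join([
    'REMARK   1 HETATM RECORDS BELOW ARE WATER AND LIGAND',
    'ATOM      1  N   MET A   1      11.104  13.207   9.654  1.00 20.00           N',
    'HETATM    2  O   HOH A 101      15.000  10.000   5.000  1.00 30.00           O',
    'HETATM    3  C1  LIG A 201      12.000  11.000   6.000  1.00 25.00           C',
])


def strip_hetatm(pdb_block: str) -> str:
    """Return the PDB block without any HETATM records."""
    kept = [line for line in pdb_block.split('\n')
            if not line.startswith('HETATM')]
    return '\n'.join(kept)


cleaned = strip_hetatm(toy_pdb)
print(cleaned)
```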

In [None]:
import nglview as nv
import io
import numpy as np

max_show = 100
df = mol_df
#df = mol_df.drop_duplicates('site_name')  # filter if wanted
    
view = nv.NGLWidget()

# the protein, without heteroatoms
view.add_component(io.StringIO(pdb_block), ext='pdb')
# overlay the first `max_show` fragment hits
for mol in df.ROMol[:max_show]:
    fh = io.StringIO(Chem.MolToMolBlock(mol))
    view.add_component(fh, ext='mol')
#view.control.zoom(1.)
view

In [None]:
from misc_funs import calc_distances, distance_heatmap
from rdkit.Chem import PandasTools, rdShapeHelpers, rdmolops, Descriptors
import plotly.graph_objects as go

# filter if wanted
df = mol_df
#df = mol_df.drop_duplicates('site_name')

# this is not working!
#from misc_funs import distance_heatmap
#distance_heatmap( df )

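# addCoords=True places the new hydrogens in 3D instead of at the origin
mols = df.ROMol.apply(lambda m: rdmolops.AddHs(m, addCoords=True))
matrix: np.ndarray = calc_distances(mols)
# molecule names label both axes of the pairwise distance heatmap
names = mols.apply(lambda m: m.GetProp('_Name')).tolist()
go.Figure(data=go.Heatmap(x=names, y=names, z=matrix, colorscale='hot'))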



## Questions

* How many molecules are there?
* How many sites?
* If your target is a dimer, what problem do you see?
* What data would you like to see in the above table and why?
* The code above simply uses the molecular replacement template as the protein structure for the target. Is that wise?
* The table has a `site_name` column. What would be a good approach to choose what sites to focus on?
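
As a starting point for the counting questions, here is a minimal sketch on a made-up table. The name `toy_df` and its values are hypothetical, and only the `site_name` column is taken from the text above. The same calls should work on `mol_df`, assuming its `site_name` column is filled in.

```python
import pandas as pd

# Hypothetical table: one row per fragment hit
toy_df = pd.DataFrame({
    'crystal_name': ['x0001_0A', 'x0002_0A', 'x0003_0A', 'x0004_0A', 'x0005_0A'],
    'site_name': ['active site', 'active site', 'surface', 'active site', 'allosteric'],
})

n_molecules = len(toy_df)                         # number of rows, i.e. hits
n_sites = toy_df['site_name'].nunique()           # number of distinct sites
site_counts = toy_df['site_name'].value_counts()  # hits per site, most populated first

print(n_molecules, n_sites)
print(site_counts)
```

The number of hits per site is only one possible criterion for prioritising sites. As noted in the introduction, where a site lies relative to the functional region of the protein matters too.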