## Bonus
This section has likely broken with newer versions of Fragalysis.
If it has, see [the 2021 XChem practical](https://github.com/xchem/strucbio_practical) instead.

## Visit Fragalysis
[Fragalysis](https://fragalysis.diamond.ac.uk/) is a site run by XChem @ Diamond that displays fragment screening data publicly and interactively. We will visit the synchrotron next Thursday.

For the next part you will need to choose a target.

In [None]:
#@markdown Let's look at a summary of the targets from
#@markdown https://github.com/matteoferla/munged-Fragalysis-targets/blob/main/targets.csv
#@markdown Note that many viruses express their proteins as a single polyprotein that is then cleaved into the individual proteins.
import requests
from io import StringIO
import pandas as pd

response: requests.Response = requests.get('https://github.com/matteoferla/munged-Fragalysis-targets/raw/main/targets.csv')
response.raise_for_status()

targets: pd.DataFrame = pd.read_csv( StringIO(response.text) )

In [None]:
#@markdown Let's find the name of the column holding the number of hits, so we can sort by it in pandas.
#@markdown (Note: pandas and PanDDA are different things)
col_name: str = '👾👾👾' #@param {type:"string"}
if local_debug:
    col_name = 'N_hits'

targets.sort_values(col_name, ascending=False)
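
If the column name is not obvious, listing the table's columns is a quick way to find it. The sketch below uses a small made-up table in place of `targets`: the target names and values are invented for illustration, and only the `N_hits` name is borrowed from the debug branch above.

```python
import pandas as pd

# Hypothetical stand-in for the `targets` table: names and numbers are invented
toy_targets = pd.DataFrame({
    'target': ['TargetA', 'TargetB', 'TargetC'],
    'N_hits': [12, 87, 3],
})

# List the available column names to spot the one to sort on
column_names = toy_targets.columns.tolist()
print(column_names)

# Sort by that column, largest first
sorted_targets = toy_targets.sort_values('N_hits', ascending=False)
print(sorted_targets)
```

On the real table, `targets.columns.tolist()` does the same job.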

In [None]:
#@title Download off Fragalysis
#@markdown Choose a target
target_name = '👾👾👾'   #@param {type:"string"}

from rdkit import Chem
from IPython.display import display
from typing import Dict
import DTC_compchem_practical as dtc

#@markdown This will add the variables `pdb_filename`, `metadata_filename` and `sdf_filename`.
filenames: Dict[str, str] = dtc.download_fragalysis(target_name, 'input')
pdb_filename: str = filenames['reference.pdb']
metadata_filename: str = filenames['metadata.csv']
sdf_filename: str = filenames['combined.sdf']

In [None]:
import nglview as nv
import io
import numpy as np
import DTC_compchem_practical as dtc

max_show = 50
mols = list(Chem.SDMolSupplier(sdf_filename))[:max_show]
dtc.display_mols(mols)

# ----------------------------------

view = nv.NGLWidget()

view.add_component(pdb_filename, ext='pdb')
for mol in mols:
    fh = io.StringIO(Chem.MolToMolBlock(mol))
    view.add_component(fh, ext='mol')
#view.control.zoom(1.)
view

In [None]:
from rdkit.Chem import PandasTools, rdShapeHelpers, rdmolops, Descriptors
import plotly.graph_objects as go

# Index the molecules by name so the heatmap axes can be labelled
mol_series = pd.Series({mol.GetProp('_Name'): mol for mol in mols})
matrix: np.ndarray = dtc.calc_distance_heatmap(mol_series.apply(rdmolops.AddHs))
# `mols` is a plain list (no `.apply`), so take the names from the Series index
names = mol_series.index.tolist()
go.Figure(data=go.Heatmap(
                            x = names,
                            y = names,
                            z = matrix,
                            colorscale = 'hot'
                        )
         )
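
What `dtc.calc_distance_heatmap` computes is not shown here. As a rough illustration of what a pairwise distance matrix between molecules can look like, the sketch below uses RDKit's shape Tanimoto distance. This is an assumption and not necessarily what the helper does. The SMILES are arbitrary examples, and the conformers are generated rather than crystallographic, so the values say nothing about real binding poses.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem, rdShapeHelpers

# Arbitrary example molecules (not from Fragalysis)
smiles = {'benzene': 'c1ccccc1', 'phenol': 'Oc1ccccc1', 'ethanol': 'CCO'}

toy_mols = {}
for name, smi in smiles.items():
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    # Generate a 3D conformer; fragment hits read from an SDF already have coordinates
    AllChem.EmbedMolecule(mol, randomSeed=42)
    toy_mols[name] = mol

names = list(toy_mols)
n = len(names)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        # Shape Tanimoto distance: 0 = identical occupied volume, 1 = no overlap
        d = rdShapeHelpers.ShapeTanimotoDist(toy_mols[names[i]], toy_mols[names[j]])
        dist[i, j] = dist[j, i] = d
print(dist)
```

Only the upper triangle is computed and then mirrored, since the distance between a pair does not depend on their order and the diagonal is zero by definition.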

In [None]:
## Questions

> How many small molecules are there?

👾👾👾

> How many sites?

👾👾👾

> If you have a dimer, what do you see as a problem?

👾👾👾

> What data would you like to see in the above table and why?

👾👾👾


> What does Google say the RDKit command to do so is? (Remember that with a `pd.Series` you have the `apply` method, e.g. `df.ROMol.apply(Descriptors.ExactMolWt)`.)

👾👾👾
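
The `apply` pattern from the hint can be tried on a toy `pd.Series` of molecules. This is only a sketch: the SMILES are arbitrary examples, and `Descriptors.ExactMolWt` is simply the descriptor already named in the hint, not necessarily the one you want.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors

# Arbitrary example molecules, keyed by name
toy_series = pd.Series({
    'ethanol': Chem.MolFromSmiles('CCO'),
    'benzene': Chem.MolFromSmiles('c1ccccc1'),
})

# apply calls the descriptor function on each molecule in turn
exact_weights = toy_series.apply(Descriptors.ExactMolWt)
print(exact_weights)
```

Any function that takes a single molecule and returns a value can be swapped in for `Descriptors.ExactMolWt`.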

> The above simply gets the molecular replacement template as the target. Is that wise?

👾👾👾

> The table has a `site_name` column. What would be a good approach to choose which sites to focus on? (Not for now, i.e. remember the adage 'a week in the lab saves you an hour in the library').

👾👾👾