# Docking basics

This is a notebook intended to be run in Colab. This is notebook 2.

1. Intro to RDKit: [![colab demo](https://img.shields.io/badge/Run_RDKit_intro-f9ab00?logo=googlecolab)](https://colab.research.google.com/github/matteoferla/DTC-compchem-practical/blob/main/1-basics.ipynb) — Overview of RDKit functionality
2. Intro to Forcefields & docking: [![colab demo](https://img.shields.io/badge/Run_Docking_intro-f9ab00?logo=googlecolab)](https://colab.research.google.com/github/matteoferla/DTC-compchem-practical/blob/main/2-redocking.ipynb) — Overview of forcefields in PyRosetta and redocking
3. Merging: [![colab demo](https://img.shields.io/badge/Run_RDKit_intro-f9ab00?logo=googlecolab)](https://colab.research.google.com/github/matteoferla/DTC-compchem-practical/blob/main/3-merging.ipynb) — Example of merging

## Overview
In this notebook we will use PyRosetta to glean the basics of forcefields and understand what happens in a docking protocol.
Finally we will dock small molecules with known bound structure in order to compare the results with the empirical data.

For the analysis, we will use fragment screen data from [Fragalysis](https://fragalysis.diamond.ac.uk/),
the app that provides an interface to the various datasets from XChem, Prof. Frank von Delft's group at Diamond.
To find out which target is which, consult [this table](https://github.com/matteoferla/munged-Fragalysis-targets/blob/main/targets.md).
In this practical we will only use Fragalysis as a data source, but you are welcome to explore it.
You will be shown it properly during the Diamond visit.
Additionally, a key idea is that fragment binding sites are in no way of equal importance to a researcher:
for example, designing an inhibitor for an enzyme requires knowledge of where and how catalysis occurs.
This is also beyond the scope of this practical, but worth keeping in mind.

In [None]:
#@title Installation
local_debug = True
if local_debug:
    raise Exception('CURRENTLY IN DEBUG MODE.... REMEMBER TO CLEAR ALL CELLS!')
#@markdown Press the play button on the top right hand side of this cell
#@markdown once you have checked the settings.
#@markdown You will be notified that this notebook is not from Google, that is normal.

## Install all requirements and get some goodies
!pip install git+https://github.com/matteoferla/DTC-compchem-practical.git
# this will be called as:
# import DTC_compchem_practical as dtc

## Jupyter lab? use `trident-chemwidgets`
!pip install git+https://github.com/matteoferla/JSME_notebook_hack.git
!pip install --upgrade plotly

# The next line is only valid for today without the Odin+Eduroam network
# ie. your IP address is one of these https://help.it.ox.ac.uk/ip-addresses#collapse2202811
!pip install https://www.stats.ox.ac.uk/~ferla/pyrosetta-2022.46+release.f0c6fca0e2f-cp39-cp39-linux_x86_64.whl
# Normally you have different ways of installing pyrosetta, e.g.
# pip install pyrosetta_help
# PYROSETTA_USERNAME=👾👾👾 PYROSETTA_PASSWORD=👾👾👾 install_pyrosetta

from google.colab import output  # noqa (It's a colaboratory specific repo)
output.enable_custom_widget_manager()

In [None]:
#@title Download off Fragalysis
#@markdown Choose a target
target_name = '👾👾👾'   #@param {type:"string"}
if local_debug:
    target_name = 'MID2A'

from rdkit import Chem
from IPython.display import display
from typing import Dict
import DTC_compchem_practical as dtc

#@markdown This will add the variables `pdb_filename`, `metadata_filename` and `sdf_filename`.
filenames: Dict[str, str] = dtc.download_fragalysis(target_name, 'input')
pdb_filename: str = filenames['reference.pdb']
metadata_filename: str = filenames['metadata.csv']
sdf_filename: str = filenames['combined.sdf']

In [None]:
#@title Make an apo structure
#@markdown Next we crudely remove HETATM record lines to get an apo structure.
#@markdown [PDB file format](https://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html)
#@markdown **Q**: what is stored in a HETATM?
#@markdown **Q**: what is an "apo structure"?
#@markdown This is a quick, but not great, approach.
#@markdown **Q**: Why is this so, and what could be done to fix it?
from io import StringIO
with open(pdb_filename) as fh:
    pdb_block:str = fh.read()

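# the record type is at the start of each line, so test the prefix rather than any substring
apo_block = '\n'.join(line for line in pdb_block.split('\n') if not line.startswith('HETATM'))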

with open(f'input/{target_name}_reference.clean.pdb', 'w') as fh:
    fh.write(apo_block)

# This is w/o ligand
import nglview as nv

view = nv.NGLWidget()
# change `apo_block` to `pdb_block` for the original:
view.add_component(StringIO(apo_block), ext='pdb')
view
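
The PDB format is fixed-width: the record type sits in columns 1-6 and the residue name in columns 18-20. As a minimal sketch of a slightly less crude filter, the function below drops `HETATM` lines by record type while optionally keeping chosen residue names such as waters. The function name `strip_hetatm`, its `keep_resnames` parameter and the tiny hand-written PDB fragment are all illustrative assumptions, not part of `DTC_compchem_practical`.

```python
from collections import Counter

# Hand-written, simplified PDB fragment for illustration only
# (it is NOT taken from the downloaded target).
PDB_BLOCK = """\
REMARK   1 HETATM records hold non-polymer atoms
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
HETATM    3  O   HOH A 101       5.000   5.000   5.000  1.00  0.00           O
HETATM    4  C1  LIG A 201       1.000   2.000   3.000  1.00  0.00           C
"""


def strip_hetatm(pdb_block: str, keep_resnames: frozenset = frozenset()) -> str:
    """Remove HETATM lines unless their residue name is in `keep_resnames`."""
    kept = []
    for line in pdb_block.splitlines():
        record = line[0:6].strip()     # columns 1-6: record type
        resname = line[17:20].strip()  # columns 18-20: residue name
        if record == 'HETATM' and resname not in keep_resnames:
            continue
        kept.append(line)
    return '\n'.join(kept)


apo = strip_hetatm(PDB_BLOCK)
apo_with_water = strip_hetatm(PDB_BLOCK, keep_resnames=frozenset({'HOH'}))

# Count the record types that survive when waters are kept
print(Counter(line[0:6].strip() for line in apo_with_water.splitlines()))
```

Note that the `REMARK` line survives even though it contains the word HETATM, because only the record-type columns are inspected.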

In [None]:
#@title Make a combined table
#@markdown Fragalysis does not give attributes in the sdf entries. These are instead stored in `metadata.csv`.

from rdkit import Chem
from rdkit.Chem import PandasTools
import pandas as pd

mol_df = pd.concat([PandasTools.LoadSDF(sdf_filename).set_index('ID'),
                       pd.read_csv(metadata_filename, index_col=0).set_index('crystal_name')
                      ], axis=1)
mol_df.to_pickle(f'input/{target_name}_df.p')

mol_df

In [None]:
#@title Initialise PyRosetta
import pyrosetta, logging
import pyrosetta_help as ph

import types
prn: types.ModuleType = pyrosetta.rosetta.numeric
prc: types.ModuleType = pyrosetta.rosetta.core
prp: types.ModuleType = pyrosetta.rosetta.protocols


# capture to log
logger = ph.configure_logger()
logger.handlers[0].setLevel(logging.ERROR)  # logging.ERROR = 40
extra_options = ph.make_option_string(no_optH=False,
                                      ex1=None,
                                      ex2=None,
                                      mute='all',
                                      ignore_unrecognized_res=False,
                                      load_PDB_components=False,
                                      ignore_waters=True)
pyrosetta.init(extra_options=extra_options)

pose = pyrosetta.Pose()
pyrosetta.rosetta.core.import_pose.pose_from_pdbstring(pose, apo_block)

In [None]:
#@title Residue Topology
#@markdown As seen previously, a molecule is a graph where the nodes (atoms) may be connected by edges (bonds),
#@markdown and each node/atom has a partial charge property.
#@markdown In molecular mechanics, blocks of atoms are called 'residues', be they ligands or polymer units.
#@markdown For several algorithms, such as those using forcefields, a residue needs to be "prepared"
#@markdown by specifying how it bonds and what its charges are. AutoDock uses the pdbqt format, which extends the PDB format with partial charges and atom types,
#@markdown while other tools have different formats. Rosetta has `.params` files, which add atom types and the relationship between
#@markdown atoms in dihedral (internal-coordinate) space, not Cartesian space.
#@markdown A residue type / topology is the universal definition of a residue, not a specific residue.

#@markdown This cell outputs the params file for the molecule 'CO' (methanol).
#@markdown The format is specific to this toolkit, but the idea is common:
#@markdown for each atom you need an atom name, a partial charge and... an 'atom type'.
#@markdown An atom type combines element, hybridisation, VdW radius, etc. Like a residue type, it is a universal definition and not a specific atom.
from rdkit_to_params import Params

topo = Params.from_smiles('CO', name='LIG')
display(topo)

#@markdown **Q**: Why does the partial charge reside in an atom of a residue type not an atom type?
#@markdown **Q**: Why is bond order often absent in residue types/topologies?
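
The split between atom types and residue types can be sketched with plain dataclasses. This is only a toy data model: the class names, the atom type names and every number below (radii and partial charges) are made up for illustration and do not come from Rosetta or any real forcefield.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AtomType:
    """Universal description shared by many atoms (hypothetical fields)."""
    name: str
    element: str
    lj_radius: float  # in Å, illustrative value only


@dataclass(frozen=True)
class ResidueAtom:
    """An atom as it appears in one particular residue type."""
    name: str
    atom_type: AtomType
    partial_charge: float  # depends on the chemical environment


@dataclass(frozen=True)
class ResidueType:
    """Universal definition of a residue, e.g. 'methanol', not one instance of it."""
    name3: str
    atoms: Tuple[ResidueAtom, ...]

    def net_charge(self) -> float:
        return round(sum(atom.partial_charge for atom in self.atoms), 6)


# Hypothetical atom types with made-up radii
sp3_carbon = AtomType('sp3_carbon', 'C', 2.0)
hydroxyl_oxygen = AtomType('hydroxyl_oxygen', 'O', 1.6)
polar_hydrogen = AtomType('polar_hydrogen', 'H', 0.9)
apolar_hydrogen = AtomType('apolar_hydrogen', 'H', 1.4)

# Methanol with made-up partial charges chosen to sum to zero
methanol = ResidueType('LIG', (
    ResidueAtom('C1', sp3_carbon, 0.12),
    ResidueAtom('O1', hydroxyl_oxygen, -0.66),
    ResidueAtom('HO', polar_hydrogen, 0.42),
    ResidueAtom('H1', apolar_hydrogen, 0.04),
    ResidueAtom('H2', apolar_hydrogen, 0.04),
    ResidueAtom('H3', apolar_hydrogen, 0.04),
))

distinct_atom_types = {atom.atom_type for atom in methanol.atoms}
print(len(methanol.atoms), 'atoms,', len(distinct_atom_types), 'atom types')
print('net charge:', methanol.net_charge())
```

In this sketch the atom type is reused across atoms (and would be reused across residue types), whereas the partial charge is stored per atom of the residue type.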

In [None]:
#@title Atomtype inspection
#@markdown Let's have a gander at what AtomTypes look like
import importlib_resources

print(
    importlib_resources.read_text('pyrosetta.database.chemical.atom_type_sets.fa_standard', 'atom_properties.txt')
)

In [None]:
#@title Forcefields
#@markdown A forcefield is made of various terms. A key one is the [Lennard-Jones term](https://en.wikipedia.org/wiki/Lennard-Jones_potential).
import DTC_compchem_practical as dtc
import numpy as np
import pandas as pd

combined_scores = {}
for offset in np.arange(0,10, 0.1):
    test: pyrosetta.Pose = pyrosetta.pose_from_sequence('Z[NA]')
    xyz = prn.xyzVector_double_t(test.residues[1].xyz(1))
    xyz.x +=offset
    dtc.add_mod_cl(test,
                   gasteiger=-1,
                   xyz = xyz)
    scorefxn = pyrosetta.get_fa_scorefxn()
    scores = {st.name: scorefxn.score_by_scoretype(test, st, True) for st in scorefxn.get_nonzero_weighted_scoretypes()}
    scores['distance'] = (test.residue(1).xyz(1) - test.residue(2).xyz(1)).norm()
    combined_scores[offset] = scores

df = pd.DataFrame.from_dict(combined_scores, orient='index').round(2)
ndf=(df-df.min())/(df.max()-df.min())
#ndf.columns = map(ph.weights.term_meanings, ndf.columns.values)
import plotly.express as px

px.line(df)
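
For comparison with the curves above, here is a minimal sketch of the textbook 12-6 Lennard-Jones potential, E(r) = 4ε[(σ/r)^12 - (σ/r)^6]. It is not how Rosetta computes its scores (its attractive and repulsive terms use a modified form), and the reduced units ε = σ = 1 are an arbitrary choice for illustration.

```python
import math


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Textbook 12-6 Lennard-Jones energy at separation r (same units as sigma)."""
    if r <= 0:
        raise ValueError('distance must be positive')
    sr6 = (sigma / r) ** 6
    # the repulsive term is the square of the attractive one
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


SIGMA = 1.0
EPSILON = 1.0
# The minimum sits at r = 2^(1/6) * sigma, where the energy equals -epsilon
R_MIN = 2 ** (1 / 6) * SIGMA

for r in (0.9, 1.0, R_MIN, 1.5, 2.0, 3.0):
    print(f'r = {r:.3f}  E = {lennard_jones(r, EPSILON, SIGMA):+.4f}')
```

The energy is steeply positive below σ, crosses zero at σ, reaches its minimum of -ε at 2^(1/6)σ and then decays towards zero at long range.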

In [None]:
#@markdown A dictionary of what each score term means
ph.weights.term_meanings

In [None]:
#@title Place molecule
import types
prc: types.ModuleType = pyrosetta.rosetta.core
from io import StringIO

# the first molecule
mol = mol_df.ROMol.iloc[0]  # positional access, as the index holds crystal names
# add it to the pose
# let's pretend by magic:
combined = dtc.add_mol_in_pose(pose, mol )
# don't worry about these few lines,
# except for the 'MC' (Monte Carlo) in `DockMCMProtocol`
lig_i = [i+1 for i, r in enumerate(combined.residues) if r.name3() == 'LIG'][-1]
combined.pdb_info().set_resinfo(res=lig_i, chain_id='B', pdb_res=1)
combined.remove_constraints()
pyrosetta.rosetta.protocols.docking.setup_foldtree(combined, 'A_B', pyrosetta.Vector1([1]))
scorefxn = pyrosetta.create_score_function('ligand')
docking = pyrosetta.rosetta.protocols.docking.DockMCMProtocol()
docking.set_scorefxn(scorefxn)
docking.apply(combined)

# combined

import nglview as nv

view = nv.show_rosetta(combined, color='gainsboro')
view.component0.add_representation('hyperball', '[LIG]', colorValue='#F8766D')
# a file-like object carries no filename, so the format has to be given via `ext`
view.add_component(StringIO(Chem.MolToMolBlock(mol)), ext='mol', colorValue='#00B4C4')
display(view)
#@markdown **Q**: What is a Monte Carlo method? Hint: it is not a method written in Monégasque.
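
As a hedged illustration of the idea, the sketch below runs a generic Metropolis Monte Carlo search on a toy one-dimensional energy function. It does not reproduce the internals of `DockMCMProtocol`; the energy function, step size, temperature and function names are all assumptions picked for the example.

```python
import math
import random


def energy(x: float) -> float:
    """Toy 1-D energy landscape with a single minimum at x = 2."""
    return (x - 2.0) ** 2


def metropolis_accept(delta_e: float, kt: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


def monte_carlo(start: float, n_steps: int = 2000, step_size: float = 0.5,
                kt: float = 0.1, seed: int = 42):
    """Random walk with Metropolis acceptance; returns the best position and energy seen."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = x + rng.uniform(-step_size, step_size)  # random perturbation
        trial_e = energy(trial)
        if metropolis_accept(trial_e - e, kt, rng):
            x, e = trial, trial_e
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = monte_carlo(start=-5.0)
print(f'best x = {best_x:.3f}, best energy = {best_e:.5f}')
```

In a docking setting the random perturbation would be a rigid-body or conformational move and the energy would come from the score function, but the accept/reject logic follows the same pattern.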

In [None]:
#@title Create docking algorithm
#@markdown ...
#@markdown ...
