# Drug Design: Relative Binding Energy Prediction

The FXR nuclear receptor forms a heterodimer with RXR when activated, and binds to hormone response elements on DNA, leading to up- or down-regulation of the expression of certain genes. FXR agonists are regarded as potential therapeutics for dyslipidemia and diabetes.

Here, we calculate the relative binding energy of two ligands, FXR_12 and FXR_84, to the FXR nuclear receptor.

## 0. Install additional dependent packages

Building the system requires some additional packages that are not installed by default. If you have already installed them, skip this step.

In [1]:
#!conda install -c rdkit rdkit
#!conda install -c openbabel openbabel
#!pip install XpongeLib
#!pip install pyscf
#!pip install geometric
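
Before uncommenting the install commands, it can help to check which packages are already available. The sketch below is only an illustration: the helper `missing_modules` is hypothetical, and the import names in `REQUIRED_MODULES` are assumed to match the package names used above.

```python
import importlib.util

# Import names assumed to correspond to the packages listed in the cell above.
REQUIRED_MODULES = ["rdkit", "openbabel", "XpongeLib", "pyscf", "geometric"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", missing if missing else "none")
```

Only the packages reported as missing need to be installed.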

## 1. Download the original pdb file and preprocess it manually

Here, the file `1kjyp-AA_2017-1-9_tight.pdb` in this directory was downloaded from [Drug Design Data Resource challenge 2](https://drugdesigndata.org/about/grand-challenge-2/fxr). The pdb file also contains some water molecules, but we only need the protein (lines that start with "ATOM") and the original ligand (lines whose residue name is "LIG"). The commands below also keep the "TER" records with the protein and the "CONECT" records with the ligand.

In [2]:
!cat 1kjyp-AA_2017-1-9_tight.pdb | grep -E "ATOM  |TER  " > protein.pdb

In [3]:
!cat 1kjyp-AA_2017-1-9_tight.pdb | grep -E "CONECT|LIG L" > ligand.pdb
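
If `grep` is not available, the same filtering can be sketched in Python. This is a minimal equivalent of the two commands above, assuming plain substring matching as `grep` does; the function name `split_pdb_lines` and the demo lines are made up for illustration and are not taken from the real pdb file.

```python
def split_pdb_lines(lines):
    """Split PDB lines into protein and ligand records.

    Mirrors the two grep patterns above: substring matches, not column parsing.
    """
    protein_keys = ("ATOM  ", "TER  ")
    ligand_keys = ("CONECT", "LIG L")
    protein = [ln for ln in lines if any(key in ln for key in protein_keys)]
    ligand = [ln for ln in lines if any(key in ln for key in ligand_keys)]
    return protein, ligand


if __name__ == "__main__":
    # Made-up records, only to show which lines end up where.
    demo = [
        "ATOM      1  N   GLU A 248      -5.000  10.000  20.000  1.00 30.00           N",
        "TER    1999      GLU A 470",
        "HETATM 2000  C1  LIG L   1       1.000   2.000   3.000  1.00 20.00           C",
        "HETATM 2100  O   HOH A 500       4.000   5.000   6.000  1.00 40.00           O",
        "CONECT 2000 2001",
    ]
    protein, ligand = split_pdb_lines(demo)
    print(len(protein), "protein lines,", len(ligand), "ligand lines")
```

The water record matches neither pattern, so it is dropped from both outputs.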

## 2. Get the unprocessed mol2 files of two ligands

Here, for each ligand, we

1. use `openbabel` to convert the SMILES string to a mol2 file
2. add hydrogens appropriate for pH 7.0
3. rename the residue


In [4]:
!obabel -:"OC(=O)c1ccc(CN2C(=O)C3(CCN(CC3)S(=O)(=O)c4ccccc4Cl)c5cc(Br)ccc25)cc1" --gen3d -omol2 -O FXR_12.mol2
!obabel FXR_12.mol2 -O FXR_12.mol2 -p 7.0
!sed -i 's/UNL1/LIG/g' FXR_12.mol2
!obabel -:"OC(=O)c1ccc(CN2C(=O)C3(CCN(CC3)S(=O)(=O)c4ccccc4F)c5cc(Br)ccc25)cc1" --gen3d -omol2 -O FXR_84.mol2
!obabel FXR_84.mol2 -O FXR_84.mol2 -p 7.0
!sed -i 's/UNL1/F84/g' FXR_84.mol2

1 molecule converted
1 molecule converted
1 molecule converted
1 molecule converted
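
To see what the mutation between the two ligands is, the two SMILES strings from the commands above can be compared directly. The sketch below is a plain text comparison (common prefix and suffix removed), not a chemical substructure search; the helper `differing_fragment` is hypothetical.

```python
def differing_fragment(a, b):
    """Return what is left of a and b after removing their common prefix and suffix."""
    shortest = min(len(a), len(b))
    prefix = 0
    while prefix < shortest and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < shortest - prefix
           and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]):
        suffix += 1
    return a[prefix:len(a) - suffix], b[prefix:len(b) - suffix]


# SMILES strings copied from the obabel commands above.
FXR_12_SMILES = "OC(=O)c1ccc(CN2C(=O)C3(CCN(CC3)S(=O)(=O)c4ccccc4Cl)c5cc(Br)ccc25)cc1"
FXR_84_SMILES = "OC(=O)c1ccc(CN2C(=O)C3(CCN(CC3)S(=O)(=O)c4ccccc4F)c5cc(Br)ccc25)cc1"

print(differing_fragment(FXR_12_SMILES, FXR_84_SMILES))  # ('Cl', 'F')
```

The two ligands differ only in the halogen on the phenylsulfonyl ring, so the FEP calculation later perturbs a chlorine into a fluorine.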


## 3. Preprocess the mol2 files

Here, for each ligand, we

1. rename the atoms so that every atom has a unique name
2. assign the GAFF atom type for each atom
3. calculate the RESP partial charge

The third step may take a long time because it requires a Hartree-Fock optimization, so here I skip it and directly use the charges from `obabel`.

In [5]:
import mindsponge.toolkits as Xponge
Xponge.source("mindsponge.toolkits.forcefield.amber.gaff")
FXR_12 = Get_Assignment_From_Mol2("FXR_12.mol2")
FXR_12.Add_Index_To_Name()
FXR_12.Determine_Atom_Type("gaff")
eq_atoms = FXR_12.Determine_Equal_Atoms()
#FXR_12.Calculate_Charge("RESP", opt = True, extra_equivalence = eq_atoms, charge = int(round(sum(FXR_12.charge))))
FXR_12_res_type = FXR_12.To_ResidueType("LIG")
Save_Mol2(FXR_12_res_type)

FXR_84 = Get_Assignment_From_Mol2("FXR_84.mol2")
FXR_84.Add_Index_To_Name()
FXR_84.Determine_Atom_Type("gaff")
eq_atoms = FXR_84.Determine_Equal_Atoms()
#FXR_84.Calculate_Charge("RESP", opt = True, extra_equivalence = eq_atoms, charge = -1)
FXR_84_res_type = FXR_84.To_ResidueType("F84")
Save_Mol2(FXR_84_res_type)

Reference for gaff:
  Wang, J., Wolf, R.M., Caldwell, J.W., Kollman, P.A. and Case, D.A.
    Development and testing of a general amber force field.
    Journal of Computational Chemistry 2004 25, 1157-1174
    DOI: 10.1002/jcc.20035
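
The commented-out RESP calls above need the integer net charge of the molecule. For FXR_12 it is derived from the existing partial charges, while for FXR_84 it is hard-coded as -1. A minimal sketch of the first approach follows; the partial charges are hypothetical values chosen for illustration, not the ones in `FXR_12.mol2`.

```python
def net_charge(partial_charges):
    """Round the sum of partial charges to the nearest integer net charge."""
    return int(round(sum(partial_charges)))


# Hypothetical partial charges, for illustration only.
example_charges = [-0.55, -0.55, 0.30, -0.10, -0.08]
print(net_charge(example_charges))  # -1
```

Rounding absorbs the small numerical noise in the partial charges. A result of -1 would be consistent with the value hard-coded for FXR_84, presumably because the carboxylic acid is deprotonated at pH 7.0.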



## 4. Build the pdb files to run

1. Rename the atoms in the file `ligand.pdb` to match the names in `LIG.mol2`.
2. Use `Xponge` to add water molecules and ions.


In [6]:
!python -m mindsponge.toolkits name2name -h

usage: Xponge name2name [-h] -fformat {mol2,pdb,gaff_mol2} -ffile FROM_FILE
                        [-fres FROM_RESIDUE] -tformat {mol2,pdb,gaff_mol2}
                        -tfile TO_FILE [-tres TO_RESIDUE] -oformat
                        {mol2,pdb,mcs_pdb} -ofile OUT_FILE [-ores OUT_RESIDUE]
                        [-tmcs TMCS]

optional arguments:
  -h, --help            show this help message and exit
  -fformat {mol2,pdb,gaff_mol2}, -from_format {mol2,pdb,gaff_mol2}
                        the format of the file which is needed to change from
  -ffile FROM_FILE, -from_file FROM_FILE
                        the name of the file which is needed to change from
  -fres FROM_RESIDUE, -from_residue FROM_RESIDUE
                        the residue name in ffile if fformat == pdb
  -tformat {mol2,pdb,gaff_mol2}, -to_format {mol2,pdb,gaff_mol2}
                        the format of the file which is needed to change to
  -tfile TO_FILE, -to_file TO_FILE
                 

In [7]:
!python -m mindsponge.toolkits name2name -fformat pdb -ffile ligand.pdb -tformat gaff_mol2 -tfile LIG.mol2 -oformat mcs_pdb -ofile ligand_renamed.pdb -ores LIG

In [8]:
Xponge.source("mindsponge.toolkits.forcefield.amber.ff14sb")
Xponge.source("mindsponge.toolkits.forcefield.amber.tip3p")
protein = load_pdb("protein.pdb")
protein.Add_Missing_Atoms()
ligand = load_pdb("ligand_renamed.pdb")
ligand.Add_Missing_Atoms()
protein_ligand = ligand | protein

add_solvent_box(protein_ligand, WAT, 10)
c1 = int(round(protein_ligand.charge))
Solvent_Replace(protein_ligand, WAT, {CL:20 + c1, K:20})
Save_PDB(protein_ligand, "protein_ligand_water.pdb")

add_solvent_box(ligand, WAT, 10)
c2 = int(round(ligand.charge))
Solvent_Replace(ligand, WAT, {CL:10 + c2, K:10})
Save_PDB(ligand, "ligand_water.pdb")

Reference for ff14SB:
  James A. Maier, Carmenza Martinez, Koushik Kasavajhala, Lauren Wickstrom, Kevin E. Hauser, and Carlos Simmerling
    ff14SB: Improving the accuracy of protein side chain and backbone parameters from ff99SB
    Journal of Chemical Theory and Computation 2015 11 (8), 3696-3713
    DOI: 10.1021/acs.jctc.5b00255

Reference for tip3p:
1. Water:
  William L. Jorgensen, Jayaraman Chandrasekhar, and Jeffry D. Madura
    Comparison of simple potential functions for simulating liquid water
    The Journal of Chemical Physics 1983 79, 926-935, 
    DOI: 10.1063/1.445869

2. Li+, Na+, K+, Rb+, Cs+, F-, Cl-, Br-, I-:
  In Suk Joung and Thomas E. Cheatham
    Determination of Alkali and Halide Monovalent Ion Parameters for Use in Explicitly Solvated Biomolecular Simulations
    The Journal of Physical Chemistry B 2008 112 (30), 9020-9041
    DOI: 10.1021/jp8001614

3. Ag+, Tl+, Cu+:
  Pengfei Li, Lin Frank Song, and Kenneth M. Merz
    Systematic Parameterization of Monovalen
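
The ion counts passed to `Solvent_Replace` above follow a simple pattern: a fixed number of K+ ions, and that number plus the solute charge of Cl- ions, so that the whole box is neutral. The sketch below reproduces only this arithmetic; the helper names and the example charges are hypothetical, since the actual values of `c1` and `c2` depend on the structures.

```python
def ion_counts(system_charge, n_salt_pairs):
    """Number of K+ and Cl- ions so that the ions cancel the solute charge."""
    counts = {"K": n_salt_pairs, "CL": n_salt_pairs + system_charge}
    if counts["CL"] < 0:
        raise ValueError("solute is too negative for this many salt pairs")
    return counts


def total_charge(system_charge, counts):
    """Net charge of solute plus ions (K+ is +1, Cl- is -1)."""
    return system_charge + counts["K"] - counts["CL"]


# Hypothetical solute charges, for illustration only.
for charge, pairs in [(-1, 10), (3, 20)]:
    counts = ion_counts(charge, pairs)
    print(charge, counts, total_charge(charge, counts))
```

In both cases the total charge comes out as zero, which is the condition the `{CL: n + c, K: n}` dictionaries above are built to satisfy.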

## 5. Do FEP

Finally, we call `Xponge` to run the FEP calculation for both the protein bound to the ligand and the ligand alone in water. This step may take a very long time. The output in this notebook was interrupted from the keyboard to keep the file small enough to view on the website.

In [9]:
!python -m mindsponge.toolkits mol2rfe -h

usage: Xponge mol2rfe [-h] [-do [todo [todo ...]]] -pdb PDB -r2 R2 -r1 R1
                      [-r0 [R0 [R0 ...]]] [-ri 0] [-nl 20] [-dohmr] [-ff FF]
                      [-mi [MI [MI ...]]] [-pi PI] [-ei EI] [-ai AI]
                      [-method {TI}] [-temp TMP] [-tmcs 10] [-dt dt]
                      [-msteps MSTEPS MSTEPS MSTEPS MSTEPS MSTEPS MSTEPS]
                      [-pstep pre_equilibrium_step] [-estep 500000]
                      [-thermostat middle_langevin]
                      [-barostat andersen_barostat]

optional arguments:
  -h, --help            show this help message and exit
  -do [todo [todo ...]]
                        the things need to do, should be one or more of
                        'build', 'min', 'pre_equilibrium', 'equilibrium',
                        'analysis'
  -pdb PDB              the initial conformation given by the pdb file
  -r2 R2, -residuetype2 R2
                        molecule mutated to by an Xponge ResidueType
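
The help text lists `TI` (thermodynamic integration) as the method. As a rough illustration of the idea, the free energy change is the integral of the ensemble average of dH/dλ over λ from 0 to 1, which can be approximated window by window. The sketch below uses the trapezoidal rule on an evenly spaced grid with made-up dH/dλ values; the grid, the integration rule, and the function names are assumptions and do not describe how `Xponge` implements it.

```python
def ti_increments(lambdas, dh_dlambda):
    """Trapezoidal free energy increment between each pair of adjacent lambda states."""
    return [
        0.5 * (dh_dlambda[i] + dh_dlambda[i + 1]) * (lambdas[i + 1] - lambdas[i])
        for i in range(len(lambdas) - 1)
    ]


def running_total(increments):
    """Cumulative sum of the per-window increments."""
    total, out = 0.0, []
    for inc in increments:
        total += inc
        out.append(total)
    return out


n_windows = 20  # assumed to correspond to 20 increments between 21 lambda points
lambdas = [i / n_windows for i in range(n_windows + 1)]
dh_dlambda = [2.0 * lam for lam in lambdas]  # hypothetical averages, not simulation data

increments = ti_increments(lambdas, dh_dlambda)
totals = running_total(increments)
print(f"{totals[-1]:.2f}")
```

The per-window increments and their running total have the same shape as the two data columns written to `free_energy.txt`.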

In [10]:
!mkdir -p protein
!cd protein && python -m mindsponge.toolkits mol2rfe -pdb ../protein_ligand_water.pdb -r1 ../LIG.mol2 -r2 ../F84.mol2

In [11]:
!mkdir -p ligand
!cd ligand && python -m mindsponge.toolkits mol2rfe -pdb ../ligand_water.pdb -r1 ../LIG.mol2 -r2 ../F84.mol2

## 6. See the results

The free energy difference of each simulation is stored in the file `free_energy.txt`. The relative binding energy is the free energy difference of Protein-Ligand(aq) minus the free energy difference of Ligand(aq), which is 6.33 - 4.71 = 1.62 kcal/mol here. Your numbers may not be exactly the same as mine, because this is only an example and the simulation time may not be long enough. The experimental value is 2.599 kcal/mol; it differs from the calculated value by less than 1 kcal/mol, so the error is acceptable.
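
The subtraction follows from the usual thermodynamic cycle: mutating LIG into F84 in the protein and in water closes a cycle with the two binding processes. Written out, with symbols that are notation introduced here rather than taken from the `Xponge` output:

```latex
\Delta\Delta G_{\mathrm{bind}}
  = \Delta G_{\mathrm{bind}}(\mathrm{F84}) - \Delta G_{\mathrm{bind}}(\mathrm{LIG})
  = \Delta G^{\mathrm{protein}}_{\mathrm{LIG}\to\mathrm{F84}}
  - \Delta G^{\mathrm{aq}}_{\mathrm{LIG}\to\mathrm{F84}}
  = 6.33 - 4.71 = 1.62\ \mathrm{kcal/mol},
\qquad
\left| 2.599 - 1.62 \right| = 0.979\ \mathrm{kcal/mol} < 1\ \mathrm{kcal/mol}
```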

In [12]:
!cat ligand/free_energy.txt

lambda_state	FE(i+1)-FE(i)[kcal/mol]	FE(i+1)-FE(0)[kcal/mol]
0		0.08			0.08
1		0.13			0.20
2		0.21			0.42
3		0.22			0.64
4		0.21			0.85
5		0.13			0.99
6		0.13			1.12
7		0.21			1.33
8		0.16			1.49
9		0.21			1.69
10		0.29			1.99
11		0.30			2.29
12		0.31			2.60
13		0.33			2.93
14		0.33			3.26
15		0.37			3.63
16		0.32			3.95
17		0.27			4.22
18		0.25			4.48
19		0.23			4.71

In [13]:
!cat protein/free_energy.txt

lambda_state	FE(i+1)-FE(i)[kcal/mol]	FE(i+1)-FE(0)[kcal/mol]
0		0.21			0.21
1		0.14			0.34
2		0.13			0.47
3		0.27			0.74
4		0.21			0.94
5		0.29			1.23
6		0.33			1.57
7		0.29			1.86
8		0.04			1.90
9		0.06			1.96
10		0.25			2.21
11		0.28			2.49
12		0.35			2.85
13		0.40			3.25
14		0.42			3.67
15		0.44			4.11
16		0.58			4.68
17		0.56			5.25
18		0.51			5.76
19		0.57			6.33
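
The final numbers can also be read from the two files programmatically. The sketch below is a minimal example assuming the three-column layout shown above; the function name `final_free_energy` is hypothetical, and the inline strings copy only the header and the last two rows of each table.

```python
def final_free_energy(table_text):
    """Return the last value of the cumulative column of a free_energy.txt table."""
    lines = [line for line in table_text.strip().splitlines() if line.strip()]
    rows = [line.split() for line in lines[1:]]  # skip the header line
    return float(rows[-1][2])


HEADER = "lambda_state\tFE(i+1)-FE(i)[kcal/mol]\tFE(i+1)-FE(0)[kcal/mol]\n"
ligand_text = HEADER + "18\t\t0.25\t\t\t4.48\n19\t\t0.23\t\t\t4.71\n"
protein_text = HEADER + "18\t\t0.51\t\t\t5.76\n19\t\t0.57\t\t\t6.33\n"

# With the real files, read them instead, e.g. open("ligand/free_energy.txt").read()
ddg = final_free_energy(protein_text) - final_free_energy(ligand_text)
experimental = 2.599  # kcal/mol, as quoted in the text above
print(f"calculated: {ddg:.2f} kcal/mol, deviation: {abs(experimental - ddg):.3f} kcal/mol")
```

This reproduces the 6.33 - 4.71 = 1.62 kcal/mol quoted above and the deviation of less than 1 kcal/mol from experiment.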