# Pharmacophores

## Aim of this lab

To use different machine learning algorithms for Quantitative Structure-Activity Relationship (QSAR) modeling.

### Objectives

* Use a variety of machine learning algorithms to develop QSAR models


## Background


Background on QSAR
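A QSAR model relates numerical descriptors of a molecule's structure to a measured biological activity. In its simplest, linear form, shown here only as one illustrative model family, the predicted activity $\hat{y}_i$ of molecule $i$ is a weighted sum of its $p$ descriptor values $x_{i1}, \dots, x_{ip}$:

$$
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j \, x_{ij}
$$

The coefficients $\beta_j$ can be fit, for example, by ordinary least squares, minimizing $\sum_i (y_i - \hat{y}_i)^2$ over the training molecules.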

### Descriptors

As with calculating chemical similarity, PCA can be performed on any set of descriptors or fingerprints. However, as we saw before, it is important to standardize descriptors that are not already on the same scale.
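Standardization (z-scoring) rescales each descriptor column to zero mean and unit variance, $z = (x - \mu)/\sigma$. A minimal NumPy sketch, assuming a small made-up descriptor matrix (the values are hypothetical, not from the lab data), shows that a manual z-score matches scikit-learn's `StandardScaler`, which divides by the population standard deviation (`ddof=0`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical descriptors on very different scales:
# column 0 ~ molecular weight, column 1 ~ ring count
X = np.array([
    [284.7, 3.0],
    [319.7, 3.0],
    [361.8, 3.0],
    [250.1, 2.0],
])

# Manual z-score: subtract each column's mean, divide by its population std
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # NumPy's default ddof=0 matches StandardScaler
Z_manual = (X - mu) / sigma

Z_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(Z_manual, Z_sklearn))  # True
```

Without this step, a distance- or penalty-based model would be dominated by the molecular-weight column simply because its numbers are larger.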

In [1]:
```python
from rdkit import Chem
from rdkit.Chem import PandasTools
import pandas as pd
```

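In [3]:
```python
# Read the SD file into a DataFrame; LoadSDF adds an ROMol column holding each RDKit molecule
df = PandasTools.LoadSDF('data/DIAZEPAM_w_name.sdf')
df.head(3)
```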

| | Name | MolSmiles | Bio_Activity | ID | ROMol |
|---|---|---|---|---|---|
| 0 | Mol_0 | CC(C)(C)OC(=O)c1c2n(cn1)-c3ccccc3C(=O)N(C2)C | -1.28 | | |
| 1 | Mol_1 | CN1Cc2c(ncn2-c3ccc(cc3C1=O)Cl)C(=O)OC | -0.62 | | |
| 2 | Mol_2 | CCCOC(=O)c1c2n(cn1)-c3ccc(cc3C(=O)N(C2)C)Cl | -0.13 | | |
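The `Bio_Activity` values above look numeric, but `PandasTools.LoadSDF` reads SD-file properties as text, so they generally need converting before they can serve as a regression target. A minimal pandas sketch, using a small hypothetical frame that copies the three values shown above rather than reading the real SDF:

```python
import pandas as pd

# Hypothetical stand-in for the LoadSDF result: the activity column holds strings
df = pd.DataFrame({
    "Name": ["Mol_0", "Mol_1", "Mol_2"],
    "Bio_Activity": ["-1.28", "-0.62", "-0.13"],
})

# Convert to floats; errors="coerce" turns any unparsable entry into NaN
df["Bio_Activity"] = pd.to_numeric(df["Bio_Activity"], errors="coerce")

# Target vector for the QSAR models
y = df["Bio_Activity"].to_numpy()
print(df.dtypes)
```

Using `errors="coerce"` rather than the default `errors="raise"` is a choice: it lets a few malformed records through as NaN, which then have to be dropped explicitly.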


In [4]:
```python
from rdkit.ML.Descriptors import MoleculeDescriptors
from rdkit.Chem import AllChem
from rdkit.Chem import MACCSkeys
from rdkit.Chem import Descriptors

# Build the calculator once for every descriptor listed in Descriptors.descList
# (the exact number of descriptors depends on the RDKit version)
calc = MoleculeDescriptors.MolecularDescriptorCalculator(
    [desc[0] for desc in Descriptors.descList]
)

def calc_descriptors_from_mol(mol):
    """
    Encode an RDKit Mol as a set of descriptors.

    Parameters
    ----------
    mol : RDKit Mol
        The RDKit molecule.

    Returns
    -------
    list
        The chemical descriptor values, in Descriptors.descList order.

    """
    return list(calc.CalcDescriptors(mol))
```

In [5]:
```python
# One row of descriptor values per molecule
desc_list = [calc_descriptors_from_mol(mol) for mol in df.ROMol]
desc_frame = pd.DataFrame(desc_list)
```
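Some RDKit descriptors can come back as NaN or infinity for particular molecules, and a descriptor that is constant across the dataset carries no information for a model (after standardization such a column becomes all zeros, which is likely what the `0.0` columns in the standardized output further down are). A minimal pandas sketch of a cleaning step; `clean_descriptor_frame` is a hypothetical helper, not part of the original lab, and applying it would change the column count shown below:

```python
import numpy as np
import pandas as pd

def clean_descriptor_frame(frame):
    """Hypothetical helper: drop descriptor columns that are non-finite or constant."""
    # Treat +/-inf as missing, then drop any column with a missing entry
    finite = frame.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
    # Keep only columns with more than one distinct value across molecules
    return finite.loc[:, finite.nunique() > 1]

# Hypothetical 3-molecule x 4-descriptor frame for illustration
toy = pd.DataFrame({
    0: [1.0, 2.0, 3.0],     # varies: kept
    1: [0.0, 0.0, 0.0],     # constant: dropped
    2: [1.0, np.inf, 2.0],  # non-finite: dropped
    3: [5.0, np.nan, 4.0],  # missing: dropped
})
print(clean_descriptor_frame(toy).columns.tolist())  # [0]
```

Dropping whole columns (rather than imputing values) is the simplest option and assumes the affected descriptors are not essential to the model.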

#### Standardizing Descriptors

In [6]:
```python
from sklearn.preprocessing import StandardScaler

# Rescale every descriptor column to zero mean and unit variance
scaler = StandardScaler()
desc_frame_std = pd.DataFrame(scaler.fit_transform(desc_frame))

desc_frame_std.head()
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.229804 | -0.700378 | -0.229804 | -0.490113 | 0.400245 | -0.84443 | -0.945733 | -0.839924 | -0.427852 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.27735 | 0.0 | 0.0 |
| 1 | -0.289638 | -0.294427 | -0.289638 | 0.485646 | 0.362171 | -1.039932 | -0.961257 | -1.048096 | -1.365407 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.27735 | 0.0 | 0.0 |
| 2 | -0.196264 | -0.003103 | -0.196264 | 0.40207 | 0.944074 | -0.321673 | -0.318295 | -0.326432 | -0.427852 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.27735 | 0.0 | 0.0 |
| 3 | -0.191039 | -0.115072 | -0.191039 | 0.457694 | 0.759743 | -0.321673 | -0.318295 | -0.326432 | -0.427852 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.27735 | 0.0 | 0.0 |
| 4 | -0.149571 | 0.202131 | -0.149571 | 0.290525 | 0.861595 | -0.014158 | 0.003185 | -0.017492 | -0.115334 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.27735 | 0.0 | 0.0 |
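Here the scaler is fit on every molecule at once, which is fine for exploring the descriptors. When the descriptors feed a QSAR model that is evaluated on held-out molecules, a common practice is to learn the scaling parameters from the training split only, so no test-set statistics leak into training. A minimal scikit-learn sketch on synthetic data; the array shapes, the `Ridge` model, and `alpha=1.0` are illustrative assumptions, not choices made in this lab:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: 60 molecules x 10 descriptors on mixed scales
X = rng.normal(size=(60, 10)) * np.array([1, 10, 100, 1, 1, 1, 1, 1, 1, 1])
y = 0.5 * X[:, 0] - 0.01 * X[:, 2] + rng.normal(scale=0.1, size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The pipeline fits the scaler on X_train only and reuses it on X_test
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)

scaler = model.named_steps["standardscaler"]
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True
print("test-set R^2:", round(model.score(X_test, y_test), 3))
```

Wrapping the scaler and model in one pipeline also means cross-validation utilities refit the scaler inside each training fold automatically.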


#### QSAR Modeling
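Since the aim is to try a variety of algorithms, one way to compare them fairly is to score each on the same cross-validation splits. A minimal sketch on synthetic data; the four regressors and their hyperparameters are illustrative choices, not a prescribed set:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)
# Hypothetical descriptors and activities (synthetic, not the diazepam data)
X = rng.normal(size=(80, 8))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=80)

models = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "svr": make_pipeline(StandardScaler(), SVR(C=1.0)),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Identical shuffled folds for every model
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    # Mean R^2 over the folds; pipelines refit their scaler inside each training fold
    scores[name] = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>14s}  mean CV R^2 = {score:.3f}")
```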

##### Classification
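For classification, a continuous activity is turned into classes such as active/inactive, and a classifier predicts the class. A minimal sketch on synthetic data; the median cut-off and the random forest classifier are assumptions for illustration, since a real project would pick a threshold that is meaningful for the assay:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(1)
# Hypothetical descriptors and continuous activities
X = rng.normal(size=(100, 6))
activity = 1.5 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=100)

# Assumed cut-off: label a molecule "active" (1) if its activity is above the median
threshold = np.median(activity)
y = (activity > threshold).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold predictions: each molecule is predicted by a model that never saw it
y_pred = cross_val_predict(clf, X, y, cv=cv)

print("accuracy:", round(accuracy_score(y, y_pred), 3))
print(confusion_matrix(y, y_pred))
```

Stratified folds keep the active/inactive ratio roughly equal in every fold, which matters more when the classes are imbalanced.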

##### Regression
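For regression, the model predicts the continuous activity value directly. A minimal sketch with a random forest regressor on synthetic data (the model choice and `n_estimators=300` are assumptions), which also ranks descriptor columns by the forest's impurity-based importance:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
# Hypothetical data: activity depends mainly on descriptor columns 3 and 5
X = rng.normal(size=(120, 10))
y = 2.0 * X[:, 3] - 1.0 * X[:, 5] + rng.normal(scale=0.3, size=120)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Tree splits do not depend on feature scale, so no StandardScaler is needed here
reg = RandomForestRegressor(n_estimators=300, random_state=0)
reg.fit(X_train, y_train)
print("test R^2:", round(reg.score(X_test, y_test), 3))

# Rank descriptor columns by impurity-based importance (highest first)
top = np.argsort(reg.feature_importances_)[::-1][:3]
print("most important descriptor columns:", top.tolist())
```

Impurity-based importances are convenient but can favor high-cardinality features; permutation importance on held-out data is a common cross-check.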

##### Evaluation Metrics
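Common choices are RMSE, MAE, and $R^2 = 1 - SS_{res}/SS_{tot}$ for regression, and accuracy and ROC AUC for classification. A minimal sketch computing them by hand on small hypothetical arrays and checking each against scikit-learn; the values and the 0.5 probability cut-off are made up for illustration:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
    roc_auc_score,
)

# --- Regression: hypothetical observed vs. predicted activities ---
y_true = np.array([-1.2, -0.6, 0.0, 0.5, 1.1])
y_pred = np.array([-1.0, -0.7, 0.2, 0.3, 0.9])

residuals = y_true - y_pred
rmse = np.sqrt(np.mean(residuals ** 2))
mae = np.mean(np.abs(residuals))
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}, R^2 = {r2:.3f}")

# --- Classification: hypothetical labels (1 = active) and predicted probabilities ---
labels = np.array([0, 0, 1, 1, 1])
probs = np.array([0.2, 0.6, 0.4, 0.8, 0.9])

# Accuracy after applying an assumed 0.5 probability cut-off
acc = accuracy_score(labels, (probs >= 0.5).astype(int))

# ROC AUC = fraction of (active, inactive) pairs ranked correctly (no ties here)
pos, neg = probs[labels == 1], probs[labels == 0]
auc_manual = np.mean([p > n for p in pos for n in neg])
assert np.isclose(auc_manual, roc_auc_score(labels, probs))
print(f"accuracy = {acc:.2f}, ROC AUC = {auc_manual:.3f}")
```

Accuracy depends on the chosen cut-off, whereas ROC AUC summarizes ranking quality across all cut-offs.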