# Traditional Machine Learning model

This model is a Gradient Boosting on Decision Trees (GB on DTs) classifier, implemented with the CatBoost Classifier.

The model takes the SMILES of the query molecules as input and returns the predicted probability of binding to each non-orphan TAS2R receptor.

***

<div class="alert alert-block alert-warning">
Before running the following code, please make sure you have installed <b>all the required libraries</b>. Instructions on how to obtain the full environment are provided in the <b>README file</b> of this repository.
</div>

***

Import the libraries and the functions from the main script

In [4]:
import pandas as pd
import ast
import matplotlib.pyplot as plt
import numpy as np
import sys

In [2]:
# If you encounter an error with enchant, run the line below, adjusting the path
# to your enchant library (here /opt/homebrew/opt/enchant/lib/libenchant-2.dylib).
# On macOS, `brew --prefix enchant` prints the location of the enchant installation.
%env PYENCHANT_LIBRARY_PATH=/opt/homebrew/opt/enchant/lib/libenchant-2.dylib

env: PYENCHANT_LIBRARY_PATH=/opt/homebrew/opt/enchant/lib/libenchant-2.dylib


In [11]:
# import the evaluation function for the TML model (folder in ../TML)
import os
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'TML/'))

import TML_Eval_v2
from TML_Eval_v2 import eval_smiles

[17:14:43] Initializing Normalizer
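
If the import fails with `ModuleNotFoundError`, the working directory is probably not the folder that contains this notebook. A minimal sketch of a more defensive setup, assuming the same `../TML` layout as the cell above; the helper name `add_module_dir` is hypothetical and not part of the repository:

```python
import sys
from pathlib import Path


def add_module_dir(base: Path, folder: str = "TML") -> Path:
    """Append <base>/../<folder> to sys.path and return the resolved path.

    Hypothetical helper: it raises early if the folder is missing, so the
    problem is reported before the import is attempted.
    """
    module_dir = (base.parent / folder).resolve()
    if not module_dir.is_dir():
        raise FileNotFoundError(f"Expected module folder not found: {module_dir}")
    if str(module_dir) not in sys.path:
        sys.path.append(str(module_dir))
    return module_dir
```

It would be called as `add_module_dir(Path.cwd())` before `from TML_Eval_v2 import eval_smiles`.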


Insert the input molecule

In [12]:
smiles = 'CC(CC1=CC2=C(C=C1)OCO2)NC'

Should the model check if any of the input molecules is already present in the dataset?
<br>
If True, for any known molecule-receptor pair the results will show the ground truth (0 or 1) instead of the model's prediction.

In [13]:
GT = True  # True enables the ground truth check
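
The effect of this flag can be sketched as a lookup that takes precedence over the model output. This is only an illustration of the behaviour described above: the function `apply_ground_truth`, the `known_pairs` dictionary, the receptor labels and the values are all hypothetical and are not taken from `TML_Eval_v2`.

```python
def apply_ground_truth(smiles, predictions, known_pairs, ground_truth=True):
    """Return per-receptor values, preferring known labels when requested.

    predictions: {receptor: predicted probability} for one molecule.
    known_pairs: {(smiles, receptor): 0 or 1} for pairs already in the dataset.
    """
    if not ground_truth:
        return dict(predictions)
    return {
        receptor: known_pairs.get((smiles, receptor), prob)
        for receptor, prob in predictions.items()
    }


# Made-up molecule, receptors and values, for illustration only.
predictions = {"R1": 0.84, "R2": 0.5}
known_pairs = {("CCO", "R1"): 1}

print(apply_ground_truth("CCO", predictions, known_pairs))
print(apply_ground_truth("CCO", predictions, known_pairs, ground_truth=False))
```

In this sketch the known pair `("CCO", "R1")` is reported as its label, while `R2` keeps the predicted probability.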

Run the evaluation on the input molecule for every non-orphan receptor with the trained model

In [14]:
f_df = eval_smiles(smiles, ground_truth=GT, verbose=True)

[INFO  ] Standardizing molecules
[INFO  ] Checking Applicability Domain
[INFO  ] Calculating descriptors
[INFO  ] Adding Receptor features
[INFO  ] Making predictions
[INFO  ] Wrapping up results
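
The `Adding Receptor features` step suggests that each molecule is paired with every receptor before prediction, so that a single classifier can return one probability per receptor. A rough sketch of such a pairing, under the assumption that receptors are one-hot encoded; the function name, the descriptor values and the encoding are hypothetical, and the features actually used by `TML_Eval_v2` may differ:

```python
def pair_with_receptors(descriptors, receptors):
    """Build one feature row per (molecule, receptor) pair.

    descriptors: list of numeric molecular descriptors for one molecule.
    receptors: ordered list of receptor identifiers.
    Each row is the descriptor vector followed by a one-hot receptor code.
    """
    rows = {}
    for i, receptor in enumerate(receptors):
        one_hot = [1 if j == i else 0 for j in range(len(receptors))]
        rows[receptor] = list(descriptors) + one_hot
    return rows


# Made-up descriptor values and a subset of receptor labels, for illustration.
rows = pair_with_receptors([0.3, 1.7], ["1", "3", "4"])
for receptor, row in rows.items():
    print(receptor, row)
```

Each row could then be scored by the classifier, which gives one probability per receptor for the same molecule.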


## Results

Show the table with the <b> final results </b>

Each displayed prediction is the probability that a molecule binds a given receptor, from 0 (no binding) to 1 (binding).

The Applicability Domain column shows whether the input molecule is similar enough to the molecules in the training dataset. If the check returns FALSE, the prediction for that molecule should not be considered reliable.

In [15]:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(f_df)

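| | Standardized SMILES | Ground Truth | Check AD | 1 | 3 | 4 | 5 | 7 | 8 | 9 | 10 | 13 | 14 | 16 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 46 | 47 | 49 | 50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CNC(C)Cc1ccc2c(c1)OCO2 | Absent | 1.0 | 0.09 | 0.0 | 0.05 | 0.0 | 0.01 | 0.0 | 0.0 | 0.61 | 0.0 | 0.84 | 0.01 | 0.07 | 0.07 | 0.01 | 0.0 | 0.0 | 0.02 | 0.0 | 0.5 | 0.01 | 0.01 | 0.0 |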
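
To read a results row programmatically, one option is to keep only the receptors whose probability reaches a chosen cutoff, and to return nothing when the Applicability Domain check fails. The sketch below works on a plain dictionary holding part of the row shown above; the helper `likely_receptors` and the 0.5 cutoff are assumptions made for illustration, not part of the repository and not a recommended threshold.

```python
META_COLUMNS = {"Standardized SMILES", "Ground Truth", "Check AD"}


def likely_receptors(row, threshold=0.5):
    """Return (receptor, probability) pairs at or above threshold, highest first.

    An empty list is returned when the Applicability Domain check is falsy,
    since those predictions should not be considered reliable.
    """
    if not row.get("Check AD"):
        return []
    hits = [(name, value) for name, value in row.items()
            if name not in META_COLUMNS and value >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Subset of the row displayed above (receptors 1, 10, 14 and 46).
row = {
    "Standardized SMILES": "CNC(C)Cc1ccc2c(c1)OCO2",
    "Ground Truth": "Absent",
    "Check AD": 1.0,
    "1": 0.09, "10": 0.61, "14": 0.84, "46": 0.5,
}
print(likely_receptors(row))  # [('14', 0.84), ('10', 0.61), ('46', 0.5)]
```

Assuming the column labels of `f_df` match those displayed, a row could be passed in as `f_df.iloc[0].to_dict()`.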
