# Advanced use of the evaluation functions outside EvoMol

## Declaration

The dictionary-based objective function declaration can be used outside EvoMol to evaluate any molecule. Let's first declare a function that takes its highest values for molecules with approximately 30% heteroatoms and a high [QED](https://www.nature.com/articles/nchem.1243) value.

In [1]:
from rdkit.Chem import Lipinski, MolFromSmiles

def hetero_atoms_proportion(smiles):
    # Parse the SMILES once, then return the share of heavy atoms that are heteroatoms
    mol = MolFromSmiles(smiles)
    return Lipinski.NumHeteroatoms(mol) / Lipinski.HeavyAtomCount(mol)

eval_function_d = {
    "obj_function":{
        "type": "mean",
        "functions": [
            {
                "type": "gaussian",
                "function": (hetero_atoms_proportion, "hetero_atoms_proportion"), # Using our custom function and naming the sub-objective
                "mu": 0.3,
                "sigma": 0.1,
                "normalize": True
            },
            "qed"
        ]
    }
}

## Obtaining the actual evaluation function

This declaration could be passed to *evomol.run_model* to search for molecules that maximize it, but here we want a Python object that can evaluate any SMILES.

In [2]:
from evomol import get_objective_function_instance

eval_function = get_objective_function_instance(eval_function_d)

The returned object is an instance of *evomol.evaluation.MeanEvaluationStrategyComposite*, since the mean is the top level of the multi-objective function.
Like every evaluation function that can be returned, it is also an instance of *evomol.evaluation.EvaluationStrategyComposant*.

In [3]:
from evomol.evaluation import EvaluationStrategyComposant

print(type(eval_function))
print(isinstance(eval_function, EvaluationStrategyComposant))

<class 'evomol.evaluation.MeanEvaluationStrategyComposite'>
True


## The evomol.evaluation.EvaluationStrategyComposant class

The returned instance has two main methods:
* **eval_smi**, which evaluates any SMILES (optionally returning the scores of the contained properties as well)
* **keys**, which returns a list of text descriptions for all contained scores.

Let's first compute the scores of a few molecules.

In [4]:
smiles_list = ["C", "c1ccccc1", "CC(=O)NC1=CC=C(C=C1)O", "CC(=O)OC1=CC=CC=C1C(=O)O", 
               "NCC(C(Br)=CBr)c1cc2c(o1)C2"]

for smi in smiles_list:
    print(smi + " : " + str(eval_function.eval_smi(smi)))

C : 0.1854469672111062
c1ccccc1 : 0.22686868421880352
CC(=O)NC1=CC=C(C=C1)O : 0.7792595857135765
CC(=O)OC1=CC=CC=C1C(=O)O : 0.7735837945483528
NCC(C(Br)=CBr)c1cc2c(o1)C2 : 0.9719504841236775


Now let's use the **get_subscores** parameter to obtain the details of the sub-scores.

In [5]:
for smi in smiles_list:
    print(smi + " : " + str(eval_function.eval_smi(smi, get_subscores=True)))

C : (0.1854469672111062, array([0.18544697, 0.011109  , 0.        , 0.35978494]))
c1ccccc1 : (0.22686868421880352, array([0.22686868, 0.011109  , 0.        , 0.44262837]))
CC(=O)NC1=CC=C(C=C1)O : (0.7792595857135765, array([0.77925959, 0.96349297, 0.27272727, 0.5950262 ]))
CC(=O)OC1=CC=CC=C1C(=O)O : (0.7735837945483528, array([0.77358379, 0.99704579, 0.30769231, 0.5501218 ]))
NCC(C(Br)=CBr)c1cc2c(o1)C2 : (0.9719504841236775, array([0.97195048, 0.99704579, 0.30769231, 0.94685518]))


The sub-scores are given in the same order as the output of the **keys** method, which makes it possible to map each value to its corresponding property.

In [6]:
print(eval_function.keys())

['X̅(Gaussian(hetero_atoms_proportion); qed)', 'Gaussian(hetero_atoms_proportion)', 'hetero_atoms_proportion', 'qed']
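Because both sequences share the same order, pairing them gives a lookup by name. The following is a minimal sketch that uses the keys and the sub-scores of `CC(=O)NC1=CC=C(C=C1)O` copied from the outputs above instead of calling EvoMol; the name `scores_by_key` is only illustrative.

```python
# Values copied from the notebook outputs above. In a live session they would come from
# eval_function.keys() and eval_function.eval_smi(smi, get_subscores=True)[1].
keys = [
    "X̅(Gaussian(hetero_atoms_proportion); qed)",
    "Gaussian(hetero_atoms_proportion)",
    "hetero_atoms_proportion",
    "qed",
]
sub_scores = [0.77925959, 0.96349297, 0.27272727, 0.5950262]

# Pair each description with the value at the same position
scores_by_key = dict(zip(keys, sub_scores))

print(scores_by_key["qed"])
print(scores_by_key["hetero_atoms_proportion"])
```

Note that the first entry is the total score itself, so the sub-score array always starts with the value that **eval_smi** returns on its own.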


Now let's write a final function that prints the total score and each sub-score for every molecule.

In [7]:
def print_all_scores(smiles_list, eval_function):

    # The keys do not depend on the molecule, so they are retrieved once
    keys = eval_function.keys()

    for smi in smiles_list:

        total_score, sub_scores = eval_function.eval_smi(smi, get_subscores=True)

        print(smi)
        print("total score : " + "{:.2f}".format(total_score))
        for key, sub_score in zip(keys, sub_scores):
            print(key + " : " + "{:.2f}".format(sub_score))
        print()

In [8]:
print_all_scores(smiles_list, eval_function)

C
total score : 0.19
X̅(Gaussian(hetero_atoms_proportion); qed) : 0.19
Gaussian(hetero_atoms_proportion) : 0.01
hetero_atoms_proportion : 0.00
qed : 0.36

c1ccccc1
total score : 0.23
X̅(Gaussian(hetero_atoms_proportion); qed) : 0.23
Gaussian(hetero_atoms_proportion) : 0.01
hetero_atoms_proportion : 0.00
qed : 0.44

CC(=O)NC1=CC=C(C=C1)O
total score : 0.78
X̅(Gaussian(hetero_atoms_proportion); qed) : 0.78
Gaussian(hetero_atoms_proportion) : 0.96
hetero_atoms_proportion : 0.27
qed : 0.60

CC(=O)OC1=CC=CC=C1C(=O)O
total score : 0.77
X̅(Gaussian(hetero_atoms_proportion); qed) : 0.77
Gaussian(hetero_atoms_proportion) : 1.00
hetero_atoms_proportion : 0.31
qed : 0.55

NCC(C(Br)=CBr)c1cc2c(o1)C2
total score : 0.97
X̅(Gaussian(hetero_atoms_proportion); qed) : 0.97
Gaussian(hetero_atoms_proportion) : 1.00
hetero_atoms_proportion : 0.31
qed : 0.95
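
The printed values can be checked by hand. The sketch below is not EvoMol code: it assumes that a normalized Gaussian sub-objective is computed as exp(-(x - mu)² / (2 sigma²)) and that the mean is the unweighted average of its two direct children (the Gaussian score and QED). Both assumptions are consistent with the numbers shown in this notebook. The helper name `gaussian_score` is hypothetical, and the QED value is copied from the sub-scores printed earlier.

```python
import math


def gaussian_score(x, mu=0.3, sigma=0.1):
    """Assumed form of the normalized Gaussian: equals 1.0 when x == mu."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


# CC(=O)NC1=CC=C(C=C1)O has 3 heteroatoms (O, N, O) out of 11 heavy atoms
proportion = 3 / 11
qed = 0.5950262  # copied from the sub-scores printed earlier

gaussian = gaussian_score(proportion)
total = (gaussian + qed) / 2  # unweighted mean of the two direct sub-objectives

print("hetero_atoms_proportion : {:.2f}".format(proportion))  # 0.27
print("Gaussian(hetero_atoms_proportion) : {:.2f}".format(gaussian))  # 0.96
print("total score : {:.2f}".format(total))  # 0.78
```

Under these assumptions, the raw `hetero_atoms_proportion` value is reported among the sub-scores but only enters the total through its Gaussian transform.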

