# Molecular Generator Evaluation using TUPOR, SESY and ASER Metrics

🔹 **Objective**  
This notebook evaluates molecular generators by computing three key metrics:  
   - **TUPOR**: scaffold recall  
   - **SESY**: scaffold hopping potential  
   - **ASER**: chemical space exploration

🔹 **Workflow**  
1️⃣ **Compute Metrics**: The script calculates TUPOR, SESY and ASER for different molecular generators.  
2️⃣ **Merge Data**: Results from multiple generators are combined into a single Pandas DataFrame.  
3️⃣ **Normalize Values**: The computed metrics are normalized using Min-Max scaling for comparison.  
4️⃣ **Save Outputs**: Processed data is stored in CSV files for further analysis.  
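
Steps 2 and 4 of the workflow can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the code in `src.metrics_connection`: the helper name `merge_mean_rows` and the in-memory buffer are hypothetical, while the numeric values are copied from the `dis`/`csk` results reported further down in this notebook.

```python
import io
import pandas as pd

def merge_mean_rows(frames):
    """Keep only the '*_mean' row of each per-generator table and stack them.

    Hypothetical helper, written only for this illustration.
    """
    mean_rows = [df[df["name"].str.endswith("_mean")] for df in frames]
    return pd.concat(mean_rows, ignore_index=True)

# Two small per-generator tables (values copied from the dis/csk results).
gb_ga = pd.DataFrame({
    "name": ["GB_GA_mut_r_0.01_mean"],
    "TUPOR": [0.506975],
    "SESY": [0.155468],
    "ASER": [0.003135],
})
gb_ga_new = pd.DataFrame({
    "name": ["GB_GA_new_mut_r_0.01_4", "GB_GA_new_mut_r_0.01_mean"],
    "TUPOR": [0.813953, 0.725347],
    "SESY": [0.052912, 0.054053],
    "ASER": [0.0345, 0.01572],
})

merged = merge_mean_rows([gb_ga, gb_ga_new])

# Write to an in-memory buffer here; a real run would pass a file path instead.
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `index=False` to `to_csv` avoids writing the row index, which is the usual source of an `Unnamed: 0` column when such a file is read back.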

🔹 **Data Structure**  
- The calculations are performed for different **scaffold types** (`csk`, `murcko`) and **cluster types** (`dis`, `sim`).  
- Results are computed for multiple **generators** (`Molpher`, `REINVENT`, `DrugEx`, `GB_GA`, etc.).  
- The analysis is conducted for a specific **biological target receptor**, such as the **Glucocorticoid receptor**.
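
The combinations described above form a small grid, which can be enumerated as in the sketch below. The generator list is shortened for illustration, and the tuple order simply mirrors the positional arguments of `calculate_metrics` defined later in this notebook; nothing here calls the real metric code.

```python
from itertools import product

scaffold_types = ["csk", "murcko"]
cluster_types = ["dis", "sim"]
generators = ["Molpher", "REINVENT"]  # shortened, illustrative list
receptor = "Glucocorticoid_receptor"

# One tuple per (cluster, scaffold, generator, receptor) combination.
runs = [
    (type_cluster, type_scaffold, generator, receptor)
    for type_scaffold, type_cluster, generator in product(
        scaffold_types, cluster_types, generators
    )
]

for run in runs:
    print(run)
```

With two scaffold types, two cluster types, and two generators this yields eight runs for the single receptor.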

This notebook allows us to compare the performance of various molecular generators in terms of scaffold recall (TUPOR), scaffold hopping potential (SESY), and chemical space exploration (ASER).

# Loading required libraries

In [13]:
from src import metrics # Importing custom metric functions
import importlib as imp
imp.reload(metrics)

<module 'src.metrics' from '/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/src/metrics.py'>

# Function to calculate metrics

In [14]:
from IPython.display import display  # explicit import so the cell also runs outside a notebook

def calculate_metrics(type_cluster, type_scaffold, generator, receptor, ncpus=1):
    """
    Calculate molecular generation metrics for one generator.

    Parameters:
    - type_cluster: Cluster type ('dis' = dissimilarity split, 'sim' = similarity split)
    - type_scaffold: Type of scaffold (e.g., 'csk' or 'murcko')
    - generator: Name of the molecular generator
    - receptor: Target receptor for drug design
    - ncpus: Number of CPUs passed on to `metrics.Metrics` (default: 1)

    Returns:
    - Computed metrics
    """
    mt = metrics.Metrics(type_cluster, type_scaffold, generator, receptor, ncpus)
    result = mt.calculate_metrics()
    display(result)
    return result

# Define parameters for metric calculations

In [3]:
type_cluster = 'sim' #options: 'dis'|'sim' 
type_scaffold = 'csk' #options: 'csk'|'murcko'
generator = 'Molpher' #options: 'Molpher'|'DrugEx'|'REINVENT'|'addcarbon'
receptor = 'Leukocyte_elastase' #options: 'Glucocorticoid_receptor'|'Leukocyte_elastase'

calculate_metrics(type_cluster,type_scaffold,generator,receptor, ncpus = 10)

NUMBER:  0


[10:03:34] Explicit valence for atom # 12 C, 5, is greater than permitted
[10:03:49] Explicit valence for atom # 28 C, 5, is greater than permitted
[10:03:49] Explicit valence for atom # 28 C, 5, is greater than permitted
[10:03:49] Explicit valence for atom # 27 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 7 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 7 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 7 C, 5, 

NUMBER:  1


[10:08:55] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:08:55] Explicit valence for atom # 23 C, 5, is greater than permitted
[10:08:55] Explicit valence for atom # 23 C, 5, is greater than permitted
[10:08:55] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:08:55] Explicit valence for atom # 23 C, 5, is greater than permitted
[10:09:48] Explicit valence for atom # 11 C, 5, is greater than permitted
[10:09:48] Explicit valence for atom # 11 C, 5, is greater than permitted
[10:09:48] Explicit valence for atom # 11 C, 5, is greater than permitted
[10:10:01] Explicit valence for atom # 2 C, 5, is greater than permitted
[10:10:01] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:10:01] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:10:07] Explicit valence for atom # 7 C, 5, is greater than permitted
[10:10:48] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:12:09] Explicit valence for atom # 1

NUMBER:  2


[10:15:35] Explicit valence for atom # 10 C, 5, is greater than permitted
[10:16:46] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:16:46] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 6 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 6 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 25 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 6 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 6 C, 5, is greater than permitted
[10:17:53] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:17:58] Explicit valence for atom # 1 C, 5, is greater than permitted
[10:17:58] Explicit valence for atom # 1 C, 5, is greater than permitted
[10:17:58] Explicit valence for atom # 1 C, 5, is greater than permitted
[10:18:11] Explicit valence for atom # 7 C, 5

NUMBER:  3


[10:21:36] Explicit valence for atom # 10 C, 5, is greater than permitted
[10:21:54] Explicit valence for atom # 1 C, 5, is greater than permitted
[10:21:56] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:21:56] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:23:58] Explicit valence for atom # 26 C, 5, is greater than permitted
[10:23:58] Explicit valence for atom # 25 C, 5, is greater than permitted
[10:23:58] Explicit valence for atom # 25 C, 5, is greater than permitted
[10:24:35] Explicit valence for atom # 31 C, 5, is greater than permitted
[10:24:35] Explicit valence for atom # 26 C, 5, is greater than permitted
[10:24:35] Explicit valence for atom # 27 C, 5, is greater than permitted
[10:24:42] Explicit valence for atom # 29 C, 5, is greater than permitted
[10:25:23] Explicit valence for atom # 15 C, 5, is greater than permitted


NUMBER:  4


[10:27:38] Explicit valence for atom # 25 C, 5, is greater than permitted
[10:27:38] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:27:38] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 16 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 14 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:21] Explicit valence for atom # 21 C, 5, is greater than permitted
[10:28:21] Explicit valence for atom # 21 C, 5, is greater than permitted
[10:28:21] Explicit valence for atom #

Unnamed: 0,name,type_cluster,scaffold,SSo,TUPOR_,TUPOR,SESY,ASER
0,Molpher_0,sim,csk,1624700.0,43/46,0.934783,0.138021,0.03143
1,Molpher_1,sim,csk,1662823.0,38/44,0.863636,0.130204,0.009258
2,Molpher_2,sim,csk,1546835.0,36/42,0.857143,0.12128,0.007056
3,Molpher_3,sim,csk,1749302.0,32/42,0.761905,0.123391,0.011643
4,Molpher_4,sim,csk,1722174.0,32/42,0.761905,0.123599,0.003529
5,Molpher_mean,sim,csk,1661166.8,-,0.835874,0.127299,0.012583
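
The relationship between the `TUPOR_` and `TUPOR` columns can be checked by hand. The sketch below assumes, based only on the numbers in the table above, that `TUPOR` is the ratio written in `TUPOR_` and that the `_mean` row is the arithmetic mean of the five per-run values; it does not reproduce the implementation in `src.metrics`.

```python
from fractions import Fraction

# TUPOR_ entries of the five Molpher runs, copied from the table above.
tupor_raw = ["43/46", "38/44", "36/42", "32/42", "32/42"]

# Convert each "recovered/total" string to a float ratio.
tupor = [float(Fraction(entry)) for entry in tupor_raw]

# Arithmetic mean of the per-run ratios.
mean_tupor = sum(tupor) / len(tupor)

print([round(value, 6) for value in tupor])
print(round(mean_tupor, 6))
```

Rounded to six decimals, the per-run values and their mean agree with the `TUPOR` column, including the `Molpher_mean` row (0.835874). Pooling the counts instead (181/216) would give a different value, so the mean row appears to average the ratios rather than the raw counts.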




# Execute metric calculation function

In [16]:
for receptor in ['Glucocorticoid_receptor']:
    for type_scaffold in ['csk','murcko']:
        for type_cluster in ['dis','sim']:
            #for subset in ['','_500k', '_250k', '_125k', '_62.5k']:
            for subset in ['']:
                ncpus = 10
                
                # Generator names to evaluate; uncomment entries to include them
                generators_name_list = [
                    #f"Molpher{subset}",
                    #f"REINVENT{subset}",
                    #f"DrugEx_GT_epsilon_0.1{subset}",
                    #f"DrugEx_GT_epsilon_0.6{subset}",
                    #f"DrugEx_RNN_epsilon_0.1{subset}",
                    #f"DrugEx_RNN_epsilon_0.6{subset}",
                    f"GB_GA_new_mut_r_0.01{subset}",
                    #f"GB_GA_mut_r_0.5{subset}",
                    #f"addcarbon{subset}"
                ]
                for generator in generators_name_list:
                    print(generator)
                    calculate_metrics(type_cluster,type_scaffold,generator,receptor,ncpus = ncpus)

GB_GA_new_mut_r_0.01
NUMBER:  0
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_0_one_column.csv
NUMBER:  1
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_1_one_column.csv
NUMBER:  2
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_2_one_column.csv
NUMBER:  3
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_3_one_column.csv
NUMBER:  4
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_4_one_column.csv


Unnamed: 0,name,type_cluster,scaffold,SSo,TUPOR_,TUPOR,SESY,ASER
0,GB_GA_new_mut_r_0.01_0,dis,csk,973311.0,30/40,0.75,0.055589,0.011361
1,GB_GA_new_mut_r_0.01_1,dis,csk,969287.0,17/23,0.73913,0.051912,0.002051
2,GB_GA_new_mut_r_0.01_2,dis,csk,974337.0,27/40,0.675,0.058665,0.00223
3,GB_GA_new_mut_r_0.01_3,dis,csk,970872.0,24/37,0.648649,0.051188,0.028458
4,GB_GA_new_mut_r_0.01_4,dis,csk,961926.0,35/43,0.813953,0.052912,0.0345
5,GB_GA_new_mut_r_0.01_mean,dis,csk,969946.6,-,0.725347,0.054053,0.01572


GB_GA_new_mut_r_0.01
NUMBER:  0
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_0_one_column.csv
NUMBER:  1
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_1_one_column.csv
NUMBER:  2
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_2_one_column.csv
NUMBER:  3
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_3_one_column.csv
NUMBER:  4
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_4_one_column.csv


Unnamed: 0,name,type_cluster,scaffold,SSo,TUPOR_,TUPOR,SESY,ASER
0,GB_GA_new_mut_r_0.01_0,sim,csk,980527.0,34/38,0.894737,0.067437,0.028965
1,GB_GA_new_mut_r_0.01_1,sim,csk,963595.0,32/38,0.842105,0.052476,0.040969
2,GB_GA_new_mut_r_0.01_2,sim,csk,970465.0,28/37,0.756757,0.052846,0.018659
3,GB_GA_new_mut_r_0.01_3,sim,csk,968654.0,33/35,0.942857,0.053712,0.017323
4,GB_GA_new_mut_r_0.01_4,sim,csk,969662.0,29/35,0.828571,0.050415,0.00564
5,GB_GA_new_mut_r_0.01_mean,sim,csk,970580.6,-,0.853005,0.055377,0.022311


GB_GA_new_mut_r_0.01
NUMBER:  0
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_0_one_column.csv
NUMBER:  1
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_1_one_column.csv
NUMBER:  2
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_2_one_column.csv
NUMBER:  3
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_3_one_column.csv
NUMBER:  4
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_dis_4_one_column.csv


Unnamed: 0,name,type_cluster,scaffold,SSo,TUPOR_,TUPOR,SESY,ASER
0,GB_GA_new_mut_r_0.01_0,dis,murcko,973311.0,15/92,0.163044,0.231842,0.003068
1,GB_GA_new_mut_r_0.01_1,dis,murcko,969287.0,14/34,0.411765,0.236653,0.000258
2,GB_GA_new_mut_r_0.01_2,dis,murcko,974337.0,11/64,0.171875,0.269065,0.001123
3,GB_GA_new_mut_r_0.01_3,dis,murcko,970872.0,27/76,0.355263,0.250007,0.022223
4,GB_GA_new_mut_r_0.01_4,dis,murcko,961926.0,30/98,0.306122,0.231026,0.005287
5,GB_GA_new_mut_r_0.01_mean,dis,murcko,969946.6,-,0.281614,0.243719,0.006392


GB_GA_new_mut_r_0.01
NUMBER:  0
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_0_one_column.csv
NUMBER:  1
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_1_one_column.csv
NUMBER:  2
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_2_one_column.csv
NUMBER:  3
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_3_one_column.csv
NUMBER:  4
/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/data/output_sets/Glucocorticoid_receptor/GB_GA_new_mut_r_0.01/cOS_GB_GA_new_mut_r_0.01_sim_4_one_column.csv


Unnamed: 0,name,type_cluster,scaffold,SSo,TUPOR_,TUPOR,SESY,ASER
0,GB_GA_new_mut_r_0.01_0,sim,murcko,980527.0,54/115,0.469565,0.306623,0.018382
1,GB_GA_new_mut_r_0.01_1,sim,murcko,963595.0,42/100,0.42,0.222738,0.021185
2,GB_GA_new_mut_r_0.01_2,sim,murcko,970465.0,21/58,0.362069,0.242199,0.001758
3,GB_GA_new_mut_r_0.01_3,sim,murcko,968654.0,28/51,0.54902,0.233539,0.003531
4,GB_GA_new_mut_r_0.01_4,sim,murcko,969662.0,20/40,0.5,0.235843,0.000611
5,GB_GA_new_mut_r_0.01_mean,sim,murcko,970580.6,-,0.460131,0.248188,0.009093


# Combining and Normalizing Metrics

The following cell runs functions that:

- merge the mean values of all metrics into a single `pandas.DataFrame` (using `connect_mean_value`)
- apply Min-Max normalization to scale the values (using `connect_mean_value_normalized`)
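
Min-Max scaling maps each metric column to the range [0, 1] via `(x - min) / (max - min)`. The sketch below is an assumption about what the normalization does, written with plain pandas; the function name `min_max_normalize` is hypothetical and is not the implementation of `connect_mean_value_normalized`. The input values are the `dis`/`csk` mean rows produced by this notebook.

```python
import pandas as pd

def min_max_normalize(df, columns):
    """Return a copy of df with the given columns scaled to [0, 1].

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        out[col] = 0.0 if span == 0 else (out[col] - lo) / span
    return out

means = pd.DataFrame({
    "name": ["GB_GA_mut_r_0.01_mean", "GB_GA_new_mut_r_0.01_mean"],
    "TUPOR": [0.506975, 0.725347],
    "SESY": [0.155468, 0.054053],
    "ASER": [0.003135, 0.01572],
})

normalized = min_max_normalize(means, ["TUPOR", "SESY", "ASER"])
print(normalized)
```

When only two generators are compared, every normalized column necessarily contains exactly 0.0 and 1.0, so the scaled values show which generator ranks higher on a metric but not by how much.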


In [17]:
from src import metrics_connection # Importing custom metric functions
imp.reload(metrics_connection)

<module 'src.metrics_connection' from '/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/src/metrics_connection.py'>

In [24]:
for receptor in ['Glucocorticoid_receptor']:
    for type_scaffold in ['csk', 'murcko']:
        for type_cluster in ['dis', 'sim']:  # Different cluster types
            for subset in ['']:
            
                # Generator names to combine; uncomment entries to include them
                generators_name_list = [
                    #f"Molpher{subset}",
                    #f"REINVENT{subset}",
                    #f"DrugEx_GT_epsilon_0.1{subset}",
                    #f"DrugEx_GT_epsilon_0.6{subset}",
                    #f"DrugEx_RNN_epsilon_0.1{subset}",
                    #f"DrugEx_RNN_epsilon_0.6{subset}",
                    f"GB_GA_mut_r_0.01{subset}",
                    f"GB_GA_new_mut_r_0.01{subset}",
                    #f"GB_GA_mut_r_0.5{subset}",
                    #f"addcarbon{subset}"
                ]
    
                # Connect and process mean values
                df = metrics_connection.connect_mean_value(type_cluster, type_scaffold, generators_name_list, receptor, subset)
                df1 = metrics_connection.connect_mean_value_normalized(type_cluster, type_scaffold, generators_name_list, receptor, subset)
                
                display(df[['name','type_cluster','scaffold','TUPOR','SESY','ASER']])
                display(df1[['name','type_cluster','scaffold','TUPOR','SESY','ASER']])

Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_mean,dis,csk,0.506975,0.155468,0.003135
1,GB_GA_new_mut_r_0.01_mean,dis,csk,0.725347,0.054053,0.01572


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_mean,dis,csk,0.0,1.0,0.0
1,GB_GA_new_mut_r_0.01_mean,dis,csk,1.0,0.0,1.0


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_mean,sim,csk,0.557163,0.158923,0.006467
1,GB_GA_new_mut_r_0.01_mean,sim,csk,0.853005,0.055377,0.022311


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_mean,sim,csk,0.0,1.0,0.0
1,GB_GA_new_mut_r_0.01_mean,sim,csk,1.0,0.0,1.0


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_mean,dis,murcko,0.097065,0.40087,0.000358
1,GB_GA_new_mut_r_0.01_mean,dis,murcko,0.281614,0.243719,0.006392


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_mean,dis,murcko,0.0,1.0,0.0
1,GB_GA_new_mut_r_0.01_mean,dis,murcko,1.0,0.0,1.0


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_mean,sim,murcko,0.157545,0.406021,0.001954
1,GB_GA_new_mut_r_0.01_mean,sim,murcko,0.460131,0.248188,0.009093


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_mean,sim,murcko,0.0,1.0,0.0
1,GB_GA_new_mut_r_0.01_mean,sim,murcko,1.0,0.0,1.0
