# Molecular Generator Evaluation using TUPOR, SESY and ASER Metrics

🔹 **Objective**  
This notebook evaluates molecular generators by computing three key metrics:  
   - **TUPOR**: scaffold recall  
   - **SESY**: scaffold-hopping potential  
   - **ASER**: chemical space exploration

🔹 **Workflow**  
1️⃣ **Compute Metrics**: The script calculates TUPOR, SESY and ASER for different molecular generators.  
2️⃣ **Merge Data**: Results from multiple generators are combined into a single Pandas DataFrame.  
3️⃣ **Normalize Values**: The computed metrics are normalized using Min-Max scaling for comparison.  
4️⃣ **Save Outputs**: Processed data is stored in CSV files for further analysis.  

🔹 **Data Structure**  
- The calculations are performed for different **scaffold types** (`csk`, `murcko`) and **cluster types** (`dis`, `sim`).  
- Results are computed for multiple **generators** (`Molpher`, `REINVENT`, `DrugEx`, `GB_GA`, `AddCarbon`).  
- Each run targets one **biological receptor**: `Glucocorticoid_receptor` or `Leukocyte_elastase`.

This notebook compares molecular generators in terms of scaffold recall, scaffold-hopping potential, and chemical space exploration.

# Loading required libraries

In [38]:
from src import metrics # Importing custom metric functions
import importlib as imp
imp.reload(metrics)

<module 'src.metrics' from '/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/src/metrics.py'>

# Function to calculate metrics

In [39]:
def calculate_metrics(type_cluster, type_scaffold, generator, receptor, ncpus=1):
    """
    Calculate molecular generation metrics for one generator.

    Parameters:
    - type_cluster: Cluster type ('dis' = dissimilarity split, 'sim' = similarity split)
    - type_scaffold: Type of scaffold ('csk' or 'murcko')
    - generator: Name of the molecular generator (e.g. 'Molpher', 'REINVENT',
        'DrugEx_GT_epsilon_0.1', 'DrugEx_GT_epsilon_0.6', 'DrugEx_RNN_epsilon_0.1',
        'DrugEx_RNN_epsilon_0.6', 'GB_GA_mut_r_0.01', 'GB_GA_mut_r_0.5',
        'GB_GA_log_p_mut_r_0.01', 'GB_GA_log_p_mut_r_0.5', 'addcarbon')
    - receptor: Target receptor (e.g. 'Glucocorticoid_receptor', 'Leukocyte_elastase')
    - ncpus: Number of CPUs passed to metrics.Metrics (default 1)

    Returns:
    - The computed metrics, as returned by Metrics.calculate_metrics()
    """
    mt = metrics.Metrics(type_cluster, type_scaffold, generator, receptor, ncpus)     
    result = mt.calculate_metrics()
    display(result)
    return result

# Define parameters for metric calculations

In [3]:
type_cluster = 'sim'  # options: 'dis' | 'sim'
type_scaffold = 'csk'  # options: 'csk' | 'murcko'
generator = 'Molpher'  # options: 'Molpher' | 'DrugEx_GT_epsilon_0.1' | 'REINVENT' | 'addcarbon' | 'GB_GA_mut_r_0.5' | 'GB_GA_log_p_mut_r_0.01'
receptor = 'Leukocyte_elastase'  # options: 'Glucocorticoid_receptor' | 'Leukocyte_elastase'

calculate_metrics(type_cluster, type_scaffold, generator, receptor, ncpus=10)

NUMBER:  0


[10:03:34] Explicit valence for atom # 12 C, 5, is greater than permitted
[10:03:49] Explicit valence for atom # 28 C, 5, is greater than permitted
[10:03:49] Explicit valence for atom # 28 C, 5, is greater than permitted
[10:03:49] Explicit valence for atom # 27 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 7 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 3 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 7 C, 5, is greater than permitted
[10:04:05] Explicit valence for atom # 7 C, 5, 

NUMBER:  1


[10:08:55] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:08:55] Explicit valence for atom # 23 C, 5, is greater than permitted
[10:08:55] Explicit valence for atom # 23 C, 5, is greater than permitted
[10:08:55] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:08:55] Explicit valence for atom # 23 C, 5, is greater than permitted
[10:09:48] Explicit valence for atom # 11 C, 5, is greater than permitted
[10:09:48] Explicit valence for atom # 11 C, 5, is greater than permitted
[10:09:48] Explicit valence for atom # 11 C, 5, is greater than permitted
[10:10:01] Explicit valence for atom # 2 C, 5, is greater than permitted
[10:10:01] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:10:01] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:10:07] Explicit valence for atom # 7 C, 5, is greater than permitted
[10:10:48] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:12:09] Explicit valence for atom # 1

NUMBER:  2


[10:15:35] Explicit valence for atom # 10 C, 5, is greater than permitted
[10:16:46] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:16:46] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 6 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 6 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 25 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 6 C, 5, is greater than permitted
[10:17:34] Explicit valence for atom # 6 C, 5, is greater than permitted
[10:17:53] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:17:58] Explicit valence for atom # 1 C, 5, is greater than permitted
[10:17:58] Explicit valence for atom # 1 C, 5, is greater than permitted
[10:17:58] Explicit valence for atom # 1 C, 5, is greater than permitted
[10:18:11] Explicit valence for atom # 7 C, 5

NUMBER:  3


[10:21:36] Explicit valence for atom # 10 C, 5, is greater than permitted
[10:21:54] Explicit valence for atom # 1 C, 5, is greater than permitted
[10:21:56] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:21:56] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:23:58] Explicit valence for atom # 26 C, 5, is greater than permitted
[10:23:58] Explicit valence for atom # 25 C, 5, is greater than permitted
[10:23:58] Explicit valence for atom # 25 C, 5, is greater than permitted
[10:24:35] Explicit valence for atom # 31 C, 5, is greater than permitted
[10:24:35] Explicit valence for atom # 26 C, 5, is greater than permitted
[10:24:35] Explicit valence for atom # 27 C, 5, is greater than permitted
[10:24:42] Explicit valence for atom # 29 C, 5, is greater than permitted
[10:25:23] Explicit valence for atom # 15 C, 5, is greater than permitted


NUMBER:  4


[10:27:38] Explicit valence for atom # 25 C, 5, is greater than permitted
[10:27:38] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:27:38] Explicit valence for atom # 24 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 16 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 14 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:17] Explicit valence for atom # 15 C, 5, is greater than permitted
[10:28:21] Explicit valence for atom # 21 C, 5, is greater than permitted
[10:28:21] Explicit valence for atom # 21 C, 5, is greater than permitted
[10:28:21] Explicit valence for atom #

Unnamed: 0,name,type_cluster,scaffold,SSo,TUPOR_,TUPOR,SESY,ASER
0,Molpher_0,sim,csk,1624700.0,43/46,0.934783,0.138021,0.03143
1,Molpher_1,sim,csk,1662823.0,38/44,0.863636,0.130204,0.009258
2,Molpher_2,sim,csk,1546835.0,36/42,0.857143,0.12128,0.007056
3,Molpher_3,sim,csk,1749302.0,32/42,0.761905,0.123391,0.011643
4,Molpher_4,sim,csk,1722174.0,32/42,0.761905,0.123599,0.003529
5,Molpher_mean,sim,csk,1661166.8,-,0.835874,0.127299,0.012583
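
In the table above, `TUPOR` appears to be the decimal value of the `TUPOR_` fraction (for example 43/46 ≈ 0.934783), and the `Molpher_mean` row appears to be the arithmetic mean of the five per-run rows. The sketch below checks both against the printed values. It is an illustration built from the displayed table, not the code in `src.metrics`, and `fraction_to_float` is a hypothetical helper name.

```python
import pandas as pd

# Per-run values copied from the Molpher table above (sim split, csk scaffolds).
runs = pd.DataFrame({
    "name": [f"Molpher_{i}" for i in range(5)],
    "TUPOR_": ["43/46", "38/44", "36/42", "32/42", "32/42"],
    "SESY": [0.138021, 0.130204, 0.121280, 0.123391, 0.123599],
    "ASER": [0.031430, 0.009258, 0.007056, 0.011643, 0.003529],
})


def fraction_to_float(text: str) -> float:
    """Convert a 'hits/total' string such as '43/46' to a float (hypothetical helper)."""
    hits, total = text.split("/")
    return int(hits) / int(total)


# Recompute TUPOR from the fraction column, then average the three metrics.
runs["TUPOR"] = runs["TUPOR_"].map(fraction_to_float)
mean_row = runs[["TUPOR", "SESY", "ASER"]].mean()
print(mean_row.round(6))
```

Rounded to six decimals, the result agrees with the `Molpher_mean` row (0.835874, 0.127299, 0.012583).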




# Execute metric calculation function

In [None]:
for receptor in ['Glucocorticoid_receptor', 'Leukocyte_elastase']:
    for type_scaffold in ['csk','murcko']:
        for type_cluster in ['dis','sim']:
            for subset in ['_500k', '_250k', '_125k', '_62.5k']:
            #for subset in ['_62.5k']:
                ncpus = 10
                
                # Generator names for this subset; uncomment entries to include them
                generators_name_list = [
                    #f"Molpher{subset}",
                    #f"REINVENT{subset}",
                    #f"DrugEx_GT_epsilon_0.1{subset}",
                    #f"DrugEx_GT_epsilon_0.6{subset}",
                    #f"DrugEx_RNN_epsilon_0.1{subset}",
                    #f"DrugEx_RNN_epsilon_0.6{subset}",
                    f"GB_GA_mut_r_0.01{subset}",
                    f"GB_GA_mut_r_0.5{subset}",
                    f"GB_GA_log_p_mut_r_0.01{subset}",
                    f"GB_GA_log_p_mut_r_0.5{subset}",
                    #f"addcarbon{subset}"
                ]
                for generator in generators_name_list:
                    print(generator)
                    calculate_metrics(type_cluster,type_scaffold,generator,receptor,ncpus = ncpus)

GB_GA_mut_r_0.01_500k
NUMBER:  0
NUMBER:  1
NUMBER:  2
NUMBER:  3
NUMBER:  4


Unnamed: 0,name,type_cluster,scaffold,SSo,TUPOR_,TUPOR,SESY,ASER
0,GB_GA_mut_r_0.01_500k_0,dis,csk,486596.0,27/40,0.675,0.073186,0.011313
1,GB_GA_mut_r_0.01_500k_1,dis,csk,484589.0,16/23,0.695652,0.068559,0.00206
2,GB_GA_mut_r_0.01_500k_2,dis,csk,487086.0,23/40,0.575,0.077147,0.002164
3,GB_GA_mut_r_0.01_500k_3,dis,csk,485464.0,23/37,0.621622,0.067663,0.028089
4,GB_GA_mut_r_0.01_500k_4,dis,csk,480842.0,34/43,0.790698,0.070058,0.034716
5,GB_GA_mut_r_0.01_500k_mean,dis,csk,484915.4,-,0.671594,0.071323,0.015668


GB_GA_mut_r_0.5_500k
NUMBER:  0


[22:54:33] Explicit valence for atom # 6 C, 5, is greater than permitted
[22:54:35] Explicit valence for atom # 5 C, 5, is greater than permitted
[22:54:39] Explicit valence for atom # 7 C, 5, is greater than permitted
[22:54:40] Explicit valence for atom # 12 C, 5, is greater than permitted
[22:54:49] Explicit valence for atom # 13 C, 5, is greater than permitted
[22:54:51] Explicit valence for atom # 26 C, 5, is greater than permitted
[22:54:53] Explicit valence for atom # 1 C, 5, is greater than permitted
[22:54:57] Explicit valence for atom # 15 C, 5, is greater than permitted
[22:54:57] Explicit valence for atom # 6 C, 5, is greater than permitted
[22:54:57] Explicit valence for atom # 8 C, 5, is greater than permitted
[22:54:59] Explicit valence for atom # 20 C, 5, is greater than permitted
[22:55:03] Explicit valence for atom # 5 C, 5, is greater than permitted
[22:55:09] Explicit valence for atom # 9 C, 5, is greater than permitted
[22:55:12] Explicit valence for atom # 4 C, 5,

NUMBER:  1


[22:56:19] Explicit valence for atom # 6 C, 5, is greater than permitted
[22:56:22] Explicit valence for atom # 29 C, 5, is greater than permitted
[22:56:26] Explicit valence for atom # 3 C, 5, is greater than permitted
[22:56:37] Explicit valence for atom # 25 C, 5, is greater than permitted
[22:56:51] Explicit valence for atom # 3 C, 5, is greater than permitted
[22:56:58] Explicit valence for atom # 19 C, 5, is greater than permitted
[22:57:03] Explicit valence for atom # 11 C, 5, is greater than permitted
[22:57:04] Explicit valence for atom # 3 C, 5, is greater than permitted


NUMBER:  2


[22:57:46] Explicit valence for atom # 15 C, 5, is greater than permitted
[22:57:46] Explicit valence for atom # 16 C, 5, is greater than permitted


## Combining and Normalizing Metrics

The following cell runs functions that:

- merge the mean values of all metrics into a single `pandas.DataFrame` (using `connect_mean_value`)
- apply Min-Max normalization to scale the values (using `connect_mean_value_normalized`)
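
As a rough picture of the merge step, the sketch below keeps the `<generator>_mean` row of each per-generator table and stacks the rows into one `pandas.DataFrame`. This is an assumption about the general idea only: `merge_mean_rows` is a hypothetical helper, not the implementation of `connect_mean_value`, and the toy values are made up.

```python
import pandas as pd


def merge_mean_rows(tables):
    """Hypothetical helper: keep each table's '<generator>_mean' row and stack them."""
    mean_rows = [t[t["name"].str.endswith("_mean")] for t in tables]
    return pd.concat(mean_rows, ignore_index=True)


# Two toy per-generator tables with made-up values (for illustration only).
table_a = pd.DataFrame({"name": ["A_0", "A_1", "A_mean"], "TUPOR": [0.2, 0.4, 0.3]})
table_b = pd.DataFrame({"name": ["B_0", "B_1", "B_mean"], "TUPOR": [0.6, 0.8, 0.7]})

merged = merge_mean_rows([table_a, table_b])
print(merged)
```

The merged frame could then be written out with `merged.to_csv(...)` for further analysis.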


In [42]:
from src import metrics_connection # Importing custom metric functions
imp.reload(metrics_connection)

<module 'src.metrics_connection' from '/home/filv/phd_projects/iga_2023/git_reccal/new/recall_metrics/src/metrics_connection.py'>

In [58]:
for receptor in ['Leukocyte_elastase']:
    for type_scaffold in ['murcko']:
        for type_cluster in ['sim']:  # Different cluster types
            for subset in ['']:
            
                # Generator names for this subset
                generators_name_list = [
                    f"Molpher{subset}",
                    f"REINVENT{subset}",
                    f"DrugEx_GT_epsilon_0.1{subset}",
                    f"DrugEx_GT_epsilon_0.6{subset}",
                    f"DrugEx_RNN_epsilon_0.1{subset}",
                    f"DrugEx_RNN_epsilon_0.6{subset}",
                    f"GB_GA_mut_r_0.01{subset}",
                    f"GB_GA_mut_r_0.5{subset}",
                    f"GB_GA_log_p_mut_r_0.01{subset}",
                    f"GB_GA_log_p_mut_r_0.5{subset}",
                    f"addcarbon{subset}"
                ]
    
                # Connect and process mean values
                df = metrics_connection.connect_mean_value(type_cluster, type_scaffold, generators_name_list, receptor, subset)
                df1 = metrics_connection.connect_mean_value_normalized(type_cluster, type_scaffold, generators_name_list, receptor, subset)
                print(receptor)
                display(df[['name','type_cluster','scaffold','TUPOR','SESY','ASER']])
                display(df1[['name','type_cluster','scaffold','TUPOR','SESY','ASER']])

Leukocyte_elastase


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,Molpher_mean,sim,murcko,0.465611,0.304545,0.004538
1,REINVENT_mean,sim,murcko,0.292322,0.569795,0.001211
2,DrugEx_GT_epsilon_0.1_mean,sim,murcko,0.427641,0.574657,0.005542
3,DrugEx_GT_epsilon_0.6_mean,sim,murcko,0.645077,0.614167,0.008375
4,DrugEx_RNN_epsilon_0.1_mean,sim,murcko,0.104826,0.418176,0.018123
5,DrugEx_RNN_epsilon_0.6_mean,sim,murcko,0.562998,0.501719,0.011521
6,GB_GA_mut_r_0.01_mean,sim,murcko,0.476428,0.259681,0.019232
7,GB_GA_mut_r_0.5_mean,sim,murcko,0.559051,0.228702,0.021894
8,GB_GA_log_p_mut_r_0.01_mean,sim,murcko,0.067379,0.256611,0.003444
9,GB_GA_log_p_mut_r_0.5_mean,sim,murcko,0.071259,0.257604,0.005037


Unnamed: 0,name,type_cluster,scaffold,TUPOR,SESY,ASER
0,Molpher_mean,sim,murcko,0.689343,0.491603,0.160832
1,REINVENT_mean,sim,murcko,0.389378,0.927141,0.0
2,DrugEx_GT_epsilon_0.1_mean,sim,murcko,0.623617,0.935126,0.209374
3,DrugEx_GT_epsilon_0.6_mean,sim,murcko,1.0,1.0,0.346341
4,DrugEx_RNN_epsilon_0.1_mean,sim,murcko,0.064821,0.678185,0.817648
5,DrugEx_RNN_epsilon_0.6_mean,sim,murcko,0.857922,0.815362,0.498441
6,GB_GA_mut_r_0.01_mean,sim,murcko,0.708068,0.417936,0.871257
7,GB_GA_mut_r_0.5_mean,sim,murcko,0.851088,0.367069,1.0
8,GB_GA_log_p_mut_r_0.01_mean,sim,murcko,0.0,0.412895,0.107938
9,GB_GA_log_p_mut_r_0.5_mean,sim,murcko,0.006715,0.414526,0.184953
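
Min-Max scaling maps each column linearly onto [0, 1] via (x − min) / (max − min). The sketch below applies it to the raw `TUPOR` means from the first table above and reproduces the normalized `TUPOR` column of the second table. It is a minimal illustration, not the code of `connect_mean_value_normalized`, and `min_max` is a hypothetical helper name.

```python
import pandas as pd

# Raw mean TUPOR values copied from the first Leukocyte_elastase table above.
raw = pd.Series(
    [0.465611, 0.292322, 0.427641, 0.645077, 0.104826,
     0.562998, 0.476428, 0.559051, 0.067379, 0.071259],
    index=[
        "Molpher_mean", "REINVENT_mean",
        "DrugEx_GT_epsilon_0.1_mean", "DrugEx_GT_epsilon_0.6_mean",
        "DrugEx_RNN_epsilon_0.1_mean", "DrugEx_RNN_epsilon_0.6_mean",
        "GB_GA_mut_r_0.01_mean", "GB_GA_mut_r_0.5_mean",
        "GB_GA_log_p_mut_r_0.01_mean", "GB_GA_log_p_mut_r_0.5_mean",
    ],
    name="TUPOR",
)


def min_max(values: pd.Series) -> pd.Series:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    return (values - values.min()) / (values.max() - values.min())


scaled = min_max(raw)
print(scaled.round(6))
```

Applying the same function to the displayed `SESY` column would not reproduce the second table: the smallest normalized `SESY` value shown there is 0.367069 rather than 0, so the notebook's scaling presumably used a minimum from a row that is not displayed.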
