# Composite Score Calculation for Analog Compounds

### Objective:
The objective is to calculate a **composite score** for each **analog compound** from the activity, similarity, and similarity rank of its associated **cache challenge compounds**.

### Components of the Composite Score:
The composite score is based on three key aspects:
1. **Number of Binders (Aspect 1)**:
   - For each analog compound, we count the number of **cache challenge compounds** that have a **KD (M) < 1**.

   
2. **Position Based on Similarity (Aspect 2)**:
   - We rank the **cache challenge compounds** associated with each analog compound by **similarity**. The **cache challenge compound** with the highest similarity is given **position 1**, the second-highest receives **position 2**, and so on. 


3. **Total Similarity (Aspect 3)**:
   - The **sum of the similarity values** for all **cache challenge compounds** associated with the analog compound is computed. This gives an overall measure of how similar the compounds are to the analog compound.
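The three aspects can be sketched on a small table before looking at the full notebook code. This is a minimal illustration, assuming the column names used later in the notebook (`Analog Compound`, `Cache KD (M)`, `Similarity`); the analog name and all values below are hypothetical.

```python
import pandas as pd

# Hypothetical rows for a single analog (values are made up for illustration)
toy = pd.DataFrame({
    "Analog Compound": ["Analog A"] * 3,
    "Cache KD (M)": [0.5, 1.0, 2.0e-6],
    "Similarity": [0.40, 0.90, 0.65],
})

# Aspect 1: count cache challenge compounds with KD (M) < 1
num_binders = int((toy["Cache KD (M)"] < 1).sum())

# Aspect 2: rank by similarity, highest similarity gets position 1
ranked = toy.sort_values("Similarity", ascending=False).reset_index(drop=True)
ranked["Position"] = ranked.index + 1

# Aspect 3: sum of all similarity values for this analog
total_similarity = ranked["Similarity"].sum()

print(num_binders)
print(ranked[["Similarity", "Position"]])
print(total_similarity)
```

Note that a KD of exactly 1.0 is not counted as a binder, because the comparison is strict.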

### Composite Score Formula:
The **composite score** for each analog compound is calculated using the following formula:

$$
\text{Composite Score} = w_1 \cdot N_{\text{binders}} + w_2 \cdot P_{\text{binders}} + w_3 \cdot \text{Total Similarity}
$$


Where:
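- \( N_{\text{binders}} \) is the number of binders (cache challenge compounds with **KD < 1**),
- \( P_{\text{binders}} \) is the **position value** from the similarity ranking (higher-similarity compounds are prioritized). As implemented in the code below, it is the reciprocal of the top-ranked compound's position (1/1), so it equals 1 for every analog,
- \( \text{Total Similarity} \) is the sum of all **similarity values** for the cache challenge compounds associated with the analog.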


The weights \( w_1 \), \( w_2 \), and \( w_3 \) are set to **1** by default, meaning each aspect has equal influence in the final composite score.
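To see how the weights enter the formula, here is a small sketch of the weighted sum as a standalone function. The function name `composite_score` and its signature are assumptions made for illustration; the input values are the Analog 1 components printed later in this notebook.

```python
import math

def composite_score(n_binders, p_binders, total_similarity, w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum of the three aspects (hypothetical helper, not part of the notebook)."""
    return w1 * n_binders + w2 * p_binders + w3 * total_similarity

# Components reported for Analog 1 in the example cell of this notebook
default_score = composite_score(1, 1.0, 3.4431628493083504)

# Doubling w1 adds one extra unit per binder; the other terms are unchanged
binder_weighted_score = composite_score(1, 1.0, 3.4431628493083504, w1=2.0)

print(round(default_score, 6), round(binder_weighted_score, 6))
```

With equal weights the three terms are simply added, so the term with the largest numeric range (here the similarity sum) drives most of the variation between analogs.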

### Output:
- The **composite score** for each analog compound.


In [1]:
import pandas as pd
import numpy as np

# Load the Excel file
file_path = 'top5_similar_per_analog_cleaned.xlsx'  # Your file name
Top_5_similar_compounds_per_MU3116_analogs = pd.read_excel(file_path)

# 1. **Define the function to calculate composite score**

def calculate_composite_score(df):
    # Initialize an empty list to hold the composite scores
    composite_scores = []
    
    # Iterate through each unique analog compound
    for analog in df['Analog Compound'].unique():
        # Filter the rows corresponding to the current analog compound
        analog_df = df[df['Analog Compound'] == analog]
        
        # Aspect 1: Count the number of binders (KD < 1)
        num_binders = (analog_df['Cache KD (M)'] < 1).sum()

        # Aspect 2: Rank by similarity (highest similarity -> position 1)
        analog_df_sorted = analog_df.sort_values(by='Similarity', ascending=False)
        analog_df_sorted['Position'] = np.arange(1, len(analog_df_sorted) + 1)
        # Reciprocal of the top-ranked position; this is 1/1 = 1.0 for every analog
        position_value = 1 / analog_df_sorted['Position'].iloc[0]
        
        # Aspect 3: Sum of similarity values
        total_similarity = analog_df_sorted['Similarity'].sum()
        
        # Calculate composite score (weights are assumed to be equal, adjust as necessary)
        w1, w2, w3 = 1, 1, 1
        composite_score = w1 * num_binders + w2 * position_value + w3 * total_similarity
        
        composite_scores.append((analog, composite_score))
    
    # Create a DataFrame with the composite scores
    composite_score_df = pd.DataFrame(composite_scores, columns=['Analog Compound', 'Composite Score'])
    
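    # Keep one row per analog and attach its composite score.
    # drop_duplicates keeps the first row of each analog, so the other columns
    # (Cache Challenge Compound, Cache KD (M), Similarity) come from that first row only.
    df_composite = df.drop_duplicates(subset='Analog Compound').merge(composite_score_df, on='Analog Compound', how='left')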
    
    return df_composite

# 2. **Calculate composite scores**
df_composite_scores = calculate_composite_score(Top_5_similar_compounds_per_MU3116_analogs)

# 3. **View the resulting DataFrame with composite scores**
print(df_composite_scores)


    Analog Compound                           Cache Challenge Compound  \
0          Analog 1  Cc1ccc2c(c1)c(cc(c1cccs1)n2)C(=O)Nc1cccc(c1)C(...   
1         Analog 10  CCNC(=O)NCc1ccc(cc1)NC(=O)c1cc(c2ccccc2)nc2ccc...   
2        Analog 100           CC(C)c1cc(C(=O)N2CC(C2)C(=O)N)c2ccccc2n1   
3        Analog 101  C1COCCN1c1ccccc1NC(=O)c1cc(c2ccccc2)nc2c1cnn2C...   
4        Analog 102  C1CC1NS(=O)(=O)c1ccc(cc1)NC(=O)c1cc(c2ccccc2)n...   
..              ...                                                ...   
269       Analog 95  C(c1cccc(c1)F)n1c2c(cn1)c(cc(c1ccccc1)n2)C(=O)...   
270       Analog 96  CCNC(=O)NCc1ccc(cc1)NC(=O)c1cc(c2ccccc2)nc2ccc...   
271       Analog 97  CC(=O)N1CCc2cc(ccc12)NC(=O)c1cc(c2ccc(C)o2)nc2...   
272       Analog 98  Cc1ccc(c2cc(C(=O)Nc3cccc(c3)C(=O)NCCC(=O)N)c3c...   
273       Analog 99   Cc1cc(NC(=O)c2cc(c3ccccc3)nc3c2cnn3Cc2cccnc2)no1   

     Cache KD (M)  Similarity  Composite Score  
0        1.000000    1.000000         5.443163  
1        1.00
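
The per-analog loop above can also be written with `groupby`, which keeps the three components as separate columns for inspection. This is a sketch rather than the notebook's method: the function name `composite_scores_grouped`, the component column names, and the toy data are assumptions. The position value is set to the constant 1.0 because that is what the loop's `1 / Position.iloc[0]` always evaluates to.

```python
import pandas as pd

def composite_scores_grouped(df, w1=1.0, w2=1.0, w3=1.0):
    """Hypothetical groupby-based equivalent of the loop in calculate_composite_score."""
    grouped = df.groupby("Analog Compound", sort=False)
    summary = pd.DataFrame({
        # Aspect 1: number of rows with KD (M) < 1 in each group
        "Num Binders": grouped["Cache KD (M)"].agg(lambda kd: int((kd < 1).sum())),
        # Aspect 3: sum of similarity values in each group
        "Total Similarity": grouped["Similarity"].sum(),
    })
    # Aspect 2: reciprocal of the top-ranked position, which is always 1/1
    summary["Position Value"] = 1.0
    summary["Composite Score"] = (
        w1 * summary["Num Binders"]
        + w2 * summary["Position Value"]
        + w3 * summary["Total Similarity"]
    )
    return summary.reset_index()

# Toy input with made-up values
toy = pd.DataFrame({
    "Analog Compound": ["Analog A", "Analog A", "Analog B"],
    "Cache KD (M)": [0.5, 2.0, 1.0],
    "Similarity": [0.9, 0.4, 0.7],
})
result = composite_scores_grouped(toy)
print(result)
```

Unlike the merged output above, this returns only the analog name and the score components, which avoids carrying along columns that describe a single cache challenge compound.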

In [4]:
# Optionally, you can save the resulting DataFrame to a new Excel file
df_composite_scores.to_excel('top_5_analog_with_composite_scores.xlsx', index=False)


## Example Code on Analog 1

In [5]:
# Print out individual components for Analog 1
analog_df = Top_5_similar_compounds_per_MU3116_analogs[Top_5_similar_compounds_per_MU3116_analogs['Analog Compound'] == 'Analog 1']

# Aspect 1: Number of binders
num_binders = (analog_df['Cache KD (M)'] < 1).sum()
print(f"Number of Binders: {num_binders}")

# Aspect 2: Position value (based on similarity)
analog_df_sorted = analog_df.sort_values(by='Similarity', ascending=False)
analog_df_sorted['Position'] = np.arange(1, len(analog_df_sorted) + 1)
# Reciprocal of the top-ranked position, so this is always 1/1 = 1.0
position_value = 1 / analog_df_sorted['Position'].iloc[0]
print(f"Position Value: {position_value}")

# Aspect 3: Sum of similarities
total_similarity = analog_df_sorted['Similarity'].sum()
print(f"Total Similarity: {total_similarity}")

# Composite score calculation
composite_score = num_binders + position_value + total_similarity
print(f"Composite Score: {composite_score}")


Number of Binders: 1
Position Value: 1.0
Total Similarity: 3.4431628493083504
Composite Score: 5.443162849308351


## Ranking the Analogs by Composite Score

In [7]:
# Step 1: Load the Excel file
file_path = 'top_5_analog_with_composite_scores.xlsx'  # Update the path if needed
df_composite_scores = pd.read_excel(file_path)

# Step 2: Sort the DataFrame based on the 'Composite Score' in descending order
df_sorted = df_composite_scores.sort_values(by='Composite Score', ascending=False)

# Step 3: Optionally, save the sorted DataFrame back to an Excel file
df_sorted.to_excel('top_5_analog_sorted_by_composite_scores.xlsx', index=False)

# Display the sorted DataFrame (optional)
df_sorted.head()  # Show the first few rows of the sorted data


| | Analog Compound | Cache Challenge Compound | Cache KD (M) | Similarity | Composite Score |
|---|---|---|---|---|---|
| 0 | Analog 1 | Cc1ccc2c(c1)c(cc(c1cccs1)n2)C(=O)Nc1cccc(c1)C(... | 1.0 | 1.0 | 5.443163 |
| 35 | Analog 130 | Cc1c(ccc(c2cccs2)n1)C(=O)Nc1cccc(c1)C(=O)N(C)C... | 1.0 | 0.625 | 5.428763 |
| 171 | Analog 253 | Cc1c(ccc(c2cccs2)n1)C(=O)Nc1cccc(c1)C(=O)N(C)C... | 1.0 | 0.52 | 5.426065 |
| 255 | Analog 82 | Cc1ccc2c(c1)c(cc(c1cccs1)n2)C(=O)Nc1cccc(c1)C(... | 1.0 | 0.833333 | 5.316755 |
| 60 | Analog 153 | Cc1c(ccc(c2cccs2)n1)C(=O)Nc1cccc(c1)C(=O)N(C)C... | 1.0 | 0.571429 | 5.291518 |
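
If an explicit rank column is useful, the sort can be combined with `Series.rank`. The sketch below is one possible way to do it, not part of the original workflow: the helper name `rank_by_composite` and the `Rank` column are assumptions, and the three scores are taken from the table above, deliberately given out of order.

```python
import pandas as pd

def rank_by_composite(df, score_col="Composite Score"):
    """Sort by score (descending) and add a 1-based rank; tied scores share the lowest rank."""
    ranked = df.sort_values(by=score_col, ascending=False, kind="stable").reset_index(drop=True)
    ranked["Rank"] = ranked[score_col].rank(method="min", ascending=False).astype(int)
    return ranked

# Three analogs and their scores from the sorted output, shuffled
scores = pd.DataFrame({
    "Analog Compound": ["Analog 82", "Analog 1", "Analog 130"],
    "Composite Score": [5.316755, 5.443163, 5.428763],
})
ranked = rank_by_composite(scores)
print(ranked)
```

Working on the in-memory DataFrame this way also avoids the round trip through an Excel file when the scores are already available in the session.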
