# Drug Synergy Prediction Task

**Definition:** Synergy is a dimensionless measure of how far an observed drug combination response deviates from the effect expected if the drugs did not interact. It can be calculated under different reference models, such as the Bliss model, Highest Single Agent (HSA), the Loewe additivity model, and Zero Interaction Potency (ZIP). Another relevant metric is the combination sensitivity score (CSS), which measures drug combination sensitivity and is derived from the relative IC50 values of the compounds and the area under their dose-response curves.
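
As a worked illustration of "deviation from the expected effect of non-interaction", the sketch below computes the excess over the Bliss and HSA references for a single dose pair. It assumes effects are expressed as fractional inhibition in [0, 1]; the function names and the numbers are made up for illustration and are not taken from the dataset.

```python
import numpy as np

def bliss_expected(e_a, e_b):
    """Expected fractional inhibition if the two drugs act independently (Bliss)."""
    return e_a + e_b - e_a * e_b

def hsa_expected(e_a, e_b):
    """Expected effect under Highest Single Agent: the better of the two drugs."""
    return np.maximum(e_a, e_b)

def synergy_excess(observed, expected):
    """Positive values suggest synergy, negative values suggest antagonism."""
    return observed - expected

# Hypothetical single-agent and combination effects (fractional inhibition)
e_a, e_b, e_ab = 0.30, 0.40, 0.70

bliss = synergy_excess(e_ab, bliss_expected(e_a, e_b))
hsa = synergy_excess(e_ab, hsa_expected(e_a, e_b))

print(f"Bliss excess: {bliss:.2f}")  # 0.70 - (0.30 + 0.40 - 0.12) = 0.12
print(f"HSA excess:   {hsa:.2f}")    # 0.70 - max(0.30, 0.40) = 0.30
```

The same combination scores differently under the two references, which is why the reference model should be stated whenever a synergy value is reported.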

**Impact:** Drug combination therapy offers enormous potential for expanding the use of existing drugs and improving their efficacy. For instance, the simultaneous modulation of multiple targets can address common mechanisms of drug resistance in cancer treatment. However, experimentally exploring the entire space of possible drug combinations is not feasible. Computational models that predict the therapeutic potential of drug combinations can therefore be valuable in guiding this exploration.

**Generalization:** Model predictions should generalize across varying underlying biology, as captured by different cell lines drawn from multiple tissues of origin. Dosage is also an important factor that can affect model generalizability.
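
One way to probe generalization across cell lines is to hold out entire cell lines instead of random rows. The sketch below does this on a toy table; the column names (`Drug1_ID`, `Drug2_ID`, `Cell_Line_ID`, `Y`), the values, and the helper `leave_groups_out` are assumptions made for illustration and should be checked against the actual dataset.

```python
import numpy as np
import pandas as pd

# Toy table with illustrative values; real column names may differ
toy = pd.DataFrame({
    "Drug1_ID": ["d1", "d1", "d2", "d2", "d3", "d3", "d1", "d2"],
    "Drug2_ID": ["d2", "d3", "d3", "d1", "d1", "d2", "d3", "d3"],
    "Cell_Line_ID": ["A549", "A549", "MCF7", "MCF7", "HT29", "HT29", "PC3", "PC3"],
    "Y": [0.1, -0.3, 0.5, 0.0, 0.2, -0.1, 0.4, 0.3],
})

def leave_groups_out(df, group_col, test_frac=0.25, seed=0):
    """Split so that every value of `group_col` lands entirely in train or in test."""
    rng = np.random.default_rng(seed)
    groups = df[group_col].unique()
    n_test = max(1, int(round(test_frac * len(groups))))
    test_groups = rng.choice(groups, size=n_test, replace=False)
    is_test = df[group_col].isin(test_groups)
    return df[~is_test], df[is_test]

train_df, test_df = leave_groups_out(toy, "Cell_Line_ID")
print("held-out cell lines:", sorted(test_df["Cell_Line_ID"].unique()))
```

A model evaluated this way never sees the held-out cell lines during training, so the test error reflects performance on unseen biology and not only on unseen drug pairs.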

**Product:** Small-molecule.

**Pipeline:** Activity.


In [1]:
import pandas as pd
import numpy as np

In [None]:
# Load the dataset and create the default train/valid/test split
from tdc.multi_pred import DrugSyn  # https://tdcommons.ai/multi_pred_tasks/drugsyn
data = DrugSyn(name='OncoPolyPharmacology')
split = data.get_split()

Found local copy...
Loading...
Done!


## 1. Initial exploration and pre-processing
- Review all available documentation about the dataset.
- Load the dataset and perform an exploratory analysis of it.
- Perform the necessary steps for data preparation and preprocessing, including possibly generating features, selecting them, handling any missing values, etc.
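
The preparation steps listed above could start with checks like the following. This is a minimal sketch on a toy DataFrame with hypothetical columns (`feature_a`, `feature_b`); whether to drop or impute incomplete rows depends on the real data and should be justified there.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value per feature column and one duplicated row
toy = pd.DataFrame({
    "feature_a": [0.1, np.nan, 0.3, 0.3, 0.5],
    "feature_b": ["x", "y", "z", "z", None],
    "Y": [1.0, 2.0, 3.0, 3.0, 5.0],
})

# Count missing values per column
missing_per_column = toy.isna().sum()

# Count fully duplicated rows
n_duplicates = int(toy.duplicated().sum())

# Simplest possible cleaning: drop duplicates, then drop incomplete rows
clean = toy.drop_duplicates().dropna().reset_index(drop=True)

print(missing_per_column)
print("duplicated rows:", n_duplicates)
print("rows before/after cleaning:", len(toy), len(clean))
```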

### Goals:
- Describe and characterize the assigned data according to the available documentation/literature;
- Describe the characteristics of the available data based on the initial exploratory analysis;
- Describe the steps taken for data preparation and preprocessing, justifying the choices;
- Include initial exploratory graphs that illustrate the main characteristics of the data.

In [5]:
# Inspect the types of `data` and `split`, and the keys of the split
print(type(data))
print(type(split))
print(split.keys())

<class 'tdc.multi_pred.antibodyaff.AntibodyAff'>
<class 'dict'>
dict_keys(['train', 'valid', 'test'])


In [8]:
# Access specific splits
train_data = split['train'] # Training data
val_data = split['valid']   # Validation data
test_data = split['test']   # Test data

# Wrap each split in a DataFrame (get_split() already returns pandas DataFrames, so this is effectively a no-op)
df_train = pd.DataFrame(train_data) # Training dataframe
df_val = pd.DataFrame(val_data)     # Validation dataframe
df_test = pd.DataFrame(test_data)   # Test dataframe

In [10]:
df_train.head()

| | Antibody_ID | Antibody | Antigen_ID | Antigen | Y |
|---|---|---|---|---|---|
| 0 | 4i2x | ['EVKLQQSGPELVKPGASVKISCKASGYSFTSYYIHWVKQRPGQG... | signal-regulatory protein gamma | EEELQMIQPEKLLLVTVGKTATLHCTVTSLLPVGPVLWFRGVGPGR... | 1.2e-06 |
| 1 | 5vyf | ['EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKG... | chimera of major allergen i polypeptide chain ... | VKMAETCPIFYDVFFAVANGNELLLDLSLTKVAATEPERTAMKKIQ... | 1.87e-10 |
| 2 | 3eob | ['EVQLVESGGGLVQPGGSLRLSCAASGYSFTGHWMNWVRQAPGKG... | integrin alpha-l | GNVDLVFLFDGSMSLQPDEFQKILDFMKDVMKKLSNTSYQFAAVQF... | 2.2e-09 |
| 3 | 2r56 | ['QVSLRESGGGLVQPGRSLRLSCTASGFTFRHHGMTWVRQAPGKG... | beta-lactoglobulin | LIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLRVYVEEL... | 1.3e-09 |
| 4 | 4yhz | ['EVQLVETGGGVVQPGRSLRLSCTASGFTFRDYWMSWVRQAPGKG... | h3k4me3 peptide | ARTKQTARKSTG | 1.1e-08 |


In [12]:
df_train.describe()

| | Y |
|---|---|
| count | 345.0 |
| mean | 1.330135e-06 |
| std | 1.209138e-05 |
| min | 4e-13 |
| 25% | 7.9e-10 |
| 50% | 7.8e-09 |
| 75% | 6.9e-08 |
| max | 0.0002 |
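
The summary above shows a strongly skewed target. The mean (about 1.3e-06) is far above the median (7.8e-09), and the values span more than eight orders of magnitude. A log transform is one common way to handle such a scale. The sketch below applies `log10` to the five quantiles printed above, used here only as stand-in values; on the real data the transform would be applied to the full `Y` column.

```python
import numpy as np
import pandas as pd

# Stand-in values: min, 25%, 50%, 75%, and max from the describe() output above
y = pd.Series([4e-13, 7.9e-10, 7.8e-9, 6.9e-8, 2e-4], name="Y")

# log10 compresses the range so that no single extreme value dominates
log_y = np.log10(y)

print(log_y.round(2))
print("orders of magnitude spanned:", round(float(log_y.max() - log_y.min()), 1))
```

If the model is trained on the transformed target, error metrics should be reported on a clearly stated scale, since an RMSE in log units is not comparable to one in the original units.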


In [96]:
print(df_train.columns)
print(df_train.shape)
print(df_train.size, "\n")

print(df_val.columns)
print(df_val.shape)
print(df_val.size, "\n")

print(df_test.columns)
print(df_test.shape)
print(df_test.size)

Index(['Antibody_ID', 'Antibody', 'Antigen_ID', 'Antigen', 'Y'], dtype='object')
(345, 5)
1725 

Index(['Antibody_ID', 'Antibody', 'Antigen_ID', 'Antigen', 'Y'], dtype='object')
(49, 5)
245 

Index(['Antibody_ID', 'Antibody', 'Antigen_ID', 'Antigen', 'Y'], dtype='object')
(99, 5)
495


## 2. Unsupervised Analysis
- Use dimensionality reduction and visualization techniques appropriate for the data;
- Apply clustering methods that are considered suitable for the data.

### Goals:
- Report/analyze the results obtained from dimensionality reduction and data visualization techniques;
- Report/analyze the results obtained from the clustering algorithms.
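
A minimal sketch of this step, assuming a numeric feature matrix is already available: standardize, project to two dimensions with PCA, then cluster the projection with k-means. The data here are synthetic (two Gaussian blobs), and the choice of scikit-learn, two components, and two clusters is an assumption for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix: two well-separated groups of 50 samples
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 10)),
    rng.normal(loc=5.0, scale=1.0, size=(50, 10)),
])

# PCA is scale-sensitive, so standardize the features first
X_scaled = StandardScaler().fit_transform(X)

# Project to 2 components for visualization
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Cluster in the reduced space
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
print("cluster sizes:", np.bincount(labels))
```

On real data the number of clusters is not known in advance, so it would need to be chosen with a criterion such as the silhouette score, and `X_2d` colored by `labels` would be the natural first plot.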

## 3. Machine Learning
- Compare the performance of various Machine Learning models/algorithms on the dataset.

### Goals:
- Analyze the performance of the algorithms by calculating appropriate error metrics and using suitable error estimation methods;
- Present the best achievable model for the available data, using all examples, and interpret it where possible.
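
The comparison described above can be sketched with k-fold cross-validation, scoring every candidate on the same folds. The example uses synthetic regression data and two arbitrary scikit-learn models as placeholders; the models, hyperparameters, and metric are assumptions, not the ones this project prescribes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the featurized dataset
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Fixed folds so every model is evaluated on identical splits
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # scikit-learn returns negated RMSE so that higher is better; flip the sign back
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    results[name] = -scores.mean()
    print(f"{name}: RMSE = {results[name]:.2f} (+/- {scores.std():.2f})")
```

Once a model is selected this way, it can be refit on all examples to produce the final model, with the cross-validated score serving as the error estimate.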

## 4. Deep Learning
- Use Deep Learning methods in a similar manner to **Step 3**, comparing the results with the methods presented in that step.
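
To keep the comparison with **Step 3** fair, a neural network can be evaluated on the same folds and with the same metric as the classical models. The sketch below uses scikit-learn's `MLPRegressor` as a small stand-in for a deep learning model, on synthetic data. A dedicated framework would likely be used in practice, and the architecture shown is an arbitrary assumption.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the featurized dataset
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

# Same fold definition for the baseline and the network
cv = KFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "ridge_baseline": Ridge(alpha=1.0),
    # Neural networks train more reliably on standardized inputs
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
    ),
}

rmse = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()
    print(f"{name}: RMSE = {rmse[name]:.2f}")
```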