
## Mechanisms of Action
("Mechanism of action", n.d.)

In pharmacology, the term Mechanism of Action (MoA) refers to the specific biochemical interaction through which a drug substance produces its pharmacological effect. A mechanism of action usually includes mention of the specific molecular targets to which the drug binds, such as an enzyme or receptor. Receptor sites have specific affinities for drugs based on the chemical structure of the drug, as well as the specific action that occurs there.

In this competition, the task is to predict multiple Mechanism of Action (MoA) targets for each sample. Samples are drugs profiled at different time points and doses. The dataset consists of several groups of features, and there are more than two hundred enzyme and receptor targets.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import log_loss

from xgboost import XGBRegressor

In [None]:
sample = pd.read_csv("../input/lish-moa/sample_submission.csv")

test_features = pd.read_csv("../input/lish-moa/test_features.csv")
train_features = pd.read_csv("../input/lish-moa/train_features.csv")
train_score = pd.read_csv("../input/lish-moa/train_targets_scored.csv")

In [None]:
sample.head()

In [None]:
test_features

In [None]:
train_features

In [None]:
train_score

## Features
* sig_id is the unique sample id
* Features with g- prefix are gene expression features and there are 772 of them (from g-0 to g-771)
* Features with c- prefix are cell viability features and there are 100 of them (from c-0 to c-99)
* cp_type is a binary categorical feature which indicates the samples are treated with a compound or with a control perturbation (trt_cp or ctl_vehicle)
* cp_time is a categorical feature which indicates the treatment duration (24, 48 or 72 hours)
* cp_dose is a binary categorical feature which indicates the dose is low or high (D1 or D2)

In [None]:
g_features = [feature for feature in train_features.columns if feature.startswith('g-')]
c_features = [feature for feature in train_features.columns if feature.startswith('c-')]
other_features = [feature for feature in train_features.columns if feature not in g_features and feature not in c_features]

print(f'Number of g- Features: {len(g_features)}')
print(f'Number of c- Features: {len(c_features)}')
print(f'Number of Other Features: {len(other_features)} ({other_features})')

In [None]:
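def preprocess(df):
    # Encode the two binary categorical columns as 0/1 and drop the id column.
    # Plain column assignment replaces the column, so the result is numeric
    # rather than an object column that XGBoost would reject.
    df = df.copy()
    df['cp_type'] = df['cp_type'].map({'trt_cp': 0, 'ctl_vehicle': 1})
    df['cp_dose'] = df['cp_dose'].map({'D1': 0, 'D2': 1})
    del df['sig_id']
    return df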

train = preprocess(train_features)
test = preprocess(test_features)
del train_score['sig_id']

In [None]:
test

In [None]:
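def metric(y_true, y_pred):
    # Binary log loss for a single target column. The regressor's raw outputs
    # can fall outside [0, 1], so clip them before scoring.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 1e-15, 1 - 1e-15)
    return log_loss(y_true, y_pred, labels=[0, 1])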
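
The score tracked in this notebook is the log loss of each target column, averaged over all targets. As a minimal sketch of that quantity outside scikit-learn, the NumPy version below computes it directly. The function name `mean_columnwise_log_loss`, the `eps` clipping value, and the toy arrays are assumptions made for illustration and are not part of the competition data.

```python
import numpy as np

def mean_columnwise_log_loss(y_true, y_pred, eps=1e-15):
    """Average binary log loss over target columns (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs; eps is an assumed value.
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    per_cell = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    # Mean over rows gives one loss per column; the second mean averages columns.
    return per_cell.mean(axis=0).mean()

# Toy data: 3 samples, 2 targets (values are made up for illustration).
y_true = np.array([[1, 0], [0, 0], [0, 1]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.1], [0.1, 0.8]])
score = mean_columnwise_log_loss(y_true, y_pred)
print(score)
```

Clipping matters here because a regressor can return values at or below 0, or at or above 1, and the logarithm is undefined or infinite at those points.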

In [None]:
cols = train_score.columns
submission = sample.copy()
submission.loc[:,train_score.columns] = 0
#test_preds = np.zeros((test.shape[0], train_score.shape[1]))
N_SPLITS = 5
oof_loss = 0

for c, column in enumerate(cols, 1):
    # Train one set of fold models per target column.
    y = train_score[column]
    total_loss = 0

    # No random_state is set, so the folds differ between targets and runs.
    kf = KFold(n_splits=N_SPLITS, shuffle=True)
    for fn, (trn_idx, val_idx) in enumerate(kf.split(train)):
        print('Fold: ', fn+1)
        X_train, X_val = train.iloc[trn_idx], train.iloc[val_idx]
        y_train, y_val = y.iloc[trn_idx], y.iloc[val_idx]

        # 'gpu_hist' requires a GPU-enabled XGBoost build.
        model = XGBRegressor(tree_method='gpu_hist',
                             min_child_weight=31.58,
                             learning_rate=0.05,
                             colsample_bytree=0.65,
                             gamma=3.69,
                             max_delta_step=2.07,
                             max_depth=10,
                             n_estimators=166,
                             subsample=0.86)

        model.fit(X_train, y_train)
        # Score the held-out fold.
        pred = model.predict(X_val)
        #pred = [n if n>0 else 0 for n in pred]
        loss = metric(y_val, pred)
        total_loss += loss
        # Adding predictions/N_SPLITS on each fold accumulates the fold average.
        predictions = model.predict(test)
        #predictions = [n if n>0 else 0 for n in predictions]
        submission[column] += predictions/N_SPLITS

    #submission[column] = submission[column]/N_SPLITS
    oof_loss += total_loss/N_SPLITS
    print("Model "+str(c)+": Loss ="+str(total_loss/N_SPLITS))
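
The loop above adds `predictions/N_SPLITS` to the submission on every fold, which leaves each column holding the mean of the fold models' test predictions. The sketch below isolates that accumulation pattern on toy data. The stand-in "model" simply predicts the positive rate of its training rows, and the sizes, seed, and variable names are assumptions chosen for illustration; it is not the XGBoost model used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_splits = 20, 4, 5
y = rng.integers(0, 2, size=n_train).astype(float)  # toy binary target

# Shuffle the row indices once and cut them into n_splits validation folds.
folds = np.array_split(rng.permutation(n_train), n_splits)

test_pred = np.zeros(n_test)
fold_preds = []
for val_idx in folds:
    # Everything outside the validation fold is used for training.
    trn_mask = np.ones(n_train, dtype=bool)
    trn_mask[val_idx] = False
    # Stand-in "model": predict the positive rate of the training rows.
    fold_value = y[trn_mask].mean()
    fold_preds.append(fold_value)
    # Adding pred / n_splits on every fold accumulates the mean prediction.
    test_pred += np.full(n_test, fold_value) / n_splits

print(test_pred, np.mean(fold_preds))
```

After the loop, every entry of `test_pred` equals the average of the per-fold predictions, which is the same quantity the commented-out division by `N_SPLITS` would have produced from a plain sum.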

In [None]:
# Mean validation loss across all target columns
oof_loss/len(cols)

In [None]:
submission

In [None]:
submission.loc[test['cp_type']==1, train_score.columns] = 0

In [None]:
submission.to_csv('submission.csv', index=False)