**Drug Discovery -- Mechanism of Action**

**Gene Expression**

Gene expression is the process by which the information in a gene is used to synthesize a functional gene product, usually a protein (some genes instead yield functional RNA). These products ultimately shape a person's phenotype: the observable traits that arise from their genotype. A suitably designed drug can suppress or enhance specific transcriptional pathways, and manipulating these pathways makes it possible to alter the body's chemistry, for example to fight cancer or treat hypertension.

Recording and cataloging gene expression data is especially important for pharmaceutical development because many medications act by modulating a transcriptional pathway. Across repeated experiments and trials, trends in the data can indicate whether a compound is safe, first in vitro and then in humans.

**Cell Viability**

Cell viability is a measure of the number of live, healthy cells in a given sample. Viability assays quantify factors such as metabolic activity, ATP levels, and cell proliferation, as well as toxicity and markers of cell death. When an investigational compound is introduced in vitro, being able to quantify how much it enhances or inhibits particular cellular processes is extremely important, because these metrics are used to estimate how effective or harmful the compound is likely to be in the human body. How well a compound is absorbed and metabolized can be of particular concern to clinical researchers: a compound that cannot be metabolized may cause blood toxicity, whereas one that is properly absorbed may favor the proliferation of healthy cells over harmful ones.

For example, PD-L1 checkpoint inhibitors are a class of drugs designed to interrupt the binding of PD-L1 to the PD-1 receptor. Cancer cells express the PD-L1 protein and use it to bind to the PD-1 receptor on immune cells, which helps the cancer cells avoid being detected as a threat. PD-L1 inhibitors therefore block this binding, leaving the cancer cells exposed to eradication by the immune system.


In [None]:

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        

In [None]:
import pandas as pd 
import matplotlib.pyplot as plt
import numpy as np 

import pickle
import time

from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.model_selection import KFold
from sklearn.metrics import log_loss, accuracy_score


In [None]:
test_features = pd.read_csv('/kaggle/input/lish-moa/test_features.csv')
train_features = pd.read_csv('/kaggle/input/lish-moa/train_features.csv')
tt_nonscored = pd.read_csv('/kaggle/input/lish-moa/train_targets_nonscored.csv')
tt_scored = pd.read_csv('/kaggle/input/lish-moa/train_targets_scored.csv')
ss = pd.read_csv('/kaggle/input/lish-moa/sample_submission.csv')

In [None]:
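# Gene-expression (g-*) and cell-viability (c-*) column names
g_cols = [col for col in train_features if col.startswith('g-')]
c_cols = [col for col in train_features if col.startswith('c-')]

tr_gene_df = train_features.loc[:, 'g-0':'g-771']
tr_cell_df = train_features.loc[:, 'c-0':]

# Model inputs: every column from cp_type onward (this excludes sig_id).
# .copy() gives independent frames that can be modified safely later on.
tr_cols = train_features.loc[:, 'cp_type':].copy()
test_cols = test_features.loc[:, 'cp_type':].copy()

tr_cols.shape, tt_scored.shape, test_cols.shape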

In [None]:
# Drop the sig_id identifier column in place; it is not a model input.
# drop(..., inplace=True) returns None, so there is nothing to reassign or return.
full_dfs = [train_features, test_features, tt_scored]


def col_drop(df):
    df.drop(columns=['sig_id'], inplace=True)


for df in full_dfs:
    col_drop(df)


dfs = [tr_cols, test_cols]


def cleaner(df):
    # Encode the categorical treatment columns as integers (modifies df in place)
    df['cp_type'] = df['cp_type'].map({'ctl_vehicle': 0, 'trt_cp': 1})
    df['cp_time'] = df['cp_time'].map({24: 1, 48: 2, 72: 3})
    df['cp_dose'] = df['cp_dose'].map({'D1': 0, 'D2': 1})
    return df


for df in dfs:
    cleaner(df)
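
One thing worth checking after this kind of encoding: `Series.map` with a dictionary leaves any value that is missing from the dictionary as `NaN` rather than raising an error. The toy example below is a sketch with made-up values (`D3` is hypothetical and does not come from the dataset) showing how such gaps can be detected.

```python
import pandas as pd

# Toy column; 'D3' is a hypothetical value used only to illustrate the check
doses = pd.Series(['D1', 'D2', 'D3'])

encoded = doses.map({'D1': 0, 'D2': 1})

# Values absent from the mapping become NaN instead of raising an error
n_unmapped = int(encoded.isna().sum())
print(n_unmapped)
```

A non-zero count after encoding would suggest that the mapping does not cover every category present in the column.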

In [None]:
tr_cols

In [None]:
#keep_idx_test = test_features[test_features.cp_type != 0].index
#keep_idx_train = train_features[train_features.cp_type != 0].index

#test_cols = test_cols.loc[keep_idx_test]
#tr_cols = tr_cols.loc[keep_idx_train]
#tt_scored = tt_scored.loc[keep_idx_train]

In [None]:
# Gene-expression columns removed from both feature sets
# (the criteria used to select them are not shown in this notebook)
col_list = ['g-496', 'g-333', 'g-676', 'g-127', 'g-39', 'g-360', 'g-28', 'g-19', 'g-184', 'g-110', 'g-687', 'g-216',
            'g-15', 'g-626', 'g-393', 'g-667', 'g-164', 'g-688', 'g-754', 'g-557', 'g-363', 'g-132', 'g-435', 'g-536',
            'g-550', 'g-481', 'g-611', 'g-18', 'g-756', 'g-331', 'g-618', 'g-718', 'g-370', 'g-219', 'g-153', 'g-46',
            'g-238', 'g-23', 'g-707', 'g-213', 'g-307', 'g-104']
dfs = [tr_cols, test_cols]


def outlier_drop(df, col):
    # Drop a single column in place (inplace=True returns None)
    df.drop(columns=[col], inplace=True)


for col in col_list:
    for df in dfs:
        outlier_drop(df, col)

In [None]:
tr_cols.shape, test_cols.shape

In [None]:
X, y, test = np.array(tr_cols), np.array(tt_scored), np.array(test_cols)

In [None]:
# Load the pre-trained model; the with block closes the file afterwards
with open('../input/moa-train-model/OvR', 'rb') as f:
    model = pickle.load(f)

In [None]:
model

In [None]:
kf = KFold(n_splits=10, shuffle=True, random_state=22)
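
To make the splitter's behavior concrete, here is a small sketch on ten toy samples (the array and the choice of five splits are illustrative assumptions, not the notebook's data). With `shuffle=True`, `KFold.split` yields pairs of index arrays, and every sample lands in exactly one validation fold.

```python
import numpy as np
from sklearn.model_selection import KFold

toy_X = np.arange(10).reshape(-1, 1)  # ten toy samples, one feature each
toy_kf = KFold(n_splits=5, shuffle=True, random_state=22)

seen = []
for train_idx, val_idx in toy_kf.split(toy_X):
    # Training and validation indices never overlap within a fold
    assert set(train_idx).isdisjoint(val_idx)
    seen.extend(val_idx.tolist())

# Across all folds, each sample is used for validation exactly once
print(sorted(seen) == list(range(10)))
```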

In [None]:

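# The loaded model is not refit here, so each fold only scores it on a
# different subset of the training rows.
for k_f, (tr_idx, t_idx) in enumerate(kf.split(X, y)):
    fold_start = time.time()

    X_val, y_val = X[t_idx], y[t_idx]

    val_preds = np.array(model.predict_proba(X_val))

    # Flatten targets and probabilities so the log loss is averaged over
    # every (sample, target) pair
    loss = log_loss(np.ravel(y_val), np.ravel(val_preds))

    fold_end = time.time()
    print('Fold ', k_f, ',', ' log loss: ', loss)
    print('fold time: ', fold_end - fold_start)

# Test predictions do not depend on the fold, so compute them once
preds = model.predict_proba(test)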
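
The flattened log loss used above can be checked by hand. The sketch below uses plain Python and a made-up 3×2 target matrix (all numbers are illustrative, and `binary_log_loss` is a hypothetical helper, not a scikit-learn function). It also shows that, when every target column has the same number of rows, averaging over all flattened entries gives the same value as averaging the per-column log losses.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy over paired labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Made-up example: 3 samples (rows) x 2 targets (columns)
y_true = [[1, 0], [0, 0], [1, 1]]
y_prob = [[0.9, 0.2], [0.1, 0.3], [0.8, 0.6]]

# Flattened: one average over every (sample, target) pair
flat_loss = binary_log_loss(
    [y for row in y_true for y in row],
    [p for row in y_prob for p in row],
)

# Column-wise: log loss per target, then the mean across targets
col_losses = [
    binary_log_loss([row[j] for row in y_true], [row[j] for row in y_prob])
    for j in range(2)
]
colwise_loss = sum(col_losses) / len(col_losses)

print(abs(flat_loss - colwise_loss) < 1e-12)
```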

In [None]:
ss.iloc[:,1:] = preds
ss.to_csv('submission.csv', index=False)