# Drug

This notebook walks through an example use of the `drug` class. Each step runs under `%%time`, so the notebook can also be used to measure how long each stage of training a model takes and to decide which methods are worth optimizing.

In [3]:
from classes import drug
from methods import fs
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
import config as c
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_regression, mutual_info_regression, SelectFromModel, VarianceThreshold

In [2]:
gdsc_ge = pd.read_csv(c.dir + 'gdsc_cell_ge.csv').fillna(0).set_index('CCL')
ctrp_ge = pd.read_csv(c.dir + 'ctrp_cell_ge.csv').fillna(0).set_index('CCL')
gdsc_dr = pd.read_csv(c.dir + 'gdsc_poz_dr.csv').fillna(0)
ctrp_dr = pd.read_csv(c.dir + 'ctrp_poz_dr.csv').fillna(0)

In [13]:
%%time
aag = drug('17-AAG', {'ctrp': ctrp_ge, 'gdsc': gdsc_ge}, {'ctrp': ctrp_dr, 'gdsc': gdsc_dr})

CPU times: user 878 ms, sys: 106 ms, total: 985 ms
Wall time: 990 ms


In [14]:
%%time
aag.pre()

CPU times: user 467 ms, sys: 253 ms, total: 720 ms
Wall time: 722 ms


In [15]:
%%time
aag.combine()

CPU times: user 453 ms, sys: 344 ms, total: 796 ms
Wall time: 798 ms


In [16]:
%%time
aag.split()

CPU times: user 136 ms, sys: 102 ms, total: 238 ms
Wall time: 238 ms


In [17]:
%%time
aag.fs(f_regression, n=0.01)

After fs (276, 206) (827, 206)
CPU times: user 349 ms, sys: 167 ms, total: 516 ms
Wall time: 450 ms
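
As a rough illustration of what a univariate selection step like this might do, here is a minimal sketch on synthetic data. It assumes that `n=0.01` means "keep the top 1% of features", which the notebook does not state, and it uses scikit-learn's `SelectPercentile` directly rather than the `fs` method of the `drug` class, whose internals are not shown here.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_regression

# Synthetic stand-in for expression data: 500 features, only feature 0 is informative.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 500))
y_train = 2.0 * X_train[:, 0] + rng.normal(scale=0.1, size=80)
X_test = rng.normal(size=(20, 500))

# Assumption: a fraction of 0.01 corresponds to percentile=1 (top 1% of features).
selector = SelectPercentile(f_regression, percentile=1)

# Fit on the training split only, then apply the same column mask to the test split.
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

print("After fs", X_test_sel.shape, X_train_sel.shape)
```

Fitting the selector on the training split alone keeps information from the test split out of the feature ranking.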


In [18]:
%%time
aag.feda()

CPU times: user 902 ms, sys: 605 ms, total: 1.51 s
Wall time: 1.54 s
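
For readers unfamiliar with the idea, the sketch below shows one way a FEDA-style augmentation could look, assuming FEDA here refers to "frustratingly easy domain adaptation" feature augmentation: every feature gets a shared (global) copy plus one copy per dataset that is zeroed for samples from other datasets. The function `feda_augment` is hypothetical and is not the implementation behind `aag.feda()`; the interleaved column order is also an assumption.

```python
import numpy as np

def feda_augment(X, domain, domains):
    """Hypothetical FEDA-style augmentation for samples from a single domain.

    Each original feature becomes len(domains) + 1 columns, interleaved as
    (global, domain_1, ..., domain_k). Copies for other domains are zero.
    """
    blocks = [X]  # global copy, shared by every domain
    for d in domains:
        blocks.append(X if d == domain else np.zeros_like(X))
    # Stack to (n_samples, n_features, n_copies), then flatten so that the
    # copies of each feature sit next to each other.
    stacked = np.stack(blocks, axis=2)
    return stacked.reshape(X.shape[0], -1)

domains = ["ctrp", "gdsc"]
X_ctrp = np.array([[1.0, 2.0]])
X_gdsc = np.array([[3.0, 4.0]])

aug_ctrp = feda_augment(X_ctrp, "ctrp", domains)
aug_gdsc = feda_augment(X_gdsc, "gdsc", domains)
print(aug_ctrp)
print(aug_gdsc)
```

With this layout a model can learn effects common to both datasets through the global columns and dataset-specific effects through the local ones.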


In [20]:
%%time
aag.train(DecisionTreeRegressor())

CPU times: user 253 ms, sys: 49.1 ms, total: 302 ms
Wall time: 302 ms
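
The `%%time` outputs above have to be compared by eye. If the goal is to decide which step to optimize, it may help to collect the timings in one place. The following is a minimal sketch using only the standard library; the `timed` helper and the `timings` dictionary are hypothetical names, and the workload is a placeholder for calls such as `aag.pre()` or `aag.fs(...)`.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed wall-clock seconds

@contextmanager
def timed(label):
    """Record the wall-clock time of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Placeholder workload; in the notebook this would wrap a pipeline step.
with timed("example_step"):
    sum(i * i for i in range(100_000))

# Report steps from slowest to fastest.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {seconds * 1000:.1f} ms")
```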


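Here we look at how the trained model distributes feature importance between the global features and the local, dataset-specific (CTRP and GDSC) features created by FEDA. The importances are multiplied by 100, so the values below are percentages of the total importance.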

In [23]:
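# Group importances by feature type. The columns are taken to be interleaved
# as (global, CTRP, GDSC) for each original feature, so position i belongs to
# group i % 3. Values are scaled to percentages.
importances = {0: [], 1: [], 2: []}
for i, importance in enumerate(aag.model.feature_importances_):
    importances[i % 3].append(importance * 100)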

In [24]:
pd.DataFrame.from_dict(importances).rename(columns={0:'Global', 1:'CTRP', 2:'GDSC'}).describe()

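|       | Global   | CTRP     | GDSC     |
|-------|----------|----------|----------|
| count | 206.0    | 206.0    | 206.0    |
| mean  | 0.282271 | 0.084822 | 0.118344 |
| std   | 0.603475 | 0.352519 | 0.334161 |
| min   | 0.0      | 0.0      | 0.0      |
| 25%   | 0.0      | 0.0      | 0.0      |
| 50%   | 0.004014 | 0.0      | 0.001011 |
| 75%   | 0.204789 | 0.0      | 0.051122 |
| max   | 3.781251 | 2.732136 | 2.307366 |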
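
The summary above describes per-feature importances. The total share of each group can be easier to compare: multiplying each mean by the count of 206 gives roughly 58% for Global, 17% for CTRP, and 24% for GDSC. A minimal sketch of computing such totals directly follows, using a made-up importance vector and assuming the same interleaved (global, CTRP, GDSC) column order as the loop above.

```python
import numpy as np
import pandas as pd

# Made-up importances for 3 original features x 3 copies each (sums to 1).
feature_importances = np.array([0.10, 0.05, 0.00,
                                0.40, 0.00, 0.20,
                                0.15, 0.10, 0.00])

# One row per original feature, one column per copy, scaled to percentages.
grouped = pd.DataFrame(
    feature_importances.reshape(-1, 3) * 100,
    columns=["Global", "CTRP", "GDSC"],
)

# Total share of importance carried by each group of features.
totals = grouped.sum()
print(totals)
```

The reshape gives the same grouping as the modulo-based loop, provided the number of columns is a multiple of three.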


In [25]:
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import median_absolute_error

aag.metrics([r2_score, mean_absolute_error, mean_squared_error, median_absolute_error])

{'r2_score': -1.0825312319940625,
 'mean_absolute_error': 0.34041308884958604,
 'mean_squared_error': 0.17420445096077616,
 'median_absolute_error': 0.2728263073787386}
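
The negative `r2_score` deserves a note: R² is 0 for a model that always predicts the mean of the true values, and it drops below 0 when the predictions have a larger squared error than that baseline. The toy example below, with made-up numbers unrelated to the 17-AAG data, illustrates both cases.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Baseline: always predict the mean of the true values.
y_mean = np.full_like(y_true, y_true.mean())

# Deliberately poor predictions: the true values in reverse order.
y_bad = y_true[::-1]

r2_mean = r2_score(y_true, y_mean)  # squared error equals total variance
r2_bad = r2_score(y_true, y_bad)    # squared error is 4x the total variance
print(r2_mean, r2_bad)
```

Read this way, an R² of about -1.08 indicates that the untuned decision tree has roughly twice the squared error of a constant mean prediction on the evaluated data.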