# Comparing iwatobipen JAK3 results with our classifier #
    - Matt Robinson

https://iwatobipen.wordpress.com/2017/05/18/graph-convolution-classification-with-deepchem/

The prolific and well-known chemoinformatics blogger *iwatobipen* published results from a graph convolutional network built with DeepChem. Here we compare those results with the ones obtained from our own, much simpler GCN.

The dataset is JAK3 inhibitor activity data obtained from ChEMBL and available on iwatobipen's GitHub:

https://github.com/iwatobipen/deeplearning

In [1]:
import sys
sys.path.insert(0, '../../')

import mygcn

In [2]:
%reload_ext autoreload
%autoreload 2

In [3]:
from mygcn import features
from mygcn import gcn
from mygcn import train
from mygcn import evaluation

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import torch
import torch.optim as optim
import torch.nn as nn
from torch.utils.data import DataLoader
import dgl

# Running our classifier: #

In [5]:
train_dl, valid_dl, test_dl = features.get_graph_data('jak3_activities.csv',
                                      smiles_field='CANONICAL_SMILES',
                                      labels_field='activity_class',
                                      train_size=0.8,
                                      valid_size=0.0,
                                      self_edges=True,
                                      edge_features=True,
                                      seed=1) # note: results are still not fully deterministic
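
With `train_size=0.8` and `valid_size=0.0`, the remaining 20% of the molecules form the test set and the validation set is empty. The sketch below shows one way such a seeded split could be done with NumPy. It is not the implementation inside `features.get_graph_data`, which is not shown here: `split_indices` is a hypothetical helper, and the dataset size of 900 is an assumption chosen so that the held-out 20% contains 180 molecules, matching the test support reported in the evaluation.

```python
import numpy as np


def split_indices(n_samples, train_size=0.8, valid_size=0.0, seed=1):
    """Shuffle indices with a fixed seed and cut them into train/valid/test.

    Hypothetical helper for illustration; not part of mygcn.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)

    n_train = int(round(train_size * n_samples))
    n_valid = int(round(valid_size * n_samples))

    train_idx = order[:n_train]
    valid_idx = order[n_train:n_train + n_valid]
    test_idx = order[n_train + n_valid:]  # whatever is left over
    return train_idx, valid_idx, test_idx


# 900 is an assumed dataset size (see the note above).
train_idx, valid_idx, test_idx = split_indices(900)
print(len(train_idx), len(valid_idx), len(test_idx))  # 720 0 180
```

A seed used this way fixes only the shuffle. Weight initialization and dropout draw from PyTorch's own random number generator, so runs can still differ unless that generator is seeded as well, which is one possible reason the results are not fully deterministic.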

In [9]:
num_features = features.get_num_atom_features() + features.get_num_bond_features()
learning_rate = 0.001

model = gcn.DeepChemGCNClassifier(n_inputs=num_features,
                                 n_hidden=64,
                                 n_hidden_layers=2,
                                 n_outputs=2,
                                 dropout=0.2)
loss_func = nn.CrossEntropyLoss()
opt = optim.Adam(model.parameters(), lr=learning_rate)

### Note that we will also train for 50 epochs, as is done in the post ###

In [10]:
%%time
train.fit(model, train_dl, valid_dl, loss_func, opt, n_epochs=50,
          report_valid_loss=False)

Epoch 0, train loss 0.6568 valid loss N/A
Epoch 5, train loss 0.5159 valid loss N/A
Epoch 10, train loss 0.4838 valid loss N/A
Epoch 15, train loss 0.4855 valid loss N/A
Epoch 20, train loss 0.4748 valid loss N/A
Epoch 25, train loss 0.4529 valid loss N/A
Epoch 30, train loss 0.4292 valid loss N/A
Epoch 35, train loss 0.4310 valid loss N/A
Epoch 40, train loss 0.4332 valid loss N/A
Epoch 45, train loss 0.4322 valid loss N/A
CPU times: user 4min 52s, sys: 3.16 s, total: 4min 56s
Wall time: 1min 43s


In [11]:
evaluation.evaluate_classifier(model, test_dl, loss_func)

test_loss:  0.4343380232652028
accuracy:  0.8055555555555556
classification report: 
               precision    recall  f1-score   support

           0       0.81      0.96      0.88       134
           1       0.76      0.35      0.48        46

   micro avg       0.81      0.81      0.81       180
   macro avg       0.79      0.66      0.68       180
weighted avg       0.80      0.81      0.78       180

roc-auc:  0.7978585334198572
bootstrapped roc-auc:  [0.7342825147976324, 0.8557754704737184]
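
To put the accuracy in context, the per-class counts can be worked out from the report above. The counts in this sketch are inferred from the rounded precision, recall, and support values together with the exact accuracy; the notebook does not print a confusion matrix, so treat them as a reconstruction rather than reported output.

```python
# Counts inferred from the classification report (an assumption, see above).
tn, fp = 129, 5   # class 0: 134 molecules
fn, tp = 30, 16   # class 1: 46 molecules
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
majority_baseline = (tn + fp) / total   # always predicting class 0
recall_0 = tn / (tn + fp)
recall_1 = tp / (tp + fn)
precision_1 = tp / (tp + fp)
balanced_accuracy = (recall_0 + recall_1) / 2

print(f"accuracy           {accuracy:.3f}")           # 0.806
print(f"majority baseline  {majority_baseline:.3f}")  # 0.744
print(f"recall (class 1)   {recall_1:.3f}")           # 0.348
print(f"precision (class 1) {precision_1:.3f}")       # 0.762
print(f"balanced accuracy  {balanced_accuracy:.3f}")  # 0.655
```

Because roughly three quarters of the test molecules belong to class 0, always predicting class 0 would already score about 0.744. The accuracy of 0.806 is therefore a modest gain over that baseline, and the recall of 0.35 on class 1 shows that most class 1 molecules are still missed. The ROC-AUC and balanced accuracy are more informative than raw accuracy for a split this imbalanced.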
