<a href="https://colab.research.google.com/github/pz-white/DrugBAN/blob/main/drugban_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DrugBAN Demo Running on Colab


## Setup

The first code cell sets up the notebook execution environment. It checks whether the notebook is running on Google Colab and, if so, installs the required packages and clones the DrugBAN repository.

In [None]:
if 'google.colab' in str(get_ipython()):
    print('Running on CoLab')
    !pip uninstall --yes yellowbrick
    !pip install -U -q psutil
    !pip install dgl
    !pip install dgllife
    !pip install rdkit-pypi
    !pip install PrettyTable
    !pip install yacs
    !git clone https://github.com/pz-white/DrugBAN.git
    %cd DrugBAN
else:
    print('Not running on CoLab')

Running on CoLab
Found existing installation: yellowbrick 1.4
Uninstalling yellowbrick-1.4:
  Successfully uninstalled yellowbrick-1.4
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dgl
  Downloading dgl-0.9.0-cp37-cp37m-manylinux1_x86_64.whl (6.2 MB)
Installing collected packages: dgl
Successfully installed dgl-0.9.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting dgllife
  Downloading dgllife-0.3.0-py3-none-any.whl (220 kB)
Collecting scikit-learn<1.0,>=0.22.2
  Downloading scikit_learn-0.24.2-cp37-cp37m-manylinux2010_x86_64.whl (22.3 MB)
Installing collected packages: scikit-learn, dgllife
  Attempting uninstall: scik
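
The check in the setup cell relies on `get_ipython()`, which is only defined inside an IPython session. As a hedged alternative, the sketch below detects Colab by looking for the `google.colab` package using only the standard library. The helper name `running_on_colab` is hypothetical and is not part of DrugBAN.

```python
import importlib.util


def running_on_colab() -> bool:
    """Return True if the google.colab package can be found (hypothetical helper)."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ModuleNotFoundError:
        # Raised when the parent package `google` is not installed at all.
        return False


print("Running on Colab" if running_on_colab() else "Not running on Colab")
```

Unlike the `get_ipython()` test, this version also runs in a plain Python script, where it simply reports that Colab is not available.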

## Import Required Modules

In [None]:
from models import DrugBAN
from time import time
from utils import set_seed, graph_collate_func, mkdir
from configs import get_cfg_defaults
from dataloader import DTIDataset, MultiDataLoader
from torch.utils.data import DataLoader
from trainer import Trainer
from domain_adaptator import Discriminator
import torch
import argparse
import os
import warnings
import pandas as pd

DGL backend not selected or invalid.  Assuming PyTorch for now.


Setting the default backend to "pytorch". You can change it in the ~/.dgl/config.json file or export the DGLBACKEND environment variable.  Valid options are: pytorch, mxnet, tensorflow (all lowercase)


## Configuration

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
cfg_path = "./configs/DrugBAN_Demo.yaml"
data = "bindingdb_sample"
comet_support = False

cfg = get_cfg_defaults()
cfg.merge_from_file(cfg_path)
cfg.freeze()

torch.cuda.empty_cache()
warnings.filterwarnings("ignore")
set_seed(cfg.SOLVER.SEED)
mkdir(cfg.RESULT.OUTPUT_DIR)
experiment = None
print(f"Config yaml: {cfg_path}")
print(f"Running on: {device}")
print("Hyperparameters:")
dict(cfg)

Config yaml: ./configs/DrugBAN_Demo.yaml
Running on: cpu
Hyperparameters:


{'DRUG': CfgNode({'NODE_IN_FEATS': 75, 'PADDING': True, 'HIDDEN_LAYERS': [128, 128, 128], 'NODE_IN_EMBEDDING': 128, 'MAX_NODES': 290}),
 'PROTEIN': CfgNode({'NUM_FILTERS': [128, 128, 128], 'KERNEL_SIZE': [3, 6, 9], 'EMBEDDING_DIM': 128, 'PADDING': True}),
 'BCN': CfgNode({'HEADS': 2}),
 'DECODER': CfgNode({'NAME': 'MLP', 'IN_DIM': 256, 'HIDDEN_DIM': 512, 'OUT_DIM': 128, 'BINARY': 1}),
 'SOLVER': CfgNode({'MAX_EPOCH': 1, 'BATCH_SIZE': 8, 'NUM_WORKERS': 0, 'LR': 5e-05, 'DA_LR': 0.001, 'SEED': 42}),
 'RESULT': CfgNode({'OUTPUT_DIR': './result/demo', 'SAVE_MODEL': True}),
 'DA': CfgNode({'TASK': False, 'METHOD': 'CDAN', 'USE': False, 'INIT_EPOCH': 10, 'LAMB_DA': 1, 'RANDOM_LAYER': False, 'ORIGINAL_RANDOM': False, 'RANDOM_DIM': None, 'USE_ENTROPY': True}),
 'COMET': CfgNode({'WORKSPACE': 'pz-white', 'PROJECT_NAME': 'DrugBAN', 'USE': False, 'TAG': None})}
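
The configuration cell first loads defaults and then overrides them with the values from the YAML file. The sketch below imitates that pattern with plain dictionaries rather than yacs itself, so `merge_into` is a hypothetical stand-in, and the default values are made up for illustration. Only the overriding values (`MAX_EPOCH: 1`, `BATCH_SIZE: 8`) and the seed match the output above.

```python
import copy


def merge_into(defaults: dict, overrides: dict) -> dict:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            # Mirrors yacs's default behaviour of rejecting undeclared keys.
            raise KeyError(f"Non-existent config key: {key}")
        if isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_into(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults; the real ones live in the repository's configs module.
defaults = {"SOLVER": {"MAX_EPOCH": 100, "BATCH_SIZE": 64, "SEED": 42}}
# Values a demo YAML file might override.
overrides = {"SOLVER": {"MAX_EPOCH": 1, "BATCH_SIZE": 8}}

cfg_demo = merge_into(defaults, overrides)
print(cfg_demo)
```

Keys that the override does not mention, such as `SEED` here, keep their default values.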

## Data Loader

In [None]:
dataFolder = f'./datasets/{data}'
dataFolder = os.path.join(dataFolder, "random")

train_path = os.path.join(dataFolder, 'train.csv')
val_path = os.path.join(dataFolder, "val.csv")
test_path = os.path.join(dataFolder, "test.csv")
df_train = pd.read_csv(train_path)
df_val = pd.read_csv(val_path)
df_test = pd.read_csv(test_path)

train_dataset = DTIDataset(df_train.index.values, df_train)
val_dataset = DTIDataset(df_val.index.values, df_val)
test_dataset = DTIDataset(df_test.index.values, df_test)

params = {
    'batch_size': cfg.SOLVER.BATCH_SIZE,
    'shuffle': True,
    'num_workers': cfg.SOLVER.NUM_WORKERS,
    'drop_last': True,
    'collate_fn': graph_collate_func,
}
training_generator = DataLoader(train_dataset, **params)
params['shuffle'] = False
params['drop_last'] = False
val_generator = DataLoader(val_dataset, **params)
test_generator = DataLoader(test_dataset, **params)
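
The training loader uses `drop_last=True`, so an incomplete final batch is discarded, while the validation and test loaders keep it. The sketch below works out how many batches each setting yields. The helper `num_batches` and the sample count of 170 are assumptions for illustration; the actual size of the training set is not shown in this notebook.

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches a loader yields per epoch (hypothetical helper)."""
    if drop_last:
        # The trailing partial batch is discarded.
        return num_samples // batch_size
    # The trailing partial batch is kept as a smaller batch.
    return math.ceil(num_samples / batch_size)


# 170 samples is an assumed figure, not the real dataset size.
print(num_batches(170, 8, drop_last=True))
print(num_batches(170, 8, drop_last=False))
```

With a batch size of 8, any training set of 168 to 175 samples gives 21 batches when `drop_last=True`, which is consistent with the 21 iterations in the training progress bar.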

## Setup Model and Optimizer

In [None]:
model = DrugBAN(**cfg).to(device)
opt = torch.optim.Adam(model.parameters(), lr=cfg.SOLVER.LR)
if torch.cuda.is_available():
    torch.backends.cudnn.benchmark = True

## Model Training

In [None]:
trainer = Trainer(
    model, opt, device, training_generator, val_generator, test_generator,
    opt_da=None, discriminator=None, experiment=experiment, **cfg
)
result = trainer.train()
with open(os.path.join(cfg.RESULT.OUTPUT_DIR, "model_architecture.txt"), "w") as wf:
    wf.write(str(model))
print(f"Directory for saving result: {cfg.RESULT.OUTPUT_DIR}")

100%|██████████| 21/21 [00:38<00:00,  1.86s/it]


Training at Epoch 1 with training loss 0.7483742804754347
Validation at Epoch 1 with validation loss 0.6943950802087784  AUROC 0.6544117647058824 AUPRC 0.44206349206349205
Test at Best Model of Epoch 1 with test loss 0.6565468311309814  AUROC 0.4245614035087719 AUPRC 0.4018830588082055 Sensitivity 0.0 Specificity 1.0 Accuracy 0.3877551020408163 Thred_optim 0.42230069637298584
Directory for saving result: ./result/demo
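
A sensitivity of 0.0 together with a specificity of 1.0 means that, at the chosen threshold, every test sample was predicted negative, which is unsurprising after a single training epoch. The sketch below recomputes the three threshold-based metrics from confusion-matrix counts. The counts are an inference, not something the notebook prints: they are the smallest integers consistent with the reported accuracy. The helper `binary_metrics` is hypothetical.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of positives recovered
    specificity = tn / (tn + fp)   # fraction of negatives recovered
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Assumed counts: 30 positives all missed, 19 negatives all correct.
sens, spec, acc = binary_metrics(tp=0, fp=0, tn=19, fn=30)
print(sens, spec, acc)
```

Under these assumed counts the accuracy equals the share of negative samples, 19/49, so it says little about how well the model ranks interactions. AUROC and AUPRC are the more informative numbers here.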
