# GNNTox Team
![team](images/gnntox.png)

# Mission Statement: 
## We aim to reduce the cost and lead time of drug development so that patients get new drugs sooner and at a lower price.


# Problem:
![costs](images/drug_costs.svg)

Source: Ned Pagliarulo / BioPharma Dive; data from "Estimated Research and Development Investment Needed to Bring a New Medicine to Market, 2009-2018," JAMA.

 - Time to market for new pharmaceuticals is typically around 10 years (https://www.pharmtech.com/view/speeding-time-market-better-pharmaceutical-project-management)

 - "In fact, more than 30 percent of promising pharmaceuticals have failed in human clinical trials because they are determined to be toxic despite promising pre-clinical studies in animal models (Nat Rev Drug Discov. 2004;3(8):711–715)." (https://tripod.nih.gov/tox21/challenge/about.jsp)

# Dataset: Tox21 Challenge 
![name](images/tox21.png)

- https://tripod.nih.gov/tox21/challenge/about.jsp
- Created in 2014 by the NIH National Center for Advancing Translational Sciences (NCATS)
- Goal: predict whether candidate molecules are active in 12 biological pathways of interest
- Data: ~10,000 labeled compounds

# Starting Point: DeepChem
<img src="images/deepchem_logo.png" style="width: 400px;"/>

- https://deepchem.io/
- Open source
- Provides a convenient loader for the Tox21 dataset
- Featurizers that convert molecules (e.g., SMILES strings) into graphs (nodes are atoms, edges are chemical bonds)
- Reference implementations of several model types, including graph neural networks (GNNs)
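To make the graph picture concrete, here is a minimal NumPy sketch of one graph-convolution step on a hand-written graph for ethanol (`CCO`). The atom features, the random weights, and the mean-aggregation rule are illustrative assumptions; DeepChem's own `GraphConv` layer uses a more elaborate update, so treat this as the general idea rather than the library's implementation.

```python
import numpy as np

# Toy molecular graph for ethanol (SMILES "CCO"), written by hand for illustration.
# Nodes are heavy atoms; edges are chemical bonds (hydrogens left implicit).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# Hypothetical one-hot atom features: [is_carbon, is_oxygen]
features = np.array([[1.0 if a == "C" else 0.0, 1.0 if a == "O" else 0.0] for a in atoms])


def adjacency_matrix(n_atoms, bonds):
    """Symmetric adjacency matrix with self-loops so each atom keeps its own features."""
    adj = np.eye(n_atoms)
    for i, j in bonds:
        adj[i, j] = adj[j, i] = 1.0
    return adj


def graph_conv(adj, h, weight):
    """One graph-convolution step: average neighbourhood features, then a linear map and ReLU."""
    degree = adj.sum(axis=1, keepdims=True)
    aggregated = (adj @ h) / degree  # mean over each atom and its bonded neighbours
    return np.maximum(aggregated @ weight, 0.0)


rng = np.random.default_rng(0)
weight = rng.normal(size=(2, 4))  # random weights stand in for trained parameters
adj = adjacency_matrix(len(atoms), bonds)
h1 = graph_conv(adj, features, weight)
graph_embedding = h1.mean(axis=0)  # readout: pool atom vectors into one molecule vector
print(h1.shape, graph_embedding.shape)  # (3, 4) (4,)
```

Each atom's new vector mixes in its bonded neighbours; stacking several such steps lets information travel further across the molecule before the readout pools everything into a single vector for the classifier.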

# Product: GNNTox
- GNN model: validation ROC-AUC score of 0.72
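The slides do not say exactly how the 0.72 was aggregated. A common convention for Tox21 is to compute ROC-AUC separately for each of the 12 tasks and average the results, ignoring compounds whose label for a task is missing. The sketch below assumes missing labels are marked with a sample weight of 0 (which, to our understanding, is how DeepChem's Tox21 loader encodes them); the arrays are toy values, not GNNTox results.

```python
import numpy as np


def roc_auc(y_true, y_score):
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]  # score difference for every positive/negative pair
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def mean_task_roc_auc(y, w, scores):
    """Average ROC-AUC over tasks, skipping entries with weight 0 (missing labels)."""
    aucs = []
    for t in range(y.shape[1]):
        mask = w[:, t] > 0
        aucs.append(roc_auc(y[mask, t], scores[mask, t]))
    return float(np.mean(aucs))


# Toy example: 4 molecules, 2 tasks; the third molecule has no label for task 2.
y = np.array([[1, 0], [0, 1], [1, 0], [0, 0]])
w = np.array([[1, 1], [1, 1], [1, 0], [1, 1]])
scores = np.array([[0.9, 0.3], [0.2, 0.5], [0.6, 0.8], [0.4, 0.7]])
print(mean_task_roc_auc(y, w, scores))  # 0.75
```

Comparing every positive/negative pair costs O(n_pos * n_neg), which is fine for a test set of a few hundred molecules; `sklearn.metrics.roc_auc_score` is the usual choice for larger sets.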

## Client side: 
- Convenient Python API
- Applies the DeepChem featurizer to SMILES strings of chemical compounds
- Converts the resulting graphs into a JSON-compatible string that can be sent over HTTP
- Sends the input to the server for model inference
- Displays the results or saves them to CSV
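The JSON step exists because NumPy arrays and DeepChem objects are not JSON-serializable as-is. The actual format produced by `data_to_json` and `format_request` is not shown here, so the following is a hypothetical sketch: each graph is reduced to its atom-feature matrix and adjacency list (for a DeepChem `ConvMol` these could come from `get_atom_features()` and `get_adjacency_list()`), converted to plain lists, and decoded back on the other side. The field names `ids`, `graphs`, `atom_features`, and `adjacency_list` are our own.

```python
import json

import numpy as np


def graph_to_dict(atom_features, adjacency_list):
    """Convert one molecular graph into JSON-compatible Python types (hypothetical layout)."""
    return {
        "atom_features": np.asarray(atom_features).tolist(),  # ndarray -> nested lists
        "adjacency_list": [[int(j) for j in nbrs] for nbrs in adjacency_list],
    }


def encode_request(ids, graphs):
    """Client side: build the JSON string sent as the HTTP request body."""
    return json.dumps({"ids": list(ids), "graphs": [graph_to_dict(f, a) for f, a in graphs]})


def decode_request(body):
    """Server side: rebuild NumPy feature matrices and adjacency lists from the JSON body."""
    payload = json.loads(body)
    graphs = [
        (np.array(g["atom_features"], dtype=float), g["adjacency_list"])
        for g in payload["graphs"]
    ]
    return payload["ids"], graphs


# Usage with a hand-written ethanol graph (features are illustrative only).
ethanol = (np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]), [[1], [0, 2], [1]])
body = encode_request(["CCO"], [ethanol])
ids, graphs = decode_request(body)
```

Casting neighbour indices with `int()` matters in practice: NumPy integer types such as `np.int64` make `json.dumps` raise a `TypeError`.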


## Server side:
- Currently runs locally; will be deployed to an AWS EC2 instance
- Dockerized Flask app (scalable, reproducible) that:
  * converts the JSON request back into graphs
  * runs model inference
  * returns the results over HTTP
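A minimal Flask sketch of the two endpoints used in the demos below: `GET /` for metadata and `POST /predict` for inference. The payload layout (`graphs` / `atom_features`) and the `predict_graph` stub, which labels molecules by atom count instead of running a GNN, are hypothetical placeholders for the real model and serialization code.

```python
import json

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical task names; the real service reports the 12 Tox21 assays.
TASKS = ["task_1", "task_2"]


def predict_graph(atom_features):
    """Stub standing in for GNN inference: calls molecules with >10 atoms 'active' on every task."""
    label = "active" if len(atom_features) > 10 else "inactive"
    return ["{}: {}".format(task, label) for task in TASKS]


@app.route("/", methods=["GET"])
def metadata():
    # Product metadata, matching the Demo 1 response
    return jsonify({"Release": "beta"})


@app.route("/predict", methods=["POST"])
def predict():
    # The body is a raw JSON string, so decode it directly instead of using request.get_json()
    payload = json.loads(request.get_data(as_text=True))
    results = [predict_graph(graph["atom_features"]) for graph in payload["graphs"]]
    return jsonify(results)


# To serve from inside a Docker container, bind to all interfaces, e.g.:
# app.run(host="0.0.0.0", port=5000)
```

Because the client posts a raw string with `requests.post(..., data=...)` rather than `json=...`, the request carries no JSON content type, so the handler parses the body itself. Binding to `0.0.0.0` inside the container is what makes the published port reachable from the host.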

## Pre-demo: module imports

```python
import sys
import json

import deepchem as dc
import requests

# Make the project's src/ package importable from the notebook directory
sys.path.insert(0, '../')
from src.dc_utils import data_to_json, format_request, response_to_csv
```

# Demo 1: get metadata

Retrieve product metadata from the server. Currently, it only reports the release version.

```python
host1 = 'http://localhost:5000/'

# A GET on the root endpoint returns the service metadata
response = requests.get(host1)
print('metadata:\n{}'.format(response.text))
```

```
metadata:
{"Release":"beta"}
```

# Demo 2: Predict on data from DeepChem
- Load the Tox21 data directly from DeepChem
- Encode the dataset as JSON
- Send it to the server
- Receive the predictions
- Display the results as plain text

```python
# Load Tox21 featurized as graphs: task names, (train, valid, test) splits, and transformers
tox21_tasks_2, tox21_datasets_2, transformers_2 = dc.molnet.load_tox21(featurizer='GraphConv')
train_dataset_2, valid_dataset_2, test_dataset_2 = tox21_datasets_2
```

```python
# Number of molecules in the Tox21 test split
print('Number of samples: {}'.format(test_dataset_2.X.shape[0]))
```

```
Number of samples: 784
```


```python
# Serialize the featurized test set into the request format expected by the server
test_data = data_to_json(test_dataset_2)
data_json = format_request(test_data)

host2 = host1 + 'predict'  # host1 already ends with '/', so avoid a double slash
response = requests.post(host2, data=data_json)
response.raise_for_status()
response_list = response.json()

# Print the input SMILES and predicted labels for the first three test molecules
for i in range(3):
    print('{}: {}'.format(i, test_dataset_2.ids[i]))
    print(response_list[i])
    print('\n\n')
```

```
0: CC1(C)S[C@@H]2[C@H](NC(=O)Cc3ccccc3)C(=O)N2[C@H]1C(=O)O.CC1(C)S[C@@H]2[C@H](NC(=O)Cc3ccccc3)C(=O)N2[C@H]1C(=O)O.c1ccc(CNCCNCc2ccccc2)cc1
['estrogen receptor alpha, LBD (ER, LBD): inactive', 'estrogen receptor alpha, full (ER, full): inactive', 'aromatase: inactive', 'aryl hydrocarbon receptor (AhR): inactive', 'androgen receptor, full (AR, full): inactive', 'androgen receptor, LBD (AR, LBD): inactive', 'peroxisome proliferator-activated receptor gamma (PPAR-gamma): inactive', 'nuclear factor (erythroid-derived 2)-like 2/antioxidant responsive element (Nrf2/ARE): inactive', 'heat shock factor response element (HSE): inactive', 'ATAD5: inactive', 'mitochondrial membrane potential (MMP): inactive', 'p53: inactive']



1: CC(C)(c1ccc(Oc2ccc3c(c2)C(=O)OC3=O)cc1)c1ccc(Oc2ccc3c(c2)C(=O)OC3=O)cc1
['estrogen receptor alpha, LBD (ER, LBD): active', 'estrogen receptor alpha, full (ER, full): inactive', 'aromatase: active', 'aryl hydrocarbon receptor (AhR): active', 'androgen receptor, full (AR
```
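The server returns one list of `"<task>: active|inactive"` strings per molecule. One plausible way to produce those labels is sketched below, assuming the model outputs class probabilities with shape `(n_molecules, n_tasks, 2)` and that a probability of 0.5 or more for the active class counts as active. Both the shape and the threshold are assumptions, and the probabilities are made up.

```python
import numpy as np


def format_predictions(probabilities, task_names, threshold=0.5):
    """Turn a (n_molecules, n_tasks, 2) array of class probabilities into readable labels.

    Assumes index 1 of the last axis is P(active), as in a two-class softmax output.
    """
    probs = np.asarray(probabilities)
    results = []
    for mol_probs in probs:
        labels = []
        for name, (_, p_active) in zip(task_names, mol_probs):
            labels.append("{}: {}".format(name, "active" if p_active >= threshold else "inactive"))
        results.append(labels)
    return results


# Illustrative numbers only (not real model output): 2 molecules, 2 tasks.
probs = [[[0.9, 0.1], [0.3, 0.7]],
         [[0.4, 0.6], [0.8, 0.2]]]
print(format_predictions(probs, ["p53", "aromatase"]))
# [['p53: inactive', 'aromatase: active'], ['p53: active', 'aromatase: inactive']]
```

A single 0.5 cut-off is the simplest choice; because Tox21 actives are rare, a per-task threshold tuned on the validation set may give a better balance between precision and recall.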

# Demo 3: Predict on molecules from SMILES strings

```python
molecules = [
    'CC(C)(c1ccc(Oc2ccc3c(c2)C(=O)OC3=O)cc1)c1ccc(Oc2ccc3c(c2)C(=O)OC3=O)cc1',
    'Cc1cc(C(C)(C)C)c(O)c(C)c1Cn1c(=O)n(Cc2c(C)cc(C(C)(C)C)c(O)c2C)c(=O)n(Cc2c(C)cc(C(C)(C)C)c(O)c2C)c1=O',
    'Cc1nnc(-c2ccccc2)c(=O)n1N',
    'N=C(N)NCC1COc2ccccc2O1',
    'Cc1cccc(C)c1NC(=O)NC1=CCCN1C',
    'c1csc(C2(N3CCCCC3)CCCCC2)c1',
]

# Convert the SMILES strings into ConvMol graph objects
featurizer = dc.feat.ConvMolFeaturizer()
conv_mols = featurizer.featurize(molecules)
dataset = dc.data.NumpyDataset(conv_mols, ids=molecules, n_tasks=12)
dataset.tasks = tox21_tasks_2  # attach the 12 Tox21 task names

dataset_json = format_request(data_to_json(dataset))
response_2 = requests.post(host2, data=dataset_json)
```

```python
# Save one row of predictions per input molecule to results.csv
response_to_csv(response_2, molecules, 'results.csv')
```
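`response_to_csv` comes from the project's `src.dc_utils` module and is not shown here. As a hypothetical stand-in, this sketch writes the same kind of server response with Python's standard `csv` module: one row per SMILES string and one column per task, splitting each `"<task>: <label>"` entry on its last `": "`.

```python
import csv


def results_to_csv(molecules, predictions, path):
    """Write one row per molecule: the SMILES string followed by one label column per task.

    `predictions` holds the per-molecule lists of "task: label" strings returned by the server.
    """
    task_names = [entry.rsplit(": ", 1)[0] for entry in predictions[0]]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)  # quotes task names that contain commas
        writer.writerow(["smiles"] + task_names)
        for smiles, labels in zip(molecules, predictions):
            writer.writerow([smiles] + [entry.rsplit(": ", 1)[1] for entry in labels])
```

With the demo variables, it would be called as `results_to_csv(molecules, response_2.json(), 'results.csv')`.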

# Future Plans
- Improve model performance (better architecture, more data, better features, etc.)
- Front-end app development (GUI, more features)
- Expand the classifier to cover more pathways, or to produce higher-level information (e.g., biocompatible vs. not)
- Extend the classifier to larger, more complex molecules such as monoclonal antibodies (mAbs)