## GraphDTA: prediction of drug–target binding affinity using graph convolutional networks

ABSTRACT: While the development of new drugs is costly, time consuming, and often accompanied by safety issues, drug repurposing, where old drugs with established safety are used for medical conditions other than those they were originally developed for, is an attractive alternative. Understanding how old drugs act on new targets therefore becomes a crucial part of drug repurposing and has attracted much interest. Several statistical and machine learning models have been proposed to estimate drug–target binding affinity, and deep learning approaches have been shown to be among the state-of-the-art methods. However, drugs and targets in these models were commonly represented as 1D strings, despite the fact that molecules are by nature formed by the chemical bonding of atoms. In this work, we propose GraphDTA to capture the structural information of drugs, possibly enhancing the predictive power for affinity. In particular, unlike competing methods, drugs are represented as graphs and graph convolutional networks are used to learn drug–target binding affinity. We trial our method on two benchmark drug–target binding affinity datasets and compare its performance with state-of-the-art models in the field. The results show that our proposed method not only predicts affinity better than non-deep learning models, but also outperforms competing deep learning approaches. This demonstrates the practical advantages of graph-based representations of molecules for accurate prediction of drug–target binding affinity. The application may also extend to any recommendation system where either or both of the user and product-like sides can be represented as graphs.
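To make "graph convolutional networks" concrete: a single GCN layer updates each atom's feature vector by aggregating over its bonded neighbours and itself with symmetric degree normalisation, then applies a shared linear map and a nonlinearity, i.e. H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) in the formulation of Kipf and Welling. The pure-Python sketch below implements that rule on a toy three-atom chain. It is an illustration only, not the GraphDTA code (which builds its layers from PyTorch Geometric); the function name `gcn_layer` and the toy features and weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency matrix without self-loops (hypothetical input)
    h:   n x f node feature matrix
    w:   f x k weight matrix
    """
    n = len(adj)
    # Add self-loops so every node also keeps its own features
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Degree of each node in A + I
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation: a_hat[i][j] / sqrt(deg_i * deg_j)
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features: norm @ h
    f = len(h[0])
    agg = [[sum(norm[i][m] * h[m][c] for m in range(n)) for c in range(f)] for i in range(n)]
    # Shared linear transform followed by ReLU: relu(agg @ w)
    k = len(w[0])
    return [[max(0.0, sum(agg[i][c] * w[c][o] for c in range(f))) for o in range(k)]
            for i in range(n)]

# Toy chain 0-1-2 (e.g. the heavy atoms of C-C-O) with one scalar feature per atom
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[1.0], [2.0], [3.0]]
out = gcn_layer(adj, h, [[1.0]])
print(out)
```

Stacking several such layers lets information travel several bonds away, after which a pooling step can turn the per-atom vectors into a single drug representation.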

Link to paper: https://www.biorxiv.org/content/10.1101/684662v3.full.pdf

Credit: https://github.com/thinng/GraphDTA

Google Colab: https://colab.research.google.com/drive/1k0xpdhMEiW_e23qy5LFvS9szUgKmT6O3?usp=sharing

```
# Clone the repository and cd into directory
!git clone https://github.com/thinng/GraphDTA.git
%cd GraphDTA
```

```
Cloning into 'GraphDTA'...
remote: Enumerating objects: 133, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 133 (delta 2), reused 0 (delta 0), pack-reused 126
Receiving objects: 100% (133/133), 69.03 MiB | 13.66 MiB/s, done.
Resolving deltas: 100% (65/65), done.
/content/GraphDTA
```


```
# Install RDKit
!pip install rdkit-pypi==2021.3.1.5
```

```
# Install requisite libraries
# Note: these pins target PyTorch 1.4.0 with CUDA 10.1 (cu101) wheels and
# may not be installable on newer runtimes.
!pip install torch==1.4.0
!pip install torch-scatter==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
!pip install torch-sparse==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
!pip install torch-cluster==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
!pip install torch-spline-conv==latest+cu101 -f https://pytorch-geometric.com/whl/torch-1.4.0.html
!pip install torch-geometric
```

```
## 1. Create data in PyTorch format
!python create_data.py
```

```
Converting SMILES to graph: 14712/19709
Converting SMILES to graph: 14713/19709
Converting SMILES to graph: 14714/19709
```

This produces <code>kiba_train.csv</code>, <code>kiba_test.csv</code>, <code>davis_train.csv</code>, and <code>davis_test.csv</code>, saved in the <code>data/</code> folder. These files are in turn used as input to create data in PyTorch format, stored in <code>data/processed/</code> as <code>kiba_train.pt</code>, <code>kiba_test.pt</code>, <code>davis_train.pt</code>, and <code>davis_test.pt</code>.
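The output of <code>create_data.py</code> shows it converting each SMILES string into a graph. In PyTorch Geometric, a graph's connectivity is conventionally stored as a 2 x E `edge_index` of source and target node indices, with each undirected chemical bond listed once in each direction. The sketch below illustrates that step for a molecule whose atoms and bonds have already been extracted (for example with RDKit). The helper names, the tiny atom vocabulary, and the one-hot atom features are hypothetical simplifications, not the feature set <code>create_data.py</code> actually uses.

```python
from typing import List, Tuple

# Hypothetical, deliberately tiny atom vocabulary for one-hot node features
ATOM_VOCAB = ["C", "N", "O", "S", "other"]

def one_hot_atom(symbol: str) -> List[int]:
    """One-hot encode an atom symbol; unknown symbols map to 'other'."""
    key = symbol if symbol in ATOM_VOCAB else "other"
    return [1 if v == key else 0 for v in ATOM_VOCAB]

def molecule_to_graph(atoms: List[str], bonds: List[Tuple[int, int]]):
    """Return (node_features, edge_index) for an undirected molecular graph.

    atoms: atom symbols, indexed 0..n-1
    bonds: pairs (i, j) of bonded atom indices, each bond listed once
    """
    x = [one_hot_atom(a) for a in atoms]
    sources: List[int] = []
    targets: List[int] = []
    for i, j in bonds:
        # Store each bond in both directions so messages can flow both ways
        sources += [i, j]
        targets += [j, i]
    return x, [sources, targets]

# Ethanol (SMILES "CCO"), heavy atoms only: C-C-O
x, edge_index = molecule_to_graph(["C", "C", "O"], [(0, 1), (1, 2)])
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```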

```
## 2. Train a prediction model
!python training.py 0 0 0
```

```
Make prediction for 5010 samples...
0.2666617 No improvement since epoch  162 ; best_mse,best_ci: 0.24826041 0.885748119323684 GINConvNet davis
Training on 25046 samples...
Make prediction for 5010 samples...
0.2529792 No improvement since epoch  162 ; best_mse,best_ci: 0.24826041 0.885748119323684 GINConvNet davis
```

Here, the first argument is the dataset index (0/1 for 'davis' or 'kiba', respectively); the second is the model index (0/1/2/3 for GINConvNet, GATNet, GAT_GCN, or GCNNet, respectively); and the third is the CUDA device index (0/1 for 'cuda:0' or 'cuda:1', respectively). The command <code>0 0 0</code> therefore trains GINConvNet on Davis using cuda:0. Your actual CUDA device name may differ from these, so change the following code in <code>training.py</code> accordingly:

```python
import sys

# Default device; overridden when a third command-line argument is given
cuda_name = "cuda:0"
if len(sys.argv) > 3:
    cuda_name = "cuda:" + str(int(sys.argv[3]))
```
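The argument convention described above amounts to three index lookups. The sketch below is a hypothetical helper (`parse_run_args` is not part of the repository; <code>training.py</code> does its own parsing) that maps the integer arguments to a dataset name, model name, and CUDA device string, using the indices listed above and the same optional-device logic as the snippet above.

```python
from typing import List, Tuple

DATASETS = ["davis", "kiba"]                            # index 0/1
MODELS = ["GINConvNet", "GATNet", "GAT_GCN", "GCNNet"]  # index 0/1/2/3

def parse_run_args(argv: List[str]) -> Tuple[str, str, str]:
    """Map `script dataset_idx model_idx [cuda_idx]` to names.

    Hypothetical helper mirroring the convention in the text; the CUDA
    index is optional and defaults to cuda:0.
    """
    dataset = DATASETS[int(argv[1])]
    model = MODELS[int(argv[2])]
    cuda_name = "cuda:0"
    if len(argv) > 3:
        cuda_name = "cuda:" + str(int(argv[3]))
    return dataset, model, cuda_name

print(parse_run_args(["training.py", "0", "0", "0"]))  # ('davis', 'GINConvNet', 'cuda:0')
print(parse_run_args(["training.py", "1", "3", "1"]))  # ('kiba', 'GCNNet', 'cuda:1')
```

For <code>0 0 0</code> this yields Davis, GINConvNet, and cuda:0, which matches the `GINConvNet davis` tag in the training log.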

Running <code>training.py</code> saves the model and result files for the model that achieves the best MSE on the test data during training. For example, running GATNet on the Davis data produces two files, <code>model_GATNet_davis.model</code> and <code>result_GATNet_davis.csv</code>.
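The training log reports two metrics: the mean squared error (MSE) used to pick the best model, and the concordance index (CI). The sketch below computes both in plain Python, using the common definition of CI: over all pairs whose true affinities differ, the fraction in which the predictions are ordered the same way, with tied predictions counting 0.5. The function names are hypothetical, and this quadratic-time version is meant for illustration rather than as the repository's own metric code.

```python
from typing import Sequence

def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def concordance_index(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of comparable pairs ranked in the correct order.

    A pair (i, j) is comparable when y_true[i] > y_true[j]. It scores 1 if
    y_pred[i] > y_pred[j], 0.5 if the predictions tie, and 0 otherwise.
    """
    score, pairs = 0.0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                pairs += 1
                if y_pred[i] > y_pred[j]:
                    score += 1.0
                elif y_pred[i] == y_pred[j]:
                    score += 0.5
    return score / pairs if pairs else 0.0

y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 5.5, 7.5, 7.0]
print(mse(y_true, y_pred))                # 0.4375
print(concordance_index(y_true, y_pred))  # 0.75
```

Here one comparable pair is tied (0.5) and one is inverted (0), giving 4.5 out of 6 pairs. A CI of 0.5 corresponds to random ordering and 1.0 to a perfect ranking.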

```
## 3. Train a prediction model with validation
!python training_validation.py 0 0 0
```

This saves the model that achieves the best MSE on the validation data during training, together with that model's performance on the test data. For example, running GATNet on the Davis data produces two files, <code>model_GATNet_davis.model</code> and <code>result_GATNet_davis.csv</code>.
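The two training scripts differ in which data decides the saved model. A minimal sketch of the validation variant, assuming per-epoch validation and test MSEs have already been computed: pick the epoch with the lowest validation MSE and report the test MSE of that same epoch, so the test set plays no part in model selection. The function `select_by_validation` and the numbers below are hypothetical.

```python
from typing import List, Tuple

def select_by_validation(val_mse: List[float],
                         test_mse: List[float]) -> Tuple[int, float, float]:
    """Return (best_epoch, its validation MSE, its test MSE).

    The epoch is chosen by validation MSE only; the test MSE of that
    epoch is reported but never used for the choice. Epochs are 1-based,
    and ties go to the earliest epoch.
    """
    best = min(range(len(val_mse)), key=lambda e: val_mse[e])
    return best + 1, val_mse[best], test_mse[best]

# Hypothetical per-epoch scores for a five-epoch run
val = [0.40, 0.31, 0.28, 0.29, 0.30]
test = [0.42, 0.33, 0.30, 0.27, 0.31]
print(select_by_validation(val, test))  # (3, 0.28, 0.3)
```

Selecting on the test column instead would pick epoch 4 with a test MSE of 0.27, an optimistic estimate, which is one reason to prefer the validation-based protocol when reporting results.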