## MolTrans: Molecular Interaction Transformer for Drug Target Interaction Prediction

<b>Motivation:</b> Drug target interaction (DTI) prediction is a foundational task
for in silico drug discovery, which is costly and time-consuming because it requires
experimental search over a large drug compound space. Recent years have witnessed promising progress for deep learning in DTI prediction. However, the
following challenges remain open: (1) purely data-driven molecular representation learning approaches ignore the sub-structural nature of DTI, and thus produce
results that are less accurate and difficult to explain; (2) existing methods focus
on limited labeled data while ignoring the value of massive unlabeled molecular
data.

<b>Results:</b> We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (1) a knowledge-inspired sub-structural pattern mining
algorithm and interaction modeling module for more accurate and interpretable
DTI prediction; (2) an augmented transformer encoder to better extract and
capture the semantic relations among substructures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it
improves DTI prediction performance compared to state-of-the-art baselines.

Link to paper: https://arxiv.org/pdf/2004.11424v1.pdf

Credit: https://github.com/kexinhuang12345/MolTrans

In [1]:
# Clone the repository and cd into directory
!git clone https://github.com/kexinhuang12345/MolTrans.git
%cd MolTrans

Cloning into 'MolTrans'...
remote: Enumerating objects: 97, done.
remote: Counting objects: 100% (97/97), done.
remote: Compressing objects: 100% (62/62), done.
remote: Total 97 (delta 47), reused 75 (delta 33), pack-reused 0
Unpacking objects: 100% (97/97), done.
Checking out files: 100% (45/45), done.
/content/MolTrans


In [None]:
# Install requirements / dependencies
!pip install -r requirements.txt

### Datasets

The dataset folder provides all three processed datasets used in MolTrans: BindingDB, DAVIS, and BIOSNAP. The BIOSNAP folder contains the full dataset for the main experiment, the datasets for the missing-data experiment (70%, 80%, 90%, 95%), and the unseen-drug and unseen-protein datasets.
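
Before training, it can help to check what the dataset folder actually contains. The sketch below walks a directory and counts the data rows in each CSV file it finds. It assumes the folder is named `dataset`, that the files are CSVs, and that each one has a single header row; none of this is confirmed by the text above, and `summarize_csv_files` is a hypothetical helper, not part of MolTrans.

```python
import csv
from pathlib import Path


def summarize_csv_files(root):
    """Return {relative_path: data_row_count} for every .csv file under root.

    Hypothetical helper: assumes each CSV has exactly one header row.
    """
    root = Path(root)
    summary = {}
    if not root.is_dir():
        # Nothing to report if the folder does not exist.
        return summary
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as handle:
            # csv.reader handles quoted fields that span several lines.
            n_rows = sum(1 for _ in csv.reader(handle))
        # Subtract the assumed header row, never going below zero.
        summary[path.relative_to(root).as_posix()] = max(n_rows - 1, 0)
    return summary


# "dataset" is an assumed folder name; adjust it to the cloned repository.
for name, n_rows in summarize_csv_files("dataset").items():
    print(f"{name}: {n_rows} rows")
```

Run from the repository root, this prints one line per CSV file, which makes it easy to spot a missing split before starting a long training job.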

### Run

You can run the experiments directly with `python train.py --task ${task_name}`, where `${task_name}` is one of `biosnap`, `bindingdb`, or `davis`. For details on BindingDB and DAVIS, see this [page](https://zitniklab.hms.harvard.edu/TDC/multi_pred_tasks/dti/).

In [None]:
# run the biosnap experiment
!python train.py --task biosnap

In [None]:
# run the bindingdb experiment
!python train.py --task bindingdb

In [None]:
# run the davis experiment
!python train.py --task davis
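
The three runs above can also be driven from a single cell. The following is a minimal sketch: it uses only the `--task` flag and the three task names given in this section, while `TASKS`, `build_command`, `run_all`, and the `dry_run` switch are hypothetical names introduced here for illustration. It assumes the working directory is the cloned `MolTrans` folder.

```python
import subprocess
import sys

# Task names as listed in the Run section.
TASKS = ("biosnap", "bindingdb", "davis")


def build_command(task):
    """Build the argument list for one training run (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    # sys.executable keeps the run inside the current Python environment.
    return [sys.executable, "train.py", "--task", task]


def run_all(tasks=TASKS, dry_run=True):
    """Run each task in sequence; with dry_run=True only print the commands."""
    commands = [build_command(task) for task in tasks]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop as soon as one run fails.
            subprocess.run(command, check=True)
    return commands


# Preview the commands first; call run_all(dry_run=False) to actually train.
run_all(dry_run=True)
```

Running the tasks sequentially keeps them from competing for the same GPU, and validating the task name up front avoids launching a run with a misspelled argument.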