## MAT: Molecule Attention Transformer

ABSTRACT: Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock a widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in Transformer using inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with a simple self-supervised pretraining, MAT requires tuning of only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that attention weights learned by MAT are interpretable from the chemical point of view.

Link to paper: https://arxiv.org/pdf/2002.08264v1.pdf

Credit: https://github.com/ardigen/MAT

Google Colab: https://colab.research.google.com/drive/1285XO7B0BEJ4gZkb1TP_SqMnGIDvGw3B?usp=sharing

In [3]:
# Clone the repository and cd into directory
!git clone https://github.com/ardigen/MAT.git
%cd MAT/src

/content/MAT/src


### Example of loading pretrained weights into MAT

#### Prepare Data Set

First, load a data set. The function <code>load_data_from_df</code> automatically saves the calculated features to the provided data directory (unless <code>use_data_saving</code> is set to <code>False</code>), and subsequent runs reuse the saved features.

In [4]:
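import os
import pandas as pd
import torch

# Change into MAT/src only if the earlier %cd has not already put us there.
if not os.getcwd().endswith(os.path.join('MAT', 'src')):
    os.chdir('MAT/src')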

In [None]:
# Install RDKit 
!pip install rdkit-pypi==2021.3.1.5

In [7]:
from featurization.data_utils import load_data_from_df, construct_loader

In [8]:
batch_size = 64

# Formal charges are one-hot encoded to stay compatible with the pre-trained weights.
# If you do not plan to use the pre-trained weights, we recommend setting one_hot_formal_charge to False.
X, y = load_data_from_df('../data/freesolv/freesolv.csv', one_hot_formal_charge=True)
data_loader = construct_loader(X, y, batch_size)

You can use your own data, but the CSV file should contain two columns, one with SMILES strings and one with the target value, as shown below:

In [9]:
pd.read_csv('../data/freesolv/freesolv.csv').head()

Unnamed: 0,smiles,y
0,CN(C)C(=O)c1ccc(cc1)OC,-1.874467
1,CS(=O)(=O)Cl,-0.277514
2,CC(C)C=C,1.465089
3,CCc1cnccn1,-0.428367
4,CCCCCCCO,-0.105855
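
As a minimal sketch of preparing such a file, the snippet below writes a two-column CSV with the standard library and reads it back to check the layout. The file name <code>my_dataset.csv</code> is hypothetical, the header names mirror the sample output above, and the two rows are copied from that sample.

```python
import csv
import os
import tempfile

# Two rows taken from the FreeSolv sample shown above: (SMILES, target value).
rows = [
    ("CC(C)C=C", 1.465089),
    ("CCCCCCCO", -0.105855),
]

# Hypothetical output location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "my_dataset.csv")

# Write the SMILES column first and the target column second.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["smiles", "y"])
    writer.writerows(rows)

# Read the file back to confirm the header and the values.
with open(path, newline="") as f:
    reader = csv.DictReader(f)
    header = reader.fieldnames
    loaded = [(r["smiles"], float(r["y"])) for r in reader]

print(header)
print(loaded)
```

The resulting path could then be passed to <code>load_data_from_df</code> in place of the FreeSolv path.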


#### Prepare Model

In [10]:
from transformer import make_model

In [11]:
d_atom = X[0][0].shape[1]  # It depends on the used featurization.

model_params = {
    'd_atom': d_atom,
    'd_model': 1024,
    'N': 8,
    'h': 16,
    'N_dense': 1,
    'lambda_attention': 0.33, 
    'lambda_distance': 0.33,
    'leaky_relu_slope': 0.1, 
    'dense_output_nonlinearity': 'relu', 
    'distance_matrix_kernel': 'exp', 
    'dropout': 0.0,
    'aggregation_type': 'mean'
}

model = make_model(**model_params)
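
To give a feel for what <code>lambda_attention</code> and <code>lambda_distance</code> control, here is a minimal pure-Python sketch of the idea stated in the abstract: the usual attention weights are combined with the inter-atomic distance matrix and the graph adjacency matrix. It assumes that the remaining weight, 1 - lambda_attention - lambda_distance, goes to the adjacency term and that the <code>'exp'</code> kernel means exp(-d). The function name <code>molecule_attention_weights</code> is hypothetical, and the sketch leaves out normalization, masking, and multiple heads, so the repository implementation may differ in detail.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def molecule_attention_weights(scores, distances, adjacency,
                               lambda_attention=0.33, lambda_distance=0.33):
    """Blend softmax attention, a distance kernel, and adjacency (illustrative only)."""
    # Assumption: the adjacency term receives whatever weight is left over.
    lambda_adjacency = 1.0 - lambda_attention - lambda_distance
    n = len(scores)
    weights = []
    for i in range(n):
        attention = softmax(scores[i])
        # Assumption: the 'exp' kernel maps a distance d to exp(-d).
        kernel = [math.exp(-d) for d in distances[i]]
        weights.append([
            lambda_attention * attention[j]
            + lambda_distance * kernel[j]
            + lambda_adjacency * adjacency[i][j]
            for j in range(n)
        ])
    return weights


# Toy two-atom molecule: equal raw scores, unit distance, one bond.
scores = [[0.0, 0.0], [0.0, 0.0]]
distances = [[0.0, 1.0], [1.0, 0.0]]
adjacency = [[0, 1], [1, 0]]

weights = molecule_attention_weights(scores, distances, adjacency)
for row in weights:
    print([round(w, 4) for w in row])
```

With both lambdas at 0.33, the three sources contribute roughly equally, which matches the values used in the parameter dictionary above.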

#### Load Pretrained Weights (optional)

If you want to use the pre-trained weights to train your model, <b>you should not change model parameters in the cell above</b>.

First, download the pretrained weights: https://drive.google.com/file/d/11-TZj8tlnD7ykQGliO9bCrySJNBnYD2k/view

In [13]:
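pretrained_name = '../pretrained_weights.pt'  # path to the downloaded file
# map_location='cpu' lets the checkpoint load on machines without a GPU.
pretrained_state_dict = torch.load(pretrained_name, map_location='cpu')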

In [14]:
model_state_dict = model.state_dict()

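# Copy every pretrained tensor into the model, except the 'generator' entries.
for name, param in pretrained_state_dict.items():
    if 'generator' in name:
        continue
    if isinstance(param, torch.nn.Parameter):
        # Unwrap Parameter objects so that only the raw tensor is copied.
        param = param.data
    model_state_dict[name].copy_(param)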
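
The copy loop above can be tried in isolation on a toy module. The sketch below uses a hypothetical <code>TinyModel</code> (not part of the MAT repository) with an <code>encoder</code> and a <code>generator</code> layer, and checks that only the non-generator weights are overwritten. It assumes the generator is skipped because it is the task-specific output head; the source does not state the reason.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in with a shared encoder and a 'generator' output head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.generator = nn.Linear(4, 1)


torch.manual_seed(0)
pretrained = TinyModel()  # plays the role of the pretrained checkpoint
model = TinyModel()       # freshly initialised model to fine-tune

generator_before = model.generator.weight.clone()

# state_dict() tensors share storage with the parameters, so copy_ updates the model in place.
model_state_dict = model.state_dict()
with torch.no_grad():
    for name, param in pretrained.state_dict().items():
        if 'generator' in name:
            continue  # keep the freshly initialised output head
        model_state_dict[name].copy_(param)

encoder_copied = torch.equal(model.encoder.weight, pretrained.encoder.weight)
generator_untouched = torch.equal(model.generator.weight, generator_before)
print(encoder_copied, generator_untouched)
```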

#### Run Training/Evaluation Loop

In [16]:
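# Move the model to the GPU when one is available.
if torch.cuda.is_available():
    model.cuda()

for batch in data_loader:
    adjacency_matrix, node_features, distance_matrix, y = batch
    # Padded atoms have all-zero feature vectors, so this mask marks the real atoms.
    batch_mask = torch.sum(torch.abs(node_features), dim=-1) != 0
    output = model(node_features, batch_mask, adjacency_matrix, distance_matrix, None)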
    ...
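
The body of the loop is left open above. As one possible way to fill it in, the sketch below shows how the padding mask behaves on a small hand-made batch, followed by a single regression training step. <code>StandInModel</code> is a hypothetical placeholder rather than the MAT model, and the choice of MSE loss, Adam, and the learning rate are assumptions, not settings taken from the repository.

```python
import torch
from torch import nn

# Two molecules padded to 3 atoms with 2 features each; the second has only 2 real atoms.
node_features = torch.tensor([
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]],  # last row is padding
])
y = torch.tensor([[0.5], [-1.0]])  # made-up regression targets

# Same expression as in the loop above: an atom is real if any feature is non-zero.
batch_mask = torch.sum(torch.abs(node_features), dim=-1) != 0


class StandInModel(nn.Module):
    """Hypothetical placeholder: masked mean over atoms followed by a linear head."""

    def __init__(self, d_atom):
        super().__init__()
        self.head = nn.Linear(d_atom, 1)

    def forward(self, node_features, batch_mask):
        mask = batch_mask.unsqueeze(-1).float()
        pooled = (node_features * mask).sum(dim=1) / mask.sum(dim=1)
        return self.head(pooled)


torch.manual_seed(0)
model = StandInModel(d_atom=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # assumed optimizer settings
criterion = nn.MSELoss()                                   # assumed regression loss

# One training step: forward pass, loss, backward pass, parameter update.
model.train()
optimizer.zero_grad()
output = model(node_features, batch_mask)
loss = criterion(output, y)
loss.backward()
optimizer.step()

print(batch_mask)
print(loss.item())
```

For evaluation, the same forward pass would typically run under <code>model.eval()</code> and <code>torch.no_grad()</code>, without the backward pass and optimizer step.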