Molformer

Introduction

This is the repository for our Molformer.

Installation

# Install packages
pip install torch scikit-learn mendeleev
pip install rdkit-pypi
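
To confirm that the packages are visible to your interpreter, a minimal check like the following may help. It is only a sketch: the function name check_environment is hypothetical, and the module names assume the usual import names (PyTorch is imported as torch, scikit-learn as sklearn, rdkit-pypi as rdkit).

```python
import importlib.util

# Import names, which differ from some of the pip package names:
# scikit-learn is imported as "sklearn" and rdkit-pypi as "rdkit".
REQUIRED_MODULES = ["torch", "sklearn", "mendeleev", "rdkit"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it does not verify CUDA support or version compatibility.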

Dataset

We test our model in three domains: quantum chemistry, physiology, and biophysics. We also provide information on the materials science datasets used in the preceding 3D-Transformer. The raw datasets can be downloaded from the links below.

Quantum Chemistry

QM7 Dataset
Download (Official Website): http://quantum-machine.org/datasets/
Download (DeepChem): https://github.com/deepchem/deepchem/blob/master/deepchem/molnet/load_function/qm7_datasets.py#L30-L107
Description (DeepChem): https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html#qm7-datasets
QM8 Dataset
Download (DeepChem): https://github.com/deepchem/deepchem/blob/master/deepchem/molnet/load_function/qm8_datasets.py
Description (DeepChem): https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html?highlight=qm7#qm8-datasets
QM9 Dataset
Download (Official Website): https://ndownloader.figshare.com/files/3195389
Download (DeepChem): https://github.com/deepchem/deepchem/blob/master/deepchem/molnet/load_function/qm9_datasets.py
Download (Atom3D): https://www.atom3d.ai/smp.html
Download (MPNN Supplement): https://drive.google.com/file/d/0Bzn36Iqm8hZscHFJcVh5aC1mZFU/view?resourcekey=0-86oyPL3e3l2ZTiRpwtPDBg
Download (SchNet): https://schnetpack.readthedocs.io/en/stable/tutorials/tutorial_02_qm9.html#Loading-the-data
GEOM-QM9 Dataset
Download (Official Website): https://doi.org/10.7910/DVN/JNGTDF
Tutorial of usage: https://github.com/learningmatter-mit/geom/blob/master/tutorials/01_loading_data.ipynb

Physiology

BBBP
Download and Description (from MoleculeNet): https://moleculenet.org/datasets-1
Download (from GLambard): https://github.com/GLambard/Molecules_Dataset_Collection
Description (from DeepChem): https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html
Description (from DGL-LifeSci): https://lifesci.dgl.ai/api/data.html#tox21
ClinTox
Download and Description (from MoleculeNet): https://moleculenet.org/datasets-1
Download (from GLambard): https://github.com/GLambard/Molecules_Dataset_Collection
Description (from DeepChem): https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html
Description (from DGL-LifeSci): https://lifesci.dgl.ai/api/data.html#tox21

Biophysics

PDBbind
Atom3D: https://github.com/drorlab/atom3d
1. Install Atom3D: pip install atom3d
2. Download the 'split-by-sequence-identity-30' dataset from https://www.atom3d.ai/
3. Preprocess the data by running: python pdbbind/dataloader_pdb.py
BACE
Download and Description (from MoleculeNet): https://moleculenet.org/datasets-1
Download (from GLambard): https://github.com/GLambard/Molecules_Dataset_Collection
Description (from DeepChem): https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html
Description (from DGL-LifeSci): https://lifesci.dgl.ai/api/data.html#tox21

Material Science

COREMOF
Download (Baidu Drive): https://pan.baidu.com/s/12N8gM8_TQ1mpBGx6gdkAog (password: l41s)
Reproduction of PointNet++: python coremof/reproduce/main_pn_coremof.py
Reproduction of MPNN: python coremof/reproduce/main_mpnn_coremof.py
Reproduction of SchNet:
1. Load COREMOF: python coremof/reproduce/main_sch_coremof.py
2. Run SchNet: spk_run.py train schnet custom ../../coremof.db ./coremof --split 900 100 --property LCD --features 16 --batch_size 20 --cuda
  (Note: we could not run the official SchNet script successfully because of memory limitations.)

Models

model/tr_spe: 3D-Transformer with Sinusoidal Position Encoding (SPE)
model/tr_cpe: 3D-Transformer with Convolutional Position Encoding (CPE)
model/tr_msa: 3D-Transformer with Multi-scale Self-attention (MSA)
model/tr_afps: 3D-Transformer with Attentive Farthest Point Sampling (AFPS)
model/tr_full: 3D-Transformer with CPE + MSA + AFPS

Quick Tour

Model Usage

After processing the dataset, build the model. Suppose there are N types of atoms and n downstream tasks; to predict a single property, set n = 1. Multi-scale self-attention additionally needs a dist_bar argument that defines the different scales of local regions, for example dist_bar=[1, 3, 5]. You can also specify the number of attention heads, the number of encoders, the model dimension, the dropout rate, and so on. Here we use the defaults.

>>> import torch
>>> from model.tr_spe import build_model

# initialize the model (N: number of atom types, n: number of prediction tasks)
>>> model = build_model(N, n).cuda()

# take a 4-atom molecule as an example
>>> x = torch.tensor([[1, 1, 6, 8]]).cuda()
>>> pos = torch.tensor([[[7.356203877, 9.058198382, 3.255188164],
...                      [5.990730587, 3.951633382, 9.784664946],
...                      [1.048332315, 3.912215133, 9.827313903],
...                      [2.492201352, 9.097616820, 3.297837121]]]).cuda()
>>> mask = (x != 0).unsqueeze(1)  # True for every position whose index is not 0
>>> out = model(x.long(), mask, pos)
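
The mask in the example above is computed as x != 0, which suggests that index 0 is reserved for padding when molecules of different sizes share a batch. The following is a minimal sketch of such padding in plain Python. The helper name pad_batch and the coordinates are hypothetical and purely illustrative; the repository's own data loaders may handle batching differently.

```python
def pad_batch(molecules, pad_value=0):
    """Pad a list of (atom_types, coordinates) pairs to a common length.

    Returns (x, pos, mask) as nested Python lists:
      x    -- [batch, max_atoms] atom-type indices, padded with pad_value
      pos  -- [batch, max_atoms, 3] coordinates, padded with zeros
      mask -- [batch, max_atoms] True for real atoms, False for padding
    """
    max_atoms = max(len(types) for types, _ in molecules)
    x, pos, mask = [], [], []
    for types, coords in molecules:
        if len(types) != len(coords):
            raise ValueError("each atom needs exactly one coordinate triple")
        n_pad = max_atoms - len(types)
        x.append(list(types) + [pad_value] * n_pad)
        pos.append([list(c) for c in coords] + [[0.0, 0.0, 0.0] for _ in range(n_pad)])
        mask.append([True] * len(types) + [False] * n_pad)
    return x, pos, mask


# Illustrative input only: the coordinates below are made up.
mol_a = ([1, 1, 6, 8], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
mol_b = ([6, 8], [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
x, pos, mask = pad_batch([mol_a, mol_b])
```

Wrapping the results in torch.tensor(x) and torch.tensor(pos) gives batched inputs of the same layout as x and pos above, and (torch.tensor(x) != 0).unsqueeze(1) then yields the corresponding mask.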

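# multi-scale self-attention variant: takes pairwise distances instead of coordinates
>>> import torch
>>> from model.tr_msa import build_model

# initialize the model (N: number of atom types, n: number of prediction tasks)
>>> dist_bar = [1, 3, 5]
>>> model = build_model(N, n, dist_bar).cuda()

# take a 4-atom molecule as an example
>>> x = torch.tensor([[1, 1, 6, 8]]).cuda()
>>> pos = torch.tensor([[[7.356203877, 9.058198382, 3.255188164],
...                      [5.990730587, 3.951633382, 9.784664946],
...                      [1.048332315, 3.912215133, 9.827313903],
...                      [2.492201352, 9.097616820, 3.297837121]]]).cuda()
>>> mask = (x != 0).unsqueeze(1)  # True for every position whose index is not 0
>>> dist = torch.cdist(pos, pos).float()  # pairwise Euclidean distances
>>> out = model(x.long(), mask, dist)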
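
To build intuition for how dist_bar might carve a molecule into local regions, the sketch below computes the pairwise distance matrix for the 4-atom example and derives one boolean neighborhood mask per scale. This assumes each entry of dist_bar acts as a strict distance cutoff; the function names are hypothetical and the actual logic inside model/tr_msa may differ.

```python
import math


def pairwise_distances(coords):
    """Return the full Euclidean distance matrix for a list of 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def multi_scale_masks(dist, dist_bar):
    """For each cutoff in dist_bar, mark the atom pairs closer than that cutoff.

    Assumption: a pair (i, j) belongs to the local region at a given scale
    when dist[i][j] < cutoff. This is an illustration, not the model's code.
    """
    return [[[d < bar for d in row] for row in dist] for bar in dist_bar]


# Coordinates of the 4-atom example molecule used above.
coords = [
    [7.356203877, 9.058198382, 3.255188164],
    [5.990730587, 3.951633382, 9.784664946],
    [1.048332315, 3.912215133, 9.827313903],
    [2.492201352, 9.097616820, 3.297837121],
]
dist = pairwise_distances(coords)
masks = multi_scale_masks(dist, dist_bar=[1, 3, 5])
```

Under this assumption the masks are nested: every pair that counts as local at a small cutoff is also local at every larger one, so larger scales only ever add neighbors.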

Motif Extraction

We rely on RDKit to extract motifs from small molecules. Given the SMILES representation of a molecule, we can manually define substructures using SMARTS patterns.

>>> from rdkit import Chem
>>> smiles = 'CC(=O)O'  # example input (acetic acid)
>>> mol = Chem.MolFromSmiles(smiles)
>>> pattern = Chem.MolFromSmarts('C(=O)')
>>> mol.HasSubstructMatch(pattern)  # check whether the molecule contains the motif 'C(=O)'
>>> mol.GetSubstructMatches(pattern)  # get the indices of the atoms that belong to the motif 'C(=O)'
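
GetSubstructMatches returns a tuple of atom-index tuples, one per match. If per-atom motif labels are needed downstream, one possible way to derive them is sketched below. The helper motif_membership is hypothetical and not part of this repository, and the example input is written by hand rather than produced by RDKit.

```python
def motif_membership(num_atoms, matches):
    """Map each atom index to the ids of the motif matches that contain it.

    `matches` is a tuple of tuples of atom indices, the shape returned by
    RDKit's Mol.GetSubstructMatches.
    """
    membership = [[] for _ in range(num_atoms)]
    for match_id, atom_indices in enumerate(matches):
        for idx in atom_indices:
            membership[idx].append(match_id)
    return membership


# Hand-written example input: a 4-atom molecule in which atoms 1 and 2
# form a single match of the motif.
matches = ((1, 2),)
print(motif_membership(4, matches))  # prints [[], [0], [0], []]
```

With a real molecule, num_atoms would come from mol.GetNumAtoms() and matches from mol.GetSubstructMatches(pattern).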
