This is the official implementation of the paper "Enhancing Molecular Property Prediction with Chemical Priors by Fractional Denoising".
The environment is composed of the following packages and versions:
```
pytorch-lightning 1.8.6
torch 1.13.1+cu116
torch-cluster 1.6.0+pt113cu116
torch-geometric 2.3.0
torch-scatter 2.1.0+pt113cu116
torch-sparse 0.6.17+pt113cu116
torch-spline-conv 1.2.1+pt113cu116
torchmetrics 0.11.4
wandb 0.15.3
numpy 1.22.4
scikit-learn 1.2.2
scipy 1.8.1
deepchem 2.7.1
ogb 1.3.6
omegaconf 2.3.0
tqdm 4.66.2
```
The basic software environment includes Python 3.8, CUDA 11.6, and Ubuntu 20.04.2 (GCC 9.4.0-1ubuntu1~20.04.2) with Linux kernel 5.4.0-177-generic.
We ran all experiments on a server equipped with 8 NVIDIA A100-PCIE-40GB GPUs.
Additionally, we have uploaded a packed Conda environment to google drive. You can download the environment package and unzip it into the `envs` directory of your Conda installation.
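After activating the environment, a quick sanity check can confirm the pinned versions are in place. This is a minimal sketch; the package names simply follow the list above:

```python
# Sanity-check the unpacked environment against the pinned versions above.
import torch
import torch_geometric
import pytorch_lightning

print(torch.__version__)             # expected: 1.13.1+cu116
print(torch_geometric.__version__)   # expected: 2.3.0
print(pytorch_lightning.__version__) # expected: 1.8.6
print(torch.cuda.is_available())     # expected: True with CUDA 11.6
```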
To leverage Frad's fine-tuned model for predicting molecular quantum properties, follow these steps:
- Prepare the molecular SMILES in a file such as `smiles.lst`, with a `smiles` header followed by one SMILES string per line:

```
smiles
CC(C)n(c1c2CN(Cc3cn(C)nc3C)CC1)nc2-c1ncc(C)o1
C=CCN1C(SCc2nc(cccc3)c3[nH]2)=Nc(cccc2)c2C1=O
COc1ccc(Cn2c(C(O)=O)c(CNC3CCCC3)c3c2cccc3)cc1
O=C(CCc1ccc(CC(CC2)CCN2C(c2cscc2)=O)cc1)NC1CC1
Cc1nc(CCC2)c2c(N2CC(CN(C(C=C3)=O)N=C3c3ccncc3)C2)n1
CC(C1)OC(C)CN1c1nc(Nc2cc(OC)ccc2)c(cnn2C)c2n1
```
- Generate 3D coordinates for the molecular SMILES (a minimal RDKit sketch of this step is given after these steps):

```
python convert_smiles_pos.py --smiles_file=smiles.lst --output_file smiles_coord.lst
```

The generated coordinates and atom types for the input SMILES will be stored in `smiles_coord.lst`.
- Utilize the fine-tuned model for prediction. Download the fine-tuned model for either the gap property from this URL or the lumo property from this URL, then execute the following command. The prediction results will be stored in `results.csv`:

```
CUDA_VISIBLE_DEVICES=0 python scripts/test.py --conf examples/ET-QM9-FT_dw_0.2_long.yaml --dataset TestData --dataset-root smiles_coord.lst --train-size 1 --val-size 1 --layernorm-on-vec whitened --job-id gap{or lumo}_inference --dataset-arg gap{or lumo} --pretrained-model $finetuned-model --output-file results.csv
```
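For reference, below is a minimal sketch of what the SMILES-to-coordinates step could look like with RDKit. This is an assumption about `convert_smiles_pos.py`; the script's actual pipeline and output format may differ.

```python
# Assumed approach: RDKit-based 3D embedding. convert_smiles_pos.py is the
# authoritative implementation and may differ from this sketch.
from rdkit import Chem
from rdkit.Chem import AllChem

def smiles_to_coords(smiles: str):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    mol = Chem.AddHs(mol)                      # add explicit hydrogens
    AllChem.EmbedMolecule(mol, randomSeed=42)  # generate a 3D conformer
    AllChem.MMFFOptimizeMolecule(mol)          # relax with the MMFF94 force field
    conf = mol.GetConformer()
    atom_types = [atom.GetSymbol() for atom in mol.GetAtoms()]
    coords = [(conf.GetAtomPosition(i).x,
               conf.GetAtomPosition(i).y,
               conf.GetAtomPosition(i).z) for i in range(mol.GetNumAtoms())]
    return atom_types, coords
```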
| Dataset | Reference |
|---|---|
| PCQM4Mv2 | OGB Stanford, Figshare |
| QM9 | Figshare |
| MD17 | SGDML |
| MD22 | SGDML |
| ISO17 | Quantum Machine |
| LBA | Zenodo |
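The repository consumes these datasets through its own loaders and configs (see `--dataset-root`). Purely as an illustration of the raw QM9 data, the torch-geometric version pinned above can fetch it directly; the path below is arbitrary:

```python
# Illustrative only: downloading and inspecting QM9 via torch-geometric 2.3.0.
# The repository's own data pipeline and splits are defined by its configs.
from torch_geometric.datasets import QM9

dataset = QM9(root="data/QM9")  # downloads ~130k molecules on first use
sample = dataset[0]
print(sample.z)    # atomic numbers
print(sample.pos)  # 3D coordinates, shape (num_atoms, 3)
print(sample.y)    # 19 regression targets, incl. homo/lumo/gap
```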
Additionally, we offer a download link for the processed fine-tuning data at the following URL: google drive
All pre-trained models are uploaded to Zenodo: Zenodo Link
Alternatively, individual pre-trained models can be accessed via Google Drive:
- Pretrained model for QM9: google drive
- Pretrained model for Force Prediction (MD17, MD22, ISO17): google drive
- Pretrained model for LBA: google drive
Below is the script for fine-tuning on the QM9 task. Be sure to replace `pretrain_model_path` with the actual model path. In this script, the subtask is set to `homo`, but it can be replaced with other subtasks as well.
```
CUDA_VISIBLE_DEVICES=0 python -u scripts/train.py --conf examples/ET-QM9-FT_dw_0.2_long.yaml --layernorm-on-vec whitened --job-id frad_homo --dataset-arg homo --denoising-weight 0.1 --dataset-root $datapath --pretrained-model $pretrain_model_path
```
Below is the script for fine-tuning on the MD17 task. Replace `pretrain_model_path` with the actual model path. In this script, the subtask is set to `aspirin`, but it can be replaced with other subtasks such as `benzene`, `ethanol`, `malonaldehyde`, `naphthalene`, `salicylic_acid`, `toluene`, or `uracil` (a driver sketch for sweeping all subtasks follows the command).
```
CUDA_VISIBLE_DEVICES=0 python -u scripts/train.py --conf examples/ET-MD17_FT-angle_9500.yaml --job-id frad_aspirin --dataset-arg aspirin --pretrained-model $pretrain_model_path --dihedral-angle-noise-scale 20 --position-noise-scale 0.005 --composition true --sep-noisy-node true --train-loss-type smooth_l1_loss
```
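To run all MD17 subtasks in sequence, a simple driver can wrap the command above. This is a sketch: the checkpoint path is a placeholder, and the flag set mirrors the command exactly.

```python
# Sketch: loop the MD17 fine-tuning command over every subtask.
import os
import subprocess

PRETRAINED = "/path/to/pretrain_model_path.ckpt"  # placeholder: replace

SUBTASKS = ["aspirin", "benzene", "ethanol", "malonaldehyde",
            "naphthalene", "salicylic_acid", "toluene", "uracil"]

env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
for task in SUBTASKS:
    # One fine-tuning run per subtask, same flags as the command above.
    subprocess.run(
        ["python", "-u", "scripts/train.py",
         "--conf", "examples/ET-MD17_FT-angle_9500.yaml",
         "--job-id", f"frad_{task}",
         "--dataset-arg", task,
         "--pretrained-model", PRETRAINED,
         "--dihedral-angle-noise-scale", "20",
         "--position-noise-scale", "0.005",
         "--composition", "true",
         "--sep-noisy-node", "true",
         "--train-loss-type", "smooth_l1_loss"],
        env=env, check=True)
```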
Below is the script for fine-tuning on the MD22 task. Replace `pretrain_model_path` with the actual model path. In this script, the subtask is set to `AT-AT-CG-CG` via `--dataset-arg`, but it can be replaced with any of `AT-AT-CG-CG`, `AT-AT`, `Ac-Ala3-NHMe`, `DHA`, `buckyball-catcher`, `double-walled_nanotube`, or `stachyose`.
```
CUDA_VISIBLE_DEVICES=0 python -u scripts/train.py --conf examples/ET-MD22.yaml --batch-size 32 --inference-batch-size 32 --num-epochs 100 --lr 1e-3 --log-dir md22-AT-AT-CG-CG --dataset-arg AT-AT-CG-CG --ngpus 1 --job-id md22-AT-AT-CG-CG --pretrained-model $pretrain_model_path --lr-schedule cosine_warmup --save-top-k 1 --save-interval 1 --test-interval 1 --seed 666 --md17 true --train-loss-type smooth_l1_loss
```
Below is the script for fine-tuning the ISO17 task.
```
CUDA_VISIBLE_DEVICES=0 python -u scripts/train.py --conf examples/ET-ISO17.yaml --batch-size 256 --job-id iso17 --inference-batch-size 256 --pretrained-model $pretrain_model_path --num-epochs 50 --lr 2e-4 --log-dir iso-energy --ngpus 1 --save-top-k 1 --save-interval 1 --test-interval 1 --seed 666 --lr-schedule cosine_warmup --md17 true --train-loss-type smooth_l1_loss
```
Below is the script for fine-tuning the LBA task.
```
CUDA_VISIBLE_DEVICES=0 python -u scripts/train.py --conf examples/ET-LBA-FT_long_f2d.yaml --layernorm-on-vec whitened --job-id LBA --dataset-root $LBA_DATA_PATH --pretrained-model $pretrain_model_path
```
Pre-training the model for QM9:
```
CUDA_VISIBLE_DEVICES=0 python -u scripts/train.py --conf examples/ET-PCQM4MV2_dih_var0.04_var2_com_re.yaml --layernorm-on-vec whitened --job-id frad_pretraining --num-epochs 8
```
Pre-training the model for atomic force tasks (MD17, MD22, ISO17):
```
CUDA_VISIBLE_DEVICES=0 python -u scripts/train.py --conf examples/ET-PCQM4MV2_var0.4_var2_com_re_md17.yaml --layernorm-on-vec whitened --job-id frad_pretraining_force --num-epochs 8
```
- The above scripts pre-train the model using RN noise. To switch to VRN noise, add the option `--bat-noise true`. A conceptual sketch of the position noise involved follows this list.
- For the LBA task, we incorporate angular information into the molecular geometry embedding to better model the complexity of the input protein-ligand complex. Add the option `--model equivariant-transformerf2d` to apply the custom model for LBA.
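As a conceptual illustration of the coordinate ("position") noise controlled by flags such as `--position-noise-scale`, here is a minimal sketch. It is not the repository's implementation; Frad's dihedral-angle noise additionally perturbs rotatable torsions, which this sketch omits.

```python
# Conceptual sketch only: Gaussian coordinate noise for a denoising objective.
import torch

def add_position_noise(pos: torch.Tensor, scale: float = 0.005) -> torch.Tensor:
    """Add isotropic Gaussian noise to atomic coordinates of shape (num_atoms, 3)."""
    return pos + scale * torch.randn_like(pos)

coords = torch.randn(21, 3)         # stand-in for a molecular conformer
noisy = add_position_noise(coords)  # perturbed input for the denoising objective
```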