
Energy-Motivated Equivariant Pretraining for 3D Molecular Graphs (AAAI 2023)

Code for Energy-Motivated Equivariant Pretraining for 3D Molecular Graphs (3D-EMGP).

License: MIT. Paper: https://arxiv.org/abs/2207.08824

Overview

3D-EMGP is an energy-motivated equivariant pretraining framework for 3D molecular graph networks; see the paper for the full framework figure and method details.

Dependencies

python==3.7.10
torch==1.7.0
torch-geometric==1.6.3
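
A minimal environment sketch (not an official install script; the torch-scatter/torch-sparse wheel index must match your CUDA build, cu102 is assumed here, and RDKit is assumed to be needed for the GEOM preprocessing below):

conda create -n emgp python=3.7.10
conda activate emgp
conda install -c conda-forge rdkit
pip install torch==1.7.0
pip install torch-scatter torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
pip install torch-geometric==1.6.3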

Data Preparation

The raw data of GEOM can be downloaded from the official website.

One can download the dataset and unpack it into the datasets folder as follows:

|-- datasets
    |-- GEOM
        |-- rdkit_folder
            |-- drugs
            |-- qm9
            |-- summary_drugs.json
            |-- summary_qm9.json

To preprocess the GEOM data for pretraining:

python data/geom.py --base_path datasets/GEOM/rdkit_folder/ --datasets qm9 --output blocks --val_num 500 --conf_num 10 --block_size 100000 --test_smiles data/filter_smiles.txt
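
To additionally preprocess the drugs split, the same script can be pointed at it via --datasets (a sketch under the assumption that the flag accepts the subfolder names listed above):

python data/geom.py --base_path datasets/GEOM/rdkit_folder/ --datasets drugs --output blocks --val_num 500 --conf_num 10 --block_size 100000 --test_smiles data/filter_smiles.txt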

Generated data blocks are listed as follows:

|-- datasets
    |-- GEOM
        |-- rdkit_folder
        |-- blocks
            |-- summary.json
            |-- val_block.pkl
            |-- train_block_i.pkl  (i = 0, 1, ...)
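
As a quick sanity check, one can count the entries in a block (a sketch, assuming each block is a plain pickled list):

python -c "import pickle; print(len(pickle.load(open('datasets/GEOM/blocks/val_block.pkl', 'rb'))))"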

Pretraining

3D-EMGP pretraining can be conducted via the following commands.

export CUDA_VISIBLE_DEVICES=0
python -u script/pretrain_3dmgp.py --config_path config/pretrain_3dmgp.yml

One can also pretrain the model in a multi-GPU mode.

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m torch.distributed.launch --nproc_per_node=4 --master_port <port> script/pretrain_3dmgp.py --config_path config/pretrain_3dmgp.yml

The pretrained model will be saved to checkpoints/pretrain/3dmgp; this save path can be modified in config/pretrain_3dmgp.yml.

One can also pretrain the 3D model with the re-implemented baseline methods using the following commands. Take AttrMask as an example:

export CUDA_VISIBLE_DEVICES=0
python -u script/baselines/train_attr_mask.py --config_path config/pretrain_baselines.yml

Finetuning

Finetuning on QM9:

python -u script/finetune_qm9.py --config_path config/finetune_qm9.yml --restore_path <pretrained_checkpoint> --property <property>

The property should be chosen from:

alpha, gap, homo, lumo, mu, Cv, G, H, r2, U, U0, zpve
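
For example, all twelve targets can be swept from a single pretrained checkpoint (a sketch wrapping the command above; <pretrained_checkpoint> remains a placeholder for your own path):

export CUDA_VISIBLE_DEVICES=0
for prop in alpha gap homo lumo mu Cv G H r2 U U0 zpve; do
    python -u script/finetune_qm9.py --config_path config/finetune_qm9.yml --restore_path <pretrained_checkpoint> --property $prop
done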

Finetuning on MD17:

python -u script/finetune_md17.py --config_path config/finetune_md17.yml --restore_path <pretrained_checkpoint> --molecule <molecule> --model_name <molecule>

The molecule should be chosen from:

aspirin
benzene
ethanol
malonaldehyde
naphthalene
salicylic_acid
toluene
uracil
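
Similarly, a sketch that fine-tunes on each MD17 molecule in turn, reusing the molecule name as --model_name as in the command above:

export CUDA_VISIBLE_DEVICES=0
for mol in aspirin benzene ethanol malonaldehyde naphthalene salicylic_acid toluene uracil; do
    python -u script/finetune_md17.py --config_path config/finetune_md17.yml --restore_path <pretrained_checkpoint> --molecule $mol --model_name $mol
done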

Note that the finetuning datasets will be automatically downloaded and preprocessed on the first run.

Citation

Please consider citing our work if you find it helpful:

@misc{jiao2022energy,
  url={https://arxiv.org/abs/2207.08824},
  author={Jiao, Rui and Han, Jiaqi and Huang, Wenbing and Rong, Yu and Liu, Yang},
  title={Energy-Motivated Equivariant Pretraining for 3D Molecular Graphs},
  publisher={arXiv},
  year={2022}
}
