This repository contains the code for the paper "Self-supervised molecular pretraining strategy for low-resource reaction prediction scenarios".
To create the conda environment, run:
conda env create -f environment.yml
The training, dev, and test data for the Baeyer-Villiger reaction are provided in "data/Baeyer-Villiger reaction".
The training, dev, and test data for the Heck reaction are provided in "data/Heck reaction".
The training, dev, and test data for the C-C bond formation reaction are provided in "data/C-C bond formation reaction".
The training, dev, and test data for the functional group interconversion reaction are provided in "data/Functional group interconversion reaction".
The data for reaction transfer learning are provided in "data/USPTO reaction".
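Before training, it can help to confirm that the dataset folders are where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name missing_datasets is hypothetical, and the folder names are assumptions copied from this README, so adjust them if the repository spells them differently.

```python
from pathlib import Path

# Assumption: folder names as written in this README, located under "data".
DATASET_DIRS = [
    "Baeyer-Villiger reaction",
    "Heck reaction",
    "C-C bond formation reaction",
    "Functional group interconversion reaction",
    "USPTO reaction",
]


def missing_datasets(data_root="data"):
    """Return the dataset folder names that are not found under data_root."""
    root = Path(data_root)
    return [name for name in DATASET_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All dataset folders found.")
```

Run it from the repository root; an empty result means every folder listed above was found.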
The molecular ChEMBL data can be found at https://www.ebi.ac.uk/chembl/
The molecular ZINC data can be found at http://zinc.docking.org/tranches/home/
Run train_transformer.py to train a baseline model.
Run train_downstream.py to obtain the molecule-pretrained-Mass model.
Run test.py to test a trained model.
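The three scripts can also be launched from a small driver, sketched below in the order they are listed here. The script names come from this README; running them without command-line arguments is an assumption, as is the idea that they are run from the repository root. The helpers build_commands and run_pipeline are hypothetical and not part of the repository. By default the driver only prints the commands it would run.

```python
import subprocess
import sys

# Script names as listed in this README; the lack of arguments is an assumption.
PIPELINE = ("train_transformer.py", "train_downstream.py", "test.py")


def build_commands(python=sys.executable, scripts=PIPELINE):
    """Build one command (as an argument list) per script, in listed order."""
    return [[python, script] for script in scripts]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Call run_pipeline(dry_run=False) once the printed commands look right for your setup. If the scripts need options such as a dataset path, add them to the command lists in build_commands.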