kyriemao/MGET
Molecular GET

This is the primary code for the "GET-LT1" model in "Molecular Graph Enhanced Transformer for Retrosynthesis Prediction". Our code is based on OpenNMT and DGL.

Install requirements

Create a new conda environment:

conda create -n mget python=3.7
source activate mget
conda install rdkit -c rdkit
conda install future six tqdm pandas

The code was tested with PyTorch 0.4.1. To install it, go to the PyTorch website, select the right operating system and CUDA version, and run the suggested command, e.g.:

conda install pytorch=0.4.1 torchvision -c pytorch

Then,

pip install torchtext==0.3.1
pip install -e . 

Then, install DGL

pip install dgl

Finally, you have to replace three source files (batch.py, field.py, and iterator.py) of the torchtext library in "anaconda3/envs/mget/lib/python3.7/site-packages/torchtext/data" with the corresponding files contained in "replace_torchtext", since we have modified some code in these files.
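The replacement step above can also be scripted. Below is a minimal Python sketch; the helper name and the example site-packages path are assumptions, so adjust the path to your own conda installation.

```python
import shutil
from pathlib import Path

def replace_torchtext_files(replace_dir, torchtext_data_dir):
    """Overwrite the installed torchtext sources with the modified copies."""
    for name in ("batch.py", "field.py", "iterator.py"):
        src = Path(replace_dir) / name
        dst = Path(torchtext_data_dir) / name
        shutil.copy(src, dst)

# Example call (the path is an assumption -- adjust to your conda install):
# replace_torchtext_files(
#     "replace_torchtext",
#     Path.home() / "anaconda3/envs/mget/lib/python3.7/site-packages/torchtext/data",
# )
```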

Preprocessing

bash pre.sh

Train the model

The "data2" directory contains the USPTO-50K dataset without reaction type labels. To train the model,

bash train.sh

The parameter settings of the "transformer encoder" described in the paper can be found in "train.sh". You can modify the model's save location there (default: experiments/checkpoints2).

Translation

To generate the output SMILES,

bash trans.sh

The default setting is to generate the top-10 candidates.

Evaluation

To evaluate our model,

bash eval.sh
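For reference, top-n accuracy in retrosynthesis counts a test reaction as correct when the ground-truth reactant SMILES appears among the model's top-n predictions, usually after canonicalization. Below is a minimal sketch of that metric, not the actual logic of "eval.sh"; the function name and the identity default for `canonicalize` are assumptions, and in practice you would canonicalize with RDKit, e.g. `Chem.MolToSmiles(Chem.MolFromSmiles(s))`.

```python
def topn_accuracy(predictions, targets, n=10, canonicalize=lambda s: s):
    """Fraction of targets whose canonical SMILES appears in the top-n predictions.

    predictions: one best-first list of candidate SMILES per test example.
    targets:     ground-truth SMILES strings, one per test example.
    """
    hits = 0
    for candidates, target in zip(predictions, targets):
        target = canonicalize(target)
        if any(canonicalize(c) == target for c in candidates[:n]):
            hits += 1
    return hits / len(targets)
```

For example, `topn_accuracy([["CCO", "CC"], ["O"]], ["CC", "N"], n=2)` scores the first example as a hit and the second as a miss, giving 0.5.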

If you want to modify the preprocessing, training, or translation settings, refer to http://opennmt.net/OpenNMT-py/ and edit "pre.sh", "train.sh", and "trans.sh" accordingly.
