This is an official implementation of our paper Self-improved Retrosynthetic Planning (ICML 2021). The code is implemented on top of Retro*.
conda create -n sirp python=3.7 pytorch=1.5.1 cudatoolkit=10.1 torchvision -c pytorch
conda activate sirp
conda install pandas
conda install rdkit -c rdkit
conda install networkx
conda install graphviz
conda install python-graphviz
pip install tqdm
pip install -e retro_star/packages/mlp_retrosyn
pip install -e retro_star/packages/rdchiral
pip install -e .
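A quick sanity check of the environment (a minimal sketch, just verifying that the main dependencies import and CUDA is visible):
python -c "import torch, rdkit, networkx; print(torch.__version__, torch.cuda.is_available())"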
Download the dataset following the instructions in Retro*.
Download the dataset of 299,202 target molecules from the link, and place the pkl file into ./retro_star/dataset/.
For the reference backward reaction model, we use the backward reaction model trained by the authors of Retro* (./one_step_model/saved_rollout_state_1_2048.ckpt). We initialize the parameters of our backward reaction model with those of this reference model.
We provide our pre-trained forward reaction model in this link; it is trained on the reaction dataset constructed by the authors of Retro*. Place the forward.ckpt file in ./retro_star/one_step_model/forward/.
We also provide checkpoints of the backward reaction model trained by us in this link; see retro_star_zero_ours.ckpt and retro_star_value_ours.ckpt.
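For reference, the downloaded files can be placed as follows (a minimal sketch; ${TARGET_MOL_PKL} is only a placeholder for the target-molecule pkl file downloaded above):
mkdir -p ./retro_star/dataset
mv ${TARGET_MOL_PKL} ./retro_star/dataset/
mkdir -p ./retro_star/one_step_model/forward
mv forward.ckpt ./retro_star/one_step_model/forward/forward.ckpt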
The following scripts run Retro*-0 + ours. If you want to run Retro* + ours, add the "--use_value_fn" option to retro_plan.py (at steps 1 and 3).
python retro_plan.py \
--test_routes ${TARGET_MOL_DATASET} \
--mlp_model_dump ${BACKWARD_MODEL} \
--result_folder ${RESULT_FOLDER} \
--iteration 500
python proc_gold_rxn.py \
--plan \
${RESULT_FOLDER}/plan.pkl \
--save_path \
${RESULT_FOLDER}/gold.csv \
--tpl2prod_save_path \
${RESULT_FOLDER}/templates.dat \
--cut_off \
-thr 0.8 \
--aug_forward \
--forward_model ${FORWARD_MODEL} \
--fw_backward_validate
CUDA_VISIBLE_DEVICES=${GPU} \
python -m packages.mlp_retrosyn.mlp_retrosyn.mlp_train \
--template_path \
${RESULT_FOLDER}/templates.dat \
--template_path_test \
${RESULT_FOLDER}/templates.dat \
--template_rule_path \
./one_step_model/template_rules_1.dat \
--model_dump_folder \
${MODEL_DUMP_FOLDER} \
--fp_dim 2048 \
--batch_size 1024 \
--dropout_rate 0.4 \
--learning_rate 0.0001 \
--train_path \
${RESULT_FOLDER}/gold.csv \
--test_path \
${RESULT_FOLDER}/gold.csv \
--train_from \
${BACKWARD_MODEL} \
--epochs 20
To parallelize the reaction pathway generation, we recommend splitting the target molecule dataset.
mkdir ./retro_star/dataset/train_routes_shards
python preprocessing/split_big_dataset.py
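The shards can then be planned in parallel, one retro_plan.py process per GPU. A minimal sketch (assuming 4 GPUs and that split_big_dataset.py writes one pkl file per shard into the directory created above; the shard filenames and result-folder layout are placeholders):
GPU_NUM=4
i=0
for shard in ./retro_star/dataset/train_routes_shards/*.pkl; do
  CUDA_VISIBLE_DEVICES=$((i % GPU_NUM)) python retro_plan.py \
    --test_routes ${shard} \
    --mlp_model_dump ${BACKWARD_MODEL} \
    --result_folder ${RESULT_FOLDER}/shard_${i} \
    --iteration 500 &
  i=$((i + 1))
done
wait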
We offer a script that runs the whole procedure of our framework at once. (We assume that you have access to 4 GPUs; otherwise, change the GPU_NUM option, e.g., GPU_NUM=1.)
For iteration 1,
./scripts/retro_star_zero.sh \
${backward_model_path} \
${forward_model_path} \
${iter} \
${shard_from} ${shard_to} \
${gpu_start} ${gpu_num}
e.g.,
./scripts/retro_star_zero.sh \
one_step_model/saved_rollout_state_1_2048.ckpt \
./retro_star/one_step_model/forward/forward_model.ckpt \
1 \
0 11 \
0 4
If you want to iterate once more (iteration 2) with the updated backward model, run the following script.
./scripts/retro_star_zero.sh \
${trained_backward_model} \
${forward_model_path} \
2 \
0 11 \
0 4
Similarly, for Retro* + ours, for iteration 1,
./scripts/retro_star_value.sh \
${backward_model_path} \
${forward_model_path} \
${iter} \
${shard_from} ${shard_to} \
${gpu_start} ${gpu_num}
You can evaluate the one-step performance of the trained backward model with the following script.
python eval_one_step.py \
--model_path \
${trained_backward_model}
You can evaluate retrosynthetic planning with Retro*-0 + ours using the following script.
cd retro_star
python retro_plan.py \
--iteration 500 \
--result_folder \
./results/retro_star_zero/x1/multi-step/eval \
--mlp_model_dump ${trained_backward_model}
You can evaluate retrosynthetic planning with Retro* + ours using the following script.
cd retro_star
python retro_plan.py \
--iteration 500 \
--result_folder \
./results/retro_star_zero/x1/multi-step/eval \
--mlp_model_dump ${trained_backward_model} \
--use_value_fn
After that, you can measure the length, time, and cost from the log file (plan.pkl):
python evaluate.py ./results/retro_star_zero/x1/multi-step/eval/plan.pkl
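If you want to inspect plan.pkl directly, a minimal sketch that only loads the file and prints one record (the result path matches the evaluation command above; the per-record structure depends on retro_plan.py, so the dict handling below is only an assumption):
python - <<'EOF'
import pickle
# Result file produced by the evaluation command above.
with open('./results/retro_star_zero/x1/multi-step/eval/plan.pkl', 'rb') as f:
    plan = pickle.load(f)
print(type(plan))
# If the file holds a mapping from target molecules to planning records (an assumption),
# print one entry to see which fields (e.g., route, length, time, cost) are available.
if isinstance(plan, dict):
    mol = next(iter(plan))
    print(mol, plan[mol])
EOF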