Inorganic Reaction Representation Learning and Product Prediction.
Implementation of "Predicting the outcomes of materials syntheses with deep learning" [ArXiv].
Dependencies are listed in the requirements.txt file.
The raw dataset used for this work can be downloaded using the following commands (Linux):
mkdir -p data/datasets
wget -cO - https://ndownloader.figshare.com/files/17412674 > data/datasets/solid-state_dataset_2019-06-27.json
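After downloading, it can help to confirm that the file parses as JSON before running any preprocessing. The following is a minimal sketch that makes no assumption about the dataset's schema: it only reports the top-level container type and size. The helper name summarize_json is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level container without assuming a schema."""
    with Path(path).open("r", encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, dict):
        # Report the top-level keys so the layout can be inspected by eye.
        return {"type": "dict", "size": len(data), "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "keys": []}
    # Scalars (string, number, bool, null) are reported by their Python type name.
    return {"type": type(data).__name__, "size": 1, "keys": []}


if __name__ == "__main__":
    dataset = Path("data/datasets/solid-state_dataset_2019-06-27.json")
    if dataset.exists():
        print(summarize_json(dataset))
    else:
        print(f"{dataset} not found; run the download command first")
```

If json.load raises an error here, the download was most likely interrupted and should be repeated.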
More recent versions of the dataset are released by the original authors here
The element embeddings used in this work are found here: Unsupervised word embeddings capture latent knowledge from materials science literature
preprocess.py is used to generate the dataframes and supporting files from the raw data. The number of elements and precursors can be adjusted using optional arguments. Using the default seed (0) gives the dataset splits used in the paper.
train_action_rnn.py is used for training the action sequence autoencoder.
train_reaction_graph.py is used for training the reaction graph model without action sequences.
train_reaction_graph_with_actions.py is used for training the reaction graph model with action sequences.
train_baseline.py is used for training the baseline magpie model.
train_stoich.py is used for training the stoichiometry prediction model.
Model dimensions and hyperparameters can be set using command-line (argparse) flags.
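For readers unfamiliar with argparse, the sketch below shows how flags of this kind are typically declared and parsed. It is illustrative only and is not the repository's parser: the flag names are taken from the usage examples in this README, while the types and default values are assumptions.

```python
import argparse


def build_parser():
    """Build an illustrative parser; types and defaults are assumptions, not the repo's."""
    parser = argparse.ArgumentParser(description="Illustrative training CLI")
    parser.add_argument("--train-path", type=str, default="data/train_10_precs.pkl")
    parser.add_argument("--test-path", type=str, default="data/test_10_precs.pkl")
    parser.add_argument("--ensemble", type=int, default=1)
    parser.add_argument("--evaluate", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["--ensemble", "5", "--evaluate"])
print(args.train_path, args.ensemble, args.evaluate)
```

The flags actually supported by each script can be listed by running it with --help, which argparse provides automatically.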
Preprocessing:
python preprocess.py --dataset data/datasets/solid-state_dataset_2019-06-27.json \
--max-prec 10 --min-prec 2 \
--ps _10_precs --seed 0
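The --min-prec, --max-prec and --seed options suggest two steps: filtering reactions by precursor count and making a reproducible split. The sketch below illustrates both ideas on toy data. It is not the code in preprocess.py: the record layout (a "precursors" key), the inclusive bounds and the 20% test fraction are all assumptions made for illustration.

```python
import random


def filter_by_precursor_count(reactions, min_prec=2, max_prec=10):
    """Keep reactions whose precursor count lies in [min_prec, max_prec] (inclusive)."""
    return [r for r in reactions if min_prec <= len(r["precursors"]) <= max_prec]


def seeded_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` with a seeded RNG and split it into (train, test)."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy records; the "precursors" key is a hypothetical layout for illustration.
counts = [1, 2, 3, 5, 10, 11, 4, 2, 6, 7]
toy = [{"id": i, "precursors": ["p"] * n} for i, n in enumerate(counts)]

kept = filter_by_precursor_count(toy)
train_a, test_a = seeded_split(kept, seed=0)
train_b, test_b = seeded_split(kept, seed=0)

print(len(kept), len(train_a), len(test_a))
print(train_a == train_b and test_a == test_b)  # same seed, same split
```

Using a local random.Random instance rather than the module-level functions keeps the split independent of any other random calls made elsewhere in a program.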
Training Action Autoencoder:
python train_action_rnn.py --train-path data/train_10_precs.pkl \
--test-path data/test_10_precs.pkl \
--action-path data/action_dict_10_precs.json
Note: the --split-prec-amts flag should be used to split out the data so that it can be used with the baseline model.
Training product element prediction model (with actions):
python train_reaction_graph_with_actions.py --train-path data/train_10_precs.pkl \
--test-path data/test_10_precs.pkl \
--fea-path data/magpie_embed_10_precs.json \
--action-path data/action_dict_10_precs.json \
--elem-path data/elem_dict_10_precs.json \
--action-rnn models/checkpoint_rnn_f-0_s-0_t-1.pth.tar \
--train-rnn --mask --amounts \
--ensemble 5
Get reaction embeddings for the full dataset (for training the stoichiometry prediction model):
python train_reaction_graph_with_actions.py --train-path data/train_10_precs.pkl \
--test-path data/test_10_precs.pkl \
--fea-path data/magpie_embed_10_precs.json \
--action-path data/action_dict_10_precs.json \
--elem-path data/elem_dict_10_precs.json \
--action-rnn models/checkpoint_rnn_f-0_s-0_t-1.pth.tar \
--train-rnn --mask --amounts \
--ensemble 5 \
--get-reaction-emb
Training the stoichiometry prediction model:
python train_stoich.py --train-path data/train_f-1_emb_reaction_graph_actions.pkl \
--test-path data/test_f-1_emb_reaction_graph_actions.pkl \
--elem-path data/elem_dict_10_precs.json \
--elem-fea-path data/embeddings/matscholar-embedding.json \
--use-correct-targets \
--ensemble 5
For end-to-end testing, use the --evaluate flag on the trained product prediction model to obtain the element predictions, then use the --evaluate flag on the trained stoichiometry prediction model (removing the --use-correct-targets flag used in the example).
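The two evaluation commands can be derived from the training commands shown earlier. The sketch below builds them as argument lists without running anything. It assumes that evaluation reuses the remaining training flags unchanged, which this README does not state; the helper to_evaluate_command is hypothetical.

```python
import shlex

# Training commands copied from the usage examples in this README.
PRODUCT_TRAIN_ARGS = [
    "python", "train_reaction_graph_with_actions.py",
    "--train-path", "data/train_10_precs.pkl",
    "--test-path", "data/test_10_precs.pkl",
    "--fea-path", "data/magpie_embed_10_precs.json",
    "--action-path", "data/action_dict_10_precs.json",
    "--elem-path", "data/elem_dict_10_precs.json",
    "--action-rnn", "models/checkpoint_rnn_f-0_s-0_t-1.pth.tar",
    "--train-rnn", "--mask", "--amounts",
    "--ensemble", "5",
]
STOICH_TRAIN_ARGS = [
    "python", "train_stoich.py",
    "--train-path", "data/train_f-1_emb_reaction_graph_actions.pkl",
    "--test-path", "data/test_f-1_emb_reaction_graph_actions.pkl",
    "--elem-path", "data/elem_dict_10_precs.json",
    "--elem-fea-path", "data/embeddings/matscholar-embedding.json",
    "--use-correct-targets",
    "--ensemble", "5",
]


def to_evaluate_command(train_args, drop=()):
    """Return a new command with the `drop` flags removed and --evaluate appended."""
    kept = [arg for arg in train_args if arg not in drop]
    return kept + ["--evaluate"]


# Assumption: all other training flags are kept as-is for evaluation.
product_cmd = to_evaluate_command(PRODUCT_TRAIN_ARGS)
stoich_cmd = to_evaluate_command(STOICH_TRAIN_ARGS, drop=("--use-correct-targets",))

# Print shell-ready commands; run the product model first, then the stoichiometry model.
print(shlex.join(product_cmd))
print(shlex.join(stoich_cmd))
```

The printed commands can be pasted into a shell, or the lists can be passed to subprocess.run once the trained checkpoints exist.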
Please cite if you have found our work helpful:
@article{doi:10.1021/acs.chemmater.0c03885,
author = {Malik, Shreshth A. and Goodall, Rhys E. A. and Lee, Alpha A.},
title = {Predicting the Outcomes of Material Syntheses with Deep Learning},
journal = {Chemistry of Materials},
volume = {33},
number = {2},
pages = {616-624},
year = {2021},
doi = {10.1021/acs.chemmater.0c03885},
URL = {https://doi.org/10.1021/acs.chemmater.0c03885},
eprint = {https://doi.org/10.1021/acs.chemmater.0c03885}
}
This is research code shared without support or guarantee of quality. However, please let me know if anything is wrong or could be improved, and I will try to fix it.