ALIGNN was trained for 1000 epochs using L1 loss. The model that performed best on the validation set was uploaded to Figshare as `2023-06-02-pbenner-best-alignn-model.pth.zip` and used for predictions. This required minor changes to the ALIGNN source code, provided in `alignn-2023.01.10.patch`:
- Fix use without a test set (see ALIGNN #104). In this case, we split off a test set, but it might be better to use the entire data for training, as mentioned above, especially since the test set may by chance contain some important outliers.
- The `Checkpoint` handler in ALIGNN does not define a score name (see `train.py`), so it will just save the last two models during training. With this patch, the best model in terms of accuracy on the validation set is also saved, which is the one used to make predictions. This is important because I used a relatively large `n_early_stopping` in case the validation accuracy shows a double descent (see Figure 10).
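The interaction between best-model checkpointing and a large early-stopping patience can be illustrated with a small, library-free sketch (the actual patch modifies the `Checkpoint` handler in `train.py`; the class and names below are hypothetical, not ALIGNN code):

```python
class BestCheckpoint:
    """Keep the model state with the best (lowest) validation loss.

    Mimics what the patch adds: besides the most recent checkpoints,
    the best-scoring model is retained, so a large early-stopping
    patience cannot cause it to be lost.
    """

    def __init__(self, patience: int):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None
        self.epochs_since_best = 0

    def update(self, epoch: int, val_loss: float, state: dict) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = dict(state)
            self.epochs_since_best = 0
        else:
            self.epochs_since_best += 1
        return self.epochs_since_best >= self.patience


# Simulated double-descent validation curve: the loss rises after
# epoch 1 but drops to a new minimum later, so a small patience
# would have stopped training before reaching the true best model.
losses = [1.0, 0.8, 0.9, 1.1, 0.7, 0.6, 0.9, 1.0, 1.1, 1.2]
ckpt = BestCheckpoint(patience=4)
for epoch, loss in enumerate(losses):
    if ckpt.update(epoch, loss, {"epoch": epoch}):
        break

print(ckpt.best_loss)            # 0.6
print(ckpt.best_state["epoch"])  # 5
```

With `patience=2` the loop would have stopped at epoch 3, keeping the epoch-1 model and missing the later minimum, which is the failure mode the large `n_early_stopping` guards against.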
The changes in `alignn-2023.01.10.patch` were applied to ALIGNN version `2023.01.10`.
To reproduce the `alignn` package state used for this submission, run:

```sh
pip install alignn==2023.01.10
alignn_dir=$(python -c "import alignn; print(alignn.__path__[0])")
cd "$alignn_dir"
git apply /path/to/alignn-2023.01.10.patch
```

Replace `/path/to/` with the actual path to the patch file.
The directory contains the following files, which must be executed in the given order to reproduce the results:

1. `train_alignn.py`: Trains an ALIGNN model on all 154k MP computed structure entries. The resulting model checkpoint is saved to the `out_dir` variable in that script and also uploaded to `wandb`, from where it is publicly available for 3rd-party reproducibility.
2. `test_alignn.py`: Tests a trained ALIGNN model on the WBM data. Generates `2023-06-03-mp-e-form-alignn-wbm-IS2RE.csv.gz`.
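The generated predictions file can be read for downstream analysis with the Python standard library alone. A minimal sketch, using a synthetic in-memory file so it is self-contained; the column names are hypothetical stand-ins, not the actual headers of the CSV:

```python
import csv
import gzip
import io

# Hypothetical miniature of a predictions file like
# 2023-06-03-mp-e-form-alignn-wbm-IS2RE.csv.gz (column names assumed).
raw = gzip.compress(
    b"material_id,e_form_pred\n"
    b"wbm-1-1,-0.42\n"
    b"wbm-1-2,0.13\n"
)

# gzip.open accepts an existing file object, so the same code works
# for a real .csv.gz path on disk.
with gzip.open(io.BytesIO(raw), mode="rt", newline="") as fh:
    rows = list(csv.DictReader(fh))

preds = {row["material_id"]: float(row["e_form_pred"]) for row in rows}
print(preds)  # {'wbm-1-1': -0.42, 'wbm-1-2': 0.13}
```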