LERN: Local environment interaction-based machine learning framework for predicting molecular adsorption energy
This software package implements the Local Environment ResNet (LERN), a framework for predicting molecular adsorption energy.
The package provides two major functions:
- Train a LERN model with a customized dataset.
- Predict material properties of new molecules with a pre-trained LERN model.
The following paper describes the details of the LEI framework. Please cite it if you use LERN:
Li Y, Wu Y, Han Y, Lyu Q, Wu H, Zhang X, Shen L. Local environment interaction-based machine learning framework for predicting molecular adsorption energy. J Mater Inf 2024;4:[Accept]. http://dx.doi.org/10.20517/jmi.2023.41
This package requires:
- PyTorch
- torchvision
- scikit-learn
- ASE
- pymatgen
If you are new to Python, the easiest way of installing the prerequisites is via conda. After installing conda, run the following commands to create a new environment named lern and install all prerequisites:
conda upgrade conda
conda create -n lern python=3.9 pytorch torchvision scikit-learn ase pymatgen -c pytorch -c conda-forge
This creates a conda environment for running LERN. Before using LERN, activate the environment by:
conda activate lern
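Once the environment is active, a quick check like the one below can confirm that the prerequisites are importable. This is only a sketch and not part of LERN: the helper name missing_modules is hypothetical, and the import names (torch for pytorch, sklearn for scikit-learn) are assumed from the conda packages listed above.

```python
import importlib.util

# Import names assumed for the conda packages listed above
# (pytorch -> torch, scikit-learn -> sklearn); adjust if your setup differs.
REQUIRED_MODULES = ["torch", "torchvision", "sklearn", "ase", "pymatgen"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
print("Missing prerequisites:", ", ".join(missing) if missing else "none")
```

If any names are reported as missing, re-run the conda create command above or install the missing packages into the lern environment.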
To provide local environment input to LERN, you will need to define a customized dataset. Note that this is required for both training and prediction.
Before defining a customized dataset, you will need:
- CIF files recording the structures of the molecules that you are interested in
- The target property for each molecule (not needed for prediction, but you still need to put a placeholder number, e.g. "0", in id_prop.csv)
You can create a customized dataset by creating the following files:
- id_prop.csv: a CSV file with two columns. The first column records a unique ID for each molecule, and the second column records the value of the target property. If you want to predict material properties with predict.py, you can put any number in the second column. (The second column is still needed.)
- cif folder: a folder of CIF files, each recording a molecular structure, where the file name is the unique name of the molecule.
The structure of the dataset should be:
id_prop.csv
cif
├── name0.cif
├── name1.cif
├── ...
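For prediction-only datasets, id_prop.csv can be generated from the contents of the cif folder. The sketch below is not part of LERN: the function name write_id_prop is hypothetical, and it assumes the file has no header row and that each ID is the CIF file name without its extension. Check these assumptions against the data loader before relying on them.

```python
import csv
from pathlib import Path


def write_id_prop(dataset_dir, targets=None, placeholder=0):
    """Write dataset_dir/id_prop.csv with one row per file in dataset_dir/cif.

    targets: optional dict mapping name -> target value. Names without an
    entry get `placeholder`, which is enough when only predicting.
    Returns the list of names written, in sorted order.
    """
    dataset_dir = Path(dataset_dir)
    targets = targets or {}
    # Assumption: the ID is the CIF file name without the .cif extension.
    names = sorted(p.stem for p in (dataset_dir / "cif").glob("*.cif"))
    with open(dataset_dir / "id_prop.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for name in names:
            # Assumption: two columns, no header row.
            writer.writerow([name, targets.get(name, placeholder)])
    return names
```

Calling write_id_prop("path/to/dataset") fills the second column with 0 for every structure. For training, pass the real target values through the targets argument instead.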
Before training a new LERN model, you will need to:
- Define a customized dataset to store the structure-property relations of interest.
Then, in directory lern, you can train a LERN model for your customized dataset by:
python main.py
After training, you will get two files in the lern directory:
- model.pth: stores the LERN model with the best validation accuracy.
- results.csv: stores the name, target value, and predicted value for each molecule.
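To summarize how well the model did, the mean absolute error can be computed from results.csv. This is a minimal sketch rather than a LERN utility: the function name is hypothetical, and it assumes the three columns appear in the order name, target value, predicted value. Rows that do not parse as numbers (for example a header row, if one is written) are skipped.

```python
import csv


def mean_absolute_error(results_path):
    """Return the mean of |target - predicted| over the rows of results.csv."""
    errors = []
    with open(results_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or incomplete rows
            try:
                # Assumed column order: name, target value, predicted value.
                target, predicted = float(row[1]), float(row[2])
            except ValueError:
                continue  # skip a header row, if one is present
            errors.append(abs(target - predicted))
    if not errors:
        raise ValueError("no numeric rows found in " + str(results_path))
    return sum(errors) / len(errors)
```

The result is in the same units as the target values in id_prop.csv.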
Before predicting the material properties, you will need to:
- Define a customized dataset for all the crystal structures that you want to predict.
- Obtain a pre-trained LERN model named model.pth.
Then, in directory lern, you can predict the properties of the molecules:
python predict.py
After predicting, you will get one file in the lern directory:
- predict.csv: stores the name and predicted value for each molecule.
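The predictions can then be loaded and ranked for screening. The sketch below is illustrative only: the function name rank_predictions is hypothetical, and it assumes predict.csv has the name in the first column and the predicted value in the second. Rows that do not parse as numbers are skipped.

```python
import csv


def rank_predictions(predict_path):
    """Return (name, predicted value) pairs sorted by ascending predicted value."""
    predictions = []
    with open(predict_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or incomplete rows
            try:
                # Assumed column order: name, predicted value.
                predictions.append((row[0], float(row[1])))
            except ValueError:
                continue  # skip a header row, if one is present
    return sorted(predictions, key=lambda item: item[1])
```

For example, rank_predictions("predict.csv")[:10] gives the ten structures with the lowest predicted values.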
This software was primarily written by Li Yifan who was advised by Prof. Shen Lei.
LERN is released under the NUS License.