
samuel-boobier/ML-MOFs


ML for MOF Property Prediction

Getting Started

The code is written in Python 3.8. We recommend using Anaconda to build a virtual environment. Details of how to download Anaconda can be found here:

https://www.anaconda.com/

After cloning the repository, create a virtual environment by running the following commands in a terminal or the Anaconda Prompt.

conda create --name ML_MOFs --file requirements.txt
conda activate ML_MOFs

The kaleido package is required to save graphs. Install this using pip.

pip install -U kaleido
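To confirm the environment before running anything, you can use a small check like the one below. It is a minimal sketch that only verifies the Python version mentioned above and whether kaleido can be imported. The REQUIRED list is an assumption and does not mirror requirements.txt.

```python
import importlib.util
import sys

# Packages checked by this sketch; an assumption, not a copy of requirements.txt.
REQUIRED = ["kaleido"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(required=REQUIRED, expected=(3, 8)):
    """Return (python_version_matches, list_of_missing_packages)."""
    version_ok = sys.version_info[:2] == expected
    return version_ok, missing_packages(required)


if __name__ == "__main__":
    ok, missing = check_environment()
    print(f"Python {sys.version.split()[0]} (3.8 expected): {'ok' if ok else 'mismatch'}")
    print("Missing packages:", ", ".join(missing) if missing else "none")
```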

Datasets

All the data used in this study can be found in ML_MOFs/Data/

MOF_data.csv - Target and descriptor values for the dataset used for initial 10-fold cross validation
MOF_data_test.csv - Target and descriptor values for the unseen test set
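Both files are ordinary CSV tables. The standard-library sketch below reads one of them. read_mof_table is a hypothetical helper; the repository's own scripts may load the data differently (for example with pandas). It assumes only that the first row holds the column names. Numeric fields become floats, and anything else, such as a structure name, stays a string.

```python
import csv
import io
from pathlib import Path


def _to_number(value):
    """Convert a CSV field to float where possible; leave non-numeric fields unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def read_mof_table(source):
    """Read a MOF data CSV into (column_names, rows), where each row is a dict.

    `source` may be a file path or an open text stream. Column names are taken
    from the file's header row; none are assumed here.
    """
    if isinstance(source, (str, Path)):
        with open(source, newline="") as handle:
            return read_mof_table(handle)
    reader = csv.DictReader(source)
    rows = [{key: _to_number(val) for key, val in row.items()} for row in reader]
    return reader.fieldnames, rows


if __name__ == "__main__":
    # Tiny in-memory example with a hypothetical column name.
    # Real use: read_mof_table("ML_MOFs/Data/MOF_data.csv")
    demo = io.StringIO("name,descriptor_1\nMOF-A,0.42\n")
    print(read_mof_table(demo))
```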

Dataset Analysis

To perform an analysis of the dataset run ML_MOFs/Analysis/data_analysis.py.

Basic statistics and pairwise descriptor correlations are saved to ML_MOFs/Results/Analysis_results/.

Histograms of target and descriptor ranges are saved to ML_MOFs/Graphs/Analysis_graphs/.
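Pairwise descriptor correlation measures how strongly two descriptors vary together across the dataset. The sketch below assumes Pearson's r. This README does not say which coefficient data_analysis.py uses, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(columns):
    """Map each unordered pair of descriptor names to their Pearson r.

    `columns` is a dict {descriptor_name: list_of_values}.
    """
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(sorted(columns), 2)
    }
```

Values near +1 or -1 flag descriptors that carry largely redundant information.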

Machine Learning

For 10-fold cross validation on the full dataset, run ML_MOFs/ML/ML_main.py. Predictions are saved to ML_MOFs/Results/ML_results/Classification and ML_MOFs/Results/ML_results/Regression.
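In 10-fold cross validation the dataset is split into ten folds. Each fold is held out once as the test set while a model is trained on the other nine, so every structure receives exactly one out-of-fold prediction. The sketch below illustrates only the splitting step. The function name, the single seeded shuffle, and the absence of stratification are assumptions and do not describe how ML_main.py builds its folds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Samples are shuffled once with a fixed seed, then dealt into k folds whose
    sizes differ by at most one. Each sample appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = sorted(order[start:start + size])
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, test_idx
        start += size
```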

Further analysis of the resulting models can be generated by running ML_MOFs/ML/classification_analysis.py and ML_MOFs/ML/regression_analysis.py.
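For orientation, the sketch below shows common summary metrics for the two model types. This README does not state which metrics classification_analysis.py and regression_analysis.py report, so treat these helpers as hypothetical illustrations.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, mean absolute error (MAE) and root-mean-square error (RMSE)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        # R^2 is undefined when the true values are all identical.
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
    }


def classification_accuracy(labels_true, labels_pred):
    """Fraction of predictions that match the true class label."""
    if not labels_true or len(labels_true) != len(labels_pred):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(labels_true, labels_pred)) / len(labels_true)
```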

Additional Figures

Additional figures are generated by running ML_MOFs/figures.py and are saved in ML_MOFs/Graphs/Figures.

Running the model for your own training/test sets

Run ML_MOFs/ML/test_ML.py after changing lines 84 and 85 to the paths of your training and test sets, respectively.

Note: you will need to calculate the descriptors as detailed in our publication prior to machine learning, using our datasets as a template for column names.
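Because your files must use the same column names as the template datasets, it can help to compare headers before running test_ML.py. The helper below is hypothetical and not part of the repository. The file name in the usage comment is a placeholder.

```python
import csv


def header_of(path):
    """Return the header row (column names) of a CSV file, or [] if the file is empty."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle), [])


def compare_columns(template_columns, your_columns):
    """Report columns missing from, or extra to, your file relative to the template."""
    template, yours = set(template_columns), set(your_columns)
    return {
        "missing": sorted(template - yours),
        "extra": sorted(yours - template),
    }


# Usage (placeholder file name):
# report = compare_columns(header_of("ML_MOFs/Data/MOF_data.csv"),
#                          header_of("my_training_set.csv"))
# print(report)
```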

RASPA Input Files

Sample RASPA input files can be found in ML_MOFs/RASPA_Input_Files/

Data Curation Protocols

These can be found in ML_MOFs/Curation/. This folder also contains CIF files of the structures that passed curation.
