Install Anaconda if it is not already available: https://www.anaconda.com/
After cloning the repository, create and activate a virtual environment by running the following commands in a terminal or Anaconda Prompt:
conda create --name ML_MOFs --file requirements.txt
conda activate ML_MOFs
The kaleido package is required to save graphs. Install it using pip:
pip install -U kaleido
MOF_data.csv - Target and descriptor values for the dataset used for initial 10-fold cross validation
MOF_data_test.csv - Target and descriptor values for the unseen test set
To perform an analysis of the dataset, run ML_MOFs/Analysis/data_analysis.py.
Basic statistics and pairwise descriptor correlations are saved to ML_MOFs/Results/Analysis_results/.
Histograms of target and descriptor ranges are saved to ML_MOFs/Graphs/Analysis_graphs/.
For 10-fold cross validation on the full dataset, run ML_MOFs/ML/ML_main.py. Predictions are saved to ML_MOFs/Results/ML_results/Classification and ML_MOFs/Results/ML_results/Regression.
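For readers unfamiliar with 10-fold cross validation, the following is a minimal, generic sketch of how a dataset can be split into folds so that every sample is used for testing exactly once. It is not taken from ML_main.py, whose actual splitting strategy may differ (for example, it may stratify or use a library routine). The function name k_fold_indices and the seed parameter are hypothetical and introduced only for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Hypothetical helper for illustration; not part of ML_MOFs.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Example: 25 samples split into 10 folds.
folds = list(k_fold_indices(n_samples=25, k=10))
for number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {number}: {len(train_idx)} train, {len(test_idx)} test")
```

In each round a model would be trained on the train indices and evaluated on the held-out test indices, and the per-fold predictions are then pooled.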
Further analysis of the resulting models can be generated by running ML_MOFs/ML/classification_analysis.py and ML_MOFs/ML/regression_analysis.py.
Additional figures are generated by running ML_MOFs/figures.py and are saved in ML_MOFs/Graphs/Figures.
Run ML_MOFs/ML/test_ML.py after changing lines 84 and 85 to the locations of your training and test sets, respectively.
Note: you will need to calculate the descriptors as detailed in our publication prior to machine learning, using our datasets as a template for column names.
Sample RASPA input files can be found in ML_MOFs/RASPA_Input_Files/.
These can be found in ML_MOFs/Curation/. This location also contains CIF files of structures which did pass curation.
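Because the note above asks that your own datasets reuse the column names of the provided ones, it can help to compare headers before running test_ML.py. The sketch below is one possible check using only the Python standard library. The helper names read_header and compare_columns are hypothetical, and the column names in the example are made up; they are not the real headers of MOF_data.csv.

```python
import csv
import io


def read_header(csv_file):
    """Return the column names from the first row of an open CSV file."""
    return next(csv.reader(csv_file))


def compare_columns(template_cols, candidate_cols):
    """Return (missing, unexpected) column names relative to the template."""
    missing = [c for c in template_cols if c not in candidate_cols]
    unexpected = [c for c in candidate_cols if c not in template_cols]
    return missing, unexpected


# In-memory stand-ins with made-up column names (assumption: the real
# template and your own dataset are CSV files with a single header row).
template = io.StringIO("name,descriptor_a,descriptor_b,target\nm1,1.0,2.0,3.0\n")
candidate = io.StringIO("name,descriptor_a,target,extra\nm2,1.5,2.5,0\n")

missing, unexpected = compare_columns(read_header(template), read_header(candidate))
print("missing:", missing)
print("unexpected:", unexpected)
```

With real files, replace the io.StringIO objects with open(path, newline="") for the template dataset and for your own dataset. An empty "missing" list indicates that every template column is present.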