# Building Classification Models

## Machine learning for the discovery of MOFs for carbon capture

In this project, I will build machine learning classification models to identify promising metal–organic frameworks (MOFs) for carbon capture applications.

Carbon capture is a complex challenge involving multiple material considerations — including pore geometry, surface chemistry, mechanical and thermal stability, water stability, and economic feasibility.

To make the problem tractable, I will focus on two experimentally measurable properties that strongly influence performance:
1) CO₂ uptake at low pressure, which reflects adsorption capacity and selectivity.
2) Water stability, which determines a material’s long-term durability and reusability.

By the end of the project, I’ll have trained and evaluated models that can distinguish between promising and non-promising MOFs based on these properties, demonstrating how data-driven methods can accelerate materials discovery.
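To make the "promising vs. non-promising" idea concrete, here is a minimal sketch of one way the two properties could be turned into a binary target. Everything in it is an assumption for illustration: the column names (`co2_uptake_mmol_g`, `water_stable`), the toy values, the 2.0 cutoff, and the helper `label_promising` are hypothetical and not taken from the project's dataset. The project may equally well train a separate classifier for each property.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only,
# not taken from the project's dataset.
toy = pd.DataFrame({
    "mof_id": ["A", "B", "C", "D"],
    "co2_uptake_mmol_g": [0.4, 2.5, 3.1, 1.9],
    "water_stable": [True, True, False, False],
})

# Assumed cutoff for "high" low-pressure CO2 uptake (placeholder value).
UPTAKE_THRESHOLD = 2.0


def label_promising(df, threshold=UPTAKE_THRESHOLD):
    """Return 1 where a MOF meets both criteria (high uptake AND water stable), else 0."""
    high_uptake = df["co2_uptake_mmol_g"] >= threshold
    return (high_uptake & df["water_stable"]).astype(int)


toy["promising"] = label_promising(toy)
print(toy)
```

Requiring both criteria at once is only one design choice. Keeping two separate binary targets would let each model be evaluated on its own property.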

# 0. Set up the programming environment

### 0.1 Import packages we will need

In [None]:
# basics
import os
import numpy as np
import pprint as pp

# pandas is used to read/process data
import pandas as pd
from ydata_profiling import ProfileReport

# machine learning dependencies
# scaling of data
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler
# train/test split
from sklearn.model_selection import train_test_split
# model selection
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
# kernel ridge regression (KRR) model
from sklearn.kernel_ridge import KernelRidge
# linear model
from sklearn.linear_model import LinearRegression, SGDRegressor
# Pipeline chains preprocessing and modeling steps
from sklearn.pipeline import Pipeline
# principal component analysis
from sklearn.decomposition import PCA
# polynomial kernel
from sklearn.metrics.pairwise import polynomial_kernel
# Dummy model as baseline
from sklearn.dummy import DummyClassifier, DummyRegressor
# Variance Threshold for feature selection
from sklearn.feature_selection import VarianceThreshold, SelectFromModel
# metrics to measure model performance
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             mean_absolute_error, mean_squared_error, max_error, mean_absolute_percentage_error)
# confusion matrix
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# XGBoost classifier
from xgboost import XGBClassifier

# save/load models
import joblib

# For the permutation importance implementation
from joblib import Parallel
from joblib import delayed
from sklearn.metrics import check_scoring
from sklearn.utils import Bunch
from sklearn.utils import check_random_state
from sklearn.utils import check_array

# plotting
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline