# Step 1: Load the Dataset

**Description:**  
Load the dataset from the Excel file **"TFC_NF_membrane_data.xlsx"** and display the first five rows.




In [3]:
import pandas as pd

# Load the dataset from the Excel file
file_path = "TFC_NF_membrane_data.xlsx"
df = pd.read_excel(file_path)

# Displaying the first 5 rows of the dataset
print("First 5 rows of the dataset:")
print(df.head())


First 5 rows of the dataset:
          Type  Size (nm)      Shape  Pore size (Ǻ)  Bond-ing*    Char-ge  \
0          NaN        NaN        NaN            NaN        NaN  (+, -, 0)   
1  NaA zeolite      100.0  Spherical            4.0        0.0          -   
2  NaA zeolite      100.0  Spherical            4.0        0.0          -   
3  NaA zeolite      100.0  Spherical            4.0        0.0          -   
4  NaA zeolite      100.0  Spherical            4.0        0.0          -   

  Phase**,  Loading     RR    RCA        CWP  CSP    RWP    RSP  
0  A or O       NaN    NaN    NaN  (LMH/bar)  (%)    NaN    NaN  
1        A  0.00004  1.015  0.903      0.767  6.5  1.023  0.892  
2        A  0.00010  0.956  0.857      0.767  6.5  1.174  0.908  
3        A  0.00040  0.931  0.824      0.767  6.5  1.343  0.892  
4        A  0.00100  0.952  0.713      0.767  6.5  1.488  0.969  


# Step 2: Inspect the Data

**Description:**  
Examine the dataset's structure, data types, and missing values.



In [6]:
# Step 2: Inspect the Data
print("Dataset Info:")
df.info()

print("\nMissing Values per Column:")
print(df.isnull().sum())


Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 189 entries, 0 to 188
Data columns (total 14 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Type           188 non-null    object 
 1   Size (nm)      188 non-null    float64
 2   Shape          188 non-null    object 
 3   Pore size (Ǻ)  188 non-null    float64
 4   Bond-ing*      188 non-null    float64
 5   Char-ge        189 non-null    object 
 6   Phase**,       189 non-null    object 
 7   Loading        188 non-null    float64
 8   RR             78 non-null     float64
 9   RCA            134 non-null    float64
 10  CWP            189 non-null    object 
 11  CSP            189 non-null    object 
 12  RWP            188 non-null    float64
 13  RSP            188 non-null    float64
dtypes: float64(8), object(6)
memory usage: 20.8+ KB

Missing Values per Column:
Type               1
Size (nm)          1
Shape              1
Pore size (Ǻ)      1
Bond-ing*   

# Step 3: Cleaning Column Names and Converting Target Columns

**Description:**  
Clean the column names by stripping whitespace and replacing unwanted characters, then convert the target columns **CWP** and **CSP** from object to numeric type.



In [9]:
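# Step 3: Cleaning Column Names
# Note: the pattern is only treated as a regular expression when regex=True is passed.
# In pandas 2.0+ the default is regex=False, so this call leaves the names unchanged
# (see the output below); later steps rely on the original names.
df.columns = df.columns.str.strip().str.replace('[^a-zA-Z0-9_]', '_').str.replace('__', '_')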
print("Cleaned Column Names:")
print(df.columns)

# Converting target columns CWP and CSP to numeric (force errors to NaN)
df['CWP'] = pd.to_numeric(df['CWP'], errors='coerce')
df['CSP'] = pd.to_numeric(df['CSP'], errors='coerce')

# Checking conversion by printing data types of CWP and CSP
print("\nData Types for CWP and CSP after conversion:")
print(df[['CWP', 'CSP']].dtypes)


Cleaned Column Names:
Index(['Type', 'Size (nm)', 'Shape', 'Pore size (Ǻ)', 'Bond-ing*', 'Char-ge',
       'Phase**,', 'Loading', 'RR', 'RCA', 'CWP', 'CSP', 'RWP', 'RSP'],
      dtype='object')

Data Types for CWP and CSP after conversion:
CWP    float64
CSP    float64
dtype: object
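
The two operations in this step, regex-based name cleaning and coercion to numeric, can be sketched on toy data as follows. The labels and values are a small hypothetical sample, not the full dataset, and the extra `.str.strip("_")` is an assumption about what a cleaned name should look like.

```python
import pandas as pd

# Hypothetical subset of column labels similar to those in the dataset
cols = pd.Index(["Size (nm)", "Bond-ing*", "Phase**,", " Loading "])

# regex=True makes pandas treat the pattern as a regular expression;
# runs of unwanted characters collapse into one "_", then edge "_" are trimmed
cleaned = (
    cols.str.strip()
        .str.replace(r"[^a-zA-Z0-9_]+", "_", regex=True)
        .str.strip("_")
)
print(list(cleaned))

# Entries that cannot be parsed as numbers (such as a units label) become NaN
s = pd.Series(["(LMH/bar)", "0.767", 1.5])
numeric = pd.to_numeric(s, errors="coerce")
print(numeric.tolist())
```

Later steps in this notebook refer to the original column names, so applying such cleaning to `df` would also require updating those references.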


# Step 4: Impute Missing Values for Small Gaps

**Description:**  
For columns with only a few missing values (e.g., **Size (nm)**, **Pore size (Ǻ)**, **Bond-ing***, **Loading**, **RWP**, and **RSP**), impute the missing values with the column median.




In [93]:
from sklearn.impute import SimpleImputer

# Defining the columns with small missing gaps
small_gap_cols = ['Size (nm)', 'Pore size (Ǻ)', 'Bond-ing*', 'Loading', 'RWP', 'RSP']

# Initialize the imputer with median strategy
imputer = SimpleImputer(strategy='median')

# Impute the missing values for these columns
df[small_gap_cols] = imputer.fit_transform(df[small_gap_cols])

# Checking missing values after imputation
print("Missing values after imputing small gaps:")
print(df.isnull().sum())


Missing values after imputing small gaps:
Type             0
Size (nm)        0
Shape            0
Pore size (Ǻ)    0
Bond-ing*        0
Char-ge          0
Phase**,         0
Loading          0
RR               0
RCA              0
CWP              0
CSP              0
RWP              0
RSP              0
dtype: int64
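
As a minimal sketch of what median imputation does, the toy frame below (hypothetical column names and values) has one gap per column. After fitting, `statistics_` holds the per-column medians that were used as fill values.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame with one missing entry per column (illustrative values only)
toy = pd.DataFrame({
    "size": [100.0, np.nan, 50.0, 10.0],
    "loading": [0.1, 0.2, np.nan, 0.4],
})

imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(toy), columns=toy.columns)

# Medians computed from the observed values in each column
print(imputer.statistics_)
print(filled)
```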


# Step 5: Fill Missing Values for Categorical Columns

**Description:**  
Fill missing values in the categorical columns **Type** and **Shape** using the mode (most frequent value).




In [96]:
# Step 5: Filling missing values in categorical columns
df['Type'] = df['Type'].fillna(df['Type'].mode()[0])
df['Shape'] = df['Shape'].fillna(df['Shape'].mode()[0])

# Checking missing values after filling Type and Shape
print("Missing values after filling Type and Shape:")
print(df.isnull().sum())


Missing values after filling Type and Shape:
Type             0
Size (nm)        0
Shape            0
Pore size (Ǻ)    0
Bond-ing*        0
Char-ge          0
Phase**,         0
Loading          0
RR               0
RCA              0
CWP              0
CSP              0
RWP              0
RSP              0
dtype: int64
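
The mode-based fill can be illustrated on a short hypothetical series. `Series.mode()` returns the most frequent values in sorted order, so indexing with `[0]` picks the first of them if there is a tie.

```python
import pandas as pd

# Toy categorical column with one missing entry (illustrative values only)
shape = pd.Series(["Spherical", None, "Sheet", "Spherical"])

most_frequent = shape.mode()[0]
filled = shape.fillna(most_frequent)
print(filled.tolist())
```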


# Step 6: Impute Missing Values in Target Columns (CWP and CSP)

**Description:**  
Impute any missing values in the target columns **CWP** and **CSP** using the median value.



In [98]:
# Step 6: Imputing Missing Values for Target Columns (CWP and CSP)
df['CWP'] = df['CWP'].fillna(df['CWP'].median())
df['CSP'] = df['CSP'].fillna(df['CSP'].median())

# Checking missing values after imputing CWP and CSP
print("Missing values after imputing CWP and CSP:")
print(df.isnull().sum())


Missing values after imputing CWP and CSP:
Type             0
Size (nm)        0
Shape            0
Pore size (Ǻ)    0
Bond-ing*        0
Char-ge          0
Phase**,         0
Loading          0
RR               0
RCA              0
CWP              0
CSP              0
RWP              0
RSP              0
dtype: int64


# Step 7: Imputing Missing Values for Large Gaps (RR and RCA)

**Description:**  
Use predictive imputation with a Linear Regression model to fill missing values:
- First, impute **RR** using available numerical features.
- Then, impute **RCA** using the same features (including the now-complete **RR**).



In [23]:
from sklearn.linear_model import LinearRegression

# Step 7a: Impute missing values in "RR"
# Separate rows where RR is not missing
rr_not_null = df[df['RR'].notnull()]
rr_null = df[df['RR'].isnull()]

# Define features to use for predicting RR
features_for_rr = ['Size (nm)', 'Pore size (Ǻ)', 'Bond-ing*', 'Loading', 'RWP', 'RSP', 'CWP', 'CSP']

# Train a Linear Regression model on the rows with available RR
rr_model = LinearRegression()
rr_model.fit(rr_not_null[features_for_rr], rr_not_null['RR'])

# Predict missing RR values
predicted_rr = rr_model.predict(rr_null[features_for_rr])

# Fill the missing RR values with the predictions
df.loc[df['RR'].isnull(), 'RR'] = predicted_rr

# Check missing values for RR
print("Missing values after imputing RR:")
print(df['RR'].isnull().sum())


Missing values after imputing RR:
0


### Step 7b: Impute Missing Values for RCA

In [26]:
# Step 7b: Impute missing values in "RCA"
# Separate rows where RCA is not missing and where it is missing
rca_not_null = df[df['RCA'].notnull()]
rca_null = df[df['RCA'].isnull()]

# Define features to use for predicting RCA (include RR since it's now complete)
features_for_rca = ['Size (nm)', 'Pore size (Ǻ)', 'Bond-ing*', 'Loading', 'RWP', 'RSP', 'CWP', 'CSP', 'RR']

# Train a Linear Regression model on the rows with available RCA
rca_model = LinearRegression()
rca_model.fit(rca_not_null[features_for_rca], rca_not_null['RCA'])

# Predict missing RCA values
predicted_rca = rca_model.predict(rca_null[features_for_rca])

# Fill the missing RCA values with the predictions
df.loc[df['RCA'].isnull(), 'RCA'] = predicted_rca

# Check missing values for RCA
print("Missing values after imputing RCA:")
print(df['RCA'].isnull().sum())


Missing values after imputing RCA:
0
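
Steps 7a and 7b follow the same pattern, which could be factored into one helper. The sketch below is an illustration under assumptions: `impute_with_regression` is a hypothetical name, and the toy data is exactly linear so the filled values can be checked by hand.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def impute_with_regression(frame, target, features):
    """Fill NaNs in `target` with predictions from a linear model
    fitted on the rows where `target` is observed."""
    out = frame.copy()
    missing = out[target].isnull()
    if missing.any() and (~missing).any():
        model = LinearRegression()
        model.fit(out.loc[~missing, features], out.loc[~missing, target])
        out.loc[missing, target] = model.predict(out.loc[missing, features])
    return out


# Toy data: y = 2*x1 + x2 exactly, with two entries removed
toy = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [0.0, 1.0, 0.0, 1.0, 1.0, 0.0],
})
toy["y"] = 2.0 * toy["x1"] + toy["x2"]
toy.loc[[2, 5], "y"] = np.nan

result = impute_with_regression(toy, "y", ["x1", "x2"])
print(result)
```

With such a helper, the two sub-steps would reduce to two calls, one for `RR` and one for `RCA` with `RR` added to the feature list.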


# Step 8: Encoding Categorical Variables

**Description:**  
Standardize and encode the categorical columns:
- **Char-ge:** Convert to string, trim spaces, and map values (`(+, -, 0)` → 0, `-` → -1, `0` → 0, `+` → 1).
- **Phase:** Trim and map values (`A or O` → 0, `A` → 1, `O` → 2).
- **Type** and **Shape:** Apply label encoding.




In [29]:
# Step 8: Encode Categorical Variables
from sklearn.preprocessing import LabelEncoder

# First, let's inspect unique values for the categorical columns:
print("Unique values in 'Char-ge':", df['Char-ge'].unique())
print("Unique values in 'Phase**,':", df['Phase**,'].unique())
print("Unique values in 'Type':", df['Type'].unique())
print("Unique values in 'Shape':", df['Shape'].unique())

# Convert 'Char-ge' and 'Phase**,'
df['Char-ge'] = df['Char-ge'].astype(str).str.strip()
df['Phase**,'] = df['Phase**,'].astype(str).str.strip()

# Map 'Char-ge' to numeric values
charge_map = {
    '(+, -, 0)': 0,
    '-': -1,
    '0': 0,
    '+': 1
}
df['Char-ge'] = df['Char-ge'].map(charge_map)

# Map 'Phase**,'
phase_map = {
    'A or O': 0,
    'A': 1,
    'O': 2
}
df['Phase**,'] = df['Phase**,'].map(phase_map)

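# For 'Type' and 'Shape', use label encoding with one encoder per column,
# so each fitted mapping stays available (e.g., for inverse_transform)
le_type = LabelEncoder()
le_shape = LabelEncoder()
df['Type'] = le_type.fit_transform(df['Type'])
df['Shape'] = le_shape.fit_transform(df['Shape'])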

# Verify the transformations:
print("\nTransformed Categorical Columns:")
print(df[['Type', 'Shape', 'Char-ge', 'Phase**,']].head())


Unique values in 'Char-ge': ['(+, -, 0)' '-' 0 '+']
Unique values in 'Phase**,': ['A or O ' 'A' 'O']
Unique values in 'Type': ['NaA zeolite' 'P-8Phenyl' 'P-1NH2' 'P-1NH3' 'P-8NH3Cl' 'P-8NH2' 'P-8NH3'
 'P-8NH4' 'Carboxylic MWNT' 'Carbon Nanotubes with acidic group (CNTa)'
 'Graphene Oxide (GO)' 'CNTa +GO (7:3)' 'Graphene oxide sheets'
 'Graphene oxide' '\ufeffMontmorillonite (MMT) Sodium Aluminium'
 '\ufefflayered double hydroxide with magnesium/aluminium cations' 'TiO2'
 'GO' 'rGO/TiO2' 'rGO/TiO3' 'rGO/TiO4' 'rGO/TiO5' 'rGO/TiO6'
 '\ufeffhydrophobic zeolitic imidazolate framework-8'
 '\ufeffhydrophobic zeolitic imidazolate framework-9'
 '\ufeffhydrophobic zeolitic imidazolate framework-10'
 '\ufeffhydrophobic zeolitic imidazolate framework-11'
 'Non-porous spherical Silica NP' 'MCM-41 silica (Porous)' 'MIL-101 (Cr)'
 'Amine functionalized CNT' 'Linde Type A zeolite' 'Silicon dioxide'
 'MCM 48' 'MCM 49' 'MCM 50' 'MCM 51' 'MCM 52' 'MCM 53' 'MCM 54' 'MCM 55'
 'MCM 56' 'MCM 57' 'carbon dot
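
The encoding logic of this step can be sketched on a small hypothetical frame. It shows why the values are cast to string and trimmed before mapping (the integer `0` and the padded `"+ "` would otherwise miss the dictionary keys and become NaN), and how a fitted `LabelEncoder` can recover the original labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with mixed types and stray whitespace (illustrative values only)
toy = pd.DataFrame({
    "charge": ["-", 0, "+ ", "(+, -, 0)"],
    "type": ["GO", "TiO2", "GO", "NaA zeolite"],
})

# Normalize to trimmed strings, then map to integers
charge_map = {"(+, -, 0)": 0, "-": -1, "0": 0, "+": 1}
toy["charge"] = toy["charge"].astype(str).str.strip().map(charge_map)

# LabelEncoder assigns integer codes in sorted order of the class labels
type_encoder = LabelEncoder()
toy["type"] = type_encoder.fit_transform(toy["type"])

print(toy)
print(list(type_encoder.classes_))
print(list(type_encoder.inverse_transform(toy["type"])))
```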

# Step 9: Separate Features and Targets

**Description:**  
Separate the dataset into a feature matrix **X** (all columns except **CWP** and **CSP**) and target vectors for **CWP** and **CSP**.




In [32]:
# Step 9: Separate Features and Target Variables
X = df.drop(columns=['CWP', 'CSP'])
y_cwp = df['CWP']
y_csp = df['CSP']

# Print the shapes to confirm
print("Features (X) shape:", X.shape)
print("CWP (target) shape:", y_cwp.shape)
print("CSP (target) shape:", y_csp.shape)


Features (X) shape: (189, 12)
CWP (target) shape: (189,)
CSP (target) shape: (189,)


# Step 10: Train-Test Split for CWP

**Description:**  
Split the data for **CWP** prediction into an 80/20 training/testing split.




In [35]:
from sklearn.model_selection import train_test_split

# Split data for CWP prediction
X_train_cwp, X_test_cwp, y_train_cwp, y_test_cwp = train_test_split(X, y_cwp, test_size=0.2, random_state=42)

# Print the shapes to confirm the split
print("CWP Split:")
print("X_train_cwp shape:", X_train_cwp.shape)
print("X_test_cwp shape:", X_test_cwp.shape)
print("y_train_cwp shape:", y_train_cwp.shape)
print("y_test_cwp shape:", y_test_cwp.shape)


CWP Split:
X_train_cwp shape: (151, 12)
X_test_cwp shape: (38, 12)
y_train_cwp shape: (151,)
y_test_cwp shape: (38,)


# Step 11: Feature Scaling for CWP

**Description:**  
Scale the training and test feature sets for CWP using StandardScaler.




In [38]:
from sklearn.preprocessing import StandardScaler

# Initialize the scaler
scaler = StandardScaler()

# Fit and transform the training features, and transform the test features
X_train_cwp_scaled = scaler.fit_transform(X_train_cwp)
X_test_cwp_scaled = scaler.transform(X_test_cwp)

# Print shapes to confirm
print("Scaled X_train_cwp shape:", X_train_cwp_scaled.shape)
print("Scaled X_test_cwp shape:", X_test_cwp_scaled.shape)


Scaled X_train_cwp shape: (151, 12)
Scaled X_test_cwp shape: (38, 12)
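
The split-then-scale pattern can be checked on synthetic data. In this sketch (random values, hypothetical shapes), the scaler is fitted on the training rows only, so the training columns end up with mean 0 and unit variance while the test columns are only approximately standardized.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and target (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
y = rng.normal(size=50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows only, then reuse those statistics for the test rows
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
print(X_test_scaled.mean(axis=0).round(3))
```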


# Step 12: Train a Linear Regression Model for CWP

**Description:**  
Train a baseline Linear Regression model on the scaled CWP training data and evaluate on the test set.




In [41]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Initialize the Linear Regression model
lr_cwp = LinearRegression()

# Train the model on the scaled training data
lr_cwp.fit(X_train_cwp_scaled, y_train_cwp)

# Predict on the scaled test data
y_pred_cwp_lr = lr_cwp.predict(X_test_cwp_scaled)

# Evaluate the model performance
mse_cwp_lr = mean_squared_error(y_test_cwp, y_pred_cwp_lr)
r2_cwp_lr = r2_score(y_test_cwp, y_pred_cwp_lr)

print(f"Linear Regression for CWP - MSE: {mse_cwp_lr:.4f}, R2: {r2_cwp_lr:.4f}")


Linear Regression for CWP - MSE: 0.5085, R2: 0.3029
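
To make the two reported metrics concrete, the sketch below computes MSE and R² by hand on four hypothetical values and compares them with the scikit-learn functions used above. R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative true and predicted values
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

# MSE: mean of squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)

# R2: 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))
```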


# Step 13: Train a Random Forest Model for CWP

**Description:**  
Train a Random Forest Regressor on the scaled CWP training data and evaluate its performance on the test set.




In [44]:
from sklearn.ensemble import RandomForestRegressor

# Initialize the Random Forest model
rf_cwp = RandomForestRegressor(n_estimators=200, random_state=42)

# Train the model on the scaled training data
rf_cwp.fit(X_train_cwp_scaled, y_train_cwp)

# Predict on the scaled test data
y_pred_cwp_rf = rf_cwp.predict(X_test_cwp_scaled)

# Evaluate the model performance
mse_cwp_rf = mean_squared_error(y_test_cwp, y_pred_cwp_rf)
r2_cwp_rf = r2_score(y_test_cwp, y_pred_cwp_rf)

print(f"Random Forest for CWP - MSE: {mse_cwp_rf:.4f}, R2: {r2_cwp_rf:.4f}")


Random Forest for CWP - MSE: 0.4574, R2: 0.3729
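
One way to see what a fitted forest relies on is its `feature_importances_` attribute. The sketch below uses synthetic data in which only the first column drives the target. The feature names are hypothetical, and nothing here is taken from the membrane dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Only the first column carries signal; the other two are noise
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=42)
forest.fit(X, y)

# Importances are normalized so that they sum to 1
feature_names = ["signal", "noise_a", "noise_b"]
for name, score in zip(feature_names, forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```

For the notebook's own model, the same attribute on `rf_cwp` could be paired with `X.columns` to label each score.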


# Step 14: Hyperparameter Tuning for CWP Using GridSearchCV

**Description:**  
Perform hyperparameter tuning on the Random Forest model for CWP using GridSearchCV.  
Tuning parameters include `n_estimators`, `max_depth`, `max_features`, `min_samples_split`, and `min_samples_leaf`.




In [47]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Define a parameter grid for tuning
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20],
    'max_features': ['sqrt', 'log2'],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

# Initialize the Random Forest model
rf_model = RandomForestRegressor(random_state=42)

# Setup GridSearchCV
grid_search = GridSearchCV(
    estimator=rf_model,
    param_grid=param_grid,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1,
    verbose=1
)

# Fit GridSearchCV on the scaled training data for CWP
grid_search.fit(X_train_cwp_scaled, y_train_cwp)

# Print best parameters
print("Best parameters for CWP:", grid_search.best_params_)

# Evaluate the tuned model on the test set
best_rf_cwp = grid_search.best_estimator_
y_pred_tuned_cwp = best_rf_cwp.predict(X_test_cwp_scaled)
mse_tuned_cwp = mean_squared_error(y_test_cwp, y_pred_tuned_cwp)
r2_tuned_cwp = r2_score(y_test_cwp, y_pred_tuned_cwp)

print(f"Tuned RF for CWP - MSE: {mse_tuned_cwp:.4f}, R2: {r2_tuned_cwp:.4f}")


Fitting 5 folds for each of 72 candidates, totalling 360 fits
Best parameters for CWP: {'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}
Tuned RF for CWP - MSE: 0.4191, R2: 0.4254
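
The "72 candidates, totalling 360 fits" line in the output follows from the grid size: 3 × 3 × 2 × 2 × 2 = 72 combinations, each fitted once per fold. A short check with `ParameterGrid`, using the same grid as above:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20],
    'max_features': ['sqrt', 'log2'],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

# Number of parameter combinations and total model fits with 5-fold CV
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5
total_fits = n_candidates * n_folds
print(n_candidates, total_fits)  # 72 360
```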


# Step 15: Train-Test Split for CSP

**Description:**  
Split the dataset for **CSP** prediction (80/20 split).




In [50]:
from sklearn.model_selection import train_test_split

# For CSP prediction, separate features and target
X_csp = df.drop(columns=['CWP', 'CSP'])
y_csp = df['CSP']

# Split the data (80% train, 20% test) for CSP
X_train_csp, X_test_csp, y_train_csp, y_test_csp = train_test_split(X_csp, y_csp, test_size=0.2, random_state=42)

# Print the shapes to confirm the split
print("CSP Split:")
print("X_train_csp shape:", X_train_csp.shape)
print("X_test_csp shape:", X_test_csp.shape)
print("y_train_csp shape:", y_train_csp.shape)
print("y_test_csp shape:", y_test_csp.shape)


CSP Split:
X_train_csp shape: (151, 12)
X_test_csp shape: (38, 12)
y_train_csp shape: (151,)
y_test_csp shape: (38,)


# Step 16: Feature Scaling for CSP

**Description:**  
Scale the CSP training and test features using StandardScaler.



In [53]:
from sklearn.preprocessing import StandardScaler

# Step 16: Feature Scaling for CSP
scaler_csp = StandardScaler()
X_train_csp_scaled = scaler_csp.fit_transform(X_train_csp)
X_test_csp_scaled = scaler_csp.transform(X_test_csp)

print("Scaled X_train_csp shape:", X_train_csp_scaled.shape)
print("Scaled X_test_csp shape:", X_test_csp_scaled.shape)


Scaled X_train_csp shape: (151, 12)
Scaled X_test_csp shape: (38, 12)


# Step 17: Train a Linear Regression Model for CSP

**Description:**  
Train a baseline Linear Regression model for CSP using the scaled data and evaluate its performance.




In [56]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Initialize the Linear Regression model for CSP
lr_csp = LinearRegression()

# Train the model on the scaled CSP training data
lr_csp.fit(X_train_csp_scaled, y_train_csp)

# Predict on the scaled CSP test data
y_pred_csp_lr = lr_csp.predict(X_test_csp_scaled)

# Evaluate model performance
mse_csp_lr = mean_squared_error(y_test_csp, y_pred_csp_lr)
r2_csp_lr = r2_score(y_test_csp, y_pred_csp_lr)

print(f"Linear Regression for CSP - MSE: {mse_csp_lr:.4f}, R2: {r2_csp_lr:.4f}")


Linear Regression for CSP - MSE: 3.3799, R2: 0.3187


# Step 18: Train a Random Forest Model for CSP

**Description:**  
Train a Random Forest Regressor for CSP using the scaled training data and evaluate on the test set.




In [59]:
from sklearn.ensemble import RandomForestRegressor

# Initialize the Random Forest model for CSP
rf_csp = RandomForestRegressor(n_estimators=200, random_state=42)

# Train the model on the scaled CSP training data
rf_csp.fit(X_train_csp_scaled, y_train_csp)

# Predict on the scaled CSP test data
y_pred_csp_rf = rf_csp.predict(X_test_csp_scaled)

# Evaluate model performance
mse_csp_rf = mean_squared_error(y_test_csp, y_pred_csp_rf)
r2_csp_rf = r2_score(y_test_csp, y_pred_csp_rf)

print(f"Random Forest for CSP - MSE: {mse_csp_rf:.4f}, R2: {r2_csp_rf:.4f}")


Random Forest for CSP - MSE: 0.7099, R2: 0.8569
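
Because the CWP and CSP steps repeat the same fit, predict, and score sequence, a small helper could remove the duplication. The sketch below is an illustration on synthetic nonlinear data; `fit_and_score` is a hypothetical name and the models' settings are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(model, X_train, X_test, y_train, y_test):
    """Fit `model` on the training split and return test-set MSE and R2."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {"mse": mean_squared_error(y_test, pred), "r2": r2_score(y_test, pred)}


# Synthetic target with a quadratic dependence on the first feature
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(300, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.05, size=300)
split = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "linear": LinearRegression(),
    "rf": RandomForestRegressor(n_estimators=50, random_state=42),
}
results = {name: fit_and_score(model, *split) for name, model in models.items()}
for name, scores in results.items():
    print(f"{name} - MSE: {scores['mse']:.4f}, R2: {scores['r2']:.4f}")
```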


# Step 19: Hyperparameter Tuning for CSP Using GridSearchCV

**Description:**  
Perform hyperparameter tuning for the CSP Random Forest model using GridSearchCV.  
Tuning parameters include `n_estimators`, `max_depth`, `max_features`, `min_samples_split`, and `min_samples_leaf`.



In [62]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor

# Define the parameter grid for CSP tuning
param_grid_csp = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 10, 20],
    'max_features': ['sqrt', 'log2'],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

# Initialize the Random Forest model
rf_csp_model = RandomForestRegressor(random_state=42)

# Setup GridSearchCV
grid_search_csp = GridSearchCV(
    estimator=rf_csp_model,
    param_grid=param_grid_csp,
    cv=5,
    scoring='neg_mean_squared_error',
    n_jobs=-1,
    verbose=1
)

# Fit GridSearchCV on the scaled training data for CSP
grid_search_csp.fit(X_train_csp_scaled, y_train_csp)

# Print the best parameters
print("Best Parameters for CSP:", grid_search_csp.best_params_)

# Evaluate the tuned model on the test set
best_rf_csp = grid_search_csp.best_estimator_
y_pred_tuned_csp = best_rf_csp.predict(X_test_csp_scaled)

from sklearn.metrics import mean_squared_error, r2_score
mse_tuned_csp = mean_squared_error(y_test_csp, y_pred_tuned_csp)
r2_tuned_csp = r2_score(y_test_csp, y_pred_tuned_csp)

print(f"Tuned RF for CSP - MSE: {mse_tuned_csp:.4f}, R2: {r2_tuned_csp:.4f}")


Fitting 5 folds for each of 72 candidates, totalling 360 fits
Best Parameters for CSP: {'max_depth': None, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 300}
Tuned RF for CSP - MSE: 0.8408, R2: 0.8305
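
In the outputs above, the tuned CSP model scores a higher test MSE (0.8408) than the untuned Random Forest (0.7099). GridSearchCV selects parameters by cross-validated score on the training folds, which need not match the ranking on one held-out test set. The sketch below, on synthetic data with a deliberately tiny grid, shows how to read both numbers. Note that `best_score_` is a negated MSE under this scoring.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (illustrative only)
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(120, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=120)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [10, 30], "max_depth": [None, 3]},
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# Flip the sign to recover the mean cross-validated MSE of the best candidate
cv_mse = -search.best_score_
test_mse = mean_squared_error(y_test, search.best_estimator_.predict(X_test))
print("Best params:", search.best_params_)
print(f"CV MSE: {cv_mse:.4f}, test MSE: {test_mse:.4f}")
```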


In [64]:
import joblib

# Save the best Random Forest model for CWP
joblib.dump(best_rf_cwp, "best_rf_cwp_model.pkl")
print("Best RF model for CWP saved as 'best_rf_cwp_model.pkl'")

# Save the best Random Forest model for CSP
joblib.dump(best_rf_csp, "best_rf_csp_model.pkl")
print("Best RF model for CSP saved as 'best_rf_csp_model.pkl'")


Best RF model for CWP saved as 'best_rf_cwp_model.pkl'
Best RF model for CSP saved as 'best_rf_csp_model.pkl'


# Load the models
loaded_rf_cwp = joblib.load("best_rf_cwp_model.pkl")
loaded_rf_csp = joblib.load("best_rf_csp_model.pkl")

# Example: Predict on new data (replace the X_new_* placeholders with your scaled data)
# y_pred_cwp = loaded_rf_cwp.predict(X_new_scaled_for_cwp)
# y_pred_csp = loaded_rf_csp.predict(X_new_scaled_for_csp)
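
The saved models expect inputs scaled with the same statistics as the training data, but only the forests are written to disk above. One option, sketched here as an assumption rather than as the notebook's method, is to bundle the scaler and the model in a scikit-learn `Pipeline` and persist that single object. The data and file name are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=80)

# Scaling and the model travel together, so raw features can be passed at predict time
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=20, random_state=42)),
])
pipeline.fit(X, y)

# Round-trip through a temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_pipeline.pkl")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

X_new = rng.normal(size=(5, 4))
original_pred = pipeline.predict(X_new)
restored_pred = restored.predict(X_new)
print(np.allclose(original_pred, restored_pred))
```

An alternative with the existing artifacts would be to also dump `scaler` and `scaler_csp` with `joblib.dump` and apply them to new data before calling `predict`.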
