<h1><u>Capstone 2 - Coffee Shop - Modeling</u></h1>

[Rubric](https://docs.google.com/document/d/1rbG66SRqRj73Y-KtI_0qlkMX2CSGrvWddvi1V-WYXcY/edit)

In previous notebooks I defined the problem, cleaned the data set, created dummy variables for the categorical features, standardized the originally numeric features, and created the train/test split. The data set is from Kaggle and can be found [here](https://www.kaggle.com/datasets/patkle/coffeereviewcom-over-7000-ratings-and-reviews). The data cleaning notebook can be found [here](https://github.com/lindseyc735/Springboard/blob/main/Capstone%202/Capstone_2_data_wrangling.ipynb). A brief review of the project follows before the modeling.

<u>**Problem Statement:**</u>
<br>What features most affect the coffee rating?

<u>**Context:**</u>
<br>A start-up coffee company is creating its signature blend to sell alongside more generic blends of coffee. The start-up needs to know which three features to prioritize in the blend to maximize its popularity and distinguish the company from other coffee companies.

<u>**Criteria for Success:**</u>
<br>Determine the three coffee features that will create a popular, signature blend of coffee.

<u>**Scope of Solution Space:**</u>
<br>Rating
<br>Acidity
<br>Aftertaste
<br>Aroma
<br>Body
<br>Flavor
<br>Review description
<br>Country of origin
<br>Roast level
<br>Roaster
<br>Roaster location

In [11]:
import warnings
warnings.filterwarnings('ignore') # Suppresses all warnings, not only deprecation warnings
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns # For all our visualization needs.
from pandas_profiling import ProfileReport # Data descriptions, visuals, and missing-value statistics (not called below; the package is now published as ydata-profiling)
from IPython.display import display
import os
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

#Import Modeling Tools
from sklearn.metrics import make_scorer
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from sklearn.svm import SVR
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

#Import Metric Tools for Evaluating Models
from sklearn.metrics import mean_squared_error #MSE
from sklearn.metrics import mean_absolute_error #MAE

# Import the preprocessed data and preview the first five rows
df = pd.read_csv('reordered_preprocessed_coffee4.csv')
df.head()

Unnamed: 0,aftertaste,aroma,body,flavor,coffee_origin_20% Kona; other blend components not disclosed,coffee_origin_40% Colombia; 40% Brazil; 20% Rwanda,"coffee_origin_50% Colombia, 35% Ethiopia, 15% Sumatra",coffee_origin_50% Colombia; 50% Ethiopia,coffee_origin_50% Yirgacheffe Ethiopia; 25% Papua New Guinea; 25% Brazil,coffee_origin_A blend of coffees from southern India,...,"roaster_location_Youngstown, Ohio","roaster_location_Yuanlin, Taiwan","roaster_location_Yun-Lin County, Taiwan","roaster_location_Zhongli, Taiwan","roaster_location_Zhubei City, Taiwan","roaster_location_Zhubei, Taiwan","roaster_location_Zhuwei, Taiwan",roaster_location_Zimbabwe,"roaster_location_Zurich, Switzerland",rating
0,0.040738,0.700223,-0.111574,0.554627,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0.517301
1,0.040738,0.700223,-0.111574,0.554627,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0.27465
2,0.040738,0.700223,1.057494,0.554627,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0.759951
3,0.040738,0.700223,1.057494,0.554627,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0.759951
4,0.040738,0.700223,-0.111574,0.554627,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0.517301
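The preview starts with an `Unnamed: 0` column, which is the name pandas gives an unlabeled index column written by `DataFrame.to_csv`. Because `X = df.iloc[:, :-1]` keeps every column except the last, that index would be treated as a feature; whether the saved split files carry the same column is not shown here. A minimal sketch, using a hypothetical three-row frame rather than the real file, of two ways to avoid it (writing the file with `to_csv(index=False)` in the earlier notebook would also work):

```python
import io

import pandas as pd

# Hypothetical miniature frame standing in for the preprocessed coffee data
original = pd.DataFrame({"aroma": [0.70, 0.70, 0.70], "rating": [0.52, 0.27, 0.76]})

# By default, to_csv writes the index as an unlabeled first column
csv_text = original.to_csv()

naive = pd.read_csv(io.StringIO(csv_text))
print(naive.columns.tolist())  # ['Unnamed: 0', 'aroma', 'rating']

# Option 1: tell read_csv that the first column is the index
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(fixed.columns.tolist())  # ['aroma', 'rating']

# Option 2: drop the stray column after loading
dropped = naive.drop(columns=["Unnamed: 0"])
print(dropped.columns.tolist())  # ['aroma', 'rating']
```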


In [2]:
# Separate the full data set into features (X) and target (y), then load the previously saved train/test split
X = df.iloc[:, :-1]  # Features (all columns except the last one)
y = df.iloc[:, -1]   # Target (last column)
X_train = pd.read_csv('X_train.csv')
X_test = pd.read_csv('X_test.csv')
y_train = pd.read_csv('y_train.csv')
y_test = pd.read_csv('y_test.csv')
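`y_train` and `y_test` are read as one-column DataFrames (shapes `(5629, 1)` and `(1408, 1)` below), which is why later cells call `.values.ravel()` before fitting. Assuming each target file holds a single column, one alternative is to squeeze it to a 1-D Series on load. The CSV text and the `rating` header below are hypothetical stand-ins for `y_train.csv`:

```python
import io

import pandas as pd

# Hypothetical one-column CSV shaped like y_train.csv (header plus one value per row)
csv_text = "rating\n0.517301\n0.27465\n0.759951\n"

# Plain read_csv keeps the single column as a 2-D DataFrame
y_frame = pd.read_csv(io.StringIO(csv_text))

# squeeze("columns") turns a one-column DataFrame into a 1-D Series
y_series = pd.read_csv(io.StringIO(csv_text)).squeeze("columns")

print(y_frame.shape, y_series.shape)  # (3, 1) (3,)
```

Estimators such as `SVR` and `RandomForestRegressor` expect a 1-D target, so squeezing once at load time avoids repeating `.values.ravel()` in every cell.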

In [3]:
df.shape

(7037, 4174)

In [4]:
X_train.shape, y_train.shape

((5629, 4173), (5629, 1))

In [5]:
X_test.shape, y_test.shape

((1408, 4173), (1408, 1))

# <u>Modeling</u>  
Goal: Build 3 to 5 different models and identify the best one.  
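For reference, every model below is scored on the held-out test set with the same three metrics. Here $y_i$ is an observed (standardized) rating, $\hat{y}_i$ is the model's prediction, and $n$ is the number of test rows (1,408 in this split). These are the standard definitions computed by `mean_absolute_error` and `mean_squared_error`, with RMSE taken via `np.sqrt`:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$

RMSE is in the same units as the rating, so it is easier to compare with MAE than MSE is. MSE and RMSE also penalize large individual errors more heavily than MAE does.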

# Fit models with a training dataset  

# Model 1: Linear Regression Model 

In [6]:
# Model-specific aliases for the shared train/test split (these point to the same objects; nothing is copied)
XLR = X
yLR = y
X_trainLR = X_train
X_testLR = X_test
y_trainLR = y_train
y_testLR = y_test

In [7]:
# Instantiate the model, fit the model on the data, and make predictions
lr = LinearRegression()
lr.fit(X_trainLR, y_trainLR)
y_predLR = lr.predict(X_testLR)

In [8]:
# Evaluate the model using MAE, MSE, and RMSE
maeLR = mean_absolute_error(y_testLR, y_predLR)
mseLR = mean_squared_error(y_testLR, y_predLR)
rmseLR = np.sqrt(mseLR)

# Print the results
print(f"Mean Absolute Error for the Linear Regression Model: {maeLR}")
print(f"Mean Squared Error for the Linear Regression Model: {mseLR}")
print(f"RMSE for the Linear Regression Model: ", rmseLR)

Mean Absolute Error for the Linear Regression Model: 446544337.98906857
Mean Squared Error for the Linear Regression Model: 9.467784905330545e+19
RMSE for the Linear Regression Model:  9730254315.962427
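For a target on a standardized scale, errors on the order of 10^8 mean the unpenalized least-squares fit is numerically unstable. One plausible contributor, assumed here rather than verified on this data set, is the dummy variable trap. If every level of a categorical variable gets its own dummy column, those columns sum to 1 in every row and are collinear with the intercept, and many rare roaster or origin dummies can make the design matrix nearly singular. The toy sketch below uses a hypothetical `roast` column to show the rank deficiency and how `drop_first=True` in `pd.get_dummies` removes it:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical column with three levels
toy = pd.DataFrame({"roast": ["Light", "Medium", "Dark", "Medium", "Light", "Dark"]})

# Full one-hot encoding: each row has exactly one 1, so the dummy columns sum to 1
full = pd.get_dummies(toy, columns=["roast"], dtype=float)

# drop_first=True drops one reference level per categorical column
reduced = pd.get_dummies(toy, columns=["roast"], drop_first=True, dtype=float)


def design_rank(dummies: pd.DataFrame) -> tuple:
    """Return (rank, number of columns) of the dummies plus an intercept column."""
    X = np.column_stack([np.ones(len(dummies)), dummies.to_numpy()])
    return int(np.linalg.matrix_rank(X)), X.shape[1]


print(design_rank(full))     # (3, 4): rank below column count, so rank-deficient
print(design_rank(reduced))  # (3, 3): full column rank
```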


# Linear Regression with GridSearchCV() #

In [16]:
# Create model specific variables for the train/test set components
XLR_gridsearchcv = X
yLR_gridsearchcv = y
X_trainLR_gridsearchcv = X_train
X_testLR_gridsearchcv = X_test
y_trainLR_gridsearchcv = y_train
y_testLR_gridsearchcv = y_test

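# Define the parameter grid for linear regression
# Note: `normalize` was deprecated in scikit-learn 1.0 and removed in 1.2, so this grid only runs on older versions
parametersLR_gridsearchcv = {'normalize': [True, False]}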

# Define custom scoring functions for MAE, MSE, and RMSE
scoringLR_gridsearchcv = {
    'neg_mean_absolute_error': make_scorer(mean_absolute_error, greater_is_better=False),
    'neg_mean_squared_error': make_scorer(mean_squared_error, greater_is_better=False),
    'neg_root_mean_squared_error': make_scorer(lambda y_true, y_pred: np.sqrt(mean_squared_error(y_true, y_pred)), greater_is_better=False)
}
    
# Still use 'lr' for the regression model
    
# Create GridSearchCV instance with multiple scoring metrics
grid_searchLR = GridSearchCV(lr, parametersLR_gridsearchcv, scoring=scoringLR_gridsearchcv, refit = 'neg_mean_squared_error', cv=5)

# Fit the model
grid_searchLR.fit(X_trainLR_gridsearchcv, y_trainLR_gridsearchcv)

GridSearchCV(cv=5, estimator=LinearRegression(),
             param_grid={'normalize': [True, False]},
             refit='neg_mean_squared_error',
             scoring={'neg_mean_absolute_error': make_scorer(mean_absolute_error, greater_is_better=False),
                      'neg_mean_squared_error': make_scorer(mean_squared_error, greater_is_better=False),
                      'neg_root_mean_squared_error': make_scorer(<lambda>, greater_is_better=False)})
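scikit-learn also ships built-in scorer names for all three metrics (`'neg_root_mean_squared_error'` has existed since version 0.22), so the custom `make_scorer` entries and the RMSE lambda can be replaced with plain strings. The sketch below uses a small synthetic problem, not the coffee data. Because `normalize` is unavailable in current releases, it tunes `fit_intercept` purely as a placeholder grid; that parameter is not a substitute for `normalize`:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem (not the coffee data)
X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=0)

# Built-in scorer names replace the three make_scorer(...) entries
scoring = ["neg_mean_absolute_error", "neg_mean_squared_error", "neg_root_mean_squared_error"]

# fit_intercept is only a placeholder grid for illustration
grid = GridSearchCV(
    LinearRegression(),
    {"fit_intercept": [True, False]},
    scoring=scoring,
    refit="neg_mean_squared_error",  # the metric used to pick best_estimator_
    cv=5,
)
grid.fit(X, y)

print(grid.best_params_)
# cv_results_ holds one "mean_test_<scorer>" array per metric
print(grid.cv_results_["mean_test_neg_root_mean_squared_error"])
```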

In [18]:
# Get the best parameters and best estimator
best_paramsLR = grid_searchLR.best_params_
best_estimatorLR = grid_searchLR.best_estimator_
print("Best Parameters for Linear Regression with GridSearchCV:", best_paramsLR)
print("Best Estimator for Linear Regression with GridSearchCV:", best_estimatorLR)

# Make predictions with the best estimator and evaluate them with all three metrics
y_predLR_gridsearchcv = best_estimatorLR.predict(X_testLR_gridsearchcv)
maeLR_gridsearchcv = mean_absolute_error(y_testLR_gridsearchcv, y_predLR_gridsearchcv)
mseLR_gridsearchcv = mean_squared_error(y_testLR_gridsearchcv, y_predLR_gridsearchcv)
rmseLR_gridsearchcv = np.sqrt(mseLR_gridsearchcv)

# Print the MAE, MSE, and RMSE
print(f"Mean Absolute Error for the LRG: {maeLR_gridsearchcv}")
print(f"Mean Squared Error for the LRG: {mseLR_gridsearchcv}")
print(f"RMSE for the LRG: ", rmseLR_gridsearchcv)

Best Parameters for Linear Regression with GridSearchCV: {'normalize': False}
Best Estimator for Linear Regression with GridSearchCV: LinearRegression(normalize=False)
Mean Absolute Error for the LRG: 446544337.98906857
Mean Squared Error for the LRG: 9.467784905330545e+19
RMSE for the LRG:  9730254315.962427
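The `normalize` search left the test errors exactly where they were, so rescaling alone does not appear to fix the instability. A common remedy for near-collinear features is an L2 (ridge) penalty; the Elastic Net model later in this notebook uses a related mixed L1/L2 penalty. This toy sketch uses synthetic data, not the coffee features, to compare unpenalized coefficients with `Ridge` coefficients when two columns are almost identical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
# x2 is x1 plus a tiny perturbation, so the two columns are almost perfectly collinear
x2 = x1 + 1e-8 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + 0.1 * rng.normal(size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Unpenalized least squares can split the signal between the twin columns using
# huge, opposite-signed weights; the L2 penalty keeps both weights near 0.5
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```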


# Model 2: Random Forest Regressor Model

In [20]:
# Create model specific variables for the train/test set components
XRF = X
yRF = y
X_trainRF = X_train
X_testRF = X_test
y_trainRF = y_train
y_testRF = y_test

In [21]:
from sklearn.ensemble import RandomForestRegressor

# Initialize the Random Forest model
rf = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model on the training data
rf.fit(X_trainRF, y_trainRF.values.ravel())  # Using .values.ravel() to convert y_train DataFrame to a 1D array

# Predict on the test data
y_predRF = rf.predict(X_testRF)

In [22]:
# Calculate MAE
maeRF = mean_absolute_error(y_testRF, y_predRF)

# Print the MAE
print(f"Mean Absolute Error for the Random Forest Model: {maeRF}")

# Calculate Mean Squared Error (MSE) for evaluation
mseRF = mean_squared_error(y_testRF, y_predRF)
print(f"Mean Squared Error for the Random Forest Regressor Model: {mseRF}")

# Calculate the RMSE
rmseRF = np.sqrt(mseRF)
print(f"RMSE for the Random Forest Regressor Model: ", rmseRF)

Mean Absolute Error for the Random Forest Model: 0.1794049411585514
Mean Squared Error for the Random Forest Regressor Model: 0.0832775065420507
RMSE for the Random Forest Regressor Model:  0.28857842355597324


# Random Forest Regressor with GridSearchCV #

In [23]:
# Define the parameter grid to search
paramsRF = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

# Initialize the Random Forest model
rf = RandomForestRegressor(random_state=42)

# Initialize GridSearchCV
grid_searchRF = GridSearchCV(rf, paramsRF, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)

# Fit the grid search to your data
grid_searchRF.fit(X_trainRF, y_trainRF.values.ravel())

# Get the best parameters and best estimator
best_paramsRF = grid_searchRF.best_params_
best_estimatorRF = grid_searchRF.best_estimator_

# Make predictions using the best estimator
y_predRF_best = best_estimatorRF.predict(X_testRF)

In [25]:
# Print the best parameters and best estimator
print("Best Parameters for Random Forest with GridSearchCV:", best_paramsRF)
print("Best Estimator for Random Forest with GridSearchCV:", best_estimatorRF)

# Evaluate the model using all three metrics
maeRF_gridsearchcv = mean_absolute_error(y_testRF, y_predRF_best)
mseRF_gridsearchcv = mean_squared_error(y_testRF, y_predRF_best)
rmseRF_gridsearchcv = np.sqrt(mseRF_gridsearchcv)

# Print the MAE, MSE, and RMSE
print(f"Mean Absolute Error for Random Forest with GridSearchCV: {maeRF_gridsearchcv}")
print(f"Mean Squared Error for Random Forest with GridSearchCV: {mseRF_gridsearchcv}")
print(f"RMSE for Random Forest with GridSearchCV: ", rmseRF_gridsearchcv)

Best Parameters for Random Forest with GridSearchCV: {'max_depth': 20, 'min_samples_split': 2, 'n_estimators': 150}
Best Estimator for Random Forest with GridSearchCV: RandomForestRegressor(max_depth=20, n_estimators=150, random_state=42)
Mean Absolute Error for Random Forest with GridSearchCV: 0.18457171428242514
Mean Squared Error for Random Forest with GridSearchCV: 0.08239772155392212
RMSE for Random Forest with GridSearchCV:  0.28705003318920225


# Model 3: Extreme Gradient Boosting (XGBoost) Regressor Model

In [26]:
# Create model specific variables for the train/test set components
XGB = X
yGB = y
X_trainGB = X_train
X_testGB = X_test
y_trainGB = y_train
y_testGB = y_test

# Install XGBoost
#! pip install xgboost

In [27]:
import xgboost as xgb

# Initialize the XGBoost regressor
gb = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=10, seed=42)

# Fit the model on the training data
gb.fit(X_trainGB, y_trainGB)

# Make predictions on the test set
y_predGB = gb.predict(X_testGB)

In [28]:
# Calculate MAE
maeGB = mean_absolute_error(y_testGB, y_predGB)

# Print the MAE
print(f"Mean Absolute Error for the Extreme Gradient Boosting Model: {maeGB}")

# Calculate Mean Squared Error for evaluation
mseGB = mean_squared_error(y_testGB, y_predGB)
print(f"Mean Squared Error for the Extreme Gradient Boosting Model: {mseGB}")

# Calculate the RMSE
rmseGB = np.sqrt(mseGB)
print(f"RMSE for the Extreme Gradient Boosting Model: ", rmseGB)

Mean Absolute Error for the Extreme Gradient Boosting Model: 0.19203128125179927
Mean Squared Error for the Extreme Gradient Boosting Model: 0.08294503229995759
RMSE for the Extreme Gradient Boosting Model:  0.2880017921818501


# Extreme Gradient Boosting with GridSearchCV #

In [30]:
# Define the parameter grid to search
param_gridGB = {
    'n_estimators': [50, 100, 150],
    'max_depth': [3, 6, 9],
    'learning_rate': [0.01, 0.1, 0.2]
}

# Initialize the XGBoost regressor
gb = xgb.XGBRegressor(objective='reg:squarederror', seed=42)

# Initialize GridSearchCV with scoring='neg_mean_squared_error' to minimize MSE
grid_searchGB = GridSearchCV(gb, param_gridGB, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)

# Fit the grid search to your data
grid_searchGB.fit(X_trainGB, y_trainGB)

# Get the best parameters and best estimator
best_paramsGB = grid_searchGB.best_params_
best_estimatorGB = grid_searchGB.best_estimator_

# Make predictions using the best estimator
y_predGB_best = best_estimatorGB.predict(X_testGB)

In [31]:
# Print the best parameters and best estimator
print("Best Parameters for Extreme Gradient Boosting with GridSearchCV:", best_paramsGB)
print("Best Estimator for Extreme Gradient Boosting with GridSearchCV:", best_estimatorGB)

# Evaluate the model using all three metrics
maeGB_gridsearchcv = mean_absolute_error(y_testGB, y_predGB_best)
mseGB_gridsearchcv = mean_squared_error(y_testGB, y_predGB_best)
rmseGB_gridsearchcv = np.sqrt(mseGB_gridsearchcv)

# Print the MAE, MSE, and RMSE
print(f"Mean Absolute Error for Extreme Gradient Boosting with GridSearchCV: {maeGB_gridsearchcv}")
print(f"Mean Squared Error for Extreme Gradient Boosting with GridSearchCV: {mseGB_gridsearchcv}")
print(f"RMSE for Extreme Gradient Boosting with GridSearchCV: ", rmseGB_gridsearchcv)

Best Parameters for Extreme Gradient Boosting with GridSearchCV: {'learning_rate': 0.1, 'max_depth': 9, 'n_estimators': 50}
Best Estimator for Extreme Gradient Boosting with GridSearchCV: XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=None, early_stopping_rounds=None,
             enable_categorical=False, eval_metric=None, feature_types=None,
             gamma=None, gpu_id=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=0.1, max_bin=None,
             max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=9, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             n_estimators=50, n_jobs=None, num_parallel_tree=None,
             predictor=None, random_state=None, ...)
Mean Absolute Error for Extreme Gradient Boosting with GridSearchCV: 0.18457

# Model 4: Support Vector Regression (SVR) Model

In [32]:
# Create model specific variables for the train/test set components
XSVR = X
ySVR = y
X_trainSVR = X_train
X_testSVR = X_test
y_trainSVR = y_train
y_testSVR = y_test

In [33]:
from sklearn.svm import SVR

# Initialize the SVR model
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)

# Fit the model on the training data (.values.ravel() flattens the one-column y DataFrame to 1-D)
svr.fit(X_trainSVR, y_trainSVR.values.ravel())

# Make predictions on the test set
y_predSVR = svr.predict(X_testSVR)

In [34]:
# Calculate MAE
maeSVR = mean_absolute_error(y_testSVR, y_predSVR)

# Print the MAE
print(f"Mean Absolute Error for the Support Vector Regression Model: {maeSVR}")

# Calculate Mean Squared Error for evaluation
mseSVR = mean_squared_error(y_testSVR, y_predSVR)
print(f"Mean Squared Error for the Support Vector Regression Model: {mseSVR}")

# Calculate the RMSE
rmseSVR = np.sqrt(mseSVR)
print(f"RMSE for the Support Vector Regression Model: ", rmseSVR)

Mean Absolute Error for the Support Vector Regression Model: 0.18286007431960155
Mean Squared Error for the Support Vector Regression Model: 0.08128681850700242
RMSE for the Support Vector Regression Model:  0.28510843289352633


# Support Vector Regression (SVR) with GridSearchCV #

In [36]:
# Define the parameter grid to search
param_gridSVR = {
    'C': [0.1, 1, 10],
    'epsilon': [0.01, 0.1, 0.2],
    'kernel': ['linear', 'poly', 'rbf']
}

# Initialize the SVR model
svr = SVR()

# Initialize GridSearchCV
grid_searchSVR = GridSearchCV(svr, param_gridSVR, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)

# Fit the grid search to your data
grid_searchSVR.fit(X_trainSVR, y_trainSVR.values.ravel())

# Get the best parameters and best estimator
best_paramsSVR = grid_searchSVR.best_params_
best_estimatorSVR = grid_searchSVR.best_estimator_

# Make predictions using the best estimator
y_predSVR_best = best_estimatorSVR.predict(X_testSVR)

In [37]:
# Print the best parameters and best estimator
print("Best Parameters for Support Vector Regression with GridSearchCV:", best_paramsSVR)
print("Best Estimator for Support Vector Regression with GridSearchCV:", best_estimatorSVR)

# Evaluate the model using all three metrics
maeSVR_gridsearchcv = mean_absolute_error(y_testSVR, y_predSVR_best)
mseSVR_gridsearchcv = mean_squared_error(y_testSVR, y_predSVR_best)
rmseSVR_gridsearchcv = np.sqrt(mseSVR_gridsearchcv)

# Print the MAE, MSE, and RMSE
print(f"Mean Absolute Error for Support Vector Regression with GridSearchCV: {maeSVR_gridsearchcv}")
print(f"Mean Squared Error for Support Vector Regression with GridSearchCV: {mseSVR_gridsearchcv}")
print(f"RMSE for Support Vector Regression with GridSearchCV: ", rmseSVR_gridsearchcv)

Best Parameters for Support Vector Regression with GridSearchCV: {'C': 0.1, 'epsilon': 0.2, 'kernel': 'linear'}
Best Estimator for Support Vector Regression with GridSearchCV: SVR(C=0.1, epsilon=0.2, kernel='linear')
Mean Absolute Error for Support Vector Regression with GridSearchCV: 0.19640981425061413
Mean Squared Error for Support Vector Regression with GridSearchCV: 0.08474750686718774
RMSE for Support Vector Regression with GridSearchCV:  0.2911142505395223
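The tuned SVR (linear kernel, C=0.1, epsilon=0.2) scores worse on the test set than the default model, even though the default configuration (rbf kernel, C=1, epsilon=0.1) was part of the grid. This can happen because `GridSearchCV` ranks candidates by mean cross-validated score on the training folds, not by test score. One way to see how close the candidates were is to tabulate `cv_results_`. A sketch on a small synthetic problem with a reduced, hypothetical grid:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Small synthetic regression problem standing in for the coffee features
X, y = make_regression(n_samples=150, n_features=8, noise=10.0, random_state=0)

grid = GridSearchCV(
    SVR(),
    {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    cv=5,
    scoring="neg_mean_squared_error",
)
grid.fit(X, y)

# One row per parameter combination, ordered by cross-validated rank
cv_table = (
    pd.DataFrame(grid.cv_results_)[["params", "mean_test_score", "std_test_score", "rank_test_score"]]
    .sort_values("rank_test_score", kind="stable")
    .reset_index(drop=True)
)
# Convert the negative MSE back into a cross-validated RMSE for readability
cv_table["cv_rmse"] = np.sqrt(-cv_table["mean_test_score"])
print(cv_table)
```

If the top few rows have overlapping `std_test_score` ranges, the grid's choice is not clearly better than its runners-up, which is consistent with the small test-set differences seen here.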


# Model 5: Elastic Net Model

In [38]:
# Create model specific variables for the train/test set components
XEN = X
yEN = y
X_trainEN = X_train
X_testEN = X_test
y_trainEN = y_train
y_testEN = y_test

In [39]:
from sklearn.linear_model import ElasticNet

# Initialize the Elastic Net model
en = ElasticNet(alpha=1.0, l1_ratio=0.5)  

# Fit the model on the training data
en.fit(X_trainEN, y_trainEN)

# Make predictions on the test set
y_predEN = en.predict(X_testEN)

In [40]:
# Calculate MAE
maeEN = mean_absolute_error(y_testEN, y_predEN)

# Print the MAE
print(f"Mean Absolute Error for the Elastic Net Model: {maeEN}")

# Calculate Mean Squared Error for evaluation
mseEN = mean_squared_error(y_testEN, y_predEN)
print(f"Mean Squared Error for the Elastic Net Model: {mseEN}")

# Calculate the RMSE
rmseEN = np.sqrt(mseEN)
print(f"RMSE for the Elastic Net Model: ", rmseEN)

Mean Absolute Error for the Elastic Net Model: 0.4807004765238523
Mean Squared Error for the Elastic Net Model: 0.4763833374938104
RMSE for the Elastic Net Model:  0.6902052864864268
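The default `alpha=1.0` applies a strong penalty. With an L1 component (`l1_ratio=0.5`), a penalty that strong typically drives most coefficients to exactly zero. This is consistent with, but does not prove, the much larger default errors compared with the tuned `alpha=0.01` model. The sketch below uses synthetic data with only two informative columns to show how the number of nonzero coefficients shrinks as `alpha` grows:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.normal(size=(n, p))
# Only the first two columns carry signal; the other 18 are pure noise
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * rng.normal(size=n)

counts = {}
for alpha in [0.01, 0.1, 1.0]:
    model = ElasticNet(alpha=alpha, l1_ratio=0.5).fit(X, y)
    # Count the coefficients that the L1 part of the penalty has not zeroed out
    counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha}: {counts[alpha]} of {p} coefficients are nonzero")
```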


# Elastic Net with GridSearchCV #

In [42]:
# Define the parameter grid to search
param_gridEN = {
    'alpha': [0.01, 0.1, 1.0],
    'l1_ratio': [0.1, 0.5, 0.9]
}

# Initialize the Elastic Net model
en = ElasticNet()

# Initialize GridSearchCV
grid_searchEN = GridSearchCV(en, param_gridEN, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)

# Fit the grid search to your data
grid_searchEN.fit(X_trainEN, y_trainEN)

# Get the best parameters and best estimator
best_paramsEN = grid_searchEN.best_params_
best_estimatorEN = grid_searchEN.best_estimator_

# Make predictions using the best estimator
y_predEN_best = best_estimatorEN.predict(X_testEN)

In [43]:
# Print the best parameters and best estimator
print("Best Parameters for Elastic Net with GridSearchCV:", best_paramsEN)
print("Best Estimator for Elastic Net with GridSearchCV:", best_estimatorEN)

# Evaluate the model using all three metrics
maeEN_gridsearchcv = mean_absolute_error(y_testEN, y_predEN_best)
mseEN_gridsearchcv = mean_squared_error(y_testEN, y_predEN_best)
rmseEN_gridsearchcv = np.sqrt(mseEN_gridsearchcv)

# Print the MAE, MSE, and RMSE
print(f"Mean Absolute Error for Elastic Net with GridSearchCV: {maeEN_gridsearchcv}")
print(f"Mean Squared Error for Elastic Net with GridSearchCV: {mseEN_gridsearchcv}")
print(f"RMSE for Elastic Net with GridSearchCV: ", rmseEN_gridsearchcv)

Best Parameters for Elastic Net with GridSearchCV: {'alpha': 0.01, 'l1_ratio': 0.1}
Best Estimator for Elastic Net with GridSearchCV: ElasticNet(alpha=0.01, l1_ratio=0.1)
Mean Absolute Error for Elastic Net with GridSearchCV: 0.2003814539543021
Mean Squared Error for Elastic Net with GridSearchCV: 0.09039427542149797
RMSE for Elastic Net with GridSearchCV:  0.30065640758430207


# Review model outcomes — Iterate over additional models as needed  

In [44]:
# Collect the default-parameter test metrics for each model
table = {
    'Model': ['Linear Regression', 'Random Forest', 'XGBoosting', 'SVR', 'Elastic Net'],
    'MAE': [maeLR, maeRF, maeGB, maeSVR, maeEN], 
    'MSE': [mseLR, mseRF, mseGB, mseSVR, mseEN],
    'RMSE': [rmseLR, rmseRF, rmseGB, rmseSVR, rmseEN]
}

# Create a DataFrame from the dictionary
data_table = pd.DataFrame(table)

# Display the data table
print(data_table)

               Model           MAE           MSE          RMSE
0  Linear Regression  4.465443e+08  9.467785e+19  9.730254e+09
1      Random Forest  1.794049e-01  8.327751e-02  2.885784e-01
2         XGBoosting  1.920313e-01  8.294503e-02  2.880018e-01
3                SVR  1.828601e-01  8.128682e-02  2.851084e-01
4        Elastic Net  4.807005e-01  4.763833e-01  6.902053e-01


In [45]:
# Collect the GridSearchCV-tuned test metrics for each model
table_gridsearchcv = {
    'Model with GridSearchCV': ['Linear Regression', 'Random Forest', 'XGBoosting', 'SVR', 'Elastic Net'],
    'MAE': [maeLR_gridsearchcv, maeRF_gridsearchcv, maeGB_gridsearchcv, maeSVR_gridsearchcv, maeEN_gridsearchcv], 
    'MSE': [mseLR_gridsearchcv, mseRF_gridsearchcv, mseGB_gridsearchcv, mseSVR_gridsearchcv, mseEN_gridsearchcv],
    'RMSE': [rmseLR_gridsearchcv, rmseRF_gridsearchcv, rmseGB_gridsearchcv, rmseSVR_gridsearchcv, rmseEN_gridsearchcv]
}

# Create a DataFrame from the dictionary
data_table_gridsearchcv = pd.DataFrame(table_gridsearchcv)

# Display the data table
print(data_table_gridsearchcv)

  Model with GridSearchCV           MAE           MSE          RMSE
0       Linear Regression  4.465443e+08  9.467785e+19  9.730254e+09
1           Random Forest  1.845717e-01  8.239772e-02  2.870500e-01
2              XGBoosting  1.845799e-01  8.032733e-02  2.834208e-01
3                     SVR  1.964098e-01  8.474751e-02  2.911143e-01
4             Elastic Net  2.003815e-01  9.039428e-02  3.006564e-01
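The baseline and tuned runs are in separate tables, so a combined ranking makes it easier to check which run is actually best on each metric. The sketch below re-enters the printed values by hand, rounded as shown above, and leaves out Linear Regression. In the notebook, the same frame could be built directly from the `mae*` and `rmse*` variables.

```python
import pandas as pd

# Test-set metrics copied from the two tables above (rounded as printed);
# Linear Regression is omitted because its errors are on the order of 1e8
results = pd.DataFrame(
    {
        "Model": ["Random Forest", "XGBoost", "SVR", "Elastic Net"] * 2,
        "Tuned": [False] * 4 + [True] * 4,
        "MAE": [0.1794049, 0.1920313, 0.1828601, 0.4807005,
                0.1845717, 0.1845799, 0.1964098, 0.2003815],
        "RMSE": [0.2885784, 0.2880018, 0.2851084, 0.6902053,
                 0.2870500, 0.2834208, 0.2911143, 0.3006564],
    }
)

# Rank all eight runs by RMSE (lower is better)
by_rmse = results.sort_values("RMSE").reset_index(drop=True)
print(by_rmse)

# Report the single best run on each metric
for metric in ["RMSE", "MAE"]:
    best = results.loc[results[metric].idxmin()]
    print(f"Lowest {metric}: {best['Model']} (tuned={best['Tuned']})")
```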


# Identify the final model that you think is the best model for this project  
Hint: the most powerful model isn’t always the best one to use. Other considerations
include computational complexity, scalability, and maintenance costs. 

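With GridSearchCV tuning, Extreme Gradient Boosting has the lowest test MSE (0.0803) and RMSE (0.2834) of any run. Its tuned MAE (0.18458) is the second lowest among the tuned models, essentially tied with Random Forest's 0.18457. With default parameters the picture is different: SVR has the lowest MSE and RMSE, Random Forest has the lowest MAE, and XGBoost is third on MAE. Because the tuned XGBoost model beats every default model on MSE and RMSE, I will select Extreme Gradient Boosting with the tuned parameters as the best model. The margins are small, however. Random Forest, XGBoost, and SVR all fall between roughly 0.283 and 0.291 RMSE on the standardized rating scale, while unregularized Linear Regression failed outright (errors on the order of 10^8).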
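Selecting a model does not yet answer the problem statement's question of which features most affect the rating. One possible next step is to rank a fitted model's impurity-based `feature_importances_`. A fitted `XGBRegressor` exposes the same attribute, but the sketch below uses `RandomForestRegressor` so it runs without xgboost installed. The column names are hypothetical stand-ins and the data are synthetic. With one-hot dummies, the importance of a single original variable (for example, roaster location) is spread across many columns and would need to be summed per variable before comparing.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500
# Hypothetical stand-ins for a few of the notebook's feature columns
X = pd.DataFrame(
    rng.normal(size=(n, 5)),
    columns=["aroma", "flavor", "body", "acidity", "noise"],
)
# Synthetic target driven by flavor, then aroma, then body; acidity and noise are unused
y = 3.0 * X["flavor"] + 2.0 * X["aroma"] + 1.5 * X["body"] + 0.1 * rng.normal(size=n)

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# feature_importances_ is aligned with the column order of X
importances = pd.Series(model.feature_importances_, index=X.columns)
top3 = importances.nlargest(3)
print(top3)
```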