# Regression Model Selection
- We have five different regression models, but which one is the best and most effective for the dataset (Combined_Cycle_Power_Plant.csv) 😕?
- No worries, we answer that question in this **Regression Model Selection** notebook.

## R square (R^2)
- R squared (the coefficient of determination) measures how closely a model's predictions match the actual values. The best possible score is 1.

<img src="../images/r_squared_eqn.png" alt="r_squared_eqn.png">
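As a minimal sketch, assuming the equation above is the usual definition R^2 = 1 - SS_res / SS_tot, the score can be computed by hand with NumPy. The helper name `r_squared` and the sample values are illustrative only and are not part of the notebook.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(y_true, y_true)       # predictions match exactly
baseline = r_squared(y_true, [6.0] * 4)   # always predicting the mean (6.0)
print(perfect, baseline)
```

A perfect prediction scores 1, and always predicting the mean of the targets scores 0. A model that does worse than that baseline gets a negative score, so the value is not strictly bounded below.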

## Data preprocessing

✔️ Import the necessary libraries.

✔️ Load dataset (Combined_Cycle_Power_Plant.csv).

❌ Our dataset doesn't have any missing data.

❌ Our dataset doesn't have any categorical string data.

✔️ We have 9569 data points, so we can split the dataset into training and testing sets to evaluate the results.

⚠️ Please apply feature scaling only if required by the regression model.
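The sketch below shows one way to apply feature scaling only where it is needed. It assumes that a scale-sensitive model such as support vector regression is among the models being compared (this section does not name them), and it uses synthetic stand-in data rather than the power plant dataset. The names `X_demo`, `y_demo`, and `scaled_svr` are hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: two features on very different scales.
rng = np.random.default_rng(0)
X_demo = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)])
y_demo = 3.0 * X_demo[:, 0] + 0.01 * X_demo[:, 1]

# The pipeline fits the scaler on the training rows only,
# then reuses the same scaling when predicting.
scaled_svr = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
scaled_svr.fit(X_demo[:150], y_demo[:150])
y_demo_pred = scaled_svr.predict(X_demo[150:])
print(y_demo_pred.shape)
```

Wrapping the scaler and the model in one pipeline keeps the scaling attached to the model that needs it, so models such as plain linear regression can still be trained on the unscaled features.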

In [1]:
# Import libraries....
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# Load dataset....
dataset = pd.read_csv(r"../dataset/Combined_Cycle_Power_Plant.csv")
X = dataset.iloc[:, :-1].values # [row, column]
Y = dataset.iloc[:, -1].values

In [3]:
# Split training and testing dataset....
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
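The split above does not set `random_state`, so every run produces a different partition and slightly different R^2 scores. As a small sketch on toy arrays, fixing the seed (42 here is an arbitrary, assumed value) makes the split repeatable, which helps when several models should be compared on exactly the same test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays standing in for X and Y.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Two calls with the same seed return identical partitions.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same, split_a[0].shape, split_a[1].shape)
```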

## Train and evaluate the performance of multiple linear regression

In [4]:
# Train multiple linear regression model...
from sklearn.linear_model import LinearRegression
multiple_linear_regressor = LinearRegression()
multiple_linear_regressor.fit(x_train, y_train)

# Test multiple linear regression model...
y_pred = multiple_linear_regressor.predict(x_test)

# Check the result (left column: y_pred, right column: y_test)...
np.set_printoptions(precision=2)
print("Comparison of y_test & y_pred", np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1), sep='\n')

# Find performance by R^2 score...
from sklearn.metrics import r2_score
multiple_linear_regression_r2_score = r2_score(y_test, y_pred)
print("R^2 score", multiple_linear_regression_r2_score)

Comparison of y_test & y_pred
[[442.69 436.54]
 [483.76 485.43]
 [481.63 486.91]
 ...
 [450.02 447.46]
 [445.43 440.42]
 [444.72 442.02]]
R^2 score 0.9289049198241702


## Train and evaluate the performance of polynomial linear regression

In [5]:
# Convert the features x into polynomial terms up to degree 4...
from sklearn.preprocessing import PolynomialFeatures
x_poly_converter = PolynomialFeatures(degree=4)

# Train polynomial linear regression model...
from sklearn.linear_model import LinearRegression
polynomial_linear_regressor = LinearRegression()
polynomial_linear_regressor.fit(x_poly_converter.fit_transform(x_train), y_train)

# Test polynomial linear regression model (reuse the converter fitted on x_train)...
y_pred = polynomial_linear_regressor.predict(x_poly_converter.transform(x_test))

# Check the result (left column: y_pred, right column: y_test)...
np.set_printoptions(precision=2)
print("Comparison of y_test & y_pred", np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1), sep='\n')

# Find performance by R^2 score...
from sklearn.metrics import r2_score
polynomial_linear_regression_r2_score = r2_score(y_test, y_pred)
print("R^2 score", polynomial_linear_regression_r2_score)

Comparison of y_test & y_pred
[[443.16 436.54]
 [485.66 485.43]
 [483.27 486.91]
 ...
 [447.07 447.46]
 [446.17 440.42]
 [445.37 442.02]]
R^2 score 0.9419544431773555
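
With the scores collected so far, the comparison itself can be sketched as a simple lookup. The values below are copied from the outputs printed above and will change from run to run because the split is not seeded. The dictionary name `r2_scores` is illustrative.

```python
# R^2 scores copied from the printed outputs above (run-dependent).
r2_scores = {
    "multiple linear regression": 0.9289049198241702,
    "polynomial linear regression": 0.9419544431773555,
}

# Pick the model with the highest R^2 score.
best_model = max(r2_scores, key=r2_scores.get)

# Print the models from best to worst.
for name, score in sorted(r2_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.4f}")
print("Best so far:", best_model)
```

The same dictionary can be extended with one entry per remaining model, so the final choice is a single `max` call.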
