# <b><p style="background-color: #ff6200; font-family:calibri; color:white; font-size:100%; font-family:Verdana; text-align:center; border-radius:15px 50px;">Task 24-> Evaluation Techniques for Regression Models</p></b>

Evaluation techniques are crucial in assessing the performance of regression models. They help determine how well the model predicts the dependent variable based on the independent variables. Proper evaluation ensures that the model is reliable and can generalize well to new, unseen data. Here, we discuss the key evaluation metrics used in regression analysis and their importance.

### Techniques Implemented
1. [Mean Absolute Error (MAE)](#1)
2. [Mean Squared Error (MSE)](#2)
3. [Root Mean Squared Error (RMSE)](#3)
4. [R-squared (R²)](#4)
5. [Adjusted R-squared](#5)
6. [Mean Absolute Percentage Error (MAPE)](#6)
7. [Median Absolute Error](#7)

## Generating Data

In [21]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [22]:
X.shape, y.shape

((100, 2), (100,))

## Training and Testing the Model

In [23]:
from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

y_pred = lr_model.predict(X_test)
y_pred

array([ -58.20320564,   -0.99412493,   84.42053424,  -19.80868937,
         67.78544042,  115.08677163,  195.3800306 , -126.83541741,
       -185.65676039,  -55.87134508,   80.26682995, -139.17457332,
        116.77784102,  -42.44949517,  112.75540523,   77.47491592,
        -27.1998263 ,    2.03152121,  -74.29575855,   43.73414793])

<a id=1></a>
## <span style='color:#fcc36d'> Mean Absolute Error (MAE)</span>

MAE measures the average magnitude of the prediction errors, ignoring their direction. It is the mean, over the test sample, of the absolute differences between the predicted and actual values, with every difference weighted equally.

\begin{align}
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\end{align}

### Importance:
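- MAE is easy to understand and interpret because it is expressed in the same units as the target variable.
- It is a linear score: every error contributes in proportion to its magnitude, so large errors are not penalized disproportionately.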

In [24]:
from sklearn.metrics import mean_absolute_error

def Mean_Absolute_Error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

sklearn_mae = mean_absolute_error(y_test, y_pred)

print(f"Implemented MAE: {Mean_Absolute_Error(y_test, y_pred)}")
print(f"scikit-learn MAE: {sklearn_mae}")

Implemented MAE: 0.0966780107065582
scikit-learn MAE: 0.0966780107065582


<a id=2></a>
## <span style='color:#fcc36d'> Mean Squared Error (MSE)</span>

MSE is the average of the squared differences between the predicted and actual values. Because each error is squared before averaging, larger errors are penalized more heavily than they are by the Mean Absolute Error (MAE).

\begin{align}
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\end{align}

### Importance:
- MSE penalizes larger errors more than smaller errors, making it useful when large errors are particularly undesirable.
- Many regression models, including ordinary least squares linear regression, are trained by minimizing the MSE.

In [25]:
from sklearn.metrics import mean_squared_error

def Mean_Squared_Error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

sklearn_mse = mean_squared_error(y_test, y_pred)

print(f"Implemented MSE: {Mean_Squared_Error(y_test, y_pred)}")
print(f"scikit-learn MSE: {sklearn_mse}")

Implemented MSE: 0.015462772689491205
scikit-learn MSE: 0.015462772689491205
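
To make the heavier penalty on large errors concrete, here is a minimal sketch on hypothetical toy values (not the dataset above). Both prediction sets have the same MAE, but the one that concentrates its error in a single point has a much larger MSE. The helper names `mae` and `mse` are illustrative only.

```python
import numpy as np

# Hypothetical toy values, chosen only for illustration.
y_true = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_small = y_true + np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # every error has magnitude 1
y_outlier = y_true + np.array([0.0, 0.0, 0.0, 0.0, 5.0])  # a single error of magnitude 5

def mae(y_true, y_pred):
    # Mean of the absolute errors.
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    # Mean of the squared errors.
    return np.mean((y_true - y_pred) ** 2)

print(f"Evenly spread errors: MAE={mae(y_true, y_small):.2f}, MSE={mse(y_true, y_small):.2f}")
print(f"One large error:      MAE={mae(y_true, y_outlier):.2f}, MSE={mse(y_true, y_outlier):.2f}")
# Evenly spread errors: MAE=1.00, MSE=1.00
# One large error:      MAE=1.00, MSE=5.00
```

The total absolute error is 5 in both cases, so MAE cannot tell them apart; MSE is five times larger for the second set because the single error of 5 contributes 25 after squaring.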


<a id=3></a>
## <span style='color:#fcc36d'> Root Mean Squared Error (RMSE)</span>

RMSE is the square root of the Mean Squared Error. Taking the square root brings the error back to the same units as the target variable, so it can be read as a typical prediction error. Because the errors are squared before averaging, RMSE is more sensitive to outliers than the Mean Absolute Error (MAE); it ranks models the same way MSE does.

\begin{align}
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\end{align}

### Importance:
- RMSE is widely used because it is expressed in the same units as the target variable, which makes it easier to interpret than MSE.
- Like MSE, it is sensitive to outliers, making it useful in situations where large errors are particularly problematic.

In [26]:
def manual_rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(f"Manual RMSE: {manual_rmse(y_test, y_pred)}")

Manual RMSE: 0.12434939762415902
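
The unit argument can be checked with a small sketch. The prices below are hypothetical and only illustrate the scaling behavior: expressing the same data in dollars instead of thousands of dollars multiplies RMSE by 1,000 but multiplies MSE by 1,000,000.

```python
import numpy as np

# Hypothetical house prices in thousands of dollars, for illustration only.
y_true_k = np.array([200.0, 250.0, 300.0, 350.0])
y_pred_k = np.array([210.0, 240.0, 310.0, 340.0])

def mse(y_true, y_pred):
    # Mean of the squared errors (units: target units squared).
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    # Square root of MSE (units: same as the target).
    return np.sqrt(mse(y_true, y_pred))

# The same data expressed in dollars.
y_true_usd, y_pred_usd = y_true_k * 1000, y_pred_k * 1000

print(f"RMSE (thousands): {rmse(y_true_k, y_pred_k):.1f}")      # 10.0
print(f"RMSE (dollars):   {rmse(y_true_usd, y_pred_usd):.1f}")  # 10000.0
print(f"MSE  (thousands): {mse(y_true_k, y_pred_k):.1f}")       # 100.0
print(f"MSE  (dollars):   {mse(y_true_usd, y_pred_usd):.1f}")   # 100000000.0
```

An RMSE of 10 (thousand dollars) reads directly as "predictions are off by about $10,000", whereas the MSE of 100 is in squared units and has no such direct reading.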


<a id=4></a>
## <span style='color:#fcc36d'> R-squared (R²)</span>

R-squared is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variable(s) in a regression model. A value of 1 means the predictions match the actual values exactly, and a value of 0 means the model does no better than always predicting the mean of the dependent variable. For ordinary least squares with an intercept, R-squared on the training data lies between 0 and 1; on held-out data it can be negative, which means the model performs worse than that mean baseline.

\begin{align}
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{align}

### Importance:
- R-squared provides a measure of how well the model's predictions fit the actual data.
- It helps in comparing the goodness of fit of different models.

In [27]:
from sklearn.metrics import r2_score

def manual_r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - (ss_res / ss_tot)

sklearn_r2 = r2_score(y_test, y_pred)

print(f"Implemented R²: {manual_r2(y_test, y_pred)}")
print(f"scikit-learn R²: {sklearn_r2}")

Implemented R²: 0.9999983497435199
scikit-learn R²: 0.9999983497435199
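
The reference points of R² are easier to see on a toy example. The sketch below uses hypothetical values and an illustrative helper `r2` that follows the same formula as above: perfect predictions give 1, always predicting the mean gives 0, and predictions that are worse than the mean give a negative value.

```python
import numpy as np

def r2(y_true, y_pred):
    # 1 minus the ratio of residual sum of squares to total sum of squares.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical toy values, chosen only for illustration.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = y_true.copy()                               # exact predictions
mean_baseline = np.full_like(y_true, y_true.mean())   # always predict the mean (3.0)
reversed_pred = y_true[::-1]                          # predictions in the wrong order

print(r2(y_true, perfect))        # 1.0
print(r2(y_true, mean_baseline))  # 0.0
print(r2(y_true, reversed_pred))  # -3.0
```

A negative R² is therefore not a bug in the computation; it signals that the model is doing worse than a constant prediction on that data.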


<a id=5></a>
## <span style='color:#fcc36d'> Adjusted R-squared</span>

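Adjusted R-squared modifies R-squared to account for the number of independent variables in the model. Plain R-squared on the training data never decreases when a predictor is added, even an irrelevant one. The adjusted version penalizes each extra predictor, so it rises only when the new predictor improves the fit by more than would be expected by chance. It is calculated as follows: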


\begin{align}
\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}
\end{align}

Where:
- $R^2$ is the R-squared value
- $n$ is the number of data points
- $k$ is the number of independent variables (named `p` in the code below)
- $\text{Adjusted } R^2$ is the adjusted R-squared value

### Importance:
- Adjusted R-squared provides a more accurate measure of the goodness of fit, especially in models with multiple predictors.
- It helps in model selection by providing a balance between model complexity and goodness of fit.

In [28]:
def manual_adjusted_r2(y_true, y_pred, n, p):
    r2 = manual_r2(y_true, y_pred)
    return 1 - (1 - r2) * ((n - 1) / (n - p - 1))

n = len(y_test)
p = X_test.shape[1]

print(f"Implemented Adjusted R²: {manual_adjusted_r2(y_test, y_pred, n, p)}")

Implemented Adjusted R²: 0.9999981555956987
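
To show how the penalty grows with the number of predictors, the following sketch holds R² and the sample size fixed and varies `k`. The function name `adjusted_r2` and the input values are hypothetical and serve only as an illustration of the formula above.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n data points and k predictors (requires n > k + 1)."""
    if n <= k + 1:
        raise ValueError("n must be greater than k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical setting: R² fixed at 0.80 with n = 20 data points.
for k in (1, 5, 10):
    print(f"k={k:2d}: adjusted R² = {adjusted_r2(0.80, 20, k):.4f}")
# k= 1: adjusted R² = 0.7889
# k= 5: adjusted R² = 0.7286
# k=10: adjusted R² = 0.5778
```

With the same R², a model that needs ten predictors scores far lower than one that needs a single predictor. Plugging in the values from the cell above (R² of about 0.9999983, n = 20 test points, k = 2 features) reproduces the reported 0.9999982.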


<a id=6></a>
## <span style='color:#fcc36d'> Mean Absolute Percentage Error (MAPE)</span>

MAPE expresses the prediction error as a percentage of the actual values. It is the average of the absolute errors, each divided by the corresponding actual value.

\begin{align}
\text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
\end{align}

### Importance:

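- MAPE is easy to interpret and understand because it expresses the error as a percentage.
- It is scale-independent and useful for comparing forecast accuracy across different datasets.
- It is undefined when an actual value is zero and becomes very large when actual values are close to zero, so it suits targets that stay well away from zero.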

In [29]:
def manual_mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(f"Implemented MAPE: {manual_mape(y_test, y_pred)}")

Implemented MAPE: 1.299922552625877
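
Because the formula divides by each actual value, a single zero in `y_true` makes the plain implementation return `inf` or `nan`. The sketch below shows one possible guard, which is an assumption of this example rather than a standard definition: the hypothetical helper `safe_mape` skips samples whose actual value is exactly zero. Other conventions, such as adding a small constant to the denominator, are equally possible.

```python
import numpy as np

def safe_mape(y_true, y_pred):
    """MAPE in percent, skipping samples whose actual value is exactly zero.

    Skipping zeros is an illustrative convention, not a standard definition.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    nonzero = y_true != 0
    if not nonzero.any():
        raise ValueError("MAPE is undefined when every actual value is zero")
    ratios = np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])
    return np.mean(ratios) * 100

# Hypothetical toy values; the third actual value is zero and is skipped.
y_true = np.array([100.0, 50.0, 0.0, 200.0])
y_pred = np.array([110.0, 45.0, 5.0, 180.0])

print(f"MAPE over non-zero targets: {safe_mape(y_true, y_pred):.1f}%")
# MAPE over non-zero targets: 10.0%
```

Note that the guard silently ignores the error on the skipped sample, so the number of skipped points is worth reporting alongside the score.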


<a id=7></a>
## <span style='color:#fcc36d'> Median Absolute Error</span>

Median Absolute Error is the median of the absolute differences between the predicted and actual values. Because it uses the median rather than the mean, it is far less sensitive to outliers than MAE.

\begin{align}
\text{Median Absolute Error} = \text{median}(|y_i - \hat{y}_i|)
\end{align}

### Importance:

- Median Absolute Error is robust to outliers, so it gives a more representative picture of the typical error when a few predictions are far off.
- It is useful when the data contains anomalies or non-normal distributions.


In [30]:
from sklearn.metrics import median_absolute_error

def manual_median_absolute_error(y_true, y_pred):
    return np.median(np.abs(y_true - y_pred))

sklearn_median_absolute_error = median_absolute_error(y_test, y_pred)

print(f"Implemented Median Absolute Error: {manual_median_absolute_error(y_test, y_pred)}")
print(f"scikit-learn Median Absolute Error: {sklearn_median_absolute_error}")

Implemented Median Absolute Error: 0.06988051213575375
scikit-learn Median Absolute Error: 0.06988051213575375
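
The robustness claim can be illustrated with a small sketch on hypothetical values. Four predictions are off by 1 and one is off by 100; the mean-based MAE is pulled up by the single bad prediction, while the median-based metric is not.

```python
import numpy as np

# Hypothetical toy values; the last prediction is a gross outlier.
y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_pred = np.array([11.0, 19.0, 31.0, 39.0, 150.0])

abs_errors = np.abs(y_true - y_pred)  # [1, 1, 1, 1, 100]
mae = abs_errors.mean()               # mean of the absolute errors
medae = np.median(abs_errors)         # median of the absolute errors

print(f"MAE:                   {mae:.1f}")    # 20.8
print(f"Median Absolute Error: {medae:.1f}")  # 1.0
```

Reporting both values together is informative: a large gap between them, as here, suggests that a few predictions account for most of the total error.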
