# Evaluation Techniques for Regression Models

### Importance of Evaluation in Regression

Evaluation is a crucial step in the regression analysis process. It provides insight into how well a regression model performs and helps to understand its effectiveness in making predictions. Here are several key reasons why evaluation is important in regression:

#### 1. **Assess Model Performance**
   - **Accuracy**: Evaluation metrics help quantify how accurately the model predicts the dependent variable.
   - **Quality Check**: By evaluating the model, you can check if it is performing well or if there are significant errors that need addressing.

#### 2. **Understand Model Fit**
   - **Goodness-of-Fit**: Metrics such as R² (Coefficient of Determination) and Explained Variance Score help understand how well the model explains the variability of the target variable.
   - **Residual Analysis**: Evaluation helps analyze residuals (differences between observed and predicted values) to assess if the model assumptions are met.

#### 3. **Compare Models**
   - **Selection of Best Model**: Evaluation metrics allow for the comparison of different regression models to choose the best one based on performance criteria.
   - **Benchmarking**: Helps in benchmarking the model's performance against baseline or existing models.

#### 4. **Identify Overfitting or Underfitting**
   - **Overfitting**: Evaluation metrics can indicate if a model is too complex and fits the training data too closely, potentially harming generalization to new data.
   - **Underfitting**: Metrics help in identifying if a model is too simple and fails to capture the underlying patterns in the data.

#### 5. **Guide Model Improvement**
   - **Error Analysis**: By evaluating the model, you can identify areas where predictions are inaccurate and use this information to refine or improve the model.
   - **Feature Engineering**: Evaluation can provide insights into which features are important and which may need to be adjusted or added.

#### 6. **Communicate Results**
   - **Stakeholder Understanding**: Evaluation metrics provide a clear, quantitative way to communicate the performance of the model to stakeholders, decision-makers, or clients.
   - **Documentation**: Evaluation results are essential for documenting the model development process and ensuring transparency in model performance.

#### 7. **Ensure Robustness**
   - **Cross-Validation**: Evaluation often involves techniques like cross-validation to ensure that the model generalizes well across different subsets of the data.
   - **Performance Stability**: Helps in checking if the model's performance is stable across various data distributions or conditions.

#### 8. **Optimize Model Parameters**
   - **Hyperparameter Tuning**: Evaluation metrics are used in conjunction with techniques like grid search or random search to tune model parameters for optimal performance.
   - **Model Selection**: Helps in choosing the right model complexity and configuration.

#### Common Evaluation Metrics in Regression:

1. **Mean Absolute Error (MAE)**: Measures the average magnitude of errors in predictions, without considering their direction.

2. **Mean Squared Error (MSE)**: Measures the average of the squares of the errors, giving more weight to larger errors.

3. **Root Mean Squared Error (RMSE)**: Provides the square root of MSE, giving errors in the same unit as the target variable.

4. **R² (Coefficient of Determination)**: Indicates the proportion of variance in the dependent variable that is predictable from the independent variables.

5. **Mean Absolute Percentage Error (MAPE)**: Measures the average absolute error as a percentage of the actual values, providing a relative measure of performance.

6. **Explained Variance Score**: Indicates the proportion of variance in the target variable that is explained by the model.

7. **Max Error**: Measures the maximum absolute error between actual and predicted values.

### Importing the dependencies

In [17]:
import numpy as np
import pandas as pd 
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, root_mean_squared_error,
    root_mean_squared_log_error, r2_score, max_error, mean_pinball_loss,
    mean_poisson_deviance, mean_gamma_deviance, mean_tweedie_deviance,
    mean_absolute_percentage_error, median_absolute_error, d2_tweedie_score,
    d2_absolute_error_score, d2_pinball_score, explained_variance_score,
)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

### Reading the Dataset File
- `This dataset was created by me while integrating the Flappy Bird game with a DNN model, to automate the bird so that it moves by itself.`

In [20]:
df = pd.read_csv("Flappy_bird_dataset.csv")
df.head()

Unnamed: 0,bird_y,pipe_x,BIRD_VELOCITY,pipe_gap_y,reward
0,300,597,0.0,145,0.1
1,300,594,0.0,145,0.1
2,300,591,0.0,145,0.1
3,300,588,0.0,145,0.1
4,300,585,0.0,145,0.1


### Linear Regression
- Define a function that trains the model and makes predictions on the test split.

In [23]:
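def linear_regression(df):
    """Split the data, fit a linear model, and return the test data with its predictions."""
    x = df.drop(['bird_y'], axis=1)   # features: every column except the target
    y = df['bird_y']                  # target: vertical position of the bird
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
    model = LinearRegression()
    model.fit(x_train, y_train)
    print(model)
    y_pred = model.predict(x_test)
    print(y_pred[:10])                # first ten predictions
    return x_test, y_test, y_pred

# Keep the test data and predictions for the metric cells below.
x_test, y_test, y_pred = linear_regression(df)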

LinearRegression()
[363.30171831 278.2740386  545.23516131 233.55703457 206.81331711
 229.74424839 333.48990286 106.07548021 254.77658926 222.39681198]


### Mean Absolute Error (MAE)
- MAE is a very simple metric that calculates the average absolute difference between actual and predicted values.

- To better understand it, take an example: you have input data and output data, and you use Linear Regression, which draws a best-fit line.

- Each prediction misses the actual value by some amount, known as the error. The absolute value of the difference between the actual value and the predicted value is the absolute error of one observation, but we need the mean absolute error over the complete dataset.

- So, sum all the absolute errors and divide them by the total number of observations; this is the MAE. We aim for a minimum MAE because it is a loss.
![online](https://editor.analyticsvidhya.com/uploads/71890MAE%20Formula.png)
- **Advantages of MAE**
  - The MAE you get is in the same unit as the output variable.
  - It is more robust to outliers than metrics based on squared errors.
- **Disadvantages of MAE**
  - The graph of MAE is not differentiable where the error is zero, so gradient-based optimizers such as gradient descent cannot use it directly and need a workaround such as subgradients.

In [26]:
eva = mean_absolute_error(y_test, y_pred)
eva  # an MAE of about 118 is large for this target, which suggests that the model does not fit well

118.08896195883062
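
The steps described above (take the absolute error of each observation, then average over all observations) can be sketched with NumPy. The arrays are made-up toy values, not rows from the Flappy Bird dataset, and the names `y_true` and `y_hat` are only illustrative.

```python
import numpy as np

# Toy values chosen for illustration (assumption: not taken from the dataset).
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

# Absolute error of each observation.
abs_errors = np.abs(y_true - y_hat)

# MAE: sum of the absolute errors divided by the number of observations.
mae = abs_errors.sum() / len(abs_errors)

print(abs_errors)
print(mae)
```

On these four values the absolute errors are 1, 0, 2 and 1, so the MAE is 1.0, expressed in the same unit as the target.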

**To overcome this disadvantage of MAE, the next metric is MSE.**

### Mean Squared Error (MSE)
- MSE is one of the most widely used metrics and is only a small change from mean absolute error: it takes the squared difference between the actual and predicted values.

- So, above we took the absolute difference, and here we take the squared difference.

- **What does the MSE actually represent?** It represents the average squared distance between actual and predicted values. Squaring keeps positive and negative errors from cancelling each other out, which is the benefit of MSE.

![online](https://lh3.googleusercontent.com/-JBio3Q_1FiI/YB2oQKEmRBI/AAAAAAAAAkM/c8KJ3wPwtMEd3Ik0nYMMdmr_pRqMF6MlQCLcBGAsYHQ/w550-h177/image.png)

**Advantages of MSE**

The graph of MSE is differentiable, so you can easily use it as a loss function.

**Disadvantages of MSE**

- The value you get after calculating MSE is in the squared unit of the output. For example, if the output variable is in meters (m), the MSE is in square meters.
- If the dataset has outliers, MSE penalizes them the most and the calculated MSE is larger. In short, it is not robust to outliers, which was an advantage of MAE.

In [29]:
eva = mean_squared_error(y_test, y_pred)
eva  # the MSE is very large because it is in the squared unit of the target (bird_y) and squaring gives large errors extra weight

46604.29846025482
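
To make the outlier sensitivity concrete, here is a small sketch that compares MAE and MSE on two sets of predictions that differ in a single value. The helper functions `mae` and `mse` and all numbers are hypothetical and exist only for this illustration.

```python
import numpy as np

def mae(y_true, y_hat):
    """Mean of the absolute errors."""
    return np.mean(np.abs(y_true - y_hat))

def mse(y_true, y_hat):
    """Mean of the squared errors."""
    return np.mean((y_true - y_hat) ** 2)

# Toy values for illustration only.
y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_close = np.array([11.0, 11.0, 12.0, 12.0])    # every prediction is off by 1
y_outlier = np.array([11.0, 11.0, 12.0, 23.0])  # the last prediction is off by 10

print(mae(y_true, y_close), mse(y_true, y_close))      # 1.0 1.0
print(mae(y_true, y_outlier), mse(y_true, y_outlier))  # 3.25 25.75
```

A single bad prediction raises the MAE from 1.0 to 3.25, while the MSE jumps from 1.0 to 25.75, because the error of 10 contributes 100 to the sum of squares.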

### Root Mean Squared Error (RMSE)
- As the name makes clear, RMSE is simply the square root of the mean squared error.

![online](https://editor.analyticsvidhya.com/uploads/34962RMSLE%20Formula.png)

**Advantages of RMSE**
- The output value is in the same unit as the target variable, which makes the loss easy to interpret.

**Disadvantages of RMSE**
- It is not as robust to outliers as MAE.

- To compute RMSE, apply the NumPy square root function to the MSE, or call scikit-learn's `root_mean_squared_error` directly.

- RMSE is a very common evaluation metric, and it is often the preferred one when working with deep learning techniques.

In [32]:
eva = root_mean_squared_error(y_test, y_pred)
eva  # this is in the same unit as the target (bird_y), but the value is still large, which shows that the model has a large error

215.88028733595576
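
As a minimal sketch of the NumPy route mentioned above, RMSE can be obtained by taking the square root of scikit-learn's `mean_squared_error`. The arrays are toy values, not the notebook's data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy values for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_hat)  # squared errors 1, 0, 4, 1 give a mean of 1.5
rmse = np.sqrt(mse)                      # back in the unit of the target

print(mse)
print(rmse)
```

This route also works on scikit-learn versions that do not yet ship `root_mean_squared_error`.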

### Root Mean Squared Log Error (RMSLE)
- RMSLE applies a logarithm to the actual and predicted values before computing RMSE, which compresses the scale of the errors. The metric is very helpful when the target has not been scaled and its values vary over a large range.

- In that situation a few large values dominate RMSE, so we compare log(1 + actual) with log(1 + predicted) instead, and the result is RMSLE.

- To compute RMSLE, apply the NumPy `log1p` function to the actual and predicted values and then take the RMSE of the results. Because of the logarithm, neither the actual nor the predicted values may be negative.

- It is a simple metric that is used by many datasets hosted for machine learning competitions.

In [35]:
eva = root_mean_squared_log_error(y_test, y_pred)
eva  # the error measured on the log scale, which is why it is so much smaller in magnitude than the RMSE

0.5406595542262108
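
The following sketch computes RMSLE by hand, assuming the usual definition with `log1p` (the logarithm of one plus the value), and compares it with the square root of scikit-learn's `mean_squared_log_error`. The target values are invented to span several orders of magnitude.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Toy targets spanning several orders of magnitude (illustration only).
y_true = np.array([100.0, 1000.0, 10000.0])
y_hat = np.array([110.0, 900.0, 12000.0])

# RMSE is dominated by the largest target value.
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))

# RMSLE: take log(1 + value) of both arrays first, then compute RMSE.
log_diff = np.log1p(y_true) - np.log1p(y_hat)
rmsle = np.sqrt(np.mean(log_diff ** 2))

# Cross-check against scikit-learn's mean squared log error.
rmsle_sklearn = np.sqrt(mean_squared_log_error(y_true, y_hat))

print(rmse, rmsle, rmsle_sklearn)
```

Here RMSE is above 1000 because of the error of 2000 on the largest target, while RMSLE stays well below 1 because it reflects relative rather than absolute differences.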

### R Squared (R2)
- The R² score is a metric that tells how well your model performed, rather than a loss in an absolute sense.

- MAE and MSE depend on the scale of the target, as we have seen, whereas the R² score does not.

- So, R squared gives us a baseline model to compare against, which none of the other metrics above provides. Classification problems have something similar in the decision threshold, which is commonly fixed at 0.5. Basically, R² measures how much better the regression line is than a mean line.

- Hence, R² is also known as the *Coefficient of Determination* or sometimes as *Goodness of fit*.

![online](https://editor.analyticsvidhya.com/uploads/22091R2%20Squared%20Formula.png)

**R Squared**
- Now, how do you interpret the R² score? Suppose the R² score is zero. Then the squared error of the regression line divided by the squared error of the mean line equals 1, and 1 - 1 is zero. In this case the two lines perform identically: the model is no better than always predicting the mean and gains nothing from the input features. The score can even be negative when the model is worse than the mean line.

- The second case is an R² score of 1. This happens when the division term is zero, which requires the regression line to make no mistakes at all. In the real world, this is rarely possible.

- So we can conclude that as the regression line moves towards perfection, the R² score moves towards one and the model performance improves.

- The normal case is an R² score between zero and one, such as 0.8, which means your model is able to explain 80 percent of the variance of the data.

In [38]:
eva = r2_score(y_test, y_pred)
eva  # the R² score is not a loss; it is a score that tells us how the model is performing: as the score moves towards 1,
# the model is performing better, and vice versa

0.5151614981513495
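
The comparison against a mean line can be sketched directly from the definition, R² = 1 - SS_res / SS_tot. The function name `r2_from_definition` and the toy arrays are hypothetical and serve only to show the three cases discussed above.

```python
import numpy as np

def r2_from_definition(y_true, y_hat):
    """R² as 1 minus (squared error of the model / squared error of the mean line)."""
    ss_res = np.sum((y_true - y_hat) ** 2)          # error of the regression line
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of the mean line
    return 1 - ss_res / ss_tot

# Toy values for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

y_good = np.array([1.1, 1.9, 3.2, 3.8, 5.0])   # close to the actual values
y_mean = np.full_like(y_true, y_true.mean())   # always predict the mean
y_bad = y_true[::-1]                           # predictions in reversed order

r2_good = r2_from_definition(y_true, y_good)
r2_mean = r2_from_definition(y_true, y_mean)
r2_bad = r2_from_definition(y_true, y_bad)

print(r2_good, r2_mean, r2_bad)
```

The good predictions score about 0.99, the mean line scores 0, and the reversed predictions score -3, which shows that R² is not bounded below by zero.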

### Adjusted R Squared
- The disadvantage of the R² score is that when new features are added to the data, the score on the data the model was fitted to increases or remains constant but never decreases, because every extra feature gives the model more freedom to fit that data.

- The problem is that adding an irrelevant feature to the dataset can still increase R², which gives a misleading impression of improvement.

- Hence, to control this situation, Adjusted R Squared came into existence.


![online](https://lh3.googleusercontent.com/-6T1LxrK1by8/YB6D5hjSCjI/AAAAAAAAAlk/gCmLpEJMJ3MpwO6r-sI7GQzuOQP2I1B3QCLcBGAsYHQ/w332-h179/image.png)


**r2a**
- As k increases when features are added, the denominator n - k - 1 decreases while n - 1 remains constant. If the added feature is irrelevant, the R² score remains constant or increases only slightly, so the whole fraction increases, and when we subtract it from one the resulting score decreases.

- If we add a relevant feature, the R² score increases and 1 - R² decreases heavily. The denominator also decreases, but the whole term still decreases, so on subtracting it from one the score increases.


Hence, this metric becomes one of the most important metrics to use during the evaluation of the model.

In [41]:
# n and k are hard-coded here; for this model they would normally be len(y_test) and x_test.shape[1]
n = 40  # number of observations
k = 2   # number of independent features
adj_r2_score = 1 - ((1 - eva) * (n - 1) / (n - k - 1))  # eva holds the R² score from the previous cell
print(adj_r2_score)

# The problem with R² is that adding new features never lowers it, so the model appears to perform better than before
# even when an added feature is irrelevant and the real performance stays constant or drops.
# Adjusted R² overcomes this by adding a penalty for every additional feature.

0.4889540115649359
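
The same formula can be wrapped in a small helper. This is a sketch: the function name `adjusted_r2` and its argument names are hypothetical, and the guard against a non-positive denominator is an added assumption rather than part of the original cell.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    denominator = n_samples - n_features - 1
    if denominator <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / denominator

# Same inputs as the cell above: R² of about 0.515, n = 40, k = 2.
print(adjusted_r2(0.5151614981513495, 40, 2))

# With R² held fixed, more features give a lower adjusted score.
few_features = adjusted_r2(0.8, 40, 2)
many_features = adjusted_r2(0.8, 40, 10)
print(few_features, many_features)
```

Holding R² at 0.8, going from 2 to 10 features lowers the adjusted score, which is the penalty for features that do not improve the fit.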


### median_absolute_error
- **Description**: The median of the absolute differences between predicted and actual values. It is a robust measure of error.
- **Advantages**: Robust to outliers, providing a measure of central tendency for error.
- **Disadvantages**: Less commonly used compared to mean absolute error

![local](images/mar.png)

In [44]:
eva = median_absolute_error(y_test, y_pred)
eva 
#The Median Absolute Error (MedAE) is a robust measure of central tendency for errors in a regression model.
#It calculates the median of the absolute differences between predicted and actual values. Unlike the mean absolute error (MAE),
#which averages the absolute errors, MedAE takes the median of these errors. This makes MedAE less sensitive to outliers in the data.

71.42096104604173
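
A short sketch with invented numbers shows why the median is less sensitive to outliers than the mean: one large miss dominates the mean of the absolute errors but barely moves their median.

```python
import numpy as np

# Toy values for illustration only; the last prediction misses by 100.
y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_hat = np.array([11.0, 19.0, 32.0, 38.0, 150.0])

abs_errors = np.abs(y_true - y_hat)  # 1, 1, 2, 2, 100

mean_abs_error = np.mean(abs_errors)      # pulled up by the single outlier
median_abs_error = np.median(abs_errors)  # middle value of the sorted errors

print(mean_abs_error, median_abs_error)   # 21.2 2.0
```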

### mean_absolute_percentage_error
- **Description**: The mean absolute percentage error (MAPE) measures the accuracy of a forecast as a percentage. It is the average of the absolute percentage errors.
- **Advantages**: Easy to interpret, expressed as a percentage.
- **Disadvantages**: Can be undefined if actual values are zero and can be skewed if actual values are very small.


In [47]:
eva = mean_absolute_percentage_error(y_test, y_pred)
eva  # scikit-learn returns a fraction, so 0.4466 means an average error of about 44.7%

0.4466306299353287
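
As a sketch of the definition, MAPE divides each absolute error by the absolute actual value and averages the result. The numbers are invented, and the second example illustrates how a very small actual value inflates the metric.

```python
import numpy as np

def mape(y_true, y_hat):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    return np.mean(np.abs(y_true - y_hat) / np.abs(y_true))

# Toy values for illustration only.
y_true = np.array([100.0, 200.0, 400.0])
y_hat = np.array([110.0, 180.0, 400.0])
mape_regular = mape(y_true, y_hat)  # relative errors 0.1, 0.1 and 0.0

# An absolute error of 1.0 on an actual value of 0.1 is a relative error of 1000%.
mape_small_target = mape(np.array([0.1]), np.array([1.1]))

print(mape_regular, mape_small_target)
```

Multiply the result by 100 to report it as a percentage.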

### mean_tweedie_deviance
- **Description**: Measures the goodness of fit for a model predicting data that follows a Tweedie distribution, which includes Poisson, Gamma, and compound Poisson-Gamma distributions.
- **Advantages**: Flexible, applicable to a variety of distributions.
- **Disadvantages**: Requires specifying the power parameter, which might not be straightforward.

![local](images/mtd.png)

In [50]:
eva = mean_tweedie_deviance(y_test, y_pred)
eva  # with the default power=0 the Tweedie deviance is the same as the MSE computed earlier

46604.29846025482
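
The role of the `power` parameter can be sketched on a few positive toy values: `power=0` corresponds to the normal distribution and reproduces MSE, `power=1` gives the Poisson deviance, and `power=2` gives the Gamma deviance. The arrays are made up for this illustration.

```python
import numpy as np
from sklearn.metrics import (
    mean_gamma_deviance,
    mean_poisson_deviance,
    mean_squared_error,
    mean_tweedie_deviance,
)

# Strictly positive toy values, as the Poisson and Gamma deviances require.
y_true = np.array([2.0, 1.0, 4.0, 3.0])
y_hat = np.array([1.5, 1.2, 3.0, 3.5])

dev_normal = mean_tweedie_deviance(y_true, y_hat, power=0)   # same as MSE
dev_poisson = mean_tweedie_deviance(y_true, y_hat, power=1)  # same as Poisson deviance
dev_gamma = mean_tweedie_deviance(y_true, y_hat, power=2)    # same as Gamma deviance

print(dev_normal, mean_squared_error(y_true, y_hat))
print(dev_poisson, mean_poisson_deviance(y_true, y_hat))
print(dev_gamma, mean_gamma_deviance(y_true, y_hat))
```

Each pair of printed values should agree, which is why the default call in the cell above returns the same number as the MSE.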

### mean_gamma_deviance
- **Description**: Measures the goodness of fit for a model predicting positive continuous data based on the Gamma distribution.
- **Advantages**: Suitable for data with positive continuous outcomes.
- **Disadvantages**: Assumes the data follows a Gamma distribution, which might not always be the case.

![local](images/mgd.png)

In [53]:
eva = mean_gamma_deviance(y_test, y_pred)
eva

0.35950065355252037

### mean_poisson_deviance
- **Description**: Measures the goodness of fit for a model predicting count data based on the Poisson distribution.
- **Advantages**: Suitable for count data and models predicting event rates.
- **Disadvantages**: Assumes the data follows a Poisson distribution, which might not always be the case.

![local](images/mpd.png)

In [56]:
eva = mean_poisson_deviance(y_test, y_pred)
eva

94.16875722900177

### mean_pinball_loss
- **Description**: The average pinball loss for a chosen quantile `alpha`, used with quantile regression. It measures the accuracy of quantile predictions.
- **Advantages**: Useful for evaluating quantile regression models.
- **Disadvantages**: Requires understanding of pinball loss and quantile regression.

![local](images/mpl.png)

In [59]:
eva = mean_pinball_loss(y_test, y_pred)
eva  # with the default alpha=0.5 the pinball loss is half the MAE, so about 59.04 for the MAE of 118.09 reported earlier
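
Here is a minimal sketch of the pinball loss, written from its usual definition: under-predictions are weighted by `alpha` and over-predictions by `1 - alpha`. The function `pinball_loss` and the toy arrays are hypothetical and only illustrate the idea.

```python
import numpy as np

def pinball_loss(y_true, y_hat, alpha=0.5):
    """Average pinball (quantile) loss for the quantile level alpha."""
    diff = y_true - y_hat
    # diff >= 0: prediction too low, weight alpha; diff < 0: too high, weight (1 - alpha).
    return np.mean(np.where(diff >= 0, alpha * diff, (alpha - 1) * diff))

# Toy values for illustration only: one over-prediction and two under-predictions.
y_true = np.array([10.0, 20.0, 30.0])
y_hat = np.array([12.0, 18.0, 27.0])

mae = np.mean(np.abs(y_true - y_hat))
loss_median = pinball_loss(y_true, y_hat, alpha=0.5)  # half of the MAE
loss_q90 = pinball_loss(y_true, y_hat, alpha=0.9)     # under-predictions cost more

print(mae, loss_median, loss_q90)
```

With `alpha=0.5` both directions are weighted equally, so the loss is half the MAE. With `alpha=0.9` the two under-predictions are penalized more heavily and the loss rises.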

### max_error
- **Description**: The maximum absolute difference between predicted and actual values. It represents the worst-case error of the model.
- **Advantages**: Provides insight into the worst prediction error.
- **Disadvantages**: Sensitive to outliers and does not provide information about overall model performance.

![local](images/me.png)

In [62]:
eva = max_error(y_test, y_pred)
eva

1830.7105898712578

### explained_variance_score
- **Description**: Measures the proportion of the variance in the dependent variable that is explained by the model's predictions. The best possible score is 1, which indicates perfect prediction; lower values are worse, and the score can be negative for a very poor model.
- **Advantages**: Directly interpretable as the proportion of explained variance.
- **Disadvantages**: Can be misleading when used with data that has non-linear relationships.

![local](images/evs.png)

In [65]:
eva = explained_variance_score(y_test, y_pred)
eva

0.5198317077117782
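
One way to see how this score differs from R² is a sketch with predictions that are shifted by a constant. Explained variance looks only at the variance of the residuals, so it ignores a constant bias, while R² penalizes it. The arrays are toy values chosen for this illustration.

```python
import numpy as np

# Toy values for illustration only: every prediction is too high by exactly 2.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = y_true + 2.0

residuals = y_true - y_hat

# Explained variance: 1 - Var(residuals) / Var(y_true).
explained_variance = 1 - np.var(residuals) / np.var(y_true)

# R²: 1 - sum of squared residuals / sum of squared deviations from the mean.
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(explained_variance, r2)  # 1.0 -1.0
```

The residuals are constant, so their variance is zero and the explained variance is a perfect 1.0, while R² is -1.0. A gap between the two scores on real data hints at biased predictions.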

### d2_pinball_score
- **Description**: The D² based on pinball loss is a metric for quantile regression models, measuring the goodness of fit based on quantile loss.
- **Advantages**: Suitable for models predicting quantiles instead of means.
- **Disadvantages**: Less commonly used and requires understanding of quantile loss.

![local](images/d2p.png)

In [68]:
eva = d2_pinball_score(y_test, y_pred)
eva  # with the default alpha=0.5 this equals d2_absolute_error_score

9.835748516029419e-05

### d2_absolute_error_score
- **Description**: The D² based on absolute error is similar to R² but uses absolute errors rather than squared errors. It measures the proportion of absolute error explained by the model.
- **Advantages**: More robust to outliers than metrics based on squared errors.
- **Disadvantages**: Less commonly used and may be less intuitive for interpretation.

In [71]:
eva = d2_absolute_error_score(y_test, y_pred)
eva

9.835748516029419e-05
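
The idea behind this score can be sketched as one minus the ratio between the model's MAE and the MAE of a baseline that always predicts the median of the actual values. The function name `d2_absolute_error` and the numbers are hypothetical, and the sketch follows the definition rather than reproducing scikit-learn's implementation.

```python
import numpy as np

def d2_absolute_error(y_true, y_hat):
    """1 - MAE(model) / MAE(baseline that always predicts the median)."""
    mae_model = np.mean(np.abs(y_true - y_hat))
    mae_baseline = np.mean(np.abs(y_true - np.median(y_true)))
    return 1 - mae_model / mae_baseline

# Toy values for illustration only, including one extreme target.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
y_hat = np.array([1.5, 2.5, 2.5, 4.5, 90.0])

d2 = d2_absolute_error(y_true, y_hat)
print(d2)
```

The model's MAE is 2.4 and the median baseline's MAE is 20.2, so the score is about 0.88. A score near zero, as in the cell above, means the model's MAE is about the same as that of always predicting the median.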

### d2_tweedie_score
- **Description**: The D² (also known as pseudo-R²) of a Tweedie deviance regression model is a measure of how well the model explains the variability of the response variable. It ranges from -∞ to 1, with 1 indicating a perfect fit.
- **Advantages**: Takes into account both the mean and variance structure of the data.
- **Disadvantages**: Requires specifying the power parameter of the Tweedie distribution, which might not be straightforward.

In [74]:
eva = d2_tweedie_score(y_test, y_pred)
eva  # with the default power=0 this equals the R² score computed earlier

0.5151614981513495