The quality of a regression model is how well its predictions match the actual values, but how do we actually measure that quality? Statisticians have developed error metrics that quantify the quality of a model and let us compare regression models that use different parameters.


# Some common types of error:
* Sum of Squared Errors
* Mean Absolute Error
* Mean Squared Error
* Root Mean Squared Error
* Mean Absolute Percentage Error
* Mean Percentage Error


## 1. Sum of Squared Errors
* In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared estimate of errors (SSE), is the sum of the squares of the residuals (the deviations of the predicted values from the actual empirical values of the data). It measures the discrepancy between the data and an estimation model, such as a linear regression. A small RSS indicates a tight fit of the model to the data. It is used as an optimality criterion in parameter selection and model selection.


* In simple terms, an error (residual) is the difference between an observed value and the corresponding predicted value; the SSE squares each of these errors and adds them all up.
![SSE.png](attachment:SSE.png)
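As a minimal sketch of this definition, assuming NumPy and a small hypothetical dataset, the snippet below computes the SSE of the least-squares line (found here with `np.polyfit`) and of a line with a slightly different slope. The helper name `sse` is illustrative. Because the least-squares line minimizes the SSE by construction, it should give the smaller value.

```python
import numpy as np


def sse(actual, predicted):
    """Sum of squared errors: the sum over all points of (actual - predicted)^2."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sum(residuals ** 2))


# Hypothetical data, used only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Least-squares line; np.polyfit with deg=1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
sse_best = sse(y, slope * x + intercept)

# Any other line, e.g. one with a slightly larger slope, gives a larger SSE
sse_other = sse(y, (slope + 0.2) * x + intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=0.60, intercept=2.20
print(f"SSE of least-squares line: {sse_best:.2f}")     # ... 2.40
print(f"SSE of perturbed line: {sse_other:.2f}")        # ... 4.60
```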


## 2. Mean Absolute Error

- It is one of the simplest regression error metrics.

**Absolute error**: It is the amount of error in your measurements: the difference between the measured value and the `true` value. For example, if a scale reads 80 kg but you know your true weight is 79 kg, then the scale has an absolute error of 80 kg – 79 kg = 1 kg.
This can happen because your scale does not measure the exact amount you are trying to measure. For example, your scale may only be accurate to the nearest kg. If you weigh 79.6 kg, the scale may “round up” and show 80 kg. In this case the absolute error is 80 kg – 79.6 kg = 0.4 kg.

Sometimes the formula is written with the absolute value symbol (bars: | |). This is common when we are dealing with multiple measurements:

$$\Delta x = |x_i - x|$$

**Absolute Accuracy Error**: Absolute error is also called absolute accuracy error. You might see the formula written this way:

$$E = x_{\text{experimental}} - x_{\text{true}}$$

This is the same quantity under different names: $x_{\text{experimental}}$ is the measurement we take and $x_{\text{true}}$ is the true value. Written without the bars, $E$ keeps its sign (it is negative when the measurement is too low); taking $|E|$ gives the absolute error.
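A minimal Python sketch of the signed and absolute versions, using the scale readings from the example above plus one hypothetical reading that is too low (78 kg). The function names `signed_error` and `absolute_error` are illustrative, not from any library.

```python
def signed_error(measured, true):
    """E = x(experimental) - x(true): negative when the measurement is too low."""
    return measured - true


def absolute_error(measured, true):
    """|x_i - x|: the size of the error, ignoring its direction."""
    return abs(measured - true)


# Scale examples from the text (values in kg)
print(absolute_error(80, 79))              # 1
print(round(absolute_error(80, 79.6), 2))  # 0.4 (rounded to hide floating-point noise)

# Hypothetical reading that is too low: the signed error is negative, the absolute error is not
print(signed_error(78, 79), absolute_error(78, 79))  # -1 1
```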

The **Mean Absolute Error (MAE)** is the average of all the absolute errors. The formula is:
![MAE.png](attachment:MAE.png)

* n = total number of data points,
* |xi – x| = the absolute errors,
* x = true value,
* xi = predicted value.

In [15]:
# Python code to calculate MAE

# The array of actual values and predicted values should both be of equal length in order for this sklearn function to work correctly.
actual = [12, 13, 14, 15, 15, 22, 27]
pred = [11, 13, 14, 14, 15, 16, 18]

# Using sklearn
from sklearn.metrics import mean_absolute_error as mae

# Calculate
mae(actual, pred)
# This tells us that the average absolute difference between the actual values and the values predicted by the model is about 2.43.
# The lower the MAE for a given model, the more closely the model is able to predict the actual values.

2.4285714285714284

In [16]:
# Using the manual method: sum the absolute errors, then divide by n
abs_error_sum = 0
for a, p in zip(actual, pred):
    abs_error_sum += abs(p - a)
mae_manual = abs_error_sum / len(actual)
mae_manual

2.4285714285714284
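The same MAE can be computed without an explicit loop. This is a sketch assuming NumPy is available; it reuses the example values from the cells above, takes the element-wise absolute errors, and then averages them.

```python
import numpy as np

# Same example values as in the cells above
actual = np.array([12, 13, 14, 15, 15, 22, 27])
pred = np.array([11, 13, 14, 14, 15, 16, 18])

abs_errors = np.abs(actual - pred)  # element-wise |x_i - x|
mae_np = abs_errors.mean()          # average of the absolute errors

print(abs_errors)  # [1 0 0 1 0 6 9]
print(mae_np)      # same value as the sklearn and manual results above
```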

## 3. Mean Squared Error
* The mean squared error (MSE) is like the MAE, but it squares each difference before averaging instead of taking its absolute value.



* The mean squared error (MSE) tells you how close a regression line is to a set of points. 


* It does this by taking the distances from the points to the regression line (these distances are the “errors”) and squaring them. ***Squaring removes any negative signs***, so errors above and below the line do not cancel out. It also gives more weight to larger differences. It is called the mean squared error because we take the average of these squared errors.

* ***The lower the MSE, the better the prediction***: a smaller mean squared error means the line lies closer to the data points, i.e., closer to the line of best fit.


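* Because the differences are squared, the MSE is expressed in the squared units of the target. It is usually larger than the MAE when the errors are larger than 1, but smaller when all errors are below 1. For this reason, we cannot directly compare the MAE to the MSE.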
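To see why MSE and MAE are not directly comparable, here is a small sketch with two hypothetical datasets: one where every error is below 1 and one where every error is above 1. It assumes NumPy; `compute_mae` and `compute_mse` are illustrative helper names, not library functions.

```python
import numpy as np


def compute_mae(actual, predicted):
    """Mean of the absolute errors."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))


def compute_mse(actual, predicted):
    """Mean of the squared errors."""
    return float(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))


# Hypothetical data where every error is 0.5 (below 1): squaring shrinks the errors
small_actual, small_pred = [1.0, 2.0, 3.0], [1.5, 2.5, 3.5]
# Hypothetical data where every error is 3 (above 1): squaring inflates the errors
large_actual, large_pred = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]

print(compute_mae(small_actual, small_pred), compute_mse(small_actual, small_pred))  # 0.5 0.25
print(compute_mae(large_actual, large_pred), compute_mse(large_actual, large_pred))  # 3.0 9.0
```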

* Formula for this:
![MSE.png](attachment:MSE.png)

* n = total number of data points,
* y = actual value,
* ŷ = predicted value

**General steps to calculate the MSE from a set of X and Y values**:

- Find the regression line.
- Insert your X values into the linear regression equation to find the new Y values (Y’).
- Subtract the new Y value from the original to get the error.
- Square the errors.
- Add up the squared errors (the Σ in the formula is summation notation).
- Divide the sum by n, the number of data points, to get the mean.
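These steps can be sketched in NumPy as follows. The X and Y values are hypothetical, and `np.polyfit` with `deg=1` is used here as one way to find the least-squares regression line.

```python
import numpy as np

# Hypothetical X and Y values, used only to illustrate the steps
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Step 1: find the regression line (least squares, degree 1)
slope, intercept = np.polyfit(X, Y, deg=1)

# Step 2: insert the X values into the line to get the new Y values (Y')
Y_prime = slope * X + intercept

# Step 3: subtract the new Y values from the original ones to get the errors
errors = Y - Y_prime

# Step 4: square the errors
squared_errors = errors ** 2

# Step 5: add up the squared errors
total = squared_errors.sum()

# Step 6: divide by the number of points to get the mean
mse_value = total / len(X)
print(round(mse_value, 2))  # 0.48
```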


In [17]:
y_actual = [2, -0.5, 2, 9]
y_pred = [4.5, 0.0, 2, 6]

# Using sklearn
from sklearn.metrics import mean_squared_error as mse
mse(y_actual, y_pred)

3.875

In [18]:
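# Using the manual method: sum the squared errors, then divide by n
sq_error_sum = 0
for a, p in zip(y_actual, y_pred):
    sq_error_sum += (a - p) ** 2  # squaring already removes the sign, so abs() is not needed
mse_manual = sq_error_sum / len(y_actual)
mse_manual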

3.875

## 4. Root Mean Squared Error
* Root mean squared error (RMSE) is the square root of the mean of the squares of all the errors. RMSE is very commonly used and is considered an excellent general-purpose error metric for numerical predictions.
* It measures how far predictions fall from the true values: it equals the Euclidean distance between the vector of predictions and the vector of true values, divided by √n.
* When the residuals (differences between prediction and truth) average to zero, as they do for a least-squares fit with an intercept, RMSE equals their standard deviation.
* Residuals measure how far the data points are from the regression line; RMSE measures how spread out these residuals are.
* In other words, it tells you how concentrated the data are around the line of best fit.
* Root mean squared error is commonly used in climatology, forecasting, and regression analysis to verify experimental results.
* Formula:
![rmse.png](attachment:rmse.png)

* Since the errors are squared before they are averaged, RMSE gives relatively high weight to large errors. This makes RMSE more useful than MAE when large errors are particularly undesirable.
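The Euclidean-distance and standard-deviation descriptions of RMSE can be checked numerically. This sketch assumes NumPy, reuses the RMSE example values from the cell below, and uses a hypothetical dataset with a least-squares line (fit with `np.polyfit`) for the standard-deviation case, which holds because such residuals average to zero. `np.std` is used with its default `ddof=0` (population standard deviation).

```python
import numpy as np

actual = np.array([0, 1, 6, 4, 5], dtype=float)
predicted = np.array([0.1, 5.3, 2.1, 3.5, 3.1])
n = len(actual)

# Definition: square root of the mean squared error
rmse = np.sqrt(np.mean((actual - predicted) ** 2))

# Same quantity as the Euclidean distance between the two vectors, divided by sqrt(n)
rmse_euclid = np.linalg.norm(actual - predicted) / np.sqrt(n)
print(np.isclose(rmse, rmse_euclid))  # True

# For a least-squares line with an intercept the residuals average to zero,
# so RMSE equals their population standard deviation
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical data
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)
print(np.isclose(np.sqrt(np.mean(residuals ** 2)), np.std(residuals)))  # True
```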


In [19]:
import math

from sklearn.metrics import mean_squared_error, root_mean_squared_error

actual = [0, 1, 6, 4, 5]
predicted = [0.1, 5.3, 2.1, 3.5, 3.1]

# RMSE is the square root of the MSE
mse_value = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse_value)
print(rmse)
# Or, in scikit-learn 1.4 and later, compute RMSE directly.
# Older versions used mean_squared_error(actual, predicted, squared=False);
# the squared parameter was deprecated in 1.4 and removed in 1.6.
rmse = root_mean_squared_error(actual, predicted)
print(rmse)


2.741167634421507
2.741167634421507


## 5. Mean Absolute Percentage Error
* The mean absolute percentage error (MAPE) is the percentage equivalent of MAE. The equation looks just like that of MAE, except that each absolute error is divided by the corresponding actual value and the result is expressed as a percentage.
![MAPE.jpg](attachment:MAPE.jpg)

* Just as MAE is the average magnitude of the errors produced by your model, MAPE is how far, in percentage terms, the model’s predictions are from the corresponding actual values on average. Like MAE, MAPE has a clear interpretation, since percentages are easy for people to conceptualize. Because neither squares the errors, MAE and MAPE are less sensitive to outliers than MSE and RMSE.

* MAPE is one of the most common measures of forecast error, and it works best when the data contain no extreme values and no zeros: since each error is divided by the actual value, MAPE is undefined when an actual value is 0.
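A quick way to see the difference in outlier sensitivity is to compare MAE with RMSE when one prediction goes badly wrong. The sketch below assumes NumPy; the data are hypothetical and `compute_mae`/`compute_rmse` are illustrative helper names.

```python
import numpy as np


def compute_mae(actual, predicted):
    """Mean absolute error."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(diff)))


def compute_rmse(actual, predicted):
    """Root mean squared error."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical predictions: every error is 1, then the same data with one error of 10
actual = [10, 10, 10, 10, 10]
good = [11, 11, 11, 11, 11]
outlier = [11, 11, 11, 11, 20]

print(compute_mae(actual, good), compute_rmse(actual, good))  # 1.0 1.0
print(compute_mae(actual, outlier), compute_rmse(actual, outlier))
```

With the single large error, the MAE rises from 1.0 to 2.8, while the RMSE rises from 1.0 to about 4.56, because the error of 10 contributes 100 to the sum of squares.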

In [20]:
actual = [0.1, 1, 6, 4, 5]
predicted = [0.2, 5.3, 2.1, 3.5, 3.1]

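# Average the absolute percentage errors |(a - p) / a| and express the result in percent
ape_sum = 0
for a, p in zip(actual, predicted):
    ape_sum += abs((a - p) / a)
mape = 100 * ape_sum / len(actual)
round(mape, 1)
# MAPE states that our predictions are, on average, about 129.1% off from the actual values.
# A few points dominate: predicting 5.3 for an actual value of 1 is a 430% error, and 0.2 for 0.1 is a 100% error.

129.1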
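scikit-learn also provides `mean_absolute_percentage_error` in `sklearn.metrics` (available since version 0.24). It returns the result as a fraction rather than a percentage, so it must be multiplied by 100 to match the formula above. The sketch below also defines a hypothetical helper, `mape_percent`, that refuses zero actual values instead of silently dividing by them.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error


def mape_percent(actual, predicted):
    """Hypothetical helper: MAPE in percent, refusing zero actual values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(100 * np.mean(np.abs((actual - predicted) / actual)))


actual = [0.1, 1, 6, 4, 5]
predicted = [0.2, 5.3, 2.1, 3.5, 3.1]

print(round(mape_percent(actual, predicted), 1))                    # 129.1
# scikit-learn reports the same quantity as a fraction, not a percentage
print(round(mean_absolute_percentage_error(actual, predicted), 3))  # 1.291
```

scikit-learn does not raise on zeros; it divides by a tiny epsilon instead, so a zero actual value produces an extremely large MAPE rather than an error. Raising explicitly, as in the helper above, is one way to make the problem visible.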

## 6. Mean Percentage Error
* The mean percentage error (MPE) equation is exactly like that of MAPE. The only difference is that it lacks the absolute value operation.
* Although MPE lacks the absolute value operation, it is precisely this absence that makes MPE useful.
* Since positive and negative errors cancel out, we cannot use MPE to judge how well the model's predictions perform overall. However, if there are more (or larger) negative or positive errors, this bias shows up in the MPE. Unlike MAE and MAPE, MPE therefore lets us see whether our model systematically underestimates or overestimates: with the error defined as actual minus predicted, as in the code below, a positive MPE indicates underestimation and a negative MPE indicates overestimation.
![MPE.jpg](attachment:MPE.jpg)

In [21]:
actual = [0.1, 1, 6, 4, 5]
predicted = [0.2, 5.3, 2.1, 3.5, 3.1]
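mpe_sum = 0
for a, p in zip(actual, predicted):
    # Signed percentage error: positive when the prediction is too low, negative when it is too high
    mpe_sum += (a - p) / a
mpe = 100 * mpe_sum / len(actual)
round(mpe, 1)
# The negative MPE indicates that, on average, the predictions are larger than the actual values (overestimation).
# This is driven mainly by the first two points; the other three predictions are actually too low.
# Knowing this about our model is helpful, since it lets us look back at the data and reconsider which inputs to include to improve our metrics.

-82.9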
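To make the sign convention concrete, the sketch below uses a hypothetical helper `mpe_percent` (same convention as the cell above: error = actual minus predicted) and three hypothetical models: one that always predicts 10% too high, one that always predicts 10% too low, and one whose errors cancel out. It assumes NumPy.

```python
import numpy as np


def mpe_percent(actual, predicted):
    """Hypothetical helper: MPE in percent, with error = actual - predicted."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100 * np.mean((actual - predicted) / actual))


actual = [10.0, 20.0, 40.0]
too_high = [11.0, 22.0, 44.0]  # always 10% too high -> overestimates
too_low = [9.0, 18.0, 36.0]    # always 10% too low  -> underestimates

print(round(mpe_percent(actual, too_high), 6))  # -10.0
print(round(mpe_percent(actual, too_low), 6))   # 10.0

# Errors of opposite sign cancel: MPE is 0% even though every prediction is 10% off
print(round(mpe_percent([10.0, 20.0], [11.0, 18.0]), 6))  # 0.0
```

The last case is why MPE is read alongside MAE or MAPE rather than on its own: it reveals direction (bias), not overall accuracy.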