The quality of a regression model is measured by how closely its predictions match the actual values, but how do we evaluate that in practice? Statisticians have developed error metrics that quantify the quality of a model and let us compare regressions fitted with different parameters.


## Some common types of Error:
* Mean Absolute Error
* Mean Squared Error
* Root Mean Squared Error
* Mean Absolute Percentage Error
* Mean Percentage Error


## 1. Mean Absolute Error:

- It is the simplest regression error metric

**Absolute error**: The amount of error in a measurement, that is, the difference between the measured value and the true value. For example, if a scale reads 80 kg but you know your true weight is 79 kg, then the scale has an absolute error of 80 kg – 79 kg = 1 kg.
This can happen because the scale cannot resolve the exact quantity you are trying to measure. For example, your scale may only be accurate to the nearest kg. If you weigh 79.6 kg, the scale may “round up” and show 80 kg. In this case the absolute error is 80 kg – 79.6 kg = 0.4 kg.

Sometimes the formula is written with the absolute value symbol (bars: | |). This is common when we’re dealing with multiple measurements:

Δx = |xi – x|,

where xi is the i-th measurement and x is the true value.

**Absolute Accuracy Error**: Absolute error is also called Absolute Accuracy Error. You might see the formula written this way:

E = x(experimental) – x(true)

This is the same formula with different names: x(experimental) is the measurement we take and x(true) is the true value. Written without the bars, E keeps its sign, so take |E| when only the size of the error matters.
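
As a small illustration of the difference between a signed error and an absolute error, here is a minimal sketch. The readings and the true weight are hypothetical values chosen to echo the scale example, not real measurements.

```python
# Hypothetical scale readings in kg; the values are made up for illustration.
true_weight = 79.6
readings = [80.0, 79.0, 80.0, 79.5]

# Signed error: E = x(experimental) - x(true). It can be negative.
signed_errors = [reading - true_weight for reading in readings]

# Absolute error: drop the sign so only the size of the miss remains.
absolute_errors = [abs(error) for error in signed_errors]

for reading, signed, absolute in zip(readings, signed_errors, absolute_errors):
    print(f"reading={reading:.1f} kg  signed={signed:+.1f} kg  absolute={absolute:.1f} kg")
```

Readings below the true weight give a negative signed error, while the absolute error is never negative.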

The **Mean Absolute Error (MAE)** is the average of all absolute errors. The formula is:
![MAE.png](attachment:MAE.png)

* n = total number of data points,
* |xi – x| = the absolute errors,
* x = true value,
* xi = predicted value.
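
Written out symbolically, the formula looks roughly as follows. This is a sketch in the notation of the list above; reading x as the true value paired with each prediction xi is an assumption about how the pictured formula is meant.

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - x \rvert
$$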

In [2]:
# Python code to calculate MAE
from sklearn.metrics import mean_absolute_error as mae

# The actual and predicted arrays must have the same length for this sklearn function to work.
actual = [12, 13, 14, 15, 15, 22, 27]
pred = [11, 13, 14, 14, 15, 16, 18]

# Calculate the MAE with sklearn
mae(actual, pred)
# On average, the predictions are off from the actual values by about 2.42857, regardless of direction.
# The lower the MAE for a given model, the more closely its predictions match the actual values.

2.4285714285714284

In [4]:
# Using manual method: sum the absolute errors, then divide by the number of data points
abs_error_sum = 0
for a, p in zip(actual, pred):
    abs_error_sum += abs(p - a)
mae_manual = abs_error_sum / len(actual)
mae_manual

2.4285714285714284

## 2. Mean Squared Error
* The mean squared error (MSE) is like the MAE, but it squares each difference before summing instead of taking the absolute value.



* The mean squared error (MSE) tells you how close a regression line is to a set of points. 


* It does this by taking the distances from the points to the regression line (these distances are the “errors”) and squaring them. ***The squaring is necessary to remove any negative signs***. It also gives more weight to larger differences. It’s called the mean squared error because we’re finding the average of a set of squared errors.

* ***The lower the MSE, the better the prediction***, because a smaller mean squared error means we are closer to the line of best fit.


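* Because the errors are squared, the MSE is expressed in squared units of the target variable, while the MAE is in the original units. Squaring also inflates errors larger than 1 and shrinks errors smaller than 1, so the MSE is often, but not always, bigger than the MAE. For these reasons, we cannot directly compare the MAE to the MSE.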
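
To make the effect of squaring concrete, here is a minimal sketch that computes both metrics on three sets of predictions. The helper names `mae_manual` and `mse_manual` and all of the numbers are hypothetical, written only for this illustration (they are not sklearn functions).

```python
def mae_manual(actual, predicted):
    """Average of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse_manual(actual, predicted):
    """Average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values, chosen so the arithmetic is easy to follow.
actual = [10, 10, 10, 10]
small_misses = [9, 11, 9, 11]             # every prediction is off by 1
one_big_miss = [10, 10, 10, 14]           # a single prediction is off by 4
fractional_misses = [10.5, 9.5, 10.5, 9.5]  # every prediction is off by 0.5

for name, pred in [
    ("small_misses", small_misses),
    ("one_big_miss", one_big_miss),
    ("fractional_misses", fractional_misses),
]:
    print(name, "MAE =", mae_manual(actual, pred), "MSE =", mse_manual(actual, pred))
```

The first two sets have the same MAE (1.0), but the single large miss raises the MSE from 1.0 to 4.0. In the third set every error is below 1, so the MSE (0.25) ends up smaller than the MAE (0.5).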

* Formula for this:
![MSE.png](attachment:MSE.png)

* n = total number of data points,
* y = actual value,
* ŷ = predicted value
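
Written out symbolically in the notation of the list above, the formula is the standard one (the index i over the data points is added here for clarity):

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$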

**General steps to calculate the MSE from a set of X and Y values**:

- Find the regression line.
- Insert your X values into the linear regression equation to find the new Y values (Y’).
- Subtract the new Y value from the original to get the error.
- Square the errors.
- Add up the squared errors (the Σ in the formula is summation notation).
- Divide the sum by the number of data points to get the mean.
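
The steps above can be sketched in plain Python as follows. The data points are hypothetical, and the line is fitted with the ordinary least-squares closed form for a single predictor, which is one common way to find a regression line and is assumed here for illustration.

```python
# Hypothetical X and Y values, made up for illustration.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
n = len(x_values)

# Step 1: find the regression line y' = slope * x + intercept (least squares).
x_mean = sum(x_values) / n
y_mean = sum(y_values) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values)) / sum(
    (x - x_mean) ** 2 for x in x_values
)
intercept = y_mean - slope * x_mean

# Step 2: insert the X values into the line to get the new Y values (Y').
y_new = [slope * x + intercept for x in x_values]

# Step 3: subtract the new Y value from the original to get each error.
errors = [y - y_hat for y, y_hat in zip(y_values, y_new)]

# Step 4: square the errors.
squared_errors = [e ** 2 for e in errors]

# Step 5: add up the squared errors, then divide by n to get the mean.
mse_value = sum(squared_errors) / n
print(round(mse_value, 4))
```

For these values the fitted line is y' = 0.6x + 2.2 and the MSE comes out to about 0.48.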


In [9]:
# Using sklearn
from sklearn.metrics import mean_squared_error as mse

y_actual = [2, -0.5, 2, 9]
y_pred = [4.5, 0.0, 2, 6]

# Calculate the MSE
mse(y_actual, y_pred)

3.875

In [11]:
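# Using manual method: sum the squared errors, then divide by the number of data points
squared_error_sum = 0
for a, p in zip(y_actual, y_pred):
    squared_error_sum += (a - p) ** 2  # squaring already removes the sign, so abs() is not needed
mse_manual = squared_error_sum / len(y_actual)
mse_manual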

3.875