# Evaluating accuracy of a model using calculations
After you train a model, you need to get a sense of its accuracy. The accuracy of a model gives you an idea of how much confidence you can put in the predictions it makes.

The **scikit-learn** and **numpy** libraries are both helpful for measuring model accuracy.

Let's start by recreating our trained linear regression model from the last lesson.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [2]:
# Load our data from the csv file
delays_df = pd.read_csv('Data/Lots_of_flight_data.csv') 

# Remove rows with null values since those will crash our linear regression model training
delays_df.dropna(inplace=True)

# Move our features into the X DataFrame
X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]

# Move our labels into the y DataFrame
y = delays_df.loc[:,['ARR_DELAY']] 

# Split our data into test and training DataFrames
X_train, X_test, y_train, y_test = train_test_split(
    X, 
    y, 
    test_size=0.3, 
    random_state=42
)

regressor = LinearRegression()     # Create a scikit-learn LinearRegression object
regressor.fit(X_train, y_train)    # Use the fit method to train the model using your training data

# Use the trained model to predict the labels for the test data
y_pred = regressor.predict(X_test)

## Measuring accuracy
Now that we have a trained model, there are a number of metrics you can use to check its accuracy.

All these metrics are based on mathematical calculations. The key takeaway is that you don't have to calculate everything yourself: scikit-learn and numpy do most of the work and provide good performance.

### Mean Squared Error (MSE)
MSE summarizes the error the model makes when predicting the outcome for an observation. 
The lower the MSE, the better the model.

Mathematically, MSE is the average squared difference between the observed actual outcome values and the values predicted by the model. Because the differences are squared, MSE is expressed in squared units of the outcome.

MSE = mean((actuals - predicteds)^2) 

We could write code to loop through our records, comparing actual and predicted values to perform this calculation, but we don't have to! Just use **mean_squared_error** from the **scikit-learn** library.

In [3]:
from sklearn import metrics
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))

Mean Squared Error: 2250.4445141530855
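
To make the formula concrete, here is a minimal sketch that applies MSE = mean((actuals - predicteds)^2) by hand with numpy. The `actuals` and `predicteds` arrays are made-up toy values for illustration, not rows from the flight data; on real data, `metrics.mean_squared_error` should give the same result as this manual calculation.

```python
import numpy as np

# Hypothetical toy values, not taken from the flight data
actuals = np.array([10.0, -5.0, 30.0, 0.0])
predicteds = np.array([12.0, -1.0, 20.0, 3.0])

errors = actuals - predicteds      # difference for each observation: [-2, -4, 10, -3]
squared_errors = errors ** 2       # squaring removes the sign: [4, 16, 100, 9]
mse = squared_errors.mean()        # average of the squared errors: 129 / 4

print('Manual MSE:', mse)
```

Notice that the single error of 10 contributes 100 of the 129 total, which is why a few large misses can dominate the MSE.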


### Root Mean Squared Error (RMSE)
RMSE measures the typical size of the error the model makes when predicting the outcome for an observation, expressed in the same units as the outcome. 
The lower the RMSE, the better the model.

Mathematically, the RMSE is the square root of the mean squared error.

RMSE = sqrt(MSE)

Older versions of scikit-learn do not have a dedicated function for RMSE (version 1.4 added **root_mean_squared_error**), but since RMSE is just the square root of MSE, we can use the numpy library, which contains lots of mathematical functions, to calculate the square root of the MSE.

In [None]:
import numpy as np
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
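
As a quick sanity check on the relationship RMSE = sqrt(MSE), the sketch below takes the square root of the MSE value printed earlier in this lesson. It assumes, which the lesson does not state, that `ARR_DELAY` is recorded in minutes, so the result is read as minutes.

```python
import numpy as np

# MSE value copied from the output of the earlier mean_squared_error cell
mse = 2250.4445141530855

# RMSE is the square root of MSE, which brings the error back to the
# units of the label (assumed here to be minutes of arrival delay)
rmse = np.sqrt(mse)

print(f'Root Mean Squared Error: {rmse:.2f}')
```

Under that assumption, the result is between 47 and 48, which is easier to interpret than an MSE of about 2250 squared minutes.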

### Mean Absolute Error (MAE)
MAE measures the average size of the prediction error, ignoring its direction. The lower the MAE, the better the model.

Mathematically, it is the average absolute difference between observed and predicted outcomes.

MAE = mean(abs(actuals - predicteds))

MAE is less sensitive to outliers than RMSE, because errors are not squared before they are averaged. Calculate MAE using **mean_absolute_error** in the **scikit-learn** library.

In [None]:
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
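
The claim that MAE is less sensitive to outliers than RMSE can be illustrated with a small experiment. This is only a sketch on made-up toy values: the helper functions `mae` and `rmse` are hypothetical names that implement the two formulas directly in numpy, and the arrays are not taken from the flight data.

```python
import numpy as np

def mae(actuals, predicteds):
    """Mean absolute error: mean(abs(actuals - predicteds))."""
    return np.mean(np.abs(actuals - predicteds))

def rmse(actuals, predicteds):
    """Root mean squared error: sqrt(mean((actuals - predicteds)^2))."""
    return np.sqrt(np.mean((actuals - predicteds) ** 2))

# Hypothetical toy values: every prediction is off by exactly 2
actuals = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
predicteds = np.array([7.0, 8.0, 17.0, 18.0, 27.0])

# Same data, but the last actual value is replaced by an extreme outlier
actuals_outlier = actuals.copy()
actuals_outlier[-1] = 125.0

mae_clean, rmse_clean = mae(actuals, predicteds), rmse(actuals, predicteds)
mae_outlier, rmse_outlier = mae(actuals_outlier, predicteds), rmse(actuals_outlier, predicteds)

print('Without outlier  MAE:', mae_clean, ' RMSE:', rmse_clean)
print('With outlier     MAE:', mae_outlier, ' RMSE:', rmse_outlier)
```

Without the outlier both metrics equal 2. With a single error of 98, MAE rises to 21.2 while RMSE rises to roughly 43.9, because squaring gives the large error far more weight.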

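### R^2 or R-Squared

R-squared is the proportion of variation in the outcome that is explained by the predictor variables. It indicates how much of the variation in the actual values is accounted for by the features passed to the model. A value of 1 means the predictions match the actual values exactly, and a value of 0 means the model does no better than always predicting the mean.

The higher the R-squared, the better the model. Calculate R-squared using **r2_score** in the **scikit-learn** library.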

In [None]:
print('R^2:', metrics.r2_score(y_test, y_pred))
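
To show what "proportion of variation explained" means in practice, here is a minimal sketch that computes R-squared by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The arrays are made-up toy values, and the variable names `ss_res` and `ss_tot` are illustrative, not part of any library.

```python
import numpy as np

# Hypothetical toy values, not taken from the flight data
actuals = np.array([3.0, 5.0, 7.0, 9.0])
predicteds = np.array([2.0, 5.0, 8.0, 9.0])

# Variation the model fails to explain: squared prediction errors
ss_res = np.sum((actuals - predicteds) ** 2)        # 1 + 0 + 1 + 0 = 2

# Total variation in the outcome: squared distance from the mean (6.0)
ss_tot = np.sum((actuals - actuals.mean()) ** 2)    # 9 + 1 + 1 + 9 = 20

r2 = 1 - ss_res / ss_tot

# Baseline: always predicting the mean explains none of the variation
baseline = np.full_like(actuals, actuals.mean())
r2_baseline = 1 - np.sum((actuals - baseline) ** 2) / ss_tot

print('Manual R^2:', r2)
print('R^2 when always predicting the mean:', r2_baseline)
```

Here the model explains about 90% of the variation, while the mean-only baseline scores 0. A model that predicts worse than the mean would produce a negative value.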

Different models have different ways to measure accuracy. Fortunately, **scikit-learn** and **numpy** provide a wide variety of functions to help measure it.