# Evaluating models

We'll be using the following data science libraries throughout this course (click the links for cheat sheets provided by DataCamp):
* [numpy](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)  (for vectorised math operations)
* [pandas](http://datacamp-community.s3.amazonaws.com/9f0f2ae1-8bd8-4302-a67b-e17f3059d9e8) (for dataframes)
* [keras](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Keras_Cheat_Sheet_Python.pdf) (for neural networks)
* [scikit-learn](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf) (for other machine learning models)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

The first step to training models is to figure out a way to tell how good a model is.  

You can't just train until it "gets the right results", because there will almost always be some difference between the predicted outcomes and the measured outcomes.

To tackle this, we'll split our data up into a "training set" (for inspecting the data and training models) and a "test set" (for evaluating models).

This is to ensure a fair test of the model's ability to generalise to new examples. It is the same reason an exam contains different questions from the practice exams a student learns from.

Throughout this module, we'll be using the [Ames Housing Prices Data Set](https://www.kaggle.com/c/house-prices-advanced-regression-techniques).

The data is described [here](https://github.com/eliiza/ml-training-data/blob/master/housing_price_data/data_description.txt).

In [None]:
# Load data from Eliiza's github page
labelled = pd.read_csv("https://raw.githubusercontent.com/eliiza/ml-training-data/master/housing_price_data/housing_data.csv") 

# Randomly shuffle the rows so the split does not depend on the original ordering
labelled = labelled.sample(frac=1.0, replace=False, random_state=2)

# Split up into training and test sets (80:20).
num_rows = labelled.shape[0]
training_set = labelled[0:round(num_rows*0.8)]
test_set = labelled[round(num_rows*0.8):num_rows]

# If you wanted to save the training and test sets when running locally
# training_set.to_csv("../data/housing_price_data/training_data.csv",index = False)
# test_set.to_csv("../data/housing_price_data/test_data.csv",index = False)
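
The same kind of 80:20 split can also be done in one call with scikit-learn's `train_test_split`. The sketch below is only an illustration: it uses a small made-up DataFrame (`toy`, a hypothetical stand-in for the housing data) so that it runs without a download.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the housing data (values are made up)
toy = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 4, 9, 3, 6, 7, 5],
    "SalePrice": [150000, 180000, 210000, 260000, 120000,
                  320000, 90000, 175000, 215000, 140000],
})

# Shuffle and split 80:20 in one step; random_state makes the split repeatable
toy_train, toy_test = train_test_split(toy, test_size=0.2, random_state=2)

print(toy_train.shape, toy_test.shape)
```

No row ends up in both sets, which is what keeps the later evaluation fair.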

### Exercise
- How many observations are now in the training and test datasets?

# Mean Absolute Error

Now that we have a test set, we can start to evaluate some models!

We'll use the mean absolute error (MAE). This is the average size of the difference between the predicted value and the observed value.

Formally, this is defined as

$$  \mathsf{MAE} = \frac{1}{n} \sum_{i=1}^n \left| \mathsf{predicted\_value}[i] - \mathsf{actual\_value}[i] \right|  $$
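
To make the formula concrete, here is a minimal sketch that follows it term by term with a plain Python loop. The three predicted and actual prices are made up for illustration, and `mean_absolute_error` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
# Hypothetical predictions and observed prices for three houses
predicted = [200000, 150000, 310000]
actual = [210000, 145000, 300000]

def mean_absolute_error(predicted, actual):
    """Average of |predicted[i] - actual[i]| over all n examples."""
    n = len(actual)
    total = 0
    for p, a in zip(predicted, actual):
        total += abs(p - a)  # size of the error, ignoring its sign
    return total / n

mae = mean_absolute_error(predicted, actual)
print(mae)
```

The individual errors are 10,000, 5,000 and 10,000, so the result is 25,000 / 3, or roughly 8,333.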

## Vectorised Math Operations

Machine learning makes significant use of vectorised functions (maths with arrays) in many aspects of model building and, as we will see in this lab, in evaluating the error of our model on our test set of data. 

We will be using the numpy library to do this maths for us, but it is good to understand that each operation is applied across the whole array at once. 

In [None]:
# Create two one-dimensional numpy arrays (vectors)
a = np.array([2,2,-4,6])
b = np.array([-1,4,-3,4])

print(a - b)
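
As a further illustration of what "the whole array" means, the sketch below compares a vectorised multiplication with the equivalent element-by-element loop. The array `x` and the variable names are made up for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Vectorised: one expression is applied to every element at once
scaled = x * 10

# Equivalent plain-Python version, written out element by element
scaled_loop = [value * 10 for value in x]

print(scaled)                               # [10 20 30 40]
print(np.array_equal(scaled, scaled_loop))  # True
```

Both versions give the same numbers; the vectorised form is shorter and is usually much faster on large arrays.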

## Exercise
Using `np.abs` and `np.mean`, calculate the MAE between the predictions `a` and the observed values `b` above.

# Model Evaluation Workflow

To help with our workflow as we try different models throughout this module, we're going to try to write functions for many of our tasks. Ask if you're not sure what any function does - we can explain!

First, we write a function to evaluate how accurate any model is. It measures how well the model predicts the sale price (`SalePrice`) in the unseen test dataset. Its main argument is itself a function, `model_fn`, which must return sale price predictions when given appropriate input housing data (see below for an example). The optional `print_result` flag switches between a printed summary and the raw MAE value.

In [None]:
def evaluate_model(model_fn, print_result=False):
    '''
    Consumes a function model_fn
    and evaluates its predictive accuracy against 
    the housing prices test set.
    If print_result is True, prints a human-readable summary;
    otherwise returns the unrounded floating point MAE.
    '''
    # Load test set from Eliiza's GitHub (pre-saved)
    test_data = pd.read_csv("https://raw.githubusercontent.com/eliiza/ml-training-data/master/housing_price_data/test_data.csv")
    actual_values = test_data['SalePrice']
    # Pass in all columns except SalePrice
    test_input = test_data.filter(regex='^(?!SalePrice$).*')
    predicted_saleprice = model_fn(test_input)
    mae = np.mean(np.abs(predicted_saleprice-actual_values))
    if print_result:
        # Human-readable summary; nothing is returned in this mode
        print("The model is inaccurate by $%.2f on average." % mae)
        return None
    return mae

Let's evaluate a very simple predictive heuristic: **Sale Price = 50,000 * {Overall Quality Rating}**. `OverallQual` had a high correlation with `SalePrice` in the data exploration.

So that it can be an input to `evaluate_model()`, we write it as a function which returns predictions when given some `input_data`:

In [None]:
def quality_heuristic(input_data):
    """
    Extracts a single vector called 'OverallQual' from input data and multiplies every value by 50,000
    """
    quality = input_data['OverallQual']
    prediction = 50000*quality
    return prediction

evaluate_model(quality_heuristic, print_result=True)

**Note:** we will follow this pattern throughout this module:
1. Write a function that contains the model we want to use. It returns predictions when given input data.
2. Use the `evaluate_model()` function to evaluate the model from above on the unseen test data.
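
The two-step pattern can also be tried offline on a tiny in-memory dataset. This is a minimal sketch under stated assumptions: `evaluate_on` is a hypothetical helper (not part of this notebook) that takes the labelled data as an argument instead of downloading it, and the three rows of `toy_test` are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical labelled data; values are made up for illustration
toy_test = pd.DataFrame({
    "OverallQual": [4, 6, 8],
    "SalePrice": [180000, 310000, 430000],
})

def evaluate_on(model_fn, data):
    """Hypothetical helper: MAE of model_fn's predictions on `data`."""
    actual = data["SalePrice"]
    inputs = data.drop(columns="SalePrice")  # hide the answer from the model
    predicted = model_fn(inputs)
    return np.mean(np.abs(predicted - actual))

def toy_heuristic(input_data):
    """Step 1: a model written as a function of the input data."""
    return 50000 * input_data["OverallQual"]

# Step 2: evaluate the model on the held-out data
toy_mae = evaluate_on(toy_heuristic, toy_test)
print(toy_mae)  # 20000.0
```

The predictions are 200,000, 300,000 and 400,000, which miss the actual prices by 20,000, 10,000 and 30,000, giving an MAE of 20,000.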

## Exercise 

- Check you follow the structure and concepts behind `evaluate_model()` and `quality_heuristic()`
- Then try to make a new version of `quality_heuristic` that achieves a lower mean absolute error.

## Exercise (optional)

We can make new "quality heuristic" models simply by changing the dollar amount we multiply the quality rating by. Here is a function that lets us vary this value (`a`) in an automated manner.

In [None]:
def generate_quality_heuristic(a):
    """
    Creates a heuristic function with a given linear multiplier
    """
    def heuristic(input_data):
        prediction = a * input_data['OverallQual']
        return prediction
    return heuristic
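
Each call to a generator function like this returns a separate model with its own fixed multiplier. The sketch below illustrates the idea with a hypothetical `make_heuristic` (the same shape as the function above, redefined so the example is self-contained) and a made-up input table.

```python
import pandas as pd

def make_heuristic(a):
    """Returns a model function with the multiplier `a` fixed."""
    def heuristic(input_data):
        return a * input_data["OverallQual"]
    return heuristic

# Hypothetical input: three houses with quality ratings 5, 7 and 9
houses = pd.DataFrame({"OverallQual": [5, 7, 9]})

cheap = make_heuristic(30000)   # a model using $30,000 per quality point
pricey = make_heuristic(40000)  # a separate model using $40,000

print(cheap(houses).tolist())   # [150000, 210000, 270000]
print(pricey(houses).tolist())  # [200000, 280000, 360000]
```

Because each returned function only needs input data, it can be passed directly to an evaluation function.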

The question is now "What's the best amount of dollars to multiply by?" 

Let's start to answer this by using a Brute Force approach - a for loop and trial and error. 

**Exercise:** Choose the values of `a` to try in the chunk of code below. We will loop over your values and store the accuracy of each model's attempt.

In [None]:
# Define the values of `a` to try over.
values = [] # Insert values of a. 
# Hint: You can use range() to help. Maybe with a stepsize of $1000!

# Create an empty list, then evaluate one model per value of `a` and store its MAE
model_scores = []
for i in values:
    score = evaluate_model(generate_quality_heuristic(i))
    model_scores.append(score)

In [None]:
# Plot the model MAEs against the values of `a` that produced them
plt.plot(values, model_scores)
plt.xlabel("Dollars per quality point (a)")
plt.ylabel("Mean absolute error")

In [None]:
# Find the minimum score (and the value of `a` that produced it) with a pandas dataframe
models = pd.DataFrame()
models['a'] = values
models['Score'] = model_scores
models.loc[models.Score == models.Score.min()] 

One issue with this approach is choosing an appropriate level of granularity.  
In this example we probably stepped through increments of $1000.  
But how do we know where the optimal value really lies between multiples of 1000?  
We could lower our step size, but the compute time grows in proportion: a step size 100 times smaller means evaluating 100 times as many models. 
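
One way to soften the granularity problem is a coarse-to-fine search: use the large step over the whole range, then use a small step only near the best coarse value. The sketch below is an illustration on made-up data, not the method used later in this course; `quality`, `price`, `mae_for` and `best_on_grid` are all hypothetical names.

```python
import numpy as np

# Hypothetical toy data: quality ratings and observed sale prices
quality = np.array([4, 5, 6, 7, 8])
price = np.array([120000, 160000, 185000, 230000, 270000])

def mae_for(a):
    """MAE of the heuristic prediction a * quality on the toy data."""
    return np.mean(np.abs(a * quality - price))

def best_on_grid(low, high, step):
    """Return the grid value in [low, high] with the lowest MAE."""
    grid = np.arange(low, high + step, step)
    scores = [mae_for(a) for a in grid]
    return grid[int(np.argmin(scores))]

# Pass 1: coarse search in steps of 1000 over the full range
coarse = best_on_grid(0, 100000, 1000)

# Pass 2: fine search in steps of 10, only within 1000 of the coarse winner
fine = best_on_grid(coarse - 1000, coarse + 1000, 10)

print(coarse, mae_for(coarse))
print(fine, mae_for(fine))
```

The two passes evaluate 101 + 201 = 302 models, compared with 10,001 for a step of 10 across the full range. This shortcut relies on the MAE curve having a single valley, which holds for this one-parameter heuristic but should not be assumed for every model.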

## Exercise (optional)

Extend `evaluate_model` to report how long the model takes to make its predictions ([hint](https://docs.python.org/3.6/library/time.html)).
