# Practical 2.  Baseline Models

In this practical we will apply our knowledge of

* Creating baseline forecasts
* Performing a train-test split
* Using forecast error metrics MAE and MAPE to select the best method 

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.metrics import mean_absolute_error

In [None]:
#baseline forecast methods
from forecast.baseline import (Naive1, 
                               SNaive,
                               Drift,
                               Average,
                               baseline_estimators)

from forecast.metrics import mean_absolute_percentage_error

In [None]:
#helper functions
def preds_as_series(data, preds):
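    '''
    Helper function for plotting predictions.
    Converts a numpy array of predictions to a 
    pandas.DataFrame with a DatetimeIndex that starts
    one period after the end of the training data.
    
    Parameters
    -----
    data - pandas.Series or pandas.DataFrame, the training data;
           must have a DatetimeIndex with its freq set
    preds - numpy.array, vector of predictions 

    Returns
    -----
    pandas.DataFrame
    '''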
    start = pd.date_range(start=data.index.max(), periods=2, freq=data.index.freq).max()
    idx = pd.date_range(start=start, periods=len(preds), freq=data.index.freq)
    return pd.DataFrame(preds, index=idx)

## Exercise 1: Using Naive1 to forecast Nile River Flow

**Step 1: Import Nile river flow data.**  

This can be found in **"data/nile.csv"**

* Hint: this is yearly data.  You can use the Annual Start ('AS') frequency (recent versions of pandas call this 'YS')
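As a rough sketch of the loading pattern, the cell below reads a small in-memory CSV instead of the real file. The `year` column name, the date format and the flow values are all assumptions made for illustration; check the actual layout of `data/nile.csv` before adapting it. The frequency is inferred from the dates rather than typed in, which sidesteps the 'AS' versus 'YS' naming difference between pandas versions.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/nile.csv.
# The 'year' column name and all values are made up for illustration.
csv_text = (
    "year,flow\n"
    "1990-01-01,1000\n"
    "1991-01-01,1100\n"
    "1992-01-01,950\n"
    "1993-01-01,1200\n"
    "1994-01-01,1150\n"
)

# Parse the first column as dates and use it as the index
nile = pd.read_csv(io.StringIO(csv_text), index_col='year', parse_dates=True)

# Attach a yearly (year start) frequency to the index, inferred from the dates
nile.index.freq = pd.infer_freq(nile.index)

print(nile.index)
```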

In [None]:
#your code here

**Step 2: Plot the Nile data**

In [None]:
#your code here

**Step 3: Create and fit Naive1 forecast model**

* Hint: you want to fit `nile['flow']`

In [None]:
#your code here

**Step 4: Plot the Naive1 fitted values**

All the baseline models have fitted values.  These are the in-sample predictions, i.e. the predictions for the training data.

Once you have created and fitted a Naive1 model you can access the fitted values using the `.fittedvalues` property.  This returns a `DataFrame`.

Plot the fitted values against the observed data.
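To see what in-sample predictions look like for a naive method, here is a minimal by-hand sketch in plain pandas. It assumes the textbook definition of Naive1 (the prediction for each period is the previous observation); the `Naive1` class may differ in detail, and the toy series is made up.

```python
import numpy as np
import pandas as pd

# Toy series (illustrative values only)
idx = pd.date_range(start='2000-01-01', periods=5, freq='MS')
y = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0], index=idx)

# Naive in-sample prediction: the value observed one period earlier.
# The first period has no previous observation, so it is NaN.
fitted = y.shift(1)

# In-sample errors (residuals)
residuals = y - fitted

print(pd.DataFrame({'observed': y, 'fitted': fitted, 'residual': residuals}))
```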

In [None]:
#your code here

**Step 5: Forecast the next 5 years**

After you have created a forecast, plot the predictions.  

* Hint: use the `preds_as_series()` function to plot the predictions.  See the lecture notes for examples of how to use it.
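The idea behind giving predictions a date index can be sketched without the `forecast` package. The example below assumes the standard naive rule (repeat the last observation) and builds a future index that starts one period after the training data ends; the toy values and variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy yearly training data (illustrative values only)
idx = pd.date_range(start='1990-01-01', periods=4, freq=pd.offsets.YearBegin())
train = pd.Series([10.0, 12.0, 11.0, 13.0], index=idx)

horizon = 5

# Naive forecast: repeat the last observed value for every future period
preds = np.full(horizon, train.iloc[-1])

# Future index: begins one period after the last training date
future_idx = pd.date_range(start=train.index[-1] + train.index.freq,
                           periods=horizon,
                           freq=train.index.freq)

preds_series = pd.Series(preds, index=future_idx)
print(preds_series)
```

Once the predictions carry their own dates, they can be drawn on the same axes as the training data and will sit to the right of it.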

In [None]:
#your code here

## Exercise 2. Choose the best baseline forecast method for ED reattendances

**Step 1: Import emergency department reattendance data.**  

This is a time series from a hospital that measures the number of patients per month that have reattended an ED within 7 days of a previous attendance.

This can be found in **"data/ed_reattend.csv"** 

* Hint 1: The 'date' column is in UK standard format dd/mm/yyyy.  You will need to set `dayfirst=True` in `pd.read_csv()` to make sure pandas interprets the dates correctly.

* Hint 2: The data is monthly and the dates are all the first day of the month.  This is called monthly start and its shorthand is 'MS'.
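The two hints can be sketched on a small in-memory CSV. The 'date' column comes from the hint above, but the `reattends` column name and the values are assumptions; the real file may be laid out differently.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/ed_reattend.csv.
# The 'reattends' column name and all values are made up for illustration.
csv_text = (
    "date,reattends\n"
    "01/02/2020,150\n"
    "01/03/2020,160\n"
    "01/04/2020,155\n"
)

# dayfirst=True makes pandas read 01/02/2020 as 1 February 2020
ed = pd.read_csv(io.StringIO(csv_text), index_col='date',
                 parse_dates=True, dayfirst=True)

# Declare that the index is monthly, dated at the start of each month
ed.index.freq = 'MS'

print(ed.index)
```

Without `dayfirst=True`, ambiguous dates such as 01/02/2020 default to month-first parsing, which would silently turn a monthly series into a daily one.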

In [None]:
#your code here

**Step 2: Perform a calendar adjustment**
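One common form of calendar adjustment, assumed here, is to divide each monthly total by the number of days in that month so that the series becomes a daily rate. The sketch below uses made-up totals to show the effect.

```python
import numpy as np
import pandas as pd

# Toy monthly totals for Jan, Feb and Mar 2021 (illustrative values only)
idx = pd.date_range(start='2021-01-01', periods=3, freq='MS')
monthly = pd.Series([310.0, 280.0, 310.0], index=idx)

# Divide each total by the number of days in its month (31, 28, 31)
per_day = monthly / monthly.index.days_in_month

print(per_day)
```

In this toy series the raw totals dip in February, but the daily rate is flat: the dip was caused entirely by the shorter month.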

In [None]:
#your code here

**Step 3: Perform a train-test split**

Create a train-test split where you hold back the final 6 months of the data.

Remember to work with the calendar adjusted data.
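A time series split must respect time order, so the hold-out set is simply the last few observations. A minimal sketch with positional slicing on a made-up series:

```python
import pandas as pd

# Toy monthly series of 24 observations (illustrative values only)
idx = pd.date_range(start='2020-01-01', periods=24, freq='MS')
y = pd.Series(range(24), index=idx, dtype=float)

holdout = 6  # number of final periods to hold back

# Everything except the last 6 months is training data
train = y.iloc[:-holdout]

# The last 6 months form the test set
test = y.iloc[-holdout:]

print(train.shape, test.shape)
```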

In [None]:
#your code here

**Step 4: Plot the TRAINING data**

Remember: don't look at the test data just yet.  You don't want to bias your model selection process.

In [None]:
#your code here

**Step 5: Create and fit Naive1, SNaive, Average, Drift and Ensemble baseline models**

* Hint: remember that the `baseline_estimators()` function will create all of these objects for you and return them in a dict.  

* Hint: Fit the TRAINING data.

In [None]:
#your code here

**Step 6: Use each model to predict 6 months ahead**

* Hint: You need to store the prediction results so that later on you can calculate the forecast error.
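The storage pattern can be sketched with by-hand versions of four baseline methods written in plain NumPy. These functions are hypothetical stand-ins that follow the textbook definitions; they are not the `forecast.baseline` classes, which may differ in detail. The toy data and the seasonal period of 4 are assumptions.

```python
import numpy as np

# Toy training data with a repeating pattern of length 4 (illustrative only)
train = np.array([10.0, 20.0, 30.0, 40.0, 14.0, 24.0, 34.0, 44.0])
horizon = 6
period = 4  # assumed seasonal period for this toy series


def naive1(y, h):
    # Repeat the last observation
    return np.full(h, y[-1])


def average(y, h):
    # Repeat the mean of the training data
    return np.full(h, y.mean())


def drift(y, h):
    # Extend the straight line joining the first and last observations
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)


def snaive(y, h, m):
    # Repeat the last full season of observations
    last_season = y[-m:]
    return np.array([last_season[i % m] for i in range(h)])


# Keep every method's predictions in one dict, keyed by method name
preds = {
    'naive1': naive1(train, horizon),
    'average': average(train, horizon),
    'drift': drift(train, horizon),
    'snaive': snaive(train, horizon, period),
}

for name, p in preds.items():
    print(name, p)
```

Keeping the predictions in a dict keyed by method name makes it easy to loop over them later when computing error metrics.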

In [None]:
#your code here

**Step 7: Calculate the mean absolute error of each forecast method**
    
Based on the results, which method would you choose?
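As a sketch of the comparison, the cell below computes the mean absolute error by hand with NumPy for two made-up sets of forecasts and picks the method with the smallest error. The method names and values are hypothetical; `sklearn.metrics.mean_absolute_error` should give the same numbers.

```python
import numpy as np

# Toy test data and forecasts (illustrative values only)
y_true = np.array([100.0, 110.0, 120.0])
forecasts = {
    'method_a': np.array([90.0, 115.0, 120.0]),
    'method_b': np.array([100.0, 100.0, 100.0]),
}

# MAE: the average of the absolute forecast errors
mae = {name: np.mean(np.abs(y_true - p)) for name, p in forecasts.items()}

# The method with the lowest MAE
best = min(mae, key=mae.get)

print(mae, best)
```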

In [None]:
#your code here

**Step 8: Calculate the out-of-sample MAPE of each forecast method**
    
Would you still choose the same forecasting method?

Is it useful to calculate both metrics?
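To make the difference between the two metrics concrete, here is a by-hand sketch. The `mape` function below is a hypothetical helper using the usual definition expressed as a percentage; `forecast.metrics.mean_absolute_percentage_error` may scale its result differently. The values are made up.

```python
import numpy as np


def mape(y_true, y_pred):
    # Mean absolute percentage error, expressed as a percentage.
    # Undefined when any true value is zero.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Toy test data and forecast (illustrative values only)
y_true = np.array([100.0, 200.0])
y_pred = np.array([90.0, 220.0])

# Absolute errors are 10 and 20, in the units of the data
mae_value = np.mean(np.abs(y_true - y_pred))

# Relative errors are both 10% of the true value
mape_value = mape(y_true, y_pred)

print(mae_value, mape_value)
```

The second error is twice as large in absolute terms, yet both are the same size relative to the true value, so the two metrics can rank forecasts differently.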

In [None]:
#your code here