# Computer Lab 1: Naive Models

In any forecasting study, the first thing you need to do is create a naive benchmark.  Naive benchmarks can be used as simple forecasting methods in their own right, or as a way to check that the more complicated models we will use later in the course are worth the effort to use and maintain.

**In this practical you will apply your knowledge of**

* Creating baseline naive forecasts
* Performing a train-test split
* Using forecast error metrics MAE and MAPE to select the best method 
* Producing prediction intervals for naive methods

---
**Before attempting the exercises, it is recommended that you watch the following code-along tutorials, which describe how to use Python for basic forecasting**.

* **Reading time series data into pandas**:
    * Code along video (5 mins): https://bit.ly/pandas_ts
    * [Code along notebook](https://colab.research.google.com/github/health-data-science-OR/forecasting/blob/master/01_basics/01_code_along_notebooks/pandas_time_series.ipynb)
    
* **Benchmark models**:
    * Code along video (15 mins): https://bit.ly/benchmark_code_along
    * [Code along notebook](https://colab.research.google.com/github/health-data-science-OR/forecasting/blob/master/01_basics/01_code_along_notebooks/ca_benchmark_forecasts.ipynb)
    
---

# Standard Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys

# Install forecast-tools

In [None]:
# if running in Google Colab install forecast-tools
if 'google.colab' in sys.modules:
    !pip install forecast-tools

# forecast-tools imports

In [None]:
#baseline forecast methods
from forecast_tools.baseline import (Naive1, 
                                     SNaive,
                                     Drift,
                                     Average,
                                     baseline_estimators)

from forecast_tools.metrics import (mean_absolute_percentage_error,
                                    mean_absolute_error)

# Helper functions

In [None]:
def preds_as_series(data, preds):
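    '''
    Helper function for plotting predictions.
    Converts a numpy array of predictions to a 
    pandas.DataFrame with a DatetimeIndex that starts
    one period after the end of the training data.
    
    Parameters
    -----
    data - pandas.Series or pandas.DataFrame - the training data.
        Must have a DatetimeIndex with its freq set.
    preds - numpy.array, vector of predictions 
    
    Returns:
    -------
    pandas.DataFrame
    '''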
    start = pd.date_range(start=data.index.max(), periods=2, 
                          freq=data.index.freq).max()
    idx = pd.date_range(start=start, periods=len(preds), freq=data.index.freq)
    return pd.DataFrame(preds, index=idx)

## Exercise 1: Using Naive1 to forecast monthly outpatient appointments.

**Step 1: Import monthly outpatient appointments time series**  

This can be found in **"data/out_appoints_mth.csv"**
or 'https://raw.githubusercontent.com/health-data-science-OR/hpdm097-datasets/master/out_appoints_mth.csv'

* Hint: this is monthly data.  You can use the monthly Start ('MS') frequency
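As a rough sketch of the pattern in the hint, the cell below reads a tiny in-memory CSV rather than the real file. The `date` column name and the three rows are made up for illustration; check the header of the real file before reusing the names.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for the real data file.
csv_text = "date,out_apts\n2014-04-01,100\n2014-05-01,120\n2014-06-01,110\n"

# Parse the date column and use it as the index.
demo = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

# Tell pandas the observations are monthly, stamped at the month start.
demo.index.freq = 'MS'
```

Setting the frequency matters later: helper functions that extend the index into the future need to know the spacing of the observations.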

In [None]:
# your code here ...
url = 'https://raw.githubusercontent.com/health-data-science-OR/' \
        + 'hpdm097-datasets/master/out_appoints_mth.csv'

**Step 2: Plot the data**

In [None]:
# your code here ...

**Step 3: Create and fit Naive1 forecast model**

* Hint: you want to fit `appoints['out_apts']`

In [None]:
# your code here ...

**Step 4: Plot the Naive1 fitted values**

All the baseline models have fitted values.  These are the in-sample predictions, i.e. the predictions of the training data.

Once you have created and fitted a Naive1 model you can access the fitted values using the `.fittedvalues` property.  This returns a `DataFrame`.

Plot the fitted values against the observed data.
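To build intuition for what the fitted values of a Naive1 model look like, here is a minimal pandas-only sketch. It assumes the usual textbook definition (each in-sample prediction is the previous observation) and uses made-up numbers; in the exercise you should use the model's own `.fittedvalues`.

```python
import pandas as pd

# Made-up monthly series for illustration only.
idx = pd.date_range('2020-01-01', periods=4, freq='MS')
y = pd.Series([10.0, 12.0, 11.0, 13.0], index=idx)

# Naive1 in-sample prediction: the value observed one period earlier.
fitted = y.shift(1)

# Residuals are observed minus fitted; the first one is undefined (NaN).
residuals = y - fitted
```

The fitted line is simply the observed series moved one step to the right, which is why the two curves look almost identical when plotted together.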

In [None]:
# your code here ...

**Step 5: Forecast the next 6 months**

After you have created a forecast, plot the predictions.  

* Hint: use the `preds_as_series()` helper function to plot the predictions.  See the lecture notes for examples of how to use it.

In [None]:
# your code here ...

## Exercise 2: Choose the best baseline forecast method for ED reattendances

**Step 1: Import emergency department reattendance data.**  

This is a time series from a hospital that measures the number of patients per month that have reattended an ED within 7 days of a previous attendance.

This can be found in **"ed_reattend.csv"**:  
'https://raw.githubusercontent.com/health-data-science-OR/hpdm097-datasets/master/ed_reattend.csv'

* Hint 1: The 'date' column is in the UK standard format dd/mm/yyyy.  You will need to set `dayfirst=True` in `pd.read_csv()` to make sure pandas interprets the dates correctly.

* Hint 2: The data is monthly and the dates are all the first day of the month.  This is called monthly start and its shorthand is 'MS'.
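The effect of `dayfirst=True` can be seen on a small in-memory example. This is a sketch only: the column names and values are invented and are not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical CSV with UK-style dd/mm/yyyy dates.
csv_text = "date,reattends\n01/01/2020,10\n01/02/2020,12\n01/03/2020,11\n"

# Without dayfirst, pandas reads 01/02/2020 as 2 January.
month_first = pd.read_csv(io.StringIO(csv_text), index_col='date',
                          parse_dates=True)

# With dayfirst=True the same string is read as 1 February.
day_first = pd.read_csv(io.StringIO(csv_text), index_col='date',
                        parse_dates=True, dayfirst=True)
day_first.index.freq = 'MS'
```

Note that setting the 'MS' frequency only succeeds on the correctly parsed index, which makes it a handy sanity check.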

In [None]:
# your code here ...
url = 'https://raw.githubusercontent.com/health-data-science-OR/' \
       + 'hpdm097-datasets/master/ed_reattend.csv'

**Step 2: Perform a calendar adjustment**
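One common form of calendar adjustment is to divide each monthly total by the number of days in that month, giving an average daily rate. A minimal sketch on made-up numbers (not the ED data) might look like this:

```python
import pandas as pd

# Made-up monthly totals: Jan, Feb and Mar 2021.
idx = pd.date_range('2021-01-01', periods=3, freq='MS')
monthly = pd.Series([310.0, 280.0, 310.0], index=idx)

# Divide each total by the number of days in its month.
daily_rate = monthly / monthly.index.days_in_month
```

Here the raw totals suggest a dip in February, but the adjusted series is flat at 10 per day: the dip was only due to the shorter month.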

In [None]:
#your code here ...

**Step 3: Perform a train-test split**

Create a train-test split where you hold back the final 6 months of the data.

Remember to work with the calendar adjusted data.

* Hint: The test set is the last 6 rows in your pandas DataFrame
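The hint above can be sketched with positional slicing. The data frame below is synthetic and the variable names (`holdout`, `train`, `test`) are illustrative choices.

```python
import numpy as np
import pandas as pd

# Synthetic two-year monthly series for illustration.
idx = pd.date_range('2020-01-01', periods=24, freq='MS')
data = pd.DataFrame({'y': np.arange(24.0)}, index=idx)

holdout = 6

# Everything except the final 6 rows is used for training ...
train = data.iloc[:-holdout]

# ... and the final 6 rows are held back for testing.
test = data.iloc[-holdout:]
```

Because a time series is ordered, the split is never shuffled: the test set must always come after the training set in time.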

In [None]:
#your code here ...

**Step 4: Plot the TRAINING data**

Remember: don't look at the test data just yet.  You don't want to bias your model selection process.

In [None]:
# your code here ...

**Step 5: Create and fit Naive1 and SNaive baseline models**

* Hint: Fit the TRAINING data.
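For reference, the logic behind the two methods can be sketched in plain numpy. The functions `naive1_forecast` and `snaive_forecast` below are hypothetical helpers written for this illustration under the usual textbook definitions; they are not the `forecast-tools` classes, which you should use in the exercise.

```python
import numpy as np

def naive1_forecast(y, horizon):
    '''Repeat the last observation for every future period.'''
    return np.full(horizon, np.asarray(y, dtype=float)[-1])

def snaive_forecast(y, period, horizon):
    '''Repeat the last full season (e.g. period=12 for monthly data).'''
    last_season = np.asarray(y, dtype=float)[-period:]
    reps = int(np.ceil(horizon / period))
    return np.tile(last_season, reps)[:horizon]

# Two years of made-up monthly training data: 1, 2, ..., 24.
y_train = np.arange(1.0, 25.0)

n1_preds = naive1_forecast(y_train, horizon=6)
sn_preds = snaive_forecast(y_train, period=12, horizon=6)
```

Naive1 carries the most recent value forward, whereas the seasonal naive forecast for a given month is the value from the same month one year earlier.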

In [None]:
# your code here ...

**Step 6: Use each model to predict 6 months ahead**

* Hint: you need to store the prediction results so that later on you can calculate the forecast error.

In [None]:
# your code here ...

**Step 7: Calculate the mean absolute error of each forecast method**
    
Based on the results, which method would you choose?
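As a reminder of what the metrics measure, here is a hand-rolled numpy sketch of MAE and MAPE on made-up numbers. The functions `mae` and `mape` are illustrative; in the exercise you can use the functions imported from `forecast_tools.metrics` instead.

```python
import numpy as np

def mae(y_true, y_pred):
    '''Mean absolute error: average size of the errors, in the data's units.'''
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    '''Mean absolute percentage error: average error as a % of the actuals.'''
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Made-up test values and forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

demo_mae = mae(actual, forecast)    # errors are 10, 20 and 0
demo_mape = mape(actual, forecast)  # errors are 10%, 10% and 0%
```

The method with the lower error on the test set is the better benchmark for this data; MAPE is undefined when any actual value is zero.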

In [None]:
# your code here ...

**Step 8: Produce 80% and 95% prediction intervals for your chosen method.**
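To see where the intervals come from, the sketch below applies the standard textbook formula for Naive1, $\hat{y} \pm z \, \hat{\sigma} \sqrt{h}$, where $\hat{\sigma}$ is estimated from the in-sample residuals and $h$ is the forecast horizon. It assumes normally distributed errors and uses made-up data; `naive1_intervals` is a hypothetical helper, and the intervals returned by `forecast-tools` may be computed differently.

```python
import numpy as np
from statistics import NormalDist

def naive1_intervals(y, horizon, level):
    '''Lower and upper prediction limits for a Naive1 forecast.'''
    y = np.asarray(y, dtype=float)
    # Naive1 residuals are the one-step differences of the series.
    resid = np.diff(y)
    sigma = np.sqrt(np.mean(resid ** 2))
    # Two-sided normal multiplier, e.g. about 1.28 for 80%, 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    h = np.arange(1, horizon + 1)
    half_width = z * sigma * np.sqrt(h)
    return y[-1] - half_width, y[-1] + half_width

# Made-up training data for illustration.
y_train = np.array([20.0, 22.0, 21.0, 24.0, 23.0, 25.0])

lower_80, upper_80 = naive1_intervals(y_train, horizon=6, level=0.80)
lower_95, upper_95 = naive1_intervals(y_train, horizon=6, level=0.95)
```

Two features are worth noticing: the intervals widen as the horizon grows, and the 95% interval is always wider than the 80% interval.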

In [None]:
#your code here ...

# End