# Exploring Time Series

## Exploring and plotting before forecasting.

Before any forecasting is attempted it is important to understand a time series.

In this notebook you will learn how to:

* Plot a time series.
* Adjust a monthly time series to account for the different number of days in each month.
* Run a smoother through the time series to assess the trend.
* Break a time series into its trend and seasonal components.

#### References
* [Pandas Plot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html)
* [Pandas Rolling](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html)

## Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.style as style
from statsmodels.tsa.seasonal import seasonal_decompose

In [None]:
import statsmodels as sm

In [None]:
# check the installed version: v0.11.1 is recommended for this notebook
sm.__version__

## The ED arrivals dataset.

The dataset we will use represents monthly adult (age > 18) arrivals to an Emergency Department (ED). The observations are monthly, running from April 2009 to May 2017.

In [None]:
url = 'https://raw.githubusercontent.com/hsma5/9a_introduction_to_forecasting/main/data/' \
        + 'ed_mth_ts.csv'
ed_month = pd.read_csv(url, index_col='date', parse_dates=True)
ed_month.index.freq = 'MS'
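Setting `index.freq = 'MS'` tells pandas that the observations are spaced one *month start* apart. The short sketch below uses a hypothetical three-month index (not the ED file) to show what the `'MS'` alias produces. Having the frequency set also allows statsmodels' `seasonal_decompose`, used later, to infer a 12-month seasonal period.

```python
import pandas as pd

# hypothetical index: the first three months covered by the ED data
idx = pd.date_range(start='2009-04-01', periods=3, freq='MS')

print(idx.day.tolist())   # → [1, 1, 1]: every timestamp is the 1st of its month
print(idx.freqstr)        # → MS
```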

The first thing you should do when exploring a time series is check its length and duration.

In [None]:
# (rows, columns): the number of rows is the number of months in the ts
ed_month.shape

In [None]:
# the minimum date
ed_month.index.min()

In [None]:
# the maximum date
ed_month.index.max()
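The length and the first and last dates do not reveal whether any months are missing in between. A minimal sketch, using a small hypothetical series in place of `ed_month`, compares the index against a complete month-start range to list any gaps.

```python
import pandas as pd

# hypothetical monthly series with March 2010 deliberately missing
idx = pd.to_datetime(['2010-01-01', '2010-02-01', '2010-04-01', '2010-05-01'])
ts = pd.Series([100, 95, 110, 105], index=idx)

# the complete set of month-start dates between the first and last observation
full_range = pd.date_range(ts.index.min(), ts.index.max(), freq='MS')

# dates in the full range that are absent from the data are missing months
missing = full_range.difference(ts.index)
print(missing)
```

An empty result means the series has one observation for every month in its range.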

### How to use Pandas and Matplotlib to visualise a time series

Pandas builds matplotlib plotting into its `DataFrame` (and `Series`) objects. The quickest way to visualise a time series in Python is therefore to call the `plot()` method of a `DataFrame`.

The `plot()` method takes a tuple parameter called `figsize`. The tuple gives the (width, height) of the image in inches.

In [None]:
# plot() returns a matplotlib Axes object; by convention we assign it to `ax`
ax = ed_month.plot(figsize=(12,4))

You can then easily save the plot to a high-resolution image file if you would like to use it in a report. The `dpi` argument of `savefig()` controls the resolution.

In [None]:
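ax = ed_month.plot(figsize=(12,4))
# dpi sets the resolution in dots per inch; 300 is a common choice for reports
# (assumes an 'images' folder already exists)
ax.figure.savefig('images/explore.png', dpi=300)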

### Improving the appearance of your time series plot

Matplotlib is very flexible. Its full functionality is beyond the scope of this tutorial, so it is worth reviewing the examples on the matplotlib website. The following parameters are a good starting point for customising your plot:

* `color` e.g. 'green', 'blue' or 'orange'
* `linestyle` e.g. '--' for dashed, '-.' for dash-dot, or '' for none.
* `linewidth` - a number - typically 1, 1.5 or 2 will do.
* `marker` - e.g. 'o' for dots, '+' for crosses, '^' for triangle, and '' for none.

In [None]:
ax = ed_month.plot(figsize=(12,4),
                   color='black',
                   linestyle='-.',
                   marker='^',
                   linewidth=1)

The `plot` method returns a matplotlib `Axes` object, which you can use to further customise the plot. The following settings are often useful for time series plots:

* The x- and y-axis limits (scale).
* The x- and y-axis labels.

In [None]:
ax = ed_month.plot(figsize=(12,4),
                   legend=False)
ax.set_ylabel('attendances')

# set_ylim returns a tuple of the new limits. It's not needed, so we assign it to '_'
_ = ax.set_ylim(0, 12_000) # the underscore in 12_000 is a digit separator (Python 3.6+)
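The imports at the top include `matplotlib.dates` as `mdates`, which can control how dates appear on the x-axis. The sketch below shows one way to put a single labelled tick on each year, using a hypothetical straight-line series in place of `ed_month`. pandas' own `plot()` uses a custom date axis that does not always cooperate with `mdates` locators (passing `x_compat=True` to `plot()` is the usual workaround), so this sketch draws with matplotlib's `ax.plot()` directly.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# hypothetical monthly series standing in for ed_month (same date range)
idx = pd.date_range('2009-04-01', '2017-05-01', freq='MS')
ts = pd.Series(np.linspace(7000, 9000, len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(ts.index, ts.values)

# one major tick per year, labelled with the four-digit year
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))

ax.set_ylabel('attendances')
_ = ax.set_ylim(0, 12_000)
```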

#### Using Seaborn
You can also use the `seaborn` package to improve the default appearance of your charts.  

In [None]:
import seaborn as sns
sns.set()

In [None]:
# let's take a look at how seaborn has now improved the appearance
_ = ed_month.plot(figsize=(12,4))

### Visualising monthly data after adjusting for days in the month

When you are working with monthly data, some of the noise you see in the time series is due to months having different numbers of days (e.g. February has 28 days, or 29 in a leap year, while January has 31). This makes forecasting harder than it needs to be. Dividing each monthly value by the number of days in that month removes this source of noise.

This is very straightforward in pandas, using the built-in property `DatetimeIndex.days_in_month`.

In [None]:
ed_month.index.days_in_month

In [None]:
arrival_rate = ed_month['arrivals'] / ed_month.index.days_in_month

In [None]:
arrival_rate.head()

In [None]:
ed_month.head()

**Note that the units of the time series are now '*attendances per day*'.**

In [None]:
_ = ed_month.plot(figsize=(12,4));

In [None]:
# the arrival rate is slightly smoother than the monthly totals,
# which makes the series easier for forecasting methods to model
_ = arrival_rate.plot(figsize=(12,4));
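To see why dividing by the days in the month removes noise rather than signal, consider a toy example: a hypothetical department that receives exactly 300 arrivals every day. The monthly totals still vary, purely because the months differ in length, while the adjusted daily rate is constant.

```python
import pandas as pd

# hypothetical: exactly 300 arrivals every day in Jan, Feb and Mar 2017
idx = pd.date_range('2017-01-01', periods=3, freq='MS')
monthly_totals = pd.Series(300 * idx.days_in_month, index=idx)
print(monthly_totals.tolist())   # → [9300, 8400, 9300]: 31, 28 and 31 days

# dividing by the days in each month recovers the constant daily rate
daily_rate = monthly_totals / idx.days_in_month
print(daily_rate.tolist())       # → [300.0, 300.0, 300.0]
```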

## Run a smoother through the time series.

Time series are subject to seasonal patterns and noise.  To help explore the trend in the data you can smooth the time series using a *moving average*.

Use the `rolling()` method of a pandas `Series` (or `DataFrame`) to create a moving average.

We will run a 12-month moving average through the data.

**What would happen if we looked at a 6-month rolling average?**

In [None]:
ma12 = arrival_rate.rolling(window=12).mean() # .median() also works

In [None]:
ax = arrival_rate.plot(figsize=(12,4),
                       label='observations')
ax = ma12.plot(ax=ax,
               linestyle='-.',
               label='Smoothed Observations (MA_12)') # MA = moving average
ax.legend()
_ = ax.set_ylabel('attends/day')
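One way to explore the 6-month question is with a hypothetical series that has a flat level and a pure 12-month seasonal cycle. A 12-month window always averages one complete cycle, so the seasonal ups and downs cancel out; a 6-month window covers only half a cycle, so much of the seasonality leaks into the "smoothed" line.

```python
import numpy as np
import pandas as pd

# hypothetical series: flat level of 250/day plus a 12-month seasonal cycle
idx = pd.date_range('2010-01-01', periods=48, freq='MS')
months = np.arange(len(idx))
seasonal = 20 * np.sin(2 * np.pi * months / 12)
ts = pd.Series(250 + seasonal, index=idx)

ma6 = ts.rolling(window=6).mean()
ma12 = ts.rolling(window=12).mean()

# variation left in each smoothed series (std() skips the leading NaNs)
print('MA6 std: ', round(ma6.std(), 2))
print('MA12 std:', round(ma12.std(), 2))
```

By default `rolling()` labels each average at the end of its window; passing `center=True` places it at the middle instead, which lines the smoothed curve up with the observations it summarises.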

### Breaking a time series up into its trend and seasonal components.

To help visualise and understand trend and seasonality in a time series we can use seasonal decomposition.

This is a model-based approach that breaks the time series into three components: trend, seasonality and random error (the residual). There are two basic forms of seasonal decomposition: additive and multiplicative.

#### Additive decomposition

If we assume that an observation at time $t$, $Y_t$, is the sum of the trend $T_t$, the seasonality $S_t$ and random error $E_t$, then we have the following model:

$Y_t = T_t + S_t + E_t$

An additive model simply assumes that the components are added together. It is linear in nature: changes over time are consistently made by the same amount. *NB linear seasonality has the same frequency (width of cycles) and amplitude (height of cycles) over time.*
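The additive model can be made concrete by building a series from hypothetical components (these are made-up numbers, not the ED data). Because $S_t$ is simply added on, the size of the seasonal swing around the trend stays about the same in every year, even though the trend rises.

```python
import numpy as np

# hypothetical components for five years of monthly data (not the ED data)
rng = np.random.default_rng(42)
t = np.arange(60)
trend = 200 + 0.5 * t                          # T_t: steady linear growth
season = 15 * np.sin(2 * np.pi * t / 12)       # S_t: fixed 12-month cycle
error = rng.normal(0, 2, size=t.size)          # E_t: random noise

y = trend + season + error                     # Y_t = T_t + S_t + E_t

# size of the seasonal swing (max - min of Y_t - T_t) within each year
swing = (y - trend).reshape(5, 12)
yearly_range = swing.max(axis=1) - swing.min(axis=1)
print(yearly_range.round(1))   # stays about the same size from year to year
```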

#### Multiplicative decomposition

If the seasonal fluctuations in the data grow over time, then it is best to use a multiplicative model, where an observation at time $t$, $Y_t$, is the product of the trend $T_t$, seasonality $S_t$ and random error $E_t$:

$Y_t = T_t \cdot  S_t \cdot  E_t$

A multiplicative model is non-linear, e.g. quadratic or exponential. *NB non-linear seasonality has an increasing or decreasing frequency and/or amplitude over time.*

![image](images/add_mult.png)
[Credit: Nikolaos Kourentzes](https://kourentzes.com/forecasting/2014/11/09/additive-and-multiplicative-seasonality/)
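The same construction with multiplied components shows the contrast. In this hypothetical example $S_t$ is a ±10% seasonal factor, so the absolute size of the seasonal swing grows as the trend rises. Taking logs turns the product into a sum, $\log Y_t = \log T_t + \log S_t + \log E_t$, which is why one common alternative to a multiplicative decomposition is an additive decomposition of the logged series.

```python
import numpy as np

# hypothetical components for five years of monthly data (not the ED data)
rng = np.random.default_rng(42)
t = np.arange(60)
trend = 200 + 5 * t                                  # T_t: trend grows over time
season = 1 + 0.1 * np.sin(2 * np.pi * t / 12)        # S_t: +/-10% seasonal factor
error = rng.normal(1, 0.005, size=t.size)            # E_t: noise centred on 1

y = trend * season * error                           # Y_t = T_t * S_t * E_t

# size of the seasonal swing (max - min of Y_t - T_t) within each year
swing = (y - trend).reshape(5, 12)
yearly_range = swing.max(axis=1) - swing.min(axis=1)
print(yearly_range.round(1))   # the swing grows as the trend rises

# logs turn the product into a sum: log Y = log T + log S + log E
log_y = np.log(y)
print(np.allclose(log_y, np.log(trend) + np.log(season) + np.log(error)))
```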



Python's `statsmodels` package has a built-in seasonal decomposition function, `seasonal_decompose`, which can be imported from `statsmodels.tsa.seasonal`.

In [None]:
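# it's easy to use: pass in the time series and specify the model
# which model? There is a clear trend, but the size of the seasonal
# swings appears roughly constant, so an additive model is used here
# period=12 states the 12-month seasonal cycle explicitly
decomp = seasonal_decompose(arrival_rate, model='additive', period=12)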

#### Plotting the components

The result of the seasonal decomposition has `trend`, `seasonal` and `resid` attributes holding each component. Because the input was a pandas `Series`, each component is also a `Series`, so it can be plotted in the same way as the raw data.

**Plotting Trend**

In [None]:
# the trend is estimated with a centred moving average, so it looks similar to the smoothed series above
_ = decomp.trend.plot(figsize=(12,4))

**Plotting Seasonality**

In [None]:
_ = decomp.seasonal.plot(figsize=(12,4))

**Residuals (error)**

In [None]:
_ = decomp.resid.plot(figsize=(12,4))