<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Time Series: Decomposition

## Learning Objectives
 
**After this lesson, you will be able to:**
- Describe the different components of time series data (trend, seasonality, cyclical, and residual).
- Decompose time series data into trend, seasonality, cyclical, and residual components.
- Plot the decomposed components of a time series.

## Lesson Guide

**Decomposition**
- [Time Series Decomposition](#A)
- [Decompose a Time Series](#B)
- [Plotting the Residuals and the ACF and PACF of the Residuals](#C)
- [Independent Practice](#D)
----

<h2><a id = "A">Time Series Decomposition</a></h2>

Splitting a time series into several components is useful both for understanding the data and for choosing an appropriate forecasting model. Each component represents a different underlying pattern.

- **Trend**: A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes, we will refer to a trend “changing direction” when, for example, it might go from an increasing trend to a decreasing trend.

- **Seasonal**: A seasonal pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). Seasonality is always of a fixed and known period.

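- **Cyclical**: A cyclical pattern exists when the data rise and fall in swings that are not of a fixed period, such as business cycles lasting several years. Classical decomposition does not estimate cycles separately; they are absorbed into the trend component (sometimes called the trend-cycle).

- **Residual**: The leftover or error component, that is, whatever remains after the trend and seasonal components are removed.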
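To make these components concrete, here is a minimal sketch (not part of the original lesson) that builds an artificial monthly series by adding a linear trend, a fixed 12-month seasonal pattern, and random noise. The slope, amplitude, noise scale, and start date are arbitrary choices for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the example is reproducible

n_months = 48
index = pd.date_range(start="2000-01-01", periods=n_months, freq="MS")  # month-start dates
t = np.arange(n_months)

trend = 100 + 0.5 * t                            # slow, steady increase
seasonal = 10 * np.sin(2 * np.pi * t / 12)       # pattern that repeats every 12 months
residual = rng.normal(scale=2.0, size=n_months)  # leftover random variation

# The observed series is simply the sum of the three parts
series = pd.Series(trend + seasonal + residual, index=index, name="value")
print(series.head())
```

Decomposition runs this construction in reverse: given only `series`, it tries to recover `trend`, `seasonal`, and `residual`.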

### Guided Practice

We are going to play around with some bus data from Portland, Oregon. Load in the data set below and check it out.

In [None]:
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from dateutil.relativedelta import relativedelta
%matplotlib inline

plt.rcParams['figure.figsize'] = [15, 8]
plt.rcParams['font.size'] = 14
plt.style.use('fivethirtyeight')

In [None]:
bus = pd.read_csv('./data/bus.csv')
bus.head()

In [None]:
bus.tail()

We'll need to clean this data a little. There is a bad row at the end of the file (index 114), so we drop it first. Then we give the column a simpler name and convert the `riders` column to integers.

In [None]:
bus.drop(bus.index[114], inplace=True)     # remove the bad footer row
bus.columns = ['riders']                   # replace the long original column name
bus['riders'] = bus['riders'].astype(int)  # ridership values as integers
bus.head()
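Dropping `bus.index[114]` works only because we already know the bad row is at index 114. A slightly more defensive sketch, assuming the footer row holds text that cannot be parsed as a number, coerces the column with `pd.to_numeric(errors='coerce')` and drops whatever fails to parse. The small DataFrame below is made-up stand-in data, not the real file.

```python
import pandas as pd

# Hypothetical stand-in for the raw file: numbers stored as text plus a junk footer row
raw = pd.DataFrame({"Original column name": ["648", "646", "639", "Some footer text"]})

clean = raw.copy()
clean.columns = ["riders"]
# Anything that is not a valid number becomes NaN instead of raising an error
clean["riders"] = pd.to_numeric(clean["riders"], errors="coerce")
# Drop the unparseable rows, then store the counts as integers
clean = clean.dropna(subset=["riders"]).astype({"riders": int})

print(clean)
```

This avoids hard-coding a row position, at the cost of silently dropping any other malformed rows, so it is worth checking how many rows were removed.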

Looking at the original column name (before we renamed it to `riders`), we can see that the data are monthly bus ridership figures from January 1973 to June 1982, which is 114 months. We'll create an artificial date index with the `relativedelta()` function, as shown below: we start at `1973-01-01` and step forward one month at a time.

In [None]:
start = datetime.datetime.strptime("1973-01-01", "%Y-%m-%d")
# One date per row: January 1973 through June 1982 (114 months)
date_list = [start + relativedelta(months=x) for x in range(114)]
bus['date'] = date_list

bus.set_index('date', inplace=True)
bus.index.name = None

bus.head()

In [None]:
bus.tail()
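pandas can also build this monthly index directly. In this sketch, `pd.date_range` with `freq='MS'` (month start) gives the same 114 dates as the `relativedelta()` list, and the resulting `DatetimeIndex` records its monthly frequency, which, for example, lets `seasonal_decompose()` infer the period when none is passed.

```python
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

# The lesson's approach: add 0, 1, ..., 113 months to the start date
start = datetime.datetime(1973, 1, 1)
date_list = [start + relativedelta(months=x) for x in range(114)]

# Equivalent pandas approach: 114 consecutive month-start dates
date_index = pd.date_range(start="1973-01-01", periods=114, freq="MS")

print(date_index[0], date_index[-1])  # first and last month
print(date_index.freqstr)             # the index remembers its monthly frequency
```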

### StatsModels Time Series Tools 

The Python StatsModels library offers a wide variety of reliable time series analysis tools. We'll start by importing the functions that plot the autocorrelation and partial autocorrelation functions, as well as a function for decomposing a time series.

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose

### Plot the raw data.

We can look at the raw data first. Let's plot the time series.

In [None]:
bus.riders.plot(title= 'Monthly Ridership (100,000s)');

<h2><a id = "B">Decompose the time series and plot it using the <code>seasonal_decompose()</code> function.</a></h2>

A useful abstraction for selecting forecasting methods is to break a time series down into systematic and unsystematic components.

* **Systematic**: Components of the time series that have consistency or recurrence and can be described and modeled.
* **Non-Systematic**: Components of the time series that cannot be directly modeled.

A given time series is thought to consist of three systematic components (level, trend, and seasonality) and one non-systematic component called noise.

These components are defined as follows:

* **Level**: The average value in the series.
* **Trend**: The increasing or decreasing value in the series.
* **Seasonality**: The repeating short-term cycle in the series.
* **Noise**: The random variation in the series.

Via [Machine Learning Mastery](https://machinelearningmastery.com/decompose-time-series-data-trend-seasonality/).
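The decomposition model ties these components together. In the additive form, which `seasonal_decompose()` uses by default (`model='additive'`), each observation is the sum of its parts; the multiplicative form (`model='multiplicative'`) suits series whose seasonal swings grow with the level of the series. Here $y_t$ is the observation at time $t$, $T_t$ the trend (the level is absorbed into it), $S_t$ the seasonal component, and $R_t$ the residual or noise.

$$
\begin{aligned}
\text{Additive:} \quad & y_t = T_t + S_t + R_t \\
\text{Multiplicative:} \quad & y_t = T_t \times S_t \times R_t
\end{aligned}
$$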

Using the `seasonal_decompose()` function, we can break the time series into its constituent parts. The function returns a result object whose `observed`, `trend`, `seasonal`, and `resid` attributes hold the four pieces of the decomposition.

Apply the function to the `riders` data with a period of `12`, then plot the result. We use a period of 12 because the data are monthly and the seasonal pattern repeats every year. Like a pandas DataFrame, the result object has a `.plot()` method. (Older versions of StatsModels called this argument `freq`; recent versions use `period`.)

In [None]:
decomposition = seasonal_decompose(bus.riders, period=12)
decomposition

In [None]:
fig = decomposition.plot();

The trend and seasonal components extracted from the series look reasonable. The residuals are also interesting: they show periods of high variability in certain years of the series.
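It can help to see the classical additive recipe written out by hand. This sketch follows the textbook steps: (1) estimate the trend with a centered 2×12 moving average, (2) subtract it, (3) average the detrended values at each position in the yearly cycle and center those averages so they sum to zero, and (4) take what is left as the residual. StatsModels' default additive decomposition works along these lines for an even period, but treat this as an illustration rather than a copy of its source; `classical_decompose` is a hypothetical helper and the data are made up.

```python
import numpy as np
import pandas as pd


def classical_decompose(y: pd.Series, period: int = 12) -> pd.DataFrame:
    """Hypothetical helper: additive classical decomposition for an even seasonal period."""
    # Step 1: centered 2 x period moving average; the two end points get half weight
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = y.rolling(window=period + 1, center=True).apply(
        lambda window: np.dot(window, weights), raw=True
    )
    # Step 2: remove the trend
    detrended = y - trend
    # Step 3: average detrended values at each position in the cycle (NaNs are skipped),
    # then center the averages so the seasonal effects sum to zero over one period
    position = np.arange(len(y)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=y.index)
    # Step 4: whatever the trend and seasonal components do not explain is the residual
    resid = y - trend - seasonal
    return pd.DataFrame({"observed": y, "trend": trend, "seasonal": seasonal, "resid": resid})


# Made-up monthly data: linear trend + 12-month cycle + noise
rng = np.random.default_rng(2)
t = np.arange(60)
y = pd.Series(
    20 + 0.2 * t + 3 * np.cos(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=60),
    index=pd.date_range("2010-01-01", periods=60, freq="MS"),
)

parts = classical_decompose(y, period=12)
print(parts.iloc[4:10])             # trend is NaN in rows 4-5 and defined from row 6 on
print(parts["trend"].isna().sum())  # 6 missing values at each end
```

Because the moving average spans exactly one year, it averages the seasonal pattern away, leaving the trend; the seasonal component is then forced to be the same 12 values every year.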

### Plot a single component of the decomposition.

We can also pull out and plot just one component of the decomposition.

In [None]:
plt.rcParams['figure.figsize'] = [10, 5]
seasonal = decomposition.seasonal 
seasonal.plot();

In [None]:
trend = decomposition.trend
trend.plot();
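Because the seasonal component is the same 12 values repeated every year, one way to read it is to collapse it to one value per calendar month. This sketch assumes `seasonal` is a monthly Series with a `DatetimeIndex`, like `decomposition.seasonal` above; a made-up repeating pattern stands in for it so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for decomposition.seasonal: 12 monthly effects repeated for 3 years
effects = np.array([-30, -25, -10, 0, 10, 20, 35, 30, 15, 5, -15, -35], dtype=float)
index = pd.date_range("1973-01-01", periods=36, freq="MS")
seasonal = pd.Series(np.tile(effects, 3), index=index)

# Collapse to one value per calendar month; every year carries the same value
monthly_effect = seasonal.groupby(seasonal.index.month).mean()
monthly_effect.index = [pd.Timestamp(2000, m, 1).strftime("%b") for m in monthly_effect.index]
print(monthly_effect)

# Months furthest above and below the underlying trend
print(monthly_effect.idxmax(), monthly_effect.idxmin())
```

A positive value means that month tends to sit above the trend by that amount; a negative value means it tends to sit below it.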

<h2><a id = "C">Examining the residuals and their ACF and PACF.</a></h2>

Let's examine the residuals of our data.

In [None]:
resid = decomposition.resid
resid.plot();

In [None]:
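# The first and last 6 residuals are NaN (the centered moving average is undefined there), so drop them
plot_acf(resid.dropna(), lags=30);

In [None]:
plot_pacf(resid.dropna(), lags=30);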

We notice that the residuals of our time series don't have any significant autocorrelation: the bars stay inside the shaded confidence band. This is because the trend and seasonal components, which account for most of the autocorrelation in the raw series, have been removed.
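What counts as "significant" here? The ACF plot compares each sample autocorrelation with a shaded confidence band. This minimal sketch computes the standard sample ACF estimator together with the simple white-noise bound ±1.96/√n. Note that StatsModels' `plot_acf()` by default shades a band based on Bartlett's formula, which widens with the lag, so the fixed bound here is a simplification. `sample_acf` is a hypothetical helper, and the data are simulated white noise.

```python
import numpy as np


def sample_acf(x: np.ndarray, nlags: int) -> np.ndarray:
    """Hypothetical helper: sample autocorrelations r_0 .. r_nlags (remove NaNs first)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    # r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    return np.array([np.sum(dev[: len(x) - k] * dev[k:]) / denom for k in range(nlags + 1)])


rng = np.random.default_rng(3)
noise = rng.normal(size=102)        # same length as the bus residuals after dropping NaNs
r = sample_acf(noise, nlags=30)
bound = 1.96 / np.sqrt(len(noise))  # approximate 95% band for white noise

significant_lags = [k for k in range(1, 31) if abs(r[k]) > bound]
print(f"bound = {bound:.3f}; lags outside the band: {significant_lags}")
```

Even for true white noise, roughly 1 lag in 20 will land outside a 95% band by chance, so a single bar poking out is weak evidence of leftover structure.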

**Important Takeaways**
* Trend is a long-term change in the data. 
* Seasonality is a pattern of a fixed period that repeats in the data. 
* Residuals are the error components of the data.
* StatsModels contains a `seasonal_decompose()` function that breaks a time series into its components.

<h2><a id="D">Independent Practice</a></h2>

**Instructor Note:** These are optional and can be assigned as student practice questions outside of class.

### Import the Airline Passengers data set, preprocess the data, and plot the raw time series.

In [None]:
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from dateutil.relativedelta import relativedelta
%matplotlib inline

plt.rcParams['figure.figsize'] = [10, 7]
plt.rcParams['font.size'] = 14
plt.style.use('fivethirtyeight')

In [None]:
airline = pd.read_csv('./data/airline.csv')
airline.head()

In [None]:
airline.tail()

In [None]:
# Rename the column containing the number of passengers and convert the values to int

In [None]:
# Create an artificial date index using the relativedelta() function.
# Looking at the original name of the passengers column, what starting date should we use?
start = 
date_list = 
airline['date'] = 

airline.head()

In [None]:
airline.tail()

In [None]:
# Plot the data.

### Decompose the time series and plot it using the `seasonal_decompose()` function.

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

In [None]:
# Decompose the time series with a period of 12 (monthly data)

In [None]:
# Plot all four components of the decomposition.

### Interpret these plots.