# <center> Time Series

### Outcomes
- Visualize time series data
- Understand and measure stationarity 
- Detect trends and seasonality
- Use differencing techniques
- Utilize ACF and PACF

## <center> What is a time series?

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
## generate time series data
freq = 2
period = 1
amp = 50
offset = 50
n = 12*10
t = np.linspace(0, period, n)
values = np.sin(2*np.pi*freq*t)*amp+offset
date_vals = pd.date_range(start='1/1/2008', periods=n, freq='M')
time_series =  pd.Series(values, index=date_vals)
time_series.index = pd.DatetimeIndex(time_series.index)

In [None]:
time_series.plot()

## <center> A time series is a sequence of values observed in time order at a consistent frequency, with equal intervals between observations
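To make "equal intervals" concrete, here is a minimal sketch that checks whether the gaps between consecutive timestamps are all the same. The helper name `has_equal_intervals` and the example dates are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

def has_equal_intervals(index):
    """Return True if every pair of consecutive timestamps is the same distance apart."""
    gaps = index.to_series().diff().dropna()  # time delta between neighbours
    return gaps.nunique() == 1

# Daily observations with no gaps
regular = pd.date_range(start="2008-01-01", periods=5, freq="D")

# Same idea, but one day (2008-01-03) is missing
irregular = pd.DatetimeIndex(["2008-01-01", "2008-01-02", "2008-01-04", "2008-01-05"])

print(has_equal_intervals(regular))
print(has_equal_intervals(irregular))
```

This check compares raw time deltas, so monthly data (28 to 31 days per step) would not pass even though it is regular in calendar terms. For calendar-aware checks, `pd.infer_freq` is probably the better tool.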

## <center> Stationarity

<center> A time series is (weakly) stationary if its mean and variance stay constant over time and the covariance between two points depends only on the lag between them, not on when they occur.

<center><img src='stationarity.png' height=400 width=400>
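A quick, informal way to probe the definition is to split a series in half and compare the mean and variance of each half. The sketch below assumes synthetic data and a hypothetical helper `compare_halves`. It is a rough sanity check, not a formal test.

```python
import numpy as np
import pandas as pd

def compare_halves(series):
    """Return the mean and variance of the first and second half of a series."""
    mid = len(series) // 2
    halves = {"first half": series.iloc[:mid], "second half": series.iloc[mid:]}
    return pd.DataFrame(
        {name: {"mean": part.mean(), "variance": part.var()} for name, part in halves.items()}
    )

rng = np.random.default_rng(0)  # arbitrary seed so the example is repeatable
n = 400

# White noise: mean and variance should look similar in both halves
noise = pd.Series(rng.normal(size=n))

# Noise plus an upward trend: the mean shifts between halves
trending = pd.Series(0.05 * np.arange(n) + rng.normal(size=n))

noise_stats = compare_halves(noise)
trend_stats = compare_halves(trending)
print(noise_stats)
print(trend_stats)
```

Similar statistics in both halves do not prove stationarity, but a large shift in mean or variance is a strong hint that the series is not stationary.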

#### <center> Which of these series are stationary and what do they violate if not?

<center> <img src='stationary_sorting_activity.jpg' height=550 width=550>

### <center> Dickey-Fuller Test

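<center> Hypothesis test to check for stationarity (`adfuller` below runs the augmented version of the test). A small p-value, commonly below 0.05, is evidence against the null hypothesis.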


    Null Hypothesis (H0): The time series is non-stationary.
    
    Alternate Hypothesis (H1): The time series is stationary.


In [None]:
from statsmodels.tsa.stattools import adfuller

In [None]:
time_series.head()

In [None]:
results = adfuller(time_series)
print('ADF Statistic:',results[0])
print('p-value:',results[1])

In [None]:
freq = 2
period = 1
amp = 50
offset = 50
n = 12*10
t = np.linspace(0, period, n)
values = np.sin(2*np.pi*freq*t)*amp+offset
values = [i+j for i,j in zip(values,range(len(values)))]
date_vals = pd.date_range(start='1/1/2008', periods=n, freq='M')
time_series =  pd.Series(values, index=date_vals)
time_series.index = pd.DatetimeIndex(time_series.index)

In [None]:
time_series.plot()

In [None]:
results = adfuller(time_series)
print('ADF Statistic:',results[0])
print('p-value:',results[1])

## <center> Making a time series stationary

- Differencing
- Transformations
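The two approaches are often combined. As a minimal sketch on a made-up series with 5% growth per step, differencing alone leaves a growing series, while taking the log first and then differencing gives a constant.

```python
import numpy as np
import pandas as pd

# Hypothetical series that grows by 5% every step
t = np.arange(24)
growth = pd.Series(100 * 1.05 ** t)

# Differencing alone: the step-to-step change keeps growing
diff_only = growth.diff().dropna()

# Log transform, then differencing: every step equals log(1.05)
log_diff = np.log(growth).diff().dropna()

print(diff_only.head())
print(log_diff.head())
```

The log turns multiplicative growth into an additive trend, which lag-1 differencing can then remove.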

### <center> Differencing

<center> Replacing each data point with its change from an earlier value at a fixed lag

### <center> Lag-1 Differencing

<center> For each data point, subtract the previous data point.
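A tiny worked example, using made-up numbers, shows that `diff(periods=1)` matches subtracting a shifted copy of the series, and that a linear trend becomes a constant after lag-1 differencing.

```python
import pandas as pd

# Small hypothetical series
x = pd.Series([3, 5, 9, 15, 23])

manual = x - x.shift(1)      # subtract the previous value by hand
builtin = x.diff(periods=1)  # pandas shortcut for the same operation

print(pd.DataFrame({"X": x, "manual": manual, "diff": builtin}))

# A straight-line trend has the same step everywhere, so its difference is constant
trend = pd.Series(range(0, 50, 5))
trend_diff = trend.diff().dropna()
print(trend_diff.unique())
```

The first differenced value is always `NaN` because the first data point has no predecessor.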

In [None]:
df = time_series.to_frame(name='X')
df.head()

In [None]:
# Difference only the 'X' column so the cell still works if it is re-run
df['Lag-1'] = df['X'].diff(periods=1)
df.head()

In [None]:
df['X'].plot(); plt.show()
df['Lag-1'].plot()

### <center> Lag-n differencing

<center> For each data point, subtract the data point n steps earlier.
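Lag-n differencing is most useful when a pattern repeats every n steps. The sketch below uses a hypothetical series with a linear trend and a 12-step cycle. Differencing at lag 12 cancels the cycle and leaves only the trend's change over 12 steps.

```python
import numpy as np
import pandas as pd

# Hypothetical series: slope of 2 per step plus a cycle that repeats every 12 steps
steps = np.arange(48)
seasonal = 10 * np.sin(2 * np.pi * steps / 12)
series = pd.Series(2.0 * steps + seasonal)

lag_12 = series.diff(periods=12)     # x[t] - x[t-12]
manual = series - series.shift(12)   # same thing written out

# The cycle cancels, leaving 2 * 12 = 24 at every step
print(lag_12.dropna().round(6).unique())
```

The first n values are `NaN`, since they have no value n steps earlier to subtract.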

In [None]:
freq = 5
period = 1
amp = 50
offset = 50
n = 12*10
t = np.linspace(0, period, n)
values = np.sin(2*np.pi*freq*t)*amp+offset
values = [i+j/2 if j%2==0 else i+j for i,j in zip(values,range(len(values)))]
date_vals = pd.date_range(start='1/1/2008', periods=n, freq='M')
time_series =  pd.Series(values, index=date_vals)
time_series.index = pd.DatetimeIndex(time_series.index)
time_series.plot()

In [None]:
time_series.diff(periods=1).plot()

In [None]:
time_series.diff(periods=2).plot()

### <center> ACF and PACF

<center> Plots that give insight into a time series and help with decisions about differencing, transformation, and model selection.

Auto-correlation function (ACF)
- Measures the correlation between the series and a copy of itself shifted by each lag.

Partial auto-correlation function (PACF)
- Measures the correlation between the series and a copy of itself shifted by a given lag, after removing the influence of the shorter lags in between.
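To make the two definitions concrete, here is a by-hand sketch. `Series.autocorr` gives the autocorrelation at one lag, and the hypothetical helper `pacf_at_lag` estimates the partial autocorrelation as the last coefficient of a regression on lags 1 through k. This is one of several estimation methods, and the synthetic series are assumptions.

```python
import numpy as np
import pandas as pd

def pacf_at_lag(x, k):
    """Estimate the lag-k partial autocorrelation as the lag-k coefficient
    in a least-squares regression of x[t] on x[t-1], ..., x[t-k]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Column j holds the series shifted back by j steps, aligned with y = x[k:]
    lagged = np.column_stack([x[k - j : n - j] for j in range(1, k + 1)])
    y = x[k:]
    coefs, *_ = np.linalg.lstsq(lagged, y, rcond=None)
    return coefs[-1]

# ACF on a pure 12-step cycle: lag 12 lines up with itself, lag 6 is the mirror image
cycle = pd.Series(np.sin(2 * np.pi * np.arange(120) / 12))
print(cycle.autocorr(lag=12), cycle.autocorr(lag=6))

# AR(1) process: each value depends directly on the previous value only
rng = np.random.default_rng(0)  # arbitrary seed
noise = rng.normal(size=2000)
values = np.zeros(2000)
for t in range(1, 2000):
    values[t] = 0.7 * values[t - 1] + noise[t]
ar1 = pd.Series(values)

# ACF at lag 2 is clearly non-zero (lag 2 is linked to the present through lag 1) ...
print(ar1.autocorr(lag=2))
# ... but PACF at lag 2 is near zero once lag 1 is accounted for
print(pacf_at_lag(ar1, 1), pacf_at_lag(ar1, 2))
```

This contrast is why the PACF is commonly used to judge how many lags have a direct effect on the current value.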

In [None]:
time_series.plot()
time_series.shift(1).plot()

In [None]:
time_series.plot()
time_series.shift(7).plot()

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

In [None]:
# Request 24 lags explicitly so the plots fill the x-range set below
plot_acf(time_series, lags=24); plt.xlim(0,24); plt.show()
plot_pacf(time_series, lags=24); plt.xlim(0,24); plt.ylim(-1,1);plt.show()

In [None]:
freq = 5
period = 1
amp = 50
offset = 50
n = 12*10
t = np.linspace(0, period, n)
values = np.sin(2*np.pi*freq*t)*amp+offset
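# Uneven trend: multiples of 5 add j/2, other values that are not multiples of 3 add 1.5*j, the rest add j
# (note: `j%3` is truthy when j is NOT a multiple of 3)
values = [i+j/2 if j%5==0 else i+j*1.5 if j%3 else i+j for i,j in zip(values,range(len(values)))]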
date_vals = pd.date_range(start='1/1/2008', periods=n, freq='M')
time_series =  pd.Series(values, index=date_vals)
time_series.index = pd.DatetimeIndex(time_series.index)
time_series.plot()

In [None]:
# Request 24 lags explicitly so the plots fill the x-range set below
plot_acf(time_series, lags=24); plt.xlim(0,24); plt.show()
plot_pacf(time_series, lags=24); plt.xlim(0,24); plt.ylim(-1,1);plt.show()

In [None]:
time_series.diff(periods=3).plot()

In [None]:
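# Drop the leading NaNs created by differencing instead of back-filling them with repeated values
diffed = time_series.diff(periods=3).dropna()
plot_acf(diffed, lags=24); plt.xlim(0,24); plt.show()
plot_pacf(diffed, lags=24); plt.xlim(0,24); plt.ylim(-1,1);plt.show()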


### <center> Transformations

In [None]:
freq = 10
period = 1
amp = 50
offset = 50
n = 12*10
t = np.linspace(0, period, n)
values = np.sin(2*np.pi*freq*t)*amp+offset
np.random.seed(0)  # arbitrary seed so the random series is the same on every run
values = [i*np.random.rand()+2.5*(j/12)**2 for i,j in zip(values,range(len(values)))]
date_vals = pd.date_range(start='1/1/2008', periods=n, freq='M')
time_series =  pd.Series(values, index=date_vals)
time_series.index = pd.DatetimeIndex(time_series.index)
time_series.plot()

### <center> Rolling Mean

In [None]:
time_series.plot()
time_series.rolling(window=12).mean().plot()

In [None]:
np.log(time_series).plot()
np.log(time_series).rolling(window=12).mean().plot()

In [None]:
np.sqrt(time_series).plot()
np.sqrt(time_series).rolling(window=12).mean().plot()

In [None]:
time_series_transformed = np.sqrt(time_series) - np.sqrt(time_series).rolling(window=12).mean()
time_series_transformed.plot()

In [None]:
# The 12-month rolling mean leaves 11 leading NaNs; drop them rather than back-filling a constant
results = adfuller(time_series_transformed.dropna())
print('ADF Statistic:',results[0])
print('p-value:',results[1])

### <center> Seasonal Decomposition

In [None]:
# Import and apply seasonal_decompose()
from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(time_series)
# Gather the trend, seasonality, and residuals 
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
# Plot gathered statistics
plt.figure(figsize=(12,8))
plt.subplot(411)
plt.plot(time_series, label='Original', color='blue')
plt.legend(loc='best')
plt.subplot(412)
plt.plot(trend, label='Trend', color='blue')
plt.legend(loc='best')
plt.subplot(413)
plt.plot(seasonal,label='Seasonality', color='blue')
plt.legend(loc='best')
plt.subplot(414)
plt.plot(residual, label='Residuals', color='blue')
plt.legend(loc='best')
plt.tight_layout()
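For intuition about what an additive decomposition separates out, here is a rough by-hand sketch using only pandas on a made-up monthly series. It is not how `seasonal_decompose` is implemented, and its numbers will differ in detail. The series, the pattern values, and all variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend plus a 12-month pattern that sums to zero
index = pd.date_range(start="2008-01-01", periods=72, freq="MS")
pattern = np.array([-6, -4, -1, 2, 5, 7, 6, 3, 0, -3, -4, -5], dtype=float)
series = pd.Series(0.5 * np.arange(72) + np.tile(pattern, 6), index=index)

# Trend: a 12-month rolling mean averages out one full seasonal cycle
trend_est = series.rolling(window=12, center=True).mean()

# Seasonality: average the detrended values by calendar month, then centre them on zero
detrended = series - trend_est
monthly_mean = detrended.groupby(detrended.index.month).mean()
seasonal_by_month = monthly_mean - monthly_mean.mean()
seasonal_est = pd.Series(
    seasonal_by_month.loc[series.index.month].to_numpy(), index=series.index
)

# Residual: whatever the trend and seasonal estimates do not explain
residual_est = series - trend_est - seasonal_est

print(seasonal_by_month.round(3))
print(residual_est.dropna().round(3).unique())
```

Because the example has no noise, the residual is a small constant. It is not exactly zero because a 12-point window cannot be perfectly centred on a single month.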

## <center> Activity

<center> Read in the data from <i>2008_2009_sales.csv</i> and perform a time series analysis on it.

<center> Are there any trends? Is the series stationary? Why or why not?

<center> After answering these questions, use the techniques above to make the series stationary.