## Import libraries and data

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss, acf, grangercausalitytests
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf, month_plot, quarter_plot
from scipy import signal
import matplotlib.pyplot as plt
import seaborn as sns 
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline 
sns.set_style("whitegrid")
plt.rc('xtick', labelsize=15) 
plt.rc('ytick', labelsize=15) 

In [None]:
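# Parse 'date' as datetimes so that date-based operations such as asfreq work later on
d = pd.read_csv('../input/m5-data-for-tsa/data_for_tsa.csv', parse_dates=['date'])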
d.head()

I prepared the data for just one product from the M5 competition: 'FOODS_3_586_TX_2_validation'.

## Time Plot
For time series data, the obvious graph to start with is a time plot. That is, the observations are plotted against the time of observation, with consecutive observations joined by straight lines.
In these plots we can spot:
- Unusual values, such as outliers, and values that need explaining because they deviate from the seasonal pattern or the trend.
- Periods of missing observations.
- Fluctuations in the data. For example, a salinity series may show fluctuations that grow from 2008 onward.
- Trend, seasonality and cyclic behavior. For example, a temperature series may have strong seasonality but no trend or cyclic behavior.
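Gaps in a daily series are easier to find once the data are aligned to a complete calendar, because a line plot simply joins the points on either side of a gap. A minimal pandas sketch, assuming a DataFrame with `date` and `demand` columns like the one above; the dates and values here are made up for illustration:

```python
import pandas as pd

# Hypothetical daily sales with two missing days (2016-01-03 and 2016-01-04)
dates = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-05", "2016-01-06"])
sales = pd.DataFrame({"date": dates, "demand": [12, 15, 9, 11]})

# Reindex onto a complete daily calendar; days with no row become NaN
full = sales.set_index("date").asfreq("D")
missing_days = full.index[full["demand"].isna()]
print(list(missing_days.strftime("%Y-%m-%d")))
```

The same `asfreq("D")` call is what the decomposition step relies on, so any NaN it produces would also need handling before decomposing.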

In [None]:
fig, ax = plt.subplots(figsize=(15, 6))
# Plot first: pandas sets its own x-axis label ('date'), so customise labels afterwards
d.plot(x='date', y='demand', ax=ax)
ax.set_title('Demand over Time', fontsize=20, loc='center', fontdict=dict(weight='bold'))
ax.set_xlabel('Time', fontsize=16, fontdict=dict(weight='bold'))
ax.set_ylabel('Demand', fontsize=16, fontdict=dict(weight='bold'))
ax.tick_params(axis='both', which='major', labelsize=16);

This product sold on every day of the time span, which suggests it is a very popular product.

## Seasonal Plot and Box Plots
A seasonal plot is similar to a time plot except that the data are plotted against the individual “seasons” in which the data were observed.
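Under the hood, a seasonal plot of daily data by month is a table of one value per (month, year) pair, drawn as one line per year; `sns.lineplot` uses the mean as its default estimator when several days share a month. A small sketch of that table with pandas, assuming `year`, `month` and `demand` columns as in the notebook's data; the two years of demand values are invented:

```python
import numpy as np
import pandas as pd

# Hypothetical two years of daily demand: higher in December,
# and one unit lower across the board in the second year
dates = pd.date_range("2014-01-01", "2015-12-31", freq="D")
month = dates.month.to_numpy()
year = dates.year.to_numpy()
demand = np.where(month == 12, 20.0, 10.0) - (year - 2014)
demo = pd.DataFrame({"date": dates, "year": year, "month": month, "demand": demand})

# Rows are the "seasons" (months), columns are years: each column is one line
seasonal_table = demo.pivot_table(index="month", columns="year",
                                  values="demand", aggfunc="mean")
print(seasonal_table.loc[[1, 12]])
```

Reading across a row compares the same season in different years; reading down a column traces one year's seasonal shape.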

In [None]:
variable = 'demand'
fig, ax = plt.subplots(figsize=(15, 6))

# Seasonal plot: demand against month, one line per year
# (recent seaborn versions require x/y/hue as keyword arguments)
palette = sns.color_palette("colorblind", d['year'].nunique())
sns.lineplot(data=d, x='month', y=variable, hue='year', palette=palette, ax=ax)
ax.set_title('Seasonal plot of demand', fontsize=20, loc='center', fontdict=dict(weight='bold'))
ax.set_xlabel('Month', fontsize=16, fontdict=dict(weight='bold'))
ax.set_ylabel('Demand', fontsize=16, fontdict=dict(weight='bold'))

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))

# Year-wise box plots show the trend
sns.boxplot(data=d, x='year', y=variable, ax=ax[0])
ax[0].set_title('Year-wise Box Plot\n(The Trend)', fontsize=20, loc='center', fontdict=dict(weight='bold'))
ax[0].set_xlabel('Year', fontsize=16, fontdict=dict(weight='bold'))
ax[0].set_ylabel('Demand', fontsize=16, fontdict=dict(weight='bold'))

# Month-wise box plots show the seasonality
sns.boxplot(data=d, x='month', y=variable, ax=ax[1])
ax[1].set_title('Month-wise Box Plot\n(The Seasonality)', fontsize=20, loc='center', fontdict=dict(weight='bold'))
ax[1].set_xlabel('Month', fontsize=16, fontdict=dict(weight='bold'))
ax[1].set_ylabel('Demand', fontsize=16, fontdict=dict(weight='bold'));

The seasonal plot lets us instantly:
- See the seasonal pattern more clearly, if one exists.
- Identify the years in which the pattern changes.
- Spot large jumps or drops.

The year-wise (trend) and month-wise (seasonality) box plots help us:
- See the trend and the seasonality more clearly.
- Find years or months with outliers.
- Compare years or months more easily.
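The points drawn beyond the whiskers of a box plot follow the usual 1.5 × IQR rule (the default `whis=1.5` in matplotlib and seaborn). A minimal sketch of that rule applied month by month; the helper `iqr_outliers` and the demand values are hypothetical, chosen so that month 2 contains one extreme day:

```python
import pandas as pd

def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values a standard box plot would draw as outlier points."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < lower) | (values > upper)]

# Made-up demand for two months; month 2 has one extreme day (60)
demo = pd.DataFrame({
    "month": [1] * 8 + [2] * 8,
    "demand": [10, 11, 12, 10, 11, 12, 10, 11,
               20, 21, 22, 20, 21, 22, 20, 60],
})

# Outliers are judged within each month, as in the month-wise box plot
outliers = demo.groupby("month")["demand"].apply(iqr_outliers)
print(outliers)
```

Because the rule is applied per group, a value that is ordinary for one month can still be flagged in another month whose spread is tighter.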

As the years go by, demand decreases. The product also shows some seasonality, with local maxima in months 2, 8 and 12.

## Decomposition
Time series data can exhibit a variety of patterns, and it is often helpful to split a time series into several components, each representing an underlying pattern category. When we decompose a time series into components, we usually combine the trend and cycle into a single trend-cycle component (sometimes called the trend for simplicity). Thus we think of a time series comprising three components: a trend-cycle component, a seasonal component, and a remainder component (containing anything else in the time series).
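In the additive case the model is $y_t = T_t + S_t + R_t$ (trend-cycle, seasonal and remainder). The sketch below follows the classical moving-average recipe on which `seasonal_decompose` is based, written out by hand with pandas for an odd period; the function name `classical_additive_decompose` and the synthetic series are assumptions for illustration, not the library's code:

```python
import numpy as np
import pandas as pd

def classical_additive_decompose(y: pd.Series, period: int):
    """Classical additive decomposition y = trend + seasonal + resid (odd period only)."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # Trend-cycle: centred moving average spanning exactly one season
    trend = y.rolling(window=period, center=True).mean()
    detrended = y - trend
    # Seasonal: average detrended value at each position in the cycle,
    # shifted so the seasonal effects sum to zero over one period
    position = np.arange(len(y)) % period
    effects = detrended.groupby(position).mean()
    effects -= effects.mean()
    seasonal = pd.Series(effects.to_numpy()[position], index=y.index)
    # Remainder: whatever trend and seasonality do not explain
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Synthetic daily series: linear trend plus a weekly pattern that sums to zero
pattern = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -2.0, 1.0])
t = np.arange(70)
y = pd.Series(0.5 * t + pattern[t % 7],
              index=pd.date_range("2016-01-01", periods=70, freq="D"))
trend, seasonal, resid = classical_additive_decompose(y, period=7)
print(seasonal.iloc[:7].round(6).tolist())
```

On this noise-free input the recovered seasonal component equals `pattern` and the remainder is zero wherever the centred moving average is defined; the first and last three trend values are NaN, which is also why the decomposition plot's trend panel starts and ends with short gaps.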

In [None]:
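# Larger figure and fonts for the four-panel decomposition plot
plt.rcParams['figure.figsize'] = 15, 12
plt.rcParams['axes.labelsize'] = 20
plt.rcParams['ytick.labelsize'] = 16
plt.rcParams['xtick.labelsize'] = 16

# Daily demand indexed by date; asfreq('D') needs a DatetimeIndex
y = d.set_index(pd.to_datetime(d['date']))['demand']
y = y.asfreq('D')
# Additive model with a weekly (7-day) seasonal period, the default for daily data
decomposition = sm.tsa.seasonal_decompose(y, model='additive', period=7)
decomp = decomposition.plot()
decomp.suptitle('Demand Decomposition', fontsize=22);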

## Stationarity
A stationary time series is one whose properties do not depend on the time at which the series is observed. Thus, time series with trends or seasonality are not stationary: the trend and seasonality affect the value of the series at different times. On the other hand, a white noise series is stationary: it does not matter when you observe it, it should look much the same at any point in time.
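An informal check, complementary to formal tests such as ADF, is to see whether a summary statistic like the rolling mean drifts over time. A sketch on synthetic data; the seed, the 50-step window and the helper `rolling_mean_range` are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(1000)
white_noise = pd.Series(rng.normal(size=1000))           # stationary
trending = pd.Series(0.05 * t + rng.normal(size=1000))   # mean grows with time

def rolling_mean_range(series: pd.Series, window: int = 50) -> float:
    """Spread of the rolling mean: small if the level does not drift."""
    rolling_mean = series.rolling(window).mean().dropna()
    return float(rolling_mean.max() - rolling_mean.min())

print(f"white noise: {rolling_mean_range(white_noise):.2f}")
print(f"trending:    {rolling_mean_range(trending):.2f}")
```

The white noise rolling mean only wobbles around zero, while the trending series' rolling mean climbs steadily, which is exactly the time dependence that breaks stationarity.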

In [None]:
from statsmodels.tsa.stattools import adfuller
# check for stationarity
def adf_test(series,title=''):
    """
    Pass in a time series and an optional title; prints an ADF report.
    """
    print('Augmented Dickey-Fuller Test: {}'.format(title))
    result = adfuller(series.dropna(),autolag='AIC') # .dropna() handles differenced data
    
    labels = ['ADF test statistic','p-value','# lags used','# observations']
    out = pd.Series(result[0:4],index=labels)

    for key,val in result[4].items():
        out['critical value ({})'.format(key)]=val
        
    print(out.to_string())          # .to_string() removes the line "dtype: float64"
    
    if result[1] <= 0.05:
        print("Strong evidence against the null hypothesis")
        print("Reject the null hypothesis")
        print("Data has no unit root and is stationary")
    else:
        print("Weak evidence against the null hypothesis")
        print("Fail to reject the null hypothesis")
        print("Data has a unit root and is non-stationary")

In [None]:
adf_test(d['demand'], title='Demand')

In the ADF test, the null hypothesis is that the time series has a unit root and is non-stationary. Because the p-value is below 0.05, we reject the null hypothesis.

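Even so, we do not treat the series as stationary: the ADF test only checks for a unit root, while the plots above still show a decreasing trend and seasonality. To make a time series stationary, we can difference it one or more times, i.e. subtract the previous value from the current one: $y'_t = y_t - y_{t-1}$.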
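On toy series, first differencing turns a linear trend into a constant, and differencing a second time does the same for a quadratic trend. This sketch uses pandas' `Series.diff` rather than the statsmodels `diff` helper; pandas leaves NaN in the leading positions. The series are invented for illustration:

```python
import numpy as np
import pandas as pd

t = np.arange(10, dtype=float)
linear = pd.Series(2.0 * t + 5.0)   # straight-line trend
quadratic = pd.Series(t ** 2)       # curved trend

# First difference: y'_t = y_t - y_{t-1}
first_diff = linear.diff()
# Second difference: difference the first difference again
second_diff = quadratic.diff().diff()

print(first_diff.dropna().unique())
print(second_diff.dropna().unique())
```

Both results are constant (the slope 2.0 and the second derivative 2.0), which is why one difference is usually enough for a roughly linear trend and two for a curved one.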

In [None]:
from statsmodels.tsa.statespace.tools import diff

fig, ax = plt.subplots(nrows=2, ncols=2,figsize=(15, 11))

d['demand_Diff1'] = diff(d['demand'],k_diff=1)
d['demand_Diff2'] = diff(d['demand'],k_diff=2)
d['demand_Diff3'] = diff(d['demand'],k_diff=3)

d['demand'].plot(title="Initial Data",ax=ax[0][0]).autoscale(axis='x',tight=True);
d['demand_Diff1'].plot(title="First Difference Data",ax=ax[0][1]).autoscale(axis='x',tight=True);
d['demand_Diff2'].plot(title="Second Difference Data",ax=ax[1][0]).autoscale(axis='x',tight=True);
d['demand_Diff3'].plot(title="Third Difference Data",ax=ax[1][1]).autoscale(axis='x',tight=True);

In [None]:
adf_test(d['demand_Diff1'], title='demand_Diff1')

One difference was enough to make the series stationary.

## Lag Scatter Plot

In [None]:
from pandas.plotting import lag_plot
lag_plot(d['demand']);

The lag plot shows the relationship between each observation and the previous one (lag 1). There is a positive correlation, but it is not very strong, as the points are not tightly clustered along the diagonal.
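The tightness of the lag plot can be summarised by a single number: the Pearson correlation between $y_t$ and $y_{t-1}$. A sketch using pandas' `Series.autocorr`, checked against a manual computation on the shifted pairs; the demand values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical demand values; any numeric Series works the same way
demand = pd.Series([5, 7, 6, 9, 8, 11, 10, 13, 12, 15], dtype=float)

# lag_plot (default lag=1) draws the pairs (y_{t-1}, y_t);
# their Pearson correlation summarises how tight the diagonal cloud is
lag1_corr = demand.autocorr(lag=1)

# The same number computed by hand from the shifted pairs
values = demand.to_numpy()
manual_corr = np.corrcoef(values[:-1], values[1:])[0, 1]
print(round(lag1_corr, 4), round(manual_corr, 4))
```

Calling `d['demand'].autocorr(lag=1)` in the notebook would give the number behind the lag plot above.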

## Autocorrelation

In [None]:
fig, ax = plt.subplots(nrows=1, ncols=2,figsize=(15, 6))
autocorr = acf(d['demand'], nlags=30, fft=False) # just the numbers
plot_acf(d['demand'].tolist(), lags=30, ax=ax[0]); # just the plot
plot_pacf(d['demand'].tolist(), lags=30, ax=ax[1]); # just the plot

There is a correlation between a value and the value 7 days earlier, which suggests a weekly pattern in demand.
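The ACF values are sample autocorrelations $r_k = \sum_{t=k+1}^{T}(y_t-\bar y)(y_{t-k}-\bar y) \big/ \sum_{t=1}^{T}(y_t-\bar y)^2$. A NumPy sketch of that formula (the standard sample ACF, which is what statsmodels' `acf` computes by default), applied to a synthetic series with an exact 7-day cycle standing in for weekly demand; `sample_acf` is a hypothetical helper name:

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations r_0..r_nlags of a 1-D series."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[:len(y) - k]) / denom
                     for k in range(nlags + 1)])

# Synthetic series with an exact 7-day cycle
t = np.arange(700)
weekly = np.sin(2 * np.pi * t / 7)
r = sample_acf(weekly, nlags=14)
print(np.round(r[[1, 7, 14]], 3))
```

Among lags 1 to 7 the largest autocorrelation is at lag 7, and it repeats at lag 14: the same signature of a weekly cycle that the ACF plot above shows for the real demand.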

Thank you for reading my kernel!

Want to learn more about Time Series Analysis?
Check out my Towards Data Science posts:
- [Time Series Analysis with Theory, Plots, and Code Part 1](https://towardsdatascience.com/time-series-analysis-with-theory-plots-and-code-part-1-dd3ea417d8c4)
- [Time Series Analysis with Theory, Plots, and Code Part 2](https://towardsdatascience.com/time-series-analysis-with-theory-plots-and-code-part-2-c72b447da634)