<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Problem-Statement" data-toc-modified-id="Problem-Statement-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Problem Statement</a></span></li><li><span><a href="#Prepare" data-toc-modified-id="Prepare-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Prepare</a></span><ul class="toc-item"><li><span><a href="#Load-libraries" data-toc-modified-id="Load-libraries-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Load libraries</a></span></li><li><span><a href="#Load-dataset" data-toc-modified-id="Load-dataset-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Load dataset</a></span></li></ul></li><li><span><a href="#Exploratory-Data-Analysis" data-toc-modified-id="Exploratory-Data-Analysis-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Exploratory Data Analysis</a></span><ul class="toc-item"><li><span><a href="#Descriptive-statistics" data-toc-modified-id="Descriptive-statistics-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Descriptive statistics</a></span></li><li><span><a href="#Data-visualizations" data-toc-modified-id="Data-visualizations-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Data visualizations</a></span><ul class="toc-item"><li><span><a href="#Sales" data-toc-modified-id="Sales-4.2.1"><span class="toc-item-num">4.2.1&nbsp;&nbsp;</span>Sales</a></span></li><li><span><a href="#Stores" data-toc-modified-id="Stores-4.2.2"><span class="toc-item-num">4.2.2&nbsp;&nbsp;</span>Stores</a></span></li><li><span><a href="#Items" data-toc-modified-id="Items-4.2.3"><span class="toc-item-num">4.2.3&nbsp;&nbsp;</span>Items</a></span></li><li><span><a href="#Time-Series" data-toc-modified-id="Time-Series-4.2.4"><span class="toc-item-num">4.2.4&nbsp;&nbsp;</span>Time Series</a></span></li></ul></li></ul></li><li><span><a href="#Evaluate-Models" 
data-toc-modified-id="Evaluate-Models-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Evaluate Models</a></span><ul class="toc-item"><li><span><a href="#Split-out-validation-dataset" data-toc-modified-id="Split-out-validation-dataset-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Split-out validation dataset</a></span></li><li><span><a href="#Naive-Approach" data-toc-modified-id="Naive-Approach-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Naive Approach</a></span></li><li><span><a href="#Moving-Averages" data-toc-modified-id="Moving-Averages-5.3"><span class="toc-item-num">5.3&nbsp;&nbsp;</span>Moving Averages</a></span></li><li><span><a href="#Logarithmic-Sales" data-toc-modified-id="Logarithmic-Sales-5.4"><span class="toc-item-num">5.4&nbsp;&nbsp;</span>Logarithmic Sales</a></span></li><li><span><a href="#Moving-Averages-Log" data-toc-modified-id="Moving-Averages-Log-5.5"><span class="toc-item-num">5.5&nbsp;&nbsp;</span>Moving Averages Log</a></span></li><li><span><a href="#Decompose-Log" data-toc-modified-id="Decompose-Log-5.6"><span class="toc-item-num">5.6&nbsp;&nbsp;</span>Decompose Log</a></span></li></ul></div>

## Introduction

This dataset was obtained from the Kaggle challenge “Store Item Demand Forecasting Challenge.” The challenge asks participants to predict 3 months of sales for 50 different items at 10 different stores. The data provided contains 5 years of daily store-item sales.

Key facts about this dataset:
- Number of rows: 913,000 (training set)
- 4 columns: date, store, item and sales
- 50 items
- 10 stores
- Sales are given for each item, store and date (daily)
- Time frame - 2013/01/01 to 2017/12/31
- No missing data
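The row count follows from the other facts, because there is one row per store, item and day. A quick check with the standard library:

```python
from datetime import date

n_stores = 10
n_items = 50
# Inclusive day count from 2013-01-01 to 2017-12-31 (2016 is a leap year)
n_days = (date(2017, 12, 31) - date(2013, 1, 1)).days + 1

expected_rows = n_stores * n_items * n_days
print(n_days, expected_rows)  # 1826 913000
```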

## Problem Statement

The goal of this assignment is to show time series analysis visualizations.

##	Prepare

###	Load libraries

In [None]:
import os
import itertools
import warnings
from datetime import datetime
from math import sqrt

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
%matplotlib inline

from scipy import stats
from scipy.stats import norm
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

# Packages for the analysis and prediction of time-series data
import statsmodels.api as sm
import statsmodels.tsa.api as smt
import statsmodels.formula.api as smf
from statsmodels.tsa.arima.model import ARIMA  # statsmodels.tsa.arima_model was removed
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose

warnings.filterwarnings('ignore')

###	Load dataset

In [None]:
# First let us load the datasets
train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")
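As an alternative to converting the `date` column later, `pd.read_csv` can parse it at load time with `parse_dates`. A minimal sketch on an in-memory sample laid out like `train.csv` (the three rows below are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical rows in the same layout as train.csv (date, store, item, sales)
sample_csv = io.StringIO(
    "date,store,item,sales\n"
    "2013-01-01,1,1,13\n"
    "2013-01-02,1,1,11\n"
    "2013-01-03,1,1,14\n"
)

# parse_dates converts the column to datetimes; index_col makes it the index
df = pd.read_csv(sample_csv, parse_dates=["date"], index_col="date")
print(df.index.dtype)
print(df["sales"].resample("D").mean())
```

Parsing at load time avoids a separate `pd.to_datetime` call and gives an index that `resample` can use directly.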

Make a copy of the data for later use.

In [None]:
train_org = train.copy()
test_org = test.copy()

## Exploratory Data Analysis

###	Descriptive statistics

In [None]:
train.columns, test.columns

The test dataset has an `id` column that the training data lacks, and it has no `sales` column, since sales are the prediction target.

In [None]:
train.dtypes, test.dtypes

In [None]:
test.shape, train.shape

Parse the `date` column into a new `Datetime` column.

In [None]:
train['Datetime'] = pd.to_datetime(train.date)
test['Datetime'] = pd.to_datetime(test.date)
test_org['Datetime'] = pd.to_datetime(test_org.date)
train_org['Datetime'] = pd.to_datetime(train_org.date)

Resample the data to daily, weekly, monthly and quarterly means to show sales over those time periods.

In [None]:
# Use the parsed dates as the index so the data can be resampled
train.index = pd.DatetimeIndex(train['Datetime'].to_numpy())

# numeric_only=True skips non-numeric columns such as the raw 'date' strings
# Converting to daily mean
daily = train.resample('D').mean(numeric_only=True)

# Converting to weekly mean
weekly = train.resample('W').mean(numeric_only=True)

# Converting to monthly mean
monthly = train.resample('M').mean(numeric_only=True)

# Converting to quarterly mean
quarterly = train.resample('Q').mean(numeric_only=True)
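`resample` groups rows into calendar bins on a `DatetimeIndex` and then aggregates each bin. The toy example below (synthetic data, not the competition data) shows a daily series turning into weekly means. `numeric_only=True` skips non-numeric columns such as the raw `date` strings, which recent pandas versions otherwise refuse to average.

```python
import numpy as np
import pandas as pd

# Synthetic example: 14 consecutive days starting on a Monday (2013-01-07)
idx = pd.date_range("2013-01-07", periods=14, freq="D")
df = pd.DataFrame(
    {"date": idx.strftime("%Y-%m-%d"),   # string column, like the raw CSV
     "sales": np.arange(14, dtype=float)},
    index=idx,
)

# 'W' bins end on Sunday by default, so each bin holds Monday..Sunday
weekly = df.resample("W").mean(numeric_only=True)
print(weekly)   # sales 3.0 (week ending 2013-01-13) and 10.0 (week ending 2013-01-20)
```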

In [None]:
train.head()

In [None]:
train.describe()

Daily sales per store-item pair over the 5-year period range from 0 to 231, with an average of about 52.3.

Extract calendar features (year, month, day and day of week) from the `Datetime` column.

In [None]:
for i in (test, train, test_org, train_org):
    i['Year'] = i.Datetime.dt.year
    i['Month'] = i.Datetime.dt.month
    i['day'] = i.Datetime.dt.day
    i["dow"] = i.Datetime.dt.dayofweek


In [None]:
test.tail()

In [None]:
train.head()

Identify which days are weekends. 

In [None]:
# Flag Saturdays (dayofweek 5) and Sundays (dayofweek 6) as weekend days
train['weekend'] = (train['Datetime'].dt.dayofweek >= 5).astype(int)

In [None]:
train.head(3)
            

###	Data visualizations

#### Sales

In [None]:
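fig, ax = plt.subplots(figsize=(20, 10))
# histplot replaces the deprecated sns.distplot; kde=True adds the density curve
sns.histplot(train['sales'], bins=5, kde=True, ax=ax)

ax.set_xlabel(xlabel='Sales', fontsize=16)
ax.set_ylabel(ylabel='Count', fontsize=16)
ax.set_title(label='Sales Distribution', fontsize=20)
plt.show();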


In [None]:
# Sales distribution across the train data
sales_df = train.copy(deep=True)
# include_lowest=True keeps days with zero sales in the first bin
sales_df['sales_bins'] = pd.cut(sales_df.sales, [0, 50, 100, 150, 200, 250], include_lowest=True)

# Percentage of data points in each bucket, in bin order
bin_counts = sales_df['sales_bins'].value_counts().sort_index()
print('Sales bucket v/s Total percentage:')
display(bin_counts / bin_counts.sum() * 100)
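`pd.cut` builds right-closed intervals by default, so with edges `[0, 50, ...]` the first bin is `(0, 50]` and a value of exactly 0 falls outside every bin and becomes `NaN`. Passing `include_lowest=True` closes the first interval on the left. A small illustration on hand-picked values:

```python
import pandas as pd

sales = pd.Series([0, 10, 50, 51, 231])
edges = [0, 50, 100, 150, 200, 250]

default_bins = pd.cut(sales, edges)                          # (0, 50], (50, 100], ...
inclusive_bins = pd.cut(sales, edges, include_lowest=True)   # first bin also contains 0

print(default_bins.isna().sum())    # 1 -> the zero is dropped
print(inclusive_bins.isna().sum())  # 0
# Percentage per bin, kept in bin order rather than sorted by count
print(inclusive_bins.value_counts(normalize=True, sort=False) * 100)
```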

In [None]:
f, ax = plt.subplots(figsize=(20, 10))
sales_df['sales_bins'].value_counts().sort_index().plot(kind='bar', title='Sales distribution', ax=ax);

In [None]:
weekDay = train.groupby('weekend')['sales'].sum()

total_points = weekDay.sum()
print("Total", total_points)
weekDay.apply(lambda s: (s/total_points)*100)

In [None]:
f, ax = plt.subplots(figsize=(20, 10))
weekDay.index = ['Weekday', 'Weekend']
ax = sns.barplot(x=weekDay.index, y=weekDay.values, 
                label="Total")

ax.set_xlabel(xlabel='Day type', fontsize=16)
ax.set_ylabel(ylabel='Sales', fontsize=16)
ax.set_title(label='Total Sales: Weekday vs Weekend', fontsize=20)
plt.show();

In [None]:
by_weekday = train.groupby(train.index.dayofweek).mean(numeric_only=True)

x = by_weekday['sales']
f, ax = plt.subplots(figsize=(20, 10))

x.index = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']

ax = sns.barplot(x=x.index, y=x.values, 
                label="Total")

ax.set_xlabel(xlabel='Day of week', fontsize=16)
ax.set_ylabel(ylabel='Sales', fontsize=16)
ax.set_title(label='Mean Sales per Day of week', fontsize=20)
plt.show();



Saturday and Sunday have the highest mean daily sales, and weekends account for about 33% of total sales even though they are only 2 of the 7 days (about 28.6%).
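If sales were spread evenly across the week, weekends would account for 2/7 ≈ 28.6% of the total, so a share around 33% points to a weekend uplift. A small helper makes the comparison explicit (the function name and the synthetic data are illustrative only, not part of the original analysis):

```python
import numpy as np
import pandas as pd

def weekend_share(sales: pd.Series) -> float:
    """Percentage of total sales that falls on Saturday/Sunday.

    Assumes `sales` has a DatetimeIndex (dayofweek: Monday=0 ... Sunday=6).
    """
    is_weekend = sales.index.dayofweek >= 5
    return 100.0 * sales[is_weekend].sum() / sales.sum()

# Synthetic check: 4 full weeks with flat sales gives exactly 2/7
idx = pd.date_range("2013-01-07", periods=28, freq="D")
flat = pd.Series(np.ones(28), index=idx)
print(round(weekend_share(flat), 1))      # 28.6 (the even-spread baseline)

# Doubling weekend sales raises the share above the baseline
boosted = flat.where(idx.dayofweek < 5, 2.0)
print(round(weekend_share(boosted), 1))   # 44.4
```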

In [None]:
by_month = train.groupby(train.index.month).mean(numeric_only=True)

x = by_month['sales']
f, ax = plt.subplots(figsize=(20, 14))
x.index = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

ax = sns.barplot(x=x.index, y=x.values, 
                label="Total")

ax.set_xlabel(xlabel='Months', fontsize=16)
ax.set_ylabel(ylabel='Sales', fontsize=16)
ax.set_title(label='Mean Sales per Month', fontsize=20)
plt.show();


Sales peak in the summer months, with a secondary burst in November before they drop in December.

In [None]:
by_quarter = train.groupby(train.index.quarter).mean(numeric_only=True)

x = by_quarter['sales']
f, ax = plt.subplots(figsize=(20, 14))
x.index = ["Q1", "Q2", "Q3", "Q4"]

ax = sns.barplot(x=x.index, y=x.values, 
                label="Total")

ax.set_xlabel(xlabel='Quarter', fontsize=16)
ax.set_ylabel(ylabel='Sales', fontsize=16)
ax.set_title(label='Mean Sales per Quarter', fontsize=20)
plt.show();


In [None]:
by_year = train.groupby(train.index.year).mean(numeric_only=True)

In [None]:
x = by_year["sales"].pct_change()
print("Mean year-on-year change (%):", x.mean() * 100)
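The number printed above is the arithmetic mean of the yearly percentage changes. For growth that compounds, the compound annual growth rate (CAGR), computed as (last / first) ** (1 / (n - 1)) - 1 over n yearly values, is a common alternative; the two differ when yearly growth rates vary. The yearly figures below are made up for illustration:

```python
import pandas as pd

# Hypothetical yearly mean sales (not the competition figures)
yearly = pd.Series([100.0, 110.0, 99.0, 120.0], index=[2013, 2014, 2015, 2016])

mean_pct = yearly.pct_change().mean() * 100     # arithmetic mean of yearly changes
n_periods = len(yearly) - 1
cagr = ((yearly.iloc[-1] / yearly.iloc[0]) ** (1 / n_periods) - 1) * 100

print(f"Mean year-on-year change: {mean_pct:.2f}%")   # 7.07%
print(f"CAGR: {cagr:.2f}%")                           # 6.27%
```

Compounding the first value at the CAGR for `n_periods` years reproduces the last value exactly, which the arithmetic mean does not guarantee.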


In [None]:
x = by_year['sales']
f, ax = plt.subplots(figsize=(20, 14))
ax = sns.barplot(x=x.index, y=x.values, 
                label="Total")

ax.set_xlabel(xlabel='Years', fontsize=16)
ax.set_ylabel(ylabel='Sales', fontsize=16)
ax.set_title(label='Mean Sales per Year', fontsize=20)
plt.show();

#### Stores

In [None]:
years = [2013, 2014, 2015, 2016, 2017]

fig = plt.figure(figsize=(20, 20))
fig.set_facecolor("white")
sns.set_color_codes("pastel")

for j, year in enumerate(years):
    plt.subplot(3, 2, j + 1)
    # Total sales per store in this year, highest first
    temp_1 = train[train.Year == year]
    temp_1 = temp_1.groupby(['store', 'Year'], as_index=False).agg({'sales': 'sum'})
    temp_1 = temp_1.sort_values(['sales'], ascending=False).reset_index(drop=True)

    plt.subplots_adjust(hspace=.3)
    ax = sns.barplot(x="sales", y=temp_1.index, data=temp_1, orient='h')
    ax.set_xlabel(xlabel='Sales', fontsize=16)
    ax.set_ylabel(ylabel='Store', fontsize=16)
    ax.set_title(label='Top 10 Stores ' + str(year), fontsize=20)
    # Bars are drawn in rank order; label them with the store numbers
    ax.set_yticklabels(labels=temp_1['store'], fontsize=14)

plt.show()


In [None]:
data = train.groupby(['store']).agg({'sales': np.sum})
data.reset_index(level=0, inplace=True)
# print(data)
data = pd.DataFrame(data.sort_values('sales',ascending=False).reset_index(drop=True))[0:10]

publishers = data.index
plt.figure(figsize=(20,10))
ax = sns.barplot(y = publishers , x = 'sales', data=data, orient='h')
ax.set_xlabel(xlabel='Total Sales', fontsize=16)
ax.set_ylabel(ylabel='Stores', fontsize=16)
ax.set_title(label='Top Stores', fontsize=20)
ax.set_yticklabels(labels = data['store'], fontsize=14)
plt.show()

In [None]:
table = train.pivot_table('sales', index='store', columns='Year', aggfunc='sum')

publishers = table.idxmax()
sales = table.max()
years = table.columns.astype(int)
data = pd.concat([publishers, sales], axis=1)
data.columns = ['Store', 'Sales']

plt.figure(figsize=(12,8))
ax = sns.pointplot(y = 'Sales', x = years, hue='Store', data=data)
ax.set_xlabel(xlabel='Year', fontsize=16)
ax.set_ylabel(ylabel='Store Sales Per Year', fontsize=16)
ax.set_title(label='Best Store - Sales Per Year', fontsize=20)
ax.set_xticklabels(labels = years, fontsize=12, rotation=50)
plt.show()

Store 2 is the store with the most sales, and its year-on-year sales increased over the 5-year period.
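A claim like "year-on-year sales have increased" can be checked on the store-by-year pivot table instead of read off the chart. A minimal sketch on a synthetic table with the same shape as `table` above (stores as rows, years as columns); the numbers are invented:

```python
import pandas as pd

# Synthetic stand-in for train.pivot_table('sales', index='store', columns='Year', aggfunc='sum')
table = pd.DataFrame(
    {2013: [500, 700], 2014: [520, 760], 2015: [510, 800]},
    index=pd.Index([1, 2], name="store"),
)

best_store_per_year = table.idxmax()     # store with the highest sales in each year
print(best_store_per_year.tolist())      # [2, 2, 2]

# Did each store's annual sales rise every single year?
strictly_increasing = table.apply(lambda row: row.diff().dropna().gt(0).all(), axis=1)
print(strictly_increasing)
```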

#### Items

In [None]:
years = [2013, 2014, 2015, 2016, 2017]

fig = plt.figure(figsize=(20, 20))
fig.set_facecolor("white")
sns.set_color_codes("pastel")

for j, year in enumerate(years):
    plt.subplot(3, 2, j + 1)
    # Total sales per item in this year; keep the 10 best sellers
    temp_1 = train[train.Year == year].groupby(['item', 'Year'], as_index=False).agg({'sales': 'sum'})
    temp_1 = temp_1.sort_values(['sales'], ascending=False).reset_index(drop=True)[0:10]

    plt.subplots_adjust(hspace=.3)
    ax = sns.barplot(x="sales", y=temp_1.index, data=temp_1, orient='h')
    ax.set_xlabel(xlabel='Sales', fontsize=16)
    ax.set_ylabel(ylabel='Item', fontsize=16)
    ax.set_title(label='Top 10 Items ' + str(year), fontsize=20)
    # Bars are drawn in rank order; label them with the item numbers
    ax.set_yticklabels(labels=temp_1['item'], fontsize=14)

plt.show()


In [None]:
data = train.groupby(['item']).agg({'sales': np.sum})
data.reset_index(level=0, inplace=True)
# print(data)
data = pd.DataFrame(data.sort_values('sales',ascending=False).reset_index(drop=True))[0:10]
publishers = data.index

colors = sns.color_palette("spring", len(data))
plt.figure(figsize=(12,8))
ax = sns.barplot(y = publishers , x = 'sales', data=data, orient='h', palette=colors)
ax.set_xlabel(xlabel='Sales', fontsize=16)
ax.set_ylabel(ylabel='Item', fontsize=16)
ax.set_title(label='Top 10 Items', fontsize=20)
ax.set_yticklabels(labels = data['item'], fontsize=14)
plt.show()

In [None]:
table = train.pivot_table('sales', index='item', columns='Year', aggfunc='sum')

publishers = table.idxmax()
sales = table.max()
years = table.columns.astype(int)
data = pd.concat([publishers, sales], axis=1)
data.columns = ['Item', 'Sales']

plt.figure(figsize=(12,8))
ax = sns.pointplot(y = 'Sales', x = years, hue='Item', data=data)
ax.set_xlabel(xlabel='Year', fontsize=16)
ax.set_ylabel(ylabel='Item Sales Per Year', fontsize=16)
ax.set_title(label='Best Selling Item - Sales Per Year', fontsize=20)
ax.set_xticklabels(labels = years, fontsize=12, rotation=50)
plt.show()

Item 15 is the best-selling item, and its year-on-year sales increased over the 5-year period.

#### Time Series


In [None]:
# Mean sales at four resampling frequencies, one panel each
fig, axs = plt.subplots(4, 1, figsize=(20, 14))
daily.sales.plot(title='Daily', fontsize=14, ax=axs[0])
weekly.sales.plot(title='Weekly', fontsize=14, ax=axs[1])
monthly.sales.plot(title='Monthly', fontsize=14, ax=axs[2])
quarterly.sales.plot(title='Quarterly', fontsize=14, ax=axs[3])
plt.tight_layout()

plt.show()

Resample the train and test sets to mean daily values; the forecasts below are made at daily resolution.

In [None]:
# Index both sets by date and average all store-item rows for each day
test = test.set_index('Datetime').resample('D').mean(numeric_only=True)

train = train.set_index('Datetime').resample('D').mean(numeric_only=True)

## Evaluate Models

### Split-out validation dataset

In [None]:
# Splitting train and validation data
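# Split by date: 2013-2016 for training, 2017 for validation (.ix was removed from pandas)
Train = train.loc['2013-01-01':'2016-12-31'].copy()
valid = train.loc['2017-01-01':'2017-12-31'].copy()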
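Holding out 2017 is a single time-ordered split. `TimeSeriesSplit`, already imported at the top of the notebook, generalizes this into several expanding-window folds in which every validation block comes strictly after its training block. A sketch on a synthetic daily index covering the same dates (the fold count of 4 is an arbitrary choice):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Synthetic daily series spanning the same calendar range as the training data
idx = pd.date_range("2013-01-01", "2017-12-31", freq="D")
series = pd.Series(np.arange(len(idx), dtype=float), index=idx)

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, valid_idx) in enumerate(tscv.split(series)):
    fold_train, fold_valid = series.iloc[train_idx], series.iloc[valid_idx]
    # No shuffling: validation always starts after the last training day
    assert fold_train.index.max() < fold_valid.index.min()
    print(fold, fold_train.index.min().date(), fold_train.index.max().date(),
          fold_valid.index.min().date(), fold_valid.index.max().date())
```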

In [None]:
Train.sales.plot(figsize=(20,10), title= 'Daily Sales', fontsize=14, label='train')
valid.sales.plot(figsize=(20,10), title= 'Daily Sales', fontsize=14, label='valid')
plt.xlabel("Datetime")
plt.ylabel("Sales")
plt.legend(loc='best')
plt.show()

### Naive Approach

In [None]:
dd = np.asarray(Train.sales)
y_hat = valid.copy()
y_hat['naive'] = dd[len(dd)-1]
plt.figure(figsize = (20,10))
plt.plot(Train.index, Train['sales'], label = 'Train')
plt.plot(valid.index,valid['sales'], label='Valid')
plt.plot(y_hat.index,y_hat['naive'], label='Naive Forecast')
plt.legend(loc='best')
plt.title("Naive Forecast")
plt.show()

In [None]:
# Checking the accuracy of the naive approach with RMSE
from sklearn.metrics import mean_squared_error
from math import sqrt
rms = sqrt(mean_squared_error(valid.sales, y_hat.naive))
print(rms)

### Moving Averages

In [None]:
# Moving-average forecasts using the last 10, 20 and 50 observations
for window in (10, 20, 50):
    y_hat_avg = valid.copy()
    # Flat forecast: mean of the last `window` training days
    y_hat_avg['moving_avg_forecast'] = Train['sales'].rolling(window).mean().iloc[-1]
    plt.figure(figsize=(15, 5))
    plt.plot(Train['sales'], label='Train')
    plt.plot(valid['sales'], label='Valid')
    plt.plot(y_hat_avg['moving_avg_forecast'],
             label=f'Moving Average Forecast using {window} observations')
    plt.legend(loc='best')
    plt.show()
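The moving-average plots are easier to compare with the same RMSE used for the naive forecast. A small sketch that scores a flat forecast built from the last `window` training values; the helper name is illustrative, and it is shown on synthetic data here but could be applied to `Train['sales']` and `valid['sales']`:

```python
import numpy as np
import pandas as pd

def moving_average_rmse(train: pd.Series, valid: pd.Series, window: int) -> float:
    """RMSE of a flat forecast equal to the mean of the last `window` training values."""
    forecast = train.iloc[-window:].mean()
    return float(np.sqrt(np.mean((valid.to_numpy() - forecast) ** 2)))

# Synthetic data: training values 1..100, validation constant at 95
train_s = pd.Series(np.arange(1, 101, dtype=float))
valid_s = pd.Series(np.full(10, 95.0))

for window in (1, 10, 20, 50):   # window=1 is the naive forecast
    print(window, round(moving_average_rmse(train_s, valid_s, window), 2))
# 1 5.0 / 10 0.5 / 20 4.5 / 50 19.5
```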

### Seasonal Decompose

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive decomposition of the daily training series.
# period=7 assumes a weekly cycle in daily data (statsmodels renamed `freq` to `period`)
decomposition = seasonal_decompose(Train["sales"], period=7)

trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

fig, axes = plt.subplots(4, 1, figsize=(20, 14))
for ax, series, label in zip(axes,
                             [Train["sales"], trend, seasonal, residual],
                             ['Original', 'Trend', 'Seasonality', 'Residuals']):
    ax.plot(series, label=label)
    ax.legend(loc='best')
plt.tight_layout()
plt.show()
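With the default additive model, `seasonal_decompose` estimates the trend with a centered moving average, averages the detrended values at each position of the seasonal cycle, and leaves the remainder as the residual. The sketch below reproduces those steps by hand for an odd period (7, a weekly cycle in daily data) on a synthetic series. The function name is hypothetical, and it is meant to show the idea, not to replace the statsmodels function, which also handles even periods and other edge cases.

```python
import numpy as np
import pandas as pd

def additive_decompose(series: pd.Series, period: int):
    """Simplified additive decomposition; assumes an odd `period`."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # 1. Trend: centered moving average spanning one full cycle
    trend = series.rolling(window=period, center=True).mean()
    # 2. Seasonal: mean detrended value at each position of the cycle, centered on zero
    detrended = series - trend
    position = np.arange(len(series)) % period
    cycle_means = detrended.groupby(position).mean()
    cycle_means -= cycle_means.mean()
    seasonal = pd.Series(cycle_means.to_numpy()[position], index=series.index)
    # 3. Residual: whatever trend and seasonality do not explain
    resid = series - trend - seasonal
    return trend, seasonal, resid

# Synthetic daily series: linear trend plus a fixed weekly pattern (sums to zero), no noise
idx = pd.date_range("2013-01-01", periods=70, freq="D")
weekly_pattern = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
series = pd.Series(0.5 * np.arange(70) + np.tile(weekly_pattern, 10), index=idx)

trend, seasonal, resid = additive_decompose(series, period=7)
print(np.allclose(resid.dropna(), 0.0))                # True: nothing left over
print(np.allclose(seasonal.iloc[:7], weekly_pattern))  # True: weekly pattern recovered
```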

In [None]:
def test_stationarity(timeseries):
    
    # Determine rolling statistics; change the window to suit the frequency of the data
    rolmean = timeseries.rolling(12).mean()
    rolstd = timeseries.rolling(12).std()

    #Plot rolling statistics:
    plt.figure(figsize = (16,8))
    orig = plt.plot(timeseries, color='blue',label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label = 'Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)
    
    #Perform Dickey-Fuller test:
    print ('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key,value in dftest[4].items():
        dfoutput['Critical Value (%s)'%key] = value
    print (dfoutput)
    
test_stationarity(train["sales"])



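The augmented Dickey-Fuller test gives a p-value below 0.05, so we reject the null hypothesis of a unit root and treat the series as stationary. The test does not rule out seasonality, so the weekly and yearly patterns seen above are still present.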

### Logarithmic Sales

In [None]:
Train["sales_log"] = np.log(Train['sales'])
valid["sales_log"] = np.log(valid['sales'])


In [None]:
Train.head()

In [None]:
fig, ax = plt.subplots(figsize=(20, 10))
# histplot replaces the deprecated sns.distplot; kde=True adds the density curve
sns.histplot(Train["sales_log"], kde=True, ax=ax)

ax.set_xlabel(xlabel='Log Sales', fontsize=16)
ax.set_ylabel(ylabel='Count', fontsize=16)
ax.set_title(label='Log Sales Distribution', fontsize=20)
plt.show();


### Moving Averages Log

In [None]:
fig, ax = plt.subplots(figsize=(20, 10))

moving_avg = Train["sales_log"].rolling(12).mean()
plt.plot(Train["sales_log"])
plt.plot(moving_avg, color = 'red')
plt.show()

In [None]:
train_log_moving_avg_diff = Train["sales_log"] - moving_avg

In [None]:
train_log_moving_avg_diff.dropna(inplace = True)

In [None]:
test_stationarity(train_log_moving_avg_diff)

In [None]:
train_log_diff = Train["sales_log"] - Train["sales_log"].shift(1)
test_stationarity(train_log_diff.dropna())
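Differencing the log series is easy to undo, which matters once forecasts are made on that scale: adding the cumulative sum of the differences to the first log value recovers the log series, and `np.exp` maps it back to sales. Each log difference is also close to the day-over-day relative change when changes are small. A short check on made-up numbers:

```python
import numpy as np
import pandas as pd

sales = pd.Series([50.0, 52.0, 49.0, 55.0, 60.0])   # hypothetical daily mean sales
log_sales = np.log(sales)
log_diff = log_sales - log_sales.shift(1)           # same operation as train_log_diff

# Invert: first log value + running sum of differences, then exponentiate
restored_log = log_sales.iloc[0] + log_diff.fillna(0.0).cumsum()
restored = np.exp(restored_log)
print(np.allclose(restored, sales))                  # True

# log(x_t / x_{t-1}) is approximately the relative change for small changes
print(log_diff.iloc[1], sales.pct_change().iloc[1])  # about 0.0392 vs 0.04
```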

### Decompose Log

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# Same additive decomposition on the log scale; period=7 assumes a weekly cycle in daily data
train_log_decompose = seasonal_decompose(Train["sales_log"], period=7)

trend = train_log_decompose.trend
seasonal = train_log_decompose.seasonal
residual = train_log_decompose.resid

fig, axes = plt.subplots(4, 1, figsize=(20, 14))
for ax, series, label in zip(axes,
                             [Train["sales_log"], trend, seasonal, residual],
                             ['Original', 'Trend', 'Seasonality', 'Residuals']):
    ax.plot(series, label=label)
    ax.legend(loc='best')
plt.tight_layout()
plt.show()

### Autocorrelation

In [None]:
from statsmodels.tsa.stattools import acf, pacf
lag_acf = acf(train_log_diff.dropna(), nlags=25)
lag_pacf = pacf(train_log_diff.dropna(), nlags=25, method='ols')

In [None]:
fig, ax = plt.subplots(figsize=(20, 14))
plt.plot(lag_acf)
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(train_log_diff.dropna())),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(train_log_diff.dropna())),linestyle='--',color='gray')
plt.title('Autocorrelation Function')
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(20, 14))
plt.plot(lag_pacf)
plt.axhline(y=0,linestyle='--',color='gray')
plt.axhline(y=-1.96/np.sqrt(len(train_log_diff.dropna())),linestyle='--',color='gray')
plt.axhline(y=1.96/np.sqrt(len(train_log_diff.dropna())),linestyle='--',color='gray')
plt.title('Partial Autocorrelation Function')
plt.show()
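The dashed lines at ±1.96/√n are the approximate 95% bounds for the autocorrelation of white noise of length n; spikes outside them suggest real correlation at that lag. The sample autocorrelation at lag k is a normalized lagged covariance, r_k = Σ(x_t − x̄)(x_{t+k} − x̄) / Σ(x_t − x̄)². A plain NumPy sketch of that formula (the function name is illustrative; with default settings, statsmodels' `acf` should give the same values up to floating-point error):

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_0..r_nlags (biased estimator on the demeaned series)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[: len(x) - k] * dev[k:]) / denom for k in range(nlags + 1)])

# An alternating series is strongly negatively correlated at lag 1
x = np.tile([1.0, -1.0], 50)          # length 100
r = sample_acf(x, nlags=3)            # approximately 1, -0.99, 0.98, -0.97
band = 1.96 / np.sqrt(len(x))         # same bound as the dashed lines above
print(r)
print(round(band, 3))                 # 0.196
```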

### Credits

- https://www.datahubbs.com/towards-machine-learning-in-supply-chain-forecasting-part-2/
- https://www.datahubbs.com/forecasting-with-seasonality
- https://github.com/nishanthgampa/Time-Series-Analysis-on-Transportation-Data
- https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-9-time-series-analysis-in-python-a270cb05e0b3
- http://www.blackarbs.com/blog/time-series-analysis-in-python-linear-models-to-garch/11/1/2016
- https://www.quantstart.com/articles#time-series-analysis
- https://machinelearningmastery.com/time-series-forecasting-supervised-learning/