# Retail Giant Sales Forecasting Assignment

## Problem Statement:
Online supergiant store 'Global Mart' has worldwide operations, taking orders and delivering across the globe. Its major product categories are Consumer, Corporate and Home Office.
As Sales Managers, we are expected to forecast the sales of products for the next 6 months, which will help in managing inventory and business processes accordingly. The forecast is for the combination of Market and Segment that is profitable with the least variation in Profits.

## Dataset Available:
We have been provided with 4 years of sales data with the following columns:
1. Order Date: Represents the date (January 2011 to December 2014) on which the order was placed
2. Segment: The segment to which the product belongs. It has 3 categories:
    a. Consumer
    b. Corporate
    c. Home Office
3. Market: The market to which the customer belongs. It has 7 categories:
    a. Africa
    b. APAC (for Asia Pacific)
    c. Canada
    d. EMEA (for Europe, Middle East and Africa)
    e. EU (for European Union)
    f. LATAM (for Latin America)
    g. US (United States)
4. Sales: Total sales value of the order/ transaction
5. Profit: Total profit made on the transaction

### Importing Required Packages
We would be primarily using pandas, numpy, seaborn, matplotlib for various calculations.
We might import other packages as the need arises

In [None]:
import pandas as pd 
pd.set_option('display.max_colwidth', None)  # None means no truncation; -1 is deprecated in recent pandas

import numpy as np 

import seaborn as sns

import matplotlib.pyplot as plt
from matplotlib.pyplot import xticks
%matplotlib inline

import warnings
warnings.filterwarnings("ignore")

### Importing the Dataset
We will create a DataFrame, 'retail' to import the dataset, which is in csv format as of now.

In [None]:
retail = pd.read_csv('../input/globalsuperstoredata/GlobalSuperstoreData.csv', parse_dates=['Order Date'], dayfirst=True)
retail.head()

#### Let's do some basic checks and try to understand the various parameters/ qualities of the dataset we have imported

In [None]:
retail.describe()

In [None]:
retail.info()

In [None]:
retail.shape

Checking the range of dates for the dataset

In [None]:
print("The dataset covers transactions between", retail['Order Date'].min().date(),
      "and", retail['Order Date'].max().date())

#### Let's perform some basic Data Cleaning

### Checking columns with Null Values

In [None]:
def null_percentage():
    null_cols = [col for col in retail.columns if retail[col].isnull().sum()>0]
    null_retail = pd.DataFrame(round(100*retail[null_cols].isnull().mean(),2).sort_values(ascending = False))
    null_retail.columns = ["Null Percentage"]
    null_retail.sort_values(by = "Null Percentage", inplace = True, ascending = False)
    return null_retail

In [None]:
null_percentage()

No columns have null values, hence we do not need to impute missing data

In [None]:
retail.describe(include = 'object')

The data seems to be well spread with 3 Segments and 7 Markets. Segment 'Consumer' and Market 'APAC' seem to be the leading categories

Let's plot the Segments and Markets and check for their spread

In [None]:
plt.figure(figsize = (16,4))
ax = sns.countplot(x = 'Segment', data = retail, order = retail.Segment.value_counts().index, palette="Set2")
xticks(rotation = 90)
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x()+0.35, p.get_height()+50))
plt.show()

Global Mart classifies its products into 3 Segments

The highest number of transactions is for Consumer products, followed by Corporate, and the lowest is for Home Office

In [None]:
plt.figure(figsize = (16,4))
ax = sns.countplot(x = 'Market', data = retail, order = retail.Market.value_counts().index, palette="Set2")
xticks(rotation = 90)
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x()+0.3, p.get_height()+50))
plt.show()

Global Mart caters to 7 Markets

The highest number of transactions is from APAC, followed by LATAM, EU and US

Canada has the lowest number of transactions

#### Let's check the distribution of Sales and Profit

In [None]:
plt.figure(figsize = (25,4))
num_cols = ["Sales","Profit"]

# One KDE panel per numeric column (sns.distplot is deprecated in recent seaborn releases)
for i, col in enumerate(num_cols):
    plt.subplot(1, len(num_cols), i+1)
    sns.kdeplot(retail[col], color="blue")

Sales is right-skewed, with the majority of transactions centered around 1,000 and a few worth more than 5,000.

Profit seems to be normally distributed with lots of variations on either side of the mean (0).

In [None]:
retail['Sales'].plot(figsize=(16, 4))
plt.legend(loc='best')
plt.title('Sales')
plt.show(block=False)

In [None]:
fig = retail['Sales'].hist(figsize = (16, 4))
plt.show(block=False)

#### Let's also check for outliers

In [None]:
fig = plt.subplots(figsize = (16, 4))
ax = sns.boxplot(x = retail['Sales'], whis = 1.5)
plt.show(block=False)

In [None]:
fig = plt.subplots(figsize = (16, 4))
ax = sns.boxplot(x = retail['Profit'], whis = 1.5)
plt.show(block=False)

There are lots of outliers for Sales on the higher end, whereas Profit has outliers in both directions (Profit can take a negative value whereas Sales cannot). Let's also look at the quantile distribution of Sales and Profit.

In [None]:
retail.describe(include = 'number', percentiles = [0.0,0.25,0.5,0.75,0.9,0.95,0.99,1.0])

We observe that there is a huge gap between 99th and 100th percentile. Considering this is real data and such huge sales are indeed possible, we chose not to remove/ limit the outliers.

Generally, the Price of an item can not determine its Profit, i.e. a product with lower Price may command a higher Profit depending on the costs involved. Let's validate this using correlation.

In [None]:
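# Restrict to the numeric columns; recent pandas raises an error if text columns are passed to corr()
retail[['Sales', 'Profit']].corr()

In [None]:
plt.figure(figsize = (8, 8))
sns.heatmap(retail[['Sales', 'Profit']].corr(), annot = True)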
plt.show()

At this stage, the dataset seems to be in a good shape. Let's now convert the Order Date into Monthly format so that we can create aggregated Sales and Profit data.

In [None]:
retail['Order Date'] = pd.to_datetime(retail['Order Date']).dt.to_period('M')
retail = retail.sort_values(by = ['Order Date'])

#### Let's create a new column which combines Market and Segment. This will lead us to 21 newly created categories under Market-Segment, comprising all the combinations of Market with Segment.

In [None]:
retail = retail.copy()
retail['Market-Segment'] = retail['Market'] + "-" + retail['Segment']
retail.head()

We will create a pivot with each of these 21 'Market-Segment' categories.

In [None]:
retail_new = retail.pivot_table(index = 'Order Date', values = 'Profit', columns = 'Market-Segment', aggfunc = 'sum')
retail_new

We will divide the dataset into Train and Test dataset.

In [None]:
train_len = 42

In [None]:
retail_new_Train = retail_new[0:train_len]
retail_new_Train.head()

#### Let's look at all the 'Markets' & 'Segments' again, along with the newly created 'Market-Segments'

In [None]:
plt.figure(figsize=(16,4))
ax = sns.countplot(x = 'Market', data = retail, order = retail.Market.value_counts().index, palette="Set2")
plt.title('Regional Markets (7)')
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x()+0.3, p.get_height()+50))
plt.show()

Global Mart caters to 7 Markets

The highest number of transactions is from APAC, followed by LATAM, EU and US

Canada has the lowest number of transactions

In [None]:
plt.figure(figsize=(16,4))
ax = sns.countplot(x = 'Segment', data = retail, order = retail.Segment.value_counts().index, palette="Set2")
plt.title('Customer Segments (3)')
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x()+0.35, p.get_height()+50))
plt.show()

Global Mart classifies its products into 3 Segments

The highest number of transactions is for Consumer products, followed by Corporate, and the lowest is for Home Office

In [None]:
plt.figure(figsize=(16,4))
ax = sns.countplot(x = 'Market-Segment', data = retail, order = retail['Market-Segment'].value_counts().index, palette="Set2")
plt.title('Market-Segments (21)')
plt.xticks(rotation = 90)
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()

We observe that there are a total of 21 Market-Segments covering all the combinations of Market and Segment. The highest number of transactions is from APAC-Consumer, followed by LATAM-Consumer, US-Consumer and EU-Consumer. Canada seems to have the least number of transactions across Segments.

### Finding the most consistently profitable Market-Segment using CoV

Before proceeding further with the model, let's calculate the Coefficient of Variation (CoV) on Profit for each of these 21 Market-Segments (on Train data). The CoV is a relative measure of dispersion and assesses the degree of dispersion of the data relative to its mean. It is calculated as the ratio of the Standard Deviation of the distribution to its Mean.

CoV = (Standard Deviation) / Mean

usually represented in percentage
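
To make the formula concrete, here is a minimal sketch that uses only the Python standard library. The two profit lists are made-up illustrations (not values from the dataset), and `coefficient_of_variation` is a hypothetical helper name. It uses the population standard deviation, which is what `np.std` computes by default.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Return the CoV of `values` in percent: population std / mean * 100."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("CoV is undefined when the mean is zero")
    return pstdev(values) / avg * 100

# Hypothetical monthly profits for two made-up Market-Segments with the same mean
steady = [100, 110, 90, 105, 95]
volatile = [100, 300, -50, 250, -100]

cov_steady = coefficient_of_variation(steady)
cov_volatile = coefficient_of_variation(volatile)
print(round(cov_steady, 1), round(cov_volatile, 1))
```

Both lists have a mean of 100, yet the second one swings far more from month to month, so its CoV is much larger. This is why a low CoV is read as a sign of consistent profitability.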

In [None]:
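# Column-wise statistics; in recent pandas, np.mean(DataFrame) reduces over both axes to a single number
mean = retail_new_Train.mean().round(1)
std = retail_new_Train.std(ddof=0).round(1)  # ddof=0 gives the population std, as np.std does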

CoV_df= pd.DataFrame(mean)
CoV_df['Standard Deviation']= std
CoV_df['CoV'] = round(std/mean*100,1)

CoV_df= CoV_df.reset_index()
CoV_df.columns= ['Market-Segment', 'Mean', 'Std', 'CoV (%)']
CoV_df.sort_values(by='CoV (%)', ascending= True, inplace = True)
CoV_df

#### Considering we are focussing on only one Market-Segment which is profitable with least variation in Profits, we chose APAC-Consumer. The reason we prefer lowest CoV is because we want to estimate sales for the most consistently profitable Market-Segment, so that our sales forecasts are reliable.

### Let's extract APAC_Consumer as a separate DataFrame and perform further steps on this newly created DataFrame

In [None]:
APAC_Consumer = retail[retail['Market-Segment'] == 'APAC-Consumer']
APAC_Consumer = APAC_Consumer.drop(columns = ['Segment', 'Market', 'Market-Segment'])
# 'Order Date' is kept as a regular column because it is the groupby key in the next step

APAC_Consumer.head()

Let's aggregate Sales and Profit by month. This gives us data for 48 months (4 years).

In [None]:
APAC_Consumer = APAC_Consumer.groupby(APAC_Consumer['Order Date']).sum()
APAC_Consumer

In [None]:
APAC_Consumer.shape

#### Let's split this DataFrame into Train and Test datasets: Train with the first 42 months' data and Test with the remaining 6 months' data.

In [None]:
train_len = 42

In [None]:
APAC_Consumer_Train = APAC_Consumer[0:train_len]
APAC_Consumer_Test = APAC_Consumer[train_len:]

In [None]:
APAC_Consumer_Train = APAC_Consumer_Train.reset_index()
APAC_Consumer_Train.head()

Before proceeding further, we need to convert the data type of Order Date to timestamp so that the column can be used effectively.

In [None]:
APAC_Consumer_Train['Order Date'] = APAC_Consumer_Train['Order Date'].apply(lambda x: x.to_timestamp())
APAC_Consumer_Train['Order Date'].dtype

In [None]:
APAC_Consumer_Train = APAC_Consumer_Train.set_index(['Order Date'])
APAC_Consumer_Train.head()

## Time Series Decomposition

We will now decompose the Time Series using Additive as well as Multiplicative methods. This will help us break down the series and observe Trend as well as Seasonality, if any.

In [None]:
x = list(APAC_Consumer_Train.Sales)

In [None]:
from pylab import rcParams
import statsmodels.api as sm
rcParams['figure.figsize'] = 16, 8
decomposition = sm.tsa.seasonal_decompose(x, model='additive', period = 12) # additive seasonal index
fig = decomposition.plot()
plt.show()

There is an upward trend as well as seasonality (12 months) in the dataset

The residuals seem to have some pattern – we perform Multiplicative decomposition

In [None]:
from pylab import rcParams
import statsmodels.api as sm
rcParams['figure.figsize'] = 16, 8
decomposition = sm.tsa.seasonal_decompose(x, model='multiplicative', period = 12) # multiplicative seasonal index
fig = decomposition.plot()
plt.show()

There is an upward trend as well as seasonality (12 months) in the dataset

The Residuals now seem to be randomly distributed

### Choosing the Right Time Series Method

In [None]:
# from PIL import Image
# import requests
# from io import BytesIO

# response = requests.get('https://miro.medium.com/max/700/1*G0rTkadf_010ewO7mhOiuA.png')
# img = Image.open(BytesIO(response.content))
# img

## Observations based on the dataset and above flowchart:
1. Number of data points is more than 10, hence we will utilize Exponential Smoothing or ARIMA and we should NOT utilize Simple Moving Average or Naive Methods.
2. We observe that there is an upward trend in the dataset
3. We also observe Seasonality (at 12 months) in the dataset
4. Exponential Smoothing: We should use Simple Exponential Smoothing, Holt's Exponential Smoothing and Holt-Winters Smoothing techniques, among others.
5. We should also look at ARIMA: since there is an Upward Trend in the dataset, we can use ARIMA techniques, and since there is Seasonality as well, we can use SARIMA. However, we don't have any information on Exogenous variables, hence we would NOT use ARIMAX or SARIMAX.

We need to convert datatype of Order Date in Test series as well (as we did for the Train dataset)

In [None]:
APAC_Consumer_Test = APAC_Consumer_Test.reset_index()
APAC_Consumer_Test['Order Date'] = APAC_Consumer_Test['Order Date'].apply(lambda x: x.to_timestamp())
APAC_Consumer_Test = APAC_Consumer_Test.set_index(['Order Date'])
APAC_Consumer_Test.head()

## Building and Evaluating Time Series Forecasts

## Smoothing Techniques

With more than 10 observations depicting a clear upward Trend and Seasonality, we can use following Smoothing techniques:
1. Simple Exponential Smoothing
2. Holt’s Exponential Smoothing
3. Holt-Winters Additive Smoothing
4. Holt-Winters Multiplicative Smoothing

### 1. Simple Exponential Smoothing

In [None]:
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
APAC_Consumer_SES = APAC_Consumer_Test.copy()
model = SimpleExpSmoothing(APAC_Consumer_Train['Sales'])
model_fit = model.fit(optimized = True)
print(model_fit.params)
APAC_Consumer_SES['Sales_Forecast'] = model_fit.forecast(6)

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_SES['Sales_Forecast'], label='Simple exponential smoothing forecast')
plt.legend(loc='best')
plt.title('Simple Exponential Smoothing Method')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_SES['Sales_Forecast'])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_SES['Sales_Forecast'])/APAC_Consumer_Test['Sales'])*100,2)
results = pd.DataFrame({'Method':['Simple Exponential Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that Simple Exponential Smoothing gives us a flat estimate for the next six months and does not take care of movements within the months.
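
The flat forecast follows directly from how Simple Exponential Smoothing works: it tracks a single level and has no trend or seasonal term. Below is a minimal standard-library sketch of the recursion. The function names, the made-up `sales` values, the fixed `alpha` and the choice of the first observation as the initial level are all assumptions for illustration; statsmodels estimates these quantities itself.

```python
def simple_exp_smoothing(series, alpha):
    """Return the final smoothed level after one pass over `series`."""
    level = series[0]  # simple choice of initial level (an assumption)
    for y in series[1:]:
        # New level = weighted average of the latest observation and the old level
        level = alpha * y + (1 - alpha) * level
    return level

def ses_forecast(series, alpha, horizon):
    """Every step ahead repeats the last level, hence the flat line."""
    level = simple_exp_smoothing(series, alpha)
    return [level] * horizon

sales = [10.0, 12.0, 11.0, 13.0]  # made-up values
forecast = ses_forecast(sales, alpha=0.5, horizon=6)
print(forecast)
```

Whatever the horizon, each forecast value is the same final level, which matches the flat line seen in the plot.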

The RMSE is 22,992 and MAPE is 27.7
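
For reference, the two error measures used throughout this notebook can be sketched in plain Python as below. The helper names and the three made-up actual and forecast values are assumptions for illustration only.

```python
import math

def rmse(actual, forecast):
    """Root Mean Squared Error: square root of the average squared error."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    pct_errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return sum(pct_errors) / len(pct_errors) * 100

actual = [100.0, 200.0, 400.0]       # made-up values
predicted = [110.0, 180.0, 400.0]    # made-up values
example_rmse = rmse(actual, predicted)
example_mape = mape(actual, predicted)
print(round(example_rmse, 2), round(example_mape, 2))
```

RMSE is expressed in the units of Sales and penalises large misses more heavily, whereas MAPE is scale-free, which is why the two measures can rank methods differently.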

## 2. Holt's Exponential Smoothing

### 2.1 Holt's Exponential Smoothing with Trend

In [None]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
APAC_Consumer_HES = APAC_Consumer_Test.copy()
model = ExponentialSmoothing(np.asarray(APAC_Consumer_Train['Sales']) ,seasonal_periods=12 ,trend='additive', seasonal=None)
model_fit = model.fit(optimized=True)
print(model_fit.params)
APAC_Consumer_HES['Sales_Forecast'] = model_fit.forecast(6)

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_HES['Sales_Forecast'], label='Holt\'s Exponential Forecasting')
plt.legend(loc='best')
plt.title('Holt\'s Exponential Smoothing Method')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_HES['Sales_Forecast'])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_HES['Sales_Forecast'])/APAC_Consumer_Test['Sales'])*100,2)
temp_results = pd.DataFrame({'Method':['Holt\'s Exponential Trend Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = pd.concat([results, temp_results])
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that Holt’s Exponential Smoothing with Trend gives us an upward trending estimate for the next six months; however, it does not take care of movements within the months.

The RMSE is 17,194 and MAPE is 25.0

Holt’s Exponential Smoothing performs better than Simple Exponential Smoothing.
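
The improvement comes from the extra trend term. The sketch below shows Holt's two recursions (level and trend) in plain Python. The function name, the made-up `sales` values, the fixed smoothing parameters and the naive initial states are assumptions for illustration and are not what statsmodels fitted above.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    level = series[0]
    trend = series[1] - series[0]  # naive initial trend (an assumption)
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast is a straight line from the last level
    return [level + h * trend for h in range(1, horizon + 1)]

sales = [10.0, 12.0, 14.0, 16.0]  # made-up values rising by 2 each month
holt_example = holt_forecast(sales, alpha=0.5, beta=0.5, horizon=3)
print(holt_example)
```

The forecast is a straight line, so it follows the upward trend but still cannot reproduce seasonal ups and downs within the year.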

### 2.2 Holt Winters' additive method with trend and seasonality

In [None]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
APAC_Consumer_HWA = APAC_Consumer_Test.copy()
model = ExponentialSmoothing(np.asarray(APAC_Consumer_Train['Sales']) ,seasonal_periods=12 ,trend='add', seasonal='add')
model_fit = model.fit(optimized=True)
print(model_fit.params)
APAC_Consumer_HWA['Sales_Forecast'] = model_fit.forecast(6)

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_HWA['Sales_Forecast'], label='Holt Winters\'s additive forecast')
plt.legend(loc='best')
plt.title('Holt Winters\' Additive Method')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_HWA['Sales_Forecast'])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_HWA['Sales_Forecast'])/APAC_Consumer_Test['Sales'])*100,2)
temp_results = pd.DataFrame({'Method':['Holt Winters\'s Additive Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = pd.concat([results, temp_results])
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that Holt-Winters Additive Smoothing with Trend gives us an upward trending estimate for the next six months and takes care of movements within the months.

The RMSE is 12,971 and MAPE is 17.6

It is better than both Simple Exponential as well as Holt’s Exponential Smoothing techniques.
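
What lets this method follow movements within the year is a third recursion for the seasonal component. Here is a minimal sketch of the additive Holt-Winters recursions in plain Python. The function name, the naive initial states, the fixed smoothing parameters and the made-up quarterly-style series are all assumptions for illustration; statsmodels initialises and optimises these differently. The series needs at least two full seasons.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps with additive trend and additive seasonality of period m."""
    # Naive starting states (an assumption for this sketch)
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    seasonals = [y - level for y in series[:m]]

    for t, y in enumerate(series):
        s = seasonals[t % m]
        prev_level, prev_trend = level, trend
        level = alpha * (y - s) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonals[t % m] = gamma * (y - prev_level - prev_trend) + (1 - gamma) * s

    n = len(series)
    # Forecast = last level + h * trend + latest seasonal index for that position
    return [level + h * trend + seasonals[(n + h - 1) % m] for h in range(1, horizon + 1)]

sales = [10.0, 20.0, 30.0, 40.0] * 3  # made-up series with period 4 and no trend
hw_forecast = holt_winters_additive(sales, m=4, alpha=0.5, beta=0.5, gamma=0.5, horizon=4)
print(hw_forecast)
```

On this perfectly repeating series the forecast reproduces the seasonal pattern, which is the behaviour that Simple and Holt's smoothing cannot provide.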

### 2.3 Holt Winters' multiplicative method with trend and seasonality

In [None]:
APAC_Consumer_HWM = APAC_Consumer_Test.copy()
model = ExponentialSmoothing(np.asarray(APAC_Consumer_Train['Sales']) ,seasonal_periods=12 ,trend='add', seasonal='mul')
model_fit = model.fit(optimized=True)
print(model_fit.params)
APAC_Consumer_HWM['Sales_Forecast'] = model_fit.forecast(6)

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_HWM['Sales_Forecast'], label='Holt Winters\'s Multiplicative forecast')
plt.legend(loc='best')
plt.title('Holt Winters\' Multiplicative Method')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_HWM['Sales_Forecast'])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_HWM['Sales_Forecast'])/APAC_Consumer_Test['Sales'])*100,2)
temp_results = pd.DataFrame({'Method':['Holt Winters\'s Multiplicative Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = pd.concat([results, temp_results])
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that Holt-Winters Multiplicative Smoothing with Trend gives us an upward trending estimate for the next six months and takes care of movements within the months.

The RMSE is 11,753 and MAPE is 19.6

By MAPE, it is better than Simple Exponential and Holt’s Exponential but not as good as Holt-Winters Additive Smoothing (MAPE 17.6), although its RMSE is lower than that of the Additive method (11,753 vs 12,971).

#### Overall, among the 4 Smoothing techniques, the Holt-Winters Additive Method has the least MAPE and is hence the best Smoothing technique for the given Time-Series data.

### Stationarity Vs. Non-Stationarity in the Time Series

Before proceeding towards ARIMA methods, we will check for Stationarity of the Time-Series

If the Time-Series is not Stationary, we need to transform it in order to make it Stationary

To validate whether the Time Series is Stationary or not, we will perform a couple of statistical tests, namely ADF and KPSS

### Augmented Dickey-Fuller (ADF) Test

In [None]:
from statsmodels.tsa.stattools import adfuller
adf_test = adfuller(APAC_Consumer['Sales'])
print('ADF Statistics: %f' % adf_test[0])
print('Critical Value @ 0.05: %.2f' % adf_test[4]['5%'])
print('p-value: %f' % adf_test[1])
print(adf_test)

#### Based on the ADF Test, we observe that the p-value (~0.20) is more than the significance level of 0.05, hence we fail to reject the null hypothesis (that the series has a unit root), i.e., the series is NOT stationary

### Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

In [None]:
from statsmodels.tsa.stattools import kpss
kpss_test = kpss(APAC_Consumer['Sales'])
print('KPSS Statistics: %f: ' % kpss_test[0])
print('Critical Value @ 0.05: %.2f' % kpss_test[3]['5%'])
print('p-value: %f' % kpss_test[1])
print(kpss_test)

#### The KPSS Test results in a p-value of 0.02, which is less than the significance level of 0.05, hence we reject the null hypothesis (that the series is stationary), i.e., the series is NOT stationary

We need the series to have Stationarity, hence we need to transform the series. We will use Box Cox Transformation and Differencing for this and will check the Stationarity once again using ADF and KPSS methods.
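
As a minimal sketch of what these two steps do and how they can be undone, the plain-Python example below applies a Box Cox transform with lambda = 0 (which is the natural log), takes first differences, and then recovers the original values with a cumulative sum followed by the exponential. The helper name `box_cox` and the `sales` values are made up for illustration.

```python
import math

def box_cox(y, lmbda):
    """Box Cox transform of a positive value; lmbda = 0 reduces to the natural log."""
    if lmbda == 0:
        return math.log(y)
    return (y ** lmbda - 1) / lmbda

sales = [100.0, 120.0, 150.0, 135.0]  # made-up values
transformed = [box_cox(y, 0) for y in sales]

# First-order differencing: change from one month to the next on the log scale
diffs = [curr - prev for prev, curr in zip(transformed, transformed[1:])]

# Inversion: cumulative sum of the differences, anchored at the first transformed value
running = transformed[0]
recovered = []
for d in diffs:
    running += d
    recovered.append(math.exp(running))  # exp undoes the log transform

print([round(v, 2) for v in recovered])
```

The recovered values match the original series from the second observation onwards. This cumulative-sum-then-exponential pattern is how forecasts made on the transformed scale are brought back to the Sales scale.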

## Box Cox Transformation

In [None]:
APAC_Consumer.index = APAC_Consumer.index.to_timestamp()

In [None]:
from scipy.stats import boxcox
data_boxcox = pd.Series(boxcox(APAC_Consumer['Sales'], lmbda = 0), index = APAC_Consumer.index)
plt.figure(figsize = (16,4))
plt.plot(data_boxcox, label = 'After Box Cox Transformation')
plt.legend(loc = 'best')
plt.title('After Box Cox Transformation')
plt.show()

### Differencing

In [None]:
data_boxcox_diff = pd.Series(data_boxcox - data_boxcox.shift())
data_boxcox_diff.dropna(inplace = True)
plt.figure(figsize = (16,4))
plt.plot(data_boxcox_diff, label = 'After Box Cox Transformation and Differencing')
plt.legend(loc = 'best')
plt.title('After Box Cox Transformation and Differencing')
plt.show()

### Augmented Dickey-Fuller (ADF) Test post Box Cox Transformation and Differencing

In [None]:
from statsmodels.tsa.stattools import adfuller
adf_test = adfuller(data_boxcox_diff)
print('ADF Statistics: %f' % adf_test[0])
print('Critical Value @ 0.05: %.2f' % adf_test[4]['5%'])
print('p-value: %f' % adf_test[1])
print(adf_test)

#### Post Box Cox Transformation and Differencing, the ADF test results in a p-value of ~0.00, which is less than the significance level of 0.05, hence we reject the null hypothesis (that the series has a unit root), i.e., the series is now stationary

### Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test post Box Cox Transformation and Differencing

In [None]:
from statsmodels.tsa.stattools import kpss
kpss_test = kpss(data_boxcox_diff)
print('KPSS Statistics: %f: ' % kpss_test[0])
print('Critical Value @ 0.05: %.2f' % kpss_test[3]['5%'])
print('p-value: %f' % kpss_test[1])
print(kpss_test)

#### Post Box Cox Transformation and Differencing, the KPSS test results in a p-value of 0.10, which is more than the significance level of 0.05, hence we fail to reject the null hypothesis (that the series is stationary), i.e., the series is now stationary
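
Because the two tests have opposite null hypotheses, it is easy to mix up how their p-values should be read. The small helper below is a hypothetical sketch (not part of statsmodels) that encodes the decision rule used in this notebook, applied to the approximate p-values reported above.

```python
def interpret_stationarity(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values into a single verdict.

    ADF null hypothesis: the series has a unit root (non-stationary).
    KPSS null hypothesis: the series is stationary.
    """
    adf_says_stationary = adf_pvalue < alpha       # ADF null rejected
    kpss_says_stationary = kpss_pvalue >= alpha    # KPSS null not rejected
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    return "inconclusive"

before = interpret_stationarity(adf_pvalue=0.20, kpss_pvalue=0.02)
after = interpret_stationarity(adf_pvalue=0.0001, kpss_pvalue=0.10)
print(before, after)
```

When the two tests disagree, the helper returns "inconclusive", a case that does not arise for this series but is worth handling explicitly.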

Let's have a quick look at the ACF and PACF plots to visualize Autocorrelations

#### Autocorrelation Function (ACF) Plot

In [None]:
from statsmodels.graphics.tsaplots import plot_acf
plt.figure(figsize = (16,4))
plot_acf(data_boxcox_diff, ax = plt.gca(), lags = 30)
plt.show()

#### Partial Autocorrelation (PACF) Plot

In [None]:
from statsmodels.graphics.tsaplots import plot_pacf
plt.figure(figsize = (16,4))
# Recent statsmodels releases only allow PACF lags below half the sample size (47 points here)
plot_pacf(data_boxcox_diff, ax = plt.gca(), lags = 20)
plt.show()

## Auto Regressive Methods

### 3. Auto Regressive Models

With more than 10 observations depicting a clear upward Trend and Seasonality, we will be using the following Auto Regressive techniques:
1. Auto Regressive (AR) Method
2. Moving Average (MA) Method
3. Auto Regressive Moving Average (ARMA) Model
4. Auto Regressive Integrated Moving Average (ARIMA) Model
5. Seasonal Auto Regressive Integrated Moving Average (SARIMA) Model

We will split the boxcox series into train and test series

In [None]:
train_data_boxcox = data_boxcox[:train_len]
test_data_boxcox = data_boxcox[train_len:]
train_data_boxcox_diff = data_boxcox_diff[:train_len-1]
test_data_boxcox_diff = data_boxcox_diff[train_len-1:]

#### 3.1 Auto Regression (AR) Method

In [None]:
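# Legacy API: statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13, so this notebook assumes an older release
from statsmodels.tsa.arima_model import ARIMA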
model = ARIMA(train_data_boxcox_diff, order = (1,0,0))
model_fit = model.fit()
print(model_fit.params)

#### Recover Original Time Series Forecast

In [None]:
# Use a DataFrame so that each stage of the recovery lives in its own column
APAC_Consumer_AR = data_boxcox_diff.to_frame(name='Sales_Boxcox_Diff')
APAC_Consumer_AR['AR_Sales_Forecast_Boxcox_Diff'] = model_fit.predict(data_boxcox_diff.index.min(), data_boxcox_diff.index.max())
APAC_Consumer_AR['AR_Sales_Forecast_Boxcox'] = APAC_Consumer_AR['AR_Sales_Forecast_Boxcox_Diff'].cumsum()
APAC_Consumer_AR['AR_Sales_Forecast_Boxcox'] = APAC_Consumer_AR['AR_Sales_Forecast_Boxcox'].add(data_boxcox.iloc[0])
APAC_Consumer_AR['AR_Sales_Forecast'] = np.exp(APAC_Consumer_AR['AR_Sales_Forecast_Boxcox'])

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_AR['AR_Sales_Forecast'][APAC_Consumer_Test.index.min():], label='Auto Regression (AR) forecast')
plt.legend(loc='best')
plt.title('AR (Auto Regression) forecast')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_AR['AR_Sales_Forecast'][APAC_Consumer_Test.index.min():])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_AR['AR_Sales_Forecast'][APAC_Consumer_Test.index.min():])/APAC_Consumer_Test['Sales'])*100,2)
temp_results = pd.DataFrame({'Method':['AR (Auto Regression) Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = pd.concat([results, temp_results])
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that AR Forecast gives us an upward trending estimate for the next six months and takes care of movements within the months.

The RMSE is 15,505 and MAPE is 27.3
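
To see where the upward trend can come from, note that an AR(1) model on the differenced series predicts each value from the previous one, and its multi-step forecasts settle towards a long-run mean. If that mean is positive, cumulating the forecasts and taking the exponential gives a rising curve. The sketch below shows the forecast recursion in plain Python. The function name and the values of `const` and `phi` are made up for illustration and are not the fitted parameters printed above.

```python
def ar1_forecast(last_value, const, phi, horizon):
    """Iterate x[t+1] = const + phi * x[t] for `horizon` steps ahead."""
    forecasts = []
    prev = last_value
    for _ in range(horizon):
        prev = const + phi * prev
        forecasts.append(prev)
    return forecasts

diff_forecast = ar1_forecast(last_value=4.0, const=1.0, phi=0.5, horizon=6)
long_run_mean = 1.0 / (1 - 0.5)  # const / (1 - phi), valid when |phi| < 1
print(diff_forecast, long_run_mean)
```

Each step moves the forecast halfway towards the long-run mean of 2.0, so after a few months the predicted differences are almost constant.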

#### 3.2 Moving Average (MA) Method

In [None]:
model = ARIMA(train_data_boxcox_diff, order = (0, 0, 1))
model_fit = model.fit()
print(model_fit.params)

#### Recover Original Time Series Forecast

In [None]:
# Use a DataFrame so that each stage of the recovery lives in its own column
APAC_Consumer_MA = data_boxcox_diff.to_frame(name='Sales_Boxcox_Diff')
APAC_Consumer_MA['MA_Sales_Forecast_Boxcox_Diff'] = model_fit.predict(data_boxcox_diff.index.min(), data_boxcox_diff.index.max())
APAC_Consumer_MA['MA_Sales_Forecast_Boxcox'] = APAC_Consumer_MA['MA_Sales_Forecast_Boxcox_Diff'].cumsum()
APAC_Consumer_MA['MA_Sales_Forecast_Boxcox'] = APAC_Consumer_MA['MA_Sales_Forecast_Boxcox'].add(data_boxcox.iloc[0])
APAC_Consumer_MA['MA_Sales_Forecast'] = np.exp(APAC_Consumer_MA['MA_Sales_Forecast_Boxcox'])

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_MA['MA_Sales_Forecast'][APAC_Consumer_Test.index.min():], label='Moving Average (MA) forecast')
plt.legend(loc='best')
plt.title('MA (Moving Average) forecast')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_MA['MA_Sales_Forecast'][APAC_Consumer_Test.index.min():])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_MA['MA_Sales_Forecast'][APAC_Consumer_Test.index.min():])/APAC_Consumer_Test['Sales'])*100,2)
temp_results = pd.DataFrame({'Method':['MA (Moving Average) Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = pd.concat([results, temp_results])
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that MA Forecast gives us an upward trending estimate for the next six months and takes care of movements within the months.

The RMSE is 52,903 and MAPE is 81.6

The MA forecast has a very high MAPE and is not as good as AR method.

#### 3.3 Auto Regressive Moving Average (ARMA) Model

In [None]:
model = ARIMA(train_data_boxcox_diff, order = (1, 0, 1))
model_fit = model.fit()
print(model_fit.params)

#### Recover Original Time Series Forecast

In [None]:
# Use a DataFrame so that each stage of the recovery lives in its own column
APAC_Consumer_ARMA = data_boxcox_diff.to_frame(name='Sales_Boxcox_Diff')
APAC_Consumer_ARMA['ARMA_Sales_Forecast_Boxcox_Diff'] = model_fit.predict(data_boxcox_diff.index.min(), data_boxcox_diff.index.max())
APAC_Consumer_ARMA['ARMA_Sales_Forecast_Boxcox'] = APAC_Consumer_ARMA['ARMA_Sales_Forecast_Boxcox_Diff'].cumsum()
APAC_Consumer_ARMA['ARMA_Sales_Forecast_Boxcox'] = APAC_Consumer_ARMA['ARMA_Sales_Forecast_Boxcox'].add(data_boxcox.iloc[0])
APAC_Consumer_ARMA['ARMA_Sales_Forecast'] = np.exp(APAC_Consumer_ARMA['ARMA_Sales_Forecast_Boxcox'])

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_ARMA['ARMA_Sales_Forecast'][APAC_Consumer_Test.index.min():], label='Auto Regressive Moving Average (ARMA) forecast')
plt.legend(loc='best')
plt.title('ARMA (Auto Regressive Moving Average) forecast')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_ARMA['ARMA_Sales_Forecast'][APAC_Consumer_Test.index.min():])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_ARMA['ARMA_Sales_Forecast'][APAC_Consumer_Test.index.min():])/APAC_Consumer_Test['Sales'])*100,2)
temp_results = pd.DataFrame({'Method':['ARMA (Auto Regressive Moving Average) Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = pd.concat([results, temp_results])
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that ARMA Forecast gives us an upward trending estimate for the next six months and takes care of movements within the months.

The RMSE is 50,757 and MAPE is 77.7

ARMA is better than MA method but not as good as AR method.

#### 3.4 Auto Regressive Integrated Moving Average (ARIMA) Model

In [None]:
# We would pass on data_boxcox and not data_boxcox_diff because ARIMA takes care of differencing
model = ARIMA(train_data_boxcox, order = (1, 1, 1))
model_fit = model.fit()
print(model_fit.params)

#### Recover Original Time Series Forecast

In [None]:
# Use a DataFrame so that each stage of the recovery lives in its own column
APAC_Consumer_ARIMA = data_boxcox_diff.to_frame(name='Sales_Boxcox_Diff')
APAC_Consumer_ARIMA['ARIMA_Sales_Forecast_Boxcox_Diff'] = model_fit.predict(data_boxcox_diff.index.min(), data_boxcox_diff.index.max())
APAC_Consumer_ARIMA['ARIMA_Sales_Forecast_Boxcox'] = APAC_Consumer_ARIMA['ARIMA_Sales_Forecast_Boxcox_Diff'].cumsum()
APAC_Consumer_ARIMA['ARIMA_Sales_Forecast_Boxcox'] = APAC_Consumer_ARIMA['ARIMA_Sales_Forecast_Boxcox'].add(data_boxcox.iloc[0])
APAC_Consumer_ARIMA['ARIMA_Sales_Forecast'] = np.exp(APAC_Consumer_ARIMA['ARIMA_Sales_Forecast_Boxcox'])

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_ARIMA['ARIMA_Sales_Forecast'][APAC_Consumer_Test.index.min():], label='Auto Regressive Integrated Moving Average (ARIMA) forecast')
plt.legend(loc='best')
plt.title('ARIMA (Auto Regressive Integrated Moving Average) forecast')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_ARIMA['ARIMA_Sales_Forecast'][APAC_Consumer_Test.index.min():])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_ARIMA['ARIMA_Sales_Forecast'][APAC_Consumer_Test.index.min():])/APAC_Consumer_Test['Sales'])*100,2)
temp_results = pd.DataFrame({'Method':['ARIMA (Auto Regressive Integrated Moving Average) Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = pd.concat([results, temp_results])
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that ARIMA Forecast gives us an upward trending estimate for the next six months and takes care of movements within the months.

The RMSE is 50,757 and MAPE is 77.7

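ARIMA(1, 1, 1) on the Box Cox series differences the data internally, which is equivalent to fitting ARMA(1, 1) on the already differenced series. ARIMA therefore has the same output as ARMA and is as good as the ARMA method.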

ARIMA is better than MA method but not as good as AR method.

#### 3.5 Seasonal Auto Regressive Integrated Moving Average (SARIMA) Model

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(train_data_boxcox, order = (1, 1, 1), seasonal_order = (1,1,1,12))
model_fit = model.fit()
print(model_fit.params)

#### Recover Original Time Series Forecast

In [None]:
# Use a DataFrame so that each stage of the recovery lives in its own column
APAC_Consumer_SARIMA = data_boxcox_diff.to_frame(name='Sales_Boxcox_Diff')
# Integration is already taken care of in SARIMAX (unlike ARIMA)
APAC_Consumer_SARIMA['SARIMA_Sales_Forecast_Boxcox'] = model_fit.predict(data_boxcox_diff.index.min(), data_boxcox_diff.index.max())
APAC_Consumer_SARIMA['SARIMA_Sales_Forecast'] = np.exp(APAC_Consumer_SARIMA['SARIMA_Sales_Forecast_Boxcox'])

#### Plotting the Train, Test and Forecasts

In [None]:
plt.figure(figsize=(16,4))
plt.plot(APAC_Consumer_Train['Sales'], label='Train')
plt.plot(APAC_Consumer_Test['Sales'], label='Test')
plt.plot(APAC_Consumer_SARIMA['SARIMA_Sales_Forecast'][APAC_Consumer_Test.index.min():], label='SARIMA (Seasonal Auto Regressive Integrated Moving Average) forecast')
plt.legend(loc='best')
plt.title('Seasonal Auto Regressive Integrated Moving Average (SARIMA) forecast')
plt.show()

#### Calculating Errors in Forecasted series, using RMSE and MAPE

In [None]:
rmse = np.sqrt(mean_squared_error(APAC_Consumer_Test['Sales'], APAC_Consumer_SARIMA['SARIMA_Sales_Forecast'][APAC_Consumer_Test.index.min():])).round(2)
mape = np.round(np.mean(np.abs(APAC_Consumer_Test['Sales']-APAC_Consumer_SARIMA['SARIMA_Sales_Forecast'][APAC_Consumer_Test.index.min():])/APAC_Consumer_Test['Sales'])*100,2)
temp_results = pd.DataFrame({'Method':['SARIMA (Seasonal Auto Regressive Integrated Moving Average) Method'], 'RMSE': [rmse],'MAPE': [mape]})
results = pd.concat([results, temp_results])
results = results[['Method', 'RMSE', 'MAPE']]
results

We observe that SARIMA Forecast gives us an upward trending estimate for the next six months and takes care of movements within the months as well as Seasonality.

The RMSE is 11,179 and MAPE is 18.4

SARIMA has lowest MAPE value among AR, MA, ARMA, ARIMA and SARIMA.

#### Overall, among the 5 Auto Regressive techniques, SARIMA has the least MAPE and is hence the best Forecasting technique for the given Time-Series data.

## Overall, the best forecasting methods are as below (based on their MAPE values):
### For Smoothing methods, the best option is the Holt-Winters Additive Method
### For Auto-Regressive methods, the best option is SARIMA