<a href="https://colab.research.google.com/github/ouhibyann/Sales-forecasting/blob/master/PDS_ING3_ML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project : Machine Learning sales forecasting

## Aim : 


*   Predict sales volume with several Machine Learning algorithms in order to help managers decide which actions to take
*   This is the last part of a larger project whose purpose was to monitor a whole shopping center through different metrics such as its sales, client reviews, product monitoring, ...


*The data was imported from a .csv file generated by a MapReduce job in an earlier step of the project.*



In [0]:
import csv
import pandas as pd
import numpy as np

import plotly.graph_objs as go
import plotly.offline as pyoff

import statsmodels.formula.api as smf
# Explicit import instead of a wildcard: only MinMaxScaler is used below
from sklearn.preprocessing import MinMaxScaler

# Dropout is imported so that the commented-out dropout lines can be enabled
from keras.layers import Dense, Dropout, LSTM
from keras.models import Sequential

## Part 1 : Data loading, wrangling and visualization

In [0]:
# Reading the .csv file (the parsing below assumes there is no header row)
type_product = []
type_sales = []
date = []
price = []
volume = []
# newline='' is the setting recommended by the csv module documentation
with open('SalesCLEANED.csv', newline='', mode='r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        type_product.append(row[1])
        type_sales.append(row[2])
        date.append(row[3])
        price.append(float(row[-1]))
        volume.append(int(row[-3]))
# The with-block closes the file, so no explicit close() is needed


In [0]:
# Putting the data in dataframe for better manipulation
df = pd.DataFrame(list(zip(type_product, type_sales, date, volume, price)), columns=['type_product', 'type_sales', 'Date', 'volume', 'price'])
df['Date'] = pd.to_datetime(df.Date)

# Truncate each date to the first day of its month so that sales can be aggregated by month
df['Monthly'] = df['Date'].dt.year.astype('str') + '-' + df['Date'].dt.month.astype('str') + '-01'
df['Monthly'] = pd.to_datetime(df['Monthly'])

df.head()


Unnamed: 0,type_product,type_sales,Date,volume,price,Monthly
0,Fruits,Offline,2012-07-27,6,55.98,2012-07-01
1,Clothes,Online,2013-09-14,8,874.24,2013-09-01
2,Meat,Offline,2015-05-15,0,0.0,2015-05-01
3,Clothes,Offline,2017-05-17,5,546.4,2017-05-01
4,Beverages,Offline,2016-10-26,9,427.05,2016-10-01


As one can observe, the dataset already contains several features. However, this study treats sales purely as a time series, so every feature other than the date and the sales (referred to as 'volume' in the dataframe) is dropped.
As a result, we get the following **aggregated dataset** :


In [0]:
df_aggregated = df.groupby('Monthly').volume.sum().reset_index()

df_aggregated.head()

Unnamed: 0,Monthly,volume
0,2010-01-01,75189
1,2010-02-01,68422
2,2010-03-01,74926
3,2010-04-01,72102
4,2010-05-01,75304


In [0]:
plot_data = [go.Scatter(x=df_aggregated['Monthly'], y=df_aggregated['volume'])]
plot_layout = go.Layout(title='Monthly sales')
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

The figure above shows that sales are fairly stable over time, with no marked upward or downward trend. Also, February is very often the month with the lowest sales. A plausible explanation is that it follows Christmas and New Year's Eve, after which sales plummet.
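
One way to check the February observation numerically rather than by eye is to count, for each year, which calendar month has the lowest volume. The sketch below is only illustrative: `lowest_month_counts` is a hypothetical helper name, and the toy frame reuses the first five monthly volumes of 2010 from the aggregated table above instead of the full `df_aggregated`.

```python
import pandas as pd

def lowest_month_counts(frame):
    """Count how often each calendar month is the lowest-volume month of its year."""
    # Row label of the minimum volume within each year
    idx = frame.groupby(frame['Monthly'].dt.year)['volume'].idxmin()
    # Calendar month (1-12) of those rows, tallied across years
    return frame.loc[idx, 'Monthly'].dt.month.value_counts()

# Toy data: first five months of 2010 from the aggregated table
toy = pd.DataFrame({
    'Monthly': pd.to_datetime(['2010-01-01', '2010-02-01', '2010-03-01',
                               '2010-04-01', '2010-05-01']),
    'volume': [75189, 68422, 74926, 72102, 75304],
})
counts = lowest_month_counts(toy)
print(counts)
```

Applied to the full `df_aggregated`, a count concentrated on month 2 would support the reading of the figure.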

## Part 2 : Feature engineering

In [0]:
# Create new dataframe to model the diff
df_diff = df_aggregated.copy()
df_diff['prev_sales'] = df_aggregated['volume'].shift(1)
df_diff = df_diff.dropna()
df_diff['diff'] = (df_diff['volume'] - df_diff['prev_sales'])

df_diff.head()

Unnamed: 0,Monthly,volume,prev_sales,diff
1,2010-02-01,68422,75189.0,-6767.0
2,2010-03-01,74926,68422.0,6504.0
3,2010-04-01,72102,74926.0,-2824.0
4,2010-05-01,75304,72102.0,3202.0
5,2010-06-01,72838,75304.0,-2466.0


The table shows the monthly sales together with their difference from the previous month ('diff').
This difference helps us understand how sales vary from one month to the next.

In [0]:
# Create a dataframe that turns the time series into a supervised dataset
# The differences of the 12 previous months are used as features (lag1 to lag12)
df_supervised = df_diff.drop(['prev_sales'], axis=1)
for i in range(1, 13):
    field_name = 'lag' + str(i)
    df_supervised[field_name] = df_supervised['diff'].shift(i)

df_supervised = df_supervised.dropna().reset_index(drop=True)

df_supervised.head(-10)

Unnamed: 0,Monthly,volume,diff,lag1,lag2,lag3,lag4,lag5,lag6,lag7,lag8,lag9,lag10,lag11,lag12
0,2011-02-01,68859,-7387.0,1178.0,2519.0,-1257.0,751.0,-3176.0,1034.0,2359.0,-2466.0,3202.0,-2824.0,6504.0,-6767.0
1,2011-03-01,76355,7496.0,-7387.0,1178.0,2519.0,-1257.0,751.0,-3176.0,1034.0,2359.0,-2466.0,3202.0,-2824.0,6504.0
2,2011-04-01,73853,-2502.0,7496.0,-7387.0,1178.0,2519.0,-1257.0,751.0,-3176.0,1034.0,2359.0,-2466.0,3202.0,-2824.0
3,2011-05-01,74843,990.0,-2502.0,7496.0,-7387.0,1178.0,2519.0,-1257.0,751.0,-3176.0,1034.0,2359.0,-2466.0,3202.0
4,2011-06-01,72383,-2460.0,990.0,-2502.0,7496.0,-7387.0,1178.0,2519.0,-1257.0,751.0,-3176.0,1034.0,2359.0,-2466.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63,2016-05-01,75944,2843.0,-3129.0,5897.0,-5351.0,106.0,2927.0,-2428.0,1659.0,-2274.0,13.0,2404.0,-1566.0,2615.0
64,2016-06-01,73462,-2482.0,2843.0,-3129.0,5897.0,-5351.0,106.0,2927.0,-2428.0,1659.0,-2274.0,13.0,2404.0,-1566.0
65,2016-07-01,76201,2739.0,-2482.0,2843.0,-3129.0,5897.0,-5351.0,106.0,2927.0,-2428.0,1659.0,-2274.0,13.0,2404.0
66,2016-08-01,75860,-341.0,2739.0,-2482.0,2843.0,-3129.0,5897.0,-5351.0,106.0,2927.0,-2428.0,1659.0,-2274.0,13.0


Features are generated by shifting the data.
We use the monthly differences of the previous 12 months as features, which gives 12 features ('lag1' to 'lag12').
The 'diff' column is kept in this table because it is the value to predict; it is separated from the lag features before training.
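
The shifting step can be illustrated on a tiny series. This is a minimal sketch, not the notebook's exact cell: `make_lag_features` is a hypothetical helper, it uses `Series.diff()` in place of the explicit `shift(1)` subtraction, and it builds only 2 lags on the first six monthly volumes of the aggregated data.

```python
import pandas as pd

def make_lag_features(series, n_lags):
    """Build a supervised table: 'diff' as target plus lag1..lag{n_lags} as features."""
    # Month-over-month difference (the first value is NaN)
    frame = pd.DataFrame({'diff': series.diff()})
    for i in range(1, n_lags + 1):
        # lag i is the difference observed i months earlier
        frame['lag' + str(i)] = frame['diff'].shift(i)
    # Rows without a complete history are dropped
    return frame.dropna().reset_index(drop=True)

# Monthly volumes from 2010-01 to 2010-06
toy_volume = pd.Series([75189, 68422, 74926, 72102, 75304, 72838])
toy_supervised = make_lag_features(toy_volume, n_lags=2)
print(toy_supervised)
```

Each extra lag costs one more row at the start of the series, which is why the 12-lag table above starts in 2011-02 rather than 2010-02.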

In [0]:
# Adjusted R-squared 
model = smf.ols(formula='diff ~ lag1 + lag2 + lag3 + lag4 + lag5 + lag6 + lag7 + lag8 + lag9 +lag10 + lag11 + lag12', data=df_supervised)
model_fit = model.fit()

print("Adjusted R² = ", model_fit.rsquared_adj)

Adjusted R² =  0.8843458157025764
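
For reference, the adjusted R² penalizes the plain R² for the number of predictors. A minimal sketch of the standard formula follows; `adjusted_r2` is a hypothetical helper, not a statsmodels function, and the numbers in the usage line are illustrative only.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1), for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Illustrative values only: R² of 0.90 with 50 observations and 12 predictors
example = adjusted_r2(0.90, n_obs=50, n_predictors=12)
print(round(example, 4))
```

With 12 lag features and fewer than a hundred monthly observations, the penalty is not negligible, which is a reason to report the adjusted value.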


In [0]:
# Correlation matrix
df_supervised.corr()

Unnamed: 0,volume,diff,lag1,lag2,lag3,lag4,lag5,lag6,lag7,lag8,lag9,lag10,lag11,lag12
volume,1.0,0.856449,-0.348826,0.072238,0.09392,-0.245913,0.438726,-0.378651,0.15375,-0.026834,-0.136151,0.374142,-0.76363,0.764504
diff,0.856449,1.0,-0.703378,0.252804,0.004196,-0.189596,0.381141,-0.460664,0.311883,-0.112213,-0.06527,0.298778,-0.670813,0.909035
lag1,-0.348826,-0.703378,1.0,-0.711428,0.263011,-0.006384,-0.169979,0.367444,-0.461064,0.319611,-0.113625,-0.061349,0.293024,-0.668868
lag2,0.072238,0.252804,-0.711428,1.0,-0.710376,0.25937,0.002308,-0.188124,0.370954,-0.461777,0.323059,-0.116142,-0.049826,0.268841
lag3,0.09392,0.004196,0.263011,-0.710376,1.0,-0.70827,0.24998,0.021462,-0.189777,0.368596,-0.46134,0.321403,-0.115362,-0.039435
lag4,-0.245913,-0.189596,-0.006384,0.25937,-0.70827,1.0,-0.708084,0.234748,0.02166,-0.184715,0.366604,-0.45806,0.313278,-0.119332
lag5,0.438726,0.381141,-0.169979,0.002308,0.24998,-0.708084,1.0,-0.688925,0.237096,0.010242,-0.1879,0.368201,-0.460845,0.339208
lag6,-0.378651,-0.460664,0.367444,-0.188124,0.021462,0.234748,-0.688925,1.0,-0.708738,0.260615,0.003065,-0.179911,0.35826,-0.474372
lag7,0.15375,0.311883,-0.461064,0.370954,-0.189777,0.02166,0.237096,-0.708738,1.0,-0.711536,0.265968,-0.00293,-0.161827,0.332639
lag8,-0.026834,-0.112213,0.319611,-0.461777,0.368596,-0.184715,0.010242,0.260615,-0.711536,1.0,-0.713163,0.267663,-0.012881,-0.139857


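The adjusted R² and the correlations help to confirm that our features are useful and not redundant.
The regression of 'diff' on 'lag1' to 'lag12' gives an adjusted R² of about 0.88, so most of the variation in the differences can be explained by those features.
The correlation matrix shows no near-perfect correlation between distinct columns: the next cell displays the largest correlations and, as we will see, the highest off-diagonal value is about 0.91, between 'diff' and 'lag12'. Because 'diff' is the value to predict, this points to a yearly seasonality rather than to redundant features.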

In [0]:
# 15 largest correlations (sorted by signed value, so strong negative correlations are not listed)
c = df_supervised.corr()
s = c.unstack()
so = s.sort_values(kind="quicksort")

print(so[-15:])

diff    lag12     0.909035
volume  volume    1.000000
lag10   lag10     1.000000
lag9    lag9      1.000000
lag8    lag8      1.000000
lag7    lag7      1.000000
lag6    lag6      1.000000
lag5    lag5      1.000000
lag4    lag4      1.000000
lag3    lag3      1.000000
lag2    lag2      1.000000
lag1    lag1      1.000000
diff    diff      1.000000
lag11   lag11     1.000000
lag12   lag12     1.000000
dtype: float64


Here are the correlations. Obviously, the 14 values equal to 1 come from the diagonal of the matrix.
The remaining one, 0.909, links 'lag12' to 'diff'. Since 'diff' is the value to predict and is separated from the features before training, this correlation is not a redundancy problem.
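
Because the listing above is dominated by the diagonal and ignores negative values, it can help to rank distinct column pairs by absolute correlation instead. The following is a sketch under that assumption; `top_offdiagonal_correlations` is a hypothetical helper and the toy frame stands in for `df_supervised`.

```python
import numpy as np
import pandas as pd

def top_offdiagonal_correlations(frame, n=5):
    """Return the n column pairs with the largest absolute correlation."""
    corr = frame.corr()
    # Strict upper triangle: drops the diagonal and keeps each pair once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    # Order by absolute value so strong negative correlations are ranked too
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)

toy = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 3, 2, 1], 'c': [1, 3, 2, 4]})
top = top_offdiagonal_correlations(toy)
print(top)
```

On the real table this would surface pairs such as adjacent lags, whose correlation in the matrix above is close to -0.71.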

Next, let's split the dataset into training and testing sets: the last 13 months are kept for testing and all earlier months are used for training.
Then we rescale every column to the range [-1, 1] with a min-max scaler fitted on the training set only, which matches the output range of the tanh activation function.

In [0]:
# Let's split train and test sets
df_model = df_supervised.drop(['volume','Monthly'],axis=1)

train_set, test_set = df_model[0:-13].values, df_model[-13:].values

In [0]:
# Scale each column to [-1, 1]
# This keeps the values within the output range of the tanh activation used by the LSTM
scaler = MinMaxScaler(feature_range=(-1, 1))
# Fit on the training set only so that no information from the test set leaks into the scaling
scaler = scaler.fit(train_set)

# Scale the training set (the reshape leaves the 2-D array unchanged)
train_set = train_set.reshape(train_set.shape[0], train_set.shape[1])
train_set_scaled = scaler.transform(train_set)

# Scale the test set with the same scaler
test_set = test_set.reshape(test_set.shape[0], test_set.shape[1])
test_set_scaled = scaler.transform(test_set)

In [0]:
X_train, y_train = train_set_scaled[:, 1:], train_set_scaled[:, 0:1]
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test, y_test = test_set_scaled[:, 1:], test_set_scaled[:, 0:1]
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
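
The preparation steps above (split, scale with statistics from the training rows only, separate target from features, reshape to the 3-D layout `(samples, timesteps, features)` expected by the LSTM layer) can be sketched with NumPy alone. `fit_min_max` is a hypothetical stand-in that mirrors what `MinMaxScaler(feature_range=(-1, 1))` computes, and the toy table is invented for illustration.

```python
import numpy as np

def fit_min_max(train, low=-1.0, high=1.0):
    """Learn per-column min and max on the training rows and return (transform, inverse)."""
    col_min = train.min(axis=0)
    col_max = train.max(axis=0)

    def transform(x):
        return (x - col_min) / (col_max - col_min) * (high - low) + low

    def inverse(x_scaled):
        return (x_scaled - low) / (high - low) * (col_max - col_min) + col_min

    return transform, inverse

# Toy table: column 0 is the target, columns 1-2 are lag features
data = np.array([[10., 1., 5.],
                 [20., 3., 7.],
                 [30., 5., 9.],
                 [40., 7., 6.]])
train, test = data[:-1], data[-1:]

transform, inverse = fit_min_max(train)
train_scaled, test_scaled = transform(train), transform(test)

# Target is column 0, features are the remaining columns
X_train, y_train = train_scaled[:, 1:], train_scaled[:, 0:1]
# One timestep per sample: (samples, 1, features)
X_train_3d = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
print(X_train_3d.shape, test_scaled)
```

Note that test values lying outside the range seen during training end up outside [-1, 1] after scaling, as the toy test row shows.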

## Part 3 : RNN training and testing

Recurrent Neural Networks are, along with ARIMA models, among the most widely used models for time series. Let's see how one can be used in our context and whether the results are satisfying.

### The first model tried

This part presents the results of the 'first' model, before any parameter tuning.

In [0]:
# Initializing the neural net
model = Sequential()

# Adding an LSTM layer to the neural net.
# If needed, we will tune parameters such as the activation, the number of units, ...
model.add(LSTM(units=4, activation='tanh', recurrent_activation='sigmoid', use_bias=True, batch_input_shape=(1, X_train.shape[1], X_train.shape[2])))
# If overfitting happens, we can add dropout before the output layer
# model.add(Dropout(0.3))
model.add(Dense(1))

# Stochastic Gradient Descent for the moment, even though it might not be the best optimizer, with a batch_size of 1.
model.compile(loss='mean_squared_error', optimizer='SGD')
model.fit(X_train, y_train, epochs=100, batch_size=1, verbose=1, shuffle=False)



<keras.callbacks.callbacks.History at 0x7f8a6b187ef0>

In [0]:
# Let's test the model
y_pred = model.predict(X_test,batch_size=1)

y_pred = y_pred.reshape(y_pred.shape[0], 1, y_pred.shape[1])

# Rebuilding test set
pred_test_set = []
for index in range(0,len(y_pred)):
    print(np.concatenate([y_pred[index],X_test[index]],axis=1))
    pred_test_set.append(np.concatenate([y_pred[index],X_test[index]],axis=1))

# Reshape pred_test_set
pred_test_set = np.array(pred_test_set)
pred_test_set = pred_test_set.reshape(pred_test_set.shape[0], pred_test_set.shape[2])
pred_test_set_inverted = scaler.inverse_transform(pred_test_set)

[[ 0.3105329  -0.28000965  0.36240801 -0.35806491  0.73084811 -0.62613102
   0.03221136  0.37254192 -0.27349499  0.2195681  -0.25491615  0.02099168
   0.30944625]]
[[-0.00686472  0.34986126 -0.28000965  0.36240801 -0.35806491  0.73084811
  -0.62613102  0.03221136  0.37254192 -0.27349499  0.2195681  -0.25491615
   0.02099168]]
[[-0.19671489 -0.02171553  0.34986126 -0.28000965  0.36240801 -0.35806491
   0.73084811 -0.62613102  0.03221136  0.37254192 -0.27349499  0.2195681
  -0.25491615]]
[[ 0.31484362 -0.30329352 -0.02171553  0.34986126 -0.28000965  0.36240801
  -0.35806491  0.73084811 -0.62613102  0.03221136  0.37254192 -0.27349499
   0.2195681 ]]
[[-0.29209313  0.26625648 -0.30329352 -0.02171553  0.34986126 -0.28000965
   0.36240801 -0.35806491  0.73084811 -0.62613102  0.03221136  0.37254192
  -0.27349499]]
[[ 0.25164807 -0.14501146  0.26625648 -0.30329352 -0.02171553  0.34986126
  -0.28000965  0.36240801 -0.35806491  0.73084811 -0.62613102  0.03221136
   0.37254192]]
[[ 0.02617502  0.

In [0]:
# Creating a dataframe containing the predictions
result_list = []
sales_dates = list(df_aggregated[-13:].Monthly)
act_sales = list(df_aggregated[-13:].volume)
for index in range(0,len(pred_test_set_inverted)):
    result_dict = {}
    result_dict['pred_value'] = int(pred_test_set_inverted[index][0] + act_sales[index])
    result_dict['Monthly'] = sales_dates[index]
    result_list.append(result_dict)
df_result = pd.DataFrame(result_list)
df_result.head()

Unnamed: 0,pred_value,Monthly
0,78614,2016-07-01
1,75642,2016-08-01
2,71393,2016-09-01
3,77679,2016-10-01
4,71285,2016-11-01
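
Because 'diff' was defined as a month's volume minus the previous month's volume, a predicted volume is normally rebuilt as the previous month's volume plus the predicted difference. In the cell above, `act_sales` and `sales_dates` are both read from the same last 13 months, so each predicted difference is added to the actual volume of the month being predicted. The sketch below shows the previous-month alignment on a toy array; `rebuild_volume` is a hypothetical helper, and the volumes are the first five months of the aggregated table.

```python
import numpy as np

def rebuild_volume(pred_diff, prev_volume):
    """Predicted volume for month t = actual volume of month t-1 + predicted diff for month t."""
    pred_diff = np.asarray(pred_diff, dtype=float)
    prev_volume = np.asarray(prev_volume, dtype=float)
    return prev_volume + pred_diff

# Monthly volumes from 2010-01 to 2010-05
volume = np.array([75189, 68422, 74926, 72102, 75304], dtype=float)
true_diff = np.diff(volume)      # diff for 2010-02 .. 2010-05
prev_volume = volume[:-1]        # volumes of 2010-01 .. 2010-04

# With a perfect diff forecast, the rebuilt series equals the actual one
rebuilt = rebuild_volume(true_diff, prev_volume)
print(rebuilt)
```

With the notebook's variables, the same alignment would presumably be obtained by reading `act_sales` from `df_aggregated[-14:-1]` instead of `df_aggregated[-13:]`; treat that as an assumption to verify against the data.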


In [0]:
# We merge the actual sales and the predicted sales for visualization
df_sales_pred = pd.merge(df_aggregated, df_result, on='Monthly', how='left')

# Plot actual and predicted
plot_data = [
    go.Scatter(
        x=df_sales_pred['Monthly'],
        y=df_sales_pred['volume'],
        name='actual'
    ),
    go.Scatter(
        x=df_sales_pred['Monthly'],
        y=df_sales_pred['pred_value'],
        name='predicted'
    )
]
plot_layout = go.Layout(title='Sales Prediction')
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

### The optimal model for RNN 

This subpart shows the model that gave the best results. The only change from the first model is the batch_size, set to 32 instead of 1.

In [0]:
# Creating the neural net
model = Sequential()

# Adding LSTM to the neural net. 
# If needed, we will do some parameters tuning by changing activation, units, ...
model.add(LSTM(units = 4, activation='tanh', recurrent_activation='sigmoid', use_bias=True, batch_input_shape=(1, X_train.shape[1], X_train.shape[2])))

# If overfitting happens, we can use dropout
# model.add(Dropout(0.3))
model.add(Dense(1))

# Stochastic Gradient Descent again, this time with a batch_size of 32.
model.compile(loss='mean_squared_error', optimizer='SGD')
model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=1, shuffle=False) 


<keras.callbacks.callbacks.History at 0x7f8a65f7c940>

In [0]:
# Let's test the model
y_pred = model.predict(X_test,batch_size=1)

y_pred = y_pred.reshape(y_pred.shape[0], 1, y_pred.shape[1])

# Rebuilding test set
pred_test_set = []
for index in range(0,len(y_pred)):
    print(np.concatenate([y_pred[index],X_test[index]],axis=1))
    pred_test_set.append(np.concatenate([y_pred[index],X_test[index]],axis=1))

# Reshape pred_test_set
pred_test_set = np.array(pred_test_set)
pred_test_set = pred_test_set.reshape(pred_test_set.shape[0], pred_test_set.shape[2])
pred_test_set_inverted = scaler.inverse_transform(pred_test_set)

[[-0.14030437 -0.28000965  0.36240801 -0.35806491  0.73084811 -0.62613102
   0.03221136  0.37254192 -0.27349499  0.2195681  -0.25491615  0.02099168
   0.30944625]]
[[-0.14577715  0.34986126 -0.28000965  0.36240801 -0.35806491  0.73084811
  -0.62613102  0.03221136  0.37254192 -0.27349499  0.2195681  -0.25491615
   0.02099168]]
[[-0.08711375 -0.02171553  0.34986126 -0.28000965  0.36240801 -0.35806491
   0.73084811 -0.62613102  0.03221136  0.37254192 -0.27349499  0.2195681
  -0.25491615]]
[[-0.04487858 -0.30329352 -0.02171553  0.34986126 -0.28000965  0.36240801
  -0.35806491  0.73084811 -0.62613102  0.03221136  0.37254192 -0.27349499
   0.2195681 ]]
[[-0.15136863  0.26625648 -0.30329352 -0.02171553  0.34986126 -0.28000965
   0.36240801 -0.35806491  0.73084811 -0.62613102  0.03221136  0.37254192
  -0.27349499]]
[[-0.10093953 -0.14501146  0.26625648 -0.30329352 -0.02171553  0.34986126
  -0.28000965  0.36240801 -0.35806491  0.73084811 -0.62613102  0.03221136
   0.37254192]]
[[-0.06772801  0.

In [0]:
# Creating a dataframe containing the predictions
result_list = []
sales_dates = list(df_aggregated[-13:].Monthly)
act_sales = list(df_aggregated[-13:].volume)
for index in range(0,len(pred_test_set_inverted)):
    result_dict = {}
    result_dict['pred_value'] = int(pred_test_set_inverted[index][0] + act_sales[index])
    result_dict['Monthly'] = sales_dates[index]
    result_list.append(result_dict)
df_result = pd.DataFrame(result_list)
df_result.head()

Unnamed: 0,pred_value,Monthly
0,74877,2016-07-01
1,74490,2016-08-01
2,72301,2016-09-01
3,74698,2016-10-01
4,72452,2016-11-01


In [0]:
# We merge the actual sales and the predicted sales for visualization
df_sales_pred = pd.merge(df_aggregated, df_result, on='Monthly', how='left')

# Plot actual and predicted
plot_data = [
    go.Scatter(
        x=df_sales_pred['Monthly'],
        y=df_sales_pred['volume'],
        name='actual'
    ),
    go.Scatter(
        x=df_sales_pred['Monthly'],
        y=df_sales_pred['pred_value'],
        name='predicted'
    )
]
plot_layout = go.Layout(title='Sales Prediction')
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

## Part 4 : Multilayer Perceptron training and testing

In this part, we look at the results obtained with a multilayer perceptron. It serves as a point of comparison for the LSTM network, to see whether the recurrent layers were really useful.

In [0]:
# Creating the neural net
model = Sequential()

# Creating the multilayer perceptron.
# Each '.add' adds a new layer to the neural net
model.add(Dense(units=1, activation='sigmoid')) # First layer, receives the inputs
model.add(Dense(units=1, activation='relu')) # Hidden layer
model.add(Dense(units=1, activation='sigmoid')) # Output layer, values in (0, 1)

# If overfitting happens, we can use dropout
# model.add(Dropout(0.3))

# Same loss function and same optimizer as for the RNN
# Note: train_set_scaled still holds the target 'diff' in column 0;
# train_set_scaled[:, 1:] would restrict the inputs to the 12 lag features
model.compile(loss='mean_squared_error', optimizer='SGD')
model.fit(train_set_scaled, y_train, epochs=100, batch_size=32, verbose=1)



<keras.callbacks.callbacks.History at 0x7f8a6ecfb208>

In [0]:
# Let's test the model
y_pred = model.predict(test_set_scaled, batch_size=1)

y_pred = y_pred.reshape(y_pred.shape[0], 1, y_pred.shape[1])

# Rebuilding test set
pred_test_set = []
for index in range(0,len(y_pred)):
    print(np.concatenate([y_pred[index],X_test[index]],axis=1))
    pred_test_set.append(np.concatenate([y_pred[index],X_test[index]],axis=1))

# Reshape pred_test_set
pred_test_set = np.array(pred_test_set)
pred_test_set = pred_test_set.reshape(pred_test_set.shape[0], pred_test_set.shape[2])
pred_test_set_inverted = scaler.inverse_transform(pred_test_set)

[[ 0.3629193  -0.28000965  0.36240801 -0.35806491  0.73084811 -0.62613102
   0.03221136  0.37254192 -0.27349499  0.2195681  -0.25491615  0.02099168
   0.30944625]]
[[ 0.3629193   0.34986126 -0.28000965  0.36240801 -0.35806491  0.73084811
  -0.62613102  0.03221136  0.37254192 -0.27349499  0.2195681  -0.25491615
   0.02099168]]
[[ 0.3629193  -0.02171553  0.34986126 -0.28000965  0.36240801 -0.35806491
   0.73084811 -0.62613102  0.03221136  0.37254192 -0.27349499  0.2195681
  -0.25491615]]
[[ 0.3629193  -0.30329352 -0.02171553  0.34986126 -0.28000965  0.36240801
  -0.35806491  0.73084811 -0.62613102  0.03221136  0.37254192 -0.27349499
   0.2195681 ]]
[[ 0.3629193   0.26625648 -0.30329352 -0.02171553  0.34986126 -0.28000965
   0.36240801 -0.35806491  0.73084811 -0.62613102  0.03221136  0.37254192
  -0.27349499]]
[[ 0.3629193  -0.14501146  0.26625648 -0.30329352 -0.02171553  0.34986126
  -0.28000965  0.36240801 -0.35806491  0.73084811 -0.62613102  0.03221136
   0.37254192]]
[[ 0.3629193   0.

In [0]:
# Creating a dataframe containing the predictions
result_list = []
sales_dates = list(df_aggregated[-13:].Monthly)
act_sales = list(df_aggregated[-13:].volume)
for index in range(0,len(pred_test_set_inverted)):
    result_dict = {}
    result_dict['pred_value'] = int(pred_test_set_inverted[index][0] + act_sales[index])
    result_dict['Monthly'] = sales_dates[index]
    result_list.append(result_dict)
df_result = pd.DataFrame(result_list)
df_result.head()

Unnamed: 0,pred_value,Monthly
0,79048,2016-07-01
1,78707,2016-08-01
2,76032,2016-09-01
3,78078,2016-10-01
4,76715,2016-11-01


In [0]:
# We merge the actual sales and the predicted sales for visualization
df_sales_pred = pd.merge(df_aggregated, df_result, on='Monthly', how='left')

# Plot actual and predicted
plot_data = [
    go.Scatter(
        x=df_sales_pred['Monthly'],
        y=df_sales_pred['volume'],
        name='actual'
    ),
    go.Scatter(
        x=df_sales_pred['Monthly'],
        y=df_sales_pred['pred_value'],
        name='predicted'
    )
]
plot_layout = go.Layout(title='Sales Prediction')
fig = go.Figure(data=plot_data, layout=plot_layout)
pyoff.iplot(fig)

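As you can see above, the multilayer perceptron is a bit optimistic. Note also that the scaled prediction printed above is the same value (0.3629193) for every test row shown, so this network appears to output a constant. Since the RNN with batch_size 32 errs in the opposite direction, we could average the two outputs (perceptron and RNN_32) and see whether the results are even better.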

## Part 5 : Results

As the graphs illustrate, **the RNN with a batch_size of 32 gave the best results**. A likely explanation is that a larger batch averages the gradient over more samples, which makes the weight updates smoother.
The multilayer perceptron also gave good results, although they are a bit too optimistic.
**The RNN with a batch_size of 1 gave the worst results**, probably due to overfitting, as its training loss was the lowest of the three. Note that we didn't use dropout on the RNN with a batch_size of 1; it might have produced good results.
It could also be interesting to average the outputs of the RNN and of the multilayer perceptron, since the RNN is a bit pessimistic and the perceptron a bit too optimistic.
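
The averaging idea can be sketched as follows. The two arrays hold the first five predicted values from the RNN (batch_size 32) and multilayer perceptron result tables above; `average_forecasts` and `mean_absolute_error` are hypothetical helpers, and no actual volumes are included because the corresponding test-month values are not listed in this notebook's outputs.

```python
import numpy as np

# First five test-month predictions copied from the result tables above
rnn_32_pred = np.array([74877, 74490, 72301, 74698, 72452], dtype=float)
mlp_pred = np.array([79048, 78707, 76032, 78078, 76715], dtype=float)

def average_forecasts(*forecasts):
    """Element-wise mean of several forecasts of equal length."""
    return np.mean(np.vstack(forecasts), axis=0)

def mean_absolute_error(actual, predicted):
    """Mean absolute error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

ensemble_pred = average_forecasts(rnn_32_pred, mlp_pred)
print(ensemble_pred)
```

Computing `mean_absolute_error` for each model and for the ensemble against the actual test-month volumes would turn the visual comparison of the graphs into a number.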