In [None]:
#! pip install lale
#! pip install lime

# Store Sales Forecasting & Discount Strategy:  

**Goal:** 
* Exploratory Data Analysis to describe and clean the data, and to understand attributes
* Feature selection to keep only important attributes
* Developing a framework to evaluate and spot-check algorithms
* Predicting and explaining future sales
* Identifying the right time for discount strategies

**Data set description:**

* stores.csv

This file contains anonymized information about the 45 stores, indicating the type and size of the store.

* train.csv

This is the historical training data, which covers 2010-02-05 to 2012-11-01. Within this file you will find the following fields:  

Store - the store number  
Dept - the department number   
Date - the week 
Weekly_Sales -  sales for the given department in the given store  
IsHoliday - whether the week is a special holiday week  

* test.csv

This file is identical to train.csv, except we have withheld the weekly sales. You must predict the sales for each triplet of store, department, and date in this file.  

* features.csv

This file contains additional data related to the store, department, and regional activity for the given dates. It contains the following fields:  

Store - the store number  
Date - the week  
Temperature - the average temperature in the region  
Fuel_Price - the cost of fuel in the region  
MarkDown1-5 - anonymized data related to promotional markdowns that Walmart is running. MarkDown data is only available after Nov 2011, and is not available for all stores all the time. Any missing value is marked with an NA.  
CPI - the consumer price index   
Unemployment - the unemployment rate   
IsHoliday - whether the week is a special holiday week  

> # 1. Prepare Problem
> # 1.a) Load libraries

In [None]:
import sys
import IPython
import pandas as pd
from pandas import set_option
set_option('display.width', 160)
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display
%matplotlib inline
import plotly.offline as py
import plotly.graph_objs as go
import plotly.tools as tls
py.init_notebook_mode(connected=True)
import warnings
warnings.filterwarnings('ignore')
from scipy.stats import skew
from scipy.stats import kurtosis

> # 1.b) Load dataset

In [None]:
train=pd.read_csv("/kaggle/input/new-walmart-data/train.csv", parse_dates=["Date"])
test=pd.read_csv("/kaggle/input/new-walmart-data/test.csv", parse_dates=["Date"])
stores=pd.read_csv("/kaggle/input/walmart-recruiting-store-sales-forecasting/stores.csv")
features = pd.read_csv("/kaggle/input/new-walmart-data/features.csv", parse_dates=["Date"])
# merging attributes into one dataset
train = train.merge(stores, how='left').merge(features, how='left')
test = test.merge(stores, how='left').merge(features, how='left')

**Data analysis :**

Why? It is important to know your data and to extract as much information and as many insights from it as possible, in order to get the best results. We will summarize the data and try to understand the relationships in it by using statistical tools.  

> # 2. Summarize Data  
> # 2.a) Cleaning data

In [None]:
train.info()
train.head()

In [None]:
test.info()
test.head()

We can see that the test dataset doesn't contain the features included in the train dataset. Since these features (Temperature, Fuel_Price, MarkDowns, CPI and Unemployment) depend heavily on the date, they cannot be used in the test dataset, so it is a good idea to delete them. Before that, we will make sure that these features don't provide any information on the target 'Weekly_Sales'. 

In [None]:
# Checking for null values
train.isnull().mean()*100

The MarkDown attributes have more than 64% null values. In addition, because they are anonymized, it is difficult to fill the columns with appropriate values.  
Let's check their correlations with the target 'Weekly_Sales':

In [None]:
corr = train[['Weekly_Sales','Temperature', 'Fuel_Price', 'MarkDown1', 'MarkDown2',
                 'MarkDown3', 'MarkDown4', 'MarkDown5', 'CPI', 'Unemployment']].corr()
fig, ax = plt.subplots(figsize=(18, 12))
ax = sns.heatmap(corr , vmin=-1, vmax=1, annot=True, fmt='.2f', cmap='coolwarm')
plt.show()

In [None]:
abs(corr["Weekly_Sales"])

The correlations of these features with the target 'Weekly_Sales' are approximately 0, so it's safe to delete them.
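
One caveat worth keeping in mind: a Pearson correlation near 0 only rules out a linear relationship. The sketch below illustrates this on synthetic data (not the Walmart data); the column names `linear` and `quadratic` are made up for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=2000)

# Synthetic columns: one depends linearly on x, the other quadratically.
toy = pd.DataFrame({
    'x': x,
    'linear': 2 * x + rng.normal(0, 0.1, size=x.size),
    'quadratic': x ** 2 + rng.normal(0, 0.01, size=x.size),
})

# Pearson correlation of every column with x
pearson = toy.corr(method='pearson')['x']
print(pearson)
```

The quadratic column is fully driven by `x`, yet its Pearson correlation with `x` stays close to 0, so a model-based check (such as the feature importance used later) is a useful complement.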

In [None]:
train = train.drop(['Temperature', 'Fuel_Price', 'MarkDown1', 'MarkDown2',
                    'MarkDown3', 'MarkDown4', 'MarkDown5', 'CPI', 'Unemployment'], axis = 1)
test = test.drop(['Temperature', 'Fuel_Price', 'MarkDown1', 'MarkDown2',
                    'MarkDown3', 'MarkDown4', 'MarkDown5', 'CPI', 'Unemployment'], axis = 1)
train.head()

Another useful step is to facilitate access to the 'Date' attribute by splitting it into its components (i.e. year and week).
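
A side note on extracting the week: `Series.dt.week` was deprecated in pandas 1.1 and removed in pandas 2.0, while `Series.dt.isocalendar()` returns the same ISO week numbers. A minimal sketch on three dates picked for illustration only:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2010-02-05', '2010-02-12', '2010-12-31']))

# isocalendar() returns a DataFrame with the columns year, week and day
iso = dates.dt.isocalendar()
weeks = iso['week'].astype(int)
years = dates.dt.year

print(pd.DataFrame({'date': dates, 'year': years, 'week': weeks}))
```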

In [None]:
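# Series.dt.week was removed in pandas 2.0; isocalendar() gives the same ISO week number
train['week']=train['Date'].dt.isocalendar().week.astype(int)
train['year']=train['Date'].dt.year
test['week']=test['Date'].dt.isocalendar().week.astype(int)
test['year']=test['Date'].dt.year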
del test['Date']
del train['Date']
train.head()

The last step is to treat the categorical attributes, namely Type and IsHoliday:

In [None]:
# Let's count and see the distinct value of Type :  
print(train[['Type', 'IsHoliday']].nunique())
Types = train['Type'].unique()
Holi = train['IsHoliday'].unique()
print(f'The 3 types of Store : {Types}')
print(f'The Holiday Flag : {Holi}')

In [None]:
from sklearn.preprocessing import LabelEncoder

# Fit each encoder on the training data and reuse it on the test data,
# so that both sets share the same category-to-integer mapping
type_encoder = LabelEncoder().fit(train['Type'])
holiday_encoder = LabelEncoder().fit(train['IsHoliday'])
train['Type']=type_encoder.transform(train['Type'])
train['IsHoliday']=holiday_encoder.transform(train['IsHoliday'])
Types = train['Type'].unique()
Holi = train['IsHoliday'].unique()
print(f'The new 3 types of Store : {Types}')
print(f'The Holiday Flag : {Holi}')

# Test set
test['IsHoliday']=holiday_encoder.transform(test['IsHoliday'])
test['Type']=type_encoder.transform(test['Type'])

Now we proceed to analyse each attribute.

> # 2.b) Descriptive statistics & data visualizations:
# Weekly_Sales

In [None]:
# Weekly_Sales statistics:
train['Weekly_Sales'].describe()

The Weekly_Sales mean is much higher than the median (50%), which indicates that its distribution is skewed to the right.

In [None]:
sns.kdeplot(train['Weekly_Sales'], shade=True);
plt.show()
print("%s: mean (%f), variance (%f), skewness (%f), kurtosis (%f)" 
      % ('Weekly_Sales', np.mean(train['Weekly_Sales']), np.var(train['Weekly_Sales']),
         skew(train['Weekly_Sales']), kurtosis(train['Weekly_Sales'])))

The plot makes the right skewness clear: most weeks have sales around the median, with a long tail of higher values.  
Also, we can see that the Weekly_Sales attribute has a large kurtosis, which indicates the presence of extreme values; in other words, some weeks have very high sales. It would be a good idea to know the origins of these extreme values. 
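
To make the reading of skewness and kurtosis more concrete, here is a small sketch comparing a symmetric sample with a right-skewed one. The data is synthetic (normal vs. lognormal) and only serves as an illustration; `scipy.stats.kurtosis` reports excess kurtosis by default, so a normal sample is close to 0.

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
symmetric = rng.normal(loc=0.0, scale=1.0, size=50_000)
right_skewed = rng.lognormal(mean=0.0, sigma=1.0, size=50_000)

for name, sample in [('normal', symmetric), ('lognormal', right_skewed)]:
    print(f'{name}: mean={np.mean(sample):.2f}, median={np.median(sample):.2f}, '
          f'skew={skew(sample):.2f}, excess kurtosis={kurtosis(sample):.2f}')
```

For the right-skewed sample the mean sits above the median and both statistics are clearly positive, which is the same pattern observed for Weekly_Sales.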

We can have some idea on attributes contributing to the target Weekly_Sales, by calculating correlations : 

In [None]:
# Weekly_Sales Correlations:
cor_Sales = abs(train.corr()['Weekly_Sales']).sort_values(ascending=False)
print(cor_Sales, '\n')

* At first glance, the characteristics of the store (Type, Size and Departments) appear to have some information on the target.

# Type

In [None]:
# Type Correlations:
cor_Type = abs(train.corr()['Type']).sort_values(ascending=False)
print(cor_Type, '\n')

The store Type is highly correlated with its Size.

In [None]:
# The portion of each type of store:
type_counts = train['Type'].value_counts().sort_index()
print(type_counts)
# Take labels and values from the same Series so that they stay aligned
OutLabels = [str(t) for t in type_counts.index]
OutValues = type_counts.tolist()
pie=go.Pie(labels=OutLabels,values=OutValues)
py.iplot([pie])

It's clear from the pie chart that Type 0 is the main type with more than half of the records, followed by Type 1, then Type 2.

Let's investigate the relation between the store Type and other attributes.

In [None]:
def xyBox(X,Y,showfliers=True, hue=None, figsize=(9, 6)) :
    f, ax = plt.subplots(figsize=figsize)
    fig = sns.boxplot(x=X, y=Y, data=train, showfliers=showfliers,hue=hue, showmeans=True, meanline=True, meanprops = dict(linestyle='--', linewidth=2.5, color='red'))
    fig.text(-1.5, 0, 'mean : ---', color='red', weight='roman',size=14)

In [None]:
xyBox('Type', 'Size')

Neglecting the outliers, we can say that the store Type largely determines its Size: Type 0 stores are the largest and Type 2 stores the smallest.  
-> The two attributes are highly correlated (~0.81)

In [None]:
xyBox('Type', 'Store')

Type 0 includes a large range of possible stores, which overlaps with other Types. As a result, we have a moderate correlation (~0.22) between Type and Store.  

In [None]:
xyBox('Type', 'Weekly_Sales',showfliers=False)

We can see that store Type 0 has the highest median (mean) in terms of weekly sales, and Type 2 has the lowest.  

However, the ranges of Weekly_Sales overlap across the 3 Types, which reduces the correlation value to ~0.18.    

-> The store Type may provide information on the target.

**Conclusion:**
* The store Type may provide information on the target.  
* The store Type implicitly gives information on the size of the store.
* A feature selection method will be used later on to keep either the Size or the Type.

# Size

In [None]:
# Size Correlations:
cor_Size = abs(train.corr()['Size']).sort_values(ascending=False)
print(cor_Size, '\n')

In [None]:
# Size statistics:
train['Size'].describe()

In [None]:
# Size distribution :
sns.distplot(train['Size']);
plt.show()

The size has a balanced distribution (mean ~ median), and a large range.

Roughly speaking, the store Size distribution has 3 "peaks", which can be attributed to the store Type as seen before.  

In [None]:
# Size relation with Weekly_Sales
sns.jointplot(x='Size', y='Weekly_Sales', data=train, height=8)
plt.show()

We can see that some high sales values appear as the size increases, which corresponds to the moderate correlation between the Size and Weekly_Sales  (~0.24).

In [None]:
# Size relation with Store
sns.jointplot(x='Size', y='Store', data=train, height=8)
plt.show()

We can't recognize any pattern between the Size and Stores, which explains the poor correlation of ~0.18.

**Conclusion:**
* The Size may provide information on the target.  
* The Size is not correlated with other attributes, except the Type.

# Store

In [None]:
# Store Correlations:
cor_Store = abs(train.corr()['Store']).sort_values(ascending=False)
print(cor_Store, '\n')

There are only small correlations between Store and Type or Size.

In [None]:
fig, ax = plt.subplots(figsize=(18, 12))
fig = sns.boxplot(x='Store', y='Weekly_Sales', data=train, showfliers=False, showmeans=True, meanline=True, meanprops = dict(linestyle='--', linewidth=2.5, color='red'))
fig.text(-8, 0, 'mean : ---', color='red', weight='roman',size=14)
plt.show()

Sales are different for each store, so they may depend on the Store.

**Conclusion:**
* The Store may provide some information on Sales.

# Dept

In [None]:
# Dept Correlations:
cor_Dept = abs(train.corr()['Dept']).sort_values(ascending=False)
print(cor_Dept, '\n')

The Dept is moderately correlated with Weekly_Sales.

In [None]:
# Dept relation with Weekly_Sales
fig, ax = plt.subplots(figsize=(18, 12))
fig = sns.boxplot(x='Dept', y='Weekly_Sales', data=train, showfliers=False, showmeans=True, meanline=True, meanprops = dict(linestyle='--', linewidth=2.5, color='red'))
fig.text(-8, 0, 'mean : ---', color='red', weight='roman',size=14)
plt.show()

The Departments show distinct ranges of Sales, which may tell something about them.

**Conclusion:**  
* Each Department may contain some information on the target. 

# Week

In [None]:
# Week Correlations:
cor_week = abs(train.corr()['week']).sort_values(ascending=False)
print(cor_week, '\n')

There is almost no linear correlation between the week and Sales.

In [None]:
# Distribution of sales over weeks:
fig, ax = plt.subplots(figsize=(18, 12))
palette = sns.color_palette("mako_r", 3)
fig = sns.lineplot(x='week', y='Weekly_Sales', data=train, hue='year', err_style=None, palette=palette)
plt.xticks(np.arange(1, 53, step=1))
plt.show()

The distribution of Sales over weeks is not impacted by the year, so the year is irrelevant to this situation.  

The Holidays by week:    
Week 6  : Super Bowl  
Week 36 : Labor Day  
Week 47 : Thanksgiving  
Week 52 : Christmas   
This explains the peaks in Sales at these weeks. There is a subtlety concerning Christmas, however: the peak in Sales happens in Week 51, one week before the Christmas week, presumably because people do their Christmas shopping in the days before. Shifting the Holiday flag for Christmas to Week 51 is therefore a good choice.  

Other high sales values correspond to Holidays that are not flagged in the train set, namely :  
Easter Day       : Week 13(2010), Week 16(2011), Week 14(2012), Week 13(2013) for test set  
Memorial Day     : Week 22  
Independence Day : Week 27  

Changing the IsHoliday flag to '1' in these Weeks would be a good idea.
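
The Easter weeks listed above can be cross-checked programmatically. The sketch below assumes the week numbers follow the ISO calendar and uses `dateutil.easter.easter` (dateutil is installed alongside pandas); the helper name `easter_iso_week` is made up for this example.

```python
from dateutil.easter import easter

def easter_iso_week(year):
    """Return the ISO week number of the week containing Easter Sunday."""
    return easter(year).isocalendar()[1]

# Years covered by the train set (2010-2012) and the test set (2013)
easter_weeks = {year: easter_iso_week(year) for year in (2010, 2011, 2012, 2013)}
print(easter_weeks)
```

For 2010 to 2013 this gives weeks 13, 16, 14 and 13, which matches the list above.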

In [None]:
# Easter Day
train.loc[(train.year==2010) & (train.week==13), 'IsHoliday'] = 1
train.loc[(train.year==2011) & (train.week==16), 'IsHoliday'] = 1
train.loc[(train.year==2012) & (train.week==14), 'IsHoliday'] = 1
test.loc[(test.year==2013) & (test.week==13), 'IsHoliday'] = 1
# Memorial & Independence Day
train.loc[((train.week==22) | (train.week==27)), 'IsHoliday'] = 1
test.loc[((test.week==22) | (test.week==27)), 'IsHoliday'] = 1
# shifting the Christmas Week
train.loc[(train.week==52), 'IsHoliday'] = 0
train.loc[(train.week==51), 'IsHoliday'] = 1
test.loc[(test.week==52), 'IsHoliday'] = 0
test.loc[(test.week==51), 'IsHoliday'] = 1

**Conclusion:**
* The week doesn't tell much about Sales, except for weeks marked as Holidays.

# Holidays

Let's visualize the impact of Holidays on Sales:

In [None]:
fig, ax = plt.subplots(figsize=(12, 8))
fig = sns.stripplot(x='IsHoliday', y='Weekly_Sales', data=train)
plt.show()

As expected, Holiday Sales have some high values.

In [None]:
# Weekly Sales per Store with the Holiday Flag:
fig, ax = plt.subplots(figsize=(18, 12))
fig = sns.boxplot(x='Store', y='Weekly_Sales', data=train, hue='IsHoliday', showfliers=False)
# Weekly Sales per Dept with the Holiday Flag:
fig, ax = plt.subplots(figsize=(18, 12))
fig = sns.boxplot(x='Dept', y='Weekly_Sales', data=train, hue='IsHoliday', showfliers=False)

In general, Sales in Stores and Departments increase slightly during Holidays, except for a small number of Departments. For instance, the Weekly Sales median of Dept 38 decreases during Holidays, maybe because of the nature of the products in these Departments.  

In [None]:
# Weekly Sales per store Type with the Holiday Flag:
xyBox('Type','Weekly_Sales', hue='IsHoliday', showfliers=False)

Same result, Weekly Sales increase slightly at Holidays.

**Conclusion:**
* Sales increase at Holidays.

> # 3) Feature selection :  

Why?


After analyzing and getting to know the data, it's time to keep only the important features, because irrelevant or redundant features negatively impact the accuracy of algorithms and cause overfitting problems. Moreover, reducing the number of features enables the algorithm to train faster and makes the model easier to interpret.

First, let's see all features in the data.

In [None]:
# Attributes:
train.columns

Second, let's check the feature importance using the random forest algorithm.

In [None]:
# Separating the target from data:
X = train.drop('Weekly_Sales',axis=1)
y_train = train['Weekly_Sales']

# importing the random forest algorithm
from sklearn.ensemble import RandomForestRegressor
# fitting the model
rf = RandomForestRegressor()
rf.fit(X,y_train)
# Feature importance:
pd.DataFrame({'Features':X.columns,'Relative Importance':rf.feature_importances_}).sort_values(by='Relative Importance', ascending=False)

As we can see, the most important feature is Dept, followed by Size.  
As seen before, Type and Size are highly correlated and carry largely the same information, so we can now decide which feature to keep, in this case 'Size'.  
As discussed previously, 'year' is irrelevant and doesn't contain any information on the target -> Drop it from the data.
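
Impurity-based importances from a random forest are computed on the training data, so a cross-check on held-out data can be useful. The following is only a sketch on synthetic data, using scikit-learn's `permutation_importance`; the toy arrays and their coefficients are assumptions made for the illustration and are not part of the analysis above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Toy features: column 0 is informative, column 1 is weak, column 2 is pure noise
X_toy = rng.normal(size=(n, 3))
y_toy = 5 * X_toy[:, 0] + 0.5 * X_toy[:, 1] + rng.normal(0, 0.1, size=n)

X_fit, X_val, y_fit, y_val = train_test_split(X_toy, y_toy, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_fit, y_fit)

# Drop in validation score when each column is shuffled in turn
result = permutation_importance(model, X_val, y_val, n_repeats=5, random_state=0)
perm_importance = result.importances_mean
print(perm_importance)
```

A feature whose shuffling barely changes the validation score is a candidate for removal, which is the same reasoning applied to 'year' above.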

In [None]:
X_train = X.drop(['Type', 'year'],axis=1)

# 4) Evaluate Algorithms

After analysing, cleaning and preparing the data, the next step is to select the best algorithm with the optimal parameters to obtain the best results.  
This step requires manually selecting the type of data normalization, manually selecting algorithms and tuning all hyperparameters, which is a complex task. Instead, we will use Auto-ML, which is a series of concepts and techniques used to automate these processes and help reduce bias and errors.  
Auto-ML can be done using Lale, an easy-to-use library and a powerful tool which serves for hyperparameter tuning and algorithm selection. 

Many algorithms (distance-based ones such as KNeighborsRegressor and SVR in particular) are sensitive to the scale of the features, especially when features have different ranges as in our case, so it is necessary to include this step in our pipeline.  

For data preprocessing, Lale will have the following choices :  

* MinMaxScaler: scaling data values to the range [0,1]
* StandardScaler: the data distribution will have mean = 0 and std = 1
* PCA: for linear dimensionality reduction
* NoOp: keep the data unchanged  
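
As a quick illustration of the first two options, the sketch below applies both scalers to a tiny made-up array whose two columns have very different ranges (the values are invented for the example).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two toy columns with very different ranges
X_toy = np.array([[1.0, 40000.0],
                  [2.0, 120000.0],
                  [3.0, 200000.0]])

X_minmax = MinMaxScaler().fit_transform(X_toy)   # each column mapped to [0, 1]
X_std = StandardScaler().fit_transform(X_toy)    # each column: mean 0, std 1

print(X_minmax)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

After either transformation both columns live on a comparable scale, which is what distance-based algorithms need.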

Algorithms used for spot-checking :  

* LinearRegression
* RandomForestRegressor
* GradientBoostingRegressor
* ExtraTreesRegressor
* KNeighborsRegressor
* SVR
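
Before handing the search over to Auto-ML, the idea of spot-checking can be sketched with plain scikit-learn: run each candidate through the same cross-validation and compare the scores. The example below uses a synthetic dataset and only three of the listed algorithms; it is an illustration of the principle, not the pipeline used in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X_toy = rng.uniform(0, 1, size=(300, 2))
y_toy = 10 * X_toy[:, 0] + rng.normal(0, 0.1, size=300)   # linear toy target

candidates = {
    'LinearRegression': LinearRegression(),
    'KNeighborsRegressor': KNeighborsRegressor(),
    'RandomForestRegressor': RandomForestRegressor(n_estimators=20, random_state=0),
}

mae = {}
for name, model in candidates.items():
    # Same folds and same metric for every candidate
    scores = cross_val_score(model, X_toy, y_toy, cv=3, scoring='neg_mean_absolute_error')
    mae[name] = -scores.mean()
    print(f'{name}: MAE = {mae[name]:.3f}')
```

On this linear toy target the linear model comes out ahead, as expected; on real data the ranking is what the spot-check is meant to discover.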


In [None]:
import lale
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, make_scorer
from lale.lib.lale import Hyperopt, NoOp, GridSearchCV
from lale.pretty_print import ipython_display
import lale.schemas as schemas
import lale.helpers
lale.wrap_imported_operators()

For hyperparameter tuning, Lale gives us the choice to use its search space or schemas as is, or we can customize the schemas to fit our purposes (e.g. reduce the search space to speed up the search).  
Let's see the Lale schemas for the following algorithms and customize them to reduce the training time.

In [None]:
# RandomForestRegressor hyperparameters to customize:
print( 'RandomForestRegressor:\n')
ipython_display(RandomForestRegressor.hyperparam_schema('n_estimators'))
ipython_display(RandomForestRegressor.hyperparam_schema('min_samples_leaf'))
# ExtraTreesRegressor hyperparameters to customize:
print( 'ExtraTreesRegressor:\n')
ipython_display(ExtraTreesRegressor.hyperparam_schema('n_estimators'))
ipython_display(ExtraTreesRegressor.hyperparam_schema('min_samples_leaf'))
# GradientBoostingRegressor hyperparameters to customize:
print( 'GradientBoostingRegressor:\n')
ipython_display(GradientBoostingRegressor.hyperparam_schema('n_estimators'))
ipython_display(GradientBoostingRegressor.hyperparam_schema('min_samples_leaf'))

As you can see, the search space for the number of trees in the forest is the range [10, 100], and for min_samples_leaf it is the range [1, 20] for all algorithms, so we will reduce them to [10, 20] and [1, 5] respectively. 

In [None]:
RandomForestRegressor = RandomForestRegressor.customize_schema(n_estimators=schemas.Int(min=10, max=20),
                                                               min_samples_leaf=schemas.Int(min=1, max=5))

ExtraTreesRegressor = ExtraTreesRegressor.customize_schema(n_estimators=schemas.Int(min=10, max=20),
                                                           min_samples_leaf=schemas.Int(min=1, max=5))

GradientBoostingRegressor = GradientBoostingRegressor.customize_schema(n_estimators=schemas.Int(min=10, max=20),
                                                                       min_samples_leaf=schemas.Int(min=1, max=5))                                                           

Now we will create our pipeline and visualize it easily with Lale.

In [None]:
pipeline = (StandardScaler | MinMaxScaler | PCA | NoOp)  >>  (LinearRegression | KNeighborsRegressor |
               RandomForestRegressor | GradientBoostingRegressor | ExtraTreesRegressor  | SVR )
pipeline.visualize()

We choose the negative mean absolute error metric (the mean absolute error is easy to understand and to interpret) to define the loss function, and let Hyperopt determine the optimal pipeline that minimizes the loss.
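
For reference, the mean absolute error is the average of the absolute differences between true and predicted values. scikit-learn scorers follow a "higher is better" convention, which is why the scoring string carries the `neg_` prefix. A worked example on three made-up values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up weekly sales and predictions
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

mae = mean_absolute_error(y_true, y_pred)   # (10 + 10 + 30) / 3
neg_mae = -mae                               # value reported under 'neg_mean_absolute_error'
print(mae, neg_mae)
```

The MAE is expressed in the same unit as Weekly_Sales, which is what makes it easy to interpret.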

In [None]:
scoring = 'neg_mean_absolute_error'
pip_selection =  Hyperopt(estimator = pipeline, cv = 3, max_evals = 20, scoring=scoring, max_eval_time=120)

In [None]:
pip_trained = pip_selection.fit(X_train, y_train)

Let's visualize the best pipeline with optimal parameters:

In [None]:
pip_trained.get_pipeline().visualize()
pip_trained.get_pipeline().pretty_print(ipython_display= True, show_imports= False)

The best pipeline doesn't use any data normalization, which is easy to understand since the selected ExtraTreesRegressor algorithm is tree-based and therefore not affected by feature scaling.
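
The insensitivity of tree-based models to feature scaling can be checked on a toy example. The sketch below swaps in a single `DecisionTreeRegressor` (not the ExtraTreesRegressor selected above) and synthetic data, and compares predictions with and without `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Two toy features with very different ranges
X_toy = rng.uniform(0, 1, size=(100, 2)) * np.array([1.0, 100000.0])
y_toy = X_toy[:, 0] + X_toy[:, 1] / 100000.0 + rng.normal(0, 0.05, size=100)

X_scaled = MinMaxScaler().fit_transform(X_toy)

# Same tree settings, fitted once on raw and once on scaled features
pred_raw = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_toy, y_toy).predict(X_toy)
pred_scaled = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_scaled, y_toy).predict(X_scaled)

print(np.allclose(pred_raw, pred_scaled))
```

Splits depend only on the ordering of the values within each feature, and a monotonic rescaling preserves that ordering. A transformation such as PCA mixes features and would not leave the tree unchanged.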

In [None]:
# Summary of all pipelines:
pip_trained.summary()

In [None]:
summary = pip_trained.summary()
# idxmin/idxmax return the pipeline name (the index label) rather than its position
best = summary.loss.idxmin()
loss_b = summary.loss.min()
worst = summary.loss.idxmax()
loss_w = summary.loss.max()
print(f'The best pipeline is {best} with a loss of {loss_b}')
print(f'The worst pipeline is {worst} with a loss of {loss_w}')

# 5) Finalize Model : Predictions on test dataset

Now, we train the model on the entire training dataset using ExtraTreesRegressor algorithm with optimal parameters:

In [None]:
extra_trees_regressor = ExtraTreesRegressor(min_samples_leaf=3, min_samples_split=14, n_estimators=15)
extra_trees_regressor.fit(X_train, y_train)

Preparing the test dataset:

In [None]:
X_test = test.drop(['Type', 'year'], axis=1)

Make predictions:

In [None]:
predictions = extra_trees_regressor.predict(X_test)

After making predictions, one can ask many questions: can we trust certain predictions and take actions based on them?  
LIME helps answer this, so we will use its techniques to explain some predictions.
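
The core idea behind such local explanations can be sketched without the library: perturb the instance, weight the perturbed points by proximity, and fit a weighted linear model to the black-box predictions. The code below is a simplified illustration of that idea, not LIME's actual implementation; `black_box`, the instance and the kernel width are all made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def black_box(X):
    """Stand-in for a trained model's predict function (hypothetical)."""
    return X[:, 0] ** 2 - 2.0 * X[:, 1]

rng = np.random.default_rng(0)
instance = np.array([2.0, 1.0])
width = 0.1   # assumed perturbation scale and kernel width

# 1. Sample perturbed points around the instance
samples = instance + rng.normal(0, width, size=(1000, 2))
# 2. Weight each sample by its proximity to the instance
distances = np.linalg.norm(samples - instance, axis=1)
weights = np.exp(-(distances ** 2) / (2 * width ** 2))
# 3. Fit a weighted linear surrogate to the black-box predictions
surrogate = LinearRegression().fit(samples, black_box(samples), sample_weight=weights)
local_effects = surrogate.coef_
print(local_effects)
```

The surrogate's coefficients approximate the local slope of the black box around the instance (about 4 for the first feature and about -2 for the second), which is the kind of signed contribution shown in the LIME output.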

Firstly, we define the explainer provided by LIME

In [None]:
import lime
import lime.lime_tabular
# Define the explainer:
X_train_arr = X_train.to_numpy()
# LIME expects categorical_features as column indices, not column names
explainer = lime.lime_tabular.LimeTabularExplainer(X_train_arr, feature_names=list(X_train.columns), 
            class_names=['Weekly_Sales'], verbose=False, mode='regression',
            categorical_features=[X_train.columns.get_loc('IsHoliday')], discretizer='decile', random_state=5)

Now we select 2 instances, the first isn't a Holiday week, the second is, and we show explanations.

In [None]:
X_test_arr = X_test.to_numpy()
i = 1
exp = explainer.explain_instance(X_test_arr[i], extra_trees_regressor.predict, num_features=5)
exp.show_in_notebook(labels=None, predict_proba=True, show_predicted_value=True)
print(f'Document id: {i}')
print('Weekly Sales prediction:', extra_trees_regressor.predict(X_test_arr)[i])
print ('Explanation for prediction:')
print ('\n'.join(map(str, exp.as_list())))

In [None]:
i = 10055
exp = explainer.explain_instance(X_test_arr[i], extra_trees_regressor.predict, num_features=5)
exp.show_in_notebook(labels=None, predict_proba=True, show_predicted_value=True)
print(f'Document id: {i}')
print('Weekly Sales prediction:', extra_trees_regressor.predict(X_test_arr)[i])
print ('Explanation for prediction:')
print ('\n'.join(map(str, exp.as_list())))

We can see how each attribute contributes to the prediction, either positively or negatively.  
The Holiday Flag is 0 in the first instance, which negatively affects the predicted Sales; on the other hand, Dept 1 has relatively high Sales, which positively impacts the predicted Sales.  
In the second example, the large size contributes significantly to the predicted Sales, whereas Dept 67, with its relatively low Sales, negatively affects the predicted Sales.  
-> we can say that these predictions are trustworthy.

Can we now trust the model enough to use it in real life? To answer that, a global explanation of the model is required, so we will select a representative and non-redundant set of explanations that provide a global perspective on the model.  
This is done using SP-LIME, which is an extension of LIME.


In [None]:
# Code for SP-LIME
from lime import submodular_pick

sp_obj = submodular_pick.SubmodularPick(explainer, X_test_arr, extra_trees_regressor.predict, num_features=5, num_exps_desired=8)

# Show each selected explanation
for exp in sp_obj.sp_explanations:
    exp.show_in_notebook()

After analyzing these explanations we can say that they are trustworthy and the model can be trusted as well. 

Assuming the predictions are trustworthy, I will suggest some actions to take based on these predictions.  

I will first plot Weekly Sales over Week, knowing that predictions begin at Week 44, 2012 and end at Week 30, 2013.

In [None]:
# test data
predictions = pd.DataFrame(predictions, columns=['Weekly_Sales'])
test_ = pd.concat([test.drop('Type', axis=1),predictions],axis=1)
# training data
WS_train = pd.DataFrame(train.Weekly_Sales, columns=['Weekly_Sales'])
train_ = pd.concat([X.drop('Type', axis=1),WS_train],axis=1)
# data
data = pd.concat([test_,train_])

In [None]:
# Distribution of sales over weeks:
fig, ax = plt.subplots(figsize=(18, 12))
palette = sns.color_palette("mako_r", 4)
fig = sns.lineplot(x='week', y='Weekly_Sales', data = data, hue='year', err_style=None, palette=palette)
plt.xticks(np.arange(1, 53, step=1))
plt.show()

As you can see, the plot of predicted Sales has the same signature as the original Sales.  
The model predicts low Sales right after Easter Day, starting from Week 14, so planning bundled and volume discounts during these Weeks may improve Sales. The same goes for the Weeks before the Thanksgiving Holiday. Moreover, event discounts in Holiday Weeks which have relatively low Sales (Super Bowl, Easter Day, Memorial Day, Independence Day and Labor Day) compared to other Holiday Weeks (Thanksgiving or Christmas) may potentially enhance Sales.  
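
Candidate discount weeks can also be flagged programmatically instead of being read off the plot. The sketch below is one possible approach under stated assumptions: the helper `low_sales_weeks` and its quantile threshold are hypothetical, and the toy frame only mimics the `week` and `Weekly_Sales` columns used above.

```python
import pandas as pd

def low_sales_weeks(df, quantile=0.25):
    """Return the weeks whose mean sales fall below the given quantile of weekly means."""
    weekly = df.groupby('week')['Weekly_Sales'].mean()
    threshold = weekly.quantile(quantile)
    return weekly[weekly < threshold].index.tolist()

# Made-up predictions for four weeks (two rows per week)
toy = pd.DataFrame({
    'week': [13, 13, 14, 14, 15, 15, 16, 16],
    'Weekly_Sales': [20.0, 22.0, 10.0, 12.0, 11.0, 13.0, 19.0, 21.0],
})

discount_candidates = low_sales_weeks(toy, quantile=0.5)
print(discount_candidates)
```

Applied to the predicted Sales, the same function would return the weeks that are candidates for bundled or volume discounts; the threshold is a business choice rather than something the model dictates.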