<a href="https://colab.research.google.com/github/ikyath/M5-Forecasting-Accuracy-Kaggle/blob/master/M5_Forecast_Encoder_Decoder_BetterFeatures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# **M5 Forecasting - Accuracy**

**Estimate the unit sales of Walmart retail goods**



How much camping gear will one store sell each month in a year? To the uninitiated, calculating sales at this level may seem as difficult as predicting the weather. Both types of forecasting rely on science and historical data. While a wrong weather forecast may result in you carrying around an umbrella on a sunny day, inaccurate business forecasts could result in actual or opportunity losses. 

The Makridakis Open Forecasting Center (MOFC) at the University of Nicosia conducts cutting-edge forecasting research and provides business forecast training. It helps companies achieve accurate predictions, estimate the levels of uncertainty, avoid costly mistakes, and apply best forecasting practices. The MOFC is well known for its Makridakis Competitions, the first of which ran in the 1980s.

# Understand Business Problem 
In the challenge, you are predicting item sales at stores in various locations for two 28-day time periods. Information about the data is found in the [M5 Participants Guide](https://mofc.unic.ac.cy/m5-competition/).

The dataset poses a multiple-parallel-input, multi-step-output prediction problem: several input series per item, and a 28-day output sequence.

We use an Encoder-Decoder LSTM model, following the approach described [here](https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/).

# Data

The data: We are working with 42,840 hierarchical time series. The data were obtained in the 3 US states of California (CA), Texas (TX), and Wisconsin (WI). The sales information reaches back from Jan 2011 to June 2016. In addition to the sales numbers, we are also given corresponding data on prices, promotions, and holidays.

The data comprises 3049 individual products from 3 categories and 7 departments, sold in 10 stores in 3 states. The hierarchical aggregation captures the combinations of these factors. For instance, we can create 1 time series for all sales, 3 time series for all sales per state, and so on. The largest level is the sales of each of the 3049 products in each of the 10 stores, which gives 30490 time series.
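The total of 42,840 series can be checked by counting the series at each aggregation level. The sketch below assumes the twelve aggregation levels defined by the M5 competition (they are not listed in this notebook); the dictionary keys and variable names are illustrative.

```python
# Base counts taken from the description above
n_states, n_stores, n_cats, n_depts, n_items = 3, 10, 3, 7, 3049

# Number of series per aggregation level (assumed M5 hierarchy, 12 levels)
levels = {
    "total": 1,
    "state": n_states,
    "store": n_stores,
    "category": n_cats,
    "department": n_depts,
    "state x category": n_states * n_cats,
    "state x department": n_states * n_depts,
    "store x category": n_stores * n_cats,
    "store x department": n_stores * n_depts,
    "item": n_items,
    "item x state": n_items * n_states,
    "item x store": n_items * n_stores,
}

total_series = sum(levels.values())
print(total_series)
```

Only the last level (item x store) is modelled directly in this notebook.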

The training data comes in the shape of 3 separate files:

sales_train.csv: this is our main training data. It has 1 column for each of the 1941 days from 2011-01-29 to 2016-05-22, not including the validation period of 28 days until 2016-06-19. It also includes the IDs for item, department, category, store, and state. The number of rows is 30490, one for each combination of the 3049 items and 10 stores. The file loaded in this notebook, sales_train_validation.csv, contains the first 1913 of those days (d_1 to d_1913).

sell_prices.csv: the store and item IDs together with the sales price of the item as a weekly average.

calendar.csv: dates together with related features like day of the week, month, year, and 3 binary flags for whether the stores in each state allowed purchases with SNAP food stamps on this date (1) or not (0).

In [0]:
pwd


'/content'

Connecting to Personal Google drive and accessing data path

In [0]:
cd /content/drive/My\ Drive/Data\ Science


/content/drive/My Drive/Data Science


Importing necessary libraries 

In [0]:
import re

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from numpy import array
from plotly.subplots import make_subplots
from sklearn import preprocessing
from sklearn.preprocessing import LabelEncoder
from tqdm.notebook import tqdm
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, LSTM, RepeatVector, TimeDistributed

Using TensorFlow backend.


Importing the data from all the files

In [0]:
sales = pd.read_csv('sales_train_validation.csv')
calendar = pd.read_csv('calendar.csv')
selling_prices = pd.read_csv('sell_prices.csv')
submission_file = pd.read_csv('sample_submission.csv')

## EDA

Let's look at the data in each file.

In [0]:
sales.head()

Unnamed: 0,id,item_id,dept_id,cat_id,store_id,state_id,d_1,d_2,d_3,d_4,d_5,d_6,d_7,d_8,d_9,d_10,d_11,d_12,d_13,d_14,d_15,d_16,d_17,d_18,d_19,d_20,d_21,d_22,d_23,d_24,d_25,d_26,d_27,d_28,d_29,d_30,d_31,d_32,d_33,d_34,...,d_1874,d_1875,d_1876,d_1877,d_1878,d_1879,d_1880,d_1881,d_1882,d_1883,d_1884,d_1885,d_1886,d_1887,d_1888,d_1889,d_1890,d_1891,d_1892,d_1893,d_1894,d_1895,d_1896,d_1897,d_1898,d_1899,d_1900,d_1901,d_1902,d_1903,d_1904,d_1905,d_1906,d_1907,d_1908,d_1909,d_1910,d_1911,d_1912,d_1913
0,HOBBIES_1_001_CA_1_validation,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,1,3,1,3,1,2,2,0,1,1,1,1,0,0,0,0,0,1,0,4,2,3,0,1,2,0,0,0,1,1,3,0,1,1,1,3,0,1,1
1,HOBBIES_1_002_CA_1_validation,HOBBIES_1_002,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,HOBBIES_1_003_CA_1_validation,HOBBIES_1_003,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,2,2,1,2,1,1,1,0,1,1,1
3,HOBBIES_1_004_CA_1_validation,HOBBIES_1_004,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,3,4,2,1,4,1,3,5,0,6,6,0,0,0,0,3,1,2,1,3,1,0,2,5,4,2,0,3,0,1,0,5,4,1,0,1,3,7,2
4,HOBBIES_1_005_CA_1_validation,HOBBIES_1_005,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,0,3,2,2,2,3,1,0,0,0,0,1,0,4,4,0,1,4,0,1,0,1,0,1,1,2,0,1,1,2,1,1,0,1,1,2,2,2,4


In [0]:
calendar.head()

Unnamed: 0,date,wm_yr_wk,weekday,wday,month,year,d,event_name_1,event_type_1,event_name_2,event_type_2,snap_CA,snap_TX,snap_WI
0,2011-01-29,11101,Saturday,1,1,2011,d_1,,,,,0,0,0
1,2011-01-30,11101,Sunday,2,1,2011,d_2,,,,,0,0,0
2,2011-01-31,11101,Monday,3,1,2011,d_3,,,,,0,0,0
3,2011-02-01,11101,Tuesday,4,2,2011,d_4,,,,,1,1,0
4,2011-02-02,11101,Wednesday,5,2,2011,d_5,,,,,1,0,1


In [0]:
selling_prices.head()

Unnamed: 0,store_id,item_id,wm_yr_wk,sell_price
0,CA_1,HOBBIES_1_001,11325,9.58
1,CA_1,HOBBIES_1_001,11326,9.58
2,CA_1,HOBBIES_1_001,11327,8.26
3,CA_1,HOBBIES_1_001,11328,8.26
4,CA_1,HOBBIES_1_001,11329,8.26


In [0]:
selling_prices['store_id'].unique()

array(['CA_1', 'CA_2', 'CA_3', 'CA_4', 'TX_1', 'TX_2', 'TX_3', 'WI_1',
       'WI_2', 'WI_3'], dtype=object)

# Let's run our EDA on one of the stores - TX_1

Let's create a dataframe from the sales table for store TX_1 only, with the necessary columns.

In [0]:
sales_melt = pd.melt(sales, id_vars=['id', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id'], var_name='day', value_name='demand')

In [0]:
sales_TX_1 = sales_melt[sales_melt.store_id == "TX_1"]
new_TX_1 = pd.merge(sales_TX_1, calendar, left_on="day", right_on="d", how="left")
new_TX_1 = pd.merge(new_TX_1, selling_prices, left_on=["store_id", "item_id", "wm_yr_wk"],right_on=["store_id", "item_id", "wm_yr_wk"], how="left")
new_TX_1["day_int"] = new_TX_1.day.apply(lambda x: int(x.split("_")[-1]))
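As a small illustration of the reshaping step, the sketch below melts a toy wide table (two items, three day columns) into the long format used above and extracts the integer day. The toy values and the names `toy_sales` and `toy_long` are made up for illustration.

```python
import pandas as pd

# Toy wide table: one row per item, one column per day (values are invented)
toy_sales = pd.DataFrame({
    "id": ["A_TX_1", "B_TX_1"],
    "store_id": ["TX_1", "TX_1"],
    "d_1": [0, 2],
    "d_2": [1, 0],
    "d_3": [3, 4],
})

# Wide -> long: every (item, day) pair becomes its own row
toy_long = pd.melt(toy_sales, id_vars=["id", "store_id"],
                   var_name="day", value_name="demand")

# "d_3" -> 3, the same idea as the day_int column above
toy_long["day_int"] = toy_long["day"].str.split("_").str[-1].astype(int)

print(toy_long.shape)
```

The long format has one row per item and day, which is what makes the joins with the calendar and price tables straightforward.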

For each day, we aggregate sell_price and demand over all products in the store.

The cell below computes the daily sum. A daily count of non-zero values can be computed in the same way.

In [0]:
day_sum = new_TX_1.groupby("day_int")[["sell_price", "demand"]].agg("sum").reset_index()
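The non-zero count is described in the text but not computed in this notebook. A minimal sketch of how it could be added, shown on toy data with the same column names; `toy` and `day_nonzero` are hypothetical names, and counting non-missing prices (rather than non-zero prices) is an assumption about what is useful here.

```python
import numpy as np
import pandas as pd

# Toy long-format data with the same columns as new_TX_1 (values are invented)
toy = pd.DataFrame({
    "day_int":    [1, 1, 1, 2, 2, 2],
    "sell_price": [2.5, np.nan, 4.0, 2.5, 3.0, 4.0],
    "demand":     [0, 0, 3, 1, 0, 2],
})

# Per day: number of products with non-zero demand and with a listed (non-missing) price
day_nonzero = (
    toy.groupby("day_int")
       .agg(demand_nonzero=("demand", lambda s: int((s != 0).sum())),
            price_listed=("sell_price", "count"))
       .reset_index()
)
print(day_nonzero)
```

Applied to the real data, the same aggregation would use `new_TX_1` in place of `toy`.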

In [0]:
fig = make_subplots(rows=2, cols=1)

fig.add_trace(go.Scatter(x=day_sum.day_int, 
                         y=day_sum.demand,
                         mode="lines",
                         name="demand",
                         ),
              row=1,col=1         
              )

fig.add_trace(go.Scatter(x=day_sum.day_int, 
                         y=day_sum.sell_price,
                         mode="lines",
                         name="sell_price",
                         ),
              row=2,col=1           
              )

fig.update_layout(height=1000, title_text="SUM -> Demand  and Sell_price")
fig.show()

Observation:

From the summed demand, we observe that some days are zero. These are Christmas days, when the store was presumably closed. We also observe patterns that repeat over the years.

From the summed sell_price, we observe that the total rises day by day and becomes roughly constant toward the end.

## Let's take some features from the calendar dataframe

Here we create a dataframe covering all 1969 calendar days, with one row per calendar feature: the event names, the event types, and the SNAP flags. The event columns apply to all stores.

In [0]:
days = range(1, 1970)
time_series_columns = [f'd_{i}' for i in days]
transfer_cal = pd.DataFrame(calendar[['event_name_1','event_type_1','event_name_2','event_type_2','snap_CA','snap_TX','snap_WI']].values.T, index=['event_name_1','event_type_1','event_name_2','event_type_2','snap_CA','snap_TX','snap_WI'], columns= time_series_columns)
transfer_cal = transfer_cal.fillna(0)
# Raw strings avoid the invalid-escape warning for \d
event_name_1_se = transfer_cal.loc['event_name_1'].apply(lambda x: x if re.search(r"^\d+$", str(x)) else np.nan).fillna(10)
event_name_2_se = transfer_cal.loc['event_name_2'].apply(lambda x: x if re.search(r"^\d+$", str(x)) else np.nan).fillna(10)

The resulting dataframe is shown below.
For example, the SuperBowl event on d_9 is recorded once for all stores rather than per store.

In [0]:
transfer_cal

Unnamed: 0,d_1,d_2,d_3,d_4,d_5,d_6,d_7,d_8,d_9,d_10,d_11,d_12,d_13,d_14,d_15,d_16,d_17,d_18,d_19,d_20,d_21,d_22,d_23,d_24,d_25,d_26,d_27,d_28,d_29,d_30,d_31,d_32,d_33,d_34,d_35,d_36,d_37,d_38,d_39,d_40,...,d_1930,d_1931,d_1932,d_1933,d_1934,d_1935,d_1936,d_1937,d_1938,d_1939,d_1940,d_1941,d_1942,d_1943,d_1944,d_1945,d_1946,d_1947,d_1948,d_1949,d_1950,d_1951,d_1952,d_1953,d_1954,d_1955,d_1956,d_1957,d_1958,d_1959,d_1960,d_1961,d_1962,d_1963,d_1964,d_1965,d_1966,d_1967,d_1968,d_1969
event_name_1,0,0,0,0,0,0,0,0,SuperBowl,0,0,0,0,0,0,0,ValentinesDay,0,0,0,0,0,0,PresidentsDay,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,LentStart,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,MemorialDay,0,0,NBAFinalsStart,0,0,0,0,Ramadan starts,0,0,0,0,0,0,0,0,0,0,0,NBAFinalsEnd
event_type_1,0,0,0,0,0,0,0,0,Sporting,0,0,0,0,0,0,0,Cultural,0,0,0,0,0,0,National,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Religious,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,National,0,0,Sporting,0,0,0,0,Religious,0,0,0,0,0,0,0,0,0,0,0,Sporting
event_name_2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Father's day
event_type_2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Cultural
snap_CA,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0
snap_TX,0,0,0,1,0,1,0,1,1,1,0,1,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,0,1,...,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,0,1,0,1,1,1,0,1,0,0,0,0
snap_WI,0,0,0,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,0,1,1,...,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,0,0,0


The function below transforms the calendar columns into model features.

LabelEncoder is used for the event names, event types, and SNAP flags.

Weekday, month, and day of month are cyclic features. They are encoded as sine/cosine pairs, following the approach described [here](https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time/).

In [0]:
def transform(data):
    # Work on a copy so the filtered calendar slice is not modified in place
    data = data.copy()

    # Days without an event get an explicit 'unknown' label
    nan_features = ['event_name_1', 'event_type_1', 'event_name_2', 'event_type_2']
    for feature in nan_features:
        data[feature] = data[feature].fillna('unknown')

    # Integer-encode the event and SNAP columns
    cat = ['event_name_1','event_type_1','event_name_2','event_type_2','snap_CA','snap_TX','snap_WI']
    for feature in cat:
        encoder = preprocessing.LabelEncoder()
        data[feature] = encoder.fit_transform(data[feature])

    # Cyclic encoding: map each periodic value onto the unit circle
    data['wday_cos'] = np.cos(data['wday']/(7/(2*np.pi)))
    data['wday_sin'] = np.sin(data['wday']/(7/(2*np.pi)))

    data['month_cos'] = np.cos(data['month']/(12/(2*np.pi)))
    data['month_sin'] = np.sin(data['month']/(12/(2*np.pi)))

    data['day_cos'] = np.cos(data["date"].dt.day/(31/(2*np.pi)))
    data['day_sin'] = np.sin(data["date"].dt.day/(31/(2*np.pi)))

    return data
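To see why a sine/cosine pair is used, the sketch below encodes the weekday numbers 1 to 7 with the same formula as `transform` and compares distances. On the raw scale, day 7 and day 1 are 6 apart; on the unit circle they are as close as any two consecutive days. The helper name `encode_cyclic` is illustrative and not part of the notebook.

```python
import numpy as np

def encode_cyclic(values, period):
    """Map periodic values onto the unit circle (same formula as in transform)."""
    angle = np.asarray(values) / (period / (2 * np.pi))
    return np.column_stack([np.cos(angle), np.sin(angle)])

wday = np.arange(1, 8)              # 1..7, as in the calendar's wday column
points = encode_cyclic(wday, 7)

# Euclidean distance between encoded days
dist_7_to_1 = np.linalg.norm(points[6] - points[0])   # wrap-around neighbours
dist_1_to_2 = np.linalg.norm(points[0] - points[1])   # ordinary neighbours
dist_1_to_4 = np.linalg.norm(points[0] - points[3])   # three days apart

print(dist_7_to_1, dist_1_to_2, dist_1_to_4)
```

Both neighbour distances come out equal, while days three apart are farther away, which is the property a plain integer encoding lacks.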

In [0]:
calendar['date'] = pd.to_datetime(calendar['date'])
calendar = calendar[calendar['date']>= '2015-1-27']  #We are considering less data due to memory constraints
calendar= transform(calendar)
# Attempts to convert events into time series data.


### Let's look at the transformed calendar data

In [0]:
calendar.head()

Unnamed: 0,date,wm_yr_wk,weekday,wday,month,year,d,event_name_1,event_type_1,event_name_2,event_type_2,snap_CA,snap_TX,snap_WI,wday_cos,wday_sin,month_cos,month_sin,day_cos,day_sin
1459,2015-01-27,11452,Tuesday,4,1,2015,d_1460,30,4,1,1,0,0,0,-0.900969,-0.4338837,0.866025,0.5,0.688967,-0.7247928
1460,2015-01-28,11452,Wednesday,5,1,2015,d_1461,30,4,1,1,0,0,0,-0.222521,-0.9749279,0.866025,0.5,0.820763,-0.5712682
1461,2015-01-29,11452,Thursday,6,1,2015,d_1462,30,4,1,1,0,0,0,0.62349,-0.7818315,0.866025,0.5,0.918958,-0.3943559
1462,2015-01-30,11452,Friday,7,1,2015,d_1463,30,4,1,1,0,0,0,1.0,-1.133108e-15,0.866025,0.5,0.97953,-0.2012985
1463,2015-01-31,11501,Saturday,1,1,2015,d_1464,30,4,1,1,0,0,0,0.62349,0.7818315,0.866025,0.5,1.0,-2.449294e-16


## Let's select the required columns and transpose the data for easier retrieval later.

In [0]:
transfer_cal = pd.DataFrame(calendar[['event_name_1','event_type_1','event_name_2','event_type_2','snap_CA','snap_TX','snap_WI','day_cos','day_sin','wday_cos','wday_sin','month_cos','month_sin']].values.T,
                            index=['event_name_1','event_type_1','event_name_2','event_type_2','snap_CA','snap_TX','snap_WI','day_cos','day_sin','wday_cos','wday_sin','month_cos','month_sin'])
transfer_cal

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,...,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509
event_name_1,30.0,30.0,30.0,30.0,30.0,26.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,28.0,30.0,22.0,30.0,11.0,30.0,30.0,30.0,30.0,30.0,30.0,12.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,23.0,30.0,30.0,...,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,14.0,30.0,30.0,17.0,30.0,30.0,30.0,30.0,24.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,16.0
event_type_1,4.0,4.0,4.0,4.0,4.0,3.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,0.0,4.0,1.0,4.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,2.0,4.0,4.0,...,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,1.0,4.0,4.0,3.0,4.0,4.0,4.0,4.0,2.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,3.0
event_name_2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
event_type_2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
snap_CA,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
snap_TX,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,...,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
snap_WI,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,...,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0
day_cos,0.688967,0.820763,0.918958,0.9795299,1.0,0.97953,0.918958,0.820763,0.688967,0.528964,0.3473053,0.151428,-0.050649,-0.250653,-0.440394,-0.612106,-0.758758,-0.8743466,-0.954139,-0.994869,-0.994869,-0.954139,-0.874347,-0.758758,-0.612106,-0.440394,-0.250653,-0.050649,0.151428,0.347305,0.528964,0.6889669,0.820763,0.9795299,0.9189578,0.8207634,0.6889669,0.528964,0.3473053,0.1514278,...,-0.612106,-0.758758,-0.8743466,-0.954139,-0.994869,-0.994869,-0.954139,-0.874347,-0.758758,-0.612106,-0.440394,-0.250653,-0.050649,0.151428,0.347305,0.528964,0.6889669,0.820763,0.918958,0.97953,1.0,0.9795299,0.9189578,0.8207634,0.6889669,0.528964,0.3473053,0.1514278,-0.05064917,-0.2506525,-0.4403942,-0.612106,-0.7587581,-0.8743466,-0.9541393,-0.9948693,-0.9948693,-0.9541393,-0.8743466,-0.7587581
day_sin,-0.724793,-0.571268,-0.394356,-0.2012985,-2.449294e-16,0.201299,0.394356,0.571268,0.724793,0.848644,0.9377521,0.988468,0.998717,0.968077,0.897805,0.790776,0.651372,0.485302,0.299363,0.101168,-0.101168,-0.299363,-0.485302,-0.651372,-0.7907757,-0.897805,-0.968077,-0.998717,-0.988468,-0.937752,-0.848644,-0.7247928,-0.571268,0.2012985,0.3943559,0.5712682,0.7247928,0.8486443,0.9377521,0.9884683,...,0.790776,0.651372,0.485302,0.299363,0.101168,-0.101168,-0.299363,-0.485302,-0.651372,-0.7907757,-0.897805,-0.968077,-0.998717,-0.988468,-0.937752,-0.848644,-0.7247928,-0.571268,-0.394356,-0.201299,-2.449294e-16,0.2012985,0.3943559,0.5712682,0.7247928,0.8486443,0.9377521,0.9884683,0.9987165,0.9680771,0.8978045,0.7907757,0.6513725,0.485302,0.2993631,0.1011683,-0.1011683,-0.2993631,-0.485302,-0.6513725
wday_cos,-0.900969,-0.222521,0.62349,1.0,0.6234898,-0.222521,-0.900969,-0.900969,-0.222521,0.62349,1.0,0.62349,-0.222521,-0.900969,-0.900969,-0.222521,0.62349,1.0,0.62349,-0.222521,-0.900969,-0.900969,-0.222521,0.62349,1.0,0.62349,-0.222521,-0.900969,-0.900969,-0.222521,0.62349,1.0,0.62349,-0.2225209,-0.9009689,-0.9009689,-0.2225209,0.6234898,1.0,0.6234898,...,-0.222521,0.62349,1.0,0.62349,-0.222521,-0.900969,-0.900969,-0.222521,0.62349,1.0,0.62349,-0.222521,-0.900969,-0.900969,-0.222521,0.62349,1.0,0.62349,-0.222521,-0.900969,-0.9009689,-0.2225209,0.6234898,1.0,0.6234898,-0.2225209,-0.9009689,-0.9009689,-0.2225209,0.6234898,1.0,0.6234898,-0.2225209,-0.9009689,-0.9009689,-0.2225209,0.6234898,1.0,0.6234898,-0.2225209


In [0]:
transfer_cal.shape

(13, 510)

So far we have 13 features.

# Let's look into the selling prices dataframe for extra features

Let's join the calendar and selling price dataframes on the week identifier wm_yr_wk, which gives one price per item, store, and date.

In [0]:
price_feature = calendar[['wm_yr_wk','date']].merge(selling_prices, on = ['wm_yr_wk'], how = 'left')
price_feature['id'] = price_feature['item_id']+'_'+price_feature['store_id']+'_validation'


Let's view a few rows of the new dataframe.

In [0]:
price_feature.head()

Unnamed: 0,wm_yr_wk,date,store_id,item_id,sell_price,id
0,11452,2015-01-27,CA_1,HOBBIES_1_001,8.26,HOBBIES_1_001_CA_1_validation
1,11452,2015-01-27,CA_1,HOBBIES_1_002,3.97,HOBBIES_1_002_CA_1_validation
2,11452,2015-01-27,CA_1,HOBBIES_1_003,2.97,HOBBIES_1_003_CA_1_validation
3,11452,2015-01-27,CA_1,HOBBIES_1_004,4.64,HOBBIES_1_004_CA_1_validation
4,11452,2015-01-27,CA_1,HOBBIES_1_005,2.88,HOBBIES_1_005_CA_1_validation


We pivot the dataframe into wide format over the reduced time range, with one row per id and one column per date. This layout makes it easy to combine features from the different dataframes.

In [0]:
df = price_feature.pivot(index='id', columns='date', values='sell_price')  # keyword arguments are required in recent pandas
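A toy version of this reshaping, assuming a pandas version in which `pivot` accepts the `index`, `columns`, and `values` keywords; the ids and prices are invented.

```python
import pandas as pd

# Toy long-format prices: one row per (id, date)
toy_prices = pd.DataFrame({
    "id":   ["A_CA_1_validation", "A_CA_1_validation",
             "B_CA_1_validation", "B_CA_1_validation"],
    "date": pd.to_datetime(["2015-01-27", "2015-01-28",
                            "2015-01-27", "2015-01-28"]),
    "sell_price": [8.26, 8.26, 3.97, 4.10],
})

# Long -> wide: one row per id, one column per date
toy_wide = toy_prices.pivot(index="id", columns="date", values="sell_price")
print(toy_wide.shape)
```

Each row of the wide table is then a price time series for one item and store, aligned by date.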

In [0]:
df.head()

date,2015-01-27,2015-01-28,2015-01-29,2015-01-30,2015-01-31,2015-02-01,2015-02-02,2015-02-03,2015-02-04,2015-02-05,2015-02-06,2015-02-07,2015-02-08,2015-02-09,2015-02-10,2015-02-11,2015-02-12,2015-02-13,2015-02-14,2015-02-15,2015-02-16,2015-02-17,2015-02-18,2015-02-19,2015-02-20,2015-02-21,2015-02-22,2015-02-23,2015-02-24,2015-02-25,2015-02-26,2015-02-27,2015-02-28,2015-03-01,2015-03-02,2015-03-03,2015-03-04,2015-03-05,2015-03-06,2015-03-07,...,2016-05-11,2016-05-12,2016-05-13,2016-05-14,2016-05-15,2016-05-16,2016-05-17,2016-05-18,2016-05-19,2016-05-20,2016-05-21,2016-05-22,2016-05-23,2016-05-24,2016-05-25,2016-05-26,2016-05-27,2016-05-28,2016-05-29,2016-05-30,2016-05-31,2016-06-01,2016-06-02,2016-06-03,2016-06-04,2016-06-05,2016-06-06,2016-06-07,2016-06-08,2016-06-09,2016-06-10,2016-06-11,2016-06-12,2016-06-13,2016-06-14,2016-06-15,2016-06-16,2016-06-17,2016-06-18,2016-06-19
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
FOODS_1_001_CA_1_validation,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,...,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24
FOODS_1_001_CA_2_validation,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,...,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24
FOODS_1_001_CA_3_validation,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,...,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24
FOODS_1_001_CA_4_validation,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,...,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24
FOODS_1_001_TX_1_validation,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,...,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24,2.24


In [0]:
price_df = sales.merge(df,on=['id'],how= 'left').iloc[:,-300:]
price_df.index = sales.id
price_df.head()

Unnamed: 0_level_0,2015-08-25,2015-08-26,2015-08-27,2015-08-28,2015-08-29,2015-08-30,2015-08-31,2015-09-01,2015-09-02,2015-09-03,2015-09-04,2015-09-05,2015-09-06,2015-09-07,2015-09-08,2015-09-09,2015-09-10,2015-09-11,2015-09-12,2015-09-13,2015-09-14,2015-09-15,2015-09-16,2015-09-17,2015-09-18,2015-09-19,2015-09-20,2015-09-21,2015-09-22,2015-09-23,2015-09-24,2015-09-25,2015-09-26,2015-09-27,2015-09-28,2015-09-29,2015-09-30,2015-10-01,2015-10-02,2015-10-03,...,2016-05-11,2016-05-12,2016-05-13,2016-05-14,2016-05-15,2016-05-16,2016-05-17,2016-05-18,2016-05-19,2016-05-20,2016-05-21,2016-05-22,2016-05-23,2016-05-24,2016-05-25,2016-05-26,2016-05-27,2016-05-28,2016-05-29,2016-05-30,2016-05-31,2016-06-01,2016-06-02,2016-06-03,2016-06-04,2016-06-05,2016-06-06,2016-06-07,2016-06-08,2016-06-09,2016-06-10,2016-06-11,2016-06-12,2016-06-13,2016-06-14,2016-06-15,2016-06-16,2016-06-17,2016-06-18,2016-06-19
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
HOBBIES_1_001_CA_1_validation,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,8.26,...,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38,8.38
HOBBIES_1_002_CA_1_validation,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,...,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97,3.97
HOBBIES_1_003_CA_1_validation,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,...,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97,2.97
HOBBIES_1_004_CA_1_validation,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,...,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64,4.64
HOBBIES_1_005_CA_1_validation,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,...,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88,2.88


Here we get the demand values of all products for the complete 1913 days.

In [0]:
days = range(1, 1913 + 1)
time_series_columns = [f'd_{i}' for i in days]
time_series_data = sales[time_series_columns]  #Get time series data

In [0]:
time_series_data.shape

(30490, 1913)

Let's create a NumPy array that holds all features and the demand (the dependent variable) for all products over a 100-day window.

In [0]:
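X = []   # per item: 100 time steps of 14 features (13 calendar features + sell price) plus demand
# Ranges selected below: the calendar and price columns run to d_1969, so [-(100+28):-28] selects d_1842..d_1941;
# the sales columns run to d_1913, so [-100:] selects d_1814..d_1913.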
for i in tqdm(range(time_series_data.shape[0])):
    X.append([list(t) for t in zip(transfer_cal.loc['event_name_1'][-(100+28):-(28)],
                                   transfer_cal.loc['event_type_1'][-(100+28):-(28)],
                                   transfer_cal.loc['event_name_2'][-(100+28):-(28)],     
                                   transfer_cal.loc['event_type_2'][-(100+28):-(28)],
                                   transfer_cal.loc['snap_CA'][-(100+28):-(28)],
                                   transfer_cal.loc['snap_TX'][-(100+28):-(28)],
                                   transfer_cal.loc['snap_WI'][-(100+28):-(28)],
                                   transfer_cal.loc['day_sin'][-(100+28):-(28)],
                                   transfer_cal.loc['day_cos'][-(100+28):-(28)],
                                   transfer_cal.loc['wday_sin'][-(100+28):-(28)],
                                   transfer_cal.loc['wday_cos'][-(100+28):-(28)],
                                   transfer_cal.loc['month_sin'][-(100+28):-(28)],
                                   transfer_cal.loc['month_cos'][-(100+28):-(28)],
                                   price_df.iloc[i][-(100+28):-(28)],
                                   time_series_data.iloc[i][-100:])]) 

X = np.asarray(X, dtype=np.float32)


In [0]:
X.shape

X is a NumPy array of shape (30490, 100, 15): 30490 products, 100 time steps, and 15 channels per step (14 features and 1 dependent variable).
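For reference, the order of the last axis follows the order of the `zip` call that builds `X`. The sketch below lists that order so the indices used later (`:14` for the inputs, `14` for the target) are easier to read; `feature_names` and `index_of` are hypothetical helpers, not part of the notebook.

```python
# Order of the last axis of X, following the zip call that builds it
feature_names = [
    "event_name_1", "event_type_1", "event_name_2", "event_type_2",
    "snap_CA", "snap_TX", "snap_WI",
    "day_sin", "day_cos", "wday_sin", "wday_cos", "month_sin", "month_cos",
    "sell_price",
    "demand",
]

# Lookup from feature name to its position on the last axis
index_of = {name: i for i, name in enumerate(feature_names)}
print(index_of["sell_price"], index_of["demand"])
```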

Below are the functions to normalize the data and to reverse the normalization. We use min-max normalization.

In [0]:
def Normalize(data):
    # Min-max scale the whole array to [0, 1] using its global minimum and maximum.
    data = np.array(data)  # copy, so the caller's array is left untouched
    low, high = data.min(), data.max()
    delta = high - low
    if delta != 0:
        data = (data - low) / delta
    return data, low, high

def ReverseNoramlize(data, low, high):
    # Undo Normalize with the low and high values it returned.
    delta = high - low
    if delta != 0:
        data = data * delta + low
    return data
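
As a quick check of the idea, here is a minimal round-trip sketch of min-max scaling on a toy array. The helper names `min_max_scale` and `min_max_restore` are hypothetical and are not the notebook's functions. The scaling is `x' = (x - low) / (high - low)` and its inverse is `x = x' * (high - low) + low`.

```python
import numpy as np

def min_max_scale(values):
    """Scale an array to [0, 1] with its global min and max (hypothetical helper)."""
    values = np.asarray(values, dtype=np.float64)
    low, high = values.min(), values.max()
    delta = high - low
    if delta == 0:
        return values.copy(), low, high  # constant input: nothing to scale
    return (values - low) / delta, low, high

def min_max_restore(scaled, low, high):
    """Invert min_max_scale using the stored low and high."""
    return np.asarray(scaled) * (high - low) + low

sales_demo = np.array([[0.0, 2.0, 4.0], [1.0, 3.0, 8.0]])
scaled, low, high = min_max_scale(sales_demo)
restored = min_max_restore(scaled, low, high)

print(scaled.min(), scaled.max())        # 0.0 1.0
print(np.allclose(restored, sales_demo)) # True
```

Because a single `low` and `high` are computed over the whole array, every feature shares one scale. Scaling each feature separately is a common alternative when the features have very different ranges.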

Let's build an Encoder-Decoder model with RepeatVector and TimeDistributed layers.

Here we train the model to predict the last 28 days of sales from the 56 days that precede them.
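
The 56-day input and 28-day target split can be sketched on a toy array as follows. The array `data` and the small sizes are assumptions for illustration. Only the slicing mirrors the description above.

```python
import numpy as np

# Toy stand-in for the (30490, 100, 15) array built earlier.
n_products, n_steps, n_cols = 3, 100, 15
data = np.arange(n_products * n_steps * n_cols, dtype=np.float32).reshape(
    n_products, n_steps, n_cols
)

window = data[:, -84:, :]      # keep only the last 84 days
X_in = window[:, :56, :14]     # first 56 days, 14 input features
y_out = window[:, 56:, 14:15]  # last 28 days, sales column only (kept 3-D)

print(X_in.shape, y_out.shape)  # (3, 56, 14) (3, 28, 1)
```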


In [0]:
np.random.seed(7)

# Only the last 84 days are used: 56 days of input followed by 28 days of target.
if __name__ == '__main__':
    train_n,train_low,train_high = Normalize(X[:,-(28*3):,:])
    X_train = train_n[:,-28*3:-28,:14]
    print(X_train.shape)
    y = train_n[:,-28:,14]  # target: sales (column 14) over the final 28 days
    # X_train is already [samples, timesteps, features]; y is reshaped below to add a feature axis
    n_features = 15
    n_out_seq_length =28
    num_y = 1
    X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], n_features-1))
    y = y.reshape((y.shape[0], y.shape[1], 1))
    print(X_train.shape)
    print(y.shape)
    # define model

    model = Sequential()

    
    model.add(LSTM(128, activation='tanh', input_shape=(56, n_features-1),return_sequences=True))
    model.add(LSTM(64, activation='tanh',return_sequences=False))
    model.add(RepeatVector(n_out_seq_length)) 
    model.add(LSTM(32, activation='tanh',return_sequences=True))
    model.add(LSTM(16, activation='tanh',return_sequences=True))
    model.add(Dropout(0.1))  
    model.add(TimeDistributed(Dense(num_y)))
    model.compile(optimizer='adam', loss='mse')
    # fit the model
    model.fit(X_train, y, epochs=20, batch_size=1000)

(30490, 56, 14)
(30490, 56, 14)
(30490, 28, 1)
Epoch 1/20
...
Epoch 20/20
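
To make the role of the two wrapper layers concrete, here is a small NumPy sketch of what `RepeatVector` and `TimeDistributed(Dense(1))` do to array shapes. It does not use Keras, the weights are random, and the decoder LSTM layers that sit between the two in the model above are left out, so treat it as an illustration of shapes only.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, enc_units, out_steps = 2, 64, 28

# Stand-in for the encoder's final output: one vector per sample.
encoded = rng.normal(size=(batch, enc_units))

# RepeatVector(28): copy that vector once per output time step.
repeated = np.repeat(encoded[:, None, :], out_steps, axis=1)
print(repeated.shape)  # (2, 28, 64)

# TimeDistributed(Dense(1)): the same weights are applied at every time step.
W = rng.normal(size=(enc_units, 1))
b = np.zeros(1)
outputs = repeated @ W + b
print(outputs.shape)   # (2, 28, 1)
```

The encoder compresses the 56 input steps into one vector, `RepeatVector` hands a copy of it to each of the 28 decoder steps, and the time-distributed dense layer maps each decoder step to a single sales value.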


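Now we feed a 56-day window to the model to predict the 28 days that follow it. Note that the cell below reuses `X_train`, which covers days -84 to -28, so the prediction spans the same 28 days that served as the training target. To forecast beyond the end of the data, the input would instead be the most recent 56 days, `train_n[:, -56:, :14]`.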

In [0]:
x_input = np.asarray(X_train[:, -56:])
x_input = x_input.reshape((30490, 56, n_features-1))
print(x_input.shape)
yhat = model.predict(x_input, verbose=0)
# Append the 28 predicted steps to column 13 of the input (the price feature),
# giving shape (samples, 84, 1). Only the last 28 steps are used afterwards.
n_samples, n_steps = x_input.shape[0], x_input.shape[1]
x_input = np.concatenate(
    (x_input[:, :, 13].reshape(n_samples, n_steps),
     yhat.astype(np.float32).reshape(n_samples, yhat.shape[1])),
    axis=1).reshape((n_samples, n_steps + yhat.shape[1], 1))
print(x_input.shape)

(30490, 56, 14)
(30490, 84, 1)


We reverse normalize the predicted output

In [0]:
x_input = ReverseNoramlize(x_input,train_low,train_high)
# x_input = np.rint(x_input)

Let's look at the output data.

In [0]:
forecast = pd.DataFrame(x_input.reshape(x_input.shape[0],x_input.shape[1])).iloc[:,-28:]
forecast.columns = [f'F{i}' for i in range(1, forecast.shape[1] + 1)]
forecast[forecast < 0] =0
forecast.head()

Unnamed: 0,F1,F2,F3,F4,F5,F6,F7,F8,F9,F10,F11,F12,F13,F14,F15,F16,F17,F18,F19,F20,F21,F22,F23,F24,F25,F26,F27,F28
0,1.018306,1.021447,1.011335,0.999818,0.991536,0.987529,0.987243,0.989598,0.993497,0.998041,1.002583,1.006711,1.010201,1.012966,1.01501,1.016392,1.017201,1.017536,1.017493,1.017164,1.016628,1.01595,1.015184,1.014374,1.013553,1.012745,1.011967,1.011233
1,1.136404,1.293931,1.432064,1.54377,1.628143,1.687643,1.726271,1.748441,1.758339,1.759598,1.755185,1.747401,1.737951,1.728034,1.718448,1.709681,1.701994,1.695486,1.69015,1.68591,1.682652,1.680244,1.678553,1.677449,1.676817,1.676553,1.676567,1.676786
2,1.163452,1.356358,1.528486,1.668473,1.774134,1.848249,1.895858,1.922631,1.933961,1.93452,1.928102,1.91763,1.905249,1.892457,1.880231,1.869159,1.859542,1.851481,1.844942,1.839813,1.835937,1.833136,1.831235,1.830066,1.829477,1.829336,1.829531,1.829967
3,1.118311,1.252176,1.367579,1.460382,1.530533,1.580276,1.612916,1.632025,1.640981,1.642723,1.639663,1.633688,1.626206,1.61822,1.610406,1.603185,1.596792,1.591327,1.586797,1.583153,1.580308,1.578163,1.576612,1.575553,1.574891,1.574543,1.574436,1.57451
4,1.165889,1.361982,1.537175,1.67971,1.787291,1.862725,1.911144,1.938334,1.949794,1.950291,1.943693,1.93298,1.920336,1.907285,1.894822,1.883543,1.873753,1.865551,1.858905,1.853696,1.849764,1.846928,1.845008,1.843833,1.843248,1.843118,1.843329,1.843784


Steps to create the CSV file for submission to Kaggle

In [0]:
validation_ids = sales['id'].values
evaluation_ids = [i.replace('validation', 'evaluation') for i in validation_ids]

In [0]:
ids = np.concatenate([validation_ids, evaluation_ids])

In [0]:
predictions = pd.DataFrame(ids, columns=['id'])
forecast = pd.concat([forecast]*2).reset_index(drop=True)
predictions = pd.concat([predictions, forecast], axis=1)

In [0]:
predictions.to_csv('submission.csv', index=False)  #Generate the csv file.
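
The same assembly can be checked on a toy example before writing the real file. The ids and predicted values below are made up for illustration. The layout (an `id` column followed by `F1` to `F28`, with the rows repeated for the evaluation ids) follows the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical ids and predictions for two products.
validation_ids = ["ITEM_A_CA_1_validation", "ITEM_B_CA_1_validation"]
evaluation_ids = [i.replace("validation", "evaluation") for i in validation_ids]

preds = np.array([[1.2] * 28, [-0.3] * 28])
forecast_demo = pd.DataFrame(preds, columns=[f"F{i}" for i in range(1, 29)])
forecast_demo = forecast_demo.clip(lower=0)  # sales cannot be negative

# Repeat the forecasts for the evaluation rows and prepend the id column.
submission = pd.concat([forecast_demo] * 2, ignore_index=True)
submission.insert(0, "id", validation_ids + evaluation_ids)

print(submission.shape)               # (4, 29)
print(list(submission.columns[:3]))   # ['id', 'F1', 'F2']
```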

## Conclusion

We use only the last 56 days of data to predict product demand for the upcoming 28 days. This is not ideal, but memory constraints prevented us from training on a longer history.

In the future, we want to add more features, such as rolling selling prices, which might help improve the score.

We are currently learning PyTorch, and we are confident that replacing NumPy arrays with tensors will let us train the model on more data, which should eventually lead to better predictions.

# References

https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

https://www.kaggle.com/gopidurgaprasad/m5-encoder-decoder-pytorch

https://machinelearningmastery.com/encoder-decoder-recurrent-neural-network-models-neural-machine-translation/

https://www.kaggle.com/tarunpaparaju/m5-competition-eda-models

https://datascience.stackexchange.com/questions/46491/what-is-the-job-of-repeatvector-and-timedistributed

https://www.kaggle.com/headsortails/back-to-predict-the-future-interactive-m5-eda

# License


Copyright 2020 Satya Ikyath Varma Dantuluri, Harika Reddy Gurram

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.