# Introduction

**Task:** build a model that forecasts the total number of units of *each item* sold in every *shop* in a month, then apply it to the test set.

**File descriptions:**
* **sales_train.csv** - the training set. Daily historical data from January 2013 to October 2015.
* **test.csv** - the test set. You need to forecast the sales for these shops and products for November 2015.
* **sample_submission.csv** - a sample submission file in the correct format.
* **items.csv** - supplemental information about the items/products.
* **item_categories.csv** - supplemental information about the item categories.
* **shops.csv** - supplemental information about the shops.

**Data fields**
* **ID** - an Id that represents a (Shop, Item) tuple within the test set
* **shop_id** - unique identifier of a shop
* **item_id** - unique identifier of a product
* **item_category_id** - unique identifier of item category
* **item_cnt_day** - number of products sold. You are predicting a monthly amount of this measure
* **item_price** - current price of an item
* **date** - date in format dd/mm/yyyy
* **date_block_num** - a consecutive month number, used for convenience. January 2013 is 0, February 2013 is 1, ..., October 2015 is 33
* **item_name** - name of item
* **shop_name** - name of shop
* **item_category_name** - name of item category
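
The relationship between **date** and **date_block_num** can be sketched as a small calculation. This is only an illustration on made-up dates: the `toy` frame is hypothetical, and the parsing format assumes the dd/mm/yyyy layout listed above (the separator in the actual file may differ).

```python
import pandas as pd

# Toy dates in the dd/mm/yyyy format described above (not real competition rows).
toy = pd.DataFrame({"date": ["15/01/2013", "03/02/2013", "28/10/2015"]})

parsed = pd.to_datetime(toy["date"], format="%d/%m/%Y")

# Months elapsed since January 2013: 12 per full year plus the month offset.
toy["date_block_num"] = (parsed.dt.year - 2013) * 12 + (parsed.dt.month - 1)

print(toy["date_block_num"].tolist())  # [0, 1, 33]
```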

# Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense


# Data Importation

In [None]:
train_ds = pd.read_csv('../input/competitive-data-science-predict-future-sales/sales_train.csv')
train_ds.head(10)

In [None]:
test_ds = pd.read_csv('../input/competitive-data-science-predict-future-sales/test.csv')
test_ds.head(10)

# Data Preparation

We pivot the daily sales into a wide table with one row per (shop, item) pair and one column per month, holding the total number of units sold:

In [None]:
monthly_data = train_ds.pivot_table(
    index = ['shop_id','item_id'],
    values = ['item_cnt_day'],
    columns = ['date_block_num'],
    fill_value = 0,
    aggfunc='sum')
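
To make the resulting layout concrete, here is the same pivot applied to a handful of made-up rows. The `toy_sales` values are hypothetical; only the column names match `sales_train.csv`.

```python
import pandas as pd

# Toy daily sales (hypothetical values), same column names as sales_train.csv.
toy_sales = pd.DataFrame({
    "date_block_num": [0, 0, 1, 1, 1],
    "shop_id":        [5, 5, 5, 7, 7],
    "item_id":        [10, 10, 10, 10, 11],
    "item_cnt_day":   [1, 2, 4, 3, 1],
})

toy_monthly = toy_sales.pivot_table(
    index=["shop_id", "item_id"],
    values=["item_cnt_day"],
    columns=["date_block_num"],
    fill_value=0,
    aggfunc="sum",
)

# One row per (shop, item); one column per month, keyed ("item_cnt_day", month).
# Months with no sales for a pair are filled with 0.
print(toy_monthly)
```

Note that the columns form a two-level index, which is why later cells pass `level=0` when dropping the key columns.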

In [None]:
monthly_data.head(10)

In [None]:
monthly_data.tail(10)

In [None]:
monthly_data.reset_index(inplace = True)
monthly_data.head()

In [None]:
# Keep only the monthly count columns; the key columns are not model inputs
train_data = monthly_data.drop(columns=['shop_id', 'item_id'], level=0)

In [None]:
train_data.head()

In [None]:
train_data.fillna(0,inplace = True)
train_data.head()

In [None]:
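# Inputs: every month except the last, with a trailing feature axis -> (samples, 33, 1)
x_train = np.expand_dims(train_data.values[:, :-1], axis=2)
# Target: the last month (block 33, October 2015) -> (samples, 1)
y_train = train_data.values[:, -1:]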
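
The slicing above can be checked on a tiny array. This is a sketch with hypothetical numbers; `toy_wide`, `toy_x` and `toy_y` are illustrative names, not part of the notebook.

```python
import numpy as np

# Toy wide table: 2 (shop, item) rows by 5 months (hypothetical counts).
toy_wide = np.array([
    [0, 1, 2, 3, 4],
    [5, 0, 0, 2, 1],
])

# All months but the last form the input sequence; add a trailing feature axis
# because an LSTM expects (samples, timesteps, features).
toy_x = np.expand_dims(toy_wide[:, :-1], axis=2)
# The last month is the target; the -1: slice keeps it two-dimensional.
toy_y = toy_wide[:, -1:]

print(toy_x.shape, toy_y.shape)  # (2, 4, 1) (2, 1)
```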

In [None]:
test_rows = monthly_data.merge(
    test_ds,
    on = ['item_id','shop_id'],
    how = 'right')

In [None]:
test_rows.head()

In [None]:
x_test = test_rows.drop(test_rows.columns[:5], axis=1).drop('ID', axis=1)
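
The positional drop above depends on the column order that the merge happens to produce. A more explicit alternative, sketched here on toy data, selects the month columns by name. It assumes the two-level columns have first been flattened to plain month numbers, and that the intended window is the most recent months (the oldest one dropped); all `toy_*` names and values are hypothetical.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for the wide monthly table and test.csv.
toy_monthly = pd.DataFrame(
    {"shop_id": [5, 7], "item_id": [10, 10],
     0: [3, 0], 1: [4, 3], 2: [1, 2]}
)
toy_test = pd.DataFrame(
    {"ID": [0, 1, 2], "shop_id": [7, 5, 9], "item_id": [10, 10, 10]}
)

month_cols = [0, 1, 2]

# A right join keeps every test row, in test order; pairs never seen in
# training get NaN for all months.
toy_rows = toy_monthly.merge(toy_test, on=["shop_id", "item_id"], how="right")

# Select the window by name: drop the oldest month so the window ends at the
# latest one, then treat unseen pairs as zero sales.
toy_x_test = toy_rows[month_cols[1:]].fillna(0)

print(toy_x_test.values.tolist())  # [[3.0, 2.0], [4.0, 1.0], [0.0, 0.0]]
```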

In [None]:
x_test.fillna(0,inplace = True)

In [None]:
x_test.head()

In [None]:
x_test = np.expand_dims(x_test,axis = 2)

In [None]:
print(x_train.shape,y_train.shape,x_test.shape)

# Model Creation and Training

In [None]:
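model = tf.keras.models.Sequential()
# A single LSTM layer reads the 33-month sequence and returns its final state
model.add(LSTM(64, input_shape=(33, 1), return_sequences=False))
# One linear output unit: the predicted count for the next month
model.add(Dense(1))

model.compile(
    loss='mse',
    optimizer='adam',
    metrics=['mean_squared_error']
)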

In [None]:
history = model.fit(
    x_train, 
    y_train, 
    epochs=10, 
    batch_size=4096,
    verbose=1, 
    shuffle=True,
    validation_split=0.4)
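
Keras takes the `validation_split` fraction from the *last* rows of the arrays, before any shuffling. Since the rows here are ordered by `shop_id`, the validation set is made up of the highest-numbered shops. One possible alternative, sketched below with NumPy on toy arrays, is to shuffle the row indices once and split explicitly. The seed, sizes and variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for reproducibility

# Toy arrays with the same layout as x_train / y_train (hypothetical sizes).
toy_x = np.arange(10 * 4).reshape(10, 4, 1)
toy_y = np.arange(10).reshape(10, 1)

# Shuffle the row indices once, then cut at 60 % to mirror validation_split=0.4.
order = rng.permutation(len(toy_x))
cut = int(len(order) * 0.6)
train_idx, val_idx = order[:cut], order[cut:]

x_tr, y_tr = toy_x[train_idx], toy_y[train_idx]
x_val, y_val = toy_x[val_idx], toy_y[val_idx]

print(x_tr.shape, x_val.shape)  # (6, 4, 1) (4, 4, 1)
```

The held-out arrays could then be passed to `model.fit` through its `validation_data=(x_val, y_val)` argument instead of `validation_split`.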

# Evaluation

In [None]:
plt.plot(history.history["loss"], color="r")
plt.plot(history.history["val_loss"], color="g")
plt.legend(["Training", "Validation"])
plt.xlabel("epochs")
plt.ylabel("loss")
plt.show()
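
The curves above show MSE on raw monthly counts. As far as I recall, the competition scores submissions by RMSE with the true targets clipped to the [0, 20] range, so a score closer to the leaderboard metric could be computed as in the sketch below. `clipped_rmse` is a hypothetical helper, and the clipping bounds are an assumption taken from the competition rules rather than from this notebook.

```python
import numpy as np

def clipped_rmse(y_true, y_pred, low=0.0, high=20.0):
    """Root mean squared error after clipping both arrays to [low, high]."""
    y_true = np.clip(np.asarray(y_true, dtype=float), low, high)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), low, high)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values (hypothetical): the 25 and the -1 are clipped before scoring.
score = clipped_rmse([0, 2, 25], [1, 2, -1])
print(score)
```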

# Forecasting

In [None]:
test_predict = model.predict(x_test)


In [None]:
submission = pd.DataFrame({'ID':test_ds['ID'],'item_cnt_month':test_predict.ravel()})
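# Competition targets are clipped to [0, 20], so clip the predictions to the same range
submission['item_cnt_month'] = submission['item_cnt_month'].clip(0, 20)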
submission.to_csv('submission.csv',index = False)