Data
1. date - date of the sale (dd.mm.yyyy)
2. date_block_num - consecutive number assigned to each month
3. shop_id - unique number of every shop
4. item_id - unique number of every item
5. item_price - price of the item
6. item_cnt_day - number of items sold on that day

Test data
1. ID - unique for every (shop_id,item_id) pair.
2. shop_id - unique number of every shop
3. item_id - unique number of every item

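Goal: predict how many units of each item each shop will sell in the next month.
The submission contains ID and item_cnt_month.

Approach
Training features: the monthly sales count of each item for every month except the last; the last month is the target.
For the test input, drop the first month instead.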


In [None]:
import numpy as np
import pandas as pd 
import os

In [None]:
# Load the data
print(os.listdir('../input'))
sales_data = pd.read_csv('../input/sales_train.csv')
item_cat = pd.read_csv('../input/item_categories.csv')
items = pd.read_csv('../input/items.csv')
shops = pd.read_csv('../input/shops.csv')
sample_submission = pd.read_csv('../input/sample_submission.csv')
test_data = pd.read_csv('../input/test.csv')


In [None]:
def basic_eda(df):
    """Print a quick overview of a DataFrame."""
    print("----------TOP 5 RECORDS--------")
    print(df.head(5))
    print("----------INFO-----------------")
    # info() prints its report itself and returns None, so do not wrap it in print()
    df.info()
    print("----------Describe-------------")
    print(df.describe())
    print("----------Columns--------------")
    print(df.columns)
    print("----------Data Types-----------")
    print(df.dtypes)
    print("-------Missing Values----------")
    # isnull() and isna() are aliases, so one count is enough
    print(df.isnull().sum())
    print("-----Shape Of Data-------------")
    print(df.shape)

In [None]:
# A little bit of data exploration

print("=============================Sales Data=============================")
basic_eda(sales_data)
print("=============================Test data=============================")
basic_eda(test_data)
print("=============================Item Categories=============================")
basic_eda(item_cat)
print("=============================Items=============================")
basic_eda(items)
print("=============================Shops=============================")
basic_eda(shops)
print("=============================Sample Submission=============================")
basic_eda(sample_submission)



In [None]:
# 'date' is stored as object dtype, so convert it to datetime
sales_data['date'] = pd.to_datetime(sales_data['date'],format = '%d.%m.%Y')
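
The explicit `format` matters because the dates are written day first. A minimal sketch with made-up dates in the same layout (the sample values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical sample in the same day.month.year layout as the 'date' column.
raw = pd.Series(['02.01.2013', '03.01.2013', '15.02.2013'])

# With an explicit format, '02.01.2013' is read as 2 January, not 1 February.
parsed = pd.to_datetime(raw, format='%d.%m.%Y')

print(parsed.dt.day.tolist())    # [2, 3, 15]
print(parsed.dt.month.tolist())  # [1, 1, 2]
```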

In [None]:
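# We want monthly sales: rows are (shop_id, item_id), columns are date_block_num (month),
# and each cell is the sum of item_cnt_day (daily sales) for that month.
# Passing values as a string (not a list) keeps the columns single-level,
# so the later merge with test_data does not mix column levels.
dataset = sales_data.pivot_table(index = ['shop_id','item_id'],values = 'item_cnt_day',columns = ['date_block_num'],fill_value = 0,aggfunc='sum')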
dataset.reset_index(inplace = True)
dataset.head()
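
To see what the pivot does, here is a small sketch on a hypothetical four-row sales table (the values are invented for illustration). Daily rows for the same shop, item, and month are summed, and months with no sales become 0.

```python
import pandas as pd

# Hypothetical daily sales: two (shop, item) pairs over months 0 and 1.
toy = pd.DataFrame({
    'date_block_num': [0, 0, 1, 1],
    'shop_id':        [5, 5, 5, 6],
    'item_id':        [10, 10, 10, 11],
    'item_cnt_day':   [1, 2, 4, 3],
})

# One row per (shop_id, item_id), one column per month, daily counts summed.
monthly = toy.pivot_table(index=['shop_id', 'item_id'],
                          columns='date_block_num',
                          values='item_cnt_day',
                          aggfunc='sum', fill_value=0)
print(monthly)
```

Pair (5, 10) sold 1 + 2 = 3 units in month 0 and 4 in month 1; pair (6, 11) has no rows in month 0, so that cell is filled with 0.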

In [None]:
# Left-merge onto the test data so every test (shop_id, item_id) pair keeps a row
dataset = pd.merge(test_data,dataset,on = ['item_id','shop_id'],how = 'left')

In [None]:
# Fill missing values with 0 (pairs in the test set with no sales history)
dataset.fillna(0,inplace = True)
dataset.head()
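
The left merge followed by `fillna(0)` can be sketched on toy frames. The frames below are hypothetical; the point is that a test pair with no sales history gets NaN from the merge and is then treated as zero sales in every month.

```python
import pandas as pd

# Hypothetical monthly matrix with one known pair, and a test set with one unseen pair.
monthly = pd.DataFrame({'shop_id': [5], 'item_id': [10], 0: [3], 1: [4]})
test = pd.DataFrame({'ID': [0, 1], 'shop_id': [5, 7], 'item_id': [10, 99]})

# how='left' keeps every test row, even when there is no match on the right.
merged = pd.merge(test, monthly, on=['item_id', 'shop_id'], how='left')
n_unseen = int(merged[0].isna().sum())  # rows with no sales history

merged = merged.fillna(0)
print(n_unseen)
print(merged)
```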

In [None]:
# shop_id, item_id, and ID are not used as features, so drop them
dataset.drop(['shop_id','item_id','ID'],inplace = True, axis = 1)
dataset.head()

In [None]:
# Inputs: every month except the last
X_train = np.expand_dims(dataset.values[:,:-1],axis = 2)
# Target: the last month
y_train = dataset.values[:,-1:]

# For the test input, drop only the first month
X_test = np.expand_dims(dataset.values[:,1:],axis = 2)

print(X_train.shape,y_train.shape,X_test.shape)
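
The same slicing on a tiny hypothetical matrix makes the window shift easier to see. This sketch assumes 3 pairs and 5 months; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical matrix: 3 (shop, item) pairs x 5 months of sales counts.
m = np.arange(15).reshape(3, 5)

X_tr = np.expand_dims(m[:, :-1], axis=2)  # months 0..3 as input
y_tr = m[:, -1:]                          # month 4 as target
X_te = np.expand_dims(m[:, 1:], axis=2)   # months 1..4, used to predict the unseen next month

print(X_tr.shape, y_tr.shape, X_te.shape)  # (3, 4, 1) (3, 1) (3, 4, 1)
```

Training and test inputs have the same number of time steps because each drops exactly one month from a different end. The added last axis is the per-step feature dimension expected by a recurrent layer.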


**RNN and LSTM**

RNNs and LSTMs can model time-series data.

https://www.renom.jp/ja/notebooks/tutorial/basic_algorithm/LSTM/notebook.html
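
As a rough illustration of why a recurrent network suits sequences, here is a minimal NumPy sketch of a plain tanh RNN. It is not the LSTM used in this notebook (an LSTM adds gates and a cell state), and the weights are random, hypothetical values rather than trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4
W_x = rng.normal(size=(1, hidden))       # input-to-hidden weights (hypothetical)
W_h = rng.normal(size=(hidden, hidden))  # hidden-to-hidden weights (hypothetical)
b = np.zeros(hidden)

def rnn_last_state(seq):
    """Run a plain tanh RNN over seq of shape (timesteps, 1); return the final hidden state."""
    h = np.zeros(hidden)
    for x_t in seq:
        # The new state depends on the current input and on the previous state.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

seq = np.array([[0.0], [1.0], [2.0]])
h_last = rnn_last_state(seq)
h_reversed = rnn_last_state(seq[::-1])
print(h_last.shape)  # (4,)
```

Because the hidden state is carried from step to step, feeding the same values in a different order generally gives a different final state, which is what lets the model use the order of the months.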

In [None]:
# Import libraries
from keras.models import Sequential
from keras.layers import LSTM,Dense,Dropout

In [None]:
# Define the model: one LSTM layer over 33 monthly steps, dropout, then a single regression output
my_model = Sequential()
my_model.add(LSTM(units = 64,input_shape = (33,1)))
my_model.add(Dropout(0.4))
my_model.add(Dense(1))

my_model.compile(loss = 'mse',optimizer = 'adam', metrics = ['mean_squared_error'])
my_model.summary()
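
The parameter counts reported by `summary()` can be checked by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with input weights, recurrent weights, and a bias); the helper names are made up for this example.

```python
def lstm_param_count(units, input_dim):
    # Four gates (input, forget, cell candidate, output), each with
    # input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(inputs, outputs):
    # One weight per input-output pair plus one bias per output.
    return inputs * outputs + outputs

lstm_params = lstm_param_count(units=64, input_dim=1)
dense_params = dense_param_count(inputs=64, outputs=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 16896 65 16961
```

Dropout has no trainable parameters, so it does not add to the total.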


**Batch size and number of epochs**

https://qiita.com/kenta1984/items/bad75a37d552510e4682
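
A short arithmetic sketch of how the two settings interact: the batch size determines how many weight updates happen per epoch, and the number of epochs is how many passes are made over the data. The sample count below is a hypothetical round number, not the size of this dataset.

```python
import math

def updates_per_run(n_samples, batch_size, epochs):
    """Return (steps per epoch, total weight updates) for mini-batch training."""
    steps_per_epoch = math.ceil(n_samples / batch_size)  # the last batch may be smaller
    return steps_per_epoch, steps_per_epoch * epochs

steps, total = updates_per_run(n_samples=10_000, batch_size=2048, epochs=5)
print(steps, total)  # 5 25
```

A larger batch size means fewer, smoother updates per epoch; a smaller one means more, noisier updates.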

In [None]:
my_model.fit(X_train,y_train,batch_size = 2048,epochs = 5)

In [None]:
# Create the submission file
submission_pfs = my_model.predict(X_test)
submission_pfs = submission_pfs.clip(0,20)
submission = pd.DataFrame({'ID':test_data['ID'],'item_cnt_month':submission_pfs.ravel()})
submission.to_csv('sub_pfs.csv',index = False)
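
The post-processing can be sketched on a hypothetical prediction array. `predict` returns shape (n, 1), so `clip` bounds the values to the [0, 20] range used above and `ravel` flattens them into one column.

```python
import numpy as np
import pandas as pd

preds = np.array([[-0.3], [1.7], [25.0]])  # hypothetical model output, shape (n, 1)
ids = pd.Series([0, 1, 2])                 # hypothetical test IDs

clipped = preds.clip(0, 20)                # negative counts become 0, large ones become 20
sub = pd.DataFrame({'ID': ids, 'item_cnt_month': clipped.ravel()})
print(sub['item_cnt_month'].tolist())  # [0.0, 1.7, 20.0]
```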
