**This page shows how to use Prophet, the forecasting library developed by Facebook, with short comments.**

**If you need more detail, you can send me a message and/or search for "Prophet" on the web.**

**When I get a message, I'll try to answer as best I can.**

**Import libraries and set configuration**

Load the required libraries and set the display options.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from IPython.display import display

np.set_printoptions(suppress=True,precision=4)
pd.options.display.float_format='{:.4f}'.format
pd.set_option("display.max_columns",None)
plt.rcParams["font.size"]=14
random_seed=123

from datetime import datetime
from matplotlib.dates import MonthLocator, num2date
from matplotlib.ticker import FuncFormatter

Convert the date format so that the data can be split into training and test sets.

Dates in sales_train.csv are stored as day.month.year ('%d.%m.%Y').
They are parsed into datetime values, which display as year-month-day.

In [None]:
# Dates in sales_train.csv are day.month.year strings.
f1 = '%d.%m.%Y'
df1m = pd.read_csv('../input/competitive-data-science-predict-future-sales/sales_train.csv')
# Parse with an explicit format; pd.datetime was removed in pandas 2.0
# and the date_parser argument of read_csv is deprecated.
df1m['date'] = pd.to_datetime(df1m['date'], format=f1)
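
As a quick check of the date conversion, the following minimal sketch parses a few day.month.year strings with `pd.to_datetime` and an explicit format. The sample values are made up for illustration and are not taken from sales_train.csv.

```python
import pandas as pd

# Hypothetical sample in the same day.month.year layout as sales_train.csv
sample = pd.DataFrame({"date": ["02.01.2013", "15.08.2015", "31.10.2015"]})

# An explicit format avoids day/month ambiguity (02.01 is 2 January, not 1 February)
sample["date"] = pd.to_datetime(sample["date"], format="%d.%m.%Y")

print(sample["date"].dt.strftime("%Y-%m-%d").tolist())
```

Passing the format explicitly is a deliberate choice here: without it, a string such as 02.01.2013 could be read month-first.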

Read the other data from the CSV files.

In [None]:
df2=pd.read_csv("../input/competitive-data-science-predict-future-sales/item_categories.csv")
df3=pd.read_csv("../input/competitive-data-science-predict-future-sales/items.csv")
df4=pd.read_csv("../input/competitive-data-science-predict-future-sales/shops.csv")
df5=pd.read_csv("../input/competitive-data-science-predict-future-sales/test.csv")
df6=pd.read_csv("../input/competitive-data-science-predict-future-sales/sample_submission.csv")

Merge all tables into one DataFrame (df1m324).

In [None]:
df1m3=pd.merge(df1m,df3,on="item_id",how="left")
df1m32=pd.merge(df1m3,df2,on="item_category_id",how="left")
df1m324=pd.merge(df1m32,df4,on="shop_id",how="left")

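Group and aggregate the data.

DF1 holds the monthly mean price and total count per shop and item, DF2 holds the daily rows, and DF3 holds the total count per day.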

In [None]:
DF1=df1m324.groupby(['shop_id', 'item_id','date_block_num']).agg({'item_price':'mean','item_cnt_day':'sum'}).sort_values(by=['date_block_num','shop_id','item_id']).reset_index()
DF2=df1m324[['date','date_block_num','shop_id', 'item_id','item_price','item_cnt_day']].sort_values(['date','shop_id','item_id']).reset_index()
DF3=DF2.groupby(["date"]).sum()["item_cnt_day"]

Change the format for use with Prophet.
Prophet requires the input DataFrame to have a date column named "ds" and a value column named "y".
Here, date is renamed to ds and item_cnt_day is renamed to y.

In [None]:
# Rename to Prophet's column names and sum the daily counts over all shops and items.
DF4=DF2.rename(columns={"date":"ds","item_cnt_day":"y"}).groupby("ds")["y"].sum().reset_index()
DF5=DF4[["ds","y"]]
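
To illustrate the ds/y layout, here is a minimal sketch on made-up rows. The frame `rows` and its values are hypothetical; only the column names date, item_cnt_day, ds and y come from the notebook.

```python
import pandas as pd

# Hypothetical daily sales rows (several rows can share one date)
rows = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-01", "2015-01-02"]),
    "item_cnt_day": [1.0, 2.0, 5.0],
})

# Rename to the column names Prophet expects, then total the counts per day
daily = (rows.rename(columns={"date": "ds", "item_cnt_day": "y"})
             .groupby("ds")["y"].sum()
             .reset_index())

print(daily)
```

The result has one row per date, with exactly the two columns ds and y.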

In [None]:
plt.plot(DF5["ds"],DF5["y"])

Split the data into training and test sets.

Rows dated before 2015-08-01 are used for training, and the remaining rows are used for testing.

In [None]:
mday=pd.to_datetime("2015-08-01")
train_index=DF5["ds"]<mday
test_index=DF5["ds"]>=mday
x_train=DF5[train_index]
x_test=DF5[test_index]
dates_test=DF5["ds"][test_index]
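
The split can be checked on a toy series. This sketch assumes the daily data runs from 2013-01-01 through 2015-10-31 with no missing days; the frame `toy` is hypothetical and only mimics that date range.

```python
import pandas as pd

# Hypothetical daily series covering the assumed range of the sales data
toy = pd.DataFrame({"ds": pd.date_range("2013-01-01", "2015-10-31", freq="D")})
toy["y"] = range(len(toy))

# Same cut-off date as in the notebook
mday = pd.to_datetime("2015-08-01")
train = toy[toy["ds"] < mday]
test = toy[toy["ds"] >= mday]

print(len(train), len(test))
```

Under that assumption the test set covers August through October 2015, which is 92 days, the same length as the forecast horizon used later.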

Import Prophet and create the model.

In [None]:
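# The package was renamed from fbprophet to prophet in version 1.0.
try:
    from prophet import Prophet
except ImportError:
    from fbprophet import Prophet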
m1=Prophet(yearly_seasonality=True,weekly_seasonality=True,
          daily_seasonality=False,
          seasonality_mode="multiplicative")

Fit the model on the training data.

In [None]:
m1.fit(x_train)

In [None]:
future1=m1.make_future_dataframe(periods=92,freq="D")
display(future1.head())
display(future1.tail())

In [None]:
fcst1=m1.predict(future1)
fig=m1.plot_components(fcst1)
plt.show()

In [None]:
fig,ax=plt.subplots(figsize=(10,6))
m1.plot(fcst1,ax=ax)
plt.show()

In [None]:
ypred1=fcst1[-92:][["yhat"]].values
ytest1=x_test["y"].values

from sklearn.metrics import r2_score
score=r2_score(ytest1,ypred1)

print(f'R2 score:{score:.4f}')
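
For reference, the R2 score reported above is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below recomputes it with NumPy on made-up numbers; `r2_manual` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the forecast
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r2_manual(y_true, y_true)               # forecast equals the actuals
mean_only = r2_manual(y_true, [6.0] * 4)          # forecast is the mean everywhere
reversed_fit = r2_manual(y_true, [9.0, 7.0, 5.0, 3.0])

print(perfect, mean_only, reversed_fit)
```

A score of 1 means a perfect forecast, 0 means no better than always predicting the mean, and negative values mean worse than the mean.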

In [None]:
import matplotlib.dates as mdates
fig,ax=plt.subplots(figsize=(8,4))

ax.plot(dates_test,ytest1,label="actual",c="k")
ax.plot(dates_test,ypred1,label="predict",c="b")

weeks=mdates.WeekdayLocator(byweekday=mdates.TH)
ax.xaxis.set_major_locator(weeks)

ax.tick_params(axis="x",rotation=90)

ax.grid()
ax.legend()
ax.set_title("Result actual vs predict")

plt.show()

Predict on a future dataframe.

The fitted model is used to forecast 122 days beyond the end of the training data.

In [None]:
future_data = m1.make_future_dataframe(periods=122, freq = 'D')
forecast_data = m1.predict(future_data)

m1.plot(forecast_data)
m1.plot_components(forecast_data)
plt.show()
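
The 122-day horizon can be checked with plain date arithmetic. This sketch assumes the last training day is 2015-07-31, the day before the split date used earlier; the variable names are hypothetical.

```python
from datetime import date, timedelta

# Assumed last day of the training data (day before the 2015-08-01 split)
last_train_day = date(2015, 7, 31)

# make_future_dataframe(periods=122, freq="D") extends 122 days past the history
horizon_end = last_train_day + timedelta(days=122)

# Days of the horizon that fall in November 2015
november_days = (horizon_end - date(2015, 10, 31)).days

print(horizon_end, november_days)
```

Under that assumption the forecast reaches the end of November 2015, so filtering on dates from 2015-11-01 onward keeps a full month of predictions.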

In [None]:
FC2=forecast_data[['ds','yhat']].sort_values(['ds'])

In [None]:
mday=pd.to_datetime("2015-11-01")
train_index2=FC2["ds"]<mday
test_index2=FC2["ds"]>=mday
x_train2=FC2[train_index2]
x_test2=FC2[test_index2]
dates_test2=FC2["ds"][test_index2]

In [None]:
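# Sum only the forecast column; summing the datetime column "ds" raises a TypeError in pandas 2.x.
FC3=x_test2[["yhat"]].sum()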
FC3=pd.DataFrame(FC3)
FC3

In [None]:
DF1m=DF1.groupby(['shop_id']).count()["item_cnt_day"]
DF1mm=(DF1m/(DF1m.sum()))

In [None]:
DF2m=DF1.groupby(['item_id']).count()["item_cnt_day"]
DF2mm=DF2m/(DF2m.sum())

In [None]:
df5m=df5.groupby(['shop_id']).count()["item_id"]
print(df5m.head())
print(df5m.tail())
df5mm=df5.groupby(['item_id']).count()["shop_id"]
print(df5mm.head())
print(df5mm.tail())

In [None]:
join_data=pd.merge(df5,DF1mm,left_on="shop_id",right_on="shop_id",how="left")
join_data=pd.merge(join_data,DF2mm,left_on="item_id",right_on="item_id",how="left")
join_data=join_data.fillna(join_data.mean())

In [None]:
# Weight each (shop, item) pair by its shop share times its item share.
# 42 and 5100 appear to be the numbers of shops and items in test.csv.
join_data["item_cnt_month"]=(join_data["item_cnt_day_x"])*(join_data["item_cnt_day_y"])*42*5100
join_data.sum()
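
The weighting in the cells above appears to multiply a per-shop share by a per-item share for every shop and item pair. The following minimal sketch shows that idea on made-up data; the names `monthly`, `shop_share`, `item_share` and `grid` are hypothetical and the numbers are not from the competition files.

```python
import pandas as pd

# Hypothetical monthly rows: one row per (shop, item, month) combination
monthly = pd.DataFrame({
    "shop_id": [1, 1, 1, 2],
    "item_id": [10, 11, 10, 10],
    "item_cnt_day": [3.0, 1.0, 2.0, 4.0],
})

# Share of rows per shop and per item (row counts, not sales volume)
shop_share = monthly.groupby("shop_id")["item_cnt_day"].count() / len(monthly)
item_share = monthly.groupby("item_id")["item_cnt_day"].count() / len(monthly)

# Hypothetical test grid: every shop paired with every item
grid = pd.MultiIndex.from_product(
    [[1, 2], [10, 11]], names=["shop_id", "item_id"]
).to_frame(index=False)

# Attach both shares with left merges, then multiply them into one weight
grid = grid.merge(shop_share.rename("shop_share").reset_index(), on="shop_id", how="left")
grid = grid.merge(item_share.rename("item_share").reset_index(), on="item_id", how="left")
grid["weight"] = grid["shop_share"] * grid["item_share"]

print(grid)
```

On a full grid like this one the weights sum to 1, so multiplying them by a total would distribute that total across the pairs.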

In [None]:
issue_data=join_data[["ID","item_cnt_month"]]
issue_data

In [None]:
issue_data.to_csv("result-of-competitive-data-science-predict-future-sales-ProphetFunction.csv",index=False)