Predict 3 months of item sales at different stores
https://www.kaggle.com/c/demand-forecasting-kernels-only/notebooks

This competition is provided as a way to explore different time series techniques on a relatively simple and clean dataset.

You are given 5 years of store-item sales data, and asked to predict 3 months of sales for 50 different items at 10 different stores.

What's the best way to deal with seasonality? Should stores be modeled separately, or can you pool them together? Does deep learning work better than ARIMA? Can either beat XGBoost?

This is a great competition to explore different models and improve your skills in forecasting.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
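# NOTE: `dirname` is left over from the os.walk loop above and holds the last directory visited,
# so this only works when all competition files sit in that single directory.
train = pd.read_csv(os.path.join(dirname, "train.csv"), parse_dates=['date'], index_col=['date'])
test = pd.read_csv(os.path.join(dirname, "test.csv"), parse_dates=['date'], index_col=['date'])
submission = pd.read_csv(os.path.join(dirname, "sample_submission.csv"))
print(f"Shape: train={train.shape}, test={test.shape}, submission={submission.shape}")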

In [None]:
train.head(3)

In [None]:
def add_date_features(df):
    """Add calendar features derived from the DatetimeIndex (modifies df in place)."""
    df['day'] = df.index.day
    df['month'] = df.index.month
    df['year'] = df.index.year
    df['dayofweek'] = df.index.dayofweek  # Monday=0 ... Sunday=6
    # isocalendar() returns nullable UInt32 values; convert to a plain int array
    df['weekofyear'] = df.index.isocalendar().week.astype(int).to_numpy()
    df['is_weekend'] = (df.index.dayofweek >= 5).astype(int)  # 1 for Saturday/Sunday, else 0
    return df

train = add_date_features(train)
test = add_date_features(test)

In [None]:
train.head(3)

In [None]:
import warnings
warnings.simplefilter(action="ignore")  # silences all warnings for the rest of the notebook
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(n_estimators=50, min_samples_leaf=7, random_state=123)

# Up-weight the most recent data by duplicating rows:
# the last 30 days end up in the training set 3 times, the last 15 days 4 times.
last_date = train.index.max()
recent_30 = train.loc[train.index > last_date - pd.Timedelta(days=30)]
recent_15 = train.loc[train.index > last_date - pd.Timedelta(days=15)]
train_main = pd.concat([train, recent_30, recent_30, recent_15])

# Train a model ('day' and 'weekofyear' are computed above but not used as features)
features = ['store', 'item', 'year', 'month', 'dayofweek', 'is_weekend']
rf.fit(X=train_main[features], y=train_main['sales'])

# Get predictions for the test set
test['sales'] = rf.predict(test[features])

# Write the submission file with the columns id and sales
test[['id', 'sales']].to_csv("submission.csv", index=False)
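
Duplicating recent rows is one way to up-weight them. A possible alternative is the `sample_weight` argument of `RandomForestRegressor.fit`, sketched below on synthetic data. The `demo` frame and its values are made up for illustration; the weights 3 and 4 are chosen to mirror the row counts that the duplication approach produces. The two approaches are not identical: duplicated rows also count toward `min_samples_leaf`, while sample weights do not.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the competition data (hypothetical values, same column names).
dates = pd.date_range("2017-01-01", "2017-12-31", freq="D")
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {"store": 1, "item": 1, "sales": rng.poisson(20, size=len(dates))},
    index=dates,
)
demo["month"] = demo.index.month
demo["dayofweek"] = demo.index.dayofweek

# Weight 1 everywhere, 3 for the last 30 days, 4 for the last 15 days.
last_day = demo.index.max()
weights = np.ones(len(demo))
weights[demo.index > last_day - pd.Timedelta(days=30)] = 3.0
weights[demo.index > last_day - pd.Timedelta(days=15)] = 4.0

features = ["store", "item", "month", "dayofweek"]
model = RandomForestRegressor(n_estimators=10, min_samples_leaf=7, random_state=123)
model.fit(demo[features], demo["sales"], sample_weight=weights)
```

Keeping the weights in a separate array leaves the training frame free of duplicate rows, which makes it easier to try other weighting schemes.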

In [None]:
test.head()
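
The notebook submits without scoring the model locally. A minimal sketch of a time-based holdout is shown below: the last 90 days are held out to mirror the 3-month forecast horizon, and a simple day-of-week mean baseline is scored on them. It assumes the competition metric is SMAPE, with a term counted as 0 when actual and forecast are both 0. The `smape` helper, the `demo` frame, and the baseline are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def smape(actual, forecast):
    """Symmetric MAPE in percent; a term is 0 when actual and forecast are both 0."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = np.abs(actual) + np.abs(forecast)
    diff = np.abs(actual - forecast)
    # Avoid 0/0: where denom is 0, both values are 0 and the error is 0.
    ratio = np.divide(2.0 * diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * ratio.mean()

# Synthetic stand-in for one store-item series (hypothetical values).
dates = pd.date_range("2016-01-01", "2017-12-31", freq="D")
rng = np.random.default_rng(0)
demo = pd.DataFrame({"sales": rng.poisson(20, size=len(dates))}, index=dates)
demo["dayofweek"] = demo.index.dayofweek

# Hold out the final 90 days; fit only on what comes before the cutoff.
cutoff = demo.index.max() - pd.Timedelta(days=90)
fit_part = demo.loc[demo.index <= cutoff]
valid_part = demo.loc[demo.index > cutoff]

# Baseline forecast: mean sales per day of week, learned on the earlier period only.
dow_mean = fit_part.groupby("dayofweek")["sales"].mean()
baseline = valid_part["dayofweek"].map(dow_mean)

print(f"Validation SMAPE: {smape(valid_part['sales'], baseline):.2f}%")
```

The same split can be applied to `train`, with the random forest fitted on the earlier part and scored on the held-out part before writing the submission.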

https://dalspace.library.dal.ca/bitstream/handle/10222/73170/Harris-Jay-MEC-August-2017.pdf?sequence=1&isAllowed=y