<a href="https://colab.research.google.com/github/sahug/time-series/blob/main/TSA%20-%20RandomForestRegressor%20-%20Multiple%20Time%20Series%20Forecasting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**TSA - RandomForestRegressor - Multiple Time Series Forecasting**

**Get Dataset**

In [1]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00396/Sales_Transactions_Dataset_Weekly.csv

--2022-06-12 17:31:39--  https://archive.ics.uci.edu/ml/machine-learning-databases/00396/Sales_Transactions_Dataset_Weekly.csv
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 317399 (310K) [application/x-httpd-php]
Saving to: ‘Sales_Transactions_Dataset_Weekly.csv’


2022-06-12 17:31:40 (1.51 MB/s) - ‘Sales_Transactions_Dataset_Weekly.csv’ saved [317399/317399]



**Load Dataset**

In [2]:
import pandas as pd
data = pd.read_csv("/content/Sales_Transactions_Dataset_Weekly.csv")
data.head()

Unnamed: 0,Product_Code,W0,W1,W2,W3,W4,W5,W6,W7,W8,...,Normalized 42,Normalized 43,Normalized 44,Normalized 45,Normalized 46,Normalized 47,Normalized 48,Normalized 49,Normalized 50,Normalized 51
0,P1,11,12,10,8,13,12,14,21,6,...,0.06,0.22,0.28,0.39,0.5,0.0,0.22,0.17,0.11,0.39
1,P2,7,6,3,2,7,1,6,3,3,...,0.2,0.4,0.5,0.1,0.1,0.4,0.5,0.1,0.6,0.0
2,P3,7,11,8,9,10,8,7,13,12,...,0.27,1.0,0.18,0.18,0.36,0.45,1.0,0.45,0.45,0.36
3,P4,12,8,13,5,9,6,9,13,13,...,0.41,0.47,0.06,0.12,0.24,0.35,0.71,0.35,0.29,0.35
4,P5,8,5,13,11,6,7,9,14,9,...,0.27,0.53,0.27,0.6,0.2,0.2,0.13,0.53,0.33,0.4


**Filter Data**

In [4]:
data = data.filter(regex="Product|W")
data.head()

Unnamed: 0,Product_Code,W0,W1,W2,W3,W4,W5,W6,W7,W8,...,W42,W43,W44,W45,W46,W47,W48,W49,W50,W51
0,P1,11,12,10,8,13,12,14,21,6,...,4,7,8,10,12,3,7,6,5,10
1,P2,7,6,3,2,7,1,6,3,3,...,2,4,5,1,1,4,5,1,6,0
2,P3,7,11,8,9,10,8,7,13,12,...,6,14,5,5,7,8,14,8,8,7
3,P4,12,8,13,5,9,6,9,13,13,...,9,10,3,4,6,8,14,8,7,8
4,P5,8,5,13,11,6,7,9,14,9,...,7,11,7,12,6,6,5,11,8,9


**Melt Dataset**

In [5]:
melt = data.melt(id_vars="Product_Code", var_name="Week", value_name="Sales")
melt.head()

Unnamed: 0,Product_Code,Week,Sales
0,P1,W0,11
1,P2,W0,7
2,P3,W0,7
3,P4,W0,12
4,P5,W0,8
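
The reshaping step can be illustrated on a tiny table. This is a minimal sketch on a hypothetical two-product, two-week frame (the values are made up for illustration), showing how `melt` turns one column per week into one row per product and week.

```python
import pandas as pd

# Hypothetical two-product, two-week wide table (values chosen for illustration).
wide = pd.DataFrame({
    "Product_Code": ["P1", "P2"],
    "W0": [11, 7],
    "W1": [12, 6],
})

# Each week column becomes a (Week, Sales) pair per product.
long_df = wide.melt(id_vars="Product_Code", var_name="Week", value_name="Sales")
print(long_df)
```

The long format is what lets the later `groupby("Product_Code")` calls treat every product as its own time series.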


**Encoding**

In [7]:
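# Keep only the numeric part of each label ("P1" -> 1, "W0" -> 0); raw strings avoid invalid-escape warnings.
melt["Product_Code"] = melt["Product_Code"].str.extract(r"(\d+)", expand=False).astype(int)
melt["Week"] = melt["Week"].str.extract(r"(\d+)", expand=False).astype(int)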
melt = melt.sort_values(["Week", "Product_Code"])
melt.head()

Unnamed: 0,Product_Code,Week,Sales
0,1,0,11
1,2,0,7
2,3,0,7
3,4,0,12
4,5,0,8


**Split Data**

In time series analysis (TSA) we cannot split the data randomly into train and test sets, because shuffling destroys the temporal pattern the model has to learn. Instead we split on a week boundary: weeks before week 40 form the training set, and weeks 40 onward form the validation set. The cut must fall at the start or end of a week, never inside one.

In [8]:
split_point = 40
melt_train = melt[melt["Week"] < split_point].copy()
melt_valid = melt[melt["Week"] >= split_point].copy()
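
To make the split rule concrete, here is a minimal sketch on a made-up toy frame. The values and the cut-off of 4 are assumptions for illustration, not taken from the dataset. It shows that cutting on a week boundary keeps every training week strictly before every validation week.

```python
import pandas as pd

# Toy long-format frame: 2 products x 6 weeks (values are made up).
toy = pd.DataFrame({
    "Product_Code": [1, 2] * 6,
    "Week": [w for w in range(6) for _ in range(2)],
    "Sales": [5, 3, 6, 2, 7, 4, 6, 3, 8, 5, 9, 4],
})

toy_split_point = 4  # hypothetical cut-off for the toy data
toy_train = toy[toy["Week"] < toy_split_point]
toy_valid = toy[toy["Week"] >= toy_split_point]

# Every training week precedes every validation week, for all products.
print(toy_train["Week"].max(), toy_valid["Week"].min())
```

A random split would instead mix later weeks into the training set, letting the model see the future it is later asked to predict.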

**Setup 1 Step Target**

Use next week's sales as the target (the y variable).

In [9]:
melt_train["Sales_Next_Week"] = melt_train.groupby("Product_Code")["Sales"].shift(-1)

In [10]:
melt_train[melt_train["Product_Code"] == 1].head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week
0,1,0,11,12.0
811,1,1,12,10.0
1622,1,2,10,8.0
2433,1,3,8,13.0
3244,1,4,13,12.0
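
How `shift(-1)` builds the target is easier to see on a small example. The sketch below uses a hypothetical two-product frame with made-up values. Within each product, next week's sales move up one row, and the last week of each product has no successor, so it becomes NaN.

```python
import pandas as pd

# Hypothetical frame, sorted by week then product like the melted data.
toy = pd.DataFrame({
    "Product_Code": [1, 2, 1, 2, 1, 2],
    "Week": [0, 0, 1, 1, 2, 2],
    "Sales": [11, 7, 12, 6, 10, 3],
})

# shift(-1) within each product pulls next week's sales up one row.
toy["Sales_Next_Week"] = toy.groupby("Product_Code")["Sales"].shift(-1)
print(toy)
```

Grouping by product matters here: a plain `shift(-1)` on this row order would assign product 2's sales as product 1's target.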


In [11]:
melt_valid["Sales_Next_Week"] = melt_valid.groupby("Product_Code")["Sales"].shift(-1)
melt_valid[melt_valid["Product_Code"] == 1].head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week
32440,1,40,7,11.0
33251,1,41,11,4.0
34062,1,42,4,7.0
34873,1,43,7,8.0
35684,1,44,8,10.0


**Null Check**

In [12]:
melt_train.isna().sum()

Product_Code         0
Week                 0
Sales                0
Sales_Next_Week    811
dtype: int64

In [14]:
melt_train = melt_train.dropna()

In [15]:
melt_train.isna().sum()

Product_Code       0
Week               0
Sales              0
Sales_Next_Week    0
dtype: int64

**Create Features**

**Lag**

In [16]:
melt_train["Sales_Prev_Week"] = melt_train.groupby("Product_Code")["Sales"].shift(1)
melt_valid["Sales_Prev_Week"] = melt_valid.groupby("Product_Code")["Sales"].shift(1)

In [22]:
melt_train[melt_train["Product_Code"] == 1].head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week,Sales_Prev_Week
0,1,0,11,12.0,
811,1,1,12,10.0,11.0
1622,1,2,10,8.0,12.0
2433,1,3,8,13.0,10.0
3244,1,4,13,12.0,8.0


**Difference**

In [19]:
melt_train["Sales_Diff"] = melt_train.groupby("Product_Code")["Sales"].diff(1)
melt_valid["Sales_Diff"] = melt_valid.groupby("Product_Code")["Sales"].diff(1)

In [23]:
melt_train[melt_train["Product_Code"] == 1].head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week,Sales_Prev_Week,Sales_Diff
0,1,0,11,12.0,,
811,1,1,12,10.0,11.0,1.0
1622,1,2,10,8.0,12.0,-2.0
2433,1,3,8,13.0,10.0,-2.0
3244,1,4,13,12.0,8.0,5.0
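
The lag and difference features are closely related. As a minimal sketch on a hypothetical frame (values made up), the code below checks that `diff(1)` equals the current sales minus the lagged sales wherever a previous week exists.

```python
import pandas as pd

# Hypothetical frame with two products and three weeks each.
toy = pd.DataFrame({
    "Product_Code": [1, 1, 1, 2, 2, 2],
    "Week": [0, 1, 2, 0, 1, 2],
    "Sales": [11, 12, 10, 7, 6, 3],
})

grouped = toy.groupby("Product_Code")["Sales"]
toy["Sales_Prev_Week"] = grouped.shift(1)  # last week's sales
toy["Sales_Diff"] = grouped.diff(1)        # this week minus last week

# Compare only rows that have a previous week (the first week is NaN).
has_prev = toy["Sales_Prev_Week"].notna()
recomputed = toy.loc[has_prev, "Sales"] - toy.loc[has_prev, "Sales_Prev_Week"]
print((recomputed == toy.loc[has_prev, "Sales_Diff"]).all())
```

Because the difference is a linear combination of `Sales` and `Sales_Prev_Week`, it adds no new information, but it can give a tree-based model a more directly usable split variable.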


**Rolling Stats**

In [24]:
melt_train["Sales_Mean"] = melt_train.groupby("Product_Code")["Sales"].rolling(4).mean().reset_index(level=0, drop=True)
melt_valid["Sales_Mean"] = melt_valid.groupby("Product_Code")["Sales"].rolling(4).mean().reset_index(level=0, drop=True)

In [25]:
melt_train[melt_train["Product_Code"] == 1].head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week,Sales_Prev_Week,Sales_Diff,Sales_Mean
0,1,0,11,12.0,,,
811,1,1,12,10.0,11.0,1.0,
1622,1,2,10,8.0,12.0,-2.0,
2433,1,3,8,13.0,10.0,-2.0,10.25
3244,1,4,13,12.0,8.0,5.0,10.75
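
The `reset_index(level=0, drop=True)` call is what makes the rolling result line up with the original rows. The following sketch uses a hypothetical frame and a window of 2 instead of 4 to keep it short. It also shows a `transform`-based formulation that should give the same values.

```python
import pandas as pd

# Hypothetical frame, interleaved by product like the melted data.
toy = pd.DataFrame({
    "Product_Code": [1, 2, 1, 2, 1, 2],
    "Week": [0, 0, 1, 1, 2, 2],
    "Sales": [11, 7, 12, 6, 10, 3],
})

# groupby().rolling() returns a (Product_Code, original index) MultiIndex.
rolled = toy.groupby("Product_Code")["Sales"].rolling(2).mean()

# Dropping the group level leaves the original row labels, so the
# assignment aligns each mean with the row it was computed for.
toy["Sales_Mean"] = rolled.reset_index(level=0, drop=True)

# An alternative that keeps the original index throughout.
alt = toy.groupby("Product_Code")["Sales"].transform(lambda s: s.rolling(2).mean())
print(toy)
```

Note that the window includes the current week, so with a window of 4 the first three weeks of every product are NaN, as in the output above.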


**Metric**

In [26]:
import numpy as np

In [35]:
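def mape(y_true, y_pred):
  # Mean absolute percentage error: average of |y_true - y_pred| / |y_true|.
  ape = np.abs((y_true - y_pred) / y_true)
  # Non-finite terms (division by zero, NaN) are counted as an error of 1 (100%).
  ape[~np.isfinite(ape)] = 1.
  return np.mean(ape)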

In [32]:
def wmape(y_true, y_pred):
  # Weighted MAPE: total absolute error divided by total absolute actual sales.
  return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))
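
A small worked example may help show how the two metrics differ. The arrays below are made up for illustration, and the functions are copied from the cells above so the snippet runs on its own. A zero actual value makes the per-row percentage error infinite, which `mape` replaces with 1, whereas `wmape` only divides once by the total of the actuals.

```python
import numpy as np

def mape(y_true, y_pred):
    ape = np.abs((y_true - y_pred) / y_true)
    ape[~np.isfinite(ape)] = 1.0
    return np.mean(ape)

def wmape(y_true, y_pred):
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

# Made-up actuals and predictions; the second actual is zero on purpose.
y_true = np.array([10.0, 0.0, 5.0])
y_pred = np.array([8.0, 1.0, 5.0])

# Per-row errors are 0.2, inf (replaced by 1.0) and 0.0, so MAPE is about 0.4.
with np.errstate(divide="ignore", invalid="ignore"):
    m = mape(y_true, y_pred)

# Total absolute error is 3 and total actual sales are 15, so WMAPE is 0.2.
w = wmape(y_true, y_pred)
print(m, w)
```

With many low-volume products, zero-sales weeks are common, which is one plausible reason the MAPE values in this notebook are much larger than the WMAPE values.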

**Establish Baseline**

In [33]:
# Naive baseline: predict that next week's sales equal this week's sales.
y_pred = melt_train["Sales"]
y_true = melt_train["Sales_Next_Week"]

In [36]:
mape(y_true, y_pred)

0.6721872645511479

In [37]:
wmape(y_true, y_pred)

0.30816465612331645
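
The baseline above is a persistence forecast: next week is assumed to equal this week. As a minimal sketch on a hypothetical single-product series (values made up), the same calculation looks like this. Scoring the baseline on the validation weeks in the same way would be an assumption-free way to compare it with the model on equal footing.

```python
import numpy as np
import pandas as pd

def wmape(y_true, y_pred):
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

# Hypothetical four-week series for one product.
toy = pd.DataFrame({
    "Product_Code": [1, 1, 1, 1],
    "Week": [0, 1, 2, 3],
    "Sales": [10, 12, 9, 11],
})
toy["Sales_Next_Week"] = toy.groupby("Product_Code")["Sales"].shift(-1)

# Drop the last week, which has no observed target.
scored = toy.dropna(subset=["Sales_Next_Week"])

# Persistence forecast: this week's sales are the prediction for next week.
baseline = wmape(scored["Sales_Next_Week"], scored["Sales"])
print(baseline)
```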

**Model**

In [38]:
melt_train.head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week,Sales_Prev_Week,Sales_Diff,Sales_Mean
0,1,0,11,12.0,,,
1,2,0,7,6.0,,,
2,3,0,7,11.0,,,
3,4,0,12,8.0,,,
4,5,0,8,5.0,,,


In [39]:
features = ["Sales", "Sales_Prev_Week", "Sales_Diff", "Sales_Mean"]

In [40]:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer()
Xtr = imputer.fit_transform(melt_train[features])
Ytr = melt_train["Sales_Next_Week"]
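
The imputer is needed because the lag, difference, and rolling features are NaN in the first weeks of each product. As a minimal sketch with made-up arrays, the default `SimpleImputer` learns each column's mean during `fit_transform` and reuses those means in `transform`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up feature matrices with missing values.
X_train = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
X_new = np.array([[np.nan, np.nan]])

imputer = SimpleImputer()  # default strategy="mean"
X_train_filled = imputer.fit_transform(X_train)  # learns the column means
X_new_filled = imputer.transform(X_new)          # reuses the training means
print(imputer.statistics_)
```

Fitting on the training data only and calling `transform` on the validation data, as the notebook does, keeps validation statistics out of the model inputs.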

In [42]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100, random_state=0, n_jobs=6)
model.fit(Xtr, Ytr)

RandomForestRegressor(n_jobs=6, random_state=0)

**Evaluate**

In [43]:
Xval = imputer.transform(melt_valid[features])
Yval = melt_valid["Sales_Next_Week"]

In [44]:
pred = model.predict(Xval)

In [45]:
mape(Yval, pred)

0.6463137461455442

In [46]:
wmape(Yval, pred)

0.3004969729507602
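
One thing worth checking, offered as an observation rather than a correction: `melt_valid` was not passed through `dropna`, so `Yval` still contains NaN for each product's final week. With pandas inputs, `mape` as defined above turns those rows into an error of 1. The sketch below uses made-up numbers to show the effect and one possible way to mask such rows before scoring.

```python
import numpy as np
import pandas as pd

def mape(y_true, y_pred):
    ape = np.abs((y_true - y_pred) / y_true)
    ape[~np.isfinite(ape)] = 1.0
    return np.mean(ape)

# Made-up targets; the last row has no observed next week.
y_true = pd.Series([10.0, 5.0, np.nan])
y_pred = np.array([8.0, 5.0, 7.0])

# The NaN row is counted as an error of 1.0, inflating the score.
raw = mape(y_true, y_pred)

# Scoring only rows with an observed target avoids that.
mask = y_true.notna().to_numpy()
clean = mape(y_true[mask], y_pred[mask])
print(raw, clean)
```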

**Extend Model**

Extend the model to predict several steps ahead, here one and two weeks, by training on both targets at once.

In [48]:
melt_train["Sales_Next_2_Week"] = melt_train.groupby("Product_Code")["Sales"].shift(-2)
melt_valid["Sales_Next_2_Week"] = melt_valid.groupby("Product_Code")["Sales"].shift(-2)

In [49]:
melt_train[melt_train["Product_Code"] == 1].head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week,Sales_Prev_Week,Sales_Diff,Sales_Mean,Sales_Next_2_Week
0,1,0,11,12.0,,,,10.0
811,1,1,12,10.0,11.0,1.0,,8.0
1622,1,2,10,8.0,12.0,-2.0,,13.0
2433,1,3,8,13.0,10.0,-2.0,10.25,12.0
3244,1,4,13,12.0,8.0,5.0,10.75,14.0


In [50]:
melt_train = melt_train.dropna(subset=["Sales_Next_Week", "Sales_Next_2_Week"])

In [51]:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer()
Xtr = imputer.fit_transform(melt_train[features])
Ytr = melt_train[["Sales_Next_Week", "Sales_Next_2_Week"]]

In [52]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(n_estimators=100, random_state=0, n_jobs=6)
model.fit(Xtr, Ytr)

RandomForestRegressor(n_jobs=6, random_state=0)
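
Passing a two-column target works because `RandomForestRegressor` supports multi-output regression. Here is a minimal sketch on synthetic data (the features and targets are random and purely illustrative) showing that the prediction then has one column per target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic features and two synthetic targets, for illustration only.
rng = np.random.default_rng(0)
X = rng.random((50, 4))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] - X[:, 3]])

# A small forest keeps the example fast; the notebook uses 100 trees.
toy_model = RandomForestRegressor(n_estimators=10, random_state=0)
toy_model.fit(X, Y)

toy_pred = toy_model.predict(X[:5])
print(toy_pred.shape)  # (5, 2): one row per sample, one column per target
```

Column 0 corresponds to the first target passed to `fit` and column 1 to the second, which is why the notebook later reads the one-week and two-week forecasts from `pred[:, 0]` and `pred[:, 1]`.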

In [53]:
Xval = imputer.transform(melt_valid[features])
Yval = melt_valid[["Sales_Next_Week", "Sales_Next_2_Week"]]

In [54]:
pred = model.predict(Xval)

In [55]:
mape(Yval, pred)

Sales_Next_Week      0.647034
Sales_Next_2_Week    0.681146
dtype: float64

In [56]:
wmape(Yval, pred)

Sales_Next_Week      0.300301
Sales_Next_2_Week    0.310315
dtype: float64

**Predict New Data**

As long as we can build the same features we used for training, we can predict for any period, for example from the last observed week (week 51), whose future sales are unknown.

In [57]:
melt_valid.tail()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week,Sales_Prev_Week,Sales_Diff,Sales_Mean,Sales_Next_2_Week
42167,815,51,0,,2.0,-2.0,0.5,
42168,816,51,5,,6.0,-1.0,5.25,
42169,817,51,3,,4.0,-1.0,1.75,
42170,818,51,0,,2.0,-2.0,0.5,
42171,819,51,1,,0.0,1.0,0.25,


In [58]:
new_example = melt_valid[melt_valid["Week"] == 51].copy()
new_example.head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week,Sales_Prev_Week,Sales_Diff,Sales_Mean,Sales_Next_2_Week
41361,1,51,10,,5.0,5.0,7.0,
41362,2,51,0,,6.0,-6.0,3.0,
41363,3,51,7,,8.0,-1.0,9.25,
41364,4,51,8,,7.0,1.0,9.25,
41365,5,51,9,,8.0,1.0,8.25,


In [60]:
# Reuse the fitted imputer so the model receives the same plain-array input it was trained on.
pred = model.predict(imputer.transform(new_example[features]))
pred

array([[ 4.70014286,  8.85766667],
       [ 1.69609452,  3.01583297],
       [10.90119048,  5.80107143],
       ...,
       [ 0.74560423,  2.15993741],
       [ 0.52880028,  0.49030189],
       [ 0.27852871,  0.30830163]])

In [61]:
new_example["P_Sales_Next_Week"] = pred[:, 0]
new_example["P_Sales_Next_2_Week"] = pred[:, 1]

In [62]:
new_example.head()

Unnamed: 0,Product_Code,Week,Sales,Sales_Next_Week,Sales_Prev_Week,Sales_Diff,Sales_Mean,Sales_Next_2_Week,P_Sales_Next_Week,P_Sales_Next_2_Week
41361,1,51,10,,5.0,5.0,7.0,,4.700143,8.857667
41362,2,51,0,,6.0,-6.0,3.0,,1.696095,3.015833
41363,3,51,7,,8.0,-1.0,9.25,,10.90119,5.801071
41364,4,51,8,,7.0,1.0,9.25,,8.4925,13.952071
41365,5,51,9,,8.0,1.0,8.25,,9.38156,8.619524
