# Deep Learning vs. Traditional Machine Learning
## Advanced Machine Learning in Finance - Individual Project

This project compares two machine learning models, N-BEATS and XGBoost, on the task of predicting companies' quarterly sales figures. The inputs are datasets that correlate with those sales, such as credit card transactions and geographical data.

The comparison covers forecast accuracy for the current quarter as well as for the next few quarters.

N-BEATS is a deep learning model designed specifically for time-series forecasting, while XGBoost is a popular gradient-boosted tree model used for many kinds of predictive modeling. This project aims to give insight into how the two models' forecasts compare and how each can be applied to sales prediction.

The outcome of this project will be a comparison between the two models, including a discussion of the results and their implications for businesses and organizations looking to improve their financial forecasting capabilities.
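A tree-based model such as XGBoost has no built-in notion of time, so a common way to use it for forecasting is to turn each series into a supervised table of lagged values. The sketch below illustrates that idea with pandas only. The helper name `add_lag_features`, the toy tickers, and the toy sales values are assumptions made for illustration; they are not part of this project's `Processor` class or its data.

```python
import pandas as pd


def add_lag_features(frame, target, group_col, n_lags, time_col="time"):
    """Return a copy of `frame` with `n_lags` lagged copies of `target` per group."""
    out = frame.sort_values([group_col, time_col]).copy()
    for k in range(1, n_lags + 1):
        # shift(k) within each group so one ticker's lags never leak into another's
        out[f"{target}_lag_{k}"] = out.groupby(group_col)[target].shift(k)
    return out


# Toy data for illustration only (hypothetical tickers and values).
quarters = pd.to_datetime(["2020-03-31", "2020-06-30", "2020-09-30", "2020-12-31"])
toy = pd.DataFrame({
    "ticker": ["AAA"] * 4 + ["BBB"] * 4,
    "time": list(quarters) * 2,
    "Sales_Actual_fiscal": [10.0, 11.0, 12.0, 13.0, 20.0, 22.0, 24.0, 26.0],
})

lagged = add_lag_features(toy, "Sales_Actual_fiscal", "ticker", n_lags=2)
# Rows without a full lag history cannot be used for training.
supervised = lagged.dropna()
```

Each remaining row of `supervised` pairs a quarter's sales with the two preceding quarters for the same ticker, which is the shape a gradient-boosted regressor expects.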

In [37]:
import os
import sys

import darts
import darts.metrics as metrics
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
from darts import TimeSeries
from darts.dataprocessing.transformers import (
    InvertibleMapper,
    Mapper,
    MissingValuesFiller,
    Scaler,
)
from darts.models import NBEATSModel

from utils import Processor

In [38]:
model = Processor('exabel_data')

In [39]:
model.files

['revenue.xlsx', '.DS_Store', 'spend_amounts_aggregated.xlsx']

In [41]:
revenue = model.create_df(model.revenue)
spendings = model.create_df(model.spendings)
df = model.merge_spendings_revenue(spendings, revenue)
df = model.create_stationary_covariates(df)

In [42]:
import holidays

In [55]:
def add_holiday(dff):
    """Add 0/1 flags for US holidays, weekends, and workdays based on 'time'."""
    us_holidays = holidays.US()

    dff['time'] = pd.to_datetime(dff['time'])

    # Keep boolean masks first so that ~ is a logical NOT, not a bitwise NOT on ints
    is_holiday = dff['time'].apply(lambda d: d in us_holidays).astype(bool)
    is_weekend = dff['time'].dt.weekday.isin([5, 6])

    dff['is_holiday'] = is_holiday.astype(int)
    dff['is_weekend'] = is_weekend.astype(int)
    # A workday is any day that is neither a holiday nor a weekend
    dff['is_workday'] = (~is_holiday & ~is_weekend).astype(int)

    return dff

df = add_holiday(df)
df.head(10)

Unnamed: 0,mic,ticker,time,nw_total_sales_a_total,nw_total_sales_b_total,Sales_Actual_fiscal,Sales_Estimate_fiscal,year,month,quarter,date,is_holiday,is_weekend,is_workday
0,XNAS,FLWS,2019-03-31,54356.41,,248.413,243.11867,2019,3,1,2019-03-31,0,1,0
1,XNAS,FLWS,2019-06-30,54514.94,,259.398,255.147,2019,6,2,2019-06-30,0,1,0
2,XNAS,FLWS,2019-12-29,41047.13,,605.642,610.4694,2019,12,4,2019-12-29,0,1,0
3,XNAS,FLWS,2020-03-29,60962.1,,278.776,266.56613,2020,3,1,2020-03-29,0,1,0
4,XNAS,FLWS,2020-06-28,58101.76,33479.59,417.956,372.81314,2020,6,2,2020-06-28,0,1,0
5,XNAS,FLWS,2020-09-27,,16974.41,283.772,266.052,2020,9,3,2020-09-27,0,1,0
6,XNAS,FLWS,2020-12-27,27247.28,19563.16,877.256,755.28217,2020,12,4,2020-12-27,0,1,0
7,XNAS,FLWS,2021-03-28,34826.96,30724.94,474.234,412.80344,2021,3,1,2021-03-28,0,1,0
8,XNAS,FLWS,2021-06-27,39150.59,29499.7,486.983,474.08414,2021,6,2,2021-06-27,0,1,0
9,XNAS,FLWS,2021-09-26,52785.6,,309.373,295.86444,2021,9,3,2021-09-26,0,1,0


In [48]:
np.unique(df['is_holiday'])

array([0, 1])

In [54]:
df.iloc[3000:3300,:]

Unnamed: 0,mic,ticker,time,nw_total_sales_a_total,nw_total_sales_b_total,Sales_Actual_fiscal,Sales_Estimate_fiscal,year,month,quarter,date,is_holiday,is_weekend,is_workday
3000,XNYS,SNA,2019-12-28,912840.31,178849.37,955.2,965.57840,2019,12,4,2019-12-28,0,1,0
3001,XNYS,SNA,2020-03-28,899439.87,193961.24,852.2,877.69165,2020,3,1,2020-03-28,0,1,0
3002,XNYS,SNA,2020-06-27,846255.13,179664.88,724.3,726.66850,2020,6,2,2020-06-27,0,1,0
3003,XNYS,SNA,2020-09-26,904680.97,162231.58,941.6,817.13544,2020,9,3,2020-09-26,0,1,0
3004,XNYS,SNA,2021-01-02,941790.85,175937.69,1074.4,942.78125,2021,1,1,2021-01-02,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3295,XNYS,HD,2018-07-29,12239590.41,,30463.0,30036.09600,2018,7,3,2018-07-29,0,1,0
3296,XNYS,HD,2018-10-28,11223884.69,,26302.0,26242.13500,2018,10,4,2018-10-28,0,1,0
3297,XNYS,HD,2019-02-03,10434209.09,,26491.0,26577.60400,2019,2,1,2019-02-03,0,1,0
3298,XNYS,HD,2019-05-05,11319447.13,3819356.39,26381.0,26373.12500,2019,5,2,2019-05-05,0,1,0


In [27]:
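# Group the data by the 'ticker' column
grouped = df.groupby('ticker')
# Count missing values per column for each ticker
# (GroupBy objects have no isna(), so compute the mask first and then group it)
missing_per_ticker = df.isna().groupby(df['ticker']).sum()
# Show the first 10 rows of each ticker
grouped.head(10)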

Unnamed: 0,mic,ticker,time,nw_total_sales_a_total,nw_total_sales_b_total,Sales_Actual_fiscal,Sales_Estimate_fiscal,year,month,quarter,date
0,XNAS,FLWS,2019-03-31,54356.41,,248.413,243.11867,2019,3,1,2019-03-31
1,XNAS,FLWS,2019-06-30,54514.94,,259.398,255.14700,2019,6,2,2019-06-30
2,XNAS,FLWS,2019-12-29,41047.13,,605.642,610.46940,2019,12,4,2019-12-29
3,XNAS,FLWS,2020-03-29,60962.10,,278.776,266.56613,2020,3,1,2020-03-29
4,XNAS,FLWS,2020-06-28,58101.76,33479.59,417.956,372.81314,2020,6,2,2020-06-28
...,...,...,...,...,...,...,...,...,...,...,...
3996,XNAS,LULU,2019-11-03,83532.26,37889.27,916.138,899.27386,2019,11,4,2019-11-03
3997,XNAS,LULU,2020-02-02,170870.52,70525.88,1397.491,1382.60900,2020,2,1,2020-02-02
3998,XNAS,LULU,2020-05-03,91155.77,38798.29,651.962,691.62920,2020,5,2,2020-05-03
3999,XNAS,LULU,2020-08-02,139472.10,60155.86,902.942,844.47534,2020,8,3,2020-08-02


In [None]:
test_a.tail(30)

In [None]:
series = model.create_darts_series_from_df(df)
series = model.series_fill_missing_vals(series)
series_train, series_val = model.series_train_test(series=series)
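`series_train_test` is a helper from this project's `Processor` class, and its implementation is not shown here. Assuming it performs a chronological hold-out (the usual choice for time series, since a random split would leak future values into training), the idea can be sketched in plain Python as follows. The function name `chronological_split` and the toy values are hypothetical.

```python
def chronological_split(values, val_size):
    """Split an ordered sequence into (train, validation), keeping time order.

    The last `val_size` observations form the validation set; everything
    before them is used for training.
    """
    if not 0 < val_size < len(values):
        raise ValueError("val_size must be between 1 and len(values) - 1")
    return values[:-val_size], values[-val_size:]


# Toy quarterly values for illustration only.
toy_series = [10.0, 11.0, 12.0, 13.0, 14.0]
toy_train, toy_val = chronological_split(toy_series, val_size=2)
```

Every training observation then precedes every validation observation, so the validation error reflects forecasting genuinely unseen quarters.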

In [None]:
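# Use a separate name so the Processor instance stored in `model` is not overwritten
nbeats_model = NBEATSModel(input_chunk_length=24, output_chunk_length=12)
nbeats_model.fit(series_train)

In [None]:
# Forecast from the end of the training series so the predictions line up with series_val
preds = nbeats_model.predict(len(series_val), series=series_train)
preds.plot(label='preds')
series_val.plot(label='ground truth')
plt.legend()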

In [None]:
preds

In [None]:
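# fill_missing_vals is not defined in this notebook; use the darts transformer imported above
preds = MissingValuesFiller().transform(preds)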

In [None]:
metrics.mape(actual_series=series_val, pred_series=preds)
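For reference, the mean absolute percentage error averages the absolute error of each forecast relative to the actual value and expresses the result in percent: MAPE = (100 / n) * sum(|actual - predicted| / |actual|). The plain-Python version below is a minimal sketch of that formula, not the darts implementation; the function and the toy numbers are for illustration only.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, over paired observations."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The ratio is undefined when an actual value is zero
        raise ValueError("actual values must be non-zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Toy numbers for illustration only (not taken from the dataset).
toy_actual = [100.0, 200.0, 400.0]
toy_pred = [110.0, 180.0, 400.0]
toy_score = mape(toy_actual, toy_pred)
```

Because each error is scaled by the actual value, MAPE allows comparison across companies with very different sales levels, but it becomes unstable when actual values are close to zero.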