-----
# Baseline Forecasting
-----

## Set Up
---

In [1]:
import numpy as np
import pandas as pd

# plotting
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import seaborn as sns

# stats
from statsmodels.api import tsa # time series analysis
import statsmodels.api as sm

## Utility Functions
-----

In [43]:
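def calculate_mape(actual, predicted):
    """Return the mean absolute percentage error (MAPE) as a percentage.

    Undefined where `actual` is zero, because each error is divided by `actual`.
    """
    # Error of each prediction relative to the actual value
    err = actual - predicted
    abs_percent_err = np.abs(err / actual)
    # Average over all points and scale to percent
    mape = abs_percent_err.mean() * 100

    return mape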
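
As a quick sanity check of `calculate_mape`, the sketch below applies it to a small hand-made series. The values are illustrative and not taken from the dataset, and the function is redefined so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def calculate_mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    err = actual - predicted
    abs_percent_err = np.abs(err / actual)
    return abs_percent_err.mean() * 100

# Illustrative values only (not taken from the dataset)
actual = pd.Series([100.0, 200.0, 400.0])
predicted = pd.Series([110.0, 180.0, 400.0])

# Per-point errors are 10%, 10% and 0%, so the mean is about 6.67%
toy_mape = calculate_mape(actual, predicted)
print(f"Toy MAPE: {toy_mape:.2f}%")
```

Note that an actual value of exactly zero would make the corresponding term infinite, so the metric is only meaningful when the actual values stay away from zero.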

## Data Loading
----

In [5]:
# Loading of train and test data from data folder
train_df = pd.read_csv('../../data/train_set.csv', index_col=0)
test_df = pd.read_csv('../../data/test_set.csv', index_col=0)

## Baseline Forecasting
----

### Using the Mean

In [15]:
# Concatenating train/test df to get full date range - required for predictions
full_data_range = pd.concat([train_df, test_df]).index

In [16]:
# Creating baseline predictions, filling array with mean of training set
# Making assumption that future predictions are close to the mean of the training set
baseline_pred = np.full(full_data_range.shape, train_df['seasonal_difference'].mean())

In [17]:
# Wrapping the constant predictions in a Series indexed by date so they align with train/test data
predictions = pd.Series(data=baseline_pred, index=full_data_range)

In [28]:
# Plot to visualise the training data, test data and baseline prediction
fig = go.Figure()
fig.add_trace(go.Scatter(x=train_df.index, y=train_df['seasonal_difference'], mode='lines', name="Train"))
fig.add_trace(go.Scatter(x=test_df.index, y=test_df['seasonal_difference'], mode='lines', name="Test"))
fig.add_trace(go.Scatter(x=predictions.index, y=predictions, mode='lines', name="Mean Prediction"))

fig.update_layout(
    yaxis_title="Seasonal Difference", 
    xaxis_title="Date",
    title="Baseline Predictions"
)
fig.show()

**Plot Description:**

The plot shows the baseline forecast, where the prediction for every date is the mean of the training set. The assumption is that future values are equal to the average of the historical (training) observations.
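
The mean baseline described above can be sketched on toy data as follows. This is a minimal illustration, not the notebook's actual data: `train_toy` and `test_toy` are hypothetical stand-ins for `train_df` and `test_df`, and the dates and values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for train_df / test_df with a date index
train_toy = pd.DataFrame(
    {"seasonal_difference": [1.0, -2.0, 4.0, 1.0]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)
test_toy = pd.DataFrame(
    {"seasonal_difference": [3.0, -1.0]},
    index=pd.date_range("2020-01-05", periods=2, freq="D"),
)

# The baseline is a single number: the mean of the training column only
train_mean = train_toy["seasonal_difference"].mean()

# Repeat that constant over the combined train + test date range
full_index = pd.concat([train_toy, test_toy]).index
toy_predictions = pd.Series(np.full(full_index.shape, train_mean), index=full_index)

print(toy_predictions)
```

Computing the mean from the training column alone keeps information from the test period out of the forecast.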



## Evaluation
------

### MAPE - Mean Absolute Percentage Error

In [49]:
# Calculating MAPE using calculate_mape function
train_data_mape = calculate_mape(train_df['seasonal_difference'], predictions[train_df.index])
test_data_mape = calculate_mape(test_df['seasonal_difference'], predictions[test_df.index])

# Printing scores
print(f"Train MAPE Score: {round(train_data_mape, 2)}%")
print(f"Test MAPE Score:  {round(test_data_mape, 2)}%")

Train MAPE Score: 208.28%
Test MAPE Score:  231.94%


**Comment:**

MAPE is the mean of the absolute percentage errors between the predicted and actual values.

The MAPE scores for both the training and test sets are high: on average, the absolute error is roughly twice the magnitude of the actual value.

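The test score is slightly worse than the training score. Because the baseline has only one parameter (the training mean), this gap more likely reflects a difference between the training and test periods than overfitting in the usual sense.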

Since this is the score for the baseline forecast, poor MAPE scores are expected, especially for a stock dataset. This motivates the use of more sophisticated forecasting methods.
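
One caveat when reading MAPE scores this large: because MAPE divides by the actual value, points close to zero can dominate the average, and a differenced series may contain many such points. The sketch below uses made-up numbers (not the dataset) to show how a single near-zero actual value inflates the score even when the absolute error is the same everywhere.

```python
import numpy as np
import pandas as pd

# Made-up values: the same absolute error (0.5) at every point
actual = pd.Series([10.0, -8.0, 0.1, 12.0])
predicted = actual + 0.5

# Per-point absolute percentage error, computed as in calculate_mape
abs_percent_err = np.abs((actual - predicted) / actual) * 100

# The third point (actual = 0.1) contributes far more than the others
print(abs_percent_err.round(2).tolist())
print(f"MAPE: {abs_percent_err.mean():.2f}%")
```

Whether this effect explains part of the scores reported above would need to be checked against the actual `seasonal_difference` values.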

## Conclusion
-----

#TODOS

- intro/conc
- code comments