Naïve Model: Predict the next point in the series to be the last observed value.

Mean Model: Predict the next point in the series to be the mean of all previous observed values.
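
Both definitions above describe a one-step-ahead forecast that is updated as each new observation arrives. A minimal sketch of that rolling form follows, assuming the series is a pandas Series in time order. The function name `one_step_baselines` and the demo values are illustrative and not part of the notebook.

```python
import pandas as pd

def one_step_baselines(series: pd.Series) -> pd.DataFrame:
    """Rolling one-step-ahead naive and mean forecasts (illustrative helper)."""
    out = pd.DataFrame({"actual": series})
    # Naive: the forecast for time t is the observation at time t-1.
    out["naive"] = series.shift(1)
    # Mean: the forecast for time t is the mean of all observations up to t-1.
    out["mean"] = series.expanding().mean().shift(1)
    # The first row has no history to forecast from, so drop it.
    return out.iloc[1:]

# Made-up demo values, only to show the shape of the result
demo = pd.Series([10.0, 12.0, 11.0, 15.0])
baselines = one_step_baselines(demo)
print(baselines)
```

This differs from a flat forecast that holds a single value fixed over the whole test horizon: here each prediction uses every observation available up to the previous time step.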

In [None]:
import pandas as pd

# Load the uploaded CSV file
data = pd.read_csv('incoming_daily_till2023may_interpolated.csv')

# Display the first few rows of the dataset
data.head()


In [5]:
# Split chronologically into training and test sets (first 80% train, last 20% test)
train_size = int(len(data) * 0.8)
train, test = data.iloc[:train_size], data.iloc[train_size:]

# Display the size of each dataset
len(train), len(test)


(704, 177)

In [6]:
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Naïve forecast: every test point is predicted as the last observed training value
# (a flat forecast over the whole test horizon; it is not updated as test data arrives)
naive_forecast = test.copy()
naive_forecast['forecast'] = train['volume'].iloc[-1]

# Mean forecast: every test point is predicted as the mean of the training values
mean_forecast = test.copy()
mean_forecast['forecast'] = train['volume'].mean()

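def mean_absolute_percentage_error(y_true, y_pred):
    # Undefined when y_true contains zeros and inflated by values close to zero
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def theil_u(y_true, y_pred):
    # RMSE divided by the sum of the root mean squares of the actuals and the forecasts.
    # This is the Theil's U1 form (bounded between 0 and 1, lower is better),
    # although the metrics table below labels the column "Theil's U2".
    n = len(y_true)
    num = np.sqrt((1/n) * np.sum(np.square(y_true - y_pred)))
    den = np.sqrt((1/n) * np.sum(np.square(y_true))) + np.sqrt((1/n) * np.sum(np.square(y_pred)))
    return num / den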

# Calculate evaluation metrics for each model
metrics = {
    'Model': ['Naïve', 'Mean'],
    'MAE': [
        mean_absolute_error(test['volume'], naive_forecast['forecast']),
        mean_absolute_error(test['volume'], mean_forecast['forecast'])
    ],
    'RMSE': [
        np.sqrt(mean_squared_error(test['volume'], naive_forecast['forecast'])),
        np.sqrt(mean_squared_error(test['volume'], mean_forecast['forecast']))
    ],
    'MAPE': [
        mean_absolute_percentage_error(test['volume'], naive_forecast['forecast']),
        mean_absolute_percentage_error(test['volume'], mean_forecast['forecast'])
    ],
    'MSE': [
        mean_squared_error(test['volume'], naive_forecast['forecast']),
        mean_squared_error(test['volume'], mean_forecast['forecast'])
    ],
    "Theil's U2": [
        theil_u(test['volume'], naive_forecast['forecast']),
        theil_u(test['volume'], mean_forecast['forecast'])
    ]
}

# Convert metrics to DataFrame for better display
metrics_df = pd.DataFrame(metrics)
metrics_df


   Model         MAE        RMSE        MAPE            MSE  Theil's U2
0  Naïve  571.637006  718.251915  803.773886  515885.813206    0.302777
1   Mean  686.395017  750.143559  379.550004  562715.358706    0.440389
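
The `theil_u` function in the cell above normalizes RMSE by the magnitudes of the actuals and the forecasts, which corresponds to the U1 form of Theil's statistic. For comparison, here is a minimal sketch of one commonly cited U2 form, which compares the forecast's relative errors with those of a no-change forecast. Definitions of U2 vary between sources, so treat this as an assumption rather than the notebook's method. The name `theil_u2` is illustrative, and the sketch assumes the actuals contain no zeros and are not constant.

```python
import numpy as np

def theil_u2(y_true, y_pred):
    """One common form of Theil's U2 (illustrative; definitions vary)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    prev = y_true[:-1]  # actual value one step before each forecast target
    # Relative error of the forecast for each step after the first
    # (y_pred[0] is not used because it has no preceding actual value)
    model_err = (y_pred[1:] - y_true[1:]) / prev
    # Relative error of a no-change forecast for the same steps
    naive_err = (y_true[1:] - prev) / prev
    return np.sqrt(np.sum(model_err ** 2)) / np.sqrt(np.sum(naive_err ** 2))

# Made-up demo values: a forecast equal to the previous actual value
actual = np.array([10.0, 12.0, 11.0, 15.0])
no_change = np.array([10.0, 10.0, 12.0, 11.0])
u2_no_change = theil_u2(actual, no_change)
u2_perfect = theil_u2(actual, actual)
```

Under this definition a value below 1 means the forecast beats the no-change forecast, a value of 1 means it matches it, and a value above 1 means it does worse.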


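The Naïve model has a lower MAE, RMSE, MSE, and Theil's U than the Mean model, so by those measures it is the better of the two baselines on this test set. MSE is the square of RMSE, so those two metrics always rank models the same way. The column labeled "Theil's U2" is computed with the U1 form of the statistic.
However, the Mean model has a lower MAPE (about 380% versus about 804%). MAPE values this far above 100% suggest that some test values are close to zero, where percentage errors become very large, so MAPE is probably an unreliable guide for this dataset.
These baseline models provide a starting point for evaluating more sophisticated forecasting models. Depending on the context and the domain, you might want to consider more advanced models such as ARIMA, exponential smoothing, or machine learning-based approaches, and check that they beat both baselines on the same test split.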
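
The disagreement between MAPE and the absolute-error metrics can be reproduced with a small example. The sketch below uses made-up numbers, not the notebook's data, and the helper name `mape` is illustrative. It shows how a single near-zero actual value can dominate MAPE even when the forecast has the lower MAE.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Made-up actuals with one value close to zero
actual = np.array([100.0, 100.0, 100.0, 1.0])
# Forecast A is exact except at the near-zero point, where it is off by 20
forecast_a = np.array([100.0, 100.0, 100.0, 21.0])
# Forecast B is off by 20 on the three large points and exact at the small one
forecast_b = np.array([80.0, 80.0, 80.0, 1.0])

mae_a = np.mean(np.abs(actual - forecast_a))
mae_b = np.mean(np.abs(actual - forecast_b))
mape_a = mape(actual, forecast_a)
mape_b = mape(actual, forecast_b)
print(mae_a, mae_b, mape_a, mape_b)
```

Forecast A has the lower MAE (5 versus 15) but a MAPE of 500% against roughly 15% for forecast B, because the error at the near-zero point is divided by an actual value of 1.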