# Model Definition and Evaluation
## Table of Contents
1. [Model Selection](#model-selection)
2. [Feature Engineering](#feature-engineering)
3. [Hyperparameter Tuning](#hyperparameter-tuning)
4. [Implementation](#implementation)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Comparative Analysis](#comparative-analysis)


```python
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_percentage_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Dense, BatchNormalization, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
```


## Model Selection

Our baseline OLS regression model explains approximately 75% of the variance in Revenue, but it assumes linear relationships between the predictors and Revenue. Revenue generation is likely shaped by non-linear effects and interactions among lagged demand, seasonal components, product categories, and holiday effects. We therefore built a neural network, which can capture complex, non-linear relationships that a linear regression cannot model.



## Feature Engineering

First, we sorted the dataset chronologically to preserve temporal dependencies. Next, we derived a single product-group label from the one-hot encoded product indicators, converted boolean variables to integers, and forward-filled missing values. We then applied a time-based split (70% training, 15% validation, 15% test) and, finally, standardized all input features with StandardScaler, fitted on the training set only, to improve neural network convergence.
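
The product-group label can also be derived without a row-wise `apply`. The sketch below is an alternative, not the method used in this notebook: the helper name `product_group_vectorized` is hypothetical, and it assumes, as the notebook's own code comment states, that rows with no active `Product_*` indicator belong to the baseline `Product_1`. It relies on pandas `DataFrame.idxmax`, which returns the first column holding each row's maximum, so it keeps the same "first True column wins" rule as the loop.

```python
import pandas as pd


def product_group_vectorized(df: pd.DataFrame, product_cols: list[str],
                             default: str = "Product_1") -> pd.Series:
    """Hypothetical vectorized equivalent of a row-wise first-True-column lookup."""
    indicators = df[product_cols].astype(int)
    # idxmax returns the first column holding the row maximum, i.e. the first 1
    first_active = indicators.idxmax(axis=1)
    # Rows with no active indicator fall back to the baseline category
    return first_active.where(indicators.any(axis=1), default)


# Toy example with two one-hot columns (Product_1 is the dropped baseline)
toy = pd.DataFrame({
    "Product_2": [True, False, False],
    "Product_3": [False, True, False],
})
print(product_group_vectorized(toy, ["Product_2", "Product_3"]).tolist())
# ['Product_2', 'Product_3', 'Product_1']
```

For roughly 11,000 rows the row-wise `apply` is likely fast enough; the vectorized form mainly pays off on larger tables.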


```python
import tensorflow as tf

# Load the dataset
df = pd.read_csv('/workspaces/ml-project-template/final_dataset.csv')
TARGET_COL = "Revenue"
DATE_COL = "Date"

# Training hyperparameters
EPOCHS = 80
BATCH_SIZE = 64
LEARNING_RATE = 0.001

# Chronological split fractions (the test set takes the remainder)
TRAIN_FRAC = 0.70
VAL_FRAC = 0.15
TEST_FRAC = 0.15

# Seed NumPy and TensorFlow so weight initialization and dropout masks repeat
# across runs (GPU kernels may still introduce some nondeterminism)
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)

# Sort chronologically so that the positional split below respects time order
df[DATE_COL] = pd.to_datetime(df[DATE_COL])
df = df.sort_values(DATE_COL).reset_index(drop=True)

print("Loaded data shape:", df.shape)
print(df.head())

# Derive one product-group label from the one-hot Product_* indicator columns
product_cols = [c for c in df.columns if c.startswith("Product_")]

# If none of Product_2..Product_6 is True -> Product_1 (the baseline category)
def infer_product_group(row):
    for c in product_cols:
        if bool(row[c]):
            return c
    return "Product_1"

df["Product_Group"] = df.apply(infer_product_group, axis=1)

print("\nProduct group distribution:")
print(df["Product_Group"].value_counts())

# Convert boolean columns to int
bool_cols = df.select_dtypes(include=["bool"]).columns.tolist()
for c in bool_cols:
    df[c] = df[c].astype(int)

# Features exclude the target, the date, and the product-group label
feature_cols = [c for c in df.columns if c not in [TARGET_COL, DATE_COL, "Product_Group"]]

X = df[feature_cols].copy()
y = df[TARGET_COL].copy()

# Time-based split: earliest 70% for training, next 15% for validation, latest 15% for test
n = len(df)
train_end = int(n * TRAIN_FRAC)
val_end = int(n * (TRAIN_FRAC + VAL_FRAC))

X_train, y_train = X.iloc[:train_end], y.iloc[:train_end]
X_val, y_val = X.iloc[train_end:val_end], y.iloc[train_end:val_end]
X_test, y_test = X.iloc[val_end:], y.iloc[val_end:]

# Keep the validation rows' product groups for the per-group evaluation
val_groups = df["Product_Group"].iloc[train_end:val_end].reset_index(drop=True)

print("\nSplit sizes:")
print("Train:", X_train.shape, y_train.shape)
print("Val:  ", X_val.shape, y_val.shape)
print("Test: ", X_test.shape, y_test.shape)

# Scale the features: fit on the training set only so no validation/test statistics leak in
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)
```



```
Loaded data shape: (10896, 22)
        Date  Holiday  NextDayHoliday  IsWeekend  Month  KielerWeek  \
0 2013-07-01        1               1          0      7           0
1 2013-07-01        1               1          0      7           0
2 2013-07-01        1               1          0      7           0
3 2013-07-01        1               1          0      7           0
4 2013-07-01        1               1          0      7           0

   IsNewYearsEve  IsHalloween  t      lag_1  ...  year_sin1  year_cos1  \
0              0            0  0  1269.2491  ...        0.0        1.0
1              0            0  0  1269.2491  ...        0.0        1.0
2              0            0  0  1269.2491  ...        0.0        1.0
3              0            0  0  1269.2491  ...        0.0        1.0
4              0            0  0  1269.2491  ...        0.0        1.0

   year_sin2  year_cos2    Revenue  Product_2  Product_3  Product_4  \
0        0.0        1.0  1
```
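
As a quick check on the time-based split, the index arithmetic can be reproduced on its own. This is a minimal sketch: `split_sizes` is a hypothetical helper that mirrors the `int(n * fraction)` cut points in the cell above, and the row count of 10,896 is taken from the printed data shape.

```python
import math


def split_sizes(n: int, train_frac: float, val_frac: float) -> tuple[int, int, int]:
    """Row counts for a positional (chronological) train/val/test split."""
    train_end = int(n * train_frac)              # first validation row
    val_end = int(n * (train_frac + val_frac))   # first test row
    return train_end, val_end - train_end, n - val_end


train_n, val_n, test_n = split_sizes(10896, 0.70, 0.15)
print(train_n, val_n, test_n)    # 7627 1634 1635
# Mini-batches per training epoch at BATCH_SIZE = 64
print(math.ceil(train_n / 64))   # 120
```

The 120 batches per epoch agree with the `120/120` progress counter in the training log, and the test set absorbs the rounding remainder, which is why it is one row larger than the validation set.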

## Hyperparameter Tuning

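We tuned the neural network architecture manually by adjusting the number of hidden layers and neurons, the dropout rate, the learning rate, and the batch size. Batch normalization after the first two hidden layers stabilizes training, and a dropout rate of 20% after each of those layers reduces overfitting. We also added EarlyStopping (patience = 12), which monitors the validation loss, halts training once it has not improved for 12 consecutive epochs, and restores the weights from the best epoch, preventing unnecessary training and overfitting.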
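
To make the EarlyStopping behaviour concrete, here is a simplified, framework-free re-implementation of the patience logic. It is a sketch, not Keras's own code: the function name `simulate_early_stopping` is hypothetical, and it ignores the callback's `baseline` and `start_from_epoch` options. It reports the epoch at which training would halt and the epoch with the lowest validation loss, whose weights `restore_best_weights=True` keeps.

```python
def simulate_early_stopping(val_losses, patience=12, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    stop_epoch is the epoch after which training halts, or len(val_losses)
    if patience is never exhausted; best_epoch is the epoch with the lowest
    validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# The first seven val_loss values from the training log in the Implementation section
logged = [38337.4688, 12013.6328, 5864.0796, 4021.8914, 3838.3735, 4523.5854, 3705.8474]
print(simulate_early_stopping(logged))                        # (7, 7)

# Hypothetical plateau: no improvement after epoch 3
print(simulate_early_stopping([5.0, 4.0, 3.0] + [3.5] * 12))  # (15, 3)
```

In the logged run, the rise at epoch 6 only advances the counter to 1; epoch 7 improves again and resets it, so training is still far from the 12-epoch limit at that point.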

## Implementation

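The final model is a fully connected feed-forward network with three ReLU hidden layers of 128, 64, and 32 neurons and a single linear output neuron for regression. It is trained with the Adam optimizer (learning rate 0.001) to minimize the mean squared error, for up to 80 epochs with a batch size of 64.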
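
As a sanity check on the architecture, the parameter counts that `model.summary()` prints for the network in the next cell can be worked out by hand. This sketch assumes 20 input features, inferred from the printed data shape (10,896 rows × 22 columns) minus the Revenue and Date columns; the helper names are hypothetical. A Dense layer has inputs × units weights plus one bias per unit, and each BatchNormalization layer holds four vectors per unit: trainable gamma and beta, and a non-trainable moving mean and variance.

```python
def dense_params(n_in: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * units + units


def batchnorm_params(units: int) -> tuple[int, int]:
    """(trainable, non-trainable): gamma/beta vs. moving mean/variance."""
    return 2 * units, 2 * units


N_FEATURES = 20  # assumption: 22 columns minus Revenue and Date

trainable = non_trainable = 0
n_in = N_FEATURES
# (units, followed by BatchNormalization?) for each Dense layer in order
for units, has_bn in [(128, True), (64, True), (32, False), (1, False)]:
    trainable += dense_params(n_in, units)
    if has_bn:
        bn_trainable, bn_frozen = batchnorm_params(units)
        trainable += bn_trainable
        non_trainable += bn_frozen
    n_in = units  # Dropout layers add no parameters

print(trainable, non_trainable, trainable + non_trainable)  # 13441 384 13825
```

With roughly 13,800 parameters against 7,627 training rows, the regularization from dropout and early stopping is doing real work.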


```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer, BatchNormalization, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential()
model.add(InputLayer(input_shape=(X_train_scaled.shape[1],)))

# Hidden layer 1: 128 ReLU units, batch normalization, 20% dropout
model.add(Dense(128, activation="relu"))
model.add(BatchNormalization())
model.add(Dropout(0.2))

# Hidden layer 2: 64 ReLU units, batch normalization, 20% dropout
model.add(Dense(64, activation="relu"))
model.add(BatchNormalization())
model.add(Dropout(0.2))

# Hidden layer 3: 32 ReLU units, then one linear unit for the regression output
model.add(Dense(32, activation="relu"))
model.add(Dense(1))

model.compile(loss="mse", optimizer=Adam(learning_rate=LEARNING_RATE))

model.summary()

# Stop once val_loss has not improved for 12 epochs and roll back to the best weights
early_stop = EarlyStopping(
    monitor="val_loss",
    patience=12,
    restore_best_weights=True
)

history = model.fit(
    X_train_scaled, y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    callbacks=[early_stop],
    verbose=1
)

# Flatten the (n, 1) prediction arrays to 1-D for the metric functions
train_preds = model.predict(X_train_scaled).reshape(-1)
val_preds = model.predict(X_val_scaled).reshape(-1)
test_preds = model.predict(X_test_scaled).reshape(-1)
```




```
Epoch 1/80
120/120 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - loss: 50473.0078 - val_loss: 38337.4688
Epoch 2/80
120/120 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 19510.5117 - val_loss: 12013.6328
Epoch 3/80
120/120 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 7426.3403 - val_loss: 5864.0796
Epoch 4/80
120/120 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 5289.8110 - val_loss: 4021.8914
Epoch 5/80
120/120 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 4386.0137 - val_loss: 3838.3735
Epoch 6/80
120/120 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 4076.4854 - val_loss: 4523.5854
Epoch 7/80
120/120 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 3810.4397 - val_loss: 3705.8474
Epoch 8/80
120/120 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 3439.4719 - val_loss
```

## Evaluation Metrics

Model performance was evaluated using the Mean Absolute Percentage Error (MAPE). Because MAPE divides each absolute error by the actual value, it is undefined when revenue is zero, so rows with zero revenue were excluded from the calculation. MAPE provides an interpretable, percentage-based error metric that allows comparison across products and time periods.
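
A small worked example of the zero-excluding MAPE, written with NumPy only so the arithmetic is visible. The toy values are made up for illustration and `mape_nonzero` is a hypothetical name; for non-zero targets the result should match `mean_absolute_percentage_error(...) * 100` from scikit-learn, which the notebook uses.

```python
import numpy as np


def mape_nonzero(y_true, y_pred):
    """MAPE in percent over the rows where y_true != 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    if not mask.any():
        return float("nan")
    # Mean of |error| / |actual| over the kept rows, expressed in percent
    return float(np.mean(np.abs(y_true[mask] - y_pred[mask]) / np.abs(y_true[mask])) * 100)


y_true = [100.0, 0.0, 200.0]
y_pred = [125.0, 5.0, 100.0]
# The zero-revenue row is dropped; the remaining errors are 25% and 50%
print(mape_nonzero(y_true, y_pred))  # 37.5
```

Without the mask, the zero-revenue row would divide by zero and the metric would be unusable, regardless of how small its absolute error is.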


```python
from sklearn.metrics import mean_absolute_percentage_error

# MAPE is undefined for y_true == 0, so those rows are dropped before scoring
def mape_excluding_zeros(y_true, y_pred):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    mask = y_true != 0
    if mask.sum() == 0:
        return np.nan
    # sklearn returns a fraction; multiply by 100 to report a percentage
    return mean_absolute_percentage_error(y_true[mask], y_pred[mask]) * 100

train_mape = mape_excluding_zeros(y_train, train_preds)
val_mape = mape_excluding_zeros(y_val, val_preds)
test_mape = mape_excluding_zeros(y_test, test_preds)

print("\nMAPE (excluding y=0):")
print(f"Training MAPE:   {train_mape:.3f}%")
print(f"Validation MAPE: {val_mape:.3f}%")
print(f"Test MAPE:       {test_mape:.3f}%")

# Per-product-group MAPE on the validation set
val_results = pd.DataFrame({
    "y_true": y_val.reset_index(drop=True),
    "y_pred": val_preds,
    "Product_Group": val_groups
})

def group_mape(g):
    g = g[g["y_true"] != 0]
    if len(g) == 0:
        return np.nan
    return mean_absolute_percentage_error(g["y_true"], g["y_pred"]) * 100

# Select the value columns explicitly so apply() does not also receive the grouping column
group_mape_scores = (
    val_results.groupby("Product_Group")[["y_true", "y_pred"]]
    .apply(group_mape)
    .sort_index()
)

print("\nValidation MAPE by Product Group (excluding y=0):")
print(group_mape_scores)
```




```
MAPE (excluding y=0):
Training MAPE:   18.087%
Validation MAPE: 21.961%
Test MAPE:       36.391%

Validation MAPE by Product Group (excluding y=0):
Product_Group
Product_1    30.845304
Product_2    15.065353
Product_3    27.906969
Product_4    21.325962
Product_5    14.638818
Product_6          NaN
dtype: float64
```


## Comparative Analysis

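The neural network achieved a training MAPE of 18.09%, a validation MAPE of 21.96%, and a test MAPE of 36.39%. Across product groups, validation MAPE ranges from about 14.6% (Product_5) to 30.8% (Product_1); Product_6 has no non-zero revenue in the validation period, so its MAPE is undefined (NaN). Compared with the OLS baseline (RMSE ≈ 76.82, R² ≈ 0.75), the neural network substantially reduced the validation error: its validation MSE of ≈ 3049 corresponds to an RMSE of ≈ 55.2. This indicates that the neural network captures non-linear relationships between seasonal patterns, lag features, and product categories more effectively than the linear model. However, the rise in error from training (18.09%) to test (36.39%) suggests some degree of overfitting; because the split is chronological, the jump from validation to test may also reflect changes in revenue patterns during the most recent period. Despite this, the neural network shows better predictive performance than the baseline.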
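
The OLS baseline is quoted as an RMSE while the neural network is quoted as an MSE, so the short calculation below puts both on the same scale. It is a sketch that uses only the figures quoted in the text and assumes both were measured on the same validation period, which this notebook does not show.

```python
import math

OLS_RMSE = 76.82      # OLS baseline RMSE, as quoted in the text
NN_VAL_MSE = 3049.0   # neural network validation MSE, as quoted in the text

nn_val_rmse = math.sqrt(NN_VAL_MSE)
ols_mse = OLS_RMSE ** 2
rmse_reduction = 1 - nn_val_rmse / OLS_RMSE

print(f"NN validation RMSE: {nn_val_rmse:.2f}")     # 55.22
print(f"OLS MSE:            {ols_mse:.1f}")         # 5901.3
print(f"RMSE reduction:     {rmse_reduction:.1%}")  # 28.1%
```

A drop of roughly 28% in RMSE corresponds to the MSE falling by about half (≈ 3049 versus ≈ 5901), because MSE scales with the square of RMSE.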