# Model Definition and Evaluation
## Table of Contents
1. [Model Selection](#model-selection)
2. [Feature Engineering](#feature-engineering)
3. [Hyperparameter Tuning](#hyperparameter-tuning)
4. [Implementation](#implementation)
5. [Evaluation Metrics](#evaluation-metrics)
6. [Comparative Analysis](#comparative-analysis)


In [6]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report
# Import models you're considering


## Model Selection

For this revenue forecasting task, the most relevant model families are linear regression (high interpretability, strong baseline), tree ensembles such as Random Forest or gradient boosting machines (well suited to nonlinear interactions in tabular data), and neural networks (useful when many engineered features interact in complex ways). We selected a feedforward neural network (a Keras Sequential MLP with Dense hidden layers and Batch Normalization) as the primary model. We preferred it because the features are tabular but interaction-heavy (e.g., calendar and weather effects), and an MLP can learn nonlinear cross effects that a linear model captures only if they are engineered explicitly.
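The claim about cross effects can be illustrated on synthetic data. The sketch below is not part of the project pipeline: the feature names `is_weekend` and `temperature`, the coefficient 5.0, and the helper `r_squared` are all made up for illustration. It fits ordinary least squares twice, once with additive features only and once with an explicit product term, to show how much of a pure interaction an additive linear model can miss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two interacting drivers (hypothetical names and ranges).
is_weekend = rng.integers(0, 2, size=500).astype(float)
temperature = rng.uniform(0.0, 30.0, size=500)

# The target depends only on the product of the two features (a pure cross effect).
y = 5.0 * is_weekend * temperature


def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residual = y - X1 @ coef
    return 1.0 - residual.var() / y.var()


additive = np.column_stack([is_weekend, temperature])
with_cross = np.column_stack([additive, is_weekend * temperature])

r2_additive = r_squared(additive, y)
r2_with_cross = r_squared(with_cross, y)
print(f"R^2 with additive features only: {r2_additive:.3f}")
print(f"R^2 with an explicit cross term: {r2_with_cross:.3f}")
```

An MLP does not need the product column to be supplied by hand, which is the motivation stated above. Whether that flexibility pays off on the real feature set is an empirical question that the validation metrics have to answer.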



## Feature Engineering

[Describe any additional feature engineering you've performed beyond what was done for the baseline model.]


In [7]:
# Load the dataset
# Replace 'your_dataset.csv' with the path to your actual dataset
df = pd.read_csv('your_dataset.csv')

# Perform any feature engineering steps
# Example: df['new_feature'] = df['feature1'] + df['feature2']

# Feature and target variable selection
X = df[['your', 'selected', 'features']]
y = df['target_variable']

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


FileNotFoundError: [Errno 2] No such file or directory: 'your_dataset.csv'

## Hyperparameter Tuning

[Discuss any hyperparameter tuning methods you've applied, such as Grid Search or Random Search, and the rationale behind them.]


In [None]:
# Implement hyperparameter tuning
# Example using GridSearchCV with a DecisionTreeClassifier
# param_grid = {'max_depth': [2, 4, 6, 8]}
# grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
# grid_search.fit(X_train, y_train)


## Implementation

[Implement the final model(s) you've selected based on the above steps.]


In [None]:
# Implement the final model (MLP from MF_neural_net_estimation)
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Dense, BatchNormalization
from tensorflow.keras.optimizers import Adam

# Load prepared features/labels from pickle files
subdirectory = "pickle_data"
training_features = pd.read_pickle(f"{subdirectory}/training_features.pkl")
validation_features = pd.read_pickle(f"{subdirectory}/validation_features.pkl")
training_labels = pd.read_pickle(f"{subdirectory}/training_labels.pkl")
validation_labels = pd.read_pickle(f"{subdirectory}/validation_labels.pkl")

model = Sequential([
    InputLayer(shape=(training_features.shape[1],)),
    BatchNormalization(),
    Dense(10, activation="relu"),
    Dense(4, activation="relu"),
    Dense(1),
])

model.compile(loss="mse", optimizer=Adam(learning_rate=0.001))
history = model.fit(
    training_features,
    training_labels,
    epochs=20,
    validation_data=(validation_features, validation_labels),
)

model.save("python_model.h5")

2026-02-23 18:29:51.521992: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2026-02-23 18:29:51.549483: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2026-02-23 18:29:53.225268: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-02-23 18:29:56.117744: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To tur

Epoch 1/20

2026-02-23 18:29:58.201608: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)

239/239 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - loss: 53055.5273 - val_loss: 49176.6992
Epoch 2/20
239/239 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 29690.0840 - val_loss: 16204.8867
Epoch 3/20
239/239 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 12076.5068 - val_loss: 10658.3516
Epoch 4/20
239/239 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 8096.8789 - val_loss: 7847.6914
Epoch 5/20
239/239 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 6660.0972 - val_loss: 7088.0107
Epoch 6/20
239/239 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 6033.3896 - val_loss: 6798.0093
Epoch 7/20
239/239 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 5964.6709 - val_loss: 6641.3784
Epoch 8/20
239/239 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 5831.7998 - val_loss: 6468.52



## Evaluation Metrics

We evaluate performance using Mean Absolute Percentage Error (MAPE) and Mean Squared Error (MSE). MAPE is the primary metric because it is scale-free and easy to interpret as a percent error across product groups and time. MSE is tracked during training because it aligns with the loss function used to optimize the neural network and is sensitive to larger errors, which helps penalize large revenue misses. We report metrics on both training and validation sets to monitor generalization and detect overfitting.
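For reference, the two metrics can be written as follows. The notation is introduced here and does not come from the original notebook: $y_i$ is the true revenue, $\hat{y}_i$ the prediction, $n$ the number of rows, and $\varepsilon$ the clamp used by this notebook's `mape` helper (default `1e-8`).

```latex
$$
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{\lvert y_i - \hat{y}_i \rvert}{\max\left(\lvert y_i \rvert, \varepsilon\right)},
\qquad
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$
```

One consequence of the clamped denominator: a row with $y_i = 0$ contributes $\lvert \hat{y}_i \rvert / \varepsilon$ to the sum, so a handful of zero targets can dominate the average even when every other row is predicted well.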

In [None]:
# Evaluate the trained MLP with MAPE, overall and per product group (Warengruppe)
import numpy as np
import pandas as pd

# -----------------------------
# MAPE FUNCTION (denominator clamped at eps)
# Note: the clamp only prevents division by zero. A true value of 0
# still contributes |error| / eps, which can dominate the mean.
# -----------------------------
def mape(y_true, y_pred, eps=1e-8):
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = np.asarray(y_pred).reshape(-1)
    denom = np.maximum(np.abs(y_true), eps)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)


# -----------------------------
# PREDICTIONS
# -----------------------------
training_predictions = model.predict(training_features).reshape(-1)
validation_predictions = model.predict(validation_features).reshape(-1)

training_labels = np.asarray(training_labels).reshape(-1)
validation_labels = np.asarray(validation_labels).reshape(-1)


# -----------------------------
# OVERALL MAPE
# -----------------------------
print(f"MAPE on the Training Data: {mape(training_labels, training_predictions):.2f}%")
print(f"MAPE on the Validation Data: {mape(validation_labels, validation_predictions):.2f}%")


# -----------------------------
# RECONSTRUCT WARENGRUPPE FROM DUMMIES
# WG_2 ... WG_6 exist, WG_1 is the reference group
# -----------------------------
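def reconstruct_wg(df):
    wg_cols = [c for c in df.columns if c.startswith("WG_")]
    if not wg_cols:
        # Without dummy columns every row silently falls into the reference group
        print("Warning: no 'WG_' columns found; all rows are assigned to group 1")

    # Default group = 1 (reference)
    wg = np.ones(len(df), dtype=int)

    # Overwrite the default wherever a dummy column is set
    for col in wg_cols:
        group_number = int(col.split("_")[1])
        wg[df[col] == 1] = group_number

    return wg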


training_wg = reconstruct_wg(training_features)
validation_wg = reconstruct_wg(validation_features)


# -----------------------------
# BUILD DATAFRAMES FOR GROUPED EVALUATION
# -----------------------------
train_df = pd.DataFrame({
    "Warengruppe": training_wg,
    "y_true": training_labels,
    "y_pred": training_predictions
})

val_df = pd.DataFrame({
    "Warengruppe": validation_wg,
    "y_true": validation_labels,
    "y_pred": validation_predictions
})


# -----------------------------
# MAPE PER WARENGRUPPE
# -----------------------------


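# Select the value columns explicitly so apply() does not operate on the grouping column
mape_val_wg = (
    val_df
    .groupby("Warengruppe")[["y_true", "y_pred"]]
    .apply(lambda x: mape(x["y_true"], x["y_pred"]))
    .reset_index(name="MAPE_Validation")
)

mape_per_wg = mape_val_wg.sort_values("Warengruppe")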

print("\nMAPE per product category (Warengruppe):")
print(mape_per_wg)


239/239 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step
69/69 ━━━━━━━━━━━━━━━━━━━━ 0s 988us/step
MAPE on the Training Data: 6957365970.34%
MAPE on the Validation Data: 4494038732.43%

MAPE per product category (Warengruppe):
   Warengruppe  MAPE_Validation
0            1     4.494039e+09


## Comparative Analysis

We compare the pooled linear regression in MF_20251113 with the MLP neural network from MF_neural_net_estimation. The linear model is a log1p-OLS with calendar effects (weekend, holidays, Kieler Woche), Fourier seasonality terms, lagged revenue features, and Warengruppe dummies, plus a group-wise bias correction after back-transforming predictions. This gives strong interpretability (coefficients by feature) and stable behavior, but it is limited to additive linear effects in log space and may miss higher-order interactions.
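The group-wise bias correction is mentioned but not shown in this notebook. The sketch below illustrates one common option, a per-group ratio-of-means rescaling estimated on training data. It is an assumption for illustration and not necessarily the method used in MF_20251113. The function names `groupwise_bias_factors` and `apply_bias_correction` and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd


def groupwise_bias_factors(groups, y_true, y_pred_log):
    """Per-group ratio mean(actual) / mean(expm1(predicted log)), from training rows."""
    frame = pd.DataFrame({
        "group": np.asarray(groups),
        "y_true": np.asarray(y_true, dtype=float),
        "y_back": np.expm1(np.asarray(y_pred_log, dtype=float)),
    })
    means = frame.groupby("group")[["y_true", "y_back"]].mean()
    return means["y_true"] / means["y_back"]


def apply_bias_correction(groups, y_pred_log, factors):
    """Back-transform log1p predictions and rescale them by the group factor.

    Groups that were not seen when estimating the factors keep a factor of 1.0.
    """
    scale = pd.Series(np.asarray(groups)).map(factors).fillna(1.0).to_numpy()
    return np.expm1(np.asarray(y_pred_log, dtype=float)) * scale


# Toy training data (made up): group 1 is under-predicted, group 2 is exact.
train_groups = [1, 1, 2, 2]
train_true = [100.0, 200.0, 10.0, 30.0]
train_pred_log = np.log1p([90.0, 180.0, 10.0, 30.0])

factors = groupwise_bias_factors(train_groups, train_true, train_pred_log)

# Apply to new rows, including a group (3) that has no estimated factor.
corrected = apply_bias_correction([1, 2, 3], np.log1p([90.0, 10.0, 50.0]), factors)
print(factors)
print(corrected)
```

Estimating the factors on training rows only and applying them unchanged to validation rows keeps the correction from leaking validation information into the model.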

The neural network uses the same prepared feature set (stored in pickle files) and learns nonlinear interactions through hidden layers and batch normalization. It is typically more flexible for complex, interaction-heavy effects (e.g., weather × calendar × product group), but it is less interpretable and more sensitive to data scaling and training choices. We evaluate both models with MAPE and MSE on the validation set; MSE is included because it matches the loss used to train the neural network. The final choice balances interpretability and operational stability (linear model) against potential accuracy gains from nonlinear effects (neural network).

In [None]:
# Comparative analysis: linear regression (MF_20251113-style) vs neural net (MF_neural_net_estimation)
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import Adam

def mape(y_true, y_pred, eps=1e-8):
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = np.asarray(y_pred).reshape(-1)
    denom = np.maximum(np.abs(y_true), eps)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

# -----------------------------
# Load tabular features used by the linear regression
# -----------------------------
model_df = pd.read_csv("model_df.csv")
model_df = model_df.replace([np.inf, -np.inf], np.nan).dropna()

exclude_cols = {"Umsatz", "Warengruppe", "log_Umsatz"}
predictors = [c for c in model_df.columns if c not in exclude_cols]

# Reproduce the same split procedure (seed=42, shuffle, 70/20/10)
model_df = model_df.sample(frac=1, random_state=42).reset_index(drop=True)
n_total = len(model_df)
n_train = int(0.7 * n_total)
n_val = int(0.20 * n_total)

train_df = model_df.iloc[:n_train]
val_df = model_df.iloc[n_train:n_train + n_val]

X_train = train_df[predictors]
X_val = val_df[predictors]
y_train_log = np.log1p(train_df["Umsatz"])
y_val = val_df["Umsatz"].to_numpy()

# Fit linear regression on log1p(Umsatz) like MF_20251113
try:
    import statsmodels.api as sm
    # Force the intercept column in both matrices so their shapes always match
    X_train_sm = sm.add_constant(X_train, has_constant="add")
    X_val_sm = sm.add_constant(X_val, has_constant="add")
    results = sm.OLS(y_train_log, X_train_sm).fit()
    y_pred_log = results.predict(X_val_sm)
except ModuleNotFoundError:
    from sklearn.linear_model import LinearRegression
    lr = LinearRegression()
    lr.fit(X_train, y_train_log)
    y_pred_log = lr.predict(X_val)

y_pred_lr = np.expm1(y_pred_log)
mape_lr = mape(y_val, y_pred_lr)
mse_lr = mean_squared_error(y_val, y_pred_lr)

# -----------------------------
# Neural network predictions on the same validation split (pickle files)
# -----------------------------
subdirectory = "pickle_data"
validation_features = pd.read_pickle(f"{subdirectory}/validation_features.pkl")
validation_labels = pd.read_pickle(f"{subdirectory}/validation_labels.pkl")

# Load without compilation to avoid legacy H5 metric deserialization issues
nn_model = load_model("python_model.h5", compile=False)
nn_model.compile(loss="mse", optimizer=Adam(learning_rate=0.001))
nn_pred_val = nn_model.predict(validation_features).reshape(-1)
nn_true_val = np.asarray(validation_labels).reshape(-1)

mape_nn = mape(nn_true_val, nn_pred_val)
mse_nn = mean_squared_error(nn_true_val, nn_pred_val)

comparison = pd.DataFrame({
    "Model": ["Linear Regression (log1p)", "Neural Net (MLP)"],
    "MAPE_Validation": [mape_lr, mape_nn],
    "MSE_Validation": [mse_lr, mse_nn],
})

print(comparison)

69/69 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
                       Model  MAPE_Validation  MSE_Validation
0  Linear Regression (log1p)     2.213580e+01     4498.273482
1           Neural Net (MLP)     4.494039e+09     5245.824342
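The two rows of this table are only comparable if both models were scored on the same validation rows. The linear model's targets come from a re-created split of `model_df.csv`, while the neural network's targets come from the pickle files, and nothing in the cell verifies that these match. A minimal check, assuming row order may differ between the two sources, is sketched below. The helper name `same_targets` is hypothetical.

```python
import numpy as np


def same_targets(a, b, atol=1e-6):
    """Return True if both target vectors hold the same values, ignoring row order."""
    a = np.sort(np.asarray(a, dtype=float).reshape(-1))
    b = np.sort(np.asarray(b, dtype=float).reshape(-1))
    return a.shape == b.shape and bool(np.allclose(a, b, atol=atol))


# Toy vectors (made up) to show the behavior.
print(same_targets([10.0, 20.0, 30.0], [30.0, 10.0, 20.0]))  # same values, different order
print(same_targets([10.0, 20.0, 30.0], [10.0, 20.0]))        # different length
```

In the comparison cell this would be called as `same_targets(y_val, nn_true_val)`. If it returns `False`, the MAPE and MSE columns describe different data and should not be read as a head-to-head result.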
