# 🎯 Retail Inventory Forecast - FINAL OPTIMAL

**Project:** Forecasting Units Sold for retail stores  
**Model:** Bidirectional LSTM with 2 layers  
**Status:** Best balance found between prediction variance (std ~10) and overfitting (val/train loss ratio < 1.3)

**Optimizations compared to Experiment 2.1:**
- Batch size: 256 → 384 (smoother gradients)
- Dropout: 0.2 → 0.25 (less overfitting)
- L2 regularization: 0.0001 → 0.00015 (stronger regularization)
- Patience: 10 → 8 (stop earlier)


In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from dataclasses import dataclass
from typing import Tuple
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

import warnings
warnings.filterwarnings('ignore')

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 5)

In [None]:
@dataclass
class Config:
    """Central configuration - all parameters are defined here!"""
    # Paths
    data_path: str
    target_col: str
    
    # Sequence & training
    seq_length: int
    test_size: float
    batch_size: int
    epochs: int
    patience: int
    
    # Model architecture
    use_bidirectional: bool
    lstm_layers: int
    lstm_units_1: int
    lstm_units_2: int
    dense_units: int
    dense_activation: str
    dropout: float
    l2_reg: float
    
    # Optimizer
    learning_rate: float
    use_lr_scheduler: bool
    lr_factor: float
    lr_patience: int
    lr_min: float
    
    # Feature Engineering
    lag_periods: list
    rolling_windows: list

# 🎯 FINAL OPTIMAL CONFIG - balance between prediction variance and overfitting
config = Config(
    # Paths
    data_path="/Users/mag/Library/Mobile Documents/com~apple~CloudDocs/Studium/7. Semester/Machine und Deep Learning/Bestands_Forecast/retail_store_inventory.csv",
    target_col='Units Sold',
    
    # Sequence & training
    seq_length=60,              # Proven: enough context without too much smoothing
    test_size=0.2,
    batch_size=384,             # 🔥 Sweet spot between 256 and 512
    epochs=100,
    patience=8,                 # 🔥 Stop earlier: 10 → 8
    
    # Model architecture - LARGE CAPACITY (important for prediction variance!)
    use_bidirectional=True,     # Can pick up more patterns
    lstm_layers=2,              # Hierarchical learning
    lstm_units_1=256,           # Large capacity for variance (KEEP!)
    lstm_units_2=128,           # Proven
    dense_units=64,
    dense_activation='relu',    # Standard for dense layers
    dropout=0.25,               # 🔥 Slightly increased: 0.2 → 0.25
    l2_reg=0.00015,             # 🔥 Slightly increased: 0.0001 → 0.00015
    
    # Optimizer
    learning_rate=0.0002,       # Low for stable training
    use_lr_scheduler=True,
    lr_factor=0.7,
    lr_patience=8,
    lr_min=0.00001,
    
    # Feature Engineering
    lag_periods=[1, 7, 30],     # 1 day, 1 week, 1 month
    rolling_windows=[7, 30]     # Week & month
)

print("="*70)
print("🎯 FINAL OPTIMAL: Std 10+ & Overfitting <1.3")
print("="*70)
print(f"Sequence:         {config.seq_length} days")
print(f"LSTM:             {config.lstm_layers} layers, {config.lstm_units_1}→{config.lstm_units_2} units ({'bidirectional' if config.use_bidirectional else 'unidirectional'})")
print(f"Dense:            {config.dense_units} units, {config.dense_activation}")
print(f"Regularization:   Dropout={config.dropout} 🔥, L2={config.l2_reg} 🔥")
print(f"Learning rate:    {config.learning_rate}")
print(f"Training:         {config.epochs} epochs, patience={config.patience} 🔥")
print(f"Batch size:       {config.batch_size} 🔥")
print()
print("🔥 = tuned relative to the previous version")
print("Goal: overfitting ratio 1.43 → <1.3 while keeping Std ~10")
print("="*70)


## 1. Load data

In [None]:
def load_data(path: str) -> pd.DataFrame:
    """Loads the raw data and parses the date column."""
    df = pd.read_csv(path)
    df['Date'] = pd.to_datetime(df['Date'])
    
    print(f"✓ Data loaded: {df.shape}")
    print(f"  Date range: {df['Date'].min()} to {df['Date'].max()}")
    print(f"  Stores: {df['Store ID'].nunique()}, Products: {df['Product ID'].nunique()}")
    
    return df

df = load_data(config.data_path)

## 2. Feature Engineering

In [None]:
def create_temporal_features(df: pd.DataFrame, config: Config) -> pd.DataFrame:
    """Creates time-based features PER store-product group."""
    
    for (store, product), group in df.groupby(['Store_ID_Encoded', 'Product_ID_Encoded']):
        idx = group.index
        
        # Lag Features
        for lag in config.lag_periods:
            df.loc[idx, f'{config.target_col}_lag_{lag}'] = group[config.target_col].shift(lag)
        
        # Rolling Features
        for window in config.rolling_windows:
            df.loc[idx, f'{config.target_col}_rolling_mean_{window}'] = group[config.target_col].rolling(window).mean()
            df.loc[idx, f'{config.target_col}_rolling_std_{window}'] = group[config.target_col].rolling(window).std()
        
        # Diff Features
        df.loc[idx, f'{config.target_col}_diff_1'] = group[config.target_col].diff(1)
    
    return df

def engineer_features(df: pd.DataFrame, config: Config) -> pd.DataFrame:
    """Creates features and encodes categorical variables."""
    df['Store_ID_Encoded'] = df['Store ID'].astype('category').cat.codes
    df['Product_ID_Encoded'] = df['Product ID'].astype('category').cat.codes
    
    df = df.sort_values(['Store_ID_Encoded', 'Product_ID_Encoded', 'Date']).reset_index(drop=True)
    
    df = create_temporal_features(df, config)
    
    df = df.dropna().reset_index(drop=True)
    df = pd.get_dummies(df, columns=['Category', 'Region', 'Weather Condition', 'Seasonality'])
    df = df.drop(columns=['Store ID', 'Product ID'])
    
    print(f"✓ Features: {df.shape[1]} columns | Rows: {df.shape[0]}")
    return df

df = engineer_features(df, config)
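
The per-group lag, rolling, and diff features can also be computed without an explicit Python loop over the groups. The following is a minimal sketch on a toy frame, assuming the same column names as in the cell above; the helper name `add_group_features` and the toy values are illustrative and not part of the notebook.

```python
import pandas as pd

def add_group_features(df, target_col, lag_periods, rolling_windows,
                       group_cols=("Store_ID_Encoded", "Product_ID_Encoded")):
    """Hypothetical helper: lag, rolling, and diff features computed per group."""
    out = df.copy()
    grouped = df.groupby(list(group_cols))[target_col]
    for lag in lag_periods:
        out[f"{target_col}_lag_{lag}"] = grouped.shift(lag)
    for window in rolling_windows:
        out[f"{target_col}_rolling_mean_{window}"] = grouped.transform(
            lambda s, w=window: s.rolling(w).mean())
        out[f"{target_col}_rolling_std_{window}"] = grouped.transform(
            lambda s, w=window: s.rolling(w).std())
    out[f"{target_col}_diff_1"] = grouped.diff(1)
    return out

# Toy data: two stores, one product, three days each (made-up values)
toy = pd.DataFrame({
    "Store_ID_Encoded": [0, 0, 0, 1, 1, 1],
    "Product_ID_Encoded": [0, 0, 0, 0, 0, 0],
    "Units Sold": [10.0, 12.0, 15.0, 100.0, 90.0, 95.0],
})
features = add_group_features(toy, "Units Sold", lag_periods=[1], rolling_windows=[2])
print(features[["Units Sold", "Units Sold_lag_1",
                "Units Sold_rolling_mean_2", "Units Sold_diff_1"]])
```

The first row of each group has no lag value, which shows that nothing leaks from one store-product series into the next.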


## 3. Train/Test Split

In [None]:
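def train_test_split(df: pd.DataFrame, test_size: float) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Splits the data chronologically into train and test sets."""
    # The rows are sorted by store, product, and date, so a positional split
    # would separate groups instead of time periods. Split on a cutoff date instead.
    dates = df['Date'].drop_duplicates().sort_values().reset_index(drop=True)
    split_date = dates.iloc[int(len(dates) * (1 - test_size))]
    df_train = df[df['Date'] < split_date].copy()
    df_test = df[df['Date'] >= split_date].copy()
    
    print(f"✓ Train: {len(df_train)}, Test: {len(df_test)} (cutoff date: {split_date})")
    return df_train, df_test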

df_train, df_test = train_test_split(df, config.test_size)
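
Whether a split is really time-based can be checked directly: all training dates must precede all test dates. The sketch below compares a positional split with a date-cutoff split on toy data that is sorted by store first, as the frame is after feature engineering. The helper `is_chronological_split` and the toy frame are assumptions for illustration.

```python
import pandas as pd

def is_chronological_split(df_train, df_test, date_col="Date"):
    """Hypothetical check: True if every training date precedes every test date."""
    return df_train[date_col].max() < df_test[date_col].min()

# Two stores with the same five days each, sorted by store and then by date
dates = pd.date_range("2022-01-01", periods=5, freq="D")
toy = pd.DataFrame({
    "Store_ID_Encoded": [0] * 5 + [1] * 5,
    "Date": list(dates) * 2,
})

# Positional 80/20 split on the group-sorted rows
split_idx = int(len(toy) * 0.8)
pos_train, pos_test = toy.iloc[:split_idx], toy.iloc[split_idx:]

# Date-based split: the last day goes to the test set
cutoff = dates[4]
date_train, date_test = toy[toy["Date"] < cutoff], toy[toy["Date"] >= cutoff]

print(is_chronological_split(pos_train, pos_test))
print(is_chronological_split(date_train, date_test))
```

On group-sorted rows the positional split puts late dates of the first store into training and only the tail of the last store into the test set, so the check fails for it and passes for the date-based split.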

## 4. Scaling

In [None]:
def scale_data(df_train: pd.DataFrame, df_test: pd.DataFrame, target_col: str) -> Tuple[pd.DataFrame, pd.DataFrame, StandardScaler, StandardScaler, list]:
    """Scales features and target; both scalers are fitted on the training data only."""
    feature_cols = [col for col in df_train.columns 
                    if col not in [target_col, 'Date', 'Store_ID_Encoded', 'Product_ID_Encoded']]
    
    scaler_X = StandardScaler()
    scaler_y = StandardScaler()
    
    df_train[feature_cols] = scaler_X.fit_transform(df_train[feature_cols])
    df_test[feature_cols] = scaler_X.transform(df_test[feature_cols])
    
    df_train[[target_col]] = scaler_y.fit_transform(df_train[[target_col]])
    df_test[[target_col]] = scaler_y.transform(df_test[[target_col]])
    
    print(f"✓ {len(feature_cols)} features scaled")
    return df_train, df_test, scaler_X, scaler_y, feature_cols

df_train, df_test, scaler_X, scaler_y, feature_cols = scale_data(df_train, df_test, config.target_col)
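
What the scaling step does to the target can be traced by hand. This is a small NumPy sketch, not the notebook's code: it standardizes with the training mean and the population standard deviation, which is the arithmetic `StandardScaler` applies, and then inverts the transform. The numbers are made up.

```python
import numpy as np

train = np.array([100.0, 120.0, 140.0])  # made-up training targets
test = np.array([160.0])                 # made-up test target

# Statistics come from the training data only (np.std defaults to ddof=0)
mean, std = train.mean(), train.std()

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std        # test reuses the training statistics

# Inverse transform: back to original units, as done before computing MAE/RMSE
restored = test_scaled * std + mean
print(train_scaled, test_scaled, restored)
```

Because the test value lies outside the training range, its scaled value is larger than any scaled training value; this is expected when the statistics are not refitted on the test data.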

## 5. Create sequences

In [None]:
def create_sequences(df: pd.DataFrame, feature_cols: list, target_col: str, seq_length: int) -> Tuple[np.ndarray, np.ndarray]:
    """Creates sequences PER store-product group."""
    X_all, y_all = [], []
    
    for (store, product), group in df.groupby(['Store_ID_Encoded', 'Product_ID_Encoded']):
        features = group[feature_cols].values
        target = group[target_col].values
        
        for i in range(len(group) - seq_length):
            X_all.append(features[i:i + seq_length])
            y_all.append(target[i + seq_length])
    
    return np.array(X_all), np.array(y_all)

X_train, y_train = create_sequences(df_train, feature_cols, config.target_col, config.seq_length)
X_test, y_test = create_sequences(df_test, feature_cols, config.target_col, config.seq_length)

print(f"✓ Sequences: Train {X_train.shape} | Test {X_test.shape}")
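
The windowing logic is easiest to see on a tiny example. The sketch below applies the same loop to a single toy group with 5 rows and 2 features; the helper name `sliding_windows` and the data are illustrative.

```python
import numpy as np

def sliding_windows(features, target, seq_length):
    """Hypothetical helper mirroring the windowing loop for one group."""
    X, y = [], []
    for i in range(len(features) - seq_length):
        X.append(features[i:i + seq_length])  # rows i .. i+seq_length-1
        y.append(target[i + seq_length])      # the value right after the window
    return np.array(X), np.array(y)

features = np.arange(10, dtype=float).reshape(5, 2)  # 5 time steps, 2 features
target = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

X, y = sliding_windows(features, target, seq_length=3)
print(X.shape)
print(y)
```

A group with `n` rows yields `n - seq_length` samples, so a group shorter than or equal to `seq_length` contributes none.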

## 6. LSTM Model (Fast Version)

In [None]:
def build_lstm_model(config: Config, n_features: int) -> models.Sequential:
    """Builds the LSTM model from the config."""
    
    l2_regularizer = tf.keras.regularizers.l2(config.l2_reg) if config.l2_reg > 0 else None
    
    model_layers = [layers.Input(shape=(config.seq_length, n_features))]
    
    # First LSTM layer
    if config.use_bidirectional:
        model_layers.append(layers.Bidirectional(
            layers.LSTM(
                config.lstm_units_1, 
                return_sequences=(config.lstm_layers > 1),
                kernel_regularizer=l2_regularizer,
                recurrent_regularizer=l2_regularizer
            )
        ))
    else:
        model_layers.append(
            layers.LSTM(
                config.lstm_units_1, 
                return_sequences=(config.lstm_layers > 1),
                kernel_regularizer=l2_regularizer,
                recurrent_regularizer=l2_regularizer
            )
        )
    
    model_layers.append(layers.Dropout(config.dropout))
    
    # Second LSTM layer (optional)
    if config.lstm_layers > 1:
        if config.use_bidirectional:
            model_layers.append(layers.Bidirectional(
                layers.LSTM(config.lstm_units_2, return_sequences=False,
                           kernel_regularizer=l2_regularizer, recurrent_regularizer=l2_regularizer)
            ))
        else:
            model_layers.append(
                layers.LSTM(config.lstm_units_2, return_sequences=False,
                           kernel_regularizer=l2_regularizer, recurrent_regularizer=l2_regularizer)
            )
        
        model_layers.append(layers.Dropout(config.dropout))
    
    # Dense Layers
    model_layers.append(layers.Dense(config.dense_units, activation=config.dense_activation, 
                                     kernel_regularizer=l2_regularizer))
    model_layers.append(layers.Dense(1))
    
    model = models.Sequential(model_layers)
    
    optimizer = tf.keras.optimizers.Adam(learning_rate=config.learning_rate)
    model.compile(optimizer=optimizer, loss='mse', metrics=['mae'])
    
    return model

model = build_lstm_model(config, n_features=len(feature_cols))
model.summary()
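
The parameter counts reported by `model.summary()` can be cross-checked by hand. The sketch below uses the standard count for an LSTM layer with bias (four gates, each with an input kernel, a recurrent kernel, and a bias vector) and doubles it for the bidirectional wrapper. The feature count of 30 is an assumption for illustration; the real value is `len(feature_cols)`.

```python
def lstm_params(input_dim, units, bidirectional=True):
    """Trainable parameters of an LSTM layer with bias."""
    per_direction = 4 * (units * (input_dim + units) + units)
    return per_direction * (2 if bidirectional else 1)

def dense_params(input_dim, units):
    """Trainable parameters of a dense layer with bias."""
    return input_dim * units + units

n_features = 30  # assumption for illustration only

layer1 = lstm_params(n_features, 256)  # bidirectional output has 2 * 256 features
layer2 = lstm_params(2 * 256, 128)     # bidirectional output has 2 * 128 features
dense = dense_params(2 * 128, 64)
output = dense_params(64, 1)

print(layer1, layer2, dense, output, layer1 + layer2 + dense + output)
```

Under this assumption almost all parameters sit in the two recurrent layers, which is where the L2 penalty on kernel and recurrent weights applies.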


## 7. Training

In [None]:
def train_model(model: models.Sequential, X_train: np.ndarray, y_train: np.ndarray, 
                X_test: np.ndarray, y_test: np.ndarray, config: Config):
    """Trains the model with early stopping and an LR scheduler."""
    callbacks = [
        EarlyStopping(patience=config.patience, restore_best_weights=True, monitor='val_loss', verbose=1)
    ]
    
    if config.use_lr_scheduler:
        callbacks.append(
            tf.keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss', factor=config.lr_factor, 
                patience=config.lr_patience, min_lr=config.lr_min, verbose=1
            )
        )
    
    print("🚀 Training starts:")
    print(f"   Epochs: {config.epochs} | Batch: {config.batch_size} | LR: {config.learning_rate}")
    print("-" * 50)
    
    history = model.fit(
        X_train, y_train,
        validation_data=(X_test, y_test),
        epochs=config.epochs,
        batch_size=config.batch_size,
        callbacks=callbacks,
        verbose=1
    )
    return history

history = train_model(model, X_train, y_train, X_test, y_test, config)
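
The config sets both `patience` and `lr_patience` to 8, so it is worth tracing how the two callbacks interact. The following is a deliberately simplified pure-Python model, not the Keras implementation: it ignores options such as `min_delta` and `cooldown`, and it resets both counters on any improvement of the validation loss. The loss values are made up.

```python
def simulate_callbacks(val_losses, stop_patience, lr_patience, lr, factor, min_lr):
    """Simplified model of early stopping combined with reduce-on-plateau."""
    best = float("inf")
    stop_wait = lr_wait = 0
    reductions = []  # epochs at which the learning rate was reduced
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stop_wait, lr_wait = loss, 0, 0
        else:
            stop_wait += 1
            lr_wait += 1
        if lr_wait >= lr_patience:
            lr = max(lr * factor, min_lr)
            lr_wait = 0
            reductions.append(epoch)
        if stop_wait >= stop_patience:
            return epoch, lr, reductions
    return len(val_losses), lr, reductions

# Five improving epochs followed by a plateau (made-up values)
losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.65] * 20

print(simulate_callbacks(losses, 8, 8, 0.0002, 0.7, 0.00001))
print(simulate_callbacks(losses, 8, 4, 0.0002, 0.7, 0.00001))
```

In this simplified model, equal patience values mean the first learning-rate reduction falls on the same epoch at which training stops, so no epoch is trained at the reduced rate. With a scheduler patience of 4, training gets four epochs at the lower rate before stopping. Whether this matters in practice depends on the actual loss curve.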


In [None]:
# Training Summary
print("\n" + "="*50)
print("📈 TRAINING COMPLETE")
print("="*50)
print(f"Best val loss: {min(history.history['val_loss']):.4f}")
print(f"Best val MAE: {min(history.history['val_mae']):.4f}")

train_loss = history.history['loss'][-1]
val_loss = history.history['val_loss'][-1]
ratio = val_loss / train_loss

print(f"\n🔍 Overfitting check: ratio = {ratio:.2f}")
if ratio < 1.1:
    print("  ✓ No overfitting")
elif ratio < 1.3:
    print("  ⚠️  Slight overfitting")
else:
    print("  ❌ Strong overfitting")
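
The ratio above is taken at the last epoch. Since `EarlyStopping` is configured with `restore_best_weights=True`, the weights that are kept may come from an earlier epoch, and the ratio at that epoch can look quite different. A minimal sketch with made-up loss curves; the helper `overfitting_ratio` and the plain dict standing in for `history.history` are assumptions.

```python
def overfitting_ratio(history, epoch):
    """Validation loss divided by training loss at a given epoch index."""
    return history["val_loss"][epoch] / history["loss"][epoch]

# Made-up loss curves: validation loss bottoms out at epoch index 2
history_dict = {
    "loss":     [1.00, 0.80, 0.60, 0.50, 0.40],
    "val_loss": [1.10, 0.90, 0.72, 0.75, 0.80],
}

val = history_dict["val_loss"]
best_epoch = min(range(len(val)), key=val.__getitem__)
last_epoch = len(val) - 1

print(best_epoch, round(overfitting_ratio(history_dict, best_epoch), 2))
print(last_epoch, round(overfitting_ratio(history_dict, last_epoch), 2))
```

Reporting both values makes it clear whether a high ratio describes the restored model or only the epochs trained after the best one.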

## 8. Evaluation

In [None]:
def evaluate_model(model: models.Sequential, X_test: np.ndarray, y_test: np.ndarray, 
                   scaler_y: StandardScaler) -> Tuple[np.ndarray, np.ndarray, dict]:
    """Evaluates the model in original units (after inverse scaling)."""
    y_pred = model.predict(X_test)
    
    y_test_original = scaler_y.inverse_transform(y_test.reshape(-1, 1)).flatten()
    y_pred_original = scaler_y.inverse_transform(y_pred).flatten()
    
    mae = np.mean(np.abs(y_test_original - y_pred_original))
    rmse = np.sqrt(np.mean((y_test_original - y_pred_original)**2))
    
    metrics = {
        'mae': mae, 'rmse': rmse,
        'pred_mean': y_pred_original.mean(), 'pred_std': y_pred_original.std(),
        'pred_min': y_pred_original.min(), 'pred_max': y_pred_original.max(),
        'actual_mean': y_test_original.mean(), 'actual_std': y_test_original.std()
    }
    
    return y_test_original, y_pred_original, metrics

y_test_original, y_pred_original, metrics = evaluate_model(model, X_test, y_test, scaler_y)

print("="*50)
print("📊 RESULTS")
print("="*50)
print(f"MAE:  {metrics['mae']:.2f} | RMSE: {metrics['rmse']:.2f}")
print(f"\nPredictions: Mean={metrics['pred_mean']:.2f}, Std={metrics['pred_std']:.2f}")
print(f"Actual:      Mean={metrics['actual_mean']:.2f}, Std={metrics['actual_std']:.2f}")
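
The standard deviation of the predictions is reported next to MAE and RMSE because a model that collapses to the mean produces a prediction std near zero. The toy comparison below illustrates this with made-up numbers; the helper `summarize` is hypothetical and only repeats the metric formulas from the evaluation cell.

```python
import numpy as np

def summarize(y_true, y_pred):
    """MAE, RMSE, and the spread of the predictions."""
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return {"mae": float(mae), "rmse": float(rmse), "pred_std": float(y_pred.std())}

actual = np.array([80.0, 120.0, 150.0, 200.0, 250.0])        # made-up targets
constant_pred = np.full_like(actual, actual.mean())          # always predicts the mean
varied_pred = np.array([100.0, 130.0, 150.0, 180.0, 220.0])  # follows the signal

print(summarize(actual, constant_pred))
print(summarize(actual, varied_pred))
```

The constant predictor has zero spread, while the second predictor tracks the targets and has both a lower error and a non-zero std.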

## 9. Visualization

In [None]:
def plot_results(history, y_test_original: np.ndarray, y_pred_original: np.ndarray):
    """Plots the loss curves, a predicted-vs-actual scatter, and a sample of the series."""
    fig, axes = plt.subplots(1, 3, figsize=(16, 5))
    
    # Loss
    axes[0].plot(history.history['loss'], label='Train')
    axes[0].plot(history.history['val_loss'], label='Val')
    axes[0].set_title('Loss over epochs')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('MSE Loss')
    axes[0].legend()
    axes[0].grid(alpha=0.3)
    
    # Scatter
    idx = np.random.choice(len(y_test_original), min(500, len(y_test_original)), replace=False)
    axes[1].scatter(y_test_original[idx], y_pred_original[idx], alpha=0.5, s=20)
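    # Diagonal = perfect prediction; limits follow the data instead of fixed values
    lims = [min(y_test_original.min(), y_pred_original.min()),
            max(y_test_original.max(), y_pred_original.max())]
    axes[1].plot(lims, lims, 'r--', lw=2)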
    axes[1].set_title('Predicted vs Actual')
    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].grid(alpha=0.3)
    
    # Time series
    n = min(200, len(y_test_original))
    axes[2].plot(y_test_original[:n], label='Actual', alpha=0.7)
    axes[2].plot(y_pred_original[:n], label='Predicted', alpha=0.7)
    axes[2].set_title(f'Time series (first {n} samples)')
    axes[2].set_xlabel('Sample')
    axes[2].set_ylabel('Units Sold')
    axes[2].legend()
    axes[2].grid(alpha=0.3)
    
    plt.tight_layout()
    plt.show()

plot_results(history, y_test_original, y_pred_original)

---

## 📈 Model Architecture

**LSTM network:**
- 2 bidirectional LSTM layers (256→128 units)
- Dropout: 0.25 after each LSTM layer
- Dense layer: 64 units with ReLU
- Output: 1 unit (regression)

**Training:**
- Optimizer: Adam (LR: 0.0002)
- Loss: MSE, with MAE as an additional metric
- Early stopping: patience 8
- LR scheduler: ReduceLROnPlateau (factor 0.7, patience 8, minimum LR 0.00001)

**Data:**
- 60-day sequences
- 100 groups (5 stores × 20 products)
- Features: lag, rolling, diff + categorical variables
- 80/20 train/test split (time-based)

**Target metrics:**
- Prediction std > 10 (variance)
- MAE < 85
- Overfitting ratio < 1.3
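
The target metrics can be checked in one place after a run. A minimal sketch, assuming a `metrics` dict shaped like the one returned by `evaluate_model` and the `ratio` from the training summary; the helper `check_targets` and the example values are hypothetical.

```python
def check_targets(metrics, ratio, min_pred_std=10.0, max_mae=85.0, max_ratio=1.3):
    """Hypothetical helper: compares a run against the listed target metrics."""
    return {
        "pred_std": metrics["pred_std"] > min_pred_std,
        "mae": metrics["mae"] < max_mae,
        "overfitting_ratio": ratio < max_ratio,
    }

example_metrics = {"mae": 88.0, "pred_std": 10.4}  # made-up values, not real results
print(check_targets(example_metrics, ratio=1.25))
```

The thresholds are keyword arguments so that a different MAE limit can be passed without editing the helper.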


---

## 🚀 Running the Notebook

**Full run:**
1. `Run` → `Run All Cells`
2. Duration: ~9-12 minutes
3. Results appear in cell 18 (training) and cell 20 (evaluation)

**Adjusting parameters:**
1. Change the values in the config cell (cell 3)
2. Re-run from cell 13 onward (sequences, model, training)
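
As an alternative to editing the config cell in place, a variant can be derived from an existing config with `dataclasses.replace`, which keeps the baseline available for comparison. The sketch uses a reduced stand-in class, `MiniConfig`, which is an assumption for illustration; the same call works on any dataclass instance.

```python
from dataclasses import dataclass, replace

@dataclass
class MiniConfig:
    """Reduced stand-in for the notebook's Config, for illustration only."""
    batch_size: int
    dropout: float
    patience: int

base = MiniConfig(batch_size=384, dropout=0.25, patience=8)

# Copy of the baseline with a single field changed; `base` stays untouched
variant = replace(base, dropout=0.3)

print(base)
print(variant)
```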

**Key metrics to check:**
- Overfitting ratio (cell 18): should be < 1.3
- Prediction std (cell 20): should be > 10
- MAE (cell 20): should be < 90
