<a href="https://colab.research.google.com/github/laraAkg/Data-Science-Project/blob/main/Model_Training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This code block initializes the project environment and defines the file paths, primarily for running in Google Colab with Google Drive mounted.

**Key Functions:**

-   **Colab Detection:** Checks whether the notebook is running in Google Colab by attempting to import `google.colab`.
-   **Google Drive Integration:** In Colab, mounts Google Drive at `/content/drive` so that outputs persist across sessions.
-   **Directory Structure:** Defines `BASE_DIR` (a Drive folder in Colab, otherwise `./project_outputs`) and creates the subdirectories `datasets`, `plots`, `metadata`, `models_tf`, `reports`, and `real`. Note that `models_tf` and `reports` are deleted and recreated on every run, so earlier models and reports are lost.
-   **Standard Image Size:** Sets `IMG_SIZE = (128, 128)` as (height, width), the global input size for all image processing later in the project.
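
The setup cell removes and recreates some output folders each time it runs. The sketch below reproduces that reset pattern in a temporary directory to show its effect; the file names `keep.csv` and `old_model.keras` are invented for the illustration, and only one folder is reset here.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "project_outputs"   # stand-in for BASE_DIR
    data_dir, models_dir = base / "datasets", base / "models_tf"
    for p in (data_dir, models_dir):
        p.mkdir(parents=True, exist_ok=True)
    (data_dir / "keep.csv").write_text("x")            # hypothetical file
    (models_dir / "old_model.keras").write_text("x")   # hypothetical file

    # Same pattern as the setup cell: wipe the folder, then recreate it.
    for folder in (models_dir,):
        if folder.exists():
            shutil.rmtree(folder)
        folder.mkdir(parents=True, exist_ok=True)

    kept = (data_dir / "keep.csv").exists()
    wiped = not (models_dir / "old_model.keras").exists()
    print(kept, wiped)  # True True
```

Anything that should survive a rerun therefore needs to live outside the folders listed in `folders_to_reset`.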

In [None]:
# === 0) Colab/Drive Setup & Project Paths ===
import shutil
from pathlib import Path

try:
    import google.colab  # type: ignore
    IN_COLAB = True
except Exception:
    IN_COLAB = False

if IN_COLAB:
    from google.colab import drive  # type: ignore
    drive.mount('/content/drive', force_remount=True)

DEFAULT_PROJECT_DIR = "MyDrive/Generated Data for Data science project"
BASE_DIR = Path("/content/drive") / DEFAULT_PROJECT_DIR if IN_COLAB else Path("./project_outputs")

DATA_DIR   = BASE_DIR / "datasets"
PLOTS_DIR  = BASE_DIR / "plots"
META_DIR   = BASE_DIR / "metadata"
MODELS_DIR = BASE_DIR / "models_tf"
REPORTS_DIR= BASE_DIR / "reports"
REAL_DIR   = BASE_DIR / "real"
BEST_MODEL_PATH = MODELS_DIR / "best_model_keras.h5"
BEST_MODEL_META = REPORTS_DIR / "best_model_meta.json"

folders_to_reset = [MODELS_DIR, REPORTS_DIR]

for folder in folders_to_reset:
    if folder.exists():
        print(f"[INFO] Removing old folder: {folder}")
        shutil.rmtree(folder)
    folder.mkdir(parents=True, exist_ok=True)
    print(f"[INFO] Recreated: {folder}")

for p in [DATA_DIR, PLOTS_DIR, META_DIR, MODELS_DIR, REPORTS_DIR, REAL_DIR]:
    p.mkdir(parents=True, exist_ok=True)

IMG_SIZE = (128, 128)  # (H, W)

print("BASE_DIR:", BASE_DIR)

Mounted at /content/drive
[INFO] Removing old folder: /content/drive/MyDrive/Generated Data for Data science project/models_tf
[INFO] Recreated: /content/drive/MyDrive/Generated Data for Data science project/models_tf
[INFO] Removing old folder: /content/drive/MyDrive/Generated Data for Data science project/reports
[INFO] Recreated: /content/drive/MyDrive/Generated Data for Data science project/reports
BASE_DIR: /content/drive/MyDrive/Generated Data for Data science project


In [None]:
# === 2.1) Imports & global config (TF/Keras) ===
import os, json, math, time, random, csv
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, average_precision_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold

SEED = 42
random.seed(SEED); np.random.seed(SEED); tf.random.set_seed(SEED)

DEVICE = "GPU" if len(tf.config.list_physical_devices('GPU'))>0 else "CPU"
BATCH_SIZE_DEFAULT = 32
EPOCHS_DEFAULT = 8

print("TF device:", DEVICE)

TF device: GPU


This cell loads the previously saved metadata and builds the table of samples used for training and evaluating the model. Each sample is one variant of a dataset (the original or one augmentation), consisting of its three plots and the dataset's label (heavy-tailed or not).

- `INDEX_JSON`: Path to the metadata file.
- `records`: The metadata loaded from the JSON file, one entry per dataset.
- `samples`: A list of dictionaries, one per variant, holding the plot paths and the label.
- `uniq_rows`: A list of dictionaries, one per unique dataset, used later to run cross-validation at the dataset level so that variants of the same dataset never end up in different splits.
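
As a rough illustration of the structure this cell expects, the sketch below expands one made-up metadata record into sample rows. The key names (`dataset_id`, `heavy_tailed`, `plots`, `original`, `aug`) are taken from the loading code in this notebook; the file paths and the augmentation names `jitter` and `crop` are invented for the example.

```python
# Hypothetical record; only the key layout mirrors datasets_index.json.
record = {
    "dataset_id": "ds_0001",
    "heavy_tailed": True,
    "plots": {
        "original": {"zipf": "a_zipf.png", "qq_exp": "a_qq.png", "me": "a_me.png"},
        "aug": {
            "jitter": {"zipf": "j_zipf.png", "qq_exp": "j_qq.png", "me": "j_me.png"},
            "crop": {"zipf": "c_zipf.png", "qq_exp": "c_qq.png", "me": "c_me.png"},
        },
    },
}

def expand_record(r):
    """Return one row per variant (original plus each augmentation)."""
    label = int(r["heavy_tailed"])
    rows = [{"dataset_id": r["dataset_id"], "variant": "original",
             "paths": r["plots"]["original"], "label": label}]
    for aug_name, aug_paths in r["plots"]["aug"].items():
        rows.append({"dataset_id": r["dataset_id"], "variant": aug_name,
                     "paths": aug_paths, "label": label})
    return rows

rows = expand_record(record)
print([row["variant"] for row in rows])  # ['original', 'jitter', 'crop']
```

All rows of a record share the same `dataset_id` and label, which is what makes splitting by dataset ID sufficient to keep augmentations of one dataset together.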

In [None]:
# === 2.2) Load metadata & build samples table (incl. augs) ===
INDEX_JSON = META_DIR / "datasets_index.json"
with open(INDEX_JSON, "r", encoding="utf-8") as f:
    records = json.load(f)

samples = []
uniq_rows = []
for r in records:
    ds_id = r["dataset_id"]
    label = int(r["heavy_tailed"])
    uniq_rows.append({"dataset_id": ds_id, "label": label})
    samples.append({"dataset_id": ds_id, "variant": "original", "paths": r["plots"]["original"], "label": label})
    for aug_name, aug_paths in r["plots"]["aug"].items():
        samples.append({"dataset_id": ds_id, "variant": aug_name, "paths": aug_paths, "label": label})

print("Unique dataset_ids:", len(uniq_rows))
print("Total samples (incl. augs):", len(samples))

Unique dataset_ids: 600
Total samples (incl. augs): 2400


This cell contains helper functions for loading and preprocessing image data (plots) and for creating TensorFlow `Dataset` objects for training.

- `load_gray_resized`: Loads an image, converts it to grayscale, resizes it to `IMG_SIZE`, and scales pixel values to [0, 1].
- `stack_zipf_qq_me`: Loads the three plots (Zipf, QQ, ME) for a sample and stacks them into a single array of shape (H, W, 3), one plot per channel.
- `sample_to_example`: Takes a sample row and creates the input image array (`x`) and the label (`y`) for the neural network.
- `rows_for_ids`: Filters the sample rows based on a list of dataset IDs.
- `make_tf_dataset`: Creates a TensorFlow `Dataset` from a list of sample rows, with batching and optional shuffling. All images are loaded into memory first.
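
The channel layout produced by `stack_zipf_qq_me` can be illustrated without image files. The following sketch uses constant arrays in place of the loaded plots; the non-square size of 4 x 6 is chosen only to make the axis order visible and is not the notebook's setting.

```python
import numpy as np

H, W = 4, 6  # illustrative size; the notebook uses IMG_SIZE = (128, 128)

# Stand-ins for the three grayscale plots, already scaled to [0, 1].
zipf = np.full((H, W), 0.0, dtype="float32")
qq = np.full((H, W), 0.5, dtype="float32")
me = np.full((H, W), 1.0, dtype="float32")

x = np.stack([zipf, qq, me], axis=-1)  # channels-last layout
print(x.shape)  # (4, 6, 3)
```

Pillow's `Image.resize` takes its size as (width, height), which is why `load_gray_resized` passes `(IMG_SIZE[1], IMG_SIZE[0])` to obtain an array of shape (H, W).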

In [None]:
# === 2.3) Image helpers & tf.data builder ===
def load_gray_resized(path):
    img = Image.open(path).convert("L")
    img = img.resize((IMG_SIZE[1], IMG_SIZE[0]))
    arr = np.asarray(img).astype("float32") / 255.0
    return arr

def stack_zipf_qq_me(paths_dict):
    z = load_gray_resized(paths_dict["zipf"])
    q = load_gray_resized(paths_dict["qq_exp"])
    m = load_gray_resized(paths_dict["me"])
    return np.stack([z,q,m], axis=-1)

def sample_to_example(row):
    x = stack_zipf_qq_me(row["paths"])
    y = np.float32(row["label"])
    return x, y

def rows_for_ids(id_set):
    s = set(id_set)
    return [row for row in samples if row["dataset_id"] in s]

def make_tf_dataset(rows, batch_size=BATCH_SIZE_DEFAULT, shuffle=False):
    xs, ys = [], []
    for r in rows:
        x,y = sample_to_example(r); xs.append(x); ys.append(y)
    xs = np.stack(xs, axis=0); ys = np.array(ys, dtype=np.float32)
    ds = tf.data.Dataset.from_tensor_slices((xs, ys))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(xs), seed=SEED, reshuffle_each_iteration=True)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

This cell defines the Keras architectures used for classification. There are two variants: "Baseline" uses standard convolutional layers, and "Separable" uses depthwise separable convolutions, which need fewer weights and multiply-adds for the same channel counts. Dropout is added for regularization.

- `ConvBlock`: Defines a block consisting of convolution (without bias), batch normalization, and ReLU activation.
- `SepConvBlock`: Defines a block consisting of separable convolution (without bias), batch normalization, and ReLU activation.
- `build_baseline`: Creates the "Baseline" CNN model with four convolutional blocks.
- `build_separable`: Creates the "Separable" CNN model with five separable blocks.
- Both models end with GlobalAveragePooling2D, Dropout, and a Dense layer with a single output for binary classification (heavy-tailed or not). That output is a logit (no sigmoid is applied), so the loss and the evaluation code work from logits. L2 regularization and the dropout rate are configurable.
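
To make the efficiency difference concrete, the sketch below counts the weights of a standard and a depthwise separable 3 x 3 convolution for the 64 to 128 channel step that both models contain. It assumes no bias (as in the blocks here) and a depth multiplier of 1, and the helper names are made up for this example.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution without bias."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """One k x k depthwise filter per input channel plus a 1 x 1 pointwise mix, no bias."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 128)
sep = separable_conv_params(3, 64, 128)
print(std, sep)  # 73728 8768
```

For this layer the separable variant uses roughly an eighth of the weights. Whether that translates into better accuracy or faster training on this data is an empirical question the cross-validation has to answer.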

In [None]:
# === 2.4) Keras models (Baseline & Separable) with Dropout ===
def ConvBlock(x, filters, k=3, s=1, l2=0.0):
    x = layers.Conv2D(filters, k, strides=s, padding="same",
                      use_bias=False, kernel_regularizer=regularizers.l2(l2))(x)
    x = layers.BatchNormalization()(x); x = layers.ReLU()(x); return x

def SepConvBlock(x, filters, k=3, s=1, l2=0.0):
    x = layers.SeparableConv2D(filters, k, strides=s, padding="same", use_bias=False,
                               depthwise_regularizer=regularizers.l2(l2), pointwise_regularizer=regularizers.l2(l2))(x)
    x = layers.BatchNormalization()(x); x = layers.ReLU()(x); return x

def build_baseline(input_shape=(128,128,3), l2reg=0.0, dropout=0.0):
    inp = keras.Input(shape=input_shape)
    x = ConvBlock(inp, 16, l2=l2reg)
    x = ConvBlock(x, 32, s=2, l2=l2reg)
    x = ConvBlock(x, 64, s=2, l2=l2reg)
    x = ConvBlock(x, 128, s=2, l2=l2reg)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(dropout)(x)
    logit = layers.Dense(1, kernel_regularizer=regularizers.l2(l2reg))(x)
    return keras.Model(inp, logit, name="CNNBaseline")

def build_separable(input_shape=(128,128,3), l2reg=0.0, dropout=0.0):
    inp = keras.Input(shape=input_shape)
    x = SepConvBlock(inp, 16, l2=l2reg)
    x = SepConvBlock(x, 32, s=2, l2=l2reg)
    x = SepConvBlock(x, 64, s=2, l2=l2reg)
    x = SepConvBlock(x, 128, s=2, l2=l2reg)
    x = SepConvBlock(x, 128, s=1, l2=l2reg)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(dropout)(x)
    logit = layers.Dense(1, kernel_regularizer=regularizers.l2(l2reg))(x)
    return keras.Model(inp, logit, name="CNNSeparable")

This cell contains helper functions for training and evaluating the models, for temperature scaling to improve the calibration of the predicted probabilities, and for threshold selection.

- `bce_logits`: The loss function (binary cross-entropy computed from logits) used for model compilation.
- `compile_model`: Compiles a Keras model with an Adam optimizer and the defined loss function.
- `predict_logits`: Runs the model over a dataset and returns the logits (raw outputs before the sigmoid function) and the true labels.
- `evaluate_numpy`: Calculates accuracy, F1-score, ROC-AUC, PR-AUC, and the confusion matrix from the logits and true labels. It also returns the logits, probabilities, and true labels.
- `evaluate_from_probs`, `aggregate_by_dataset`, `evaluate_dataset_level`: Compute the same metrics from probabilities, optionally after averaging the probabilities of all variants of a dataset.
- `find_best_threshold_f1`: Grid-searches the decision threshold that maximizes F1 on a validation set.
- `expected_calibration_error`: Calculates the Expected Calibration Error (ECE), a metric to assess the calibration of prediction probabilities.
- `TemperatureScalerTF`: A small Keras model whose only parameter is the log of the temperature `T`; it divides the logits by `T`.
- `fit_temperature_tf`: Fits `T` on held-out validation data by minimizing the cross-entropy of the scaled logits, so that the predicted probabilities better match the observed frequencies. The result is clamped to [0.5, 10].
- `ternary_from_probs`, `evaluate_ternary`, `find_gray_zone_thresholds`: Map probabilities to three outcomes (negative, uncertain, positive) using two thresholds, and search for thresholds that reach a target precision on both sides.
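
The idea behind temperature scaling can be sketched in plain NumPy. The example below is not the notebook's implementation: it replaces the Adam updates on log T with a simple grid search, and it uses synthetic logits that are deliberately four times too large. The function names and the data are assumptions made for this illustration.

```python
import numpy as np

def bce_with_logits(logits, labels):
    """Mean binary cross-entropy, computed stably from logits."""
    return float(np.mean(np.logaddexp(0.0, logits) - labels * logits))

def fit_temperature_grid(logits, labels, grid=None):
    """Pick the temperature T that minimizes the NLL of logits / T."""
    if grid is None:
        grid = np.linspace(0.5, 10.0, 191)  # same range the notebook clamps T to
    losses = [bce_with_logits(logits / t, labels) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic overconfident model: logits are 4x the true log-odds.
rng = np.random.default_rng(0)
true_logits = rng.normal(0.0, 1.5, size=5000)
labels = (rng.random(5000) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)
logits = 4.0 * true_logits

T = fit_temperature_grid(logits, labels)
nll_before = bce_with_logits(logits, labels)
nll_after = bce_with_logits(logits / T, labels)
print(f"T={T:.2f}, NLL before={nll_before:.3f}, after={nll_after:.3f}")
```

Because dividing by a positive `T` preserves the ordering of the logits, ranking metrics such as ROC-AUC are unchanged; only the probabilities, and therefore threshold-based metrics and the ECE, are affected.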

In [None]:
# === 2.5) Train/Eval utils + Temperature Scaling (binary & multiclass safe) ===
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, average_precision_score, confusion_matrix, precision_score, recall_score

bce_logits = keras.losses.BinaryCrossentropy(from_logits=True)

def compile_model(model, lr=1e-3):
    opt = keras.optimizers.Adam(learning_rate=lr)
    model.compile(optimizer=opt, loss=bce_logits)
    return model

def predict_logits(model, ds):
    y_true, logits = [], []
    for x, y in ds:
        y_true.append(y.numpy())
        logit_batch = model(x, training=False)
        if logit_batch.shape[-1] == 1:
            logit_batch = tf.squeeze(logit_batch, axis=-1)
        logits.append(logit_batch.numpy())
    logits = np.concatenate(logits)
    y_true = np.concatenate(y_true)
    return logits, y_true

def evaluate_numpy(logits, y_true, threshold=0.5):
    if logits.ndim == 1:  # binary
        probs = 1/(1+np.exp(-logits))
        preds = (probs >= threshold).astype(int)
        yb = (y_true > 0.5).astype(int)
        out = {
            "acc": float(accuracy_score(yb, preds)),
            "f1": float(f1_score(yb, preds)),
            "roc_auc": float(roc_auc_score(yb, probs)) if len(np.unique(yb))>1 else float("nan"),
            "pr_auc": float(average_precision_score(yb, probs)) if len(np.unique(yb))>1 else float("nan"),
            "cm": confusion_matrix(yb, preds).tolist(),
            "logits": logits, "probs": probs, "y_true": yb
        }
        return out
    else:
        y_true_int = y_true.astype(int)
        probs = tf.nn.softmax(logits, axis=-1).numpy()
        preds = probs.argmax(axis=-1)
        out = {
            "acc": float(accuracy_score(y_true_int, preds)),
            "f1": float(f1_score(y_true_int, preds, average="macro")),
            "cm": confusion_matrix(y_true_int, preds).tolist(),
            "logits": logits, "probs": probs, "y_true": y_true_int
        }
        return out

def evaluate_from_probs(probs, y_true, threshold=0.5):
    """
    Evaluate binary classification metrics starting from probabilities (already sigmoid-ed),
    not logits. This is useful for calibrated probabilities.
    """
    probs = np.asarray(probs).ravel()
    y = (np.asarray(y_true) > 0.5).astype(int)
    preds = (probs >= threshold).astype(int)

    out = {
        "acc": float(accuracy_score(y, preds)),
        "f1": float(f1_score(y, preds)),
        "roc_auc": float(roc_auc_score(y, probs)) if len(np.unique(y)) > 1 else float("nan"),
        "pr_auc": float(average_precision_score(y, probs)) if len(np.unique(y)) > 1 else float("nan"),
        "cm": confusion_matrix(y, preds).tolist(),
        "probs": probs,
        "y_true": y,
    }
    return out


def aggregate_by_dataset(probs, y_true, ds_ids):
    """
    Aggregate probabilities and labels per dataset_id.

    For each dataset_id we take:
      - mean probability over all its image variants (original + augmentations)
      - label: we assume all variants have the same label, so we take the first.
    """
    probs = np.asarray(probs).ravel()
    y_true = np.asarray(y_true)
    ds_ids = np.asarray(ds_ids)

    assert len(probs) == len(y_true) == len(ds_ids), "Lengths of probs, y_true and ds_ids must match."

    uniq = np.unique(ds_ids)
    probs_ds = []
    y_ds = []
    for ds in uniq:
        m = (ds_ids == ds)
        if not np.any(m):
            continue
        probs_ds.append(probs[m].mean())
        y_ds.append(y_true[m][0])

    return np.asarray(probs_ds), np.asarray(y_ds)


def evaluate_dataset_level(probs, y_true, ds_ids, threshold=0.5):
    """
    Compute metrics after aggregating probabilities per dataset_id.
    """
    probs_ds, y_ds = aggregate_by_dataset(probs, y_true, ds_ids)
    return evaluate_from_probs(probs_ds, y_ds, threshold=threshold)


def find_best_threshold_f1(probs, y_true, thresholds=None):
    """Grid-search decision threshold between 0.01 and 0.99 to maximize F1 on a validation set.

    Returns a dict with the best threshold and the corresponding accuracy, precision,
    recall and F1-score. The true labels are assumed to be binary (0/1 or probabilities)."""
    probs = np.asarray(probs).ravel()
    y = (np.asarray(y_true) > 0.5).astype(int)

    if thresholds is None:
        thresholds = np.linspace(0.01, 0.99, 99)

    best_f1 = None
    best_info = {
        "threshold": 0.5,
        "precision": float("nan"),
        "recall": float("nan"),
        "f1": float("nan"),
        "acc": float("nan"),
    }

    for t in thresholds:
        preds = (probs >= t).astype(int)
        prec = precision_score(y, preds, zero_division=0)
        rec = recall_score(y, preds, zero_division=0)
        f1 = f1_score(y, preds, zero_division=0)
        acc = accuracy_score(y, preds)

        if (best_f1 is None) or (f1 > best_f1):
            best_f1 = f1
            best_info = {
                "threshold": float(t),
                "precision": float(prec),
                "recall": float(rec),
                "f1": float(f1),
                "acc": float(acc),
            }

    return best_info


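def expected_calibration_error(probs, labels, n_bins=15):
    """Binary ECE: bin-weighted gap between mean predicted probability and observed positive rate."""
    probs = np.asarray(probs, dtype=float).ravel()
    labels = (np.asarray(labels) > 0.5).astype(int).ravel()
    bins = np.linspace(0, 1, n_bins + 1)
    # Clip so that probs == 1.0 lands in the last bin instead of being dropped.
    idx = np.clip(np.digitize(probs, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        m = (idx == b)
        if not np.any(m):
            continue
        conf = probs[m].mean()
        # Compare with the positive rate. Accuracy at a 0.5 cut-off would penalize
        # well-calibrated low probabilities (e.g. p = 0.1 with 10% positives).
        pos_rate = labels[m].mean()
        ece += (np.sum(m) / len(probs)) * abs(pos_rate - conf)
    return float(ece)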

class TemperatureScalerTF(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.logT = tf.Variable(0.0, dtype=tf.float32)

    def call(self, logits):
        return logits / tf.exp(self.logT)

def fit_temperature_tf(logits, labels, steps=200, lr=0.01):
    logits = tf.convert_to_tensor(logits, dtype=tf.float32)
    if logits.ndim == 1:
        logits = tf.expand_dims(logits, axis=-1)
    n_classes = logits.shape[-1]

    scaler = TemperatureScalerTF()
    opt = tf.keras.optimizers.Adam(learning_rate=lr)

    if n_classes == 1:
        labels = (np.asarray(labels) > 0.5).astype(int)
        lb = tf.convert_to_tensor(labels, dtype=tf.float32)
        loss_fn = lambda s, y: tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=s[...,0]))
    else:
        if labels.ndim == 2:
            y_int = labels.argmax(axis=-1)
        else:
            y_int = labels.astype(np.int32)
        lb = tf.convert_to_tensor(y_int)
        loss_fn = lambda s, y: tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=s))

    # Reference the variable directly so the update does not depend on whether
    # Keras lists a raw tf.Variable attribute in scaler.trainable_variables.
    variables = [scaler.logT]
    for _ in range(steps):
        with tf.GradientTape() as tape:
            s = scaler(logits)
            loss = loss_fn(s, lb)
        grads = tape.gradient(loss, variables)
        if any(g is None for g in grads):
            return 1.0
        opt.apply_gradients(zip(grads, variables))

    T = float(tf.exp(scaler.logT).numpy())
    return max(0.5, min(T, 10.0))


def ternary_from_probs(probs, t_low=0.30, t_high=0.70):
    """
    Maps a binary probability to 3 classes:
      0 = 'confidently no fat tails'  (p <= t_low)
      1 = 'uncertain / gray zone'     (t_low < p < t_high)
      2 = 'confidently fat tails'     (p >= t_high)
    """
    probs = np.asarray(probs).ravel()
    out = np.full(probs.shape, 1, dtype=int)
    out[probs <= t_low] = 0
    out[probs >= t_high] = 2
    return out

def evaluate_ternary(probs, y_true, t_low=0.30, t_high=0.70):
    y = (np.asarray(y_true) > 0.5).astype(int)
    p = np.asarray(probs).ravel()
    pred3 = ternary_from_probs(p, t_low, t_high)
    decided = pred3 != 1
    abstain_rate = float((~decided).mean())
    acc_decided = float(((pred3[decided] == (y[decided]*2)).mean()) if decided.any() else np.nan)
    cm = np.zeros((2,3), dtype=int)
    for yi, pi in zip(y, pred3):
        cm[yi, pi] += 1
    return {
        "t_low": float(t_low), "t_high": float(t_high),
        "abstain_rate": abstain_rate,
        "acc_decided": acc_decided,
        "cm_2x3": cm.tolist(),
        "pred3": pred3
    }

def _prec_pos(p, y, thr):
    m = p >= thr
    if not m.any():
        return np.nan, 0
    tp = ((y == 1) & m).sum()
    fp = ((y == 0) & m).sum()
    prec = tp / (tp + fp) if (tp + fp) > 0 else np.nan
    return float(prec), int(m.sum())

def _prec_neg(p, y, thr):
    m = p <= thr
    if not m.any():
        return np.nan, 0
    tn = ((y == 0) & m).sum()
    fn = ((y == 1) & m).sum()
    prec = tn / (tn + fn) if (tn + fn) > 0 else np.nan
    return float(prec), int(m.sum())

def find_gray_zone_thresholds(probs, y_true, target_precision=0.90, min_points_each_side=5, grid_quantiles=99):
    p = np.asarray(probs).ravel()
    y = (np.asarray(y_true) > 0.5).astype(int)

    qs = np.linspace(0.01, 0.99, grid_quantiles)
    pts = np.unique(np.quantile(p, qs))
    lefts  = pts[pts < 0.5]
    rights = pts[pts > 0.5]

    best = None
    for tl in lefts:
        prec0, n0 = _prec_neg(p, y, tl)
        if np.isnan(prec0) or n0 < min_points_each_side or prec0 < target_precision:
            continue
        for th in rights:
            if th <= tl:
                continue
            prec1, n1 = _prec_pos(p, y, th)
            if np.isnan(prec1) or n1 < min_points_each_side or prec1 < target_precision:
                continue
            pred3 = ternary_from_probs(p, tl, th)
            decided = pred3 != 1
            abstain = 1.0 - decided.mean()
            acc_decided = ((pred3[decided] == (y[decided]*2)).mean()) if decided.any() else np.nan
            width = th - tl
            key = (abstain, -np.nan_to_num(acc_decided, nan=-1.0), width, tl, th,
                   {"prec0": float(prec0), "n0": int(n0), "prec1": float(prec1), "n1": int(n1)})
            if (best is None) or (key < best):
                best = key

    if best is None:
        print("[warn] No (t_low, t_high) pair found that meets target_precision. Falling back to (0.30, 0.70).")
        return 0.30, 0.70, {"fallback": True}

    _, _, _, tl, th, extra = best
    info = {"fallback": False, "target_precision": float(target_precision), **extra}
    return float(tl), float(th), info


This cell runs the cross-validation (CV) loop together with a small hyperparameter search that includes the dropout rate. With `DEBUG_MODE = True`, as set in the cell, only the first of 2 folds is run, on a subsample and with a single hyperparameter combination; set it to `False` for the full 5-fold search.

- `uniq_ids`, `uniq_label`: Arrays of unique dataset IDs and their labels. Folds are split on these so that all variants of a dataset stay in the same split.
- `skf`: StratifiedKFold object that splits the datasets into train+validation and test folds while maintaining the label distribution.
- `MODEL_CHOICES`, `LR_CHOICES`, etc.: Define the search space for hyperparameters.
- `search_space`: The Cartesian product of all hyperparameter options.
- `MAX_TRIALS_PER_FOLD`: Not defined in the cell as shown; every combination in `search_space` is run in each fold.
- `build_model_keras`: A helper function to build the Keras model based on the name and hyperparameters.
- The loop iterates through each outer fold. Within each fold, a single stratified split (the first of 5 inner folds) separates the train+validation datasets into a training and a validation set.
- For each hyperparameter combination, a model is trained, temperature scaling is fitted on the validation logits, and an F1-optimal decision threshold plus the gray-zone thresholds (`t_low`, `t_high`) are tuned on the calibrated validation probabilities. The same temperature and thresholds are then applied to the test fold.
- Every trained model is saved, and the three trials with the highest validation ROC-AUC are recorded per fold as `topk`.
- `soft_vote`: A function for averaging the probabilities of multiple models (Soft Voting).
- `voting_summary`: Summarizes the results of soft voting for each fold.
- `report_csv`: Saves the detailed results of the CV and hyperparameter search in a CSV file.
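
The size of the full search space follows directly from the option lists in the cell. The sketch below enumerates it with `itertools.product`; the dictionary `full` is just a convenient restatement of the `*_CHOICES_FULL` lists and is not a name used in the notebook.

```python
from itertools import product

# Full search space as listed in the cell (used when DEBUG_MODE = False).
full = {
    "model": ["baseline", "separable"],
    "lr": [1e-3, 3e-4],
    "l2": [0.0, 1e-4],
    "epochs": [15, 30],
    "bs": [32],
    "dropout": [0.0, 0.3],
}

trials = [dict(zip(full, combo)) for combo in product(*full.values())]
print(len(trials))  # 32
print(trials[0])
```

With 5 outer folds this amounts to 160 training runs, which is the reason the cell offers a debug mode with one combination on a single fold.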

In [None]:
# === 2.6) 5-fold CV + small hyperparameter search (incl. dropout) ===
from itertools import product
import time, math

DEFAULT_GRAY_T_LOW  = 0.30
DEFAULT_GRAY_T_HIGH = 0.70
TARGET_PRECISION    = 0.90

class EpochTimer(keras.callbacks.Callback):
    def __init__(self, total_epochs:int, label:str=""):
        super().__init__()
        self.total_epochs = int(total_epochs)
        self.label = label
        self.epoch_durations = []
        self._epoch_t0 = None
        self._t0 = None

    def on_train_begin(self, logs=None):
        self._t0 = time.time()

    def on_epoch_begin(self, epoch, logs=None):
        self._epoch_t0 = time.time()

    def on_epoch_end(self, epoch, logs=None):
        if self._epoch_t0 is None:
            return
        dur = time.time() - self._epoch_t0
        self.epoch_durations.append(dur)

    @property
    def avg_epoch_seconds(self):
        return (sum(self.epoch_durations) / len(self.epoch_durations)) if self.epoch_durations else float("nan")

DEBUG_MODE = True  # <--- for a quick test run; set to False later

# Original search spaces as reference (and for DEBUG_MODE=False)
MODEL_CHOICES_FULL    = ["baseline","separable"]
LR_CHOICES_FULL       = [1e-3, 3e-4]
WD_CHOICES_FULL       = [0.0, 1e-4]
EPOCH_CHOICES_FULL    = [15, 30]
BS_CHOICES_FULL       = [32]
DROPOUT_CHOICES_FULL  = [0.0, 0.3]

if DEBUG_MODE:
    # Drastically reduced search space for a quick test
    MODEL_CHOICES   = ["baseline"]
    LR_CHOICES      = [1e-3]
    WD_CHOICES      = [0.0]
    EPOCH_CHOICES   = [2]
    BS_CHOICES      = [16]
    DROPOUT_CHOICES = [0.0]
else:
    # Same as the original search space
    MODEL_CHOICES   = MODEL_CHOICES_FULL
    LR_CHOICES      = LR_CHOICES_FULL
    WD_CHOICES      = WD_CHOICES_FULL
    EPOCH_CHOICES   = EPOCH_CHOICES_FULL
    BS_CHOICES      = BS_CHOICES_FULL
    DROPOUT_CHOICES = DROPOUT_CHOICES_FULL

def build_model_keras(name, l2reg=0.0, dropout=0.0):
    if name == "baseline":
        return build_baseline(l2reg=l2reg, dropout=dropout)
    elif name == "separable":
        return build_separable(l2reg=l2reg, dropout=dropout)
    else:
        raise ValueError(name)

search_space = list(product(MODEL_CHOICES, LR_CHOICES, WD_CHOICES, EPOCH_CHOICES, BS_CHOICES, DROPOUT_CHOICES))

fold_tuning_results = []
saved_models = []

if DEBUG_MODE:
    skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=SEED)  # faster: only 2 folds
else:
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)  # original setting

uniq_ids   = np.array([r["dataset_id"] for r in uniq_rows])
uniq_label = np.array([r["label"] for r in uniq_rows])

for fold_idx, (train_val_idx, test_idx) in enumerate(skf.split(uniq_ids, uniq_label), start=1):
    if DEBUG_MODE and fold_idx > 1:
        break

    tv_ids = uniq_ids[train_val_idx]; tv_lab = uniq_label[train_val_idx]
    test_ids = uniq_ids[test_idx]
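    # Single stratified split of train+val (first of 5 inner folds): ~80% train, ~20% validation
    skf_tv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
    inner_train_idx, inner_val_idx = next(skf_tv.split(tv_ids, tv_lab))
    train_ids = tv_ids[inner_train_idx]; val_ids = tv_ids[inner_val_idx]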

    train_rows = rows_for_ids(train_ids)
    val_rows   = rows_for_ids(val_ids)
    test_rows  = rows_for_ids(test_ids)

    # In debug mode: use only a subset of the data for a faster test (list-based)
    if DEBUG_MODE:
        rng = np.random.default_rng(SEED)

        def subsample(rows, n):
            if len(rows) <= n:
                return rows
            idx = rng.choice(len(rows), size=n, replace=False)
            return [rows[i] for i in idx]

        train_rows = subsample(train_rows, 200)
        val_rows   = subsample(val_rows, 100)
        test_rows  = subsample(test_rows, 100)


    trial_results = []
    total_trials = len(search_space)

    completed_trials = 0
    trial_wall_times = []

    print(f"\n========== Fold {fold_idx} / {skf.get_n_splits()} — {total_trials} Trials ==========")

    for (model_name, lr, wd, epochs, bs, dr) in search_space:
        completed_trials += 1
        trial_label = f"fold{fold_idx} | model={model_name}, lr={lr}, l2={wd}, ep={epochs}, bs={bs}, dr={dr}"
        print(f"\n[Trial {completed_trials}/{total_trials}] {trial_label}")

        # Copies of the lists (so they are not modified in place)
        train_rows_f = list(train_rows)
        val_rows_f   = list(val_rows)
        test_rows_f  = list(test_rows)

        # Dataset IDs for dataset-level metrics (val/test)
        val_ids_all  = [r["dataset_id"] for r in val_rows_f]
        test_ids_all = [r["dataset_id"] for r in test_rows_f]

        # tf.data datasets built with the existing pipeline from 2.3
        ds_train = make_tf_dataset(train_rows_f, batch_size=bs, shuffle=True)
        ds_val   = make_tf_dataset(val_rows_f,   batch_size=bs, shuffle=False)
        ds_test  = make_tf_dataset(test_rows_f,  batch_size=bs, shuffle=False)

        model = build_model_keras(model_name, l2reg=wd, dropout=dr)
        compile_model(model, lr=lr)

        et_cb = EpochTimer(total_epochs=epochs, label=trial_label)
        cb = [et_cb]

        _t0 = time.time()
        model.fit(ds_train, validation_data=ds_val, epochs=epochs, verbose=0, callbacks=cb)
        train_wall = time.time() - _t0
        trial_wall_times.append(train_wall)

        avg_ep = et_cb.avg_epoch_seconds
        done_ep = len(et_cb.epoch_durations)
        print(f"[{trial_label}] finished: {done_ep} ep, "
              f"avg {avg_ep:.1f}s/ep, wall {train_wall:.1f}s")

        remaining_trials = max(total_trials - completed_trials, 0)
        mean_trial = (sum(trial_wall_times) / len(trial_wall_times)) if trial_wall_times else float("nan")
        eta_fold = remaining_trials * mean_trial if math.isfinite(mean_trial) else float("nan")
        print(f"[Fold {fold_idx}] progress: {completed_trials}/{total_trials} trials done "
              f"(avg {mean_trial:.1f}s/trial) — ETA fold ~{eta_fold:.1f}s")

        # --- Validation logits / calibration / threshold tuning ---
        v_logits, v_true = predict_logits(model, ds_val)

        # Temperature scaling always on raw logits
        T = fit_temperature_tf(v_logits, v_true, steps=200, lr=0.05)

        if v_logits.ndim == 1:
            # Binary case: calibrate logits and work with calibrated probabilities
            v_logits_cal = v_logits / T
            v_probs_cal = 1.0 / (1.0 + np.exp(-v_logits_cal))

            # F1-optimal threshold on calibrated probabilities
            thr_info = find_best_threshold_f1(v_probs_cal, v_true)
            best_thr = float(thr_info["threshold"])

            # Metrics using calibrated logits (so that probs & decisions are consistent)
            val_m = evaluate_numpy(v_logits_cal, v_true, threshold=best_thr)
            val_ece = expected_calibration_error(v_probs_cal, v_true)

            # Gray-zone thresholds on calibrated probabilities
            tl, th, tlth_info = find_gray_zone_thresholds(
                v_probs_cal,
                v_true,
                target_precision=TARGET_PRECISION,
                min_points_each_side=5,
                grid_quantiles=99,
            )
            val_tern = evaluate_ternary(v_probs_cal, v_true, tl, th)

            # Store threshold & ternary metrics
            val_m.update({
                "threshold": best_thr,
                "thr_prec": float(thr_info["precision"]),
                "thr_rec": float(thr_info["recall"]),
                "thr_f1": float(thr_info["f1"]),
                "ece": val_ece,
                "abstain": val_tern["abstain_rate"],
                "acc_decided": val_tern["acc_decided"],
            })

            # Dataset-level metrics on calibrated probabilities
            val_ds_metrics = evaluate_dataset_level(
                probs=v_probs_cal,
                y_true=v_true,
                ds_ids=val_ids_all,
                threshold=best_thr,
            )
            for k, v in val_ds_metrics.items():
                val_m[f"ds_{k}"] = v

        else:
            # Multiclass fallback (not expected for this project, but kept for safety)
            v_logits_cal = v_logits / T
            v_probs_cal = tf.nn.softmax(v_logits_cal, axis=-1).numpy()
            best_thr = 0.5
            thr_info = None

            val_m = evaluate_numpy(v_logits_cal, v_true, threshold=best_thr)
            val_ece = expected_calibration_error(v_probs_cal.max(axis=1), v_true)
            val_m.update({"ece": val_ece})

            tl, th, tlth_info = DEFAULT_GRAY_T_LOW, DEFAULT_GRAY_T_HIGH, {"fallback": True, "multiclass": True}

        # --- Test split: use same T, thresholds etc. ---
        t_logits, t_true = predict_logits(model, ds_test)

        if t_logits.ndim == 1:
            t_logits_cal = t_logits / T
            t_probs_cal = 1.0 / (1.0 + np.exp(-t_logits_cal))

            test_m = evaluate_numpy(t_logits_cal, t_true, threshold=best_thr)
            test_ece = expected_calibration_error(t_probs_cal, t_true)

            test_tern = evaluate_ternary(t_probs_cal, t_true, tl, th)
            test_m.update({
                "ece": test_ece,
                "abstain": test_tern["abstain_rate"],
                "acc_decided": test_tern["acc_decided"],
                "threshold": float(best_thr),
            })

            # Dataset-level metrics on calibrated probabilities for test
            test_ds_metrics = evaluate_dataset_level(
                probs=t_probs_cal,
                y_true=t_true,
                ds_ids=test_ids_all,
                threshold=best_thr,
            )
            for k, v in test_ds_metrics.items():
                test_m[f"ds_{k}"] = v

        else:
            t_logits_cal = t_logits / T
            t_probs_cal = tf.nn.softmax(t_logits_cal, axis=-1).numpy()

            test_m = evaluate_numpy(t_logits_cal, t_true, threshold=best_thr)
            test_ece = expected_calibration_error(t_probs_cal.max(axis=1), t_true)
            test_m.update({"ece": test_ece})

        hp = {"model": model_name, "lr": lr, "l2": wd, "epochs": epochs, "bs": bs, "dropout": dr}
        path = MODELS_DIR / f"fold{fold_idx}_{model_name}_lr{lr}_l2{wd}_ep{epochs}_dr{dr}.keras"
        model.save(path, include_optimizer=False)

        trial_results.append({
            "fold": fold_idx,
            "hparams": hp,
            "val": val_m,
            "test": test_m,
            "path": str(path),
            "T": float(T),
            "t_low": float(tl),
            "t_high": float(th),
            "t_opt": float(best_thr),  # new: decision threshold applied to both val and test
            "tlth_info": tlth_info
        })

        saved_models.append((fold_idx, model_name, hp, str(path), float(T)))

    best = sorted(trial_results, key=lambda r: r["val"].get("roc_auc", -1), reverse=True)[:3]
    fold_tuning_results.append({"fold": fold_idx, "trials": trial_results, "topk": best})

def soft_vote(*probs_list):
    return np.mean(np.stack(probs_list, axis=0), axis=0)

import csv
import json

report_csv = REPORTS_DIR / "cv_results_tuned_dropout_keras.csv"
header = [
    "fold", "model", "lr", "l2", "epochs", "bs", "dropout",
    "split",
    "acc", "f1", "roc_auc", "pr_auc", "ece", "abstain", "acc_decided", "threshold",
    "ds_acc", "ds_f1", "ds_roc_auc", "ds_pr_auc",
    "model_path", "T", "t_low", "t_high", "target_precision", "fallback",
]
with open(report_csv, "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f)
    w.writerow(header)
    for fr in fold_tuning_results:
        for r in fr["trials"]:
            hp = r["hparams"]
            info = r.get("tlth_info", {})
            fallback = info.get("fallback", False)
            tprec = info.get("target_precision", TARGET_PRECISION)
            # One CSV row per trial and split
            for split in ["val", "test"]:
                m = r[split]

                # Dataset-level metrics exist only for the binary case; leave blank otherwise
                ds_acc     = m.get("ds_acc", "")
                ds_f1      = m.get("ds_f1", "")
                ds_roc_auc = m.get("ds_roc_auc", "")
                ds_pr_auc  = m.get("ds_pr_auc", "")

                w.writerow([
                    fr["fold"], hp["model"], hp["lr"], hp["l2"], hp["epochs"], hp["bs"], hp["dropout"],
                    split,
                    m.get("acc", ""), m.get("f1", ""), m.get("roc_auc", ""), m.get("pr_auc", ""),
                    m.get("ece", ""), m.get("abstain", ""), m.get("acc_decided", ""), m.get("threshold", ""),
                    ds_acc, ds_f1, ds_roc_auc, ds_pr_auc,
                    r["path"], r["T"], r["t_low"], r["t_high"], tprec, int(fallback),
                ])
print("Saved CV report:", report_csv)




[Trial 1/1] fold1 | model=baseline, lr=0.001, l2=0.0, ep=2, bs=16, dr=0.0
[fold1 | model=baseline, lr=0.001, l2=0.0, ep=2, bs=16, dr=0.0] finished: 2 ep, avg 6.9s/ep, wall 13.8s
[Fold 1] progress: 1/1 trials done (avg 13.8s/trial) — ETA fold ~0.0s
[warn] No (t_low,t_high) found that satisfy target_precision. Falling back to (0.30, 0.70).
Saved CV report: /content/drive/MyDrive/Generated Data for Data science project/reports/cv_results_tuned_dropout_keras.csv
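
**Illustration: temperature scaling and calibration error.** The cell above divides the logits by a temperature `T` fitted on the validation split and reports an expected calibration error (ECE). The notebook's own helpers (`fit_temperature_tf`, `expected_calibration_error`) are defined elsewhere, so the sketch below uses hypothetical stand-ins: `fit_temperature_grid` picks `T` by a simple grid search over the validation negative log-likelihood, and `binary_ece` is one common equal-width-bin variant of ECE for positive-class probabilities. The data is simulated; this is meant to show the idea, not to reproduce the notebook's exact numbers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_nll(logits, y, T):
    """Mean negative log-likelihood of labels y under sigmoid(logits / T)."""
    p = np.clip(sigmoid(logits / T), 1e-7, 1 - 1e-7)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fit_temperature_grid(logits, y, grid=None):
    """Hypothetical stand-in for fit_temperature_tf: choose T from a grid by NLL."""
    grid = np.linspace(0.25, 5.0, 96) if grid is None else np.asarray(grid)
    losses = [binary_nll(logits, y, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

def binary_ece(probs, y, n_bins=10):
    """ECE for positive-class probabilities using equal-width bins (assumed variant)."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    # Clip so that a probability of exactly 1.0 falls into the last bin
    idx = np.clip(np.digitize(probs, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        m = idx == b
        if m.any():
            # Bin weight times |mean predicted probability - observed positive rate|
            ece += m.mean() * abs(probs[m].mean() - y[m].mean())
    return float(ece)

# Simulated validation set: labels follow sigmoid(true_logits),
# while the "model" outputs logits that are three times too large.
rng = np.random.default_rng(0)
true_logits = rng.normal(0.0, 1.5, size=4000)
y = (rng.random(4000) < sigmoid(true_logits)).astype(int)
overconfident = 3.0 * true_logits

T = fit_temperature_grid(overconfident, y)
ece_raw = binary_ece(sigmoid(overconfident), y)
ece_cal = binary_ece(sigmoid(overconfident / T), y)
print(f"T={T:.2f}  ECE raw={ece_raw:.3f}  ECE calibrated={ece_cal:.3f}")
```

Because dividing by a positive `T` does not change the ordering of the logits, ranking metrics such as ROC AUC are unaffected; only the probabilities, and therefore thresholds and ECE, move.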


In [None]:
# === Best-Model Visuals: CV Evaluation & Dashboard ===
import json, gc
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def load_cv_results(csv_path: str | Path) -> pd.DataFrame:
    df = pd.read_csv(csv_path)
    df = df.dropna(how="all")
    for col in [
        "acc","f1","roc_auc","pr_auc","ece","abstain","acc_decided","threshold",
        "ds_acc","ds_f1","ds_roc_auc","ds_pr_auc",
        "T","t_low","t_high","target_precision"
    ]:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df

def best_row(df_val: pd.DataFrame, primary="roc_auc", secondary="f1", higher_is_better=None):
    p, s = primary.lower(), secondary.lower()
    hib = {"roc_auc": True, "f1": True, "acc": True, "pr_auc": True, "ece": False}
    if higher_is_better:
        hib.update({k.lower(): v for k, v in higher_is_better.items()})
    if df_val.empty:
        raise ValueError("No validation rows found in CSV (split=='val').")
    def sort_key(row):
        pk = row.get(p, np.nan); sk = row.get(s, np.nan)
        pk = pk if hib.get(p, True) else -pk
        sk = sk if hib.get(s, True) else -sk
        return (np.nan_to_num(pk, nan=-np.inf), np.nan_to_num(sk, nan=-np.inf))
    best_idx = max(df_val.index, key=lambda i: sort_key(df_val.loc[i]))
    return df_val.loc[best_idx]

def summarize_hparams(df_val: pd.DataFrame,
                      primary: str = "roc_auc",
                      secondary: str = "f1") -> pd.DataFrame:
    """
    Aggregate validation metrics across folds for each hyperparameter configuration.
    """
    cfg_cols = ["model","lr","l2","epochs","bs","dropout"]
    g = df_val.groupby(cfg_cols)

    rows = []
    for cfg, sub in g:
        row = dict(zip(cfg_cols, cfg))
        row["n_folds"] = sub["fold"].nunique()
        for col in [primary, secondary]:
            row[f"mean_{col}"] = sub[col].mean()
            row[f"std_{col}"]  = sub[col].std(ddof=0)
        rows.append(row)

    return pd.DataFrame(rows)

def select_best_config(hp_df: pd.DataFrame,
                       primary: str = "roc_auc",
                       secondary: str = "f1") -> pd.Series:
    """
    Select the hyperparameter configuration with best mean primary metric,
    breaking ties by mean secondary metric.
    """
    p = f"mean_{primary}"
    s = f"mean_{secondary}"

    if hp_df.empty:
        raise ValueError("Hyperparameter summary is empty.")

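    def _key(i):
        # Treat NaN means as the worst possible value so they can never win the comparison
        return (
            np.nan_to_num(hp_df.loc[i, p], nan=-np.inf),
            np.nan_to_num(hp_df.loc[i, s], nan=-np.inf),
        )

    best_idx = max(hp_df.index, key=_key)
    return hp_df.loc[best_idx]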

def _ensure_np(a): return a if isinstance(a, np.ndarray) else np.asarray(a)

def plot_confusion_matrix(y_true, y_prob, threshold=0.5, save_path: str | Path = "cm.png"):
    y_true = _ensure_np(y_true).astype(int)
    y_pred = (_ensure_np(y_prob).ravel() >= threshold).astype(int)
    tp = int(((y_true==1)&(y_pred==1)).sum())
    tn = int(((y_true==0)&(y_pred==0)).sum())
    fp = int(((y_true==0)&(y_pred==1)).sum())
    fn = int(((y_true==1)&(y_pred==0)).sum())
    M = np.array([[tn, fp],[fn, tp]])

    fig, ax = plt.subplots(figsize=(4.5,4))
    im = ax.imshow(M, aspect="equal")
    for (i,j), v in np.ndenumerate(M):
        ax.text(j, i, str(v), ha="center", va="center")
    ax.set_xticks([0,1]); ax.set_yticks([0,1])
    ax.set_xticklabels(["Pred 0","Pred 1"]); ax.set_yticklabels(["True 0","True 1"])
    ax.set_title("Confusion Matrix (best model, split=val)")
    fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    fig.tight_layout(); fig.savefig(save_path, dpi=160, bbox_inches="tight")
    plt.close(fig); gc.collect()
    return str(save_path)

def plot_confusion_matrix_ternary(y_true, y_prob, t_low=0.3, t_high=0.7, save_path: str | Path = "cm_ternary.png"):
    y = _ensure_np(y_true).astype(int).ravel()
    p = _ensure_np(y_prob).ravel()
    pred3 = np.full_like(p, 1, dtype=int)
    pred3[p <= t_low] = 0
    pred3[p >= t_high] = 2
    M = np.zeros((2,3), dtype=int)
    for yi, pi in zip(y, pred3):
        M[yi, pi] += 1

    fig, ax = plt.subplots(figsize=(6.5,4.5))
    im = ax.imshow(M, aspect="equal")
    for (i,j), v in np.ndenumerate(M):
        ax.text(j, i, str(v), ha="center", va="center")
    ax.set_xticks([0,1,2]); ax.set_yticks([0,1])
    ax.set_xticklabels(["Pred: no fat","Pred: uncertain","Pred: fat"], rotation=15, ha="right")
    ax.set_yticklabels(["True 0 (no fat)","True 1 (fat)"])
    ax.set_title("Ternary Confusion (best model, split=val)")
    fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    fig.tight_layout(); fig.savefig(save_path, dpi=160, bbox_inches="tight")
    plt.close(fig); gc.collect()
    return str(save_path)

def plot_roc(y_true, y_prob, save_path: str | Path = "roc.png"):
    from sklearn.metrics import roc_curve, auc
    y_true = _ensure_np(y_true).astype(int)
    y_prob = _ensure_np(y_prob).ravel()
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    A = auc(fpr, tpr)
    fig, ax = plt.subplots(figsize=(4.5,4))
    ax.plot(fpr, tpr, label=f"AUC={A:.3f}")
    ax.plot([0,1],[0,1],'--',lw=1)
    ax.set_xlabel("FPR"); ax.set_ylabel("TPR"); ax.legend()
    ax.set_title("ROC (best model, split=val)")
    fig.tight_layout(); fig.savefig(save_path, dpi=160, bbox_inches="tight")
    plt.close(fig); gc.collect()
    return str(save_path)

def plot_reliability(y_true, y_prob, n_bins=10, save_path: str | Path = "reliability.png"):
    y_true = _ensure_np(y_true).astype(int)
    y_prob = _ensure_np(y_prob).ravel()
    bins = np.linspace(0,1,n_bins+1)
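    # Clip so that probabilities equal to 1.0 land in the last bin instead of being dropped
    idx = np.clip(np.digitize(y_prob, bins) - 1, 0, n_bins - 1)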
    xs, ys = [], []
    for b in range(n_bins):
        m = (idx==b)
        if not np.any(m):
            continue
        xs.append(y_prob[m].mean())
        ys.append(y_true[m].mean())
    fig, ax = plt.subplots(figsize=(4.5,4))
    ax.plot([0,1],[0,1],'--', lw=1)
    ax.plot(xs, ys, marker='o')
    ax.set_xlabel("mean predicted probability"); ax.set_ylabel("observed positive rate")
    ax.set_title("Reliability (best model, split=val)")
    fig.tight_layout(); fig.savefig(save_path, dpi=160, bbox_inches="tight")
    plt.close(fig); gc.collect()
    return str(save_path)

def plot_cv_bars(df_val: pd.DataFrame, metric: str, save_path: str | Path):
    metric = metric.lower()
    g = df_val.groupby("fold", as_index=False)[metric].max().sort_values("fold")
    fig, ax = plt.subplots(figsize=(7,4))
    ax.bar(g["fold"].astype(str), g[metric].values)
    ax.set_xlabel("Fold"); ax.set_ylabel(metric); ax.set_title(f"Val {metric} (best per Fold)")
    fig.tight_layout(); fig.savefig(save_path, dpi=160, bbox_inches="tight")
    plt.close(fig); gc.collect()
    return str(save_path)
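
**Illustration: selecting a configuration by its mean across folds.** `summarize_hparams` and `select_best_config` rank hyperparameter configurations by their mean validation metric over all folds rather than by the single best fold. The toy example below shows why this matters; the configuration names and scores are made up for illustration, and the pandas named aggregation is an equivalent shortcut, not the notebook's own code path.

```python
import pandas as pd

# Toy validation rows: two hypothetical configurations evaluated on three folds
toy_val = pd.DataFrame({
    "fold":    [1, 2, 3, 1, 2, 3],
    "model":   ["baseline"] * 3 + ["deeper"] * 3,
    "lr":      [1e-3] * 3 + [1e-4] * 3,
    "roc_auc": [0.80, 0.70, 0.75, 0.78, 0.77, 0.79],
    "f1":      [0.70, 0.65, 0.69, 0.71, 0.70, 0.72],
})

cfg_cols = ["model", "lr"]
summary = (
    toy_val.groupby(cfg_cols)
    .agg(
        n_folds=("fold", "nunique"),
        mean_roc_auc=("roc_auc", "mean"),
        std_roc_auc=("roc_auc", lambda s: s.std(ddof=0)),  # population std, as in summarize_hparams
        mean_f1=("f1", "mean"),
    )
    .reset_index()
)

# Rank by mean ROC AUC, then by mean F1 as a tie-breaker; NaN means sort last
best = summary.sort_values(
    ["mean_roc_auc", "mean_f1"], ascending=False, na_position="last"
).iloc[0]

print(summary)
print("selected:", best["model"], best["lr"])
```

Here the single highest fold score (0.80) belongs to `baseline`, but `deeper` has the higher mean and the smaller spread, so it is the one selected.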



In [None]:
import json
from pathlib import Path

import numpy as np

BEST_MODEL_PATH = MODELS_DIR / "best_model_keras.h5"
BEST_MODEL_META = REPORTS_DIR / "best_model_meta.json"

def show_best_model_dashboard(csv_path: str | Path,
                              out_dir: str | Path,
                              primary: str = "roc_auc",
                              secondary: str = "f1",
                              with_curves: bool = True,
                              ternary_thresholds=None):
    """
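    Read CV results, aggregate metrics across folds per hyperparameter configuration,
    select the best configuration, and retrain one final model with it on a stratified
    80/20 split of all dataset IDs (the held-out 20% is the internal validation set used
    for calibration and threshold tuning). The final model and its metadata are saved to: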

      - BEST_MODEL_PATH  (e.g. models_tf/best_model_keras.h5)
      - BEST_MODEL_META  (e.g. reports/best_model_meta.json)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    print(f"[DEBUG] Starting show_best_model_dashboard with csv_path={csv_path}, out_dir={out_dir}")

    # 1) Load CSV & aggregate hyperparameters across folds
    print(f"[DEBUG] Loading CV results from: {csv_path}")
    df = load_cv_results(csv_path)
    df_val = df[df["split"] == "val"].copy()
    if df_val.empty:
        raise ValueError("No rows with split=='val' found in the CV CSV.")

    print("[DEBUG] Loaded DataFrame head:")
    print(df.head())

    hp_summary = summarize_hparams(df_val, primary=primary, secondary=secondary)
    if hp_summary.empty:
        raise ValueError("No hyperparameter configurations found in the validation results.")

    best_cfg = select_best_config(hp_summary, primary=primary, secondary=secondary)

    if hasattr(best_cfg, "to_dict"):
        best_cfg_dict = best_cfg.to_dict()
    else:
        best_cfg_dict = dict(best_cfg)

    print("[DEBUG] Best aggregated hyperparameters across folds:")
    print(best_cfg_dict)

    # Bar plot per fold (sample-level, as before)
    bars_png = plot_cv_bars(df_val, primary, out_dir / f"cv_{primary}.png")

    # 2) Final training on all data (with internal validation split)
    from sklearn.model_selection import StratifiedKFold

    # uniq_ids / uniq_label were defined globally in section 2.6
    skf_final = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
    final_train_idx, final_val_idx = next(skf_final.split(uniq_ids, uniq_label))
    final_train_ids = uniq_ids[final_train_idx]
    final_val_ids   = uniq_ids[final_val_idx]

    final_train_rows = rows_for_ids(final_train_ids)
    final_val_rows   = rows_for_ids(final_val_ids)

    # Read the hyperparameters from the best configuration
    model_name = best_cfg_dict["model"]
    lr         = best_cfg_dict["lr"]
    wd         = best_cfg_dict["l2"]
    epochs     = int(best_cfg_dict["epochs"])
    bs         = int(best_cfg_dict["bs"])
    dropout    = best_cfg_dict["dropout"]

    if DEBUG_MODE:
        final_train_rows = final_train_rows[:500]
        final_val_rows   = final_val_rows[:200]
        # epochs is already defined at this point, so no UnboundLocalError
        epochs = min(epochs, 2)

    final_val_ids_all = [r["dataset_id"] for r in final_val_rows]

    ds_train_final = make_tf_dataset(final_train_rows, batch_size=bs, shuffle=True)
    ds_val_final   = make_tf_dataset(final_val_rows,   batch_size=bs, shuffle=False)

    model = build_model_keras(model_name, l2reg=wd, dropout=dropout)
    compile_model(model, lr=lr)

    print("[DEBUG] Fitting final model with best hyperparameters on all data (with internal validation split)...")
    model.fit(ds_train_final, validation_data=ds_val_final, epochs=epochs, verbose=0)

    # 3) Calibration, threshold & gray zone for the final model
    v_logits_final, v_true_final = predict_logits(model, ds_val_final)
    T_final = fit_temperature_tf(v_logits_final, v_true_final, steps=200, lr=0.05)

    if v_logits_final.ndim == 1:
        v_logits_cal_final = v_logits_final / T_final
        v_probs_cal_final = 1.0 / (1.0 + np.exp(-v_logits_cal_final))

        thr_info_final = find_best_threshold_f1(v_probs_cal_final, v_true_final)
        best_thr_final = float(thr_info_final["threshold"])

        val_m_final = evaluate_numpy(v_logits_cal_final, v_true_final, threshold=best_thr_final)
        val_ece_final = expected_calibration_error(v_probs_cal_final, v_true_final)

        t_low_final, t_high_final, tlth_info_final = find_gray_zone_thresholds(
            v_probs_cal_final,
            v_true_final,
            target_precision=TARGET_PRECISION,
            min_points_each_side=5,
            grid_quantiles=99,
        )
        val_tern_final = evaluate_ternary(v_probs_cal_final, v_true_final, t_low_final, t_high_final)

        val_m_final.update({
            "threshold": best_thr_final,
            "thr_prec": float(thr_info_final["precision"]),
            "thr_rec": float(thr_info_final["recall"]),
            "thr_f1": float(thr_info_final["f1"]),
            "ece": val_ece_final,
            "abstain": val_tern_final["abstain_rate"],
            "acc_decided": val_tern_final["acc_decided"],
        })

        # Dataset-level metrics for final model
        val_ds_metrics_final = evaluate_dataset_level(
            probs=v_probs_cal_final,
            y_true=v_true_final,
            ds_ids=final_val_ids_all,
            threshold=best_thr_final,
        )
        for k, v in val_ds_metrics_final.items():
            val_m_final[f"ds_{k}"] = v

    else:
        # Multiclass fallback (not expected here)
        v_logits_cal_final = v_logits_final / T_final
        v_probs_cal_final = tf.nn.softmax(v_logits_cal_final, axis=-1).numpy()
        best_thr_final = 0.5

        val_m_final = evaluate_numpy(v_logits_cal_final, v_true_final, threshold=best_thr_final)
        val_ece_final = expected_calibration_error(v_probs_cal_final.max(axis=1), v_true_final)
        val_m_final.update({"ece": val_ece_final})

        t_low_final, t_high_final, tlth_info_final = DEFAULT_GRAY_T_LOW, DEFAULT_GRAY_T_HIGH, {"fallback": True, "multiclass": True}

    # 4) Save final best model
    BEST_MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
    model.save(BEST_MODEL_PATH, include_optimizer=False)
    print(f"[DEBUG] Final best model saved to: {BEST_MODEL_PATH}")

    # 5) Write meta JSON (used by Evaluation notebook)
    best_meta = {
        "fold": -1,  # no single fold; this is the final retrained model
        "model": model_name,
        "original_model_path": None,  # we no longer copy a single fold model
        "model_path": str(BEST_MODEL_PATH),
        "T": float(T_final),
        "threshold": float(best_thr_final),
        "t_opt": float(best_thr_final),
        "t_low": float(t_low_final),
        "t_high": float(t_high_final),
        "target_precision": float(TARGET_PRECISION),
        # final validation metrics (sample-level)
        "acc": float(val_m_final.get("acc", np.nan)),
        "f1": float(val_m_final.get("f1", np.nan)),
        "roc_auc": float(val_m_final.get("roc_auc", np.nan)),
        "pr_auc": float(val_m_final.get("pr_auc", np.nan)),
        # final validation metrics (dataset-level)
        "ds_acc": float(val_m_final.get("ds_acc", np.nan)),
        "ds_f1": float(val_m_final.get("ds_f1", np.nan)),
        "ds_roc_auc": float(val_m_final.get("ds_roc_auc", np.nan)),
        "ds_pr_auc": float(val_m_final.get("ds_pr_auc", np.nan)),
        # aggregated CV metrics over folds for this configuration
        "cv_mean_roc_auc": float(best_cfg_dict.get("mean_roc_auc", np.nan)),
        "cv_std_roc_auc": float(best_cfg_dict.get("std_roc_auc", np.nan)),
        "cv_mean_f1": float(best_cfg_dict.get("mean_f1", np.nan)),
        "cv_std_f1": float(best_cfg_dict.get("std_f1", np.nan)),
    }

    BEST_MODEL_META.parent.mkdir(parents=True, exist_ok=True)
    with open(BEST_MODEL_META, "w", encoding="utf-8") as f:
        json.dump(best_meta, f, indent=2)
    print(f"[DEBUG] Best model meta written to: {BEST_MODEL_META}")
    print("[DEBUG] best_meta content:")
    print(best_meta)

    thresholds_used = (float(t_low_final), float(t_high_final))

    results = {
        "bars": bars_png,
        "best_row": best_cfg_dict,
        "thresholds": thresholds_used,
        "best_model_path": str(BEST_MODEL_PATH),
        "best_model_meta": str(BEST_MODEL_META),
    }

    # Optional plots if _Y_TRUE_ / _Y_PROB_ are defined externally
    if with_curves and ("_Y_TRUE_" in globals()) and ("_Y_PROB_" in globals()):
        roc_png = plot_roc(_Y_TRUE_, _Y_PROB_, out_dir / "roc.png")
        cm_png  = plot_confusion_matrix(_Y_TRUE_, _Y_PROB_, 0.5, out_dir / "cm.png")
        rel_png = plot_reliability(_Y_TRUE_, _Y_PROB_, 10, out_dir / "reliability.png")
        tcm_png = plot_confusion_matrix_ternary(_Y_TRUE_, _Y_PROB_, t_low_final, t_high_final, out_dir / "cm_ternary.png")
        results.update({
            "roc": roc_png,
            "cm": cm_png,
            "reliability": rel_png,
            "cm_ternary": tcm_png,
        })

    print(f"Best model saved to: {BEST_MODEL_PATH}")
    print(f"Metadata saved to: {BEST_MODEL_META}")

    return results
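
**Illustration: the gray-zone (ternary) decision.** The final model stores two thresholds, `t_low` and `t_high`, and the metadata reports an abstain rate and an accuracy on the decided samples. The notebook's `evaluate_ternary` is defined elsewhere, so the sketch below is a hypothetical re-implementation that follows the same convention as `plot_confusion_matrix_ternary` (`p <= t_low` is negative, `p >= t_high` is positive, everything in between is left undecided). The function names and the sample data are assumptions made for illustration.

```python
import numpy as np

ABSTAIN = -1  # marker for "uncertain" (assumed encoding, chosen for this sketch)

def ternary_decide(probs, t_low=0.3, t_high=0.7):
    """Return 0 (negative), 1 (positive) or ABSTAIN for each probability."""
    probs = np.asarray(probs, dtype=float).ravel()
    decision = np.full(probs.shape, ABSTAIN, dtype=int)
    decision[probs <= t_low] = 0
    decision[probs >= t_high] = 1
    return decision

def ternary_summary(probs, y_true, t_low=0.3, t_high=0.7):
    """Share of undecided samples and accuracy on the samples that were decided."""
    y_true = np.asarray(y_true, dtype=int).ravel()
    decision = ternary_decide(probs, t_low, t_high)
    decided = decision != ABSTAIN
    if decided.any():
        acc = float((decision[decided] == y_true[decided]).mean())
    else:
        acc = float("nan")  # nothing was decided, so accuracy is undefined
    return {"abstain_rate": float((~decided).mean()), "acc_decided": acc}

probs  = [0.05, 0.20, 0.45, 0.55, 0.80, 0.95, 0.72, 0.10]
y_true = [0,    0,    1,    0,    1,    1,    0,    0]
summary_tern = ternary_summary(probs, y_true)
print(summary_tern)
```

Widening the gap between `t_low` and `t_high` typically raises the accuracy on decided samples at the cost of a higher abstain rate, which is the trade-off the target-precision search is meant to balance.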



In [None]:
# === 2.7) Select the best model & write the meta JSON ===

# Path to the CV CSV written in section 2.6
csv_path = report_csv  # = REPORTS_DIR / "cv_results_tuned_dropout_keras.csv"

# Output folder for the plots/dashboard
dashboard_dir = PLOTS_DIR / "best_model_dashboard"

show_best_model_dashboard(
    csv_path=csv_path,
    out_dir=dashboard_dir,
    primary="roc_auc",   # per the instructor's feedback, AUC is acceptable as the primary criterion
    secondary="f1",      # F1 as the secondary criterion
    with_curves=True,
)

print("Best model saved to:", BEST_MODEL_PATH)
print("Metadata saved to:", BEST_MODEL_META)


[DEBUG] Starting show_best_model_dashboard with csv_path=/content/drive/MyDrive/Generated Data for Data science project/reports/cv_results_tuned_dropout_keras.csv, out_dir=/content/drive/MyDrive/Generated Data for Data science project/plots/best_model_dashboard
[DEBUG] Loading CV results from: /content/drive/MyDrive/Generated Data for Data science project/reports/cv_results_tuned_dropout_keras.csv
[DEBUG] Loaded DataFrame head:
   fold     model     lr   l2  epochs  bs  dropout split   acc        f1  ...  \
0     1  baseline  0.001  0.0       2  16      0.0   val  0.68  0.698113  ...   
1     1  baseline  0.001  0.0       2  16      0.0  test  0.66  0.701754  ...   

     ds_acc     ds_f1  ds_roc_auc  ds_pr_auc  \
0  0.716981  0.727273    0.772989   0.776014   
1  0.655556  0.693069    0.734196   0.626571   

                                          model_path    T  t_low  t_high  \
0  /content/drive/MyDrive/Generated Data for Data...  1.0    0.3     0.7   
1  /content/drive/MyDrive/G



[warn] No (t_low,t_high) found that satisfy target_precision. Falling back to (0.30, 0.70).
[DEBUG] Final best model saved to: /content/drive/MyDrive/Generated Data for Data science project/models_tf/best_model_keras.h5
[DEBUG] Best model meta written to: /content/drive/MyDrive/Generated Data for Data science project/reports/best_model_meta.json
[DEBUG] best_meta content:
{'fold': -1, 'model': 'baseline', 'original_model_path': None, 'model_path': '/content/drive/MyDrive/Generated Data for Data science project/models_tf/best_model_keras.h5', 'T': 1.0, 'threshold': 0.01, 't_opt': 0.01, 't_low': 0.3, 't_high': 0.7, 'target_precision': 0.9, 'acc': 0.28, 'f1': 0.4375, 'roc_auc': 0.5892857142857143, 'pr_auc': 0.46616560751094704, 'ds_acc': 0.28, 'ds_f1': 0.4375, 'ds_roc_auc': 0.6507936507936508, 'ds_pr_auc': 0.5700320208584327, 'cv_mean_roc_auc': 0.7623444399839421, 'cv_std_roc_auc': 0.0, 'cv_mean_f1': 0.6981132075471698, 'cv_std_f1': 0.0}
Best model saved to: /content/drive/MyDr