# Sheet 8

## NN.Backprop.01

Notation

- Activation: $a^{(l)} = \sigma(z^{(l)})$, $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$, with input $a^{(0)} = x$
- Error function (squared error): $E = \frac{1}{2} \lVert a^{(3)} - y \rVert^2$, where $y$ is the target

Backward pass:

- Output layer: $\delta^{(3)} = (a^{(3)} - y) \odot \sigma'(z^{(3)})$
- Hidden layer 2: $\delta^{(2)} = \big((W^{(3)})^{\top} \delta^{(3)}\big) \odot \sigma'(z^{(2)})$
- Hidden layer 1: $\delta^{(1)} = \big((W^{(2)})^{\top} \delta^{(2)}\big) \odot \sigma'(z^{(1)})$
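
The recursion for the hidden layers follows from the chain rule. A short derivation, assuming the definition $\delta_j^{(l)} := \partial E / \partial z_j^{(l)}$ (not stated explicitly above, but consistent with the formulas) and using $z_k^{(l+1)} = \sum_j W_{kj}^{(l+1)} a_j^{(l)} + b_k^{(l+1)}$:

$$
\delta_j^{(3)} = \frac{\partial E}{\partial a_j^{(3)}} \cdot \frac{\partial a_j^{(3)}}{\partial z_j^{(3)}} = (a_j^{(3)} - y_j)\, \sigma'(z_j^{(3)})
$$

$$
\delta_j^{(l)} = \sum_k \frac{\partial E}{\partial z_k^{(l+1)}} \cdot \frac{\partial z_k^{(l+1)}}{\partial a_j^{(l)}} \cdot \frac{\partial a_j^{(l)}}{\partial z_j^{(l)}} = \Big( \sum_k W_{kj}^{(l+1)} \delta_k^{(l+1)} \Big)\, \sigma'(z_j^{(l)}), \qquad l = 2, 1
$$

The sum over $k$ is the $j$-th component of $(W^{(l+1)})^{\top} \delta^{(l+1)}$, which gives the vector form used in the list.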

Weight in the first hidden layer (from neuron $i$ in layer 0 to neuron $j$ in layer 1):

- $\frac{\partial E}{\partial w_{ij}^{(1)}} = a_i^{(0)} \cdot \delta_j^{(1)}$
- Update with learning rate $\alpha$: $w_{ij}^{(1)} \leftarrow w_{ij}^{(1)} - \alpha \cdot a_i^{(0)} \cdot \delta_j^{(1)}$
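
The formulas above can be sketched in NumPy and checked against a finite-difference estimate. This is only an illustrative sketch: the layer sizes, the random weights, and the helper names (`forward`, `backward`, `error`) are assumptions and not part of the exercise. Note that with $z = W a + b$ the matrix $W^{(1)}$ has shape (out, in), so the weight from neuron $i$ to neuron $j$ sits at position `[j, i]`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, Ws, bs):
    """Return the activations a[0], ..., a[3] of a three-layer sigmoid network."""
    a = [x]
    for W, b in zip(Ws, bs):
        a.append(sigmoid(W @ a[-1] + b))
    return a


def backward(a, Ws, y):
    """Return the deltas for layers 1..3, using sigma'(z) = a * (1 - a)."""
    delta = {3: (a[3] - y) * a[3] * (1.0 - a[3])}
    for l in (2, 1):
        # Ws[l] holds W^(l+1) because the Python list is 0-indexed.
        delta[l] = (Ws[l].T @ delta[l + 1]) * a[l] * (1.0 - a[l])
    return delta


def error(x, y, Ws, bs):
    return 0.5 * np.sum((forward(x, Ws, bs)[3] - y) ** 2)


# Hypothetical sizes and random values, chosen only for illustration.
rng = np.random.default_rng(0)
sizes = [3, 4, 3, 2]
Ws = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(size=n_out) for n_out in sizes[1:]]
x = rng.normal(size=sizes[0])
y = rng.uniform(size=sizes[-1])

a = forward(x, Ws, bs)
delta = backward(a, Ws, y)

# dE/dw_ij^(1) = a_i^(0) * delta_j^(1), stored at position [j, i].
grad_W1 = np.outer(delta[1], a[0])

# Central finite-difference estimate for a single weight.
i, j, h = 1, 2, 1e-6
Ws_plus = [W.copy() for W in Ws]
Ws_minus = [W.copy() for W in Ws]
Ws_plus[0][j, i] += h
Ws_minus[0][j, i] -= h
numeric = (error(x, y, Ws_plus, bs) - error(x, y, Ws_minus, bs)) / (2 * h)

print("analytic:", grad_W1[j, i])
print("numeric: ", numeric)
```

If the backward formulas are implemented correctly, the two printed values should agree up to the accuracy of the finite-difference approximation.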


## NN.Backprop.02

Weights from the figure:

- $x \to h$: $w_{xh} = -1$
- bias $\to h$: $w_{bh} = 1$
- $h \to y$: $w_{hy} = 1$
- $x \to y$: $w_{xy} = 2$
- bias $\to y$: $w_{by} = -2$

Forward pass for the training example $(x, y_T) = (0, 0.5)$:

- $z_h = w_{bh} \cdot 1 + w_{xh} \cdot x = 1$
- $h = \sigma(z_h) = 0.7310585786$
- $z_y = w_{by} \cdot 1 + w_{hy} \cdot h + w_{xy} \cdot x = -1.2689414214$
- $y = \sigma(z_y) = 0.2194385171$

Error:

- $E = 0.5 \cdot (y_T - y)^2 = 0.0393573728$

Backprop:

- $\sigma'(z_y) = y(1 - y) = 0.1712852543$
- $\delta_y = (y - y_T)\sigma'(z_y) = -0.04805604495$
- $\sigma'(z_h) = h(1 - h) = 0.1966119332$
- $\delta_h = (w_{hy} \cdot \delta_y)\sigma'(z_h) = -0.00944839190$

Partial derivatives:

- $\frac{\partial E}{\partial w_{hy}} = \delta_y \cdot h = -0.03513178391$
- $\frac{\partial E}{\partial w_{xy}} = \delta_y \cdot x = 0$
- $\frac{\partial E}{\partial w_{by}} = \delta_y = -0.04805604495$
- $\frac{\partial E}{\partial w_{xh}} = \delta_h \cdot x = 0$
- $\frac{\partial E}{\partial w_{bh}} = \delta_h = -0.00944839190$
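
The hand-computed derivatives can be double-checked numerically. The following sketch recomputes the forward and backward pass for the network above and compares each analytic derivative with a central finite-difference estimate. The dictionary layout and the helper name `error` are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def error(w, x, y_t):
    """Forward pass through the two-neuron network; returns (E, h, y)."""
    h = sigmoid(w["bh"] + w["xh"] * x)
    y = sigmoid(w["by"] + w["hy"] * h + w["xy"] * x)
    return 0.5 * (y_t - y) ** 2, h, y


w = {"xh": -1.0, "bh": 1.0, "hy": 1.0, "xy": 2.0, "by": -2.0}
x, y_t = 0.0, 0.5

# Analytic gradients via backpropagation.
E, h, y = error(w, x, y_t)
delta_y = (y - y_t) * y * (1.0 - y)
delta_h = w["hy"] * delta_y * h * (1.0 - h)
analytic = {
    "hy": delta_y * h,
    "xy": delta_y * x,
    "by": delta_y,
    "xh": delta_h * x,
    "bh": delta_h,
}

# Central finite differences, perturbing one weight at a time.
eps = 1e-6
numeric = {}
for name in w:
    w_plus = dict(w, **{name: w[name] + eps})
    w_minus = dict(w, **{name: w[name] - eps})
    numeric[name] = (error(w_plus, x, y_t)[0] - error(w_minus, x, y_t)[0]) / (2 * eps)

for name in analytic:
    print(f"dE/dw_{name}: analytic={analytic[name]:+.10f}  numeric={numeric[name]:+.10f}")
```

Both derivatives with respect to weights attached to $x$ vanish because $x = 0$ for this example, so those weights receive no update from it.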

Weight updates ($\alpha = 0.01$):

- $w_{hy} \leftarrow 1 - 0.01 \cdot \frac{\partial E}{\partial w_{hy}} = 1.00035131784$
- $w_{xy} \leftarrow 2 - 0.01 \cdot \frac{\partial E}{\partial w_{xy}} = 2.0$
- $w_{by} \leftarrow -2 - 0.01 \cdot \frac{\partial E}{\partial w_{by}} = -1.99951943955$
- $w_{xh} \leftarrow -1 - 0.01 \cdot \frac{\partial E}{\partial w_{xh}} = -1.0$
- $w_{bh} \leftarrow 1 - 0.01 \cdot \frac{\partial E}{\partial w_{bh}} = 1.00009448392$
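
As a sanity check, one gradient-descent step can be applied in code and the error re-evaluated on the same example. This is a minimal sketch with illustrative helper names (`forward_error`, `gradients`); for a step size this small the error is expected to decrease slightly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_error(w, x, y_t):
    """Return (E, h, y) for the two-neuron network from the figure."""
    h = sigmoid(w["bh"] + w["xh"] * x)
    y = sigmoid(w["by"] + w["hy"] * h + w["xy"] * x)
    return 0.5 * (y_t - y) ** 2, h, y


def gradients(w, x, y_t):
    """Backpropagation for the single training example (x, y_t)."""
    _, h, y = forward_error(w, x, y_t)
    delta_y = (y - y_t) * y * (1.0 - y)
    delta_h = w["hy"] * delta_y * h * (1.0 - h)
    return {
        "hy": delta_y * h,
        "xy": delta_y * x,
        "by": delta_y,
        "xh": delta_h * x,
        "bh": delta_h,
    }


w = {"xh": -1.0, "bh": 1.0, "hy": 1.0, "xy": 2.0, "by": -2.0}
x, y_t, alpha = 0.0, 0.5, 0.01

grads = gradients(w, x, y_t)
# Gradient-descent step: w <- w - alpha * dE/dw.
w_new = {name: w[name] - alpha * grads[name] for name in w}

E_old = forward_error(w, x, y_t)[0]
E_new = forward_error(w_new, x, y_t)[0]

for name, value in w_new.items():
    print(f"w_{name} = {value:.11f}")
print(f"E before: {E_old:.10f}, E after: {E_new:.10f}")
```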


## NN.Backprop.03


In [None]:
import csv
import math
import os
import random

import numpy as np

In [17]:
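def load_iris(path):
    """Load the iris CSV (four feature columns, then the class label; no header row).

    Returns standardized features, one-hot labels, and the sorted class names.
    """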
    features = []
    labels = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        for row in reader:
            if not row:
                continue
            features.append([float(v) for v in row[:4]])
            labels.append(row[4])

    classes = sorted(set(labels))
    class_to_idx = {name: i for i, name in enumerate(classes)}
    y = np.zeros((len(labels), len(classes)), dtype=np.float64)
    for i, label in enumerate(labels):
        y[i, class_to_idx[label]] = 1.0

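    # Standardize each feature to zero mean and unit variance;
    # constant columns keep std = 1 to avoid division by zero.
    x = np.array(features, dtype=np.float64)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    x = (x - mean) / std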

    return x, y, classes


def sigmoid(z):
    z = np.clip(z, -60.0, 60.0)
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(0.0, z)


def relu_grad(z):
    return (z > 0.0).astype(np.float64)


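def bce_loss(y_true, y_pred, eps=1e-9):
    """Binary cross-entropy summed over all output units of one sample."""
    # Clip predictions away from 0 and 1 so that log() stays finite.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))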




In [8]:
class MLP:
    def __init__(self, layer_sizes, seed=7):
        if len(layer_sizes) < 2:
            raise ValueError("Need at least input and output layer sizes")
        rng = np.random.default_rng(seed)
        self.weights = []
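        for in_size, out_size in zip(layer_sizes[:-1], layer_sizes[1:]):
            # Glorot (Xavier) uniform initialization; the extra row 0 of each
            # weight matrix holds the bias weights.
            limit = math.sqrt(6.0 / (in_size + out_size))
            w = rng.uniform(-limit, limit, size=(in_size + 1, out_size))
            self.weights.append(w)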

    def forward(self, x):
        activations = [x]
        pre_acts = []
        for idx, w in enumerate(self.weights):
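            a_prev = activations[-1]
            # Prepend a constant 1 so that row 0 of w acts as the bias.
            a_prev_bias = np.concatenate(([1.0], a_prev))
            z = a_prev_bias @ w
            pre_acts.append(z)
            # Sigmoid on the output layer, ReLU on all hidden layers.
            if idx == len(self.weights) - 1:
                a = sigmoid(z)
            else:
                a = relu(z)
            activations.append(a)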
        return activations, pre_acts

    def predict(self, x):
        activations, _ = self.forward(x)
        return activations[-1]

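    def train_epoch(self, x, y, lr):
        """Run one epoch of per-sample SGD over the data in random order."""
        indices = list(range(len(x)))
        # Note: this uses the global `random` module, which `seed` does not control.
        random.shuffle(indices)
        for i in indices:
            activations, pre_acts = self.forward(x[i])
            # For a sigmoid output with BCE loss, dE/dz at the output is a - y.
            delta = activations[-1] - y[i]
            for layer_idx in range(len(self.weights) - 1, -1, -1):
                a_prev = activations[layer_idx]
                a_prev_bias = np.concatenate(([1.0], a_prev))
                w_current = self.weights[layer_idx]
                grad_w = np.outer(a_prev_bias, delta)
                if layer_idx > 0:
                    # Propagate delta through the pre-update weights (bias row
                    # excluded), then apply the ReLU derivative of the layer below.
                    w_no_bias = w_current[1:, :]
                    delta = w_no_bias @ delta
                    delta *= relu_grad(pre_acts[layer_idx - 1])
                self.weights[layer_idx] = w_current - lr * grad_w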

    def loss(self, x, y):
        """Mean per-sample BCE loss over the whole data set."""
        total = 0.0
        for i in range(len(x)):
            y_pred = self.predict(x[i])
            total += bce_loss(y[i], y_pred)
        return total / len(x)




In [4]:
def train_model(x, y, layer_sizes, lr=0.01, epochs=3000, seed=7):
    """Train an MLP with per-sample SGD and record the mean loss after each epoch."""
    model = MLP(layer_sizes, seed=seed)
    losses = []
    for epoch in range(epochs):
        model.train_epoch(x, y, lr)
        losses.append(model.loss(x, y))
    return model, losses




In [18]:
data_path = os.path.join("iris.csv")
x, y, classes = load_iris(data_path)

layer_sizes = [x.shape[1], 16, 8, len(classes)]
model, losses = train_model(x, y, layer_sizes, lr=0.0005, epochs=6000, seed=7)

near_zero_epoch = None
for i, loss in enumerate(losses, start=1):
    if loss < 0.05:
        near_zero_epoch = i
        break
print("Classes:", classes)
print("Final loss:", f"{losses[-1]:.6f}")
if near_zero_epoch is not None:
    print("Epoch with loss < 0.05:", near_zero_epoch)
else:
    print("Loss did not fall below 0.05 within the given epochs")


Classes: ['setosa', 'versicolor', 'virginica']
Final loss: 0.041727
Epoch with loss < 0.05: 5335
