<b>Group Number:</b> 7
<br><b>Name Group Member 1:</b>   Paraa Afifi
<br><b>u-Kürzel Group Member 1:</b>   uppns
<br><b>Name Group Member 2:</b>   Dan-Jason Bräuninger
<br><b>u-Kürzel Group Member 2:</b>   uuuab
<br><b>Name Group Member 3:</b> Sami Shahzad
<br><b>u-Kürzel Group Member 3:</b>uvoei

# 1. Regression with Neural Networks

Neural networks are used in a wide variety of applications, ranging from regression and classification tasks to advanced sequence modeling. In this section, we focus on the fundamentals of **regression** using neural networks. We will illustrate these foundations with simple examples that cover:

1. The concept of a single artificial neuron.
2. Activation functions and how they add non-linearity.
3. A brief introduction to backpropagation.

We will build up from the simplest neuron (linear neuron) to networks with multiple neurons that enable us to approximate more complex functions.


## 1.1 The Artificial Neuron (Theory)

An **artificial neuron** is a mathematical function inspired by the way biological neurons process information. It takes one or more inputs $x_n$ and produces a single output $y$. The neuron applies a simple mathematical operation, as shown below:

$$
  f_{\text{neuron}}(x) = \phi\left(\sum_{n=1}^m x_n \, w_{n} \;+\; b\right)
\quad (1)
$$

- **Inputs**: $x_n$ are numeric values from data or other neurons.
- **Weights**: $w_n$ scale the importance of each input.
- **Summation**: The weighted inputs are summed up into an intermediate value $v$.
- **Bias** $b$: A constant added to the sum.
- **Activation Function** $\phi$: A possibly non-linear function applied to the sum.
- **Output** $y$: The final activation of the neuron.

Even though a single neuron only performs a simple transformation, large networks of these neurons can solve very complex tasks.
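
As a minimal sketch of equation (1), assuming NumPy, the weighted sum and the activation can be written in a few lines. The helper name `neuron_output` and the example values are illustrative assumptions and are not part of the lab code.

```python
import numpy as np


def neuron_output(x, w, b, phi=lambda v: v):
    """Evaluate equation (1): phi(sum_n x_n * w_n + b).

    `neuron_output` and its argument names are illustrative assumptions.
    By default phi is the identity, i.e. a purely linear neuron.
    """
    v = np.dot(x, w) + b  # weighted sum plus bias (intermediate value v)
    return phi(v)         # activation function applied to v


x = np.array([1.0, 2.0, 3.0])   # three inputs x_1..x_3
w = np.array([0.5, -1.0, 2.0])  # one weight per input
b = 0.5

# Linear neuron: 0.5 - 2.0 + 6.0 + 0.5 = 5.0
y_linear = neuron_output(x, w, b)

# Negated weights give v = -4.0, which a max(0, v) activation clips to 0.0
y_clipped = neuron_output(x, -w, b, phi=lambda v: max(v, 0.0))
```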


<center>
  <img src="images/neural_network.png" alt="A diagram illustrating a single neuron" width="600"/>
</center>
<p style="text-align: center;"><em>Figure 1: General artificial neuron.</em></p>


<div class="alert alert-block alert-info">
<b>Note:</b>

- An artificial neuron is an **abstract** concept. Different software or hardware approaches (even 
  [optical computing](https://www.osapublishing.org/optica/abstract.cfm?uri=optica-6-9-1132)) 
  can implement the same mathematical function in (1).  

- Sometimes the bias is treated as if it were another weight $w_0$ with a fixed input $x_0 = 1$.
</div>
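
The bias-as-weight view from the note can be checked numerically. The following sketch assumes NumPy and arbitrary example values; it only illustrates that both formulations give the same weighted sum.

```python
import numpy as np

x = np.array([2.0, -1.0])  # example inputs (arbitrary values)
w = np.array([0.3, 0.8])   # example weights (arbitrary values)
b = 0.1                    # example bias

# Explicit bias: sum_n x_n * w_n + b
v_explicit = np.dot(x, w) + b

# Bias folded into the weights: prepend x_0 = 1 and w_0 = b
x_aug = np.concatenate(([1.0], x))
w_aug = np.concatenate(([b], w))
v_folded = np.dot(x_aug, w_aug)
```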

## 1.2 A Simple Neuron (Practice)

Let us consider the simplest case: a single neuron with **one input** and **no activation function**. This neuron essentially computes:

$$
  f_{\text{neuron}}(x) = w \cdot x + b 
\quad (2)
$$

This is simply a linear (or affine) function. Graphically, it looks like:


<center>
  <img src="images/single_neuron_no_activation.png" alt="A diagram illustrating a single neuron" width="600"/>
</center>
<p style="text-align: center;"><em>Figure 2: Simple artificial neuron without activation.</em></p>

In practice, we will:
- Keep track of `weight` and `bias`.
- Compute the neuron’s output based on input $x$ as $w \cdot x + b$.

Below, we define a Python class `SimpleNeuron` that:
- Holds a single weight and a single bias.
- Has methods to set or get these parameters.
- Provides a `compute` method to evaluate $f_{\text{neuron}}(x)$.
- Links to an interactive plot that updates whenever we change the neuron’s parameters.

In [1]:
# Postponed evaluation of annotations (PEP 563) lets type hints refer to classes
# that are defined later in the notebook, such as Interactive2DPlot. It is opt-in
# and belongs at the top of the cell.
from __future__ import annotations

from typing import Union, Dict, List, Tuple
import numpy as np

class SimpleNeuron:
    def __init__(self, plot: Interactive2DPlot):
        self.plot = plot #I am assigned the following plot
        self.plot.register_neuron(self) #hey plot, remember me
        
    def set_values(self, weight: float, bias: float):
        self.weight = weight
        self.bias = bias
        self.plot.update() #hey plot, I have changed, redraw my output
        
    def get_weight(self) -> float:
        return self.weight
    
    def get_bias(self) -> float:
        return self.bias

    def compute(self, x: Union[float, np.ndarray]) -> Union[float, np.ndarray]:
        self.activation = np.dot(self.weight, x) + self.bias
        return self.activation

## 1.3 The Problem of Regression

We use **regression** to approximate a function that fits a given set of data points as accurately as possible. A common measure of “fit” is the **mean squared error (MSE)**:

$$
  J = \frac{1}{N} \sum_{n=1}^{N} \bigl(\hat{f}(x_n) - y_n\bigr)^2,
\quad (3)
$$

where:
- $\hat{f}(x_n)$ is the model’s predicted output for an input $x_n$,
- $y_n$ is the actual target value,
- $N$ is the number of data points.

We aim to minimize $J$. A lower $J$ indicates that our model’s predictions are closer to the target values. Once we have found a good approximation, we can use the trained model to **predict** new points by providing new $x$-values.

<center>
  <img src="images/least_squares_explanation.png" alt="Visualization of least squares approach" width="600"/>
</center>
<p style="text-align: center;"><em>Figure 3: Illustration of distances between data points and a model’s curve.</em></p>

<a id='simple_neuron'></a>

<div class="alert alert-block alert-info">
<b>Note:</b>

- In equation (3), $x_1, x_2, \dots, x_N$ represent **samples** (not neuron inputs like in equation (1)).
- Our goal is to choose weights and biases to minimize $J$.
- The **mean squared error** is based on a distance metric, and can be replaced by other metrics which may fit the target problem better.

</div>
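
Equation (3) can also be evaluated in vectorized form. The sketch below assumes NumPy, a linear model $\hat{f}(x) = w x + b$, and the hypothetical helper names `mse` and `mae`; the mean absolute error is included only as one example of an alternative metric.

```python
import numpy as np


def mse(y_pred, y_true):
    """Mean squared error, equation (3)."""
    return np.mean((y_pred - y_true) ** 2)


def mae(y_pred, y_true):
    """Mean absolute error, one possible alternative metric."""
    return np.mean(np.abs(y_pred - y_true))


x = np.array([1.0, 2.0, 3.0])  # sample inputs x_1..x_N
y = np.array([1.5, 0.7, 1.2])  # target values y_1..y_N

w, b = 0.0, 0.0        # example parameters: the model predicts 0 everywhere
y_hat = w * x + b

j_mse = mse(y_hat, y)  # (1.5**2 + 0.7**2 + 1.2**2) / 3
j_mae = mae(y_hat, y)  # (1.5 + 0.7 + 1.2) / 3
```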

We now create a function `loss` that implements equation (3). It receives a neuron object and a set of points as arguments.
- For each point, it separates the x- and y-values.
- It passes the x-value to the neuron and asks it to predict the y-value (this is $\hat{f}(x_n)$).
- It takes the difference between the predicted and the real y-value, as in equation (3).
- It squares this difference and accumulates the squared differences.
- Finally, it divides the sum of squared differences by the number of points.

In [2]:
def loss(neuron: SimpleNeuron, points: Dict[str, List[float]]) -> float:
    sum_squared_dist = 0

    # zip pairs each x-value with its corresponding y-value
    for point_x, point_y in zip(points["x"], points["y"]):
        predicted_point_y = neuron.compute(point_x)
        dist = predicted_point_y - point_y  # prediction minus target, as in equation (3)
        sum_squared_dist += dist ** 2

    # mean of the squared distances (named so that it does not shadow the function)
    mean_squared_dist = sum_squared_dist / len(points["y"])
    return mean_squared_dist


### 1.3.1 Preparing an Interactive Plot

Below, we define a helper class for interactive plotting:
- It can register a neuron and update its plot when the neuron’s parameters change.
- We will display **data points** alongside the **neuron’s predicted line**.
- On every update, it prints the mean squared error $J$ computed by the `loss` function defined above.

<div class="alert alert-block alert-info">
<b>Note:</b> The plot classes are not part of the subject matter for this lab.  

</div>

In [3]:
import plotly.graph_objs as go
from ipywidgets import interact, Layout, HBox, FloatSlider
import time
import threading

In [4]:
# an Interactive Plot monitors the activation of a neuron or a neural network
class Interactive2DPlot:
    def __init__(self, points: Dict[str, List[float]], ranges: Dict[str, Tuple[float, float]], width: int = 800, height: int = 400, margin: Dict[str, int] = {'t': 0, 'l': 170}, draw_time: float = 0.05):
        self.idle = True
        self.points = points
        self.x = np.arange(ranges["x"][0], ranges["x"][1], 0.1)
        self.y = np.arange(ranges["y"][0], ranges["y"][1], 0.1)
        self.draw_time = draw_time
        self.layout = go.Layout(
            xaxis=dict(title="Input: x", range=ranges["x"], fixedrange=True),
            yaxis=dict(title="Output: y", range=ranges["y"], fixedrange=True),
            width=width,
            height=height,
            showlegend=False,
            autosize=False,
            margin=margin,
        )
        self.trace = go.Scatter(x=self.x, y=self.y)
        self.plot_points = go.Scatter(
            x=points["x"], y=points["y"], mode="markers")
        self.data = [self.trace, self.plot_points]
        self.plot = go.FigureWidget(self.data, self.layout)

    def register_neuron(self, neuron: SimpleNeuron) -> None:
        self.neuron = neuron

    def redraw(self) -> None:
        self.idle = False
        time.sleep(self.draw_time)
        self.plot.data[0].y = self.neuron.compute(self.x)
        self.idle = True

    def update(self) -> None:
        print("Loss: {:0.2f}".format(loss(self.neuron, self.points)))
        if self.idle:
            thread = threading.Thread(target=self.redraw)
            thread.start()

<div class="alert alert-block alert-success">
<b>Task:</b> Train the neuron
<ul>
<li> You are given a set of 3 points and one neuron to do a curve fit. Run the cell below.</li>
<li> <b>Change the weight and bias of the neuron using the sliders to minimize the loss.</b></li>
<li> <b>Hint:</b> You can also change the sliders with the arrow keys on your keyboard after clicking on the slider.</li>
</ul>
</div>

In [5]:
points_linreg = dict(x=[1, 2, 3], y=[1.5, 0.7, 1.2])
ranges_linreg = dict(x=(-4, 4), y=(-4, 4))

linreg_plot = Interactive2DPlot(points_linreg, ranges_linreg)
simple_neuron = SimpleNeuron(linreg_plot)

slider_layout = Layout(width="90%")

interact(
    simple_neuron.set_values, 
    weight=FloatSlider(min=-3, max=3, step=0.1, value = 0, layout=slider_layout),
    bias=FloatSlider(min=-3, max=3, step=0.1, value = 0, layout=slider_layout)
)

linreg_plot.plot


<div class="alert alert-block alert-success">
<b>Question (1pt):</b> What is the optimal weight and bias combination? 
</div>

<div class="alert alert-block alert-success">
<b>Your Answer:</b> 
    <ul>
        <li> weight ≈ -0.2 to -0.1</li>
        <li> bias ≈ 1.4 to 1.5</li>
        <li> loss ≈ 0.10</li>
    </ul>
</div>
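
A slider reading like this can be cross-checked against the closed-form least-squares fit. The sketch below assumes NumPy and uses `np.polyfit` for a degree-1 fit on the three task points; it is a verification aid, not part of the lab's required solution. On these points it gives roughly $w = -0.15$, $b \approx 1.43$ and $J \approx 0.094$, which lies between the neighbouring slider positions.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # task points, x-values
y = np.array([1.5, 0.7, 1.2])  # task points, y-values

# Degree-1 least-squares fit; np.polyfit returns [slope, intercept]
w_opt, b_opt = np.polyfit(x, y, deg=1)

# Mean squared error of the fitted line, equation (3)
j_opt = np.mean((w_opt * x + b_opt - y) ** 2)

print(f"w = {w_opt:.3f}, b = {b_opt:.3f}, J = {j_opt:.4f}")
```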

### 1.3.2 3D Visualization of Loss Surface

We can also visualize the incurred loss $J$ based on our parameters in 3D space:
- The **x-axis** will represent the weight.
- The **y-axis** will represent the bias.
- The **z-axis** (height) represents $\log_{10}$ of the loss (for easier visibility).
- A movable **black sphere** shows the loss for the current $(w, b)$.

This gives us a “surface” whose minimum is the bottom of a valley. That point can be thought of as the **optimum** weight and bias combination: it minimizes the loss and therefore fits the data best in the least-squares sense.
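
Such a surface can be computed for a whole grid of weights and biases at once with NumPy broadcasting. The sketch below is an assumption-laden illustration (grid limits and step chosen to resemble the plot ranges, variable names hypothetical) rather than the plotting code used in this notebook.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 0.7, 1.2])

weights = np.linspace(-2.5, 2.5, 51)  # grid along the weight axis (step 0.1)
biases = np.linspace(-2.5, 2.5, 51)   # grid along the bias axis (step 0.1)

# Shapes (1, 51, 1) and (51, 1, 1) broadcast against x of shape (3,), so
# `pred` has shape (51, 51, 3): one prediction per (bias, weight, sample).
pred = weights[None, :, None] * x + biases[:, None, None]
mse_grid = np.mean((pred - y) ** 2, axis=-1)  # shape (51, 51)
log_mse_grid = np.log10(mse_grid)             # height of the plotted surface

# Grid cell with the lowest loss: row index -> bias, column index -> weight
i_b, i_w = np.unravel_index(np.argmin(mse_grid), mse_grid.shape)
best_w, best_b, best_j = weights[i_w], biases[i_b], mse_grid[i_b, i_w]
```

The lowest grid cell only approximates the true minimum, because the optimum generally falls between grid points.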

Below, you can interact with both a 3D surface plot of $\log(\text{MSE})$ and a 2D plot of the neuron’s line vs. data:

In [6]:
def log_mse(neuron: SimpleNeuron, points: Dict[str, List[float]]) -> np.ndarray:
    least_squares_loss = loss(neuron, points)
    return np.log10(least_squares_loss)

In [7]:
class Interactive3DPlot:
    def __init__(self, points: Dict[str, List[float]], ranges: Dict[str, Tuple[float, float]], width: int = 600, height: int = 600, draw_time: float = 0.1):
        self.idle = True
        self.points = points
        self.draw_time = draw_time
        self.threading = threading

        self.range_weights = np.arange(  # Array with all possible weight values in the given range
            ranges["x"][0], ranges["x"][1], 0.1
        )
        self.range_biases = np.arange(  # Array with all possible bias values in the given range
            ranges["y"][0], ranges["y"][1], 0.1
        )
        self.range_biases_t = self.range_biases[:, np.newaxis]  # Bias array transposed
        self.range_losses = []  # initialize z axis for 3D surface

        self.ball = go.Scatter3d(  # initialize ball
            x=[], y=[], z=[], hoverinfo="none", mode="markers", marker=dict(size=12, color="black")
        )

        self.layout = go.Layout(
            width=width,
            height=height,
            showlegend=False,
            autosize=False,
            margin=dict(t=0, l=0),
            scene=dict(
                xaxis=dict(title="Weight", range=ranges["x"], autorange=False, showticklabels=True),
                yaxis=dict(title="Bias", range=ranges["y"], autorange=False, showticklabels=True),
                zaxis=dict(title="Loss: log(MSE)", range=ranges["z"], autorange=True, showticklabels=False),
            ),
        )

        self.data = [
            go.Surface(
                z=self.range_losses,
                x=self.range_weights,
                y=self.range_biases,
                colorscale="Viridis",
                opacity=0.9,
                showscale=False,
                hoverinfo="none",
            ),
            self.ball,
        ]

        self.plot = go.FigureWidget(self.data, self.layout)

    def register_neuron(self, neuron: SimpleNeuron):
        self.neuron = neuron
        self.calc_surface()

        # height of 3d surface represents loss of weight/bias combination
        # In the 2D plot, x is an array from e.g. -4 to +4. But the weights and biases only have a single value
        # Here x will be the points to do regression and to calculate the loss on. 
        # The surface is spanned by the arrays of weight and bias.
        
    def calc_surface(self):  
                
        self.neuron.weight = (  #instead of 1 weight and 1 bias, let Neuron have an array of all weights and biases
            self.range_weights
        )
        self.neuron.bias = self.range_biases_t
        self.range_losses = log_mse(  # result: matrix of losses of all weight/bias combinations in the given range
            self.neuron, self.points
        )
        self.plot.data[0].z = self.range_losses

    def update(self):
        if self.idle:
            thread = threading.Thread(target=self.redraw)
            thread.start()

    def redraw(self):  # when updating, only the ball is redrawn
        self.idle = False
        time.sleep(self.draw_time)
        self.ball.x = [self.neuron.weight]
        self.ball.y = [self.neuron.bias]
        self.ball.z = [log_mse(self.neuron, self.points)]
        self.plot.data[1].x = self.ball.x
        self.plot.data[1].y = self.ball.y
        self.plot.data[1].z = self.ball.z
        self.idle = True

In [8]:
class DualPlot:
    def __init__(self, points: Dict[str, List[float]], ranges_3d: Dict[str, Tuple[float, float]], ranges_2d: Dict[str, Tuple[float, float]]):
        self.plot_3d = Interactive3DPlot(points, ranges_3d)
        self.plot_2d = Interactive2DPlot(points, ranges_2d, width=400, height=500, margin=dict(t=200, l=30))

    def register_neuron(self, neuron: SimpleNeuron):
        self.plot_3d.register_neuron(neuron)
        self.plot_2d.register_neuron(neuron)

    def update(self):
        self.plot_3d.update()
        self.plot_2d.update()

<div class="alert alert-block alert-success">
<b>Task:</b> Train the neuron
<ul>
<li> You are given the same set of 3 points and again one neuron to do a curve fit. Run the cell below.</li>
<li> <b>Change the weight and bias of the neuron using the sliders to minimize the loss.</b></li>
<li> <b>Observe all changes.</b></li>

</ul>

</div>

<div class="alert alert-block alert-info">
<b>Note:</b> You can turn the 3D-Plot by clicking on it and moving your cursor, but you have to stay inside the widget with your cursor. 

</div>

In [9]:
ranges_3d = dict(x=(-2.5, 2.5), y=(-2.5, 2.5), z=(-1, 2.5))  # set up ranges for the 3d plot
plot_task2 = DualPlot(points_linreg, ranges_3d, ranges_linreg)  # create a DualPlot object to manage plotting on two plots
neuron_task2 = SimpleNeuron(plot_task2)  # create a new neuron for this task

interact(
    neuron_task2.set_values,
    weight=FloatSlider(min=-2, max=2, step=0.2, layout=slider_layout),
    bias=FloatSlider(min=-2, max=2, step=0.2, layout=slider_layout),
)

HBox((plot_task2.plot_3d.plot, plot_task2.plot_2d.plot))


<div class="alert alert-block alert-success">
<b>Question (2 pts):</b> In general, what does the optimal weight and bias combination correspond to in the 3D plot? And what is the steepness at this point?
</div>

<div class="alert alert-block alert-success">
<b>Your Answer:</b> The best combination for the smallest loss is the global minimum. At this point we reach the smallest value of the loss function and are therefore closest to the actual values. The steepness at this point is zero: it is exactly the point at which the gradient in both directions (weight and bias) is zero.
</div>

## 1.4 Activation Functions

So far, our neuron was purely linear. To capture more complex relationships, we need **non-linear activation functions**. Without them, a deep network reduces to a **single-layer linear model**, because a composition of linear functions remains linear. This severely limits the network's ability to handle **non-linear problems**, as noted in *Deep Learning* ([Goodfellow et al.](https://www.deeplearningbook.org/), 2016, p. 168).
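
The collapse of stacked linear neurons can be verified numerically. The sketch below assumes NumPy and arbitrary example parameters: two linear neurons in sequence equal one linear neuron with $w = w_2 w_1$ and $b = w_2 b_1 + b_2$, whereas a non-linearity between them breaks this equivalence.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 7)

# Two stacked linear neurons (arbitrary example parameters)
w1, b1 = 2.0, -1.0
w2, b2 = -0.5, 3.0
stacked = w2 * (w1 * x + b1) + b2

# The same function written as ONE linear neuron
w_eq, b_eq = w2 * w1, w2 * b1 + b2
single = w_eq * x + b_eq

# Clipping the first neuron's output at zero breaks the equivalence
stacked_nonlinear = w2 * np.maximum(w1 * x + b1, 0.0) + b2
```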

One popular choice is the **Rectified Linear Unit (ReLU)**:
$$
  \phi_{\mathrm{ReLU}}(x) = \max(0, x).
\quad (4)
$$

This simple function outputs 0 for negative inputs and behaves linearly for positive inputs. This non-linearity allows networks to approximate more sophisticated functions than a pure linear model. 

Below, we define `relu` and then illustrate a `ReluNeuron` class that inherits from `SimpleNeuron` but applies $\max(0, x)$ to the output.

<center>
  <img src="images/single_neuron_relu.png" alt="Diagram of a neuron with ReLU activation" width="600"/>
</center>
<p style="text-align: center;"><em>Figure 5: Neuron with ReLU activation.</em></p>

In [10]:
def relu(input_val: np.ndarray) -> np.ndarray:
    return np.maximum(input_val, 0)

<div class="alert alert-block alert-success">
<b>Task:</b> Implement an artificial neuron with a ReLU activation function
<ul>
<li> Complete the artificial neuron code below by using the <code>relu</code> function from above to calculate its activation, as in Figure 5. </li>
<li>Take a look at the <a href="#simple_neuron">Simple Neuron Class</a> and update the <code>compute</code> method of <code>ReluNeuron</code> so that it applies the <code>relu</code> function just defined.</li>

</ul>
</div>

In [11]:
class ReluNeuron(SimpleNeuron): #inherit from SimpleNeuron class
    def compute(self, x: Union[float, np.ndarray]) -> Union[float, np.ndarray]:
        # STUDENT CODE HERE (1 pt)
        '''
        *) With super() we call the compute() method inherited from the parent class SimpleNeuron.
        *) compute() evaluates the linear relation for the given weight and bias; the input x is passed in as its parameter.
        *) To introduce non-linearity, we apply the activation function relu() to the output of compute().
        '''
        Ausgabe = super().compute(x)  # linear output w * x + b ("Ausgabe" = output)
        self.activation = relu(Ausgabe)
        # STUDENT CODE until HERE
        return self.activation

### 1.4.1 Task: Nonlinear Climate Control

Imagine you work at "ClimaTronics", and need to design an AI-based climate control system with the following requirements:
- **Climate control off** for temperatures under 25°C.
- **At 30°C**, it should reach **10%** of its cooling power.
- **Between 30°C and 40°C**, cooling power rises **quadratically** with temperature.
- **At 40°C**, cooling power is **100%** (the maximum).

These points form a non-linear curve as per Figure 6. We’ll attempt to approximate it with a **single ReLU neuron**.

<center>
  <img src="images/datasheet.png" alt="ClimaTronics target curve" width="750"/>
</center>
<p style="text-align: center;"><em>Figure 6: ClimaTronics target curve.</em></p>

In [12]:
points_climate = dict(x=[25.0, 27.5, 30.0, 32.5, 35, 37.5, 40.0], y=[0.0, 2.0, 10.0, 23.7, 43, 68.7, 100.0])

ranges_climate = dict(x=(-4, 45), y=(-4, 105))
climate_plot = Interactive2DPlot(points_climate, ranges_climate)
our_relu_neuron = ReluNeuron(climate_plot)

interact(
    our_relu_neuron.set_values,
    weight=FloatSlider(min=-10, max=10, step=0.1, value=0, layout=slider_layout),
    bias=FloatSlider(min=-200.0, max=200.0, step=1, value=0, layout=slider_layout),
)

climate_plot.plot


<div class="alert alert-block alert-success">
<b>Question (3 pts):</b> Answer the following questions in the answer block below and indicate which question your answer is referring to: <br>
    
1. When setting the bias to 0.00, how does changing the weight affect the output function? <br>
2. How does changing the bias affect the output function? <br>
3. When setting the weight to 1.00 and the bias to -10, at what temperature does the climate control start? <br>
4. When setting the weight to 1.00 and the bias to -20, at what temperature does the climate control start? <br>
5. When setting the weight to 2.00 and the bias to -20, at what temperature does the climate control start? <br>
6. What's the best weight/bias configuration that you could find? <br>
    
</div>

<div class="alert alert-block alert-success">
<b>Your Answer:</b> 
    
1. <br> The weight mainly determines the slope of the rising part. With a positive weight the output is zero for all x < 0; with a negative weight it is zero for all x > 0.
2. <br> The bias mainly shifts the "kink" along the x-axis (left or right).
3. <br> Here the kink is read off the plot at about 9.8 (analytically $-b/w = 10$), i.e. the climate control already starts at roughly 10°C.
4. <br> Here the kink is read off the plot at about 19.9 (analytically $-b/w = 20$), i.e. the climate control already starts at roughly 20°C.
5. <br> Here the kink is read off the plot at about 9.9 (analytically $-b/w = 10$), i.e. the climate control already starts at roughly 10°C.
6. <br> We want the configuration with the lowest loss. The best one we found is weight = 7.1, bias = -199, with a loss of 50.59.

</div>
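
The start temperatures asked about above can also be derived analytically: $\max(0, w x + b)$ becomes non-zero where $w x + b = 0$, i.e. at $x = -b/w$ for $w \neq 0$. The sketch below assumes NumPy; the helper names `relu_kink` and `relu_neuron` are hypothetical and not part of the lab code.

```python
import numpy as np


def relu_kink(weight, bias):
    """Input x at which weight * x + bias crosses zero (requires weight != 0)."""
    return -bias / weight


def relu_neuron(x, weight, bias):
    """Single ReLU neuron: max(0, weight * x + bias)."""
    return np.maximum(weight * x + bias, 0.0)


# The three (weight, bias) settings from questions 3 to 5
kinks = [relu_kink(w, b) for w, b in [(1.0, -10.0), (1.0, -20.0), (2.0, -20.0)]]

# For a positive weight the output is zero up to the kink and positive beyond it
below, above = relu_neuron(np.array([9.0, 11.0]), 1.0, -10.0)
```

Values read from the interactive plot can deviate slightly from these, since the curve is only drawn at sampled x-positions.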

## 1.5 Neural Networks

Our single neuron could not capture the desired quadratic scaling. To achieve more complex approximations, we can combine multiple neurons:
- Multiple **hidden neurons** allow for multiple “bends” or “segments” in the overall function.
- Stacking layers of neurons forms a *Multi-Layer Perceptron (MLP)*.

Below, we show an example network with **two ReLU neurons** in the hidden layer and **one output neuron** (which, in this example, is linear). We can choose the weights and biases freely, but the resulting function is more flexible than a single neuron.
<center>
  <img src="images/hidden_layer.png" alt="Diagram of a hidden layer with two neurons" width="700"/>
</center>
<p style="text-align: center;"><em>Figure 6: A neural network with one hidden layer (two neurons) and one linear output neuron.</em></p>

<div class="alert alert-block alert-info">
<b>Note:</b>

- For simplicity and reusability, we treat neural networks like individual neurons. Since a neuron is just a mathematical function, an entire network can also be represented as a single function, as shown in the activation calculation, without requiring explicit neuron objects.
-  With at least one hidden layer and a suitable non-linear activation function, a neural network can theoretically approximate any continuous function. More about this can be read in "Further Reading" at the end of the notebook.

</div>

### 1.5.1 Building a Simple Network

Below, we define a minimal Python class for a 2-neuron hidden layer plus 1-neuron output:
- Weights: $w_{i1}, w_{o1}, w_{i2}, w_{o2}$
- Biases: $b_1, b_2$
- The output (as per Figure 6) is:
$$
   \text{network\_output}(x) 
   = \mathrm{ReLU}(w_{i1} x + b_1) \cdot w_{o1}
   + \mathrm{ReLU}(w_{i2} x + b_2) \cdot w_{o2}.
$$

We connect it to the same interactive plotting scheme. Again, you will be able to move sliders for these **six parameters** to see how the curve changes.


In [13]:
class NeuralNetwork:
    def __init__(self, plot: Interactive2DPlot):
        self.plot = plot #I am assigned the following plot
        self.plot.register_neuron(self) #hey plot, remember me
        
    def set_config(self, w_i1: float, w_o1: float, b1: float, w_i2: float, w_o2: float, b2: float):
        self.w_i1 = w_i1
        self.w_o1 = w_o1
        self.b1 = b1
        self.w_i2 = w_i2
        self.w_o2 = w_o2
        self.b2 = b2
        self.show_config()
        self.plot.update()  # please redraw my output

    def show_config(self):
        print("w_i1:", self.w_i1, "\t| ", "w_o1:", self.w_o1,"\n")
        print("b1:", self.b1, "\t| ", "w_i2:", self.w_i2,"\n")
        print("w_o2:", self.w_o2, "\t| ", "b2:", self.b2,"\n")

    def compute(self, x: Union[float, np.ndarray]) -> Union[float, np.ndarray]:
        self.prediction = (relu(self.w_i1 * x + self.b1) * self.w_o1
                         + relu(self.w_i2 * x + self.b2) * self.w_o2)
        return self.prediction

### 1.5.2 Task: Nonlinear Climate Control with a Small Neural Network

This time, let’s approximate the same climate-control curve (25–40°C) with **two ReLU neurons** in the hidden layer:
- Each hidden neuron can provide one “bend” in the function.
- The output is a linear combination of those two ReLU outputs.


In [14]:
climate_plot_adv = Interactive2DPlot(points_climate, ranges_climate)
our_neural_net = NeuralNetwork(climate_plot_adv)

interact(
    our_neural_net.set_config,
    w_i1=FloatSlider(min=-10, max=10, step=0.1, layout=slider_layout),
    w_o1=FloatSlider(min=-10, max=10, step=0.1,  layout=slider_layout),
    b1=FloatSlider(min=-200.0, max=200.0, step=1,  layout=slider_layout),
    w_i2=FloatSlider(min=-10, max=10, step=0.1, layout=slider_layout),
    w_o2=FloatSlider(min=-10, max=10, step=0.1,  layout=slider_layout),
    b2=FloatSlider(min=-200.0, max=200.0, step=1,layout=slider_layout),
)
climate_plot_adv.plot


<div class="alert alert-block alert-success">
<b>Question (1pt):</b> What is the best configuration you could find?
</div>

<div class="alert alert-block alert-success">
<b>Your Answer:</b> 
    <ul>
        <li> w_i1: 2.9 </li> 
        <li> w_o1: 1.9 </li> 
        <li> b1: -82 </li> 
        <li> w_i2: 4.4</li> 
        <li> w_o2: 1.3 </li>
        <li> b2: -151 </li> 
        <li> Loss: 2.45 </li>       
        <li> The output is 0 for all temperatures below about 28.2°C. Above that it rises linearly with the slope contributed by the first neuron ($w_{i1} \cdot w_{o1}$), and at about 34.3°C it bends again, where the second neuron ($w_{i2} \cdot w_{o2}$) adds to the slope. </li>
    </ul>
        
</div>
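
A configuration like the one reported above can be re-evaluated without the widgets. The sketch below assumes NumPy and a hypothetical standalone function `two_relu_network` that mirrors the network formula from Section 1.5.1; it recomputes the mean squared error on the climate data and the two positions where the hidden neurons switch on.

```python
import numpy as np


def two_relu_network(x, w_i1, w_o1, b1, w_i2, w_o2, b2):
    """Two hidden ReLU neurons feeding one linear output neuron without bias."""
    hidden_1 = np.maximum(w_i1 * x + b1, 0.0)
    hidden_2 = np.maximum(w_i2 * x + b2, 0.0)
    return hidden_1 * w_o1 + hidden_2 * w_o2


# Climate data points from the task (temperature in °C, cooling power in %)
temps = np.array([25.0, 27.5, 30.0, 32.5, 35.0, 37.5, 40.0])
power = np.array([0.0, 2.0, 10.0, 23.7, 43.0, 68.7, 100.0])

# Configuration taken from the answer above
config = dict(w_i1=2.9, w_o1=1.9, b1=-82.0, w_i2=4.4, w_o2=1.3, b2=-151.0)
mse = np.mean((two_relu_network(temps, **config) - power) ** 2)

# Each hidden neuron switches on where w_i * x + b = 0
kink_1 = -config["b1"] / config["w_i1"]
kink_2 = -config["b2"] / config["w_i2"]
```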

### 1.5.3 Conclusion
Using two ReLU neurons with individual biases better approximates a quadratic relationship than a single ReLU neuron, as it introduces two bends in the function. However, as network complexity grows, optimizing weights and biases becomes significantly harder.

## 1.6 Backpropagation

The examples above relied on manual parameter adjustments. Real-world neural networks can have thousands, millions, or even more parameters, which makes manual tuning infeasible. The solution is training with **backpropagation**, a procedure that automatically:
1. Performs **forward propagation**: passes inputs through the network to compute predictions.
2. Calculates the **loss** by comparing predictions to target values.
3. Uses the **chain rule** to compute **gradients** (partial derivatives of the loss) w.r.t. each weight and bias.
4. **Updates** each parameter in the direction that **reduces** the loss, typically through **gradient descent**.

### 1.6.1 Gradient Descent

A common gradient-based update rule, where at each pass the parameters get updated based on loss feedback, is:
$$
  \theta_{\text{new}} \;=\; \theta_{\text{old}} \;-\; \eta \cdot \frac{\partial J}{\partial \theta},
  \quad (5)
$$
where:
- $\theta$ represents a parameter (e.g., weight or bias),
- $\eta$ is the **learning rate**,
- $\partial J / \partial \theta$ is the partial derivative of the loss w.r.t. $\theta$.

- If $\eta$ is too large, updates might overshoot and the loss may explode or not converge.
- If $\eta$ is too small, training converges very slowly.
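
The effect of the learning rate can be seen on a toy problem. The sketch below applies update rule (5) to the assumed loss $J(\theta) = \theta^2$ (chosen only because its gradient $2\theta$ is trivial); the function name and the three learning rates are illustrative assumptions.

```python
def gradient_descent(theta, eta, steps):
    """Repeatedly apply update rule (5) to the toy loss J(theta) = theta**2."""
    for _ in range(steps):
        grad = 2.0 * theta          # dJ/dtheta for J = theta**2
        theta = theta - eta * grad  # equation (5)
    return theta


theta_good = gradient_descent(1.0, eta=0.1, steps=50)    # shrinks towards the minimum at 0
theta_slow = gradient_descent(1.0, eta=0.001, steps=50)  # barely moves in 50 steps
theta_large = gradient_descent(1.0, eta=1.1, steps=50)   # overshoots further on every step
```

Each step multiplies $\theta$ by $(1 - 2\eta)$, so the iteration converges only while $|1 - 2\eta| < 1$ for this particular toy loss.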

**Epoch**: One full pass over the training data. Since we must recalculate gradients after each step, deep learning typically uses efficient frameworks (like PyTorch, TensorFlow, etc.) to handle these calculations automatically.

<center>
  <img src="images/backprop.png" alt="Visualization of gradient-based optimization on a loss surface" width="600"/>
</center>
<p style="text-align: center;"><em>Figure 7: Gradient-based descent on a 3D loss surface.</em></p>

In [15]:
plot_backprop = DualPlot(points_linreg, ranges_3d, ranges_linreg)
trace_to_plot = go.Scatter3d(x=[], y=[], z=[], hoverinfo="none", mode="lines", line=dict(width=10, color="grey"))

plot_backprop.plot_3d.data.append(trace_to_plot)  # Expand 3D Plot to also plot traces
plot_backprop.plot_3d.plot = go.FigureWidget(plot_backprop.plot_3d.data, plot_backprop.plot_3d.layout)
plot_backprop.plot_3d.draw_time = 0


def redraw_with_traces(plot_to_update: Interactive2DPlot, neuron: SimpleNeuron, trace_list: Dict[str, List[float]], points: Dict[str, List[float]]):  # executed every update step
    plot_to_update.plot_3d.plot.data[2].x = trace_list["x"]
    plot_to_update.plot_3d.plot.data[2].y = trace_list["y"]
    plot_to_update.plot_3d.plot.data[2].z = trace_list["z"]
    plot_to_update.plot_3d.plot.data[1].x = [neuron.weight]
    plot_to_update.plot_3d.plot.data[1].y = [neuron.bias]
    plot_to_update.plot_3d.plot.data[1].z = [log_mse(neuron, points)]
    plot_to_update.update()


def add_traces(neuron: SimpleNeuron, points: Dict[str, List[float]], trace_list: Dict[str, List[float]]):  # executed every epoch
    trace_list["x"].extend([neuron.weight])
    trace_list["y"].extend([neuron.bias])
    trace_list["z"].extend([log_mse(neuron, points)])

### 1.6.2 Implementing Backpropagation for a Single Neuron

We return to a simpler scenario (a single neuron with no activation) to illustrate the idea:
1. We compute the **forward pass** ($\hat{y} = w \cdot x + b$).
2. We measure the **loss** using MSE.
3. We compute the **partial derivatives** (gradients) w.r.t. $w$ and $b$.
4. We **update** $w$ and $b$ with gradient descent.

The final code will:
- Plot the neuron’s movement in the 3D **loss surface** (weight vs. bias vs. log of MSE).
- Show how the neuron’s line changes in 2D over the data.


<div class="alert alert-block alert-success">
<b>Task:</b> Determine the Gradient <b>analytically!!</b>
<ul>
<li> <b>Finish the function below by yourself.</b></li>
<li> There are multiple solutions to this: your algorithm may adjust the weight and bias in the right direction even though the gradient calculation is wrong.</li>
<li> <b>Benchmark:</b> If you reach a loss of 0.22 after 100 epochs with a learning rate of 0.03, your solution is correct.
    </li>
</ul>
</div>

<div class="alert alert-block alert-info">
<b>Hint:</b>

- If you are having trouble figuring the gradient out, try calculating the gradient by hand to grasp the core idea/algorithm behind the update steps.
- Ask yourself: What are the components of the Loss-function? How does the Loss-function depend on the weight and bias variables, by which you have to differentiate?
- If you aren't satisfied with the explanation, you could look at [resources](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/) such as the one linked here, or other ones. Thankfully, there is a plethora of explanations in different languages available online.

</div>

In [16]:
def simple_neuron_loss_gradient(neuron: SimpleNeuron, points: Dict[str, List[float]]) -> Dict[str, float]:

    gradient_sum = dict(weight=0, bias=0) # contains the sum of the weight and bias gradient
    for point_x, point_y in zip(points["x"], points["y"]):  # for each point
        # Hint: point_x and point_y are the current point values
        # - point_x is the input value and point_y is the correct output value.
        # - We compute the prediction for each point.
        # - Then we compute the derivative w.r.t. weight and w.r.t. bias for this one point.
        # - The per-point loss is L = (correct value - prediction)^2.
        # - The prediction of the neuron is weight * input + bias; because this is linear,
        #   the gradient can be written down directly.
        # - Ideally the gradient becomes the zero vector, which means we are at a minimum of the loss.
        #   We sum the derivatives w.r.t. weight and bias over all points to see in which
        #   direction we have to move to get close to 0.
        ŷ = neuron.weight * point_x + neuron.bias  # prediction for this point

        gradient_sum["weight"] += ( # sum up the gradient for each point

            ### STUDENT CODE HERE (2 pts)
            -2 * (point_y - ŷ) * point_x
            ### STUDENT CODE until HERE
        )

        gradient_sum["bias"] += (
            ### STUDENT CODE HERE (2 pts)
            -2 * (point_y - ŷ)
            ### STUDENT CODE until HERE
        )

    gradient = dict(weight=gradient_sum["weight"] / len(points["x"]), bias=gradient_sum["bias"] / len(points["x"]))
    return gradient
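

For the loss $J = \frac{1}{N}\sum_i (y_i - \hat{y}_i)^2$ with $\hat{y}_i = w \cdot x_i + b$, the chain rule gives

$$
  \frac{\partial J}{\partial w} = \frac{1}{N}\sum_i -2\,(y_i - \hat{y}_i)\,x_i,
  \qquad
  \frac{\partial J}{\partial b} = \frac{1}{N}\sum_i -2\,(y_i - \hat{y}_i).
$$

One way to gain confidence in an analytic gradient is to compare it against central finite differences. The sketch below is self-contained and does not use `SimpleNeuron`; the helper names (`mse`, `analytic_gradient`, `numeric_gradient`) and the data in `toy_points` are assumptions made for this example.

```python
from typing import Dict, List


def mse(weight: float, bias: float, points: Dict[str, List[float]]) -> float:
    errors = [(y - (weight * x + bias)) ** 2 for x, y in zip(points["x"], points["y"])]
    return sum(errors) / len(errors)


def analytic_gradient(weight: float, bias: float, points: Dict[str, List[float]]) -> Dict[str, float]:
    n = len(points["x"])
    residuals = [y - (weight * x + bias) for x, y in zip(points["x"], points["y"])]
    return dict(
        weight=sum(-2 * r * x for r, x in zip(residuals, points["x"])) / n,
        bias=sum(-2 * r for r in residuals) / n,
    )


def numeric_gradient(weight: float, bias: float, points: Dict[str, List[float]], eps: float = 1e-6) -> Dict[str, float]:
    # Central differences: (J(theta + eps) - J(theta - eps)) / (2 * eps)
    return dict(
        weight=(mse(weight + eps, bias, points) - mse(weight - eps, bias, points)) / (2 * eps),
        bias=(mse(weight, bias + eps, points) - mse(weight, bias - eps, points)) / (2 * eps),
    )


toy_points = dict(x=[0.0, 1.0, 2.0, 3.0], y=[1.0, 2.9, 5.2, 7.1])  # hypothetical data
analytic = analytic_gradient(0.5, -1.0, toy_points)
numeric = numeric_gradient(0.5, -1.0, toy_points)
print(analytic)
print(numeric)
```

If both dictionaries agree to several decimal places, the analytic formulas are very likely correct.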


<div class="alert alert-block alert-success">
<b>Task:</b> Adjust the Neuron
<ul>

<li> After finding the gradient you have to adjust the weight and bias of the neuron, based on the partial derivatives and the learning rate. You have to verify your results by training the neural network in an upcoming code block.
<li> <b>Finish the function below by yourself.</b>
    </li>
</ul>
</div>

<div class="alert alert-block alert-info">
<b>Info:</b>
<ul>
    <li> This is an iterative function applied to each neuron once per epoch.</li>
    <li> Use the neuron's current weight and bias as a starting point and adjust them to improve the NN, as per equation (5).</li>
    <li> The entered learning rate scales the magnitude of the adjustment.</li>
    <li> Think about the direction of the loss gradient and the direction in which you want your loss to shift.</li>
</ul>
</div>

In [17]:
def adjust_neuron(neuron: SimpleNeuron, gradient: Dict[str, float], learning_rate: float):
    ### STUDENT CODE HERE (2 pts)
    '''
    - The previous block computed the gradient w.r.t. weight and bias; now we use it to adjust both values.
    - Adjusting means changing the values in a targeted way so that we approach the global minimum, using the rule below.
    - gradient is a dict that stores the gradients w.r.t. weight and bias, averaged over all points, e.g. gradient = {"weight": -2.5, "bias": 1.8}
    '''
    neuron.weight = neuron.weight - learning_rate * gradient['weight']
    neuron.bias = neuron.bias - learning_rate * gradient['bias']
    ### STUDENT CODE until HERE

### 1.6.3 Training Loop and Hyperparameters

Once backpropagation is implemented, we define a training loop that:
- Iterates for a chosen number of epochs.
- In each epoch:
  1. Calculates gradients via backpropagation.
  2. Updates weights and biases.
  3. (Optionally) logs or plots intermediate results.

**Hyperparameters** like **learning rate** (`learning_rate`) and **epochs** are crucial. 

- A **large** learning rate may lead to divergence (loss grows uncontrollably).
- A **small** learning rate can make convergence very slow.

Once you find a good balance, training converges toward the global minimum for this single-neuron problem, because the MSE of a linear neuron is a convex function of the weight and bias.
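
The loop described above can be sketched without the plotting helpers. The example below is an illustration under assumptions, not the notebook's `train` function: the data is synthetic, `train_linear_neuron` is a hypothetical name, and the result is compared against NumPy's closed-form least-squares fit (`np.polyfit`) to see where gradient descent ends up.

```python
import numpy as np

# Synthetic data (assumption): a noisy line y = 2x + 0.5
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.1, size=50)


def train_linear_neuron(x, y, epochs: int, learning_rate: float):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        residual = y - (w * x + b)              # forward pass and error
        grad_w = np.mean(-2.0 * residual * x)   # dJ/dw for the MSE loss
        grad_b = np.mean(-2.0 * residual)       # dJ/db for the MSE loss
        w -= learning_rate * grad_w             # equation (5)
        b -= learning_rate * grad_b
    return w, b


w, b = train_linear_neuron(x, y, epochs=2000, learning_rate=0.1)
w_ls, b_ls = np.polyfit(x, y, deg=1)  # closed-form least-squares line
print(f"gradient descent: w={w:.3f}, b={b:.3f}")
print(f"least squares:    w={w_ls:.3f}, b={b_ls:.3f}")
```

With enough epochs and a stable learning rate, both lines should print nearly identical parameters.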

In [18]:
# do not change
def train(neuron: SimpleNeuron, points: Dict[str, List[float]], epochs: int, learning_rate: float, redraw_step: int, trace_list: Dict[str, List[float]]):
    redraw_with_traces(neuron.plot, neuron, trace_list, points)
    for i in range(1, epochs + 1):  # first Epoch is Epoch no.1
        add_traces(neuron, points, trace_list)
        gradient = simple_neuron_loss_gradient(neuron, points)
        adjust_neuron(neuron, gradient, learning_rate)

        if i % redraw_step == 0:
            print("Epoch:{} \t".format(i), end="")
            redraw_with_traces(neuron.plot, neuron, trace_list, points)

<div class="alert alert-block alert-success">
<b>Task:</b> Choose Hyperparameters and Train
<ul>

<li> Choose an optimal learning rate and number of epochs by trying out values and running the two cells below</li>
<li> The default values required to verify your previous implementations are a learning rate of 0.03, 100 epochs with a redraw_step of 10.
    </li>

</ul>
</div>

In [19]:
learning_rate = 0.03 #keep this for benchmarking, change to play around
epochs = 100 # keep this for benchmarking, change to play around
redraw_step = 10 # update plot every n'th epoch. too slow? set this to a higher value (e.g. 100)

# these values are taken as parameters by the train function below

neuron_backprop = SimpleNeuron(plot_backprop)
HBox((plot_backprop.plot_3d.plot, plot_backprop.plot_2d.plot))


In [20]:
#run this cell to test algorithm
np.random.seed(4) # keep this for benchmarking, remove to play around

neuron_backprop.set_values(  # set weight and bias randomly
    (5 * np.random.random() - 2.5), (5 * np.random.random() - 2.5)
)
trace_list1 = dict(x=[], y=[], z=[])

train(neuron_backprop, points_linreg, epochs, learning_rate, redraw_step, trace_list1)

Loss: 18.45
Loss: 18.45
Epoch:10 	Loss: 0.56
Epoch:20 	Loss: 0.49
Epoch:30 	Loss: 0.44
Epoch:40 	Loss: 0.39
Epoch:50 	Loss: 0.35
Epoch:60 	Loss: 0.32
Epoch:70 	Loss: 0.29
Epoch:80 	Loss: 0.26
Epoch:90 	Loss: 0.24
Epoch:100 	Loss: 0.22


**Benchmark:** If you can reach a loss of 0.22 after 100 epochs and a learning rate of 0.03, your solution is correct

**Only answer this after your algorithm has hit the benchmark**

<div class="alert alert-block alert-success">
<b>Question (4 pts):</b> Answer the following questions in the answer block below and indicate which question your answer is referring to: <br>
    
1. What happens when you set the learning rate to 0.18? Explain this behavior. <br>
2. What happens when you set the learning rate to 0.182? Explain this behavior. <br> 
3. What is the best learning rate you could find? (In terms of lowest loss after 100 epochs. Anything better than the benchmark loss of 0.22, which is reached with lr=0.03, is correct.) <br>
    
</div>

<div class="alert alert-block alert-success">
<b>Your Answer:</b> 
    
1. The learning rate here is too large, so we effectively jump over the optimum and never reach it, because the steps we take when correcting the values are too big. <br>
2. Here the steps are even larger, which makes the training completely unstable. It even moves in the wrong direction (the loss increases). <br>
3. With a learning rate of 0.05 we reach a loss of 0.14 after 100 epochs. With a learning rate of 0.04 we reach a loss of 0.17 after 100 epochs. <br> 
Note to self: if the learning rate is too small, it takes forever to reach the optimum, and if it is too large we oscillate back and forth or end up somewhere else entirely. That is why one always has to find a middle value that is neither too large nor too small. <br> 
</div>
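
For a linear neuron with MSE loss, the loss surface is a quadratic bowl, and plain gradient descent stays stable only while the learning rate is below $2 / \lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the Hessian of the loss. The sketch below computes this threshold for a set of inputs. It is an illustration under assumptions: `max_stable_learning_rate` is a hypothetical helper and `x_demo` is made-up data, so the printed value is not the threshold for `points_linreg`.

```python
import numpy as np


def max_stable_learning_rate(x: np.ndarray) -> float:
    """Return 2 / lambda_max of the MSE Hessian for y_hat = w * x + b."""
    design = np.column_stack([x, np.ones_like(x)])  # columns: input, constant 1 for the bias
    hessian = 2.0 / len(x) * design.T @ design      # second derivatives of the MSE w.r.t. (w, b)
    return 2.0 / np.linalg.eigvalsh(hessian).max()


x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # hypothetical inputs, not the notebook's data
print(f"largest stable learning rate: {max_stable_learning_rate(x_demo):.4f}")
```

Just below the threshold the parameters oscillate around the optimum with slowly shrinking amplitude; just above it the oscillation grows and the loss increases. Whether the notebook's threshold lies between 0.18 and 0.182 depends on `points_linreg`, which is not reproduced here.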

## 1.7 Machine/Deep Learning Notation

- **Batch size**: The number of samples processed before updating parameters. Training may use:
  - **Stochastic Gradient Descent (SGD)**: Update after each individual sample.
  - **Mini-batch Gradient Descent**: Update after processing small batches (e.g., 32 samples).
  - **Full-batch Gradient Descent**: Update after processing the entire dataset once.
- **Epoch**: One full pass through the training data.
- **Regularization** (e.g., L1, L2, Dropout) helps avoid overfitting by penalizing large weights or temporarily “dropping” neurons.
- **Exploding/Vanishing Gradients** can occur in deep networks when gradients multiply across many layers. To reduce vanishing gradients, ReLU is often used instead of sigmoid or tanh in deeper architectures: its derivative is 1 for positive inputs, whereas the derivatives of sigmoid and tanh approach 0 for inputs of large magnitude. Modern deep networks may also include **skip (residual) connections** or other architectures to mitigate these issues.
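
The three batch-size variants differ only in how many samples feed each parameter update. The sketch below counts the updates per epoch for each variant; `iterate_batches` is a hypothetical helper written for this example, not part of the notebook or of any framework.

```python
import numpy as np


def iterate_batches(n_samples: int, batch_size: int, rng: np.random.Generator):
    """Yield index arrays that together cover one epoch in shuffled order."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]


rng = np.random.default_rng(0)
n_samples = 10
for name, batch_size in [("SGD", 1), ("mini-batch", 4), ("full-batch", n_samples)]:
    updates = sum(1 for _ in iterate_batches(n_samples, batch_size, rng))
    print(f"{name}: {updates} parameter updates per epoch")
```

With 10 samples this gives 10 updates for SGD, 3 for mini-batches of size 4 (the last batch holds only 2 samples), and 1 for full-batch gradient descent.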

***
**Further Reading**:  
1. [Hornik, K. (1991).](https://www.sciencedirect.com/science/article/pii/089360809190009T?via%3Dihub) *Approximation capabilities of multilayer feedforward networks*. **Neural Networks**, 4(2), 251–257.

   Demonstrates that feedforward networks with a single hidden layer and non-linear activations can approximate *any* continuous function on a compact set, given sufficiently many hidden neurons.
