# AI-ANNE: (A) (N)EURAL (N)ET FOR (E)XPLORATION
#### by Prof. Dr. habil. Dennis Klinkhammer (2025)

![title](cow_rabbit.png)
# What is a Neural Network?
A neural network consists of **neurons and layers** that process data via **activation functions**. Each neuron combines its inputs using weights and a bias; the result determines how strongly the neuron activates and what it passes on to the neurons in the next layer. AI-ANNE adjusts its weights and biases automatically during training in order to identify and process **non-linear patterns** within datasets. As a result, AI-ANNE is a self-learning neural network.
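
To make the role of weights, bias, and activation function concrete, the following minimal sketch computes the output of a single neuron. The function name `neuron` and all numeric values are hypothetical and chosen only for illustration; they are not part of AI-ANNE's trained parameters.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias (the pre-activation z)
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid activation squashes z into the range (0, 1)
    return 1 / (1 + math.exp(-z))

# Hypothetical example values (assumptions, not taken from the dataset)
example_inputs = [0.5, -1.0, 0.25, 2.0]
example_weights = [0.1, 0.4, -0.2, 0.3]
example_bias = -0.05

activation = neuron(example_inputs, example_weights, example_bias)
print(activation)
```

A layer is simply several such neurons that receive the same inputs but each have their own weights and bias.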

# Your Data
AI-ANNE will learn the **difference between cows and rabbits** by using **four features**. Features are individual pieces of information (e.g. variables) that the neural network uses to make predictions or decisions. Additionally, **one dependent variable** distinguishes cows (0) from rabbits (1).

In [1]:
# Features (X) and Dependent Variable (y)
X = [[ 0.81575475, -0.21746808, -0.12904165, -0.65303909],
         [ 0.05761837,  1.59476592,  0.84485761,  1.71304456],
         [ 0.96738203,  0.68864892, -0.00730424, -0.41643072],
         [ 2.02877297,  0.38660992,  2.06223168,  1.00321947],
         [ 1.42226386,  0.99068792,  1.33180724,  0.29339437],
         [ 0.81575475,  0.99068792,  1.21006983,  1.4764362 ],
         [-1.00377258,  0.38660992, -0.49425387, -0.41643072],
         [ 0.05761837, -0.51950708, -0.00730424,  0.29339437],
         [ 0.36087292,  0.38660992,  1.08833242,  1.23982783],
         [ 0.66412748,  0.38660992,  0.35790798,  1.4764362 ],
         [ 0.05761837,  0.08457092,  0.84485761,  0.29339437],
         [-0.70051802, -0.51950708,  0.23617057,  0.53000274],
         [ 0.20924564, -0.21746808,  0.84485761,  1.00321947],
         [-0.24563619,  0.08457092, -0.25077906, -0.65303909],
         [-2.06516352, -1.42562408, -1.95510276, -1.59947255],
         [-1.15539985, -1.42562408, -1.34641572, -1.36286418],
         [ 0.05761837, -1.12358508, -0.00730424, -0.41643072],
         [ 0.20924564,  0.08457092, -0.73772869, -0.88964745],
         [-0.39726347, -0.51950708,  0.23617057, -0.17982236],
         [ 0.5125002 ,  0.08457092, -0.37251647, -0.88964745]]

y = [0,1,0,1,1,1,0,1,1,1,1,1,1,0,0,0,0,0,0,0]

# Libraries and Activation Functions
The libraries **random** and **math** are imported to simplify the execution of some functions required for self-learning neural networks. Self-learning neural networks not only require **activation functions**, but also their derivatives. The derivative of a function represents its instantaneous rate of change at a specific point, which enables the training of neural networks.

In [8]:
# Libraries
import random
import math

# Sigmoid
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Derivative of Sigmoid (expects the sigmoid output, not the raw input)
def sigmoid_derivative(output):
    return output * (1 - output)

# ReLU
def relu(x):
    return max(0, x)

# Derivative of ReLU
def relu_derivative(output):
    return 1 if output > 0 else 0
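
As a hedged sanity check on the derivative described above, the sketch below compares `sigmoid_derivative` with a central finite-difference approximation. The helper `numerical_derivative` and the test points are assumptions added for illustration. Note that `sigmoid_derivative` is applied to the sigmoid output, so it is called as `sigmoid_derivative(sigmoid(x))`.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(output):
    # Expects the sigmoid output, not the raw input x
    return output * (1 - output)

def numerical_derivative(f, x, h=1e-5):
    # Central difference approximation of the slope of f at x
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical test points chosen for illustration
comparisons = []
for x in [-2.0, 0.0, 1.5]:
    analytic = sigmoid_derivative(sigmoid(x))
    numeric = numerical_derivative(sigmoid, x)
    comparisons.append((x, analytic, numeric))
    print(x, analytic, numeric)
```

The two columns should agree to several decimal places, which suggests that the analytic derivative matches the rate of change of the function.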

# Data Processing 
In neural networks, **forward propagation** is the process of passing input data through the network's layers to generate a prediction. **Backward propagation**, on the other hand, is the mechanism used to train the network: it calculates the error between the prediction and the actual output and then adjusts the network's weights and biases to minimize that error. This is what gives a neural network its ability to learn. Furthermore, a **loss function** quantifies the difference between a model's prediction and the actual outcome, essentially acting as a measure of the model's error. Cross-entropy, a specific type of loss function, is commonly used for classification problems, especially when the model outputs probabilities.

In [3]:
# Forward Propagation
def dense_forward(inputs, weights, biases, activation='relu'):
    outputs = []
    pre_activations = []
    for w, b in zip(weights, biases):
        z = sum(i*w_ij for i, w_ij in zip(inputs, w)) + b
        pre_activations.append(z)
        if activation == 'sigmoid':
            outputs.append(sigmoid(z))
        elif activation == 'relu':
            outputs.append(relu(z))
        else:
            raise ValueError("Unknown activation")
    return outputs, pre_activations

# Backward Propagation
def dense_backward(inputs, grad_outputs, outputs, pre_activations, weights, biases, activation='relu', lr=0.01):
    input_grads = [0.0 for _ in range(len(inputs))]
    for j in range(len(weights)):
        if activation == 'sigmoid':
            delta = grad_outputs[j] * sigmoid_derivative(outputs[j])
        elif activation == 'relu':
            delta = grad_outputs[j] * relu_derivative(pre_activations[j])
        else:
            raise ValueError("Unknown activation")
        for i in range(len(inputs)):
            input_grads[i] += weights[j][i] * delta
            weights[j][i] -= lr * delta * inputs[i]
        biases[j] -= lr * delta
    return input_grads

# Loss Function
def binary_cross_entropy(predicted, target):
    epsilon = 1e-7
    return - (target * math.log(predicted + epsilon) + (1 - target) * math.log(1 - predicted + epsilon))

def binary_cross_entropy_derivative(predicted, target):
    epsilon = 1e-7
    return -(target / (predicted + epsilon)) + (1 - target) / (1 - predicted + epsilon)
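
For the sigmoid output layer, backward propagation multiplies the cross-entropy derivative by the sigmoid derivative. The sketch below, which assumes a hypothetical pre-activation value of `0.8`, illustrates that this product is numerically very close to `predicted - target`. The function definitions are repeated from the cells above so that the example runs on its own.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(output):
    return output * (1 - output)

def binary_cross_entropy_derivative(predicted, target):
    epsilon = 1e-7
    return -(target / (predicted + epsilon)) + (1 - target) / (1 - predicted + epsilon)

z = 0.8  # hypothetical pre-activation of the output neuron
predicted = sigmoid(z)

deltas = {}
for target in (0, 1):
    # Chain rule: dLoss/dz = dLoss/dpredicted * dpredicted/dz
    delta = binary_cross_entropy_derivative(predicted, target) * sigmoid_derivative(predicted)
    deltas[target] = delta
    print(target, delta, predicted - target)
```

The small remaining difference between the two printed values comes from the `epsilon` term that guards against division by zero.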

# Random Initialization of Neural Network
Since the neural network is supposed to learn the weights and biases by itself, the layers and neurons of the neural network will be **initialized with random values**. The **architecture of the neural network** consists of four independent variables, which are forwarded to three neurons in the input layer and then to one neuron in the output layer. This is a very simple neural network that consists of four neurons in two layers with corresponding weights (w1 and w2) and biases (b1 and b2).

In [4]:
# Function for Initializing Weights and Biases
def init_layer(input_size, output_size):
    weights = [[random.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(output_size)]
    biases = [random.uniform(-0.5, 0.5) for _ in range(output_size)]
    return weights, biases

# Initialize Weights and Biases
w1, b1 = init_layer(4, 3)
w2, b2 = init_layer(3, 1)
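
To relate the architecture described above to the initialized values, the following sketch inspects the layer shapes and counts the trainable parameters. The helper `count_parameters` is a hypothetical addition for illustration; `init_layer` is repeated from the cell above so that the example is self-contained.

```python
import random

def init_layer(input_size, output_size):
    weights = [[random.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(output_size)]
    biases = [random.uniform(-0.5, 0.5) for _ in range(output_size)]
    return weights, biases

def count_parameters(weights, biases):
    # One weight per input for every neuron, plus one bias per neuron
    return sum(len(row) for row in weights) + len(biases)

w1, b1 = init_layer(4, 3)
w2, b2 = init_layer(3, 1)

# Each row of a weight matrix belongs to one neuron
print(len(w1), len(w1[0]), len(b1))  # 3 neurons, 4 weights each, 3 biases
print(len(w2), len(w2[0]), len(b2))  # 1 neuron, 3 weights, 1 bias

total_parameters = count_parameters(w1, b1) + count_parameters(w2, b2)
print(total_parameters)  # 19
```

The first layer contributes 4 x 3 weights and 3 biases, the second layer 3 x 1 weights and 1 bias, for 19 trainable values in total.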

# Learning Behavior
Finally, the number of **epochs and the learning rate** need to be specified in MicroPython. In neural networks, an epoch represents one complete pass of the entire training dataset through the model. The learning rate determines how much the model's weights are adjusted during each update step in the training process. Both are crucial hyperparameters that influence training and model performance.

In [5]:
# Epochs and Learning Rate for Training
epochs = 100
lr = 0.05

for epoch in range(epochs):
    total_loss = 0
    for xi, yi in zip(X, y):
        # Forward pass
        out1, pre1 = dense_forward(xi, w1, b1, 'relu')
        out2, pre2 = dense_forward(out1, w2, b2, 'sigmoid')
        loss = binary_cross_entropy(out2[0], yi)
        total_loss += loss

        # Backward pass
        dL_dout2 = [binary_cross_entropy_derivative(out2[0], yi)]
        dL_dout1 = dense_backward(out1, dL_dout2, out2, pre2, w2, b2, 'sigmoid', lr)
        _ = dense_backward(xi, dL_dout1, out1, pre1, w1, b1, 'relu', lr)

    if epoch % 10 == 0 or epoch == epochs - 1:
        print(f"Epoch {epoch+1}, Loss: {total_loss:.4f}")

Epoch 1, Loss: 14.7243
Epoch 11, Loss: 8.7675
Epoch 21, Loss: 2.6471
Epoch 31, Loss: 1.2069
Epoch 41, Loss: 0.7670
Epoch 51, Loss: 0.5462
Epoch 61, Loss: 0.4117
Epoch 71, Loss: 0.3231
Epoch 81, Loss: 0.2615
Epoch 91, Loss: 0.2168
Epoch 100, Loss: 0.1864
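
The influence of the learning rate can be illustrated with a deliberately simplified sketch. It does not use AI-ANNE's loss; instead it assumes a toy objective `(w - 3)**2` with a known minimum at `w = 3`, and the function `gradient_descent` as well as the three learning rates are hypothetical choices for illustration.

```python
def gradient_descent(lr, steps=100, start=0.0):
    # Minimize the toy objective (w - 3)**2, whose gradient is 2 * (w - 3)
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3.0)
        w -= lr * grad  # same update rule as for the weights: w -= lr * gradient
    return w

# Hypothetical learning rates: too small, reasonable, too large
results = {lr: gradient_descent(lr) for lr in (0.001, 0.05, 1.1)}
for lr, w in results.items():
    print(lr, w)
```

Under these assumptions, the smallest learning rate is still far from the minimum after 100 steps, the middle one ends up close to 3, and the largest one overshoots further with every step and diverges.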


# Predictions and Confusion Matrix
The **outcome of the neural network** can be predicted with the following code in MicroPython, and a **confusion matrix** can be used to evaluate the performance of the neural network.

In [6]:
def predict(x):
    out1, _ = dense_forward(x, w1, b1, 'relu')
    out2, _ = dense_forward(out1, w2, b2, 'sigmoid')
    return 1 if out2[0] > 0.5 else 0

ypred = [predict(xi) for xi in X]

def classification_report(ytrue, ypred):
    TP = TN = FP = FN = 0
    for true, pred in zip(ytrue, ypred):
        if true == pred:
            if true == 1:
                TP += 1
            else:
                TN += 1
        else:
            if true == 1:
                FN += 1
            else:
                FP += 1
    accuracy = (TP + TN) / len(ytrue)
    print("Accuracy: {:.3f}".format(accuracy))
    print("Confusion Matrix:")
    print("TN: {}, FP: {}".format(TN, FP))
    print("FN: {}, TP: {}".format(FN, TP))

# Solution
Finally, the **performance** of AI-ANNE can be inspected via the confusion matrix. A confusion matrix helps visualize the performance of a classification model by **comparing its predictions against the actual results**. It essentially breaks down the predictions into four categories: true positives (correctly predicted positive cases), true negatives (correctly predicted negative cases), false positives (incorrectly predicted positive cases), and false negatives (incorrectly predicted negative cases).


In [7]:
# Generate predictions
ypred = [predict(xi) for xi in X]

# Show classification metrics
classification_report(y, ypred)

Accuracy: 1.000
Confusion Matrix:
TN: 10, FP: 0
FN: 0, TP: 10
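
The four categories of the confusion matrix can be condensed into further metrics. The sketch below derives precision, recall, and the F1 score from the counts printed above; the function `precision_recall_f1` is a hypothetical helper and not part of the original notebook.

```python
def precision_recall_f1(TP, FP, FN):
    # Precision: share of predicted positives that are actually positive
    precision = TP / (TP + FP) if (TP + FP) > 0 else 0.0
    # Recall: share of actual positives that were found
    recall = TP / (TP + FN) if (TP + FN) > 0 else 0.0
    # F1: harmonic mean of precision and recall
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Counts taken from the confusion matrix output above
precision, recall, f1 = precision_recall_f1(TP=10, FP=0, FN=0)
print("Precision: {:.3f}".format(precision))  # Precision: 1.000
print("Recall: {:.3f}".format(recall))        # Recall: 1.000
print("F1: {:.3f}".format(f1))                # F1: 1.000
```

Because these counts come from the same 20 samples that were used for training, the values describe how well the network fits the training data.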
