In [39]:
import numpy as np
# to install pandas properly you might eed to use this for your virtual environment: sudo python3 -m pip install pandas
import pandas as pd

# Towards Neural Networks

This jupyter notebook inspired by the ML with Python course held by Peter Kocmann as part of the ProScience Computer Science Research coure at Technical University Berlin during Summer 2023. 

In this coure we went through very basic operations to understand Machine Learning bottom-up without uing the sophistcated libraries. Further literature that approaches the topic of Machine Learning in a similar way can be found here:

Mathematische Algorithmen mit Python (V. Steinkamp, 2022)
Neuronale Netze programmieren (J. Steinwendner, R. Schwaiger, 2020)

The written and theoretical explanations are taken from Chat-GPT. 
The model we are working with is from the course.

## Some theoretical backgrounds

A while-program for adding two numbers is a computational algorithm that utilizes iterative loops and conditional statements to perform arithmetic operations. It follows a step-by-step procedure, updating variables and executing instructions until a specified condition is met. This type of algorithm is based on symbolic computation and explicit programming.

On the other hand, a neural network is a computational model inspired by the structure and function of biological neural networks. It consists of interconnected artificial neurons or nodes organized in layers. Neural networks learn patterns and relationships from data through a training process, adjusting the weights and biases of the network connections to optimize its performance.

The connection between a while-program and a neural network can be seen in the concept of universal approximation. It has been theoretically proven that certain types of neural networks, particularly those with a sufficient number of hidden layers and neurons, are capable of approximating any continuous function, including the addition operation performed by a while-program.

In this sense, a neural network can be trained to approximate the behavior of a while-program for adding two numbers. By providing input values to the network and adjusting its parameters during training, the network can learn to produce outputs that approximate the desired sum of the two numbers.

Although the implementation and underlying mechanisms of while-programs and neural networks differ significantly, their theoretical connection lies in the ability of neural networks to approximate the computational behavior of while-programs, including tasks like adding numbers, through their flexible and adaptive learning capabilities.

In [4]:
# Add-function without sum-operator in a simple while-programme
def sum_wo_plus_operator(x, y):
    counter = 0
    while y != 0:
        counter += 1
        #print(counter,  'Calculation-step:')
        #print('x is:', bin(x), 'y is:', bin(y))
        #print('Compare all bits with each other, if both are 1 then carry_bit is 2, else the carry bit is 0.')
        carry = x & y
        #print('The carry bits result in:', bin(carry))
        x = x ^ y
        #print('x : ', x, '=', bin(x), '\n')
        y = carry << 1
        #print('y : ', y, '=', bin(y), '\n')
    return x

result = sum_wo_plus_operator(5, 3)

In [5]:
# This is a hard coded way to calculate the sum of two numbers following a 
# conceptional truthtable, in ML this is NOT what we want to do ;)

def ai_sum(a, b):
    if a == 0: return b
    if b == 0: return a
    if a == 1: 
        if b == 2:
            return 2
        elif b == 2:
            return 3
    elif a == 2:
        if b == 1:
            return 3
        if b == 2:
            return 4
    elif a == 3:
        if b == 1:
            return 4
        if b == 2:
            return 5
        if b == 3:
            return 6

ai_sum(2,2)

4

### Our model

In [44]:
# The model we are training has only two outcomes (binary clasification): 0 or 1
# In real life the network would work with text, image or more complex numerical data

data = {"Input Vector": [np.array([1.66, 1.56]), np.array([2, 1.5])], "Target": [1, 0]}
model = pd.DataFrame(data)

# printing the model without the indexes of the rows
print(model.to_string(index=False))

Input Vector  Target
[1.66, 1.56]       1
  [2.0, 1.5]       0


### Towards a neural network: some basic linear regression operations

The following graph illustrates a simplified computation flow of a single-layer neural network. It demonstrates how inputs are combined with corresponding weights, passed through an activation function (sigmoid in this case), and produce a prediction. While this graph is a simplified representation, it captures the fundamental steps involved in neural network computations, such as weighted sums, activation functions, and output predictions.

**Blue**: functions
**Pink**: outputs.

**Input**: The graph starts with an input node, which represents the input data to the neural network. It could be a single input or a vector of inputs.

**Weights**: This node represents the parameters of the neural network. These weights determine the strength and impact of each input on the network's computation.

**Dot Product and it's result**: This computes the dot product between the input data and the corresponding weights. This step multiplies each input with its respective weight and sums them up. The result of the dot product represents the weighted sum of the inputs, which is an intermediate computation in the neural network. The dot product calculation is a crucial operation in a neural network, it serves the *aggregation of inputs* and *linear tranformation*. #TODO: Link to an explanation of the two italic points.

**Sum and Bias**: The bias term allows the neural network to introduce an offset or bias towards certain outputs.

**Sigmoid**: This is the activation function, which is applied to the first layer.  The sigmoid function squashes the output into a range between 0 and 1, enabling non-linear transformations and introducing non-linearities into the neural network. It is a classification function, where 0 is 'no' and 1 is 'yes'.

**Prediction**: The prediction can then be used for further explorations and calculation. 

```mermaid
graph LR

classDef blue fill:#2374f7,stroke:#000,stroke-width:2px,color:#fff
classDef pink fill:#eb3dd6,stroke:#000,stroke-width:2px,color:#fff

id1.1[input] --> id2[dot_product]:::blue
id1.2[weights] --> id2
id4.1 --> id5(layer1_result):::pink


    subgraph A
        id2 --> id3(dot_product_result):::pink
        id3 --> id4.1[sum]:::blue
        id4.2(bias):::blue --> id4.1
    end

    subgraph B
        id5 --> id6[sigmoid]:::blue
        id6 --> id7(prediction):::pink
    end
```

#TODO: Layer1 must actually be outside box a and b and in between those two

#TODO: add explanation for second chart

```mermaid
graph LR

id1(bias)
id2(sum)
id3(sigmoid)
id4(error)

id2 -- dlayer1_dbias --> id1
id3 -- dprediction_dlayer1--> id2
id4 -- derror_dprediction --> id3
id4 -- derror_dbias --> id1
```

In [9]:
######################################
# Functions
######################################

def calculate_dot_product(v1, v2, result=0):
    if len(v1) != len(v2):
        raise ValueError('The vectors must be of same length!')
    
    for i in range(len(v1)):
        result += v1[i] * v2[i]

    return result

def calculate_dot_product_numpy(v1, v2, result=[]):
    result = np.dot(v1, v2)
    result = np.asarray(result)
    return result

def calculate_first_layer(x, y, w_0, w_1, bias):
     temp0 = x * w_0
     temp1 = y * w_1
     first_layer = temp0 + temp1 + bias
     return first_layer

# sigmoid function results always between 0 and 1
def sigmoid(x):
     return 1/(1+np.exp(-1))

def predict(input_vector, weight, bias):
     # the following can also be written as np.dot()
    temp_1 = v[0] * weight[0]
    temp_2 = v[1] * weight[1]

    layer_1 = temp_1 + temp_2 + bias
    prediction = sigmoid(layer_1)
    return prediction

def mean_squared_error(Y_true, Y_pred):
    return np.square(np.subtract(Y_true,Y_pred)).mean()

def sigmoid_deriv(x):
    return sigmoid(x) * (1-sigmoid(x))

In [7]:
# Initializations for our neural network build
v = [1.72, 1.23]
w = np.array([1.45, -0.66])
b = 0.0

# Initializations
# TODO: initialize properly
target = 0
Y_true = 0

prediction = predict(v, w, b)
first_layer = calculate_first_layer(v[0], v[1], w[0], w[1], b)

# note, todo: put it at the right playe, first layer is the layer before the activation function
# sigmoid function is for categorization, aying 0 (no), and 1 (yes)

In [32]:
class Backpropagation: 
    def __init__(self):
        # initializing the values with None ensures that they are available 
        # as attributes on the Backpropagation class instance and can be 
        # assigned values as needed thoughout the execution 
        self.weights = None
        self.input_data = None
        self.target = None
        self.prediction = None
        self.first_layer = None
        self.bias = b
    
    # perform the forward pass in a neural network
    # it takes input_data and weight to calculate the prediction
    def forward_pass(self, input_data, weights, bias): 
        self.input_data = input_data 
        self.weights = weights
        self.first_layer = self.calculate_first_layer()
        self.prediction = self.sigmoid_1(self.first_layer)
    
    def calculate_first_layer(self):
        return calculate_first_layer
    
    def sigmoid_1(self, x):
        return sigmoid(x)
    
    def predict(self, input_vector, weight, bias):
        return predict(input_vector, weight, bias)
    
    def sigmoid_deriv(self, x):
        return sigmoid_deriv(x)

    def calculate_error_gradient(self, target):
        self.target = target
        derror_dprediction = 2 * (self.prediction - self.target)
        dprediction_dlayer1 = self.sigmoid_deriv(self.first_layer)
        dlayer1_dbias = 1  # Derivative of sum with respect to bias

        derror_dbias = derror_dprediction * dprediction_dlayer1 * dlayer1_dbias

        # Update bias
        learning_rate = 0.1
        self.bias -= learning_rate * derror_dbias

        # Update weights
        derror_dweights = derror_dprediction * dprediction_dlayer1 * self.input_data
        self.weights -= learning_rate * derror_dweights

        return derror_dprediction, self.weights, self.bias


# Example usage 

In [34]:
# Example usage
input_data = np.asarray(v)
weights = w
bias = 0.0

backprop = Backpropagation()
backprop.target = 0
backprop.forward_pass(input_data, weights, bias)
error_gradient, updated_weights, updated_bias = backprop.calculate_error_gradient(target=0)

print("Updated weights:", updated_weights)
print("Updated bias:", updated_bias)
print("Error gradient:", error_gradient)

Updated weights: [ 1.40055521 -0.69535877]
Updated bias: -0.028746968091443028
Error gradient: 1.4621171572600098
