<a href="https://colab.research.google.com/github/kn9ck/MAT422/blob/master/HW_3_7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HW 3.7

## Mathematical Formulation
In a neural network, the goal is to transform inputs through a series of weights and biases to make predictions. Each layer in a neural network applies a linear transformation followed by an activation function to produce its output.

The output of a node in a layer $l$ can be calculated by taking the weighted sum of the inputs from the previous layer, adding a bias, and applying an activation function.

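$$\textbf{z}^l = \textbf{W}^l \textbf{a}^{l-1} + \textbf{b}^l, \qquad \textbf{a}^l = \sigma(\textbf{z}^l)$$
where:
* $\textbf{W}^l$ is the weight matrix for layer $l$
* $\textbf{a}^{l-1}$ is the output of the previous layer (for the first layer, the network input)
* $\textbf{b}^l$ is the bias vector for layer $l$
* $\textbf{z}^l$ is the pre-activation output, which is passed through the activation function $\sigma$ to give the layer output $\textbf{a}^l$.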
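
As a minimal sketch of the layer equation above, the following computes the pre-activation output and the activated output for a whole layer at once. The layer size (three nodes fed by two inputs), the numeric values, the choice of a sigmoid activation, and the name `layer_forward` are illustrative assumptions.

```python
import numpy as np

def layer_forward(W, a_prev, b):
    """Return (z, a) for one layer; the sigmoid activation is an assumption."""
    z = W @ a_prev + b            # weighted sum plus bias, one entry per node
    a = 1.0 / (1.0 + np.exp(-z))  # element-wise sigmoid
    return z, a

# Hypothetical layer: 3 nodes, each receiving the same 2 inputs
W = np.array([[0.8, 0.4],
              [-0.5, 0.3],
              [0.1, -0.7]])
a_prev = np.array([0.5, -0.2])
b = np.array([0.1, 0.0, -0.1])

z, a = layer_forward(W, a_prev, b)
print("z:", z)
print("a:", a)
```

Each row of `W` holds the weights of one node, so the matrix product evaluates every node's weighted sum in a single step.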



In [21]:
import numpy as np

# Define inputs and weights for a single neuron (one output node)
inputs = np.array([0.5, -0.2])  # x1 and x2
weights = np.array([0.8, 0.4])  # w1 and w2
bias = 0.1

# Pre-activation output: weighted sum of the inputs plus the bias
z = np.dot(weights, inputs) + bias
print("Weighted Sum (z):", z)

# Apply the activation function
output = 1 / (1 + np.exp(-z))  # sigmoid function
print("Output after activation:", output)


Weighted Sum (z): 0.42000000000000004
Output after activation: 0.6034832498647263


## Activation Functions
An **activation function** determines whether a neuron "fires" and thus contributes to the network's output. Different activation functions serve different purposes in neural networks.

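The activation function introduces non-linearity. Without it, any stack of layers would collapse into a single linear transformation, so the network could only model linear relationships between inputs and outputs.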

* **ReLU**: $\sigma (x) = \max(0,x)$, cheap to compute and non-saturating for positive inputs, which often makes training faster, especially in deeper networks.
* **Sigmoid**: $\sigma (x) = \frac{1}{1+e^{-x}}$, maps any real input to a value between 0 and 1, useful in binary classification.
* **Softmax**: $\sigma (\textbf{x})_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$, used in multi-class classification; it turns a vector of scores into probabilities that sum to 1.
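
To illustrate why the non-linearity matters, here is a small sketch with two hypothetical layers. All weights, biases, and the input are made-up values chosen for the example. Without an activation in between, the two layers reduce to one linear layer; with a ReLU in between, they do not.

```python
import numpy as np

# Two hypothetical layers (values chosen only for illustration)
W1 = np.array([[1.0, -1.0],
               [0.5, 2.0],
               [-1.0, 0.5]])
b1 = np.array([0.0, -1.0, 0.5])
W2 = np.array([[1.0, 1.0, 1.0]])
b2 = np.array([0.2])
x = np.array([1.0, 2.0])

# Stacking the layers with no activation function ...
two_layers = W2 @ (W1 @ x + b1) + b2

# ... gives the same result as one layer with combined weights and bias
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
one_layer = W_combined @ x + b_combined
print("Same output without activation:", np.allclose(two_layers, one_layer))

# A ReLU between the layers zeroes the negative hidden value,
# so the result no longer matches the single linear layer
with_relu = W2 @ np.maximum(0, W1 @ x + b1) + b2
print("Same output with ReLU:", np.allclose(with_relu, one_layer))
```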

In [22]:
#ReLU
def relu(x):
    return np.maximum(0, x)

#sigmoid
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

#softmax
def softmax(x):
    #subtract the maximum before exponentiating for numerical stability
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

z = np.array([-1, 0, 1])
print("ReLU:", relu(z))
print("Sigmoid:", sigmoid(z))
print("Softmax:", softmax(z))


ReLU: [0 0 1]
Sigmoid: [0.26894142 0.5        0.73105858]
Softmax: [0.09003057 0.24472847 0.66524096]


## Cost Function
The cost function measures how well the neural network's predictions match the true labels. It guides the learning process by quantifying the "error" between predictions and the actual outputs.

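* For regression tasks, a common cost function is the squared error, summed over the $N$ training examples and $K$ outputs: $$J = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K}(\hat{y}_k^{(n)}-{y}_k^{(n)})^2$$ Replacing the factor $\frac{1}{2}$ with a division by the number of terms gives the **Mean Squared Error** (MSE), which differs only by a constant factor and has the same minimizer.
* For classification tasks, particularly binary classification, **cross-entropy loss** is often used: $$J=-\sum_{n=1}^{N}(y^{(n)}\ln \hat{y}^{(n)}+(1-y^{(n)}) \ln (1- \hat{y}^{(n)}))$$ For multi-class problems with one-hot labels, this generalizes to $J=-\sum_{n=1}^{N}\sum_{k=1}^{K} y_k^{(n)} \ln \hat{y}_k^{(n)}$.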
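
The two formulas above can be sketched directly in NumPy. The function names `half_sse` and `binary_cross_entropy`, the clipping constant `eps`, and the sample labels and predictions are assumptions made for this example.

```python
import numpy as np

def half_sse(y_true, y_pred):
    """Half the sum of squared errors: J = 1/2 * sum (y_hat - y)^2."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy summed over samples.

    eps is an assumed safeguard that keeps predictions away from
    exactly 0 or 1, where the logarithm is undefined.
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Hypothetical binary labels and predicted probabilities for four samples
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.2, 0.6, 0.1])

print("Half SSE:", half_sse(y_true, y_pred))
print("Binary cross-entropy:", binary_cross_entropy(y_true, y_pred))
```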

In [23]:
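# Mean Squared Error: average of the squared differences
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Categorical cross-entropy for a one-hot y_true
# (assumes every entry of y_pred is strictly positive)
def cross_entropy(y_true, y_pred):
    return -np.sum(y_true * np.log(y_pred))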

# test with sample outputs
y_true = np.array([1, 0, 0])
y_pred = np.array([0.8, 0.1, 0.1])

print("MSE:", mse(y_true, y_pred))
print("Cross-Entropy:", cross_entropy(y_true, y_pred))


MSE: 0.019999999999999993
Cross-Entropy: 0.2231435513142097


## Backpropagation
**Backpropagation** computes the gradient of the cost function with respect to every weight and bias, that is, how much a small change in each parameter changes the cost. These gradients are then used to adjust the weights and biases so that the cost decreases.

The computation proceeds layer by layer, from the output back to the input. For each layer, we calculate the "error" (delta) at each node, which depends on the derivative of the activation function and on the error propagated back from the next layer (the one closer to the output).
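
As a sketch of the standard equations behind this description, consider a single training example with target $\textbf{y}$, layer outputs $\textbf{a}^l = \sigma(\textbf{z}^l)$, last layer $L$, and the squared-error cost $J = \frac{1}{2}\lVert \textbf{a}^L - \textbf{y} \rVert^2$. The symbols $\delta^l$ for the node errors and $\odot$ for the element-wise product are notational assumptions introduced here.

$$\delta^L = (\textbf{a}^L - \textbf{y}) \odot \sigma'(\textbf{z}^L), \qquad \delta^l = \left((\textbf{W}^{l+1})^T \delta^{l+1}\right) \odot \sigma'(\textbf{z}^l)$$

$$\frac{\partial J}{\partial \textbf{W}^l} = \delta^l (\textbf{a}^{l-1})^T, \qquad \frac{\partial J}{\partial \textbf{b}^l} = \delta^l$$

For the sigmoid, $\sigma'(z) = \sigma(z)(1-\sigma(z))$, so for a single sigmoid neuron with input $\textbf{x}$ and output $a$ these reduce to $\frac{\partial J}{\partial \textbf{w}} = (a - y)\, a (1 - a)\, \textbf{x}$ and $\frac{\partial J}{\partial b} = (a - y)\, a (1 - a)$.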

In [24]:
def backward_pass(weights, bias, inputs, output, target, learning_rate=0.01):
    # Gradient of the cost 0.5 * (output - target)**2 with respect to the output
    error = output - target

    # Gradient of the output with respect to z (derivative of the sigmoid)
    d_output_d_z = output * (1 - output)

    # Gradient of the cost with respect to z (chain rule)
    d_cost_d_z = error * d_output_d_z

    # Gradient descent step; build new values instead of modifying
    # the caller's weights array in place
    weights = weights - learning_rate * d_cost_d_z * inputs
    bias = bias - learning_rate * d_cost_d_z

    return weights, bias

inputs = np.array([0.5, -0.2])
weights = np.array([0.8, 0.4])
bias = 0.1
target = 1  # desired output

#forward pass
z = np.dot(weights, inputs) + bias
output = sigmoid(z)

#backward pass
new_weights, new_bias = backward_pass(weights, bias, inputs, output, target)
print("Updated weights:", new_weights)
print("Updated bias:", new_bias)



Updated weights: [0.80047441 0.39981023]
Updated bias: 0.10094882975699739
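
One way to gain confidence in a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below does this for the same single sigmoid neuron and input values, assuming the cost is $\frac{1}{2}(\text{output} - \text{target})^2$. The helper `cost` and the step size `h` are assumptions for this example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cost(weights, bias, inputs, target):
    """Squared-error cost 0.5 * (output - target)**2 for one sigmoid neuron."""
    output = sigmoid(np.dot(weights, inputs) + bias)
    return 0.5 * (output - target) ** 2

inputs = np.array([0.5, -0.2])
weights = np.array([0.8, 0.4])
bias = 0.1
target = 1

# Analytic gradient with respect to the weights (chain rule)
output = sigmoid(np.dot(weights, inputs) + bias)
d_cost_d_z = (output - target) * output * (1 - output)
analytic_grad = d_cost_d_z * inputs

# Numerical gradient via central differences (step size h is an assumption)
h = 1e-6
numeric_grad = np.zeros_like(weights)
for i in range(len(weights)):
    step = np.zeros_like(weights)
    step[i] = h
    numeric_grad[i] = (cost(weights + step, bias, inputs, target)
                       - cost(weights - step, bias, inputs, target)) / (2 * h)

print("Analytic gradient:", analytic_grad)
print("Numeric gradient: ", numeric_grad)
print("Match:", np.allclose(analytic_grad, numeric_grad, atol=1e-8))
```

If the two gradients disagree, the analytic derivation (or its implementation) most likely contains an error.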


## Backpropagation Algorithm
The **backpropagation algorithm** is the systematic application of the backpropagation process over multiple iterations (epochs). During each epoch, it uses gradient descent to adjust the weights and biases to minimize the cost function.

1. Initialize weights and biases randomly.
2. For each training input, calculate the network output and cost (forward pass).
3. Compute the gradients of the cost with respect to the weights and biases (backward pass).
4. Update the weights and biases using stochastic gradient descent.
5. Repeat the forward pass, backward pass, and update until reaching the desired accuracy.
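
The loop described above can be sketched with a record of the cost after each epoch, which makes it possible to see whether training is making progress. This is an illustration under stated assumptions: the AND-gate data, the fixed starting weights (used instead of random ones so the run repeats exactly), and the helper `total_cost` are all choices made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def total_cost(inputs, targets, weights, bias):
    """Half the sum of squared errors over the whole training set."""
    outputs = sigmoid(inputs @ weights + bias)
    return 0.5 * np.sum((outputs - targets) ** 2)

# Assumed training data: truth table of the logical AND function
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1], dtype=float)

# Initialization (fixed here for repeatability; normally random)
weights = np.array([0.5, 0.5])
bias = 0.1
lr = 0.01

cost_history = [total_cost(inputs, targets, weights, bias)]
for epoch in range(1000):
    for x, target in zip(inputs, targets):
        # Forward pass for one training input
        output = sigmoid(np.dot(weights, x) + bias)
        # Gradient of the cost with respect to z (chain rule)
        d_cost_d_z = (output - target) * output * (1 - output)
        # Stochastic gradient descent update
        weights = weights - lr * d_cost_d_z * x
        bias = bias - lr * d_cost_d_z
    cost_history.append(total_cost(inputs, targets, weights, bias))

print("Cost before training:", cost_history[0])
print("Cost after training: ", cost_history[-1])
```

Tracking the cost per epoch also gives a natural stopping rule: training can end once the cost stops decreasing noticeably.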

In [25]:
#training with backpropagation
def train_network(inputs, targets, weights, bias, epochs=1000, lr=0.01):
    for epoch in range(epochs):
        for x, target in zip(inputs, targets):
            #forward pass
            z = np.dot(weights, x) + bias
            output = sigmoid(z)

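            # compute gradients (chain rule through the sigmoid)
            error = output - target
            d_output_d_z = output * (1 - output)
            d_cost_d_z = error * d_output_d_z

            # stochastic gradient descent update after every sample
            # (not in place, so the caller's weights array is left unchanged)
            weights = weights - lr * d_cost_d_z * x
            bias = bias - lr * d_cost_d_z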

    return weights, bias

# Truth table of the logical AND function
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 0, 0, 1])

# Initial weights and bias
weights = np.random.rand(2)  # random initialization (unseeded, so results vary between runs)
bias = 0.1

#train
trained_weights, trained_bias = train_network(inputs, targets, weights, bias)
print("Trained weights:", trained_weights)
print("Trained bias:", trained_bias)


Trained weights: [0.71819519 0.84419674]
Trained bias: -1.4574899292589012
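
To check what the trained neuron has learned, one option is to run a forward pass on all four inputs and threshold the result. The sketch below copies the trained values from the output printed above. The 0.5 decision threshold is an assumption, and a different random initialization would give different numbers.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Values copied from the training output printed above
trained_weights = np.array([0.71819519, 0.84419674])
trained_bias = -1.4574899292589012

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 0, 0, 1])

# Forward pass for all four inputs at once
probabilities = sigmoid(inputs @ trained_weights + trained_bias)

# Turn probabilities into 0/1 predictions (the 0.5 threshold is an assumption)
predictions = (probabilities >= 0.5).astype(int)

print("Probabilities:", probabilities)
print("Predictions:  ", predictions)
print("All correct:  ", np.array_equal(predictions, targets))
```

With these values the output for the input `[1, 1]` is only slightly above 0.5, which suggests that more epochs or a larger learning rate would be needed for a more confident separation.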
