<a href="https://colab.research.google.com/github/hvbhanot/FNN/blob/main/FNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Feedforward Neural Network (FNN) Math Overview

This note provides the mathematical foundation required to understand the feedforward neural network (FNN) implemented in my code. I used the following article as a reference to design and build this FNN:

[Neural Networks from Scratch](https://developer.ibm.com/articles/neural-networks-from-scratch/)

## Key Mathematical Components

### 1. Rectified Linear Unit (ReLU)
The **ReLU** activation function introduces non-linearity into the model, allowing it to learn complex patterns. It is defined as:

$$
\text{ReLU}(x) = \max(0, x)
$$

**Purpose:**
- **Non-linearity:** Enables the network to capture complex relationships.
- **Gradient Flow:** Helps mitigate the vanishing gradient problem by allowing gradients to pass through for positive input values.

### 2. Softmax Function
For an input vector
$$
\mathbf{z} = [z_1, z_2, \dots, z_n] \in \mathbb{R}^n,
$$
the **softmax** function converts the raw scores (logits) into a probability distribution over the output classes. It is defined as:

$$
\text{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^n e^{z_j}} \quad \text{for } i = 1, 2, \dots, n.
$$

**Purpose:**
- **Probability Distribution:** Transforms outputs into probabilities that sum to 1.
- **Classification:** Commonly used in the output layer for multi-class classification tasks.

### 3. Additional Concepts
- **Weighted Sums:** Each neuron computes a weighted sum of its inputs.
- **Bias Addition:** A bias term is added to the weighted sum for flexibility.
- **Backpropagation & Gradient Descent:** These techniques optimize the network parameters by minimizing the loss between the predictions and the actual values.

---

This summary encapsulates the key mathematical concepts behind my FNN. The referenced article was instrumental in guiding the design and implementation of this network.


In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


In [None]:
from sklearn.datasets import load_iris

iris = load_iris()

df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

df['species'] = iris.target

df['species_name'] = df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

df = df.sample(frac=1).reset_index(drop=True)

df

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),species,species_name
0,4.8,3.1,1.6,0.2,0,setosa
1,5.8,4.0,1.2,0.2,0,setosa
2,6.4,2.8,5.6,2.2,2,virginica
3,4.4,2.9,1.4,0.2,0,setosa
4,6.1,3.0,4.9,1.8,2,virginica
...,...,...,...,...,...,...
145,6.1,2.8,4.7,1.2,1,versicolor
146,6.7,2.5,5.8,1.8,2,virginica
147,5.0,3.0,1.6,0.2,0,setosa
148,5.0,3.5,1.3,0.3,0,setosa


In [None]:
df = df.drop('species_name', axis=1)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   species            150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB


In [None]:
data = np.array(df)
m,n = data.shape
data

train_data = data[:100].T
test_data = data[100:150].T

Y_train = train_data[-1].astype(float)
X_train = train_data[1:n].astype(float)

Y_test = test_data[-1].astype(float)
X_test = test_data[1:n].astype(float)

X_train.shape

(4, 100)

In [272]:
def init_params():
    W1 = np.random.rand(10, 4)
    b1 = np.random.rand(10, 1)
    W2 = np.random.rand(10, 10)
    b2 = np.random.rand(10, 1)
    W3 = np.random.rand(3, 10)
    b3 = np.random.rand(3, 1)
    return W1, b1, W2, b2, W3, b3

def ReLU(Z):
    return np.maximum(Z, 0)

def softmax(Z):
    A = np.exp(Z) / sum(np.exp(Z))
    return A

def forward_prop(W1, b1, W2, b2, W3, b3, X):
    Z1 = np.dot(W1, X) + b1
    A1 = ReLU(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = ReLU(Z2)
    Z3 = np.dot(W3, A2) + b3
    A3 = softmax(Z3)
    return Z1, A1, Z2, A2, Z3, A3

def one_hot(Y, num_classes=3):
    Y = Y.astype(int)
    one_hot_Y = np.zeros((num_classes, Y.shape[0]), dtype=float)
    one_hot_Y[Y, np.arange(Y.shape[0])] = 1.0
    return one_hot_Y

def ReLU_deriv(Z):
    return (Z > 0).astype(float)


def back_prop(Z1, A1, Z2, A2, Z3, A3, W1, W2, W3, X, Y):
    m = Y.size
    one_hot_Y = one_hot(Y)

    # dZ3 for output layer
    dZ3 = A3 - one_hot_Y
    dW3 = 1 / m * np.dot(dZ3, A2.T)
    db3 = 1 / m * np.sum(dZ3, axis=1, keepdims=True)

    # dZ2 for hidden layer 2
    dZ2 = np.dot(W3.T, dZ3) * ReLU_deriv(Z2)
    dW2 = 1 / m * np.dot(dZ2, A1.T)
    db2 = 1 / m * np.sum(dZ2, axis=1, keepdims=True)

    # dZ1 for hidden layer 1
    dZ1 = np.dot(W2.T, dZ2) * ReLU_deriv(Z1)
    dW1 = 1 / m * np.dot(dZ1, X.T)
    db1 = 1 / m * np.sum(dZ1, axis=1, keepdims=True)

    return dW1, db1, dW2, db2, dW3, db3

def update_params(W1, b1, W2, b2, W3, b3, dW1, db1, dW2, db2, dW3, db3, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    W3 = W3 - alpha * dW3
    b3 = b3 - alpha * db3
    return W1, b1, W2, b2, W3, b3

In [273]:
def get_predictions(A3):
    return np.argmax(A3, axis=0)

def get_accuracy(predictions, Y):
    return np.mean(predictions == Y) * 100

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2, W3, b3 = init_params()

    for i in range(iterations):
        np.random.seed(1)
        Z1, A1, Z2, A2, Z3, A3 = forward_prop(W1, b1, W2, b2, W3, b3, X)

        dW1, db1, dW2, db2, dW3, db3 = back_prop(Z1, A1, Z2, A2, Z3, A3, W1, W2, W3, X, Y)

        W1, b1, W2, b2, W3, b3 = update_params(W1, b1, W2, b2, W3, b3, dW1, db1, dW2, db2, dW3, db3, alpha)

        if i % 10 == 0:
            predictions = get_predictions(A3)
            acc = get_accuracy(predictions, Y)
            print(f"Iteration: {i} - Accuracy: {acc:.2f}%")

    return W1, b1, W2, b2, W3, b3

In [274]:
np.random.seed(1)
W1,b1,W2,b2,W3,b3 = gradient_descent(X_train,Y_train,0.01,200)

Iteration: 0 - Accuracy: 30.00%
Iteration: 10 - Accuracy: 30.00%
Iteration: 20 - Accuracy: 30.00%
Iteration: 30 - Accuracy: 33.00%
Iteration: 40 - Accuracy: 63.00%
Iteration: 50 - Accuracy: 33.00%
Iteration: 60 - Accuracy: 63.00%
Iteration: 70 - Accuracy: 62.00%
Iteration: 80 - Accuracy: 63.00%
Iteration: 90 - Accuracy: 91.00%
Iteration: 100 - Accuracy: 63.00%
Iteration: 110 - Accuracy: 70.00%
Iteration: 120 - Accuracy: 63.00%
Iteration: 130 - Accuracy: 64.00%
Iteration: 140 - Accuracy: 63.00%
Iteration: 150 - Accuracy: 64.00%
Iteration: 160 - Accuracy: 65.00%
Iteration: 170 - Accuracy: 69.00%
Iteration: 180 - Accuracy: 79.00%
Iteration: 190 - Accuracy: 85.00%


In [275]:
def make_predictions(X, W1, b1, W2, b2, W3, b3):
    np.random.seed(1)
    _,_,_,_,_,A3 = forward_prop(W1,b1,W2,b2,W3,b3,X)
    predictions = get_predictions(A3)
    return predictions

predictions = make_predictions(X_test, W1, b1, W2, b2, W3, b3)
get_accuracy(predictions, Y_test)

90.0

In [None]:
predictions

array([1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 2, 2, 0, 1, 1, 2,
       1, 0, 1, 1, 2, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0, 1, 0, 1, 1, 1, 2,
       0, 1, 2, 0, 0, 1])