https://www.kaggle.com/code/soham1024/basic-neural-network-from-scratch-in-python

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

sns.set(style='whitegrid')

plt.rcParams['figure.figsize'] = 12, 6

RANDOM_SEED = 42

np.random.seed(RANDOM_SEED)

In [2]:
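def softmax(x):
    """Compute softmax values for each set of scores (row) in x."""
    # Subtract the row-wise max for numerical stability
    e_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)
softmax(np.array([[2, 4, 6, 8]]))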

array([[0.00214401, 0.0158422 , 0.11705891, 0.86495488]])

In [3]:
epochs = 60000           # Number of iterations
inputLayerSize, hiddenLayerSize, outputLayerSize = 2, 3, 1
LR = 0.1                 # learning rate

In [4]:
# Our data: the XOR truth table (inputs X, targets y)
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([ [0],   [1],   [1],   [0]])

In [5]:
# Initialize the network weights uniformly at random in [0, 1)

w_hidden = np.random.uniform(size=(inputLayerSize, hiddenLayerSize))
w_output = np.random.uniform(size=(hiddenLayerSize,outputLayerSize))

In [6]:
# Implementation of the Backprop algorithm

def sigmoid(x): return 1 / (1 + np.exp(-x))           # activation function
def sigmoid_prime(s): return s * (1 - s)              # derivative of sigmoid, given its OUTPUT s = sigmoid(z)
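Because `sigmoid_prime` is written in terms of the sigmoid's output, using the identity $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, it must be called on an activation such as `act_hidden`, not on the pre-activation. The short check below is a sketch that is not part of the original notebook. It compares that formula against a central finite difference of `sigmoid`, and the step size `eps` and the test points are arbitrary choices:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(s):
    # s is the sigmoid OUTPUT, not the pre-activation z
    return s * (1 - s)

z = np.linspace(-5, 5, 11)
eps = 1e-6

# Central finite difference of sigmoid with respect to z
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# Analytic derivative, obtained by feeding the activation into sigmoid_prime
analytic = sigmoid_prime(sigmoid(z))

# Largest discrepancy; should be very close to 0
print(np.max(np.abs(numeric - analytic)))
```

Calling `sigmoid_prime(z)` directly on the pre-activation would give a different and incorrect result. This is why the training loop passes `act_hidden` to it.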

In [7]:
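for epoch in range(epochs):

    # Forward pass: sigmoid hidden layer, linear output layer (no bias terms)
    act_hidden = sigmoid(np.dot(X, w_hidden))
    output = np.dot(act_hidden, w_output)

    # Error between targets and predictions
    error = y - output

    if epoch % 5000 == 0:
        print(f'error sum {sum(error)}')

    # Backward pass: scale the error by the learning rate, update the output
    # weights, then backpropagate into the hidden layer.
    # Note: w_output is updated before dH is computed, so dH uses the freshly
    # updated output weights rather than the ones used in the forward pass.
    dZ = error * LR
    w_output += act_hidden.T.dot(dZ)
    dH = dZ.dot(w_output.T) * sigmoid_prime(act_hidden)
    w_hidden += X.T.dot(dH)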

error sum [-1.77496016]
error sum [0.02481767]
error sum [0.01544889]
error sum [0.01214582]
error sum [0.01032507]
error sum [0.00913186]
error sum [0.00827274]
error sum [0.00761624]
error sum [0.00709348]
error sum [0.00666447]
error sum [0.00630417]
error sum [0.00599601]


In [8]:
X_test = X[1] # [0, 1]

act_hidden = sigmoid(np.dot(X_test, w_hidden))
np.round(np.dot(act_hidden, w_output))

array([1.])
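The training loop above has no bias terms and updates `w_output` before using it to compute `dH`. Below is a minimal, self-contained sketch of the same XOR problem that adds biases and computes every gradient before any weight changes. It swaps in a sigmoid output with binary cross-entropy, whereas the original uses a linear output and the raw error. The seed, the hidden width of 8, and the learning rate are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

# XOR data, same as in the cells above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_hidden = 8      # assumption: wider than the 3 units above, for more reliable convergence
lr = 1.0          # assumption
epochs = 10000

# Weights and biases (the notebook's version has no biases)
W1 = rng.normal(size=(2, n_hidden))
b1 = np.zeros((1, n_hidden))
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros((1, 1))

losses = []
for epoch in range(epochs):
    # Forward pass
    h = sigmoid(X @ W1 + b1)          # hidden activations, shape (4, n_hidden)
    p = sigmoid(h @ W2 + b2)          # predicted probabilities, shape (4, 1)

    # Binary cross-entropy, averaged over the 4 examples (eps guards log(0))
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    losses.append(loss)

    # Backward pass: compute every gradient before touching any weight
    d_out = (p - y) / len(X)                  # dLoss/d(output pre-activation) for sigmoid + BCE
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * h * (1 - h)   # chain rule through the hidden sigmoid
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0, keepdims=True)

    # Gradient-descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

predictions = np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2))
print(predictions.ravel())
```

With these settings, the rounded predictions usually match `y` (0, 1, 1, 0). XOR training can still stall in a poor local minimum for an unlucky initialization, so checking `losses[-1]` is a quick sanity test.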

### Feedforward Neural Network

A Feedforward Neural Network (FNN) is a type of artificial neural network in which information flows in a single direction, i.e., from the input layer through the hidden layers to the output layer, without loops or feedback. FNNs are commonly used for pattern-recognition tasks such as image and speech classification.

For example, in a credit-scoring system, a bank might use an FNN that analyzes users' financial profiles, such as income, credit history and spending habits, to determine their creditworthiness.

**Theory:**
- https://www.geeksforgeeks.org/deep-learning/feedforward-neural-network/
- https://www.geeksforgeeks.org/deep-learning/deep-learning-tutorial/

#### Structure of a Feedforward Neural Network
Feedforward Neural Networks have a layered structure in which data flows sequentially through each layer.

1. **Input Layer:** The input layer consists of neurons that receive the input data. Each neuron in the input layer represents a feature of the input data.
2. **Hidden Layers:** One or more hidden layers are placed between the input and output layers. These layers are responsible for learning the complex patterns in the data. Each neuron in a hidden layer computes a weighted sum of its inputs and then applies a non-linear activation function.
3. **Output Layer:** The output layer provides the final output of the network. The number of neurons in this layer corresponds to the number of classes in a classification problem or the number of outputs in a regression problem.
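The layer-by-layer flow described above can be sketched in NumPy. This is a minimal illustration built on several assumptions: the layer sizes are hypothetical, hidden layers use ReLU, biases start at zero, and the output layer returns raw scores with no activation:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, layers):
    """Feed x through a list of (W, b) pairs: ReLU on hidden layers, raw scores at the output."""
    a = x
    for i, (W, b) in enumerate(layers):
        z = a @ W + b                                  # weighted sum of inputs plus bias
        a = relu(z) if i < len(layers) - 1 else z      # non-linearity only on hidden layers
    return a

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features -> 5 hidden neurons -> 3 output neurons
sizes = [4, 5, 3]
layers = [(rng.normal(size=(m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

batch = rng.normal(size=(2, 4))   # 2 examples, 4 features each
out = forward(batch, layers)
print(out.shape)  # (2, 3)
```

Each row of `out` holds one score per output neuron. For classification, these scores would typically be passed through a softmax.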


**Activation Functions**

Activation functions introduce non-linearity into the network, enabling it to learn and model complex data patterns. Without them, a stack of layers would collapse into a single linear transformation.

Common activation functions include:
- **Sigmoid:** $\sigma(x) = \frac{1}{1 + e^{-x}}$
- **Tanh:** $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
- **ReLU:** $\mathrm{ReLU}(x) = \max(0, x)$
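The three formulas translate directly into element-wise NumPy functions. This is a minimal sketch. The hand-written `tanh` mirrors the formula, but it can overflow for large |x|, so `np.tanh` is the safer choice in practice:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Direct transcription of the formula; np.tanh is numerically safer
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # values in (0, 1), with 0.5 at x = 0
print(tanh(x))      # values in (-1, 1), with 0 at x = 0
print(relu(x))      # [0. 0. 2.]
```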


#### Training a Feedforward Neural Network
Training a Feedforward Neural Network involves adjusting the weights of the neurons to minimize the error between the predicted output and the actual output. This process is typically performed using backpropagation and gradient descent.

1. **Forward Propagation:** During forward propagation, the input data passes through the network and the output is computed.
2. **Loss Calculation:** The loss (or error) is calculated using a loss function such as Mean Squared Error (MSE) for regression tasks or Cross-Entropy Loss for classification tasks.
3. **Backpropagation:** In backpropagation, the error is propagated back through the network to update the weights. The gradient of the loss function with respect to each weight is calculated, and the weights are adjusted using gradient descent.

**Note:** During training, a feedforward neural network performs a forward pass followed by backpropagation to update the weights, while during prediction only the forward pass is used.
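The two loss functions named in step 2 can each be written in a few lines of NumPy. This is a minimal sketch. `cross_entropy` assumes integer class labels and one row of class probabilities per example, such as the output of a softmax. The small `eps` guarding against `log(0)` is an assumed safeguard:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error for regression."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, probs, eps=1e-12):
    """Mean categorical cross-entropy; y_true holds integer labels, probs one row per example."""
    picked = probs[np.arange(len(y_true)), y_true]   # probability assigned to the correct class
    return -np.mean(np.log(picked + eps))

y_reg = np.array([1.0, 2.0, 3.0])
pred_reg = np.array([1.5, 2.0, 2.0])
print(mse(y_reg, pred_reg))          # (0.25 + 0 + 1) / 3 ≈ 0.4167

labels = np.array([0, 2])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
print(cross_entropy(labels, probs))  # -(ln 0.7 + ln 0.8) / 2 ≈ 0.2899
```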

#### Gradient Descent

Gradient Descent is an optimization algorithm used to minimize the loss function by iteratively updating the weights in the direction of the negative gradient. Common variants of gradient descent include:

- **Batch Gradient Descent:** Updates weights after computing the gradient over the entire dataset.
- **Stochastic Gradient Descent (SGD):** Updates weights for each training example individually.
- **Mini-batch Gradient Descent:** Updates weights after computing the gradient over a small batch of training examples.
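Below is a minimal sketch of mini-batch gradient descent on a toy linear-regression problem. The data, learning rate, and batch size are all assumptions for illustration. Setting `batch_size` to `len(X)` turns this into batch gradient descent, and setting it to `1` turns it into SGD:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = 3x + 2 plus a little noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 2 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0
lr, epochs, batch_size = 0.1, 100, 20   # batch_size=len(X) -> batch GD, batch_size=1 -> SGD

for epoch in range(epochs):
    order = rng.permutation(len(X))      # reshuffle the examples every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]
        err = (w * xb + b) - yb          # prediction minus target
        # Gradients of the mean squared error on this mini-batch
        grad_w = 2 * np.mean(err * xb)
        grad_b = 2 * np.mean(err)
        # Step in the direction of the negative gradient
        w -= lr * grad_w
        b -= lr * grad_b

print(w, b)  # should end up close to 3 and 2
```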

#### Evaluation of Feedforward neural network
Evaluating the performance of the trained model involves several metrics:

- **Accuracy:** The proportion of correctly classified instances out of the total instances.
- **Precision:** The ratio of true positive predictions to the total predicted positives.
- **Recall:** The ratio of true positive predictions to the actual positives.
- **F1 Score:** The harmonic mean of precision and recall, providing a balance between the two.
- **Confusion Matrix:** A table used to describe the performance of a classification model, showing the true positives, true negatives, false positives and false negatives.
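For binary labels, all of these metrics follow from the four confusion-matrix counts. Below is a minimal NumPy sketch with made-up labels. The helper `binary_metrics` is hypothetical and not a library function. scikit-learn's `sklearn.metrics` module provides tested implementations:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy, precision, recall, F1 and confusion matrix for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Rows = actual class (0, 1), columns = predicted class (0, 1)
    confusion = np.array([[tn, fp], [fn, tp]])
    return accuracy, precision, recall, f1, confusion

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, prec, rec, f1, cm = binary_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.75 0.75 0.75 0.75  (TP=3, TN=3, FP=1, FN=1)
print(cm)
```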

In [3]:
"""
    This code demonstrates the process of building, training and evaluating a neural network model using TensorFlow and Keras to classify handwritten digits from the MNIST dataset.
    The model architecture is defined using the Sequential consisting of:
        - a Flatten layer to convert the 2D image input into a 1D array
        - a Dense layer with 128 neurons and ReLU activation
        - a final Dense layer with 10 neurons and softmax activation to output probabilities for each digit class.

    Model is compiled with
        - Adam optimizer
        - Sparse Categorical Crossentropy loss function
        - Sparse Categorical Accuracy metric
        - Then trained for 5 epochs on the training data
"""


import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import SparseCategoricalAccuracy

# Load and prepare the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the model
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer=Adam(),
              loss=SparseCategoricalCrossentropy(),
              metrics=[SparseCategoricalAccuracy()])

# Train the model
model.fit(x_train, y_train, epochs=5)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'\nTest accuracy: {test_acc}')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

Test accuracy: 0.9779000282287598


In [3]:
# 3.4: Micro-lecture - Hyper Parameters

import tensorflow as tf
import numpy as np

In [None]:
# Prepare the MNIST dataset
batch_size = 500
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Flatten each 28x28 image into a 784-element vector
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))

In [5]:
# Reserve the last 10000 training samples for validation
x_val = x_train[-10000:]
y_val = y_train[-10000:]
x_train = x_train[:-10000]
y_train = y_train[:-10000]

In [6]:
# Prepare the training dataset as a shuffled, batched tf.data.Dataset
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)


In [10]:
# prepare the validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_dataset = val_dataset.batch(batch_size)

print(train_dataset.take(1))

<_TakeDataset element_spec=(TensorSpec(shape=(None, 784), dtype=tf.uint8, name=None), TensorSpec(shape=(None,), dtype=tf.uint8, name=None))>
