# **Neural Network - Understanding Multilayer Perceptrons**

## **Introduction**

Multilayer Perceptrons (MLPs) are a foundational architecture in deep learning. An MLP is a type of Artificial Neural Network (ANN) organized in layers: an input layer, one or more hidden layers, and an output layer, with every neuron in one layer connected by a weight to every neuron in the next. Trained with the backpropagation algorithm, MLPs adjust these weights and their biases to model non-linear relationships in data for tasks such as classification, regression, and feature detection.

## **Operational Mechanics**

### The Feedforward Process
Input data enters an MLP at the input layer and passes layer by layer to the output layer; this is the feedforward process. Each neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function, such as sigmoid, ReLU, or tanh, which introduces the non-linearity needed to recognize complex patterns.
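
As a minimal sketch of the computation just described, the snippet below evaluates one fully connected layer on a single input. The helper name `dense_forward` and the numbers are illustrative assumptions, not part of the tutorial's later code.

```python
import numpy as np

def relu(z):
    """ReLU activation: negative values become zero."""
    return np.maximum(0.0, z)

def dense_forward(x, W, b, activation):
    """Compute activation(W @ x + b) for one fully connected layer."""
    z = W @ x + b  # weighted sum of the inputs plus the bias
    return activation(z)

x = np.array([[1.0], [2.0]])            # 2 inputs as a column vector
W = np.array([[0.5, -1.0],
              [1.0,  1.0],
              [-2.0, 0.5]])             # 3 neurons, 2 weights each
b = np.array([[0.0], [0.5], [1.0]])     # one bias per neuron

a = dense_forward(x, W, b, relu)
print(a.ravel())  # activations of the 3 neurons: 0.0, 3.5, 0.0
```

Stacking several such layers, each feeding its output to the next, gives the full forward pass.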

### The Backpropagation Algorithm
Backpropagation starts from the error between the network's predictions and the actual targets, as measured by a loss function. It applies the chain rule to compute the gradient of that loss with respect to every weight and bias, working backward from the output layer. Gradient descent then uses these gradients to adjust the parameters, iteratively reducing the error.
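
To make the idea concrete, here is a small sketch of gradient descent on a single sigmoid neuron with a binary cross-entropy loss. The function name `neuron_loss_and_grads`, the data, and the learning rate are assumptions chosen for illustration; a full MLP repeats the same chain-rule step layer by layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_loss_and_grads(w, b, x, y):
    """Binary cross-entropy loss of one sigmoid neuron and its gradients."""
    a = sigmoid(np.dot(w, x) + b)
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))
    dz = a - y                 # dLoss/dz for sigmoid + cross-entropy
    return loss, dz * x, dz    # loss, dLoss/dw, dLoss/db

w, b = np.array([0.1, -0.2]), 0.0
x, y = np.array([1.0, 2.0]), 1.0
learning_rate = 0.5

losses = []
for step in range(20):
    loss, dw, db = neuron_loss_and_grads(w, b, x, y)
    losses.append(loss)
    # Move each parameter a small step against its gradient
    w = w - learning_rate * dw
    b = b - learning_rate * db

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The loss shrinks as the steps accumulate, which is the behavior backpropagation and gradient descent aim for across the whole network.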

## **The Learning Cycle**

The learning cycle of an MLP is an iterative process encompassing initialization, forward passes for prediction, error calculation, and weight adjustment through backpropagation and gradient descent. This cycle repeats across multiple epochs or until the network's error rate stabilizes at a satisfactory level, signifying convergence.

## **Evaluating Performance**

After training, the MLP is evaluated on data it has not seen, typically a held-out test set, to measure how well its learning generalizes and how accurate and robust its predictions are.

## **Stochastic Gradient Descent (SGD) in Weight Update**

MLP weights are commonly updated with mini-batch SGD. In each epoch the training data is shuffled and partitioned into mini-batches; for each batch the network runs a forward pass and backpropagation, and the weights are then updated by the computed gradients scaled by a fixed learning rate. Repeating this progressively reduces the loss and moves the model toward convergence.
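
The shuffling and batching step can be sketched as a small generator. The name `iterate_minibatches` and the toy arrays are assumptions for illustration, and samples are assumed to be stored as rows.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs; samples are rows of X."""
    indices = rng.permutation(X.shape[0])  # new random order for this epoch
    for start in range(0, X.shape[0], batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X_demo = np.arange(20, dtype=float).reshape(10, 2)  # 10 samples, 2 features
y_demo = np.arange(10)

batch_sizes = [len(y_batch) for _, y_batch in
               iterate_minibatches(X_demo, y_demo, 4, rng)]
print(batch_sizes)  # the last batch is smaller when sizes do not divide evenly
```

In a training loop, each yielded batch would go through one forward pass, one backward pass, and one weight update.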

## **MLP: A Double-Edged Sword**

### Advantages

1. Versatility: MLPs are adept across a wide spectrum of tasks, from classification to regression and beyond.

2. Complex Data Modeling: They excel in capturing and modeling complex, non-linear relationships within data.

3. Feature Learning: MLPs can autonomously learn and extract relevant features from data.

4. Scalability: The architecture supports scaling up with more layers and neurons to handle increased complexity.

5. Framework Support: They enjoy robust support across major machine learning frameworks, facilitating ease of use.

### Challenges

1. Overfitting: MLPs can overfit to training data, especially when data is sparse or the architecture overly complex.

2. Hyperparameter Tuning: Achieving optimal performance requires careful tuning of numerous hyperparameters.

3. Computational Demand: Training deep MLPs can be resource-intensive and time-consuming.

4. Data Preprocessing: Effective training often necessitates significant preprocessing of input data.

5. Interpretability: Unraveling how MLPs make decisions can be complex, impacting model transparency.

## **Delving into the MNIST Dataset**

The MNIST dataset, a staple in machine learning, comprises 70,000 images of handwritten digits, divided into 60,000 training and 10,000 testing samples. Each 28x28 pixel image is a grayscale representation of digits 0 through 9, serving as a benchmark for assessing the performance of learning models.

### Acquiring the MNIST Dataset

We load MNIST through the Keras dataset helper bundled with TensorFlow, which downloads the data on first use and returns it already split into training and test sets. The cells below install TensorFlow, confirm the version, and load the dataset.

In [2]:
pip install tensorflow


In [3]:
import tensorflow as tf
print(tf.__version__)


2.16.1


In [4]:
from tensorflow import keras

# Loading the MNIST dataset
(train_X, train_y), (test_X, test_y) = keras.datasets.mnist.load_data()




### Preprocessing for Optimal Performance

Preparing MNIST for an MLP involves flattening each 28x28 image matrix into a 784-element vector and normalizing the pixel values from the range 0 to 255 into the range 0 to 1 by dividing by 255. Putting all inputs on the same small scale helps gradient-based training behave well.

In [5]:
# Flatten and normalize the images
train_X_flat = train_X.reshape(train_X.shape[0], -1) / 255.0
test_X_flat = test_X.reshape(test_X.shape[0], -1) / 255.0
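
The `forward_prop` and `compute_cost` functions later in this tutorial multiply weight matrices of shape `(layer_dims[l], layer_dims[l-1])` by the input, so they expect one sample per column and one-hot labels of shape `(10, m)`. The sketch below shows one way to produce that layout; the helper `one_hot` and the stand-in label array are assumptions, not part of the original notebook.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (num_classes, m) matrix with a 1 in the row of each label."""
    encoded = np.zeros((num_classes, labels.size))
    encoded[labels, np.arange(labels.size)] = 1.0
    return encoded

labels_demo = np.array([5, 0, 4, 1])   # stand-in for the first few train_y values
Y_demo = one_hot(labels_demo, 10)      # shape (10, 4)
print(Y_demo.shape)

# With the real data this would look like (not run here):
#   X_cols = train_X_flat.T            # shape (784, 60000)
#   Y_cols = one_hot(train_y, 10)      # shape (10, 60000)
```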


### Constructing the MLP Architecture

With a design aimed at flexibility, the MLP model is structured to adapt to varying layers and neuron counts. It encompasses an input layer, several hidden layers for intricate data representation learning, and an output layer for final classification tasks.
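
One simple way to express such a flexible architecture is a list of layer sizes. The sizes below (784 inputs, hidden layers of 128 and 64, 10 outputs) are hypothetical choices for MNIST, and `count_parameters` is an illustrative helper that shows how the list determines the number of weights and biases.

```python
# Hypothetical layer sizes: input, two hidden layers, output
layer_dims = [784, 128, 64, 10]

def count_parameters(layer_dims):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_dims[:-1], layer_dims[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

print(count_parameters(layer_dims))  # 109386
```

Changing the depth or width of the network then only requires editing the list.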

### Network Parameter Initialization

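Next we initialize the network parameters. Weights are drawn from a standard normal distribution and scaled by 0.01, so that neurons in the same layer start out different from one another, and biases start at zero. Initialization matters because it affects how quickly, and whether, the network converges to a good solution.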

In [7]:
import numpy as np

def initialize_parameters(layer_dims):
    """
    Initializes parameters (weights and biases) for a neural network based on given layer dimensions.

    Arguments:
    layer_dims -- list containing the dimensions of each layer in the network

    Returns:
    parameters -- dictionary containing the parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)  # Ensure consistent random initialization
    parameters = {}
    L = len(layer_dims)  # Number of layers, including the input layer

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        
    return parameters

### Implementing Forward Propagation

Forward propagation is a crucial step where we compute the activation of each neuron in the network. This process starts at the input layer and progresses through each layer in the network until the output is generated at the final layer.

In [9]:
def sigmoid(Z):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-Z))

def forward_prop(X, parameters):
    """Forward propagation for [LINEAR->SIGMOID] * (L-1) -> LINEAR -> SOFTMAX. X has shape (n_features, m), one sample per column."""
    caches = []
    A = X
    L = len(parameters) // 2  # Number of layers in the neural network
    
    # Implement [LINEAR -> SIGMOID]*(L-1).
    for l in range(1, L):
        A_prev = A 
        Z = np.dot(parameters['W' + str(l)], A_prev) + parameters['b' + str(l)]
        A = sigmoid(Z)
        caches.append((A_prev, parameters['W' + str(l)], parameters['b' + str(l)], Z))
        
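    # Implement LINEAR -> SOFTMAX.
    ZL = np.dot(parameters['W' + str(L)], A) + parameters['b' + str(L)]
    # Subtract the column-wise max before exponentiating to avoid overflow
    expZ = np.exp(ZL - np.max(ZL, axis=0, keepdims=True))
    AL = expZ / np.sum(expZ, axis=0, keepdims=True)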
    
    caches.append((A, parameters['W' + str(L)], parameters['b' + str(L)], ZL))
    return AL, caches


### The Essence of Backpropagation and Training

Backpropagation is where the real learning happens. It involves calculating the gradient of the loss function with respect to each weight and bias in the network by propagating the error backward through the network.

In [10]:
def compute_cost(AL, Y):
    """Compute the mean cross-entropy cost; Y is one-hot with shape (n_classes, m)."""
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL + 1e-12)) / m  # small epsilon guards against log(0)
    cost = np.squeeze(cost)  # Collapse the result to a scalar
    return cost

def update_parameters(parameters, grads, learning_rate):
    """Update parameters using gradient descent."""
    L = len(parameters) // 2  # Number of layers in the neural network
    
    for l in range(L):
        parameters["W" + str(l+1)] -= learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] -= learning_rate * grads["db" + str(l+1)]
    
    return parameters
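
The cell above covers the cost and the update rule; the gradients passed to `update_parameters` still have to be computed. The following is a minimal sketch of a backward pass, assuming sigmoid hidden layers, a softmax output with cross-entropy cost, and caches of the form `(A_prev, W, b, Z)` as built by `forward_prop`. The name `backward_prop` is an assumption, not a function from the original notebook.

```python
import numpy as np

def backward_prop(AL, Y, caches):
    """Gradients of the mean cross-entropy cost for a sigmoid/softmax MLP.

    AL     -- softmax output, shape (n_classes, m)
    Y      -- one-hot labels, shape (n_classes, m)
    caches -- list where caches[l-1] holds (A_prev, W, b, Z) for layer l
    Returns a dict with keys "dW1", "db1", ..., matching update_parameters.
    """
    grads = {}
    L = len(caches)
    m = Y.shape[1]

    # For softmax combined with cross-entropy, dCost/dZL simplifies to AL - Y
    dZ = AL - Y
    for l in range(L, 0, -1):
        A_prev, W, b, Z = caches[l - 1]
        grads["dW" + str(l)] = np.dot(dZ, A_prev.T) / m
        grads["db" + str(l)] = np.sum(dZ, axis=1, keepdims=True) / m
        if l > 1:
            # A_prev is the sigmoid output of layer l-1, so the sigmoid
            # derivative at that layer is A_prev * (1 - A_prev)
            dZ = np.dot(W.T, dZ) * A_prev * (1 - A_prev)
    return grads
```

A training step would then call `forward_prop`, `compute_cost`, this function, and `update_parameters` in that order.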


### Visualization and Empirical Evaluation

Visualizing predictions against actual labels can offer deep insights into the learning effectiveness and areas for improvement.

In [11]:
import matplotlib.pyplot as plt

def visualize_predictions(X, y, predictions):
    """Show the first 10 images (rows of X) with predicted and true labels."""
    plt.figure(figsize=(10, 4))
    for i in range(10):
        plt.subplot(2, 5, i+1)
        plt.imshow(X[i].reshape(28,28), cmap='gray')
        plt.title(f"Pred: {predictions[i]}, True: {y[i]}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()


### Synthesis and Prospective Outlook

This section walked through initializing network parameters, implementing forward propagation, computing the cost, updating parameters by gradient descent, and visualizing predictions, with MNIST as the working dataset.

Together, these pieces connect the operational mechanics of MLPs described earlier to concrete Python code, and they form the basis for the second example below.

## **Transforming Penguin Data for Neural Analysis**

We start by preparing the penguin dataset. Rows with missing values are dropped and the categorical columns are converted to integers with label encoding. The data is also split into training and test sets, and the features are standardized to a common scale. The species labels are integer-encoded here and must be one-hot encoded before training a network with one output unit per class.

In [17]:
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# Fetch and clean the dataset
penguins = sns.load_dataset('penguins').dropna()

# Convert categorical data
penguins[['species', 'island', 'sex']] = penguins[['species', 'island', 'sex']].apply(LabelEncoder().fit_transform)

# Define features and labels
X = penguins.drop('species', axis=1)
y = penguins['species']

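# Splitting (done first so the test set does not influence the scaler)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardization, using statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)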
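
The label transformation mentioned above is not shown in the cell, so here is one possible sketch. It assumes three species encoded as 0, 1, and 2 and samples stored as rows; the helper name `one_hot_rows` and the demo labels are illustrative.

```python
import numpy as np

def one_hot_rows(labels, num_classes):
    """Return an (n_samples, num_classes) indicator matrix."""
    labels = np.asarray(labels)
    return np.eye(num_classes)[labels]  # pick one identity row per label

y_demo = np.array([0, 2, 1, 0])         # stand-in for a few encoded species
Y_demo = one_hot_rows(y_demo, 3)
print(Y_demo)

# With the real split this would be, e.g. (not run here):
#   Y_train = one_hot_rows(y_train, 3)
```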


### Crafting a Neural Blueprint from Scratch

With the data ready, we build the network from scratch. Weights use He-style initialization, drawn from a normal distribution and scaled by sqrt(2 / n_in), where n_in is the number of inputs to the layer, and biases start at zero. Scaling by the layer's input size keeps the early activations from growing or shrinking too quickly.

In [18]:
import numpy as np

def initialize_network(layers):
    np.random.seed(42)
    weights = {}
    biases = {}

    for i in range(1, len(layers)):
        weights[i] = np.random.randn(layers[i-1], layers[i]) * np.sqrt(2. / layers[i-1])
        biases[i] = np.zeros((1, layers[i]))
    
    return weights, biases


### The Neural Symphony: Forward Motion and Learning

Forward propagation passes the input through each layer in turn, applying the weights, the biases, and a sigmoid activation. Backpropagation, where the learning takes place, then uses the output error to adjust those parameters.

In [19]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagate(X, weights, biases):
    activations = X
    for i in range(1, len(weights) + 1):  # keys start at 1, as in initialize_network
        z = np.dot(activations, weights[i]) + biases[i]
        activations = sigmoid(z)
    return activations


### The Training: A Dance of Weights and Biases

Training iterates over the data for a number of epochs, adjusting the parameters along the gradients of the loss. The loop below runs the forward pass for each sample and leaves the backward pass and the parameter updates as a placeholder.

In [20]:
def train_model(X_train, y_train, weights, biases, epochs, learning_rate):
    for epoch in range(epochs):
        for X, y in zip(X_train, y_train):
            # Forward pass on one sample, reshaped to a (1, n_features) row
            activations = forward_propagate(X.reshape(1, -1), weights, biases)
            # Backward pass and updates
            # Placeholder for backpropagation and updates
    print("Training complete.")
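
To show what could fill the placeholder, here is a hedged sketch of a complete training loop. It makes several assumptions that the original does not specify: full-batch gradient descent instead of per-sample updates, sigmoid activations on every layer including the output, a summed binary cross-entropy loss over one-hot targets, and synthetic demo data in place of the penguin features. The name `train_sketch` is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_sketch(X, Y, weights, biases, epochs, learning_rate):
    """Full-batch gradient descent for an all-sigmoid MLP.

    X: (n_samples, n_features); Y: one-hot, (n_samples, n_classes).
    weights/biases: dicts keyed 1..n_layers, shaped as in initialize_network.
    """
    n_layers = len(weights)
    m = X.shape[0]
    losses = []
    for epoch in range(epochs):
        # Forward pass, keeping every layer's activations
        acts = [X]
        for i in range(1, n_layers + 1):
            acts.append(sigmoid(acts[-1] @ weights[i] + biases[i]))
        out = np.clip(acts[-1], 1e-12, 1 - 1e-12)
        bce = Y * np.log(out) + (1 - Y) * np.log(1 - out)
        losses.append(-np.mean(np.sum(bce, axis=1)))

        # Backward pass: sigmoid output + binary cross-entropy gives A - Y
        delta = acts[-1] - Y
        for i in range(n_layers, 0, -1):
            grad_W = acts[i - 1].T @ delta / m
            grad_b = delta.sum(axis=0, keepdims=True) / m
            if i > 1:
                # Propagate the error through W before W is updated
                delta = (delta @ weights[i].T) * acts[i - 1] * (1 - acts[i - 1])
            weights[i] -= learning_rate * grad_W
            biases[i] -= learning_rate * grad_b
    return losses

# Synthetic 3-class data standing in for the scaled penguin features
rng = np.random.default_rng(0)
centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
labels = rng.integers(0, 3, size=90)
X_demo = centers[labels] + 0.3 * rng.normal(size=(90, 2))
Y_demo = np.eye(3)[labels]

sizes = [2, 8, 3]
weights = {i: rng.normal(size=(sizes[i - 1], sizes[i])) * np.sqrt(2.0 / sizes[i - 1])
           for i in range(1, len(sizes))}
biases = {i: np.zeros((1, sizes[i])) for i in range(1, len(sizes))}

losses = train_sketch(X_demo, Y_demo, weights, biases, epochs=200, learning_rate=0.5)
print(f"loss went from {losses[0]:.3f} to {losses[-1]:.3f}")
```

A softmax output with categorical cross-entropy would be a more conventional choice for three mutually exclusive species; the all-sigmoid version is used here only because it matches the forward pass defined earlier.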


### Evaluating Our Creation

After training, we evaluate the model on the test set. The predicted class for each sample is the output unit with the highest activation, and accuracy is the fraction of predictions that match the true labels.

In [21]:
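def evaluate_model(X_test, y_test, weights, biases):
    # Predict the class whose output unit has the highest activation
    outputs = forward_propagate(X_test, weights, biases)
    predictions = np.argmax(outputs, axis=1)
    accuracy = np.mean(predictions == np.asarray(y_test))
    print(f"Model Accuracy: {accuracy:.3f}")
    return accuracy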


### Epilogue

This walkthrough covered preprocessing the penguin dataset, building a small neural network from scratch, and outlining its training and evaluation. It highlights how the same operations, forward propagation, backpropagation, and gradient descent, carry over from image data to tabular data.