# We will implement three different neural network architectures:
## - Feedforward Neural Networks (FNNs),
## - Convolutional Neural Networks (CNNs),
## - Recurrent Neural Networks (RNNs).

# Step 1: Set up the environment
Before starting, ensure that we have TensorFlow installed. We can install it using the following command:

In [1]:
pip install tensorflow



Import the necessary libraries to build and train the neural networks:

In [2]:
import tensorflow as tf
from tensorflow.keras import layers, models

# Step 2: Implement a feedforward neural network (FNN)

We built a simple FNN to classify the **Iris flower** dataset.

## Step 2a: Load and prepare the data
We started by loading the Iris dataset, one-hot encoding the target labels, and splitting the data into training and testing sets.

In [3]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import numpy as np # Import numpy

# Load IRIS Flower dataset
iris = load_iris()
X = iris.data
y = iris.target.reshape(-1, 1)

**X** contains measurements of flowers (sepal length, petal width, etc.)

**y** contains class labels: 0 = Setosa, 1 = Versicolor, 2 = Virginica

**.reshape(-1, 1)** converts a 1D array into a 2D column vector, which is needed for encoding.

In [4]:
# One-hot encode labels
encoder = OneHotEncoder()
y = encoder.fit_transform(y)

Transforms **y** from the integer labels [0], [1], [2] into one-hot format:

0 → [1, 0, 0]

1 → [0, 1, 0]

2 → [0, 0, 1]

**fit_transform()** both learns the encoding and applies it.
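
To make the mapping above concrete, here is a minimal plain-Python sketch of one-hot encoding. The helper name `one_hot` is hypothetical and is not part of scikit-learn; it only illustrates the idea and assumes the labels are integers starting at 0.

```python
def one_hot(labels, num_classes):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    rows = []
    for label in labels:
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows


encoded = one_hot([0, 1, 2, 1], num_classes=3)
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

Unlike this sketch, `OneHotEncoder` learns the set of categories from the data during fitting, so the number of classes does not have to be passed in.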

In [5]:
# Convert the sparse matrix to a dense NumPy array
y = y.toarray()

The result from OneHotEncoder is a sparse matrix (efficient format for storage).

**.toarray()** converts it into a regular (dense) NumPy array so we can use it easily with libraries like TensorFlow/Keras.

In [6]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Splits the dataset into:

80% training data

20% testing data

**random_state=42** ensures reproducibility (the same split is produced on every run).
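
The effect of a fixed seed can be sketched in plain Python. This is only an illustration of a seeded shuffle-and-split: the helper `shuffle_split` is hypothetical, and it will not reproduce the exact rows that `train_test_split` selects.

```python
import random


def shuffle_split(n_samples, test_fraction, seed):
    """Shuffle the indices 0..n_samples-1 with a fixed seed, then split them."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)


train_idx, test_idx = shuffle_split(150, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 120 30

# Re-running with the same seed gives exactly the same split.
assert shuffle_split(150, 0.2, seed=42) == (train_idx, test_idx)
```

With the 150 Iris samples, an 80/20 split leaves 120 samples for training and 30 for testing.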

## Step 2b: Build the FNN
A simple FNN architecture was created with two hidden layers. ReLU activation functions were applied to introduce non-linearity.

In [7]:
# Build the FNN model
model_fnn = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax')  # 3 output classes
])


### What does this do?
- models.Sequential :
  Creates a Sequential model — a linear stack of layers (input → hidden layers → output).
- layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)) :

   Adds the first hidden layer:

  - Dense(64) → a fully connected layer with 64 neurons.

  - activation='relu' → uses the ReLU activation function, introducing non-linearity.

  - input_shape=(X_train.shape[1],) → sets the input size to match the number of features (columns) in your training data (X_train).

- layers.Dense(32, activation='relu') : Adds a second hidden layer with 32 neurons and ReLU activation.

- layers.Dense(3, activation='softmax') :

  Adds the output layer:

  - Dense(3) → 3 neurons for 3 output classes (multiclass classification).

  - activation='softmax' → converts output scores into probabilities (values between 0 and 1 that sum to 1).
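
As a rough check on the size of this network, the number of trainable parameters in each Dense layer can be worked out by hand: one weight per input-unit pair plus one bias per unit. The sketch below assumes the four Iris features as input; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 4 input features, followed by the three Dense layers of the model.
layer_sizes = [4, 64, 32, 3]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [320, 2080, 99] 2499
```

Calling `model_fnn.summary()` should report the same per-layer counts.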

## Step 2c: Compile and train the model
The model was compiled with the Adam optimizer and categorical crossentropy loss, then trained for 20 epochs.

In [8]:
# Compile the model
model_fnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

### What does this do?

- optimizer='adam'

  - Uses the Adam optimizer, a popular algorithm that adjusts learning rates automatically during training.

  - It’s fast and works well in most deep learning problems.

- loss='categorical_crossentropy'

  - This is the loss function used for multiclass classification problems when your labels are one-hot encoded (i.e., [1,0,0], [0,1,0], etc.).

   It measures how far off the predicted probabilities are from the actual class.

- metrics=['accuracy']

  - Tracks accuracy as the evaluation metric during training and validation.
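
To illustrate how the softmax output and the categorical crossentropy loss fit together, here is a minimal plain-Python sketch for a single sample. Both helper functions are written for illustration only; Keras computes the same quantities on batches of tensors, with extra numerical safeguards.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # shift for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_label, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs) if t > 0)


probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1, 0, 0], probs)
print(probs, loss)
```

The loss is small when the model puts a high probability on the true class and grows as that probability approaches zero.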

In [9]:
# Train the Model
model_fnn.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/20
4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 64ms/step - accuracy: 0.3748 - loss: 1.4120 - val_accuracy: 0.4000 - val_loss: 1.1287
Epoch 2/20
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.5854 - loss: 1.1563 - val_accuracy: 0.7000 - val_loss: 0.9759
Epoch 3/20
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.6702 - loss: 0.9833 - val_accuracy: 0.6333 - val_loss: 0.9676
Epoch 4/20
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.6846 - loss: 0.9424 - val_accuracy: 0.4000 - val_loss: 0.9608
Epoch 5/20
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.5656 - loss: 0.9204 - val_accuracy: 0.6333 - val_loss: 0.9052
Epoch 6/20
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.6979 - loss: 0.8866 - val_accuracy: 0.8333 - val_loss: 0.8392
Epoch 7/20
4/4 ━━━━━━━━━━━━━━━━━━...

<keras.src.callbacks.history.History at 0x7b36bf914550>

### What does this do?

- X_train, y_train

  - Your training data: features and labels.

- epochs=20

  - The model will go through the entire training dataset 20 times.

    More epochs = longer training, but possibly better accuracy (up to a point).

- batch_size=32

  - The training data is split into mini-batches of 32 samples at a time before updating the model.

     Helps with efficiency and model convergence.

- validation_data=(X_test, y_test)

  - After each epoch, the model is evaluated on test (validation) data.

     Helps track performance on unseen data to detect overfitting.

In simple terms, we are telling the model: use the Adam optimizer and track accuracy; train for 20 epochs, 32 samples at a time; and after each epoch, check how well it's doing on the test set.
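
The `4/4` in the training output is the number of mini-batches (weight updates) per epoch. A short sketch of that arithmetic, assuming the 120 training and 30 test samples produced by the 80/20 split of the 150 Iris samples (`steps_per_epoch` here is a hypothetical helper):

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of mini-batches needed to cover the data once."""
    return math.ceil(n_samples / batch_size)


print(steps_per_epoch(120, 32))  # 4 (three full batches of 32, one of 24)
print(steps_per_epoch(30, 32))   # 1 (the whole test set fits in one batch)
```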

## Step 2d: Model Evaluation

In [10]:
loss, accuracy = model_fnn.evaluate(X_test, y_test)
print(f'Test Accuracy: {accuracy}')

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - accuracy: 0.9333 - loss: 0.4980
Test Accuracy: 0.9333333373069763


### What does this do?

 model_fnn.evaluate(X_test, y_test):
This command evaluates the trained model on the test dataset.

It runs the model on X_test (features) and compares the predictions with y_test (true labels).

It returns two values:

1. loss: How far the predictions are from the true values (measured using the loss function defined earlier: categorical_crossentropy)

2. accuracy: The proportion of correct predictions out of total samples (since we specified metrics=['accuracy'] earlier)
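
The accuracy metric can be sketched in a few lines of plain Python: the predicted class is the index of the largest probability, and it is compared with the index of the 1 in the one-hot label. The helpers and the sample values below are made up for illustration.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(
        argmax(t) == argmax(p) for t, p in zip(y_true_one_hot, y_pred_probs)
    )
    return correct / len(y_true_one_hot)


y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.5, 0.2, 0.3], [0.6, 0.3, 0.1]]
print(accuracy(y_true, y_pred))  # 0.75 (the third sample is misclassified)
```

By the same arithmetic, the reported test accuracy of 0.9333 corresponds to 28 correct predictions out of the 30 test samples.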

## Conclusion:
The model correctly classified 93.3% of the test samples (28 of the 30 held-out flowers). This suggests that FNNs are well suited to simple classification tasks.

# Step 3: Implement a convolutional neural network (CNN)

We used a CNN to classify images from the **CIFAR-10 dataset**.

## Step 3a: Load and preprocess the data
Load the CIFAR-10 data.

In [11]:
# Load CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step


**tf.keras.datasets.cifar10** is a built-in dataset in TensorFlow.

It loads 60,000 color images, each of size 32×32 pixels, across 10 classes.

Splits the data into:

- train_images (50,000 images for training)

- test_images (10,000 images for testing)


Each image has an integer label from 0 to 9:

0 = airplane

1 = automobile

2 = bird

and so on, through 9 = truck.

Normalize the images to have pixel values between zero and one to facilitate efficient training.

In [12]:
# Normalize the data
train_images, test_images = train_images / 255.0, test_images / 255.0

Pixel values in images range from 0 to 255 (as integers).

Dividing by 255.0 scales all values to range between 0 and 1, which:

 - Improves training performance and model stability

 - Helps the neural network converge faster.
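
As a tiny worked example of this scaling, the sketch below normalizes a few sample pixel values in plain Python. The notebook applies the same division to whole NumPy arrays at once; `normalize_pixels` is a hypothetical helper used only for illustration.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0..255) into the range 0..1."""
    return [p / 255.0 for p in pixels]


print(normalize_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
```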

## Step 3b: Build the CNN
The CNN consisted of two convolutional layers followed by max-pooling layers, which reduce the spatial dimensions of the data.

In [13]:
# Build the CNN model
model_cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # 10 output classes
])


### What does this do?
 - model_cnn = models.Sequential :
  This creates a Sequential model, meaning layers are stacked one after another in a straight line.
 - layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)) :

   - Conv2D(32, (3, 3)): Applies 32 filters, each of size 3×3.

   - activation='relu': Adds non-linearity to learn complex patterns.

   - input_shape=(32, 32, 3): Input images are 32x32 pixels with 3 channels (RGB).

 Purpose: Detect basic features like edges or color patterns.

- layers.MaxPooling2D((2, 2)) :
    - First pooling layer. Downsamples the feature maps using max pooling with a 2×2 window.
    - This halves the spatial size (from the 30×30 output of the first convolution to 15×15; with the default padding='valid', a 3×3 convolution trims one pixel from each border), decreasing computation and helping prevent overfitting.
- layers.Conv2D(64, (3, 3), activation='relu') :
  - Second Convolutional Layer. Adds 64 filters of size 3×3.

  - Detects more complex patterns in the image (e.g., textures, shapes).
- layers.MaxPooling2D((2, 2)) :
  - Second Pooling layer. Downsamples again to reduce the feature map size further (13×13 after the second convolution, then 6×6 after pooling).
- layers.Flatten() :
  - Converts the 2D feature maps into a 1D vector so it can be passed into dense (fully connected) layers.

 Here, the 6×6×64 tensor becomes a 2,304-element vector.
- layers.Dense(64, activation='relu') :
    - Adds a fully connected layer with 64 neurons and ReLU activation.

  Learns complex combinations of features for classification.

- layers.Dense(10, activation='softmax') :
  - Final layer with 10 neurons, one for each class in CIFAR-10.

  - Softmax converts raw scores into probabilities that sum to 1.
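
The spatial size of the feature maps can be traced through the network by hand. The sketch below assumes Keras's defaults for these layers: padding='valid' and stride 1 for Conv2D, and non-overlapping 2×2 windows for MaxPooling2D. The helper names are hypothetical.

```python
def conv_valid(size, kernel):
    """Output width/height of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool(size, window):
    """Output width/height of non-overlapping pooling (any remainder is dropped)."""
    return size // window


size = 32                   # CIFAR-10 images are 32x32
size = conv_valid(size, 3)  # 30 after the first Conv2D
size = pool(size, 2)        # 15 after the first MaxPooling2D
size = conv_valid(size, 3)  # 13 after the second Conv2D
size = pool(size, 2)        # 6 after the second MaxPooling2D

flattened = size * size * 64  # 64 feature maps from the second Conv2D
print(size, flattened)  # 6 2304
```

So, under these defaults, the Flatten layer passes a 2,304-element vector to the first Dense layer, which `model_cnn.summary()` can be used to confirm.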

## Step 3c: Compile and train the model
The model was compiled with sparse categorical crossentropy (suitable for integer labels) and trained for ten epochs.

In [15]:
# Compile the model
model_cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### What does this do?

- optimizer='adam'

  Uses the Adam optimizer, a popular and efficient choice for training neural networks.

  It automatically adjusts learning rates during training for better convergence.

- loss='sparse_categorical_crossentropy'

 This is used for multiclass classification when your target labels are integers (like 0, 1, 2, ..., 9).

  Unlike categorical_crossentropy, this does not require one-hot encoding of the labels.

  Example:

    2 → means "class 2" (instead of [0, 0, 1, 0, ..., 0])

- metrics=['accuracy']

  Tracks accuracy during training and validation to evaluate how well the model is performing.
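
The two crossentropy variants compute the same quantity and differ only in how the label is supplied. A minimal single-sample sketch in plain Python (illustrative helpers, not the Keras implementations):

```python
import math


def sparse_categorical_crossentropy(label, probs):
    """Label is an integer class index."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot_label, probs):
    """Label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


probs = [0.1, 0.2, 0.7]
sparse_loss = sparse_categorical_crossentropy(2, probs)
one_hot_loss = categorical_crossentropy([0, 0, 1], probs)
print(sparse_loss, one_hot_loss)  # the two values agree
```

Because CIFAR-10 labels are already integers, the sparse variant avoids building a 10-column one-hot matrix for 50,000 training images.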

In [16]:
#train the model
model_cnn.fit(train_images, train_labels, epochs=10, batch_size=64, validation_data=(test_images, test_labels))

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 36s 44ms/step - accuracy: 0.3462 - loss: 1.7839 - val_accuracy: 0.5434 - val_loss: 1.2950
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 41s 45ms/step - accuracy: 0.5650 - loss: 1.2366 - val_accuracy: 0.6105 - val_loss: 1.1107
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 42s 47ms/step - accuracy: 0.6322 - loss: 1.0613 - val_accuracy: 0.6299 - val_loss: 1.0617
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 40s 45ms/step - accuracy: 0.6573 - loss: 0.9863 - val_accuracy: 0.6513 - val_loss: 1.0080
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 41s 46ms/step - accuracy: 0.6829 - loss: 0.9241 - val_accuracy: 0.6606 - val_loss: 0.9827
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 38s 49ms/step - accuracy: 0.7020 - loss: 0.8679 - val_accuracy: 0.6677 - val_loss: 0.9648
Epoch 7/10
7...

<keras.src.callbacks.history.History at 0x7b36bc628d50>

### What does this do?

- train_images, train_labels

  From training dataset (input images and their labels).

- epochs=10

  The model will pass through the entire training dataset 10 times.

- batch_size=64

  The training data is split into mini-batches of 64 images each.

  Each batch is processed before updating the model's weights.

- validation_data=(test_images, test_labels)

  After each epoch, the model is evaluated on the test set.

  This gives insight into how well the model performs on unseen data (to monitor overfitting).

## Step 3d: Model Evaluation
After training, a CNN of this size can be expected to reach roughly 70 percent accuracy on the test data; CIFAR-10 is a considerably more challenging dataset than Iris.

In [19]:
loss, accuracy = model_cnn.evaluate(test_images, test_labels)
print(f'Test Accuracy: {accuracy}')

313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.6967 - loss: 0.8983
Test Accuracy: 0.6927000284194946


### What does this do?

This runs the model on the test dataset.

It returns:

- loss: The model's loss value (error) on the test set, using the loss function defined during compile() — in this case, sparse categorical crossentropy.

- accuracy: The fraction of correctly classified images in the test set (0.6927 here, i.e. about 69 percent).

## Conclusion:
After training, the CNN achieves an accuracy of about 69 percent on the test data. This is lower than the FNN's result on Iris because CIFAR-10 is a much more challenging dataset.

# Step 4: Implement a recurrent neural network (RNN)

We built an RNN to predict the next value in a sine wave sequence, a classic example of time-series prediction.

## Step 4a: Create the data
A synthetic **sine wave dataset** was created and split into sequences for training and testing.

In [20]:
import numpy as np

# Generate synthetic sine wave data
t = np.linspace(0, 100, 10000)
X = np.sin(t).reshape(-1, 1)

# Prepare sequences
def create_sequences(data, seq_length):
    X_seq, y_seq = [], []
    for i in range(len(data) - seq_length):
        X_seq.append(data[i:i+seq_length])
        y_seq.append(data[i+seq_length])
    return np.array(X_seq), np.array(y_seq)

seq_length = 100
X_seq, y_seq = create_sequences(X, seq_length)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_seq, test_size=0.2, random_state=42)

### What does this do?

- Generate Synthetic Sine Wave Data
  - np.linspace(0, 100, 10000)

    Generates 10,000 evenly spaced time steps from 0 to 100.

    Think of this as our "time axis".

  - np.sin(t)

    Calculates the sine of each time value in t, creating a smooth wave.

  - .reshape(-1, 1)

    Converts the 1D array into a 2D column vector so it can be fed into a neural network later.

- Create Sequences for RNN Input:
  
  Turns the sine wave into sliding windows (sequences) of length seq_length (e.g., 100).

  For each sequence:

  X_seq = the first seq_length values (input)

  y_seq = the next value (target/prediction)


    Example:
    If data = [0.1, 0.2, 0.3, 0.4, ...] and seq_length = 3
    Then:
    Input X_seq[0] = [0.1, 0.2, 0.3]
    Target y_seq[0] = 0.4 (next value after the sequence)

  So, it's creating sequences of 100 time steps, and the model will predict the 101st value.

 - Train-Test Split:
   
   Splits the sequence data into:

    80% training

    20% testing

    random_state=42 ensures reproducibility of the split.

So, in summary, we created a long sine wave and chopped it into many overlapping sequences of 100 time steps.

Each sequence is used to predict the next value in the time series.

This prepares our data for training an RNN or LSTM model, which is well-suited for sequence data.
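
The windowing logic is easiest to see on a tiny input. The sketch below applies the same sliding-window idea to a five-element list in plain Python; `create_windows` is a hypothetical stand-in that mirrors the loop in `create_sequences` but returns lists instead of NumPy arrays.

```python
def create_windows(values, seq_length):
    """Pair each window of seq_length values with the value that follows it."""
    inputs, targets = [], []
    for i in range(len(values) - seq_length):
        inputs.append(values[i:i + seq_length])
        targets.append(values[i + seq_length])
    return inputs, targets


inputs, targets = create_windows([0.1, 0.2, 0.3, 0.4, 0.5], seq_length=3)
print(inputs)   # [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]]
print(targets)  # [0.4, 0.5]

# The number of windows is len(values) - seq_length.
print(10000 - 100)  # 9900 windows for the sine wave above
```

With 10,000 points and a window of 100, this yields 9,900 input/target pairs before the train/test split.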

## Step 4b: Build the RNN
A simple RNN architecture was implemented with one recurrent layer and a single output neuron for predicting the next value in the sequence.

In [21]:
# Build the RNN model
model_rnn = models.Sequential([
    layers.SimpleRNN(128, input_shape=(seq_length, 1)),
    layers.Dense(1)  # Single output for next value prediction
])


### What does this do?
It takes in a sequence of 100 sine wave values (as input) and predicts the next value in the sequence (as output).

- models.Sequential([...]) :
  Creates a Sequential model, meaning layers are stacked one after another in a linear fashion.

- layers.SimpleRNN(128, input_shape=(seq_length, 1)) :
  This is a Simple RNN layer with 128 units (neurons).

- input_shape=(seq_length, 1) means:

  - seq_length = 100 → the number of time steps per input sequence

  - 1 → each time step has a single value (sine wave point)

  This layer processes the entire sequence and outputs a single vector representing the temporal features it learned.

- layers.Dense(1) :
  A fully connected output layer with 1 neuron.

  Outputs a single numeric value — the predicted next value in the sine wave sequence.

    
    Example Flow:
    1. Input shape: (batch_size, 100, 1) — 100 time steps per sequence
    2. RNN learns temporal patterns (like wave cycles)
    3. Dense layer outputs 1 value — the next predicted point
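
To show what "processing the entire sequence" means, here is a minimal sketch of the recurrence a simple RNN applies, reduced to a single unit with scalar weights. The weights are made-up values for illustration; the sketch assumes the Keras defaults of a tanh activation and a zero initial state. It also counts the trainable parameters of a SimpleRNN layer by hand.

```python
import math


def rnn_last_state(sequence, w_in, w_rec, bias):
    """One-unit recurrence: h_t = tanh(w_in * x_t + w_rec * h_prev + bias)."""
    h = 0.0  # initial hidden state
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h + bias)
    return h  # only the final state is passed on to the next layer


def simple_rnn_params(units, input_dim):
    """Input weights + recurrent weights + one bias per unit."""
    return units * input_dim + units * units + units


# Hypothetical weights, applied to a short stretch of a sine wave.
wave = [math.sin(0.1 * i) for i in range(100)]
print(rnn_last_state(wave, w_in=1.0, w_rec=0.5, bias=0.0))

print(simple_rnn_params(128, 1))            # 16640
print(simple_rnn_params(128, 1) + 128 + 1)  # 16769, including the Dense(1) layer
```

The same hidden state is updated at every time step, which is how the layer carries information from earlier points of the sequence forward to the prediction.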

## Step 4c: Compile and train the model
The model was compiled using the mean squared error (MSE) loss function and trained for ten epochs.

In [22]:
# Compile the model
model_rnn.compile(optimizer='adam', loss='mse')

### What does this do?

- optimizer='adam' :
  Uses the Adam optimizer, one of the most popular and effective optimizers in deep learning. It automatically adjusts learning rates during training and combines the benefits of momentum and RMSProp.
  
  It tends to work well on smooth regression problems like this sine wave and often converges faster than plain SGD.

- loss='mse' (Mean Squared Error):

  This is the loss function, which measures how far the model’s predictions are from the true values.

  Lower MSE = predictions closer to the true values.

  MSE is commonly used for regression tasks — like predicting the next numeric value in a sequence.
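
For reference, mean squared error is simply the average of the squared differences between targets and predictions. A minimal plain-Python sketch, using made-up values:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_squared_error([1.0, 2.0], [1.5, 2.0]))  # 0.125
```

Squaring penalizes large errors more heavily than small ones, and the result is always non-negative.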



In [23]:
#train the model
model_rnn.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/10
248/248 ━━━━━━━━━━━━━━━━━━━━ 7s 22ms/step - loss: 0.0239 - val_loss: 2.3342e-05
Epoch 2/10
248/248 ━━━━━━━━━━━━━━━━━━━━ 5s 20ms/step - loss: 1.6284e-05 - val_loss: 3.5334e-06
Epoch 3/10
248/248 ━━━━━━━━━━━━━━━━━━━━ 5s 20ms/step - loss: 4.6336e-06 - val_loss: 5.9749e-06
Epoch 4/10
248/248 ━━━━━━━━━━━━━━━━━━━━ 5s 19ms/step - loss: 8.9488e-06 - val_loss: 2.3022e-06
Epoch 5/10
248/248 ━━━━━━━━━━━━━━━━━━━━ 5s 19ms/step - loss: 3.4459e-06 - val_loss: 2.7542e-06
Epoch 6/10
248/248 ━━━━━━━━━━━━━━━━━━━━ 5s 19ms/step - loss: 6.5751e-06 - val_loss: 3.4764e-06
Epoch 7/10
248/248 ━━━━━━━━━━━━━━━━━━━━ 4s 18ms/step - loss: 3.6632e-06 - val_loss: 7.3360e-06
Epoch 8/10
248/248 ━━━━━━━━━━━━━━━━━━━━ 5s 20ms/step - loss: 6.2620...

<keras.src.callbacks.history.History at 0x7b36b5ebc750>

### What does this do?

- model_rnn.fit(...) :
  This is the training function for our neural network.

  It fits the model to the training data (X_train, y_train).

  - epochs=10 :
    The model will go through the entire training dataset 10 times.
  - batch_size=32 :
    The data is processed in mini-batches of 32 sequences at a time (helps improve training speed and stability).
  - validation_data=(X_test, y_test) :
    After each epoch, the model is evaluated on the test set to monitor how well it performs on unseen data.

To summarize, we are training our RNN model over 10 cycles (epochs), in small chunks (batches of 32), while checking performance on test data after each round. This is a standard training approach in deep learning.

## Step 4d: Model Evaluation

In [24]:
mse = model_rnn.evaluate(X_test, y_test)
print(f'Test MSE: {mse}')

62/62 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - loss: 2.9444e-05
Test MSE: 2.8974569431738928e-05


### What does this do?

This runs the model on the test data (X_test) and compares the predicted values with the actual values (y_test).

Since the model was compiled with loss='mse' (mean squared error), this function returns the MSE on the test set.

It's a measure of how far off the model’s predictions are, on average.
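
Because MSE is expressed in squared units, taking its square root (the RMSE) gives an error on the same scale as the sine wave itself. A short sketch using the test MSE printed by the evaluation cell:

```python
import math

test_mse = 2.8974569431738928e-05  # value printed by the evaluation cell
rmse = math.sqrt(test_mse)
print(f"RMSE: {rmse:.4f}")  # RMSE: 0.0054
```

On a wave whose values range from -1 to 1, a typical error of about 0.005 is small.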


## Conclusion:

The test MSE is about 2.9e-05 (0.000029), meaning the average squared difference between predicted and actual values is roughly 0.00003. For a sine wave whose values range from -1 to 1, that is very small and indicates good performance.

## **Summary of Results:**


**FNN:** We achieved more than 90 percent accuracy on the Iris dataset, showcasing that FNNs are well-suited for simple classification tasks.

**CNN:** We achieved around 70 percent accuracy on the CIFAR-10 dataset, highlighting the CNN’s ability to recognize spatial features in image data.

**RNN:** We reached a test MSE of about 2.9e-05 when predicting the sine wave, demonstrating the RNN's capacity for handling sequential data.