## 1. Introduction

### Problem Overview

The field of machine learning has seen significant advancements in recent years, with neural networks becoming increasingly popular for solving complex classification problems. In this project, we aim to develop and evaluate neural network models for a real-world dataset, specifically the Iris Dataset.

### Dataset Details

The Iris Dataset, available from the UCI Machine Learning Repository, is a classic dataset in the field of machine learning. It consists of:

- 150 samples
- 4 features: sepal length, sepal width, petal length, and petal width
- 3 target classes: Setosa, Versicolor, and Virginica

This dataset is particularly suitable for classification tasks and serves as an excellent starting point for understanding neural network performance on real-world data.

### Objective

The main objectives of this project are:

1. To perform a comprehensive Exploratory Data Analysis (EDA) on the Iris Dataset
2. To design and implement two different neural network architectures
3. To train these models using appropriate learning techniques
4. To evaluate and compare the performance of the models using various metrics

### Approach

Our approach to this project will be structured as follows:

1. **Exploratory Data Analysis (EDA)**: We will begin by preprocessing the data and conducting a thorough EDA. This will include data visualization, feature scaling, and splitting the dataset into training, validation, and test sets.

2. **Neural Network Design**: We will design two different neural network architectures: a Residual Network and an Attention-based Network. The choice of architectures will be justified based on the characteristics of the Iris Dataset and the classification problem at hand.

3. **Model Training**: The models will be trained using Keras, employing appropriate learning techniques such as backpropagation and the Adam optimizer.

4. **Evaluation**: We will evaluate the models using various metrics including accuracy, precision, recall, F1-score, and confusion matrices. This comprehensive evaluation will allow us to compare the performance of our models and discuss their strengths and limitations.

Through this project, we aim to gain practical experience in applying neural networks to real-world data and to develop a deeper understanding of the challenges and considerations involved in model development and evaluation.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.datasets import load_iris
# Training frameworks
from pipeline.training_framework import ModelTrainer
# Utilities
from utils.data_processing import IrisDataEngineer
from utils.evaluation_metrics import AdvancedAnalytics
from utils.eda_module import run_eda

In [None]:
# ---------------- Set Reproducibility ----------------
SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)

## 2. Dataset Introduction

The Iris dataset, introduced by British statistician and biologist Ronald Fisher in 1936 as an example for linear discriminant analysis, is one of the earliest datasets used in multivariate analysis[15]. It consists of 150 samples from three species of Iris flowers (Setosa, Versicolor, and Virginica), with 50 samples per species. Each sample includes four features:

- Sepal Length (cm)
- Sepal Width (cm)
- Petal Length (cm)
- Petal Width (cm)

### Historical and Scientific Significance

The Iris dataset has become a cornerstone in machine learning and pattern recognition for several reasons:

- **Benchmark Status**: It serves as a standard benchmark for classification algorithms due to its clean structure and moderate complexity
- **Educational Value**: The dataset provides an ideal starting point for teaching classification techniques
- **Feature Relationships**: Setosa is linearly separable from the other two species, while Versicolor and Virginica overlap and are not linearly separable from each other, creating an interesting classification challenge
- **Real-world Representation**: It represents actual biological measurements, connecting machine learning to scientific applications

In this project, we'll examine whether advanced neural network architectures (Residual and Attention-based networks) provide any advantages over simpler models for this classic dataset.


## 3. Data Loading and Preprocessing

We load and process the Iris dataset using a custom `IrisDataEngineer` class. The class handles:
- Loading the dataset from `sklearn.datasets`.
- Encoding target labels numerically.
- Splitting the data into training (60%), validation (20%), and test (20%) sets using stratified sampling.
- Applying `StandardScaler` to normalize features to zero mean and unit variance.

This preprocessing pipeline is crucial for stabilizing and accelerating neural network training.
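
The internals of `IrisDataEngineer` are not shown in this notebook, so the following is only a minimal sketch of how a stratified 60/20/20 split followed by standardization could be done directly with scikit-learn. The variable names are illustrative, and fitting the scaler on the training split only is an assumption about good practice rather than a description of the custom class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

SEED = 42  # same seed value the notebook uses

X, y = load_iris(return_X_y=True)

# Step 1: hold out 20% of the data as the test set, stratified by class.
X_rest, X_te, y_rest, y_te = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=SEED
)
# Step 2: take 25% of the remaining 80% as validation (0.25 * 0.80 = 0.20 overall).
X_tr, X_va, y_tr, y_va = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=SEED
)

# Fit the scaler on the training split only, so validation/test
# statistics do not leak into training (assumed, not confirmed by the source).
scaler = StandardScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
X_va_s = scaler.transform(X_va)
X_te_s = scaler.transform(X_te)

print(X_tr_s.shape, X_va_s.shape, X_te_s.shape)  # (90, 4) (30, 4) (30, 4)
```

With 150 balanced samples, stratification gives each split the same class proportions: 30/10/10 samples per species across train/validation/test.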



In [None]:
# ---------------- Load and Preprocess Data ----------------
print("\n🚀 Initializing Data Pipeline...")
processor = IrisDataEngineer()
(X_train, y_train), (X_val, y_val), (X_test, y_test) = processor.process()
class_names = load_iris().target_names.tolist()

## 4. Exploratory Data Analysis

Our EDA process uses visualization to understand the dataset's characteristics before model development. The `eda_module.py` module provides the plotting and summary utilities used to extract insights from the Iris dataset.

### 4.1 Data Distribution Analysis

The histograms below reveal the distribution of each feature across the three Iris species:

- **Sepal Dimensions**: While there is some overlap in sepal measurements across species, Setosa typically has shorter but wider sepals compared to the other species
- **Petal Dimensions**: Petal measurements show clearer separation between species, with Setosa having distinctly smaller petals, making these features particularly valuable for classification

### 4.2 Feature Correlation and Relationships

The correlation matrix and pairwise scatter plots reveal:

- **Strong Positive Correlation**: Petal length and petal width show strong positive correlation (r > 0.9), suggesting these features carry similar information
- **Sepal-Petal Correlation**: Sepal length also correlates positively and fairly strongly with both petal dimensions (r between roughly 0.8 and 0.9)
- **Species Clustering**: The scatter plots demonstrate that Setosa forms a distinct cluster, while Versicolor and Virginica show some overlap
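
The correlations listed above can be checked independently of `run_eda`. The sketch below computes the Pearson correlation matrix with NumPy on the scikit-learn copy of the dataset; the column indices follow scikit-learn's documented feature order (sepal length, sepal width, petal length, petal width).

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data

# Column order in scikit-learn's Iris data.
SEPAL_LEN, SEPAL_WID, PETAL_LEN, PETAL_WID = 0, 1, 2, 3

# Pearson correlation between feature columns (rowvar=False: columns are variables).
corr = np.corrcoef(X, rowvar=False)

r_petal = corr[PETAL_LEN, PETAL_WID]
r_sepal_petal = corr[SEPAL_LEN, PETAL_LEN]

print(f"petal length vs petal width:  r = {r_petal:.2f}")
print(f"sepal length vs petal length: r = {r_sepal_petal:.2f}")
```

A value of `r_petal` above 0.9 is what motivates the remark that the two petal features carry largely overlapping information.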

### 4.3 Dimensionality Reduction Visualization

PCA and t-SNE visualizations reduce the 4-dimensional feature space to 2 dimensions:

- **PCA**: The first two principal components explain approximately 95% of the variance, with clear separation between Setosa and the other species
- **t-SNE**: This non-linear technique further enhances visualization of the cluster separation, particularly between Versicolor and Virginica

These insights inform our modeling approach by highlighting the relative importance of features and the inherent separability of the classes.
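
As a rough check of the PCA figure quoted above, the explained variance can be computed directly with scikit-learn. This sketch assumes the features are standardized before PCA, which may differ from what `run_eda` does internally.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize so that each feature contributes on the same scale (an assumption).
X_std = StandardScaler().fit_transform(X)

# Project the 4-dimensional data onto its first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

explained = pca.explained_variance_ratio_
cumulative = float(explained.sum())
print("Per-component explained variance ratio:", explained.round(3))
print(f"Cumulative (first two components): {cumulative:.3f}")
```

`X_2d` can then be passed to a scatter plot colored by `y` to reproduce the kind of 2-D view described in this section.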


In [None]:
# ---------------- Exploratory Data Analysis (EDA) ----------------
# Load the full structured dataset before any train/validation/test split
df_raw = processor._load_structured_data()
X_full = df_raw.drop('species', axis=1)
y_full = df_raw['species']

# This will generate summary statistics and all relevant visualizations for the full dataset
run_eda(X_full, y_full)


## 5. Model Trainers Instantiation
To streamline our experiments, we instantiate two `ModelTrainer` objects – one for the *residual* network and one for the *attention-based* network. The `ModelTrainer` class abstracts away the model-building and training processes for each architecture, allowing us to initialize a model by specifying its type (`'residual'` or `'attention'`). This design promotes modularity and ensures a consistent training pipeline for both models. By providing the input shape (4 features for Iris) and number of classes (3 species), each `ModelTrainer` knows how to construct the appropriate neural network with the correct input and output dimensions.


In [None]:
# ---------------- Instantiate Model Trainers ----------------
# We use a unified ModelTrainer class for both residual and attention models
residual_trainer = ModelTrainer('residual', input_shape=X_train.shape[1:], num_classes=len(class_names))
attention_trainer = ModelTrainer('attention', input_shape=X_train.shape[1:], num_classes=len(class_names))
print("✅ ModelTrainer class ready for 'residual' and 'attention' models.")

## 6. Residual Network Training and Initial Evaluation
Here we construct and train the Residual Neural Network. This architecture includes skip connections, which improve gradient flow and make deeper networks easier to train. After model creation, we train it using early stopping to avoid overfitting. The evaluation step reports accuracy, F1-score, and ROC-AUC, which establish a baseline performance for the residual model.
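
The actual architecture is built inside `ModelTrainer.create_model()`, which is not shown here. To make the idea of a skip connection concrete, the following is a minimal Keras sketch of a dense residual block for 4-feature input, together with an early-stopping callback. The function name `build_residual_sketch`, the layer sizes, and the `patience` value are illustrative assumptions, not the project's settings.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def build_residual_sketch(input_dim=4, num_classes=3, units=16):
    """Tiny dense network with one skip connection (illustrative only)."""
    inputs = tf.keras.Input(shape=(input_dim,))
    # Project the inputs to `units` so the skip path has a matching width.
    x = layers.Dense(units, activation="relu")(inputs)

    # Residual block: two dense layers whose output is added back to the block input.
    shortcut = x
    h = layers.Dense(units, activation="relu")(x)
    h = layers.Dense(units)(h)
    x = layers.Add()([shortcut, h])
    x = layers.Activation("relu")(x)

    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["sparse_categorical_accuracy"],
    )
    return model


# Early stopping: halt when validation loss stops improving and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True
)

sketch_model = build_residual_sketch()
# Forward pass on dummy inputs just to confirm the output shape.
dummy_probs = sketch_model.predict(np.zeros((2, 4), dtype="float32"), verbose=0)
print(dummy_probs.shape)  # (2, 3)
```

In a training call, the callback would be passed as `callbacks=[early_stop]` to `model.fit(...)` along with `validation_data`.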


In [None]:
# ---------------- Initialize Evaluation Module ----------------
analyst = AdvancedAnalytics(class_names)

# ---------------- Residual Model (Manual) ----------------
print("\n🧠 Building and Training Residual Model...")
residual_model = residual_trainer.create_model()
residual_model.summary()
residual_history = residual_trainer.train(residual_model, X_train, y_train, X_val, y_val)

print("\n🔍 Evaluating Residual Model...")
analyst.full_analysis(residual_model, X_test, y_test)

## 7. Attention Network Training and Initial Evaluation
We now train the Attention-Based Neural Network, designed to focus on the most informative features. After creating the model and reviewing the architecture, we train it under the same settings as the residual network. The evaluation metrics on the test set (including precision and ROC-AUC) help us assess its performance and establish a baseline for comparison.
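
The attention mechanism used by `ModelTrainer` is not shown in this notebook. One simple way to let a network "focus" on informative tabular features is a learned softmax gate that re-weights each input feature per sample; the sketch below illustrates that idea in Keras. The function name, the layer name `feature_attention`, and all sizes are hypothetical, and the project's model may use a different attention formulation.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def build_feature_attention_sketch(input_dim=4, num_classes=3, units=16):
    """Dense classifier with a per-sample softmax gate over input features."""
    inputs = tf.keras.Input(shape=(input_dim,))
    # One attention weight per input feature; softmax makes them sum to 1 per sample.
    attn = layers.Dense(input_dim, activation="softmax", name="feature_attention")(inputs)
    # Re-weight the features element-wise by their attention weights.
    weighted = layers.Multiply()([inputs, attn])
    x = layers.Dense(units, activation="relu")(weighted)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["sparse_categorical_accuracy"],
    )
    return model


attn_sketch = build_feature_attention_sketch()

# A second model sharing the same layers exposes the attention weights for inspection.
attn_probe = tf.keras.Model(
    attn_sketch.input, attn_sketch.get_layer("feature_attention").output
)
sample = np.array([[0.5, -1.0, 1.2, 0.3]], dtype="float32")  # made-up standardized input
attn_weights = attn_probe.predict(sample, verbose=0)
print(attn_weights.shape)  # (1, 4)
```

After training, inspecting `attn_weights` for real samples gives a rough indication of which features the gate emphasizes.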


In [None]:
# ---------------- Attention Model (Manual) ----------------
print("\n🧠 Building and Training Attention Model...")
attention_model = attention_trainer.create_model()
attention_model.summary()
attention_history = attention_trainer.train(attention_model, X_train, y_train, X_val, y_val)

print("\n🔍 Evaluating Attention Model...")
analyst.full_analysis(attention_model, X_test, y_test)

## 8. Hyperparameter Tuning
Next, we use automated tuning (e.g., Keras Tuner) to optimize key hyperparameters such as the number of units and the learning rate.
For each architecture, the tuner searches the hyperparameter space for the configuration with the best validation performance.
This step gives each model well-chosen settings before the final comparison and shows how much tuning improves generalization.
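
The `tune_model` method is not shown, so the sketch below illustrates only the underlying idea with a plain grid search written by hand: build one model per configuration, train it briefly, and keep the configuration with the lowest validation loss. It deliberately does not use the Keras Tuner API; the search space, epoch count, and the small stand-in network are all assumptions chosen for illustration.

```python
import itertools
import numpy as np
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

np.random.seed(42)
tf.random.set_seed(42)

X, y = load_iris(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
scaler = StandardScaler().fit(X_tr)
X_tr, X_va = scaler.transform(X_tr), scaler.transform(X_va)


def build_model(units, learning_rate):
    """Small stand-in classifier parameterized by the two tuned values."""
    inputs = tf.keras.Input(shape=(4,))
    hidden = tf.keras.layers.Dense(units, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["sparse_categorical_accuracy"],
    )
    return model


# Hypothetical search space: every combination is tried once.
search_space = {"units": [8, 32], "learning_rate": [1e-2, 1e-3]}

results = []
for units, lr in itertools.product(search_space["units"], search_space["learning_rate"]):
    model = build_model(units, lr)
    history = model.fit(
        X_tr, y_tr, validation_data=(X_va, y_va),
        epochs=20, batch_size=16, verbose=0,
    )
    # Score each configuration by its best validation loss.
    results.append(
        {"units": units, "learning_rate": lr, "val_loss": min(history.history["val_loss"])}
    )

best = min(results, key=lambda r: r["val_loss"])
print("Best configuration:", best)
```

A dedicated tuner automates the same loop and adds smarter search strategies, but the selection criterion (best validation performance) is the same.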


In [None]:
# ---------------- Hyperparameter Tuning ----------------
print("\n🔧 Tuning Residual Model...")
best_residual, best_residual_hp = residual_trainer.tune_model(X_train, y_train, X_val, y_val)
print("Best Residual Hyperparameters:", best_residual_hp.values)

print("\n🔧 Tuning Attention Model...")
best_attention, best_attention_hp = attention_trainer.tune_model(X_train, y_train, X_val, y_val)
print("Best Attention Hyperparameters:", best_attention_hp.values)


## 9. Retraining Models with Best Hyperparameters
With the best hyperparameters selected, we retrain each model from scratch to obtain the final tuned versions. This allows us to compare them fairly with the earlier manually configured versions and analyze how tuning affects learning dynamics and final performance.


In [None]:
# ---------------- Retraining Models with Best Hyperparameters ----------------
print("\n🔥 Final Training with Tuned Residual Model...")
residual_history_tuned = residual_trainer.train(best_residual, X_train, y_train, X_val, y_val)

print("\n🔥 Final Training with Tuned Attention Model...")
attention_history_tuned = attention_trainer.train(best_attention, X_train, y_train, X_val, y_val)

## 10. Saving Models to Disk
Once models are retrained with optimal parameters, we save them to disk in `.keras` format. This is a best practice for reproducibility and enables future re-use or deployment of the models without retraining.


In [None]:
# ---------------- Save Models to Disk ----------------
residual_trainer.save_model(best_residual)
attention_trainer.save_model(best_attention)

## 11. Reloading Models for Evaluation
To confirm our saved models are usable, we reload them into memory. This mimics deployment or reuse in a different session, and ensures that the saved weights and architecture restore correctly. These reloaded models are now ready for final evaluation.
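
The `save_model` and `load_model` methods of `ModelTrainer` are not shown. The sketch below illustrates the same round trip with the Keras API directly and checks that a reloaded model reproduces the original predictions. The small stand-in model, the file name, and the use of a temporary directory are assumptions made so the example is self-contained.

```python
import os
import tempfile
import numpy as np
import tensorflow as tf

# Small untrained stand-in model with the same input/output sizes as the Iris task.
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(3, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)

# Fixed random inputs used to compare predictions before and after the round trip.
x_check = np.random.default_rng(0).normal(size=(5, 4)).astype("float32")
before = model.predict(x_check, verbose=0)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sketch_model.keras")  # hypothetical file name
    model.save(path)                                   # native .keras format
    reloaded = tf.keras.models.load_model(path)
    after = reloaded.predict(x_check, verbose=0)

round_trip_ok = bool(np.allclose(before, after, atol=1e-6))
print("Predictions match after reload:", round_trip_ok)
```

Comparing predictions on a fixed batch is a quick sanity check that both the architecture and the weights were restored.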


In [None]:
# ---------------- Reload Models for Evaluation ----------------
loaded_residual = residual_trainer.load_model()
loaded_attention = attention_trainer.load_model()

## 12. Final Evaluation of Tuned Models
We now perform a comprehensive evaluation of the reloaded models on the test set. Key metrics like precision, recall, F1-score, and ROC-AUC are examined. Here, the attention model clearly outperforms the residual network, especially in recall for the more ambiguous classes.
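
`AdvancedAnalytics.full_analysis` is a custom helper whose implementation is not shown. As a hedged sketch of how the listed metrics can be computed with scikit-learn, the example below uses a logistic regression purely as a stand-in classifier; with a Keras model, `y_prob` would instead come from `model.predict(X_test)`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)

# Stand-in classifier: any model that outputs class probabilities would work here.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_prob = clf.predict_proba(X_te)   # shape (n_samples, 3), rows sum to 1
y_pred = y_prob.argmax(axis=1)     # most probable class per sample

# Per-class precision, recall, and F1-score.
print(classification_report(y_te, y_pred, target_names=iris.target_names))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, y_pred)
print(cm)

# Multiclass ROC-AUC, averaged over one-vs-rest problems.
auc = roc_auc_score(y_te, y_prob, multi_class="ovr", average="macro")
print(f"Macro one-vs-rest ROC-AUC: {auc:.3f}")
```

With only 30 test samples, a single misclassification shifts accuracy by more than three percentage points, so small differences between models should be read with caution.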


In [None]:
# ---------------- Final Evaluation of Tuned Models ----------------
print("\n🔍 Evaluating Tuned Residual Model...")
analyst.full_analysis(loaded_residual, X_test, y_test)

print("\n🔍 Evaluating Tuned Attention Model...")
analyst.full_analysis(loaded_attention, X_test, y_test)

## 13. Training History Comparison Functions
To visualize model performance over time, we define a utility function to compare training histories. This function helps us understand how validation accuracy and loss evolve across epochs and compare learning dynamics between models.


In [None]:
# ---------------- Comparison Plot Functions ----------------
def compare_models(histories, labels, title_suffix):
    """Plot side-by-side comparison of validation accuracy and loss for multiple models."""
    plt.figure(figsize=(14, 5))
    # Validation Accuracy
    plt.subplot(1, 2, 1)
    for history, label in zip(histories, labels):
        if 'val_sparse_categorical_accuracy' in history.history:
            plt.plot(history.history['val_sparse_categorical_accuracy'], label=label)
    plt.title(f'Validation Accuracy Comparison - {title_suffix}')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    # Validation Loss
    plt.subplot(1, 2, 2)
    for history, label in zip(histories, labels):
        plt.plot(history.history['val_loss'], label=label)
    plt.title(f'Validation Loss Comparison - {title_suffix}')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.tight_layout()
    plt.show()

## 14. Individual Training History Plotting
This function plots training and validation accuracy and loss curves for a single model. It's helpful for spotting overfitting or underfitting by visualizing how the model behaves across training epochs. We'll use this to analyze each model's performance.


In [None]:
# --- Individual Training History Plot Function ---
def plot_individual(history, title):
    """Plot accuracy and loss curves for a single model training history."""
    plt.figure(figsize=(14, 5))
    plt.suptitle(title)
    # Accuracy
    plt.subplot(1, 2, 1)
    plt.plot(history.history['sparse_categorical_accuracy'], label='Train')
    plt.plot(history.history['val_sparse_categorical_accuracy'], label='Validation')
    plt.title('Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    # Loss
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'], label='Train')
    plt.plot(history.history['val_loss'], label='Validation')
    plt.title('Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.tight_layout()
    plt.show()

## 15. Individual Model Training Curves
We now visualize each model's training history using the `plot_individual` function. 
These plots reveal key insights into how each model learned over time, and where overfitting or efficient learning occurred.


In [None]:
# ---------------- Plot Individual Histories ----------------
# Each plot shows accuracy and loss for one model during training
plot_individual(residual_history, 'Residual Model (Manual Parameters)')
plot_individual(attention_history, 'Attention Model (Manual Parameters)')
plot_individual(residual_history_tuned, 'Residual Model (Tuned Hyperparameters)')
plot_individual(attention_history_tuned, 'Attention Model (Tuned Hyperparameters)')

## 16. Residual vs. Attention – Model Performance Comparison
Here we compare the residual and attention models side-by-side under both manual and tuned configurations. Validation accuracy and loss plots reveal that the attention model consistently outperforms the residual model in both scenarios.


In [None]:
# ---------------- Comparative Plots ----------------
# Compare residual vs attention models under manual training
compare_models([residual_history, attention_history], ['Residual (Manual)', 'Attention (Manual)'], 'Manual Models')

# Compare residual vs attention models after hyperparameter tuning
compare_models([residual_history_tuned, attention_history_tuned], ['Residual (Tuned)', 'Attention (Tuned)'], 'Tuned Models')

# Compare manual vs tuned for residual model
compare_models([residual_history, residual_history_tuned], ['Residual (Manual)', 'Residual (Tuned)'], 'Residual: Manual vs Tuned')

# Compare manual vs tuned for attention model
compare_models([attention_history, attention_history_tuned], ['Attention (Manual)', 'Attention (Tuned)'], 'Attention: Manual vs Tuned')