# Digit Recognition System

This notebook builds a neural network that recognizes handwritten digits, such as those found on scanned forms, using Scikit-learn's **digits** dataset (`load_digits`), a small MNIST-like collection of 1,797 grayscale 8×8 images.  
The objective is to achieve at least **96% accuracy** on the test set while exploring the trade-offs between using a high-level machine learning library (**Scikit-learn**) and a deep learning framework (**TensorFlow**).

Before starting, the necessary libraries are imported to support data handling, model building, training, and evaluation. 📚

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import plotly.graph_objects as go
import plotly.express as px

from plotly.subplots import make_subplots
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

import tensorflow as tf

Sequential = tf.keras.models.Sequential
Dense = tf.keras.layers.Dense
Input = tf.keras.layers.Input
plot_model = tf.keras.utils.plot_model
Adam = tf.keras.optimizers.Adam


# 1. Data Preprocessing

To prepare the data for training, the handwritten digits dataset is first loaded with `load_digits`.  
Pixel intensities, stored as values from 0 to 16, are divided by their maximum so that they lie in the [0, 1] range, which improves model performance and training stability.  
Each 8×8 image is then flattened into a 64-element vector, the input format expected by both the MLPClassifier and the Keras dense layers, and 20% of the samples are held out as a test set.

In [4]:
# Load the dataset
digits = load_digits()

# Extract images and labels
X = digits.images.astype(np.float32)
y = digits.target

# Normalize pixel values to the range [0, 1]
X /= X.max()
print(f"Original X shape: {X.shape}")

# Flatten the images to 1D vectors
X = X.reshape(X.shape[0], -1)
print(f"X shape after flattening: {X.shape}")

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


Original X shape: (1797, 8, 8)
X shape after flattening: (1797, 64)
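A quick sanity check of the preprocessing, written as a standalone sketch that repeats the steps above. It confirms that the raw intensities top out at 16, that flattening is row-major (pixel `(r, c)` becomes column `8*r + c`, the same layout as `digits.data`), and that the 80/20 split leaves 1,437 training and 360 test images.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
raw_max = digits.images.max()          # raw intensities are integers 0..16

X = digits.images.astype(np.float32)
X /= X.max()                           # in-place division keeps float32
X_flat = X.reshape(X.shape[0], -1)     # row-major: pixel (r, c) -> column 8*r + c

X_train, X_test, y_train, y_test = train_test_split(
    X_flat, digits.target, test_size=0.2, random_state=42
)

print("raw max:", raw_max, "| scaled range:", X_flat.min(), X_flat.max())
print("train:", X_train.shape, "test:", X_test.shape)
```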


# 2. Model Implementation

In this section, the same type of neural network, a multilayer perceptron (MLP), is built for handwritten digit recognition with two different libraries.  
A Scikit-learn version and a TensorFlow version are developed, evaluated, and compared on training behavior and final test accuracy.

## A. Scikit-learn Version (MLPClassifier with GridSearchCV)

In this part, a multilayer perceptron (MLP) is built with Scikit-learn's **MLPClassifier** and tuned with **GridSearchCV** using 3-fold cross-validation.  
The base model uses the Adam solver with at most 200 training iterations (for Adam, `max_iter` counts epochs, that is, full passes over the training data) and a fixed random seed for reproducibility.  
The hyperparameter grid covers the architecture size, activation function, L2 regularization strength (`alpha`), batch size, initial learning rate, learning-rate schedule, and whether early stopping is used.  

In [5]:
# Define the base MLP model (basic configuration)
mlp = MLPClassifier(
    solver='adam',        
    max_iter=200,         
    random_state=42       
)

# Define the hyperparameter grid to search
param_grid = {
    'hidden_layer_sizes': [(100,), (112, 112), (128,), (104, 100, 104)], 
    'activation': ['relu', 'tanh'],          
    'alpha': [0.0001, 0.001, 0.01],          
    'batch_size': [32, 64, 128],              
    'learning_rate': ['adaptive', 'constant'], # only used by solver='sgd'; no effect with adam
    'learning_rate_init': [0.001, 0.01],      
    'early_stopping': [True, False],          
}

# Set up GridSearchCV for hyperparameter tuning
grid_search = GridSearchCV(
    estimator=mlp,         
    param_grid=param_grid, 
    cv=3,                 
    n_jobs=-1,             
    verbose=0
)

grid_search.fit(X_train, y_train)
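The grid above is larger than it looks. As a sketch, assuming GridSearchCV's default `refit=True`, Scikit-learn's `ParameterGrid` (the class GridSearchCV uses to enumerate candidates) can count the candidates and fits, and show how many candidates are duplicates because the Adam solver ignores `learning_rate`:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the GridSearchCV cell
param_grid = {
    'hidden_layer_sizes': [(100,), (112, 112), (128,), (104, 100, 104)],
    'activation': ['relu', 'tanh'],
    'alpha': [0.0001, 0.001, 0.01],
    'batch_size': [32, 64, 128],
    'learning_rate': ['adaptive', 'constant'],
    'learning_rate_init': [0.001, 0.01],
    'early_stopping': [True, False],
}

n_candidates = len(ParameterGrid(param_grid))
n_folds = 3
# Every candidate is fitted once per fold; refit=True adds one final fit
n_fits = n_candidates * n_folds + 1

# 'learning_rate' only affects solver='sgd', so with adam it does not change the model
effective_grid = {k: v for k, v in param_grid.items() if k != 'learning_rate'}
n_distinct = len(ParameterGrid(effective_grid))

print(f"{n_candidates} candidates, {n_fits} fits, {n_distinct} distinct adam configurations")
```

Dropping `'learning_rate'` from the grid would roughly halve the search cost without removing any configuration the Adam solver can distinguish.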

In [6]:
print("Best parameters found:", grid_search.best_params_)
print("Best cross-validation score:", grid_search.best_score_)

Best parameters found: {'activation': 'relu', 'alpha': 0.01, 'batch_size': 64, 'early_stopping': False, 'hidden_layer_sizes': (112, 112), 'learning_rate': 'adaptive', 'learning_rate_init': 0.001}
Best cross-validation score: 0.9763395963813499


In [7]:
# Get the best model from grid search
best_mlp = grid_search.best_estimator_

# Get the loss curve from the best model
loss_curve = best_mlp.loss_curve_

# Plot the loss curve
fig = go.Figure()
fig.add_trace(go.Scatter(
    y=loss_curve,
    mode='lines',
    name='Loss Curve',
    line=dict(color='blue')
))

fig.update_layout(
    title='Loss Curve of MLP Classifier',
    xaxis_title='Iterations',
    yaxis_title='Loss',
    showlegend=True
)
fig.show()
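The curve above tracks training loss only, so on its own it cannot reveal overfitting. As a standalone sketch, MLPClassifier records a per-epoch validation accuracy in `validation_scores_` when `early_stopping=True`. The configuration below borrows the best grid-search settings but switches early stopping on; that switch is an assumption for illustration, not the model selected above.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data / 16.0, digits.target, test_size=0.2, random_state=42
)

# Best grid-search settings, but with early stopping on so a validation curve is recorded
clf = MLPClassifier(
    solver="adam", hidden_layer_sizes=(112, 112), activation="relu",
    alpha=0.01, batch_size=64, learning_rate_init=0.001,
    early_stopping=True, validation_fraction=0.1,  # hold out 10% of X_train
    max_iter=200, random_state=42,
)
clf.fit(X_train, y_train)

# One entry per epoch: training loss and accuracy on the held-out 10%
for epoch in range(0, clf.n_iter_, 10):
    print(f"epoch {epoch + 1:3d}  loss {clf.loss_curve_[epoch]:.4f}  "
          f"val_acc {clf.validation_scores_[epoch]:.4f}")
print("best validation accuracy:", clf.best_validation_score_)
```

Plotting `loss_curve_` and `validation_scores_` side by side would show whether validation accuracy stalls while training loss keeps falling, which is the usual sign of overfitting.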

In [8]:
y_pred = best_mlp.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

fig = px.imshow(
    cm,
    text_auto=True,
    color_continuous_scale='Blues',
    labels=dict(x="Predicted", y="True", color="Count"),
    x=best_mlp.classes_,
    y=best_mlp.classes_,
)

fig.update_layout(
    title="Confusion Matrix",
    xaxis_title="Predicted Label",
    yaxis_title="True Label",
    coloraxis_colorbar=dict(title="Count"),
)
fig.show()
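To put numbers behind statements such as "few misclassifications, mostly among similar digits", a confusion matrix can be reduced to per-class recall and its single most frequent error. The helper `summarize_confusion` is hypothetical (not part of Scikit-learn) and is shown here on a toy three-class example:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def summarize_confusion(cm):
    """Hypothetical helper: per-class recall and the most frequent off-diagonal cell."""
    # Recall per class: correct predictions / number of true samples of that class
    # (assumes every class occurs at least once in y_true)
    recall = np.diag(cm) / cm.sum(axis=1)
    # Zero the diagonal so argmax finds the most common *error*
    errors = cm.copy()
    np.fill_diagonal(errors, 0)
    true_cls, pred_cls = np.unravel_index(np.argmax(errors), errors.shape)
    return recall, (int(true_cls), int(pred_cls), int(errors[true_cls, pred_cls]))

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
toy_cm = confusion_matrix(y_true, y_pred)
recall, worst = summarize_confusion(toy_cm)
print("recall per class:", recall)
print("most frequent error (true, predicted, count):", worst)
```

In the notebook, `summarize_confusion(cm)` (or `cm_tf` for the Keras model) would return the recall of each digit and the most common confusion.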

### Conclusion: MLPClassifier

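The best model selected by **GridSearchCV** uses two hidden layers of 112 neurons each, ReLU activation, a batch size of 64, an initial learning rate of 0.001, and the largest L2 regularization value in the grid (alpha = 0.01). The reported `learning_rate='adaptive'` has no effect here, because that schedule applies only to the `sgd` solver.  
Its mean 3-fold cross-validation accuracy on the training set was **97.63%**, above the 96% target. This is a cross-validation estimate rather than the test-set accuracy named in the objective; the confusion matrix above is the only test-set evaluation of this model in the notebook.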
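Since the objective is stated on the test set, the selected MLPClassifier can also be scored there. In the notebook this is simply `best_mlp.score(X_test, y_test)`. The standalone sketch below rebuilds the same data split and refits the reported best parameters (leaving out `learning_rate`, which Adam ignores), which is what GridSearchCV does for `best_estimator_` under its default `refit=True`, so it should reproduce the selected model:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Rebuild the notebook's data: flattened pixels scaled to [0, 1], 80/20 split
digits = load_digits()
X = digits.data.astype(np.float32) / 16.0
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Best parameters reported by GridSearchCV ('learning_rate' omitted: unused by adam)
best_mlp = MLPClassifier(
    solver="adam", max_iter=200, random_state=42,
    hidden_layer_sizes=(112, 112), activation="relu", alpha=0.01,
    batch_size=64, learning_rate_init=0.001, early_stopping=False,
)
best_mlp.fit(X_train, y_train)

# score() returns mean accuracy on the held-out test set
test_accuracy = best_mlp.score(X_test, y_test)
print(f"MLPClassifier test accuracy: {test_accuracy:.4f}")
```

With this number in hand, the comparison with the TensorFlow models can be made on the same 360 test images.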

The **training loss curve** shows rapid and stable convergence. Because it tracks only the training loss, it cannot by itself show whether the model overfits; the gap between cross-validation and test accuracy is the more direct check.  
The **confusion matrix** on the test set shows strong performance across digit classes, and the few misclassifications that occur tend to involve visually similar digits.

Overall, the MLPClassifier demonstrates strong reliability for handwritten digit recognition on the MNIST-like dataset, confirming the effectiveness of both the architecture and the hyperparameter tuning strategy.

## B. TensorFlow Version (Keras Sequential)
In this section, the multilayer perceptron (MLP) is re-implemented with TensorFlow and Keras.
A Sequential model mirrors the best MLPClassifier architecture: two dense hidden layers of 112 ReLU units, followed by a 10-unit softmax output for multiclass digit classification.
The model is trained several times with different learning rates, batch sizes, validation splits, and numbers of epochs, to study how these hyperparameters affect performance.

Training and validation curves are analyzed for each configuration, and the best model is selected by test accuracy and inspected with a confusion matrix. Because the test set is used for this selection, the winning model's test accuracy is a slightly optimistic estimate; validation accuracy would be the cleaner selection criterion.
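Each configuration starts from freshly drawn random weights, so differences between runs mix the effect of the hyperparameters with the effect of initialization. A minimal sketch, assuming TensorFlow 2.7 or later for `tf.keras.utils.set_random_seed`: seeding Python, NumPy, and TensorFlow right before a model is built makes its initial weights repeatable. The helper `build_model` is hypothetical; the notebook's `build_and_train_model` does not seed.

```python
import numpy as np
import tensorflow as tf

def build_model(seed=42, n_features=64):
    """Hypothetical helper: same architecture as build_and_train_model, but seeded."""
    # Seeds Python's `random`, NumPy and TensorFlow in one call
    tf.keras.utils.set_random_seed(seed)
    return tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(112, activation="relu"),
        tf.keras.layers.Dense(112, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

model_a = build_model()
model_b = build_model()

# Compare every kernel and bias array of the two freshly built models
same_init = all(
    np.array_equal(w_a, w_b)
    for w_a, w_b in zip(model_a.get_weights(), model_b.get_weights())
)
print("identical initial weights:", same_init)
```

Calling `tf.keras.utils.set_random_seed(42)` at the top of `build_and_train_model` would apply the same idea to all five configurations; fully deterministic GPU training may additionally require `tf.config.experimental.enable_op_determinism()`.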

In [2]:
def build_and_train_model(validation_split=0.2, learning_rate=0.001, batch_size=32, epochs=20):
    model = Sequential([
        Input(shape=(X.shape[1],)),
        Dense(112, activation='relu'),
        Dense(112, activation='relu'),
        Dense(10, activation='softmax')
    ])

    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )

    history = model.fit(
        X_train, y_train,
        validation_split=validation_split,
        batch_size=batch_size,
        epochs=epochs,
        verbose=1
    )

    return model, history
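The model is compiled with `loss='sparse_categorical_crossentropy'`, which accepts integer labels such as `y_train` directly instead of one-hot vectors. As a NumPy sketch of the underlying formula (ignoring the numerical safeguards Keras applies internally), the per-sample loss is the negative log of the softmax probability assigned to the true class, which equals categorical cross-entropy on the one-hot encoding:

```python
import numpy as np

def softmax(logits):
    # Subtract each row's max before exponentiating, for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs):
    # Integer labels: take each row's probability of its true class
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def categorical_crossentropy(one_hot, probs):
    # One-hot labels: the sum keeps only the true-class term of each row
    return -np.mean(np.sum(one_hot * np.log(probs), axis=1))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])              # integer class ids, like y_train
probs = softmax(logits)

sparse_loss = sparse_categorical_crossentropy(labels, probs)
dense_loss = categorical_crossentropy(np.eye(3)[labels], probs)
print(sparse_loss, dense_loss)
```

The sparse form skips building a 10-column one-hot matrix for the digit labels, which is why it pairs naturally with `load_digits` targets.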

In [5]:
configurations = [
    {'validation_split': 0.2, 'learning_rate': 0.001, 'batch_size': 64, 'epochs': 50},
    {'validation_split': 0.1, 'learning_rate': 0.005, 'batch_size': 32, 'epochs': 75},
    {'validation_split': 0.2, 'learning_rate': 0.01, 'batch_size': 128, 'epochs': 100},
    {'validation_split': 0.15, 'learning_rate': 0.001, 'batch_size': 32, 'epochs': 50},
    {'validation_split': 0.2, 'learning_rate': 0.005, 'batch_size': 64, 'epochs': 75},
]

histories = []
models = []

for i, config in enumerate(configurations):
    print(f"Training model {i+1} with config: {config}")
    
    model, history = build_and_train_model(
        validation_split=config['validation_split'],
        learning_rate=config['learning_rate'],
        batch_size=config['batch_size'],
        epochs=config['epochs']
    )
    
    models.append(model)
    histories.append(history)

Training model 1 with config: {'validation_split': 0.2, 'learning_rate': 0.001, 'batch_size': 64, 'epochs': 50}
Epoch 1/50
18/18 ━━━━━━━━━━━━━━━━━━━━ 4s 38ms/step - accuracy: 0.2956 - loss: 2.2190 - val_accuracy: 0.6806 - val_loss: 1.8049
Epoch 2/50
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7413 - loss: 1.6368 - val_accuracy: 0.8889 - val_loss: 1.1339
Epoch 3/50
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.8925 - loss: 0.9972 - val_accuracy: 0.9062 - val_loss: 0.6084
Epoch 4/50
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.9256 - loss: 0.5409 - val_accuracy: 0.9062 - val_loss: 0.3919
Epoch 5/50
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.9404 - loss: 0.3287 - val_accuracy: 0.9201 - val_loss: 0.3184
Epoch 6/50
18/18 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/ste

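The `18/18` in the log is the number of batches per epoch. Assuming Keras holds out the *last* fraction of the training arrays and sizes the training part as `floor(n * (1 - validation_split))` (an assumption about its internals), the arithmetic for the five configurations can be sketched as follows. `keras_style_split` is a hypothetical helper, and 1,437 is the size of `X_train` after the 80/20 split.

```python
import math

def keras_style_split(n_samples, validation_split, batch_size):
    """Hypothetical helper mirroring the assumed validation_split arithmetic."""
    # Assumption: the first floor(n * (1 - split)) samples train, the rest validate
    n_train = math.floor(n_samples * (1.0 - validation_split))
    n_val = n_samples - n_train
    # The last batch of an epoch may be smaller than batch_size
    steps_per_epoch = math.ceil(n_train / batch_size)
    return n_train, n_val, steps_per_epoch

N_TRAIN_TOTAL = 1437  # len(X_train) after the 80/20 split of 1,797 images

# (validation_split, batch_size) pairs from the five configurations above
splits = [(0.2, 64), (0.1, 32), (0.2, 128), (0.15, 32), (0.2, 64)]
for i, (split, batch) in enumerate(splits, start=1):
    n_train, n_val, steps = keras_style_split(N_TRAIN_TOTAL, split, batch)
    print(f"Model {i}: {n_train} train / {n_val} val, {steps} batches per epoch")
```

For the first configuration this gives 1,149 training samples in 18 batches of up to 64, matching the log, and 288 validation samples.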
In [6]:
num_models = len(models)

fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=[f"Model {i+1}" for i in range(num_models)],
    horizontal_spacing=0.1,
    vertical_spacing=0.2
)

row = 1
col = 1

for i, history in enumerate(histories):
    fig.add_trace(
        go.Scatter(
            y=history.history['loss'],
            mode='lines',
            name=f'Training Loss Model {i+1}'
        ),
        row=row, col=col
    )

    fig.add_trace(
        go.Scatter(
            y=history.history['val_loss'],
            mode='lines',
            name=f'Validation Loss Model {i+1}'
        ),
        row=row, col=col
    )

    if col == 3:
        row += 1
        col = 1
    else:
        col += 1

fig.update_layout(
    height=500, width=800,
    title_text="Training and Validation Loss Across Models",
    showlegend=False,
    template="plotly_white"
)

fig.show()

In [8]:
num_models = len(models)

fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=[f"Model {i+1}" for i in range(num_models)],
    horizontal_spacing=0.1,
    vertical_spacing=0.2
)

row = 1
col = 1

for i, history in enumerate(histories):
    fig.add_trace(
        go.Scatter(
            y=history.history['accuracy'],
            mode='lines',
            name=f'Training Accuracy Model {i+1}'
        ),
        row=row, col=col
    )

    fig.add_trace(
        go.Scatter(
            y=history.history['val_accuracy'],
            mode='lines',
            name=f'Validation Accuracy Model {i+1}'
        ),
        row=row, col=col
    )

    if col == 3:
        row += 1
        col = 1
    else:
        col += 1

fig.update_layout(
    height=500, width=800,
    title_text="Training and Validation Accuracy Across Models",
    showlegend=False,
    template="plotly_white"
)

fig.show()

In [9]:
for i, model in enumerate(models):
    print("="*40)
    print(f"Model {i+1} Summary")
    print("="*40)
    
    test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
    print(f"Test Loss:     {test_loss:.4f}")
    print(f"Test Accuracy: {test_accuracy:.4f}")
    print("\n" + "-"*40 + "\n")


Model 1 Summary
Test Loss:     0.1170
Test Accuracy: 0.9667

----------------------------------------

Model 2 Summary
Test Loss:     0.1301
Test Accuracy: 0.9806

----------------------------------------

Model 3 Summary
Test Loss:     0.1453
Test Accuracy: 0.9639

----------------------------------------

Model 4 Summary
Test Loss:     0.1027
Test Accuracy: 0.9667

----------------------------------------

Model 5 Summary
Test Loss:     0.1409
Test Accuracy: 0.9611

----------------------------------------
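With 360 test images (20% of 1,797, rounded up by `train_test_split`), the accuracies above correspond to whole numbers of correct predictions. The sketch below converts the printed, rounded values back to counts and picks the best model by index instead of hard-coding it; the accuracy list is copied by hand from the output, an assumption that values collected inside the evaluation loop would replace.

```python
import numpy as np

# Test accuracies printed above (rounded to 4 decimals), copied by hand
test_accuracies = np.array([0.9667, 0.9806, 0.9639, 0.9667, 0.9611])
n_test = 360  # ceil(0.2 * 1797) images in the test split

correct = np.rint(test_accuracies * n_test).astype(int)
best_idx = int(np.argmax(test_accuracies))  # index into `models`

print(correct)  # [348 353 347 348 346]
print(f"Best: Model {best_idx + 1}, spread: {correct.max() - correct.min()} images")
# Best: Model 2, spread: 7 images
```

The best and worst configurations differ by only 7 of 360 images, so the ranking may change with a different random seed or split.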



In [13]:
best_tf_model = models[1]  # Model 2 (index 1) had the highest test accuracy, 0.9806
y_pred_tf = best_tf_model.predict(X_test)
y_pred_tf_classes = np.argmax(y_pred_tf, axis=1)

cm_tf = confusion_matrix(y_test, y_pred_tf_classes)
fig = px.imshow(
    cm_tf,
    text_auto=True,
    color_continuous_scale='Reds',
    labels=dict(x="Predicted Label", y="True Label", color="Count"),
    x=[str(i) for i in range(10)],
    y=[str(i) for i in range(10)],
)

fig.update_layout(
    title="Confusion Matrix for Best Model",
    template="plotly_white"
)

fig.show()

12/12 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step


### Conclusion: TensorFlow Model Evaluation

All models achieved high test accuracy above 96%, demonstrating strong generalization to unseen data.  
The best performing model was **Model 2**, which achieved a test accuracy of **98.06%**, the highest among all variations tested.

The training and validation curves showed smooth convergence for most models, with minor fluctuations in a few cases but no significant overfitting observed.
The confusion matrix for the best model confirmed robust classification performance, with almost all digits correctly identified and very few misclassifications across classes.

# 3. Performance Comparison

## Test Accuracy

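- The best TensorFlow model (**Model 2**) reached a **test accuracy of 98.06%**; the other four TensorFlow configurations scored between 96.11% and 96.67% on the test set.
- The best Scikit-learn model achieved a mean **cross-validation accuracy of 97.63%**. This figure comes from validation folds of the training set, not from the test set, so the comparison with the TensorFlow test accuracies is only approximate.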


## Model Flexibility

- **Scikit-learn MLPClassifier** was **easier to set up and experiment with**.  
  Using **GridSearchCV**, it was possible to test many hyperparameter combinations in a **structured and clean way** without writing complex code.
- **TensorFlow Sequential API** offered **more flexibility** to customize the model architecture (layers, neurons, activations, etc.), but testing different training configurations required a **manual loop over a hand-written list of configurations**, making the experiments **harder to organize**.
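One way to give the Keras experiments the same structure as the Scikit-learn search, sketched under the assumption that every combination is worth training, is to generate the configuration dictionaries with Scikit-learn's `ParameterGrid` instead of writing them by hand. The keys match the keyword arguments of the notebook's `build_and_train_model`; the value lists are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Illustrative value lists; keys match build_and_train_model's keyword arguments
keras_grid = ParameterGrid({
    "learning_rate": [0.001, 0.005],
    "batch_size": [32, 64],
    "validation_split": [0.2],
    "epochs": [50],
})

configurations = list(keras_grid)  # one dict per combination
for config in configurations:
    print(config)
    # In the notebook: model, history = build_and_train_model(**config)
```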

---

## Discussion

- **Which was easier to implement?**  
  ➔ The **Scikit-learn MLPClassifier** was easier to implement and manage, especially for systematic hyperparameter testing.

- **Which gave better performance? Why?**  
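  ➔ The best TensorFlow model (Model 2) scored highest, with 98.06% test accuracy against the MLPClassifier's 97.63% cross-validation accuracy. The gap is small and the two figures are measured on different data, so the advantage should be read with caution. Part of the difference may come from Keras's explicit control over the number of training epochs and its validation split, which produces per-epoch training and validation curves that make configurations easy to compare. In contrast, the selected MLPClassifier was trained with `early_stopping=False`, so only its training loss curve (`loss_curve_`) was available, making training harder to monitor in detail.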

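## Glossary

- `Activation function`: Nonlinear function applied to each neuron's weighted input (`relu` or `tanh` in this notebook).
- `Alpha`: L2 regularization strength in MLPClassifier. It penalizes large weights to reduce overfitting.
- `Batch Size`: Number of samples passed through the network before the model's weights are updated.
- `Learning Rate`: Step size of each weight update (`learning_rate` in Keras's `Adam`, `learning_rate_init` in MLPClassifier).
- `learning_rate` (Scikit-learn): Learning-rate schedule. `constant` keeps the rate fixed; `adaptive` divides it by 5 each time two consecutive epochs fail to decrease the training loss by at least `tol`. It is only used when `solver='sgd'`.
- `early_stopping`: When True, MLPClassifier sets aside `validation_fraction` of the training data (10% by default) and stops training when the validation score stops improving for `n_iter_no_change` consecutive epochs (10 by default).
- `Validation Split`: Proportion of the training data held out for validation in Keras. The last x% of the training arrays, taken before shuffling, is used to evaluate the model after every epoch.
- `Epochs`: Number of full passes through the entire training dataset.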