
# Importing Libraries

- NumPy (numerical operations)
- pandas (may be needed)
- TensorFlow (creating and training models)
- scikit-learn (preprocessing and datasets)
- Plotly (data visualization)

We also install `keras-tuner` for **hyperparameter tuning**.

In [60]:
!pip install -q -U keras-tuner

In [61]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, MinMaxScaler
from sklearn.metrics import classification_report
from plotly import express as px

import keras_tuner as kt

# Load the Data

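We use scikit-learn's handwritten digits dataset (`load_digits`), a copy of the test set of the UCI optical recognition dataset. Note that this is not the full MNIST dataset: it contains 1797 images of `8 x 8` pixels.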

In [9]:
digits = load_digits()
print(digits["DESCR"])

.. _digits_dataset:

Optical recognition of handwritten digits dataset
--------------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 1797
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each blo

## Convert data to images

- Convert the tabular data into individual images to visualize them better
- As described above, the images are `8 x 8` pixels

In [50]:
X = digits.data
y = digits.target
X_images = X.reshape(-1, 8, 8)

X.shape, X_images.shape, y.shape

((1797, 64), (1797, 8, 8), (1797,))
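
As a quick sanity check, the reshape only changes how the same 64 values per sample are indexed; no pixel value is altered. A minimal sketch, using a small made-up array (`flat` is a hypothetical stand-in for `digits.data`):

```python
import numpy as np

# Stand-in for digits.data: 3 hypothetical samples with 64 features each
flat = np.arange(3 * 64, dtype=float).reshape(3, 64)

# Same operation as in the cell above: -1 lets NumPy infer the sample count
images = flat.reshape(-1, 8, 8)

# Row r, column c of image i is feature r * 8 + c of sample i
i, r, c = 2, 5, 3
same_pixel = images[i, r, c] == flat[i, r * 8 + c]

# Flattening back recovers the original table exactly
round_trip_ok = np.array_equal(images.reshape(-1, 64), flat)

print(images.shape, same_pixel, round_trip_ok)
```

This is also why a `Flatten` layer inside the model can undo the reshape without any loss.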

### Show Sample images

Let's display a few sample images along with their ground-truth labels.

In [55]:
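def show_sample_images(X: np.ndarray, y: np.ndarray, n_samples: int = 5) -> None:
  """Plot `n_samples` randomly chosen images from `X`, titled with their labels from `y`."""
  # Sample distinct indices so the same image is never shown twice
  sample_inx = np.random.choice(X.shape[0], size=n_samples, replace=False)
  # One facet (subplot) per image along axis 0
  fig = px.imshow(X[sample_inx, :, :], facet_col=0)
  # Replace the default facet titles with the ground-truth digit
  for i, label in enumerate(y[sample_inx]):
    fig.layout.annotations[i]['text'] = f"digit = {label}"

  fig.show()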


show_sample_images(X=X_images, y=y, n_samples=6)

## Preprocessing

- The image data needs to be preprocessed to make it suitable for a `Dense` layer.
- This requires **flattening each image into a 1-D array**.
- Although `X` is already flat and could be passed to `Dense` layers directly, we use `X_images` to show how the preprocessing takes place.
- As the dataset description above states, every pixel is an integer in `0..16`. We scale the data to the `0..1` range by dividing by `16`. We could also apply `MinMaxScaler` from `sklearn`, but since the range is known in advance, we simply divide.

> Tensorflow's `Flatten` layer will be used for flattening the images.

In [56]:
# Scale pixel values from 0..16 to 0..1
X_ = X_images / 16.0

# Flattening is done in the model itself by adding a `Flatten` layer
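
One caveat about the `MinMaxScaler` alternative: it rescales each feature (here, each pixel position) by its own observed minimum and maximum, so it generally does not give the same result as dividing everything by 16. The sketch below mimics that per-column behaviour in plain NumPy on a tiny made-up array (`toy` is hypothetical, not the digits data):

```python
import numpy as np

# Hypothetical data: 3 samples, 3 "pixels", values in the known range 0..16
toy = np.array([
    [0.0,  4.0, 0.0],
    [0.0,  8.0, 8.0],
    [0.0, 16.0, 4.0],
])

# Global scaling, as used in this notebook: divide by the known maximum
global_scaled = toy / 16.0

# Per-column min-max scaling (the per-feature behaviour of MinMaxScaler)
col_min = toy.min(axis=0)
col_range = toy.max(axis=0) - col_min
col_range[col_range == 0] = 1.0  # constant columns: avoid division by zero
per_feature_scaled = (toy - col_min) / col_range

print(global_scaled[:, 2])       # third pixel, scaled by the global maximum 16
print(per_feature_scaled[:, 2])  # third pixel, scaled by its own maximum 8
```

Dividing by the known maximum keeps brightness comparable across pixel positions, which is arguably the more natural choice for images.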

### Show Scaled Images

Apart from the scale of the pixel values, the images should look the same, which is exactly what we intended: no information about the images is lost.

In [57]:
show_sample_images(X_, y)

In [62]:
Xmin, Xmax = X_.min(), X_.max()
print(f"Min = {Xmin}, Max = {Xmax}")

Min = 0.0, Max = 1.0


# Deep Learning Model Creation

In this part, we create the model

## Splitting Data

- We split the data into `train` and `test` sets: the first for training the model, the second for validating its performance.
- 75% of the data is used for training and 25% is held out (an arbitrary choice). The split is stratified by label, so both sets keep the same class proportions.

In [68]:
X_train, X_test, y_train, y_test = train_test_split(
    X_, y, test_size=0.25,
    random_state=69,
    shuffle=True, stratify=y
)

X_train.shape, y_train.shape, X_test.shape, y_test.shape

((1347, 8, 8), (1347,), (450, 8, 8), (450,))

## Model with Hyperparameter Tuning

- Create a model with tuneable hyperparameters
- The tuner will search over these hyperparameters for the best combination

In [101]:
def create_model(units1, units2, activation, dropout, learning_rate):
  # Create the layers of the model
  model = tf.keras.Sequential([
      tf.keras.layers.Flatten(),  # The flattening operation takes place here
      tf.keras.layers.Dense(units=units1, activation=activation),
      tf.keras.layers.Dropout(rate=dropout),
      tf.keras.layers.Dense(units=units2, activation=activation),
      tf.keras.layers.Dropout(rate=dropout),
      tf.keras.layers.Dense(units=10) # Not applying Softmax as `from_logits` in loss function takes care of it
  ])

  model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                metrics=['accuracy'])

  return model

def model_builder(hp) -> tf.keras.Sequential:
  # Tuneable hyperparameters
  hp_units1 = hp.Int('units1', min_value=128, max_value=512, step=128)
  hp_units2 = hp.Int('units2', min_value=16, max_value=32, step=4)
  hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
  hp_dropout = hp.Choice('dropout', values=[0.0, 0.1, 0.2, 0.3])
  hp_activ = hp.Choice('activation', values=["relu", "tanh"])

  model = create_model(
      units1=hp_units1,
      units2=hp_units2,
      activation=hp_activ,
      dropout=hp_dropout,
      learning_rate=hp_learning_rate
  )

  return model
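
To make the `from_logits=True` remark concrete: the final `Dense(10)` layer outputs raw scores (logits), and the loss applies the softmax internally. The NumPy sketch below, on made-up logits (the `logits` and `labels` arrays are hypothetical), checks that "softmax then negative log-probability" matches the direct log-sum-exp form computed from logits:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs of the final Dense(10) layer for two samples
logits = np.array([
    [2.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0],
    [0.1,  4.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3],
])
labels = np.array([9, 1])  # integer class labels, in the same form as y_train
rows = np.arange(len(labels))

# Route 1: explicit softmax, then negative log-probability of the true class
probs = softmax(logits)
loss_via_probs = -np.log(probs[rows, labels])

# Route 2: directly from the logits via log-sum-exp
m = logits.max(axis=-1)
log_sum_exp = m + np.log(np.exp(logits - m[:, None]).sum(axis=-1))
loss_via_logits = log_sum_exp - logits[rows, labels]

print(np.allclose(loss_via_probs, loss_via_logits))  # True
print(probs.argmax(axis=-1))                          # [9 1]
```

A practical consequence: predictions from this model are logits, so a softmax is needed if probabilities are wanted, while the `argmax` class is the same either way.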

## Tuning

After defining the search space, we now tune the hyperparameters.

### Instantiate Tuner

In [104]:
tuner = kt.RandomSearch(
    hypermodel=model_builder,
    objective="val_accuracy",
    max_trials=30,
    executions_per_trial=2,
    overwrite=True,
    directory="logs",
    project_name="mnist_tuner",
)

tuner.search_space_summary()

Search space summary
Default search space size: 5
units1 (Int)
{'default': None, 'conditions': [], 'min_value': 128, 'max_value': 512, 'step': 128, 'sampling': 'linear'}
units2 (Int)
{'default': None, 'conditions': [], 'min_value': 16, 'max_value': 32, 'step': 4, 'sampling': 'linear'}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}
dropout (Choice)
{'default': 0.0, 'conditions': [], 'values': [0.0, 0.1, 0.2, 0.3], 'ordered': True}
activation (Choice)
{'default': 'relu', 'conditions': [], 'values': ['relu', 'tanh'], 'ordered': False}
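
The "search space size: 5" in the summary appears to count hyperparameters, not combinations. As a rough worked calculation, assuming `max_value` is inclusive (the trial results further down do show `units1: 512` and `units2: 32`), the number of distinct combinations is:

```python
from math import prod

# Number of candidate values per hyperparameter, read off the summary above
n_values = {
    "units1": len(range(128, 512 + 1, 128)),  # 128, 256, 384, 512
    "units2": len(range(16, 32 + 1, 4)),      # 16, 20, 24, 28, 32
    "learning_rate": 3,
    "dropout": 4,
    "activation": 2,
}

total_combinations = prod(n_values.values())
max_trials = 30  # as passed to kt.RandomSearch above

print(total_combinations)               # 480
print(max_trials / total_combinations)  # 0.0625
```

So 30 random trials sample at most about 6% of the grid, which is worth keeping in mind when reading "best" below.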


### Start the Search

Randomly search for the best hyperparameter combination

In [105]:
tuner.search(
    X_train, y_train,
    epochs=8,
    validation_data=(X_test, y_test),
)

Trial 30 Complete [00h 00m 09s]
val_accuracy: 0.800000011920929

Best val_accuracy So Far: 0.972222238779068
Total elapsed time: 00h 04m 08s
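
Note that the search above scores trials on `X_test`, which is also used for the final evaluation, so the reported test accuracy may be somewhat optimistic. One possible alternative, sketched here on synthetic stand-in arrays (all `*_demo` names are hypothetical and not part of the notebook), is to carve a separate validation set out of the training data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins with the same shapes as X_ and y in this notebook
rng = np.random.default_rng(0)
X_demo = rng.random((1797, 8, 8))
y_demo = np.arange(1797) % 10  # 10 roughly balanced classes

# First hold out the final test set (25%), as before
X_trainval, X_test_demo, y_trainval, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

# Then carve a validation set out of the remaining data for the tuner
X_train_demo, X_val_demo, y_train_demo, y_val_demo = train_test_split(
    X_trainval, y_trainval, test_size=0.2, random_state=0, stratify=y_trainval
)

print(X_train_demo.shape, X_val_demo.shape, X_test_demo.shape)
```

With such a split, `validation_data=(X_val_demo, y_val_demo)` would go to `tuner.search`, and the test set would be touched only once, by the final `evaluate` call.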


In [106]:
tuner.results_summary()

Results summary
Results in logs/mnist_tuner
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 17 summary
Hyperparameters:
units1: 384
units2: 24
learning_rate: 0.01
dropout: 0.0
activation: relu
Score: 0.972222238779068

Trial 07 summary
Hyperparameters:
units1: 128
units2: 20
learning_rate: 0.01
dropout: 0.1
activation: tanh
Score: 0.9622222185134888

Trial 02 summary
Hyperparameters:
units1: 512
units2: 24
learning_rate: 0.01
dropout: 0.1
activation: relu
Score: 0.9600000083446503

Trial 26 summary
Hyperparameters:
units1: 384
units2: 20
learning_rate: 0.01
dropout: 0.1
activation: relu
Score: 0.9588888883590698

Trial 28 summary
Hyperparameters:
units1: 384
units2: 32
learning_rate: 0.001
dropout: 0.2
activation: tanh
Score: 0.9588888883590698

Trial 00 summary
Hyperparameters:
units1: 128
units2: 24
learning_rate: 0.01
dropout: 0.0
activation: tanh
Score: 0.9555555582046509

Trial 20 summary
Hyperparameters:
units1: 256
units2: 24
learning_rate: 0.001
dr

### Get the best model

Now we take the best hyperparameters, rebuild the model from them, and train it for more epochs.

In [112]:
# best_model = tuner.get_best_models(num_models=1)[0]
best_hyperp = tuner.get_best_hyperparameters(num_trials=1)[0]
best_hyperp.values

{'units1': 384,
 'units2': 24,
 'learning_rate': 0.01,
 'dropout': 0.0,
 'activation': 'relu'}

In [115]:
best_model = model_builder(best_hyperp)
best_model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.25)

Epoch 1/100
Epoch 2/100
...
Epoch 77/100
Epoch 78

<keras.src.callbacks.History at 0x7c0c0be2ec50>
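
Because `fit` is called with `validation_split=0.25`, not every row of `X_train` contributes to the weight updates. Keras takes the validation rows from the end of the arrays, before shuffling. The arithmetic below assumes the cut index is the floor of `n * (1 - validation_split)`:

```python
import math

n_train_samples = 1347   # rows in X_train, from the split above
validation_split = 0.25  # as passed to best_model.fit

# Assumed cut: floor(n * (1 - validation_split)) rows are used for fitting
n_fit = math.floor(n_train_samples * (1 - validation_split))
n_val = n_train_samples - n_fit

print(n_fit, n_val)  # 1010 337
print(n_fit / 1797)  # fraction of the full dataset used for weight updates
```

Under that assumption, roughly 56% of the 1797 images are actually used to fit the final model.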

In [116]:
best_model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_3 (Flatten)         (None, 64)                0         
                                                                 
 dense_9 (Dense)             (None, 384)               24960     
                                                                 
 dropout_6 (Dropout)         (None, 384)               0         
                                                                 
 dense_10 (Dense)            (None, 24)                9240      
                                                                 
 dropout_7 (Dropout)         (None, 24)                0         
                                                                 
 dense_11 (Dense)            (None, 10)                250       
                                                                 
Total params: 34450 (134.57 KB)
Trainable params: 3445
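
The parameter counts in the summary can be reproduced by hand: a `Dense` layer has one weight per input-output pair plus one bias per output unit, while `Flatten` and `Dropout` have none. A short check (the helper `dense_params` is introduced here for illustration):

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [64, 384, 24, 10]  # flattened input, units1, units2, classes
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)         # [24960, 9240, 250]
print(total)             # 34450
print(total * 4 / 1024)  # size in KB, assuming 4-byte float32 weights
```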

In [117]:
best_model.evaluate(X_test, y_test)



[0.097435861825943, 0.9755555391311646]
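
To put the accuracy in perspective, it helps to convert it back into a count of test images and attach a rough uncertainty. The sketch below uses a simple binomial standard error, which is only an approximation and treats the 450 test images as independent draws:

```python
import math

n_test = 450                   # rows in X_test
accuracy = 0.9755555391311646  # second value returned by evaluate()

n_correct = round(accuracy * n_test)
n_wrong = n_test - n_correct

# Rough binomial standard error of the accuracy estimate
std_err = math.sqrt(accuracy * (1 - accuracy) / n_test)

print(n_correct, n_wrong)  # 439 11
print(round(std_err, 4))   # 0.0073
```

In other words, the model misclassifies 11 of the 450 test images, and differences of a fraction of a percentage point between trials are within the noise of a test set this small.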

# Results

The best model, retrained for 100 epochs on the training split (with 25% of it held out for validation), reaches a test accuracy of **97.56%**.