# Assignment 1


Objective: 
To design and implement a Convolutional Neural Network (CNN) to classify hand-drawn emojis from a small labeled dataset. The objective of this exercise is to accurately discriminate between various categories of emoji with a compact and efficient CNN architecture that you will build. 

Constraints: 

- No Transfer Learning:
You are prohibited from using any pre-trained models (e.g., VGG, ResNet, MobileNet, etc.). All components of the model must be constructed from scratch using conventional deep learning frameworks (e.g., TensorFlow/Keras or PyTorch).

- Model Simplicity: 
Your CNN is limited to a maximum of 3 hidden layers (excluding the input and output layers). This keeps the architecture simple and puts the emphasis on the core principles of CNNs, e.g. convolution, pooling and dense layers.

- Data Constraints: 
The dataset is small: it consists of hand-drawn emoji images in various styles and categories. Keep these limitations in mind and design your model to account for them, for example by using regularization or data augmentation.

Dataset link: https://drive.google.com/drive/folders/1Uo5WCK3z35z8k4k3gVfHn_N-OoE9rLNt?usp=sharing

The model below attempts to classify the hand-drawn emojis using a CNN. It achieves an accuracy of roughly **50%**, which is decent considering the size of the dataset. There is still room for improvement, and any further improvements will be pushed and documented.


In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import os
import tensorflow as tf
from sklearn.model_selection import train_test_split
import cv2

### Approach:
1. **Loading the Data** - The dataset contains 18 classes with 30 images of 400×400 pixels each, i.e. 540 examples in total. The images are loaded with OpenCV, resized to 128×128 and stored in lists. Because these are **transparent .png files**, each image is first composited onto a white background so that the transparent regions do not cause background artefacts during training; the images are then converted to grayscale and scaled to [0, 1]. Finally, the data is split into **training, validation and test** sets with scikit-learn's `train_test_split`: 10% of the images are held out for testing and 30% of the remainder for validation (both stratified by class), which yields 340 training, 146 validation and 54 test images.
2. **Augmentation of the Data** - Since the dataset is quite small, Keras's `ImageDataGenerator` is used to augment the training data with the following transformations:
    - Rotation (up to ±5°)
    - Horizontal and vertical shifts (up to 1% of the image size)
    - Shearing (skew), with a shear intensity of 0.01°
    - Zoom (±1%)
    - Horizontal flipping

    Finally, any pixels newly exposed by these transformations are filled with the value of their nearest neighbour (`fill_mode='nearest'`), so that they mostly match the background, i.e. white.
3. **Model Architecture** - Three convolutional blocks (`Conv2D` → `BatchNormalization` → `MaxPooling2D`, with 16, 32 and 64 filters respectively) are followed by dropout, a flatten step and an 18-way softmax output layer. The Keras model summary is:

    | Layer (type) | Output Shape | Param # |
    |---|---|---:|
    | conv2d (Conv2D) | (None, 128, 128, 16) | 272 |
    | batch_normalization (BatchNormalization) | (None, 128, 128, 16) | 64 |
    | max_pooling2d (MaxPooling2D) | (None, 64, 64, 16) | 0 |
    | conv2d_1 (Conv2D) | (None, 64, 64, 32) | 4,640 |
    | batch_normalization_1 (BatchNormalization) | (None, 64, 64, 32) | 128 |
    | max_pooling2d_1 (MaxPooling2D) | (None, 32, 32, 32) | 0 |
    | conv2d_2 (Conv2D) | (None, 32, 32, 64) | 18,496 |
    | batch_normalization_2 (BatchNormalization) | (None, 32, 32, 64) | 256 |
    | max_pooling2d_2 (MaxPooling2D) | (None, 16, 16, 64) | 0 |
    | dropout (Dropout) | (None, 16, 16, 64) | 0 |
    | flatten (Flatten) | (None, 16384) | 0 |
    | dropout_1 (Dropout) | (None, 16384) | 0 |
    | dense (Dense) | (None, 18) | 294,930 |

    **Total params:** 318,786 (1.22 MB); **trainable params:** 318,562 (1.22 MB); **non-trainable params:** 224 (896.00 B).
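As a sanity check on the model summary, the output shapes and parameter counts can be reproduced by hand. The sketch below is plain Python; the helper names (`conv2d_params`, `batchnorm_params`, `dense_params`) are hypothetical and only mirror the layer settings used in the model cell (kernel sizes 4, 3 and 3; 16/32/64 filters; `padding='same'`; 2×2 pooling; 18 output classes).

```python
def conv2d_params(kernel, in_ch, filters):
    # One kernel x kernel x in_ch weight tensor per filter, plus one bias per filter
    return kernel * kernel * in_ch * filters + filters


def batchnorm_params(channels):
    # gamma and beta are trainable; the moving mean and variance are not
    return {"trainable": 2 * channels, "non_trainable": 2 * channels}


def dense_params(in_features, units):
    # Full weight matrix plus one bias per unit
    return in_features * units + units


side, in_ch = 128, 1          # 128x128 grayscale input
trainable, non_trainable = 0, 0
for kernel, filters in [(4, 16), (3, 32), (3, 64)]:
    trainable += conv2d_params(kernel, in_ch, filters)
    bn = batchnorm_params(filters)
    trainable += bn["trainable"]
    non_trainable += bn["non_trainable"]
    side //= 2                # 'same' padding keeps the size; 2x2 pooling halves it
    in_ch = filters

flat = side * side * in_ch    # size of the Flatten output
trainable += dense_params(flat, 18)

print(side, flat)                                         # 16 16384
print(trainable, non_trainable, trainable + non_trainable)  # 318562 224 318786
```

The arithmetic also shows where the capacity sits: about 92.5% of all parameters (294,930 of 318,786) belong to the final `Dense` layer, because it connects all 16,384 flattened features to the 18 classes.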
The training setup also includes a custom callback class (`Stop`) that halts training as soon as the validation accuracy exceeds 60%. This was added because the accuracy was observed to deteriorate when training continued for more epochs, leading to poor performance.
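The `Stop` callback only fires when validation accuracy *exceeds* the target, so if accuracy plateaus below 60% training runs for all 50 epochs, and the model always keeps the weights of the epoch at which training ended rather than the best one. Keras also provides a built-in `tf.keras.callbacks.EarlyStopping` callback (with `monitor`, `patience` and `restore_best_weights` parameters) that instead stops after a number of epochs without improvement. The pure-Python sketch below contrasts the two stopping rules on a hypothetical validation-accuracy curve; the function names and the curve are invented for illustration and do not come from this notebook's runs.

```python
def threshold_stop_epoch(val_accs, target=0.60):
    """Mimic the `Stop` callback: return the 1-based epoch at which
    validation accuracy first exceeds `target`, or None if it never does."""
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > target:
            return epoch
    return None


def patience_stop(val_accs, patience=3):
    """Patience rule in the spirit of EarlyStopping(monitor='val_accuracy',
    patience=patience, restore_best_weights=True).

    Returns (stop_epoch, best_epoch): training halts after `patience` epochs
    without improvement, and best_epoch is the epoch whose weights are kept.
    """
    best_acc, best_epoch, wait = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best_acc:
            best_acc, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_accs), best_epoch


# Hypothetical validation-accuracy curve (not taken from this notebook's runs)
curve = [0.05, 0.20, 0.41, 0.55, 0.58, 0.52, 0.49, 0.47]
print(threshold_stop_epoch(curve))  # None: the 60% target is never exceeded
print(patience_stop(curve))         # (8, 5): stop at epoch 8, keep epoch 5's weights
```

On this curve the threshold rule never triggers, whereas the patience rule stops three epochs after the peak and keeps the epoch-5 weights; which behaviour is preferable depends on how noisy the validation accuracy is on a validation set this small.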


In [None]:
# Load every image, composite transparent PNGs onto white, convert to grayscale,
# resize to 128x128 and scale pixel values to [0, 1].
project_dir = os.getcwd()
root_dir = os.path.join(project_dir, "dataset")
size_image = (128, 128)
x_set, y_set = [], []
# One sub-folder per class; sorting gives a stable class -> index mapping
class_names = sorted(d for d in os.listdir(root_dir)
                     if os.path.isdir(os.path.join(root_dir, d)))
class_to_idx = {cls_name: i for i, cls_name in enumerate(class_names)}
for cls in class_names:
    cls_path = os.path.join(root_dir, cls)
    for img_file in os.listdir(cls_path):
        img_path = os.path.join(cls_path, img_file)
        img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
        if img is None:
            continue  # skip files OpenCV cannot decode (e.g. hidden/system files)
        if img.ndim == 3 and img.shape[2] == 4:
            # Alpha-blend the BGR channels over a white background
            alpha_channel = img[:, :, 3]
            rgb_channels = img[:, :, :3]
            white_background = np.ones_like(rgb_channels, dtype=np.uint8) * 255
            alpha_factor = alpha_channel[:, :, np.newaxis] / 255.0
            img = rgb_channels * alpha_factor + white_background * (1 - alpha_factor)
            img = img.astype(np.uint8)
        if img.ndim == 3:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img = cv2.resize(img, size_image)
        img = img.astype('float32') / 255.0
        x_set.append(img)
        y_set.append(class_to_idx[cls])
# Add the channel axis once: (N, 128, 128) -> (N, 128, 128, 1)
x_all = np.expand_dims(np.array(x_set), axis=-1)
y_all = np.array(y_set)
# Hold out 10% for testing, then 30% of the remainder for validation (stratified)
x_train, x_test, y_train, y_test = train_test_split(
    x_all, y_all, test_size=0.1, stratify=y_all, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(
    x_train, y_train, test_size=0.3, stratify=y_train, random_state=42)
# The splits are already NumPy arrays of shape (N, 128, 128, 1); calling
# np.expand_dims on them again would wrongly produce (N, 128, 128, 1, 1).
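The transparency handling in the loader is standard alpha compositing over white: each output channel is $c \cdot \alpha + 255\,(1-\alpha)$, where $c$ is the original channel value and $\alpha$ the alpha value scaled to [0, 1]. A minimal NumPy sketch on a hypothetical 1×3 toy image (the helper name `composite_on_white` is illustrative, not part of the notebook) shows the three cases. Like the loader, it truncates rather than rounds when casting back to `uint8`.

```python
import numpy as np


def composite_on_white(bgra):
    """Alpha-blend a uint8 BGRA image over a white background."""
    bgr = bgra[:, :, :3].astype(np.float64)
    alpha = bgra[:, :, 3:4].astype(np.float64) / 255.0  # keep a trailing axis for broadcasting
    out = bgr * alpha + 255.0 * (1.0 - alpha)
    return out.astype(np.uint8)


# 1x3 toy image: fully transparent pixel, opaque coloured pixel, half-transparent black pixel
toy = np.zeros((1, 3, 4), dtype=np.uint8)
toy[0, 1] = [10, 20, 30, 255]
toy[0, 2, 3] = 128

out = composite_on_white(toy)
print(out[0, 0].tolist())  # [255, 255, 255]: transparent becomes white
print(out[0, 1].tolist())  # [10, 20, 30]: opaque pixels are unchanged
print(out[0, 2].tolist())  # half-transparent black becomes a mid grey (about 127)
```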

In [4]:
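# Light augmentation applied on the fly to the training images only
Augementator = ImageDataGenerator(
    rotation_range=5,          # random rotations in [-5°, +5°]
    width_shift_range=0.01,    # horizontal shift up to 1% of the width
    height_shift_range=0.01,   # vertical shift up to 1% of the height
    shear_range=0.01,          # shear intensity (degrees)
    zoom_range=0.01,           # zoom factor in [0.99, 1.01]
    horizontal_flip=True,
    fill_mode='nearest',       # fill exposed pixels with the nearest edge value
)
# Three Conv2D -> BatchNormalization -> MaxPooling2D blocks, then an 18-way softmax
model = Sequential([
    tf.keras.Input(shape=(128, 128, 1)),
    Conv2D(16, kernel_size=4, activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=2),
    Conv2D(32, kernel_size=3, activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=2),
    Conv2D(64, kernel_size=3, activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=2),
    Dropout(0.3),
    Flatten(),
    Dropout(0.3),
    Dense(18, activation='softmax'),
])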
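The `fill_mode='nearest'` setting decides what appears in pixels exposed by a shift, rotation or zoom. For a whole-pixel horizontal shift it amounts to replicating the edge column, which can be sketched with `np.pad(..., mode='edge')`. The helper `shift_right_nearest` below is hypothetical and only illustrates the idea on a toy 3×5 image; Keras itself applies the full affine transform internally.

```python
import numpy as np


def shift_right_nearest(img, dx):
    """Shift a 2-D image right by dx whole pixels, filling the exposed
    columns by replicating the original left-edge column."""
    padded = np.pad(img, ((0, 0), (dx, 0)), mode="edge")  # replicate the left edge dx times
    return padded[:, : img.shape[1]]                       # crop back to the original width


img = np.full((3, 5), 255, dtype=np.uint8)  # white background
img[:, 2] = 0                               # one black vertical stroke in the middle
shifted = shift_right_nearest(img, 2)
print(shifted[0].tolist())  # [255, 255, 255, 255, 0]: stroke moved right, new pixels are white
```

As long as the image borders are white (as they should be after compositing onto white), the replicated pixels are white too, which is why nearest-neighbour filling keeps a clean background in the augmented images.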

In [42]:
class Stop(tf.keras.callbacks.Callback):
    """Stop training once validation accuracy exceeds `target`."""
    def __init__(self, target=0.60):
        super().__init__()
        self.target = target

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        val_acc = logs.get("val_accuracy")
        if val_acc is not None and val_acc > self.target:
            print(f"\nValidation accuracy of {val_acc*100:.2f}% exceeds the {self.target*100:.0f}% target; stopping training.\n")
            self.model.stop_training = True

model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=['accuracy']
)
x_train = np.array(x_train).reshape(-1, 128, 128, 1)
x_test = np.array(x_test).reshape(-1, 128, 128, 1)
x_val = np.array(x_val).reshape(-1,128,128,1)
y_train = np.array(y_train)
y_val = np.array(y_val)
y_test = np.array(y_test)

print(np.array(x_train).shape, np.array(y_train).shape)
train_generator = Augementator.flow(x_train, y_train, batch_size=16)
history = model.fit(
    train_generator,
    epochs=50,
    validation_data=(x_val, y_val),
    callbacks = [Stop(0.60)]
)
test_loss, test_acc = model.evaluate(x_test, y_test)

(340, 128, 128, 1) (340,)
Epoch 1/50
22/22 ━━━━━━━━━━━━━━━━━━━━ 3s 91ms/step - accuracy: 0.1135 - loss: 8.3974 - val_accuracy: 0.0548 - val_loss: 2.9504
Epoch 2/50
22/22 ━━━━━━━━━━━━━━━━━━━━ 2s 83ms/step - accuracy: 0.4075 - loss: 4.0757 - val_accuracy: 0.0548 - val_loss: 4.9435
Epoch 3/50
22/22 ━━━━━━━━━━━━━━━━━━━━ 2s 84ms/step - accuracy: 0.5200 - loss: 2.9474 - val_accuracy: 0.0548 - val_loss: 7.0416
Epoch 4/50
22/22 ━━━━━━━━━━━━━━━━━━━━ 2s 83ms/step - accuracy: 0.6038 - loss: 1.9284 - val_accuracy: 0.0548 - val_loss: 9.7866
Epoch 5/50
22/22 ━━━━━━━━━━━━━━━━━━━━ 2s 85ms/step - accuracy: 0.6965 - loss: 1.4598 - val_accuracy: 0.0548 - val_loss: 10.7700
Epoch 6/50
22/22 ━━━━━━━━━━━━━━━━━━━━ 2s 82ms/step - accuracy: 0.7431 - loss: 1.6102 - val_accuracy: 0.0548 - val_loss: 11.3566
Epoch 7/50
[1m22/22[0m [32m━━━━━━━━━━━━━