<h1 align="center"><b>Efficient EMNIST Compression with Autoencoders</b></h1>

<h3 align="center">Computing the size of compressed original EMNIST dataset at the end of encoder</h3>

---


<a id="Import necessary libraries"></a>
## Import necessary libraries

In [1]:
# Data Handling and Numerical Libraries
import random
import numpy as np

# Keras - Deep Learning API
import keras                                # High-level neural networks API
from extra_keras_datasets import emnist     # EMNIST dataset: handwritten digits plus uppercase and lowercase letters
from keras import backend as K 
from keras.layers import (                  # Neural network layers
    Conv2D, Conv2DTranspose, 
    Input, Flatten, Dense, 
    Reshape
)



<a id="Data Acquisition"></a>
## Data Acquisition

In [2]:
# Data loading
(x_train, _), (x_test, _) = emnist.load_data(type='balanced')

INFO:root:Loading dataset = emnist


In [3]:
print(f"Data Type : {x_train.dtype}")
print(f"Data Shape: {x_train.shape}")

Data Type : uint8
Data Shape: (112800, 28, 28)


In [4]:
x_train_size_bytes_uint8     = x_train.size * x_train.itemsize
x_train_size_megabytes_uint8 = x_train_size_bytes_uint8 / (1024 * 1024)

print(f"Size of x_train with uint8 format in memory: {x_train_size_megabytes_uint8} MB")

Size of x_train with uint8 format in memory: 84.33837890625 MB


**Note:** The `uint8` data type means that each element of `x_train` is an unsigned 8-bit integer, which can represent values from $0$ to $255$ (inclusive). This is a common format for image data. EMNIST-Balanced images are grayscale, so each element of `x_train` is the intensity of a single pixel, from $0$ (no intensity) to $255$ (maximum intensity). Neural networks generally train better on floating-point inputs scaled to a small range such as $[0, 1]$. Converting the data to `float32` and dividing by $255$ performs this normalization.

In [16]:
# Normalize the images to [0, 1]
x_train = x_train.astype('float32') / 255.
x_test  = x_test .astype('float32') / 255.

# Check again the type of elements in x_train after formatting
print(f"Data Type: {x_train.dtype}")

Data Type: float32


In [17]:
x_train_size_bytes_float32     = x_train.size * x_train.itemsize
x_train_size_megabytes_float32 = x_train_size_bytes_float32 / (1024 * 1024)

print(f"Size of x_train with float32 format in memory: {x_train_size_megabytes_float32} MB")

Size of x_train with float32 format in memory: 337.353515625 MB
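
The fourfold growth follows from the element width: `uint8` stores one byte per pixel, while `float32` stores four. A minimal sketch on a small synthetic array (the names `demo_uint8` and `demo_float32` and the batch size are made up for illustration) shows the same ratio, using NumPy's `nbytes` attribute, which is equivalent to `size * itemsize`:

```python
import numpy as np

# Hypothetical stand-in for a batch of 1,000 grayscale 28x28 images
demo_uint8 = np.zeros((1000, 28, 28), dtype=np.uint8)
demo_float32 = demo_uint8.astype('float32') / 255.

# nbytes is the same quantity as size * itemsize
assert demo_uint8.nbytes == demo_uint8.size * demo_uint8.itemsize

ratio = demo_float32.nbytes / demo_uint8.nbytes
print(f"uint8  : {demo_uint8.nbytes} bytes")
print(f"float32: {demo_float32.nbytes} bytes")
print(f"ratio  : {ratio}")
```

Note that dividing by `1024 * 1024`, as the cells above do, yields mebibytes (MiB), which this notebook labels as MB.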


In [18]:
# Retrieving the number of images and their dimensions from the train set
img_num_train, img_height, img_width = x_train.shape[:3]

# Displaying the train set information
print(f"The EMNIST-Balanced train set contains {img_num_train} images, each with dimensions:"
      f"\n(width x height) = ({img_width} x {img_height}) pixels.")

# Retrieving the number of images from the test set
img_num_test  = x_test.shape[0]

# Displaying the test set information
print(f"The EMNIST-Balanced test set also contains {img_num_test} images with the same dimensions as the train set.")

The EMNIST-Balanced train set contains 112800 images, each with dimensions:
(width x height) = (28 x 28) pixels.
The EMNIST-Balanced test set also contains 18800 images with the same dimensions as the train set.


**Note**: By default, Keras convolutional layers such as `Conv2D` expect each image to have the shape (height, width, channels). "Channels" refers to the number of color channels in the image: one for grayscale, three for RGB. The EMNIST arrays have no channel axis, so we reshape them to add one.

In [10]:
# Define the number of channels: 1 for grayscale images
num_channels = 1

# Reshape the training and test datasets to include the channel dimension
x_train = x_train.reshape(img_num_train, img_height, img_width, num_channels)
x_test  = x_test .reshape(img_num_test , img_height, img_width, num_channels)

# Define the input dimensions for the CNN
input_dimensions = (img_height, img_width, num_channels)

# Display the reshaped dimensions
print(f"Dimensions of each image for the model: (img_height, img_width, num_channels) = {input_dimensions}.")
print(f"Reshaped training data shape: {x_train.shape}")

Dimensions of each image for the model: (img_height, img_width, num_channels) = (28, 28, 1).
Reshaped training data shape: (112800, 28, 28, 1)
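
The channel axis can also be added without spelling out every dimension. The sketch below uses a small synthetic array (`demo` is a made-up name, not part of the notebook) to illustrate that `np.expand_dims` and `np.newaxis` indexing produce the same result as the explicit `reshape`:

```python
import numpy as np

# Hypothetical batch of 5 grayscale images without a channel axis
demo = np.random.rand(5, 28, 28).astype('float32')

via_reshape = demo.reshape(5, 28, 28, 1)      # explicit target shape
via_expand = np.expand_dims(demo, axis=-1)    # append a trailing axis
via_newaxis = demo[..., np.newaxis]           # same idea via indexing

print(via_reshape.shape, via_expand.shape, via_newaxis.shape)
print(np.array_equal(via_reshape, via_expand))
```

The explicit `reshape` makes the intended dimensions visible, while the other two forms avoid repeating them.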


<a id="Model Architecture"></a>
## Model Architecture


Autoencoders consist of two main parts:

1. **Encoder:** This part of the network compresses the input into a latent-space representation, i.e., a lower-dimensional encoding of the input data. Here, a stack of convolutional layers extracts features from the image, and a final dense layer maps them to a 32-dimensional latent vector.

In [11]:
# Encoder

# Input Layer: Defines the shape of the input data for the encoder.
encoder_input_layer = Input(shape=input_dimensions, name='encoder_input_layer')

# Convolution Layers: Applies convolution operations to extract features from the input image.
encoder_layer = Conv2D(32, 3, padding='same', activation='relu')(encoder_input_layer)
encoder_layer = Conv2D(64, 3, padding='same', activation='relu', strides=(2, 2))(encoder_layer)
encoder_layer = Conv2D(64, 3, padding='same', activation='relu')(encoder_layer)
encoder_layer = Conv2D(64, 3, padding='same', activation='relu')(encoder_layer)

# Flattening Layer: Converts the 3D output of convolution layers into a 1D tensor for dense layers.
encoder_layer = Flatten()(encoder_layer)

# Dense Layer: A fully connected layer that combines extracted features and performs further learning.
encoder_layer = Dense(32, activation='relu')(encoder_layer)

# Storing the output shape for use in the decoder
encoder_output_shape = K.int_shape(encoder_layer)
print(f"Output shape of encoder: {encoder_output_shape}")

Output shape of encoder: (None, 32)
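
To see where the encoder shrinks the data, the tensor shape after each layer can be traced by hand. With `padding='same'`, a convolution with stride $s$ maps a side length $n$ to $\lceil n / s \rceil$. The helper `same_conv_output` below is a hypothetical name written for this illustration, not a Keras function:

```python
import math

def same_conv_output(size, stride=1):
    """Spatial output size of a 'same'-padded convolution."""
    return math.ceil(size / stride)

# (filters, stride) for the four Conv2D layers defined above
conv_layers = [(32, 1), (64, 2), (64, 1), (64, 1)]

side, channels = 28, 1  # input image: 28 x 28 x 1
for filters, stride in conv_layers:
    side = same_conv_output(side, stride)
    channels = filters
    print(f"Conv2D  -> ({side}, {side}, {channels})")

flattened = side * side * channels
print(f"Flatten -> ({flattened},)")
print("Dense   -> (32,)")
```

In this trace the only spatial downsampling happens in the strided second convolution (28 to 14); the reduction from 12544 flattened values to 32 is done by the dense layer.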


In [12]:
# The encoder output has shape (None, 32) with float32 elements
latent_dim = encoder_output_shape[-1]                                  # 32 elements per image
compressed_size_per_image = latent_dim * np.dtype('float32').itemsize  # 4 bytes per float32 element

# Total number of images in the training set
num_images = img_num_train

# Total compressed size in bytes
total_compressed_size_bytes = num_images * compressed_size_per_image

# Convert bytes to megabytes
total_compressed_size_megabytes = total_compressed_size_bytes / (1024 * 1024)

print(f"Total size of compressed dataset: {total_compressed_size_megabytes} MB")


Total size of compressed dataset: 13.76953125 MB
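
Relating this figure to the original array sizes gives a compression ratio. The sketch below recomputes the per-image sizes from the shapes and data types printed earlier. It counts only the latent codes, assuming the decoder weights needed for reconstruction are stored separately:

```python
# Per-image sizes in bytes, from the shapes and dtypes printed earlier
original_uint8_bytes = 28 * 28 * 1      # one byte per pixel
original_float32_bytes = 28 * 28 * 4    # four bytes per pixel
latent_bytes = 32 * 4                   # 32 float32 values

ratio_vs_uint8 = original_uint8_bytes / latent_bytes
ratio_vs_float32 = original_float32_bytes / latent_bytes

print(f"Compression ratio vs. uint8  : {ratio_vs_uint8}")
print(f"Compression ratio vs. float32: {ratio_vs_float32}")
```

The `uint8` comparison is the more meaningful one for storage, since that is the format in which the dataset is loaded.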
