## Name: Exploring Temporal Latent Bottlenecks for Image Classification
## Author: Aritra Roy Gosthipaty, Suvaditya Mukherjee
## Date Created: 06/03/2023
## Last Modified: 07/03/2023
## Description: Performing image classification with the Temporal Latent Bottleneck mechanism.

## Introduction

This example explores how to use the Temporal Latent Bottleneck mechanism, proposed by [Didolkar et al.](https://arxiv.org/abs/2205.14794), to perform image classification on the CIFAR-100 dataset. We implement the model as a custom `RNNCell` so that the design is vectorized and performant.

A simple recurrent neural network (RNN) has a strong [inductive bias](https://en.wikipedia.org/wiki/Inductive_bias): built-in assumptions about its inputs, such as processing them in order, one step at a time, that help it generalize well within a specific domain. However, it suffers from vanishing and exploding gradients, and it struggles to retain information in its hidden state over long sequences.
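To see why gradients vanish or explode, here is a minimal NumPy sketch, assuming a linear RNN `h_t = W h_{t-1}` with no inputs and no nonlinearity (a simplification, not the model built in this example). Backpropagating through `T` steps multiplies the gradient by `W^T` a total of `T` times; when `W` is a scaled orthogonal matrix, the gradient norm becomes exactly `scale ** T`. The helper `gradient_norm_through_time` is hypothetical.

```python
import numpy as np


def gradient_norm_through_time(scale: float, steps: int, dim: int = 16, seed: int = 0) -> float:
    """Norm of a unit upstream gradient after backpropagating `steps` times
    through the linear recurrence h_t = W h_{t-1}.

    W is `scale` times an orthogonal matrix, so each backward step multiplies
    the gradient norm by exactly `scale`.
    """
    rng = np.random.default_rng(seed)
    # The Q factor of a random Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    w = scale * q
    grad = np.ones(dim) / np.sqrt(dim)  # unit-norm gradient arriving at h_T
    for _ in range(steps):
        # One backward step through h_t = W h_{t-1} multiplies by W^T.
        grad = w.T @ grad
    return float(np.linalg.norm(grad))


for scale in (0.9, 1.0, 1.1):
    norm = gradient_norm_through_time(scale, steps=50)
    print(f"scale={scale}: gradient norm after 50 steps = {norm:.4g}")
```

With `scale=0.9` the norm shrinks to roughly 0.005 after 50 steps, while `scale=1.1` grows it past 100. A `tanh` nonlinearity, whose derivative is at most 1, can only shrink the gradient further.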

At the other end of the spectrum, the [attention-based Transformer introduced by Vaswani et al.](https://arxiv.org/abs/1706.03762) has shown considerable improvements on both fronts: it achieved state-of-the-art results on natural language processing tasks and has since been widely adapted to the vision domain. Although the Transformer can "attend" to any part of the input sequence, it lacks a strong inductive bias, which makes it prone to generalizing poorly on domain-specific tasks.
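The core operation that lets a Transformer attend to any position is scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`, as defined by Vaswani et al. The NumPy sketch below is single-head and omits the learned query, key, and value projections a real Transformer uses; `scaled_dot_product_attention` is a hypothetical helper, not the Keras layer built later in this example.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row-wise max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """softmax(Q K^T / sqrt(d_k)) V for q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # (n_q, n_k): similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a distribution over input positions
    return weights @ v, weights


rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, 8))  # a sequence of 5 tokens with 8 features each
# Self-attention: queries, keys and values all come from the same sequence.
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape, attn.shape)  # (5, 8) (5, 5)
```

Each row of `attn` is a probability distribution over every input position, so any token can draw on any other token regardless of distance; nothing in the operation itself encodes order or locality, which is the weak inductive bias described above.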

The Temporal Latent Bottleneck paper combines concepts from both ends of the spectrum into a new mechanism that tackles weak inductive bias, vanishing/exploding gradients, and loss of information over long sequences. Its novelty lies in introducing separate processing streams to preserve and process latent states, although it has parallels in other works such as the [Perceiver mechanism by Jaegle et al.](https://arxiv.org/abs/2103.03206) and [Grounded Language Learning Fast and Slow by Hill et al.](https://arxiv.org/pdf/2009.01719.pdf).
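The two-stream idea can be sketched in NumPy as follows, assuming single-head attention with no learned weights, feed-forward blocks, or normalization (all of which the real model has). The input sequence is split into chunks. A fast stream processes each chunk with self-attention plus cross-attention to a small set of latent vectors, and the slow stream updates those latents once per chunk by cross-attending to the fast stream's output. `two_stream_pass`, `num_latents` and `chunk_size` are hypothetical names chosen for illustration.

```python
import numpy as np


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # Single-head scaled dot-product attention without learned projections.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def two_stream_pass(sequence: np.ndarray, num_latents: int = 4, chunk_size: int = 8, seed: int = 0):
    """Simplified fast/slow recurrence over a (seq_len, dim) sequence.

    Returns the final latent (slow-stream) state and the stacked fast-stream
    output of every chunk.
    """
    seq_len, dim = sequence.shape
    if seq_len % chunk_size:
        raise ValueError("seq_len must be divisible by chunk_size")
    rng = np.random.default_rng(seed)
    # Slow-stream state; random values stand in for a learned initial state.
    latents = rng.standard_normal((num_latents, dim))
    outputs = []
    for start in range(0, seq_len, chunk_size):
        chunk = sequence[start:start + chunk_size]
        # Fast stream (perceptual module): tokens in the chunk attend to each other ...
        fast = chunk + attention(chunk, chunk, chunk)
        # ... and read from the slow stream through cross-attention.
        fast = fast + attention(fast, latents, latents)
        # Slow stream (bottleneck): the few latents are updated once per chunk
        # by cross-attending to the fast stream's output.
        latents = latents + attention(latents, fast, fast)
        outputs.append(fast)
    return latents, np.stack(outputs)


rng = np.random.default_rng(1)
seq = rng.standard_normal((64, 16))  # e.g. 64 tokens of width 16
state, chunk_outputs = two_stream_pass(seq)
print(state.shape, chunk_outputs.shape)  # (4, 16) (8, 8, 16)
```

Because the latent state is only a handful of vectors carried from chunk to chunk, it acts as a bottleneck: in this sketch, information from earlier chunks reaches later ones only through it, roughly the role a recurrent cell's state plays.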

This example is structured as follows:
- Perform necessary imports
- Load the [CIFAR-100 dataset](https://www.cs.toronto.edu/~kriz/cifar.html)
- Visualize random samples from the dataset
- Define a base attention layer, and a `PatchEmbed` layer that splits images into patches and embeds them
- Define the `SelfAttentionWithFFN` and `CrossAttentionWithFFN` layers
- Compose the Perceptual Module and the Temporal Latent Bottleneck Module as stacks of `SelfAttentionWithFFN` and `CrossAttentionWithFFN` layers
- Create a custom, vectorized `RNNCell` that uses the modules above, and wrap it in a recurrent neural network
- Define the hyperparameters and the `model.fit()` training pipeline
- Run inference and evaluate the model
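Before the recurrent cell can run, each image has to become a sequence of timesteps. The NumPy sketch below shows one way to do this, assuming 4×4 patches, a 64-dimensional embedding, and 8 patches per chunk (illustrative values, not necessarily the hyperparameters used later). It splits each 32×32×3 CIFAR-100 image into non-overlapping patches, projects each flattened patch, and groups consecutive patches into chunks. The random projection stands in for the learned `PatchEmbed` layer; `patchify` and `embed_and_chunk` are hypothetical helpers.

```python
import numpy as np


def patchify(images: np.ndarray, patch_size: int) -> np.ndarray:
    """Split (batch, H, W, C) images into (batch, num_patches, patch_size * patch_size * C)."""
    b, h, w, c = images.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    x = images.reshape(b, gh, patch_size, gw, patch_size, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (b, gh, gw, p, p, c): group pixels by patch
    return x.reshape(b, gh * gw, patch_size * patch_size * c)


def embed_and_chunk(images, patch_size=4, embed_dim=64, chunk_size=8, seed=0):
    patches = patchify(images, patch_size)
    # Hypothetical random linear projection standing in for a learned embedding layer.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((patches.shape[-1], embed_dim)) / np.sqrt(patches.shape[-1])
    tokens = patches @ proj  # (batch, num_patches, embed_dim)
    b, n, d = tokens.shape
    if n % chunk_size:
        raise ValueError("num_patches must be divisible by chunk_size")
    # Each chunk of consecutive patches becomes one timestep for the recurrent cell.
    return tokens.reshape(b, n // chunk_size, chunk_size, d)


batch = np.random.default_rng(0).random((2, 32, 32, 3)).astype("float32")  # CIFAR-100-sized images
chunks = embed_and_chunk(batch)
print(chunks.shape)  # (2, 8, 8, 64)
```

The resulting `(batch, timesteps, chunk_size, embed_dim)` tensor has the shape a Keras `RNN` layer can step over along its second axis, passing one chunk of patches to the cell per step.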