# ***Vision Transformer using TensorFlow 2.0***

By Nakshatra Singh

# Using Google GPU for Training
Google Colab offers free GPUs and TPUs. Since we'll be training a large model, it's best to take advantage of this (here we'll use a GPU); otherwise training can take a long time.

A GPU can be added by going to the menu and selecting:

`Edit -> Notebook Settings -> Hardware Accelerator -> (GPU)`

Then run the following cell to confirm that a GPU is detected.

In [1]:
import tensorflow as tf
# Get the device GPU name 
device_name = tf.test.gpu_device_name()

# The device name should look like the following:
if device_name == '/device:GPU:0':
  print('Found GPU at : {}'.format(device_name)) 
else:
  raise SystemError('GPU not found!') 

Found GPU at : /device:GPU:0


First, we'll install `einops`, a flexible and powerful Python package for writing readable and reliable tensor operations. It supports NumPy, PyTorch, and TensorFlow.

In [2]:
!pip install einops==0.3.0 

Collecting einops==0.3.0
  Downloading https://files.pythonhosted.org/packages/5d/a0/9935e030634bf60ecd572c775f64ace82ceddf2f504a5fd3902438f07090/einops-0.3.0-py2.py3-none-any.whl
Installing collected packages: einops
Successfully installed einops-0.3.0


We'll be using the `tensorflow.keras.datasets` module, which provides a few toy datasets (already vectorized, in NumPy format) that can be used for debugging a model or creating simple code examples. Here we load CIFAR-10.

In [None]:
from tensorflow.keras import datasets

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data() 

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


Next, we'll cast the NumPy image arrays to `float32`, arrange them as `n` channels-first 32x32 RGB images of shape `(n, 3, 32, 32)`, and scale the pixel values to the range [0, 1].


In [None]:
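# cifar10.load_data() returns images as (n, 32, 32, 3). Move the channel axis
# to the front with transpose; a plain reshape to (n, 3, 32, 32) would keep the
# flat memory order and scramble the pixel layout.
train_images = tf.cast(train_images.transpose((0, 3, 1, 2)), dtype=tf.float32)
test_images = tf.cast(test_images.transpose((0, 3, 1, 2)), dtype=tf.float32)
# Scale pixel values from [0, 255] to [0, 1]
train_images, test_images = train_images / 255.0, test_images / 255.0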
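
A detail worth checking when converting channels-last image arrays (Keras returns CIFAR-10 as `(n, 32, 32, 3)`) to a channels-first layout: `reshape` and `transpose` produce the same shape but not the same contents. The NumPy sketch below uses a tiny made-up array, not CIFAR-10 data, to illustrate the difference.

```python
import numpy as np

# A tiny stand-in for a batch of channels-last images: (n, height, width, channels).
# The values are made up for illustration; this is not CIFAR-10 data.
n, h, w, c = 2, 4, 4, 3
images_nhwc = np.arange(n * h * w * c, dtype=np.uint8).reshape(n, h, w, c)

# reshape keeps the flat memory order and only changes how it is sliced.
reshaped = images_nhwc.reshape(-1, c, h, w)

# transpose moves the channel axis to position 1, keeping each pixel intact.
transposed = images_nhwc.transpose(0, 3, 1, 2)

# Same shape, different contents.
print(reshaped.shape, transposed.shape)
print(np.array_equal(reshaped, transposed))

# After the transpose, channel 0 of image 0 is exactly the original channel 0.
first_channel_ok = np.array_equal(transposed[0, 0], images_nhwc[0, :, :, 0])
print(first_channel_ok)
```

Only the transposed array preserves each image's color planes, which is what a model that expects `(n, 3, 32, 32)` input would normally need.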

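Now, with the help of the `tf.data.Dataset.from_tensor_slices()` method, we can turn each array into a dataset whose elements are slices along the first axis, i.e. one image or one label per element. Zipping the image and label datasets then pairs each image with its label.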

In [None]:
# Build the training dataset by slicing the image and label arrays along the first axis
train_x = tf.data.Dataset.from_tensor_slices(train_images)
train_y = tf.data.Dataset.from_tensor_slices(train_labels)

# Zip them together so that each element is an (image, label) pair
train_dataset = tf.data.Dataset.zip((train_x, train_y))

# Build the test dataset the same way
test_x = tf.data.Dataset.from_tensor_slices(test_images)
test_y = tf.data.Dataset.from_tensor_slices(test_labels)
test_dataset = tf.data.Dataset.zip((test_x, test_y))

We'll import the `TrainerConfig` class from the `trainer.py` file and set up the optimization parameters: the number of epochs, the batch size, and the learning rate. You can change these parameters to improve performance on your own problem.

In [None]:
from trainer import TrainerConfig 

tconf = TrainerConfig(max_epochs=10, batch_size=64, learning_rate=1e-3) 
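
The contents of `trainer.py` are not shown here, so the following is only a hypothetical sketch of how a config object of this kind could be written; `SketchTrainerConfig` and its defaults are assumptions, not the actual `TrainerConfig`. It also works out how many optimizer steps one epoch would take with the settings above, assuming the last partial batch is kept.

```python
import math


class SketchTrainerConfig:
    """Hypothetical stand-in for a trainer config; not the real TrainerConfig."""

    # Assumed defaults, overridden by keyword arguments.
    max_epochs = 10
    batch_size = 64
    learning_rate = 1e-3

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            # Reject misspelled options instead of silently ignoring them.
            if not hasattr(self, key):
                raise ValueError(f"Unknown config option: {key}")
            setattr(self, key, value)


conf = SketchTrainerConfig(max_epochs=10, batch_size=64, learning_rate=1e-3)

num_train_images = 50_000  # size of the CIFAR-10 training split
steps_per_epoch = math.ceil(num_train_images / conf.batch_size)
total_steps = steps_per_epoch * conf.max_epochs
print(steps_per_epoch, total_steps)  # 782 7820
```

Halving the batch size would double the number of steps per epoch, which is one reason batch size and learning rate are usually tuned together.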

We'll also initialize a sample model config which we will use for training the ViT model.

In [None]:
# NOTE: image dimensions must be divisible by the patch size
model_config = {"image_size":32,
                "patch_size":4,
                "num_classes":10,
                "dim":64,
                "depth":3,
                "heads":4,
                "mlp_dim":128} 
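
To make the note about divisibility concrete, here is a minimal NumPy sketch of how a channels-first image batch could be cut into flattened patches with these settings. How `model.py` actually does this is not shown here; the function name `to_patches` and the ordering of values inside each patch are assumptions for illustration.

```python
import numpy as np


def to_patches(images, patch_size):
    """Split (batch, channels, height, width) images into flattened patches.

    Returns an array of shape
    (batch, num_patches, patch_size * patch_size * channels).
    """
    b, c, h, w = images.shape
    if h % patch_size != 0 or w % patch_size != 0:
        raise ValueError("image dimensions must be divisible by the patch size")
    gh, gw = h // patch_size, w // patch_size  # patch grid height and width
    # (b, c, gh, p, gw, p) -> (b, gh, gw, p, p, c) -> (b, gh * gw, p * p * c)
    x = images.reshape(b, c, gh, patch_size, gw, patch_size)
    x = x.transpose(0, 2, 4, 3, 5, 1)
    return x.reshape(b, gh * gw, patch_size * patch_size * c)


# Values taken from model_config above.
image_size, patch_size, dim, heads = 32, 4, 64, 4

batch = np.random.rand(2, 3, image_size, image_size).astype(np.float32)
patches = to_patches(batch, patch_size)

print(patches.shape)  # (2, 64, 48)

# Per-head width if `dim` is split evenly across heads
# (an assumption about how model.py uses these two values).
print(dim // heads)  # 16
```

With a 32x32 image and 4x4 patches, each image becomes a sequence of 64 tokens of 48 raw values each, which the model would then project to `dim` features.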

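We'll import the `Trainer` class from the `trainer.py` file and the `ViT` class from the `model.py` file, then build a trainer from the model class, the model config, the datasets with their sizes, and the training config.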

In [None]:
from trainer import Trainer
from model import ViT

trainer = Trainer(ViT, model_config, train_dataset, len(train_images), test_dataset, len(test_images), tconf) 

Finally, we'll train the model on the CIFAR-10 dataset. The performance is below average: in the run below, test accuracy is only about 0.52 after nine epochs. The main purpose of this notebook is to demonstrate how to use Vision Transformers with TensorFlow 2.0; the authors of the [paper](https://arxiv.org/abs/2006.03677) achieved SOTA performance with this technique.

In [None]:
trainer.train() 

Epoch 1-> Train Loss 1.78300. Train Accuracy 0.34862
Epoch 1-> Test Loss 1.54439. Test Accuracy 0.43700
Epoch 2-> Train Loss 1.50706. Train Accuracy 0.45196
Epoch 2-> Test Loss 1.44042. Test Accuracy 0.47530
Epoch 3-> Train Loss 1.41017. Train Accuracy 0.48946
Epoch 3-> Test Loss 1.40274. Test Accuracy 0.49240
Epoch 4-> Train Loss 1.34558. Train Accuracy 0.51278
Epoch 4-> Test Loss 1.38715. Test Accuracy 0.49490
Epoch 5-> Train Loss 1.29309. Train Accuracy 0.53382
Epoch 5-> Test Loss 1.39821. Test Accuracy 0.49330
Epoch 6-> Train Loss 1.25046. Train Accuracy 0.54902
Epoch 6-> Test Loss 1.37738. Test Accuracy 0.50480
Epoch 7-> Train Loss 1.21084. Train Accuracy 0.56498
Epoch 7-> Test Loss 1.36944. Test Accuracy 0.50970
Epoch 8-> Train Loss 1.18018. Train Accuracy 0.57266
Epoch 8-> Test Loss 1.36997. Test Accuracy 0.51350
Epoch 9-> Train Loss 1.15001. Train Accuracy 0.58420
Epoch 9-> Test Loss 1.38436. Test Accuracy 0.51770
Epoch 10-> Train Loss 1.11912. Train Accuracy 0.59814
Epoch 10->
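
The log above can be summarized with a short script. This sketch parses lines in the format printed above and reports the epoch with the lowest test loss; the regular expression and the `parse_log` helper are illustrative assumptions, and only epochs 6 to 9 are copied in to keep the example short.

```python
import re

# Lines copied from the training output above (epochs 6 to 9).
log = """\
Epoch 6-> Train Loss 1.25046. Train Accuracy 0.54902
Epoch 6-> Test Loss 1.37738. Test Accuracy 0.50480
Epoch 7-> Train Loss 1.21084. Train Accuracy 0.56498
Epoch 7-> Test Loss 1.36944. Test Accuracy 0.50970
Epoch 8-> Train Loss 1.18018. Train Accuracy 0.57266
Epoch 8-> Test Loss 1.36997. Test Accuracy 0.51350
Epoch 9-> Train Loss 1.15001. Train Accuracy 0.58420
Epoch 9-> Test Loss 1.38436. Test Accuracy 0.51770
"""

# Matches one complete log line; \2 requires the same split name twice.
LINE_RE = re.compile(
    r"Epoch (\d+)-> (Train|Test) Loss (\d+\.\d+)\. \2 Accuracy (\d+\.\d+)"
)


def parse_log(text):
    """Return {split: {epoch: (loss, accuracy)}} for each complete log line."""
    results = {"Train": {}, "Test": {}}
    for match in LINE_RE.finditer(text):
        epoch, split, loss, acc = match.groups()
        results[split][int(epoch)] = (float(loss), float(acc))
    return results


results = parse_log(log)

# Epoch with the lowest test loss among the parsed lines.
best_epoch = min(results["Test"], key=lambda e: results["Test"][e][0])

# Gap between train and test accuracy at the last complete epoch.
gap = results["Train"][9][1] - results["Test"][9][1]
print(best_epoch, round(gap, 4))
```

In these lines the test loss is lowest at epoch 7 while train accuracy keeps rising, which suggests the model is starting to overfit.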