<a href="https://colab.research.google.com/github/sanikak96/Data-Science-Projects/blob/master/EASTER2.0_Notebook_Implementation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Handwriting Text Recognition: EASTER 2.0

Handwriting recognition, also called Handwritten Text Recognition (HTR), is a machine learning task that aims to give machines the ability to read human handwriting from images of real-world documents.

### Challenges in HTR 
- Existing HTR models require a large amount of labeled training data.
- Because of their large number of trainable parameters, they are hard to train and slow at inference.
- Because they are slow, they need costly hardware to be useful in real-time applications.
- The models are complex and difficult to scale (stacked LSTMs, complex attention layers).

EASTER stands for Efficient and Scalable Text Recognizer. It is a fully convolutional architecture that uses only 1-D convolutional layers in the encoder and adds a CTC (Connectionist Temporal Classification) decoder at the end.

### How do you apply one-dimensional convolutions to a two-dimensional image?
- Consider an input image of size 600 x 50 (W x H). Any vertical line drawn through this image cuts at most a single character (or none, if it falls in white space), whereas a horizontal line will probably cut through all the characters.
- Along the height of the image you find the properties of a single character, while along the width you encounter different characters as you move from left to right.
- A one-dimensional filter of kernel size 3 spans 3 pixels along the time dimension (the width) and covers the full height of 50 pixels (H). In other words, a filter of kernel size 3 has dimensions 3 x 50 (or 3 x H).
- Each scan stores information about the observed character (or part of a character).
- This information is finally passed to a softmax layer that gives, for each time-step along the width, a probability distribution over all possible characters. This distribution is then passed to the CTC decoding layer to generate the final output sequence.
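
The shape argument above can be checked with a few lines of NumPy. This is a minimal sketch, not the notebook's Keras layer: `conv1d_valid` is a hypothetical helper, it uses no padding or stride, and the number of filters (4) is arbitrary.

```python
import numpy as np

def conv1d_valid(image_wh, kernel):
    """Slide a (k, H, F) kernel along the width of a (W, H) image.

    Returns an array of shape (W - k + 1, F): one F-dimensional
    feature vector per horizontal position (time-step).
    """
    width, height = image_wh.shape
    k, k_height, n_filters = kernel.shape
    assert k_height == height, "the kernel must span the full image height"
    out = np.empty((width - k + 1, n_filters))
    for t in range(width - k + 1):
        window = image_wh[t:t + k, :]  # k columns, full height
        # Contract over both window axes (k and H) for every filter.
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1]))
    return out

rng = np.random.default_rng(0)
image = rng.random((600, 50))    # width x height, as in the example above
kernel = rng.random((3, 50, 4))  # kernel size 3, 4 filters (arbitrary)
features = conv1d_valid(image, kernel)
print(features.shape)            # (598, 4)
```

Each output row summarizes a 3 x 50 vertical strip of the image, which is why the width can be treated as a time axis.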

### EASTER 2.0
- Easter2.0 is built by stacking 14 layers, each consisting of a standard 1-D convolution, batch normalization, ReLU, and dropout.
- To match the number of channels, each residual connection is first projected through a 1 x 1 convolution layer followed by a batch normalization layer. The result is added to the output of the SE (Squeeze-and-Excitation) layer in the final convolution sub-block, before the ReLU and dropout layers.
- Finally, a softmax layer predicts the probability distribution over the characters of a given vocabulary.
- With the SE module, Easter2.0 can access global context in a way similar to RNNs and Transformers while keeping the speed and parameter efficiency of a CNN.

- In addition to the new architecture, the authors introduce a data-augmentation technique called TACo (Tiling and Corruption). TACo divides the image into multiple small tiles of the same size. Vertical and horizontal tiles are then corrupted with random noise. Finally, the modified tiles are joined back in the same order to create the new image. With TACo augmentations, the network can learn useful features and produce good results even on relatively small training sets.


- To evaluate Easter2.0, the authors use the publicly available IAM dataset and focus only on its line-level subset. They evaluate the contribution of several components of the model: TACo augmentations, residual connections, and Squeeze-and-Excitation. The metric chosen to compare results is the case-sensitive Character Error Rate (CER). The reported results show that, with a small training dataset, Easter2.0 surpasses prior state-of-the-art work.
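
The text does not spell out how CER is computed. A common definition, assumed here, divides the total character-level edit distance by the total number of reference characters. The helper names `edit_distance` and `cer` below are illustrative and are not taken from the Easter2.0 code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(references, hypotheses):
    """Case-sensitive character error rate over a list of text lines."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars

# 'H' vs 'h' counts as an error because the metric is case-sensitive;
# the missing 'o' is a second error, giving 2 edits over 11 characters.
print(cer(["Hello world"], ["hello wrld"]))
```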

### Downloading the Required Libraries

In [None]:
from google.colab import drive
drive.mount('/content/drive')
! pip install -r /content/drive/MyDrive/DEL_Data/requirements.txt

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


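### Download Data
- Register for an account at https://fki.tic.heia-fr.ch (the site hosting the IAM database used in the cells below).
- Activate the account using the link sent to your email address.

### Run the following cells if you wish to download the data from the notebook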

In [None]:
# Replace with your own registered credentials; never commit real ones to a notebook.
USER = "your_email@example.com"
PASSWORD = "your_password"

In [None]:
!wget  --save-cookies cookies.txt\
      --keep-session-cookies\
      --post-data 'email=$USER&password=$PASSWORD'\
      --delete-after\
        https://fki.tic.heia-fr.ch/login

--2022-10-08 21:06:42--  https://fki.tic.heia-fr.ch/login
Resolving fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)... 160.98.46.146
Connecting to fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)|160.98.46.146|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4787 (4.7K) [text/html]
Saving to: ‘login.tmp’


2022-10-08 21:06:43 (572 MB/s) - ‘login.tmp’ saved [4787/4787]

Removing login.tmp.


In [None]:
!wget --load-cookies cookies.txt https://fki.tic.heia-fr.ch/DBs/iamDB/data/lines.tgz

--2022-10-08 21:06:43--  https://fki.tic.heia-fr.ch/DBs/iamDB/data/lines.tgz
Resolving fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)... 160.98.46.146
Connecting to fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)|160.98.46.146|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: http://fki.tic.heia-fr.ch/login [following]
--2022-10-08 21:06:44--  http://fki.tic.heia-fr.ch/login
Connecting to fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)|160.98.46.146|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://fki.tic.heia-fr.ch/login [following]
--2022-10-08 21:06:44--  https://fki.tic.heia-fr.ch/login
Connecting to fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)|160.98.46.146|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4720 (4.6K) [text/html]
Saving to: ‘lines.tgz’


2022-10-08 21:06:45 (421 MB/s) - ‘lines.tgz’ saved [4720/4720]



In [None]:
!wget --load-cookies cookies.txt https://fki.tic.heia-fr.ch/DBs/iamDB/data/ascii.tgz

--2022-10-08 21:06:45--  https://fki.tic.heia-fr.ch/DBs/iamDB/data/ascii.tgz
Resolving fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)... 160.98.46.146
Connecting to fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)|160.98.46.146|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: http://fki.tic.heia-fr.ch/login [following]
--2022-10-08 21:06:46--  http://fki.tic.heia-fr.ch/login
Connecting to fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)|160.98.46.146|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://fki.tic.heia-fr.ch/login [following]
--2022-10-08 21:06:46--  https://fki.tic.heia-fr.ch/login
Connecting to fki.tic.heia-fr.ch (fki.tic.heia-fr.ch)|160.98.46.146|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4720 (4.6K) [text/html]
Saving to: ‘ascii.tgz’


2022-10-08 21:06:47 (520 MB/s) - ‘ascii.tgz’ saved [4720/4720]
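
In the two logs above, each request is redirected to the login page, and the file that gets saved is a 4720-byte `text/html` document rather than an archive. This is presumably why the later cells unpack copies stored on Drive instead. A quick sanity check, sketched with the standard library only, is to look for the gzip magic bytes before unpacking. `looks_like_gzip` is a hypothetical helper, and the two temporary files exist only to make the example self-contained.

```python
import gzip
import os
import tempfile

def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"

# Self-contained demonstration with two temporary files.
tmp_dir = tempfile.mkdtemp()
real_archive = os.path.join(tmp_dir, "real.tgz")
html_page = os.path.join(tmp_dir, "fake.tgz")
with gzip.open(real_archive, "wb") as fh:
    fh.write(b"payload")
with open(html_page, "wb") as fh:
    fh.write(b"<!DOCTYPE html><html>login</html>")

print(looks_like_gzip(real_archive), looks_like_gzip(html_page))  # True False
```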



In [None]:
!wget https://www.openslr.org/resources/56/splits.zip

--2022-10-08 21:06:47--  https://www.openslr.org/resources/56/splits.zip
Resolving www.openslr.org (www.openslr.org)... 46.101.158.64
Connecting to www.openslr.org (www.openslr.org)|46.101.158.64|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://us.openslr.org/resources/56/splits.zip [following]
--2022-10-08 21:06:48--  http://us.openslr.org/resources/56/splits.zip
Resolving us.openslr.org (us.openslr.org)... 46.101.158.64
Connecting to us.openslr.org (us.openslr.org)|46.101.158.64|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3304 (3.2K) [application/zip]
Saving to: ‘splits.zip’


2022-10-08 21:06:49 (448 MB/s) - ‘splits.zip’ saved [3304/3304]



In [None]:
!mkdir -p /content/data/lines /content/data/LWRT

In [None]:
!ls /content/drive/MyDrive/DEL_Data/

ascii.tar  lines.tar  requirements.txt	src


In [None]:
!tar xf /content/drive/MyDrive/DEL_Data/lines.tar -C /content/data/lines

In [None]:
!tar xf /content/drive/MyDrive/DEL_Data/ascii.tar -C /content/

In [None]:
!unzip splits.zip
!mv splits/train.uttlist  /content/data/LWRT/
!mv splits/test.uttlist  /content/data/LWRT/
!mv splits/validation.uttlist  /content/data/LWRT/

Archive:  splits.zip
  inflating: splits/test.uttlist     
  inflating: splits/train.uttlist    
  inflating: splits/validation.uttlist  


In [None]:
!mv /content/lines.txt /content/data/lines.txt

### Easter_model.py 
Creating the model architecture

### Notes on config.py
This file defines the configuration settings and should be edited before running the code.
The settings cover the following:

1. Data paths and input parameters:
   `DATA_PATH`, `INPUT_HEIGHT`, `INPUT_WIDTH`, `INPUT_SHAPE`, `TACO_AUGMENTAION_FRACTION`
2. Long-lines augmentation options:
   `LONG_LINES` (True/False), `LONG_LINES_FRACTION`
3. Model training parameters:
   `BATCH_SIZE`, `EPOCHS`, `VOCAB_SIZE` (80), `DROPOUT` (a flag), `OUTPUT_SHAPE`
4. Learning parameters:
   learning rate and batch normalization parameters
5. Option to initialize weights from a pre-trained model by providing a checkpoint path
6. Paths for storing model checkpoints during training:
   `CHECKPOINT_PATH`, `LOGS_DIR`, `BEST_MODEL_PATH`
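
For orientation, a `config.py` along these lines might look like the sketch below. The paths and `EPOCHS` are the values set later in this notebook, the input size comes from the model summary, and `VOCAB_SIZE` is stated above. Every value marked "placeholder" is an assumption made for illustration and is not the authors' setting.

```python
# Hypothetical config.py sketch; "placeholder" marks assumed values.

# Data paths and input parameters
DATA_PATH = "/content/data/"        # set in this notebook
INPUT_HEIGHT = 80                   # from the model summary
INPUT_WIDTH = 2000                  # from the model summary
INPUT_SHAPE = (INPUT_WIDTH, INPUT_HEIGHT)
TACO_AUGMENTAION_FRACTION = 0.5     # placeholder (identifier spelled as in the code)

# Long-lines augmentation options
LONG_LINES = True                   # placeholder
LONG_LINES_FRACTION = 0.3           # placeholder

# Model training parameters
BATCH_SIZE = 32                     # placeholder
EPOCHS = 10                         # set in this notebook
VOCAB_SIZE = 80                     # stated in the notes above
DROPOUT = True                      # placeholder flag
OUTPUT_SHAPE = INPUT_WIDTH // 4     # assumption: two stride-2 convolutions

# Learning parameters
LEARNING_RATE = 0.001               # placeholder
BATCH_NORM_EPSILON = 1e-5           # placeholder
BATCH_NORM_DECAY = 0.99             # placeholder

# Optional initialization from a pre-trained checkpoint
LOAD = False                        # placeholder
LOAD_CHECKPOINT_PATH = ""           # placeholder

# Checkpoint and log paths (set in this notebook)
CHECKPOINT_PATH = "/content/weights/EASTER2--{epoch:02d}--{loss:.02f}.hdf5"
LOGS_DIR = "/content/logs"
BEST_MODEL_PATH = "/content/weights/saved_checkpoint.hdf5"
```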

In [None]:
import pandas as pd
import numpy as np
import cv2
import random
import itertools, os, time
import config
config.DATA_PATH = '/content/data/'
# Checkpoint parameters
config.CHECKPOINT_PATH = '/content/weights/EASTER2--{epoch:02d}--{loss:.02f}.hdf5'
config.LOGS_DIR = '/content/logs'
config.BEST_MODEL_PATH = "/content/weights/saved_checkpoint.hdf5"
import matplotlib.pyplot as plt
from tacobox import Taco
config.EPOCHS = 10

import tensorflow
import tensorflow.keras.backend as K

### Code for Loading the data and using Taco augmentation

TACo divides the image into multiple small tiles of the same size. Then, vertical and horizontal tiles are corrupted by random noise. Finally, the modified tiles are joined back in the same order to create the new image. With TACo augmentations, the network can acquire valuable features and produce good results even on relatively small training sets.
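
The idea can be illustrated with a simplified NumPy sketch. This is not the `tacobox` implementation used in the cell below: `vertical_tile_corruption` is a hypothetical function, the tile width is fixed rather than sampled, and only vertical tiles with uniform random noise are shown.

```python
import numpy as np

def vertical_tile_corruption(img, tile_width, corruption_prob, rng):
    """Corrupt randomly chosen vertical tiles of a grayscale image with noise.

    img: 2-D uint8 array of shape (height, width). Returns a corrupted copy.
    """
    out = img.copy()
    height, width = img.shape
    for start in range(0, width, tile_width):
        if rng.random() < corruption_prob:
            stop = min(start + tile_width, width)
            # Replace the whole tile with uniform random pixel values.
            out[:, start:stop] = rng.integers(
                0, 256, size=(height, stop - start), dtype=np.uint8
            )
    return out

rng = np.random.default_rng(42)
line_img = np.full((80, 400), 255, dtype=np.uint8)  # blank white "line" image
augmented = vertical_tile_corruption(line_img, tile_width=50,
                                     corruption_prob=0.2, rng=rng)
print(augmented.shape)  # (80, 400)
```

Because the tiles are written back at their original positions, the image size and the order of the uncorrupted content stay unchanged.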

In [None]:
class Sample:
    "sample from the dataset"
    def __init__(self, gtText, filePath):
        self.gtText = gtText
        self.filePath = filePath
        
class data_loader:
    def __init__(self, path, batch_size):
        self.batchSize = batch_size
        self.samples = []
        self.currIdx = 0
        self.charList = []
        
        # creating taco object for augmentation (checkout Easter2.0 paper)
        self.mytaco = Taco(
            cp_vertical=0.2,
            cp_horizontal=0.25,
            max_tw_vertical=100,
            min_tw_vertical=10,
            max_tw_horizontal=50,
            min_tw_horizontal=10
        )
        
        chars = set()
        # lines.txt holds one sample per line; lines starting with '#' are comments.
        with open(path + 'lines.txt') as f:
            for line in f:
                if not line.strip() or line[0] == '#':
                    continue
                lineSplit = line.strip().split(' ')
                assert len(lineSplit) >= 9
                # An ID 'x-y-z' maps to lines/x/x-y/x-y-z.png
                fileNameSplit = lineSplit[0].split('-')
                fileName = path + 'lines/' + fileNameSplit[0] + '/' +\
                           fileNameSplit[0] + '-' + fileNameSplit[1] + '/' + lineSplit[0] + '.png'

                # Field 9 is the transcription; '|' separates the words.
                gtText = lineSplit[8].strip(" ").replace("|", " ")

                chars.update(gtText)
                self.samples.append(Sample(gtText, fileName))
        
        train_folders = [x.strip("\n") for x in open(path+"LWRT/train.uttlist").readlines()]
        validation_folders = [x.strip("\n") for x in open(path+"LWRT/validation.uttlist").readlines()]
        test_folders = [x.strip("\n") for x in open(path+"LWRT/test.uttlist").readlines()]

        self.trainSamples = []
        self.validationSamples = []
        self.testSamples = []

        for i in range(0, len(self.samples)):
            file = self.samples[i].filePath.split("/")[-1][:-4].strip(" ")
            folder = "-".join(file.split("-")[:-1])
            if (folder in train_folders): 
                self.trainSamples.append(self.samples[i])
            elif folder in validation_folders:
                self.validationSamples.append(self.samples[i])
            elif folder in test_folders:
                self.testSamples.append(self.samples[i])
        self.trainSet()
        self.charList = sorted(list(chars))
        
        
    def trainSet(self):
        self.currIdx = 0
        random.shuffle(self.trainSamples)
        self.samples = self.trainSamples

    def validationSet(self):
        self.currIdx = 0
        self.samples = self.validationSamples
        
    def testSet(self):
        self.currIdx = 0
        self.samples = self.testSamples
        
    def getIteratorInfo(self):
        return (self.currIdx // self.batchSize + 1, len(self.samples) // self.batchSize)

    def hasNext(self):
        return self.currIdx + self.batchSize <= len(self.samples)
    
    def preprocess(self, img, augment=True):
        if augment:
            img = self.apply_taco_augmentations(img)
            
        # scaling image [0, 1]
        img = img/255
        img = img.swapaxes(-2,-1)[...,::-1]
        target = np.ones((config.INPUT_WIDTH, config.INPUT_HEIGHT))
        new_x = config.INPUT_WIDTH/img.shape[0]
        new_y = config.INPUT_HEIGHT/img.shape[1]
        min_xy = min(new_x, new_y)
        new_x = int(img.shape[0]*min_xy)
        new_y = int(img.shape[1]*min_xy)
        img2 = cv2.resize(img, (new_y,new_x))
        target[:new_x,:new_y] = img2
        return 1 - (target)
    
    def apply_taco_augmentations(self, input_img):
        random_value = random.random()
        if random_value <= config.TACO_AUGMENTAION_FRACTION:
            augmented_img = self.mytaco.apply_vertical_taco(
                input_img, 
                corruption_type='random'
            )
        else:
            augmented_img = input_img
        return augmented_img

    def getNext(self, what='train'):
        while True:
            if ((self.currIdx + self.batchSize) <= len(self.samples)):
                
                itr = self.getIteratorInfo()
                batchRange = range(self.currIdx, self.currIdx + self.batchSize)
                if config.LONG_LINES:
                    random_batch_range = random.choices(range(0, len(self.samples)), k=self.batchSize)
                    
                gtTexts = np.ones([self.batchSize, config.OUTPUT_SHAPE])
                input_length = np.ones((self.batchSize,1))*config.OUTPUT_SHAPE
                label_length = np.zeros((self.batchSize,1))
                imgs = np.ones([self.batchSize, config.INPUT_WIDTH, config.INPUT_HEIGHT])
                j = 0;
                for ix, i in enumerate(batchRange):
                    img = cv2.imread(self.samples[i].filePath, cv2.IMREAD_GRAYSCALE)
                    if img is None:
                        img = np.zeros([config.INPUT_WIDTH, config.INPUT_HEIGHT])
                    text = self.samples[i].gtText
                    
                    if config.LONG_LINES:
                        if random.random() <= config.LONG_LINES_FRACTION:
                            index = random_batch_range[ix]
                            img2 = cv2.imread(self.samples[index].filePath, cv2.IMREAD_GRAYSCALE)
                            if img2 is None:
                                img2 = np.zeros([config.INPUT_WIDTH, config.INPUT_HEIGHT])
                            text2 = self.samples[index].gtText
                            
                            avg_w = (img.shape[1] + img2.shape[1])//2
                            avg_h = (img.shape[0] + img2.shape[0])//2
                            
                            resized1 = cv2.resize(img, (avg_w, avg_h))
                            resized2 = cv2.resize(img2, (avg_w, avg_h))
                            space_width = random.randint(config.INPUT_HEIGHT//4, 2*config.INPUT_HEIGHT)
                            space = np.ones((avg_h, space_width))*255
                            
                            img = np.hstack([resized1, space, resized2])
                            text = text + " " + text2
                            
                    # NOTE: preprocess() augments by default, so TACo is also
                    # applied when this generator feeds validation batches.
                    img = self.preprocess(img)
                    imgs[j] = img

                    # Encode the text as character indices, then pad up to
                    # OUTPUT_SHAPE with len(charList), the index one past
                    # the last character.
                    val = [self.charList.index(ch) for ch in text]
                    while len(val) < config.OUTPUT_SHAPE:
                        val.append(len(self.charList))

                    gtTexts[j] = val
                    label_length[j] = len(text)
                    input_length[j] = config.OUTPUT_SHAPE
                    j = j + 1
                        
                self.currIdx += self.batchSize
                inputs = {
                        'the_input': imgs,
                        'the_labels': gtTexts,
                        'input_length': input_length,
                        'label_length': label_length,
                }
                outputs = {'ctc': np.zeros([self.batchSize])}
                yield (inputs,outputs)
            else:
                self.currIdx = 0
                
    def getValidationImage(self):
        batchRange = range(0, len(self.samples))
        imgs = []
        texts = []
        reals = []
        for i in batchRange:
            img1 = cv2.imread(self.samples[i].filePath, cv2.IMREAD_GRAYSCALE)
            real = cv2.imread(self.samples[i].filePath)
            if img1 is None:
                img1 = np.zeros([config.INPUT_WIDTH, config.INPUT_HEIGHT])
            img = self.preprocess(img1, augment=False)
            img = np.expand_dims(img,  0)
            text = self.samples[i].gtText
            imgs.append(img)
            texts.append(text)
            reals.append(real)
        self.currIdx += self.batchSize
        return imgs,texts,reals
    
    def getTestImage(self):
        # Same logic as getValidationImage(): both read every sample in
        # self.samples, so call testSet() first to select the test split.
        return self.getValidationImage()

### Code for building the Easter model
1. Define the function for calculating the CTC loss

### Notes on easter_model.py
Reference: https://sid2697.github.io/Blog_Sid/algorithm/2019/10/19/CTC-Loss.html

**CTC Loss:** This loss function is used to train models for HTR tasks.
1. **Encoding**
 Letters can have different widths, so a single character can span more than one time-step in the image. Reading every time-step as a separate character would therefore give wrong predictions. To solve this, CTC merges all consecutive repeated characters into a single character.

![](https://sid2697.github.io/Blog_Sid/assets/images/CTC_2.png)



For example, suppose the word in the image is ‘hey’, where ‘h’ takes three time-steps and ‘e’ and ‘y’ take one time-step each. The network output will then be ‘hhhey’, which under this encoding scheme collapses to ‘hey’.

What about words that contain repeated characters? To handle those cases, CTC introduces a pseudo-character called blank, denoted “-” in the following examples. When encoding the text, if a character repeats, a blank is placed between the repeated characters. For the word ‘meet’, valid encodings include ‘mm-ee-ee-t’ and ‘mmm-e-e-ttt’; ‘mm-eee-tt’ is a wrong encoding, because it decodes to ‘met’. The network is trained to output the encoded text.
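
The collapsing rule can be written in a few lines. This is a minimal sketch using the examples above: `ctc_collapse` is an illustrative name, and the blank is represented by the character `-` as in the text.

```python
from itertools import groupby

BLANK = "-"

def ctc_collapse(path, blank=BLANK):
    """Merge consecutive repeats first, then drop the blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(s for s in merged if s != blank)

for path in ["hhhey", "mm-ee-ee-t", "mmm-e-e-ttt", "mm-eee-tt"]:
    print(path, "->", ctc_collapse(path))
# hhhey -> hey
# mm-ee-ee-t -> meet
# mmm-e-e-ttt -> meet
# mm-eee-tt -> met
```

The order of the two steps matters: dropping blanks before merging repeats would turn ‘mm-ee-ee-t’ into ‘met’.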



2. **Loss calculation and Decoding:**

![](https://sid2697.github.io/Blog_Sid/assets/images/CTC_3.png)

The character scores along a path are multiplied together to get the score of that path. In the figure above, the score for the path “a--” is 0.4 x 0.7 x 0.6 = 0.168, and the score for the path “aaa” is 0.4 x 0.3 x 0.4 = 0.048. To get the score for a given ground truth, the scores of all paths that collapse to that text are summed. The loss is the negative logarithm of this probability, so it is easy to calculate. This loss can be back-propagated to train the network.
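
The calculation can be reproduced by brute force. In the sketch below the probability matrix is hypothetical: it is chosen only so that it reproduces the two path scores quoted above (0.168 and 0.048), and the remaining entries are assumptions. Practical implementations use dynamic programming instead of enumerating every path.

```python
import math
from itertools import groupby, product

SYMBOLS = ["a", "b", "-"]
# Hypothetical per-time-step probabilities (rows: t0, t1, t2; columns follow
# SYMBOLS). Only the entries used by the two quoted paths come from the text.
PROBS = [
    [0.4, 0.6, 0.0],
    [0.3, 0.0, 0.7],
    [0.4, 0.0, 0.6],
]

def collapse(path, blank="-"):
    """Merge consecutive repeats, then drop blanks."""
    return "".join(s for s, _ in groupby(path) if s != blank)

def path_score(path):
    """Product of the per-time-step probabilities along one path."""
    return math.prod(PROBS[t][SYMBOLS.index(s)] for t, s in enumerate(path))

def text_probability(text):
    """Sum the scores of every path that collapses to `text`."""
    return sum(path_score(p) for p in product(SYMBOLS, repeat=len(PROBS))
               if collapse(p) == text)

print(round(path_score("a--"), 3), round(path_score("aaa"), 3))  # 0.168 0.048
loss = -math.log(text_probability("a"))  # CTC loss for the ground truth "a"
print(loss)
```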

Decoding consists of the following two steps:

- Calculate the best path by taking the character with the maximum probability at every time-step.
- Merge duplicate characters and then remove the blanks, which yields the actual text.

For example, consider the matrix in the figure above. At the first time-step t0, the character with the maximum probability is ‘b’. At t1 and t2, the character with the maximum probability is ‘-’ in both cases. So the output text of the best-path algorithm for this matrix, after decoding, is ‘b’.
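
The two decoding steps can be sketched as follows. The matrix is a hypothetical stand-in for the one in the figure, chosen so that ‘b’ wins at t0 and the blank wins at t1 and t2. `best_path_decode` is an illustrative name, not a library function.

```python
from itertools import groupby

import numpy as np

def best_path_decode(probs, symbols, blank="-"):
    """Greedy CTC decoding: argmax per time-step, merge repeats, drop blanks."""
    best = [symbols[i] for i in np.argmax(probs, axis=1)]       # step 1
    return "".join(s for s, _ in groupby(best) if s != blank)   # step 2

symbols = ["a", "b", "-"]
probs = np.array([
    [0.4, 0.6, 0.0],   # t0: 'b' has the highest probability
    [0.3, 0.0, 0.7],   # t1: blank
    [0.4, 0.0, 0.6],   # t2: blank
])
print(best_path_decode(probs, symbols))  # b
```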


**Advantages:**
- CTC only requires the text that occurs in the image. The width and position of the characters in the image can both be ignored.
- There is no need to post-process the output of the CTC operation. Using decoding techniques, we can directly get the result of the network.


In [None]:
def ctc_loss(args):
    y_pred, labels, input_length, label_length = args
    return K.ctc_batch_cost(
        labels, 
        y_pred, 
        input_length, 
        label_length
    )

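def ctc_custom(args):
    """
    Custom CTC loss: the standard CTC cost scaled by alpha * (1 - p)**gamma,
    where p = exp(-ctc_cost), which down-weights well-predicted samples.
    """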
    y_pred, labels, input_length, label_length = args
    ctc_loss = K.ctc_batch_cost(
        labels, 
        y_pred, 
        input_length, 
        label_length
    )
    p = tensorflow.exp(-ctc_loss)
    gamma = 0.5
    alpha=0.25 
    return alpha*(K.pow((1-p),gamma))*ctc_loss
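
The arithmetic in `ctc_custom` appears to follow the focal-loss pattern: since the CTC cost is a negative log-probability, `exp(-cost)` recovers the probability `p` of the ground truth, and the factor `alpha * (1 - p)**gamma` shrinks the loss of samples the model already predicts well. The NumPy sketch below mirrors only that arithmetic on made-up loss values. `focal_weighted` is a hypothetical helper and does not compute CTC itself.

```python
import numpy as np

def focal_weighted(ctc_loss, alpha=0.25, gamma=0.5):
    """Apply the alpha * (1 - p)**gamma scaling to precomputed CTC losses."""
    ctc_loss = np.asarray(ctc_loss, dtype=float)
    p = np.exp(-ctc_loss)  # probability of the ground-truth text
    return alpha * (1.0 - p) ** gamma * ctc_loss

losses = np.array([0.05, 1.0, 10.0])          # easy, medium, hard sample (made up)
weights = focal_weighted(losses) / losses      # effective per-sample weight
print(weights)                                 # grows with the loss, capped by alpha
```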

2. Batch normalization function

In [None]:
def batch_norm(inputs):
    return tensorflow.keras.layers.BatchNormalization(
        momentum= config.BATCH_NORM_DECAY, 
        epsilon = config.BATCH_NORM_EPSILON
    )(inputs)


3. Squeeze and Excitation function for adding global context

CNNs use their convolutional filters to extract hierarchical information from images. Lower layers find simple features such as edges or high frequencies, while upper layers can detect faces, text, or other complex geometrical shapes. They extract whatever is necessary to solve a task efficiently.

All of this works by fusing the spatial and channel information of an image. The different filters first find spatial features in each input channel and then add the information across all available output channels.

A standard CNN gives equal weight to all channels of a feature map. SE changes that by:

1. Getting a global understanding of each channel by squeezing each feature map to a single numeric value. This results in a vector of size n, where n is the number of convolutional channels.
2. Feeding this vector through a two-layer neural network with ReLU and sigmoid activations to obtain one weight per channel. These weights are then applied to the original feature maps to scale each channel by its importance.

The local features are squeezed into a single global context vector of weights. The SE module then broadcasts this context to every local feature vector through an element-wise multiplication of the context weights with the features.
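
The squeeze, excite, and rescale steps can be traced on plain arrays. This is a NumPy sketch with random, untrained weights: `squeeze_excite` is a hypothetical function, and the sequence length (500) and channel count (256) are assumptions. The reduction factor of 8 matches the `filters//8` used in the Keras cell below.

```python
import numpy as np

def squeeze_excite(features, w1, b1, w2, b2):
    """features: (T, C) feature map. Returns the channel-rescaled (T, C) map."""
    squeezed = features.mean(axis=0)                      # squeeze: (C,)
    hidden = np.maximum(0.0, squeezed @ w1 + b1)          # ReLU layer: (C // 8,)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # sigmoid: (C,)
    return features * weights                             # broadcast over time

rng = np.random.default_rng(0)
T, C = 500, 256
features = rng.standard_normal((T, C))
w1, b1 = rng.standard_normal((C, C // 8)) * 0.1, np.zeros(C // 8)
w2, b2 = rng.standard_normal((C // 8, C)) * 0.1, np.zeros(C)
out = squeeze_excite(features, w1, b1, w2, b2)
print(out.shape)  # (500, 256)
```

Every time-step of a given channel is multiplied by the same weight, which is how a single global statistic reaches each local feature vector.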

In [None]:
def add_global_context(data, filters):
    """
    1D Squeeze and Excitation Layer. 
    """
    pool = tensorflow.keras.layers.GlobalAveragePooling1D()(data)
    
    pool = tensorflow.keras.layers.Dense(
        filters//8, 
        activation='relu'
    )(pool)
    
    pool = tensorflow.keras.layers.Dense(
        filters, 
        activation='sigmoid'
    )(pool) 
    
    final = tensorflow.keras.layers.Multiply()([data, pool])
    return final

4. Easter block



   1. Components of the EASTER block (repeated):

   Each block contains multiple repeating sub-blocks consisting of 1-D convolution, batch normalization, ReLU, and dropout layers. Different blocks use different numbers of convolutional filters and other hyperparameters.

   2. Residual connections and global context:

   Residual connections link the output of an earlier convolution layer to the input of the current layer, skipping a few layers in between.

   The third sub-block contains an SE layer before the ReLU and dropout layers to add global context.
   The current and old inputs are added together, and this sum is then added to the output of the SE layer before passing through the ReLU and dropout steps.

![](http://www.marktechpost.com/wp-content/uploads/2022/08/Screen-Shot-2022-08-03-at-8.50.55-AM.png)

In [1]:
def easter_unit(old, data, filters, kernel, stride, dropouts):
    """
    Easter unit with dense residual connections
    """
    old = tensorflow.keras.layers.Conv1D(
        filters = filters, 
        kernel_size = (1), 
        strides = (1),
        padding = "same"
    )(old)
    old = batch_norm(old)
    
    this = tensorflow.keras.layers.Conv1D(
        filters = filters, 
        kernel_size = (1), 
        strides = (1),
        padding = "same"
    )(data)
    this = batch_norm(this)
    
    old = tensorflow.keras.layers.Add()([old, this])
    
    #First Block
    data = tensorflow.keras.layers.Conv1D(
        filters = filters, 
        kernel_size = (kernel), 
        strides = (stride),
        padding = "same"
    )(data)
    
    data = batch_norm(data)
    data = tensorflow.keras.layers.Activation('relu')(data)
    data = tensorflow.keras.layers.Dropout(dropouts)(data)
    
    #Second Block
    data = tensorflow.keras.layers.Conv1D(
        filters = filters, 
        kernel_size = (kernel), 
        strides = (stride),
        padding = "same"
    )(data)
    
    data = batch_norm(data)
    data = tensorflow.keras.layers.Activation('relu')(data)
    data = tensorflow.keras.layers.Dropout(dropouts)(data)
    
    #Third Block
    data = tensorflow.keras.layers.Conv1D(
        filters = filters, 
        kernel_size = (kernel), 
        strides = (stride),
        padding = "same"
    )(data)
    
    data = batch_norm(data)
    
    #squeeze and excitation
    data = add_global_context(data, filters)
    
    final = tensorflow.keras.layers.Add()([old,data])
    
    data = tensorflow.keras.layers.Activation('relu')(final)
    data = tensorflow.keras.layers.Dropout(dropouts)(data)
       
    return data, old

5. Easter 2 code:

  1. First, an input layer is created using the input shape
  2. Two Conv1D layers (block type A) are created, each followed by batch normalization, ReLU, and dropout
  3. The output of the second layer is copied to `old`, which is then passed to the `easter_unit()` function above, along with the output itself and a few more parameter values
  4. Three such blocks are created
  5. Two more type A blocks are created with different kernel sizes
  6. A final output layer with a softmax activation function is created
  7. The Adam optimizer is used to optimize the CTC loss
  8. The model is compiled
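
The first rows of the model summary printed when `train()` runs (input `(None, 2000, 80)`, first `Conv1D` output `(None, 1000, 128)` with 30848 parameters) can be checked by hand. The sketch below assumes the standard rule for `"same"` padding, where the output length is the input length divided by the stride and rounded up. The helper names are illustrative.

```python
import math

def same_padding_length(length, stride):
    """Output length of a 'same'-padded 1-D convolution."""
    return math.ceil(length / stride)

def conv1d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

width, height = 2000, 80                      # input shape from the model summary
steps = same_padding_length(width, 2)         # first stride-2 Conv1D
print(steps, conv1d_params(3, height, 128))   # 1000 30848
steps = same_padding_length(steps, 2)         # second stride-2 Conv1D
print(steps)                                  # 500
```

The remaining layers use stride 1, so under these assumptions the softmax output has 500 time-steps, which is the sequence length the CTC loss sees.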

In [2]:
def Easter2():
    input_data = tensorflow.keras.layers.Input(
        name='the_input', 
        shape = config.INPUT_SHAPE
    )
    
    data = tensorflow.keras.layers.Conv1D(
        filters = 128, 
        kernel_size = (3), 
        strides = (2), 
        padding = "same"
    )(input_data)
    
    data = batch_norm(data)
    data = tensorflow.keras.layers.Activation('relu')(data)
    data = tensorflow.keras.layers.Dropout(0.2)(data)

    data = tensorflow.keras.layers.Conv1D(
        filters = 128, 
        kernel_size = (3), 
        strides = (2), 
        padding = "same"
    )(data)
    
    data = batch_norm(data)
    data = tensorflow.keras.layers.Activation('relu')(data)
    data = tensorflow.keras.layers.Dropout(0.2)(data)

    old = data

    # 3 * 3 Easter Blocks (with dense residuals)
    data, old = easter_unit(old, data, 256, 5, 1, 0.2)
    data, old = easter_unit(old, data, 256, 7, 1, 0.2 )
    data, old = easter_unit(old, data, 256, 9, 1, 0.3 )

    data = tensorflow.keras.layers.Conv1D(
        filters = 512, 
        kernel_size = (11), 
        strides = (1), 
        padding = "same", 
        dilation_rate = 2
    )(data)
    
    data = batch_norm(data)
    data = tensorflow.keras.layers.Activation('relu')(data)
    data = tensorflow.keras.layers.Dropout(0.4)(data)

    data = tensorflow.keras.layers.Conv1D(
        filters = 512, 
        kernel_size = (1), 
        strides = (1), 
        padding = "same"
    )(data)
    
    data = batch_norm(data)
    data = tensorflow.keras.layers.Activation('relu')(data)
    data = tensorflow.keras.layers.Dropout(0.4)(data)

    data = tensorflow.keras.layers.Conv1D(
        filters = config.VOCAB_SIZE, 
        kernel_size = (1), 
        strides = (1), 
        padding = "same"
    )(data)
    
    y_pred = tensorflow.keras.layers.Activation('softmax',name="Final")(data)

    # print model summary
    tensorflow.keras.models.Model(inputs = input_data, outputs = y_pred).summary()
 
    # Defining other training parameters
    # (`learning_rate` replaces the deprecated `lr` argument)
    Optimizer = tensorflow.keras.optimizers.Adam(learning_rate = config.LEARNING_RATE)
    
    labels = tensorflow.keras.layers.Input(
        name = 'the_labels', 
        shape=[config.OUTPUT_SHAPE], 
        dtype='float32'
    )
    input_length = tensorflow.keras.layers.Input(
        name='input_length', 
        shape=[1],
        dtype='int64'
    )
    label_length = tensorflow.keras.layers.Input(
        name='label_length',
        shape=[1],
        dtype='int64'
    )
    
    output = tensorflow.keras.layers.Lambda(
        ctc_custom, output_shape=(1,),name='ctc'
    )([y_pred, labels, input_length, label_length])

    # compiling model
    model = tensorflow.keras.models.Model(
        inputs = [input_data, labels, input_length, label_length], outputs= output
    )
    
    model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer = Optimizer)
    return model

In [3]:
def train():
    #Creating Easter2 object
    model = Easter2()
    
    '''
    # Loading checkpoint for transfer/resuming learning
    if config.LOAD:
        print ("Intializing from checkpoint : ", config.LOAD_CHECKPOINT_PATH)
        model.load_weights(config.LOAD_CHECKPOINT_PATH)
        print ("Init weights loaded successfully....")'''
        
    # Loading Metadata, about training, validation and Test sets
    print ("loading metdata...")
    training_data = data_loader(config.DATA_PATH, config.BATCH_SIZE)
    validation_data = data_loader(config.DATA_PATH, config.BATCH_SIZE)
    test_data = data_loader(config.DATA_PATH, config.BATCH_SIZE)

    training_data.trainSet()
    validation_data.validationSet()
    test_data.testSet()

    print("Training Samples : ", len(training_data.samples))
    print("Validation Samples : ", len(validation_data.samples))
    print("Test Samples : ", len(test_data.samples))
    print("CharList Size : ", len(training_data.charList))
    
    # callback arguments
    CHECKPOINT = tensorflow.keras.callbacks.ModelCheckpoint(
        filepath = config.CHECKPOINT_PATH,
        monitor='loss', 
        verbose=1, 
        mode='min', 
        period = 2
    )
    
    TENSOR_BOARD = tensorflow.keras.callbacks.TensorBoard(
        log_dir=config.LOGS_DIR, 
        histogram_freq=0, 
        write_graph=True,
        write_images=False, 
        embeddings_freq=0
    )
    
    # steps per epoch calculation based on number of samples and batch size
    STEPS_PER_EPOCH = len(training_data.samples)//config.BATCH_SIZE
    VALIDATION_STEPS = len(validation_data.samples)//config.BATCH_SIZE

    # Start training with given parameters
    print ("Training Model...")
    # `fit` accepts Python generators directly; `fit_generator` is deprecated in TF 2.x
    model.fit(
        training_data.getNext(),
        steps_per_epoch=STEPS_PER_EPOCH,
        epochs=config.EPOCHS,
        callbacks=[CHECKPOINT, TENSOR_BOARD],
        validation_data=validation_data.getNext(),
        validation_steps=VALIDATION_STEPS
    )
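
The floor division used for `STEPS_PER_EPOCH` and `VALIDATION_STEPS` means that samples left over after the last full batch are not covered by that epoch's step count. A small illustration using the sample counts printed during training; the batch size of 32 is an assumption, since the value of `config.BATCH_SIZE` is not shown here:

```python
def steps_and_leftover(num_samples, batch_size):
    """Return (full batches per epoch, samples that do not fill a last batch)."""
    steps = num_samples // batch_size
    leftover = num_samples - steps * batch_size
    return steps, leftover

# Hypothetical batch size; config.BATCH_SIZE is not shown in this section
BATCH_SIZE = 32

train_steps, train_leftover = steps_and_leftover(6482, BATCH_SIZE)
val_steps, val_leftover = steps_and_leftover(976, BATCH_SIZE)
print("train:", train_steps, "steps,", train_leftover, "samples left over")
print("validation:", val_steps, "steps,", val_leftover, "samples left over")
```

If every sample must be seen in each epoch, rounding the step count up instead is one option, provided the generator can yield a smaller final batch.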

In [None]:
train()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 the_input (InputLayer)         [(None, 2000, 80)]   0           []                               
                                                                                                  
 conv1d (Conv1D)                (None, 1000, 128)    30848       ['the_input[0][0]']              
                                                                                                  
 batch_normalization (BatchNorm  (None, 1000, 128)   512         ['conv1d[0][0]']                 
 alization)                                                                                       
                                                                                                  
 activation (Activation)        (None, 1000, 128)    0           ['batch_normalization[0][0]']

loading metdata...




Training Samples :  6482
Validation Samples :  976
Test Samples :  2915
CharList Size :  79
Training Model...




Epoch 1/10
Epoch 2/10
Epoch 2: saving model to /content/weights/EASTER2--02--43.77.hdf5
Epoch 3/10
Epoch 4/10
Epoch 4: saving model to /content/weights/EASTER2--04--39.54.hdf5
Epoch 5/10
Epoch 6/10
Epoch 6: saving model to /content/weights/EASTER2--06--37.15.hdf5
Epoch 7/10
Epoch 8/10
Epoch 8: saving model to /content/weights/EASTER2--08--33.78.hdf5
Epoch 9/10
Epoch 10/10
Epoch 10: saving model to /content/weights/EASTER2--10--31.49.hdf5
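
The checkpoint file names in the log above appear to embed the epoch number and a metric value (for example `EASTER2--10--31.49.hdf5`). Assuming the last number is the monitored training loss, a small helper can pick the checkpoint with the lowest value. The helper name and the file-name pattern are illustrative and not part of the notebook:

```python
import re

# Assumed file-name layout: EASTER2--<epoch>--<loss>.hdf5
CHECKPOINT_PATTERN = re.compile(r"EASTER2--(\d+)--(\d+\.\d+)\.hdf5$")

def best_checkpoint(paths):
    """Return the (path, epoch, loss) tuple with the smallest loss, or None."""
    parsed = []
    for path in paths:
        match = CHECKPOINT_PATTERN.search(path)
        if match:
            parsed.append((path, int(match.group(1)), float(match.group(2))))
    return min(parsed, key=lambda item: item[2]) if parsed else None

# Paths copied from the training log
saved = [
    "/content/weights/EASTER2--02--43.77.hdf5",
    "/content/weights/EASTER2--04--39.54.hdf5",
    "/content/weights/EASTER2--06--37.15.hdf5",
    "/content/weights/EASTER2--08--33.78.hdf5",
    "/content/weights/EASTER2--10--31.49.hdf5",
]
print(best_checkpoint(saved))
```

For this run the value decreases at every saved epoch, so the helper selects the epoch-10 file.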


### Code for predicting using the trained model

In [None]:
import itertools
import numpy as np
from editdistance import eval as edit_distance
from tqdm import tqdm
import tensorflow as tf

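1. Code to load the model
- If the checkpoint path is left at its default value `"Empty"`, fall back to the best model path from the config file (`config.BEST_MODEL_PATH`).
- Load the saved model and keep only the part from the `the_input` layer to the `Final` layer, which drops the label and length inputs that are only needed for the CTC loss during training.
- If loading fails, print an error message and return `None`.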

In [None]:
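def load_easter_model(checkpoint_path):
    # "Empty" is the sentinel default: fall back to the best model from the config
    if checkpoint_path == "Empty":
        checkpoint_path = config.BEST_MODEL_PATH
    try:
        # The lambda loss and the `tf` module referenced by the saved model
        # are supplied again through custom_objects when deserializing
        checkpoint = tensorflow.keras.models.load_model(
            checkpoint_path,
            custom_objects={'<lambda>': lambda x, y: y, 'tf': tf}
        )

        # Keep only image input -> 'Final' output for inference
        EASTER = tensorflow.keras.models.Model(
            checkpoint.get_layer('the_input').input,
            checkpoint.get_layer('Final').output
        )
    except Exception as e:
        # Report why loading failed instead of silently swallowing the error
        print ("Unable to Load Checkpoint:", e)
        return None
    return EASTER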

In [None]:
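def decoder(output, letters):
    # Greedy (best-path) CTC decoding for a batch of predictions
    ret = []
    for j in range(output.shape[0]):
        # Most likely class index at every time step
        out_best = list(np.argmax(output[j,:], 1))
        # Collapse consecutive repeated indices
        out_best = [k for k, g in itertools.groupby(out_best)]
        outstr = ''
        for c in out_best:
            # Indices outside the character list (such as the CTC blank) are skipped
            if c < len(letters):
                outstr += letters[c]
        ret.append(outstr)
    return ret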
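
To see what `decoder` does on a concrete input, here is a minimal pure-Python sketch of the same greedy decoding steps: take the argmax per time step, collapse repeats, and drop out-of-range indices. The tiny alphabet and probability rows are made up for illustration, and treating the last class as the CTC blank is an assumption:

```python
import itertools

def greedy_ctc_decode(frame_probs, letters):
    """Decode one sequence of per-time-step class probabilities."""
    # Most likely class index at each time step
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_probs]
    # Collapse runs of the same index into a single symbol
    collapsed = [k for k, _ in itertools.groupby(best)]
    # Indices beyond the alphabet (here: the blank) produce no character
    return "".join(letters[c] for c in collapsed if c < len(letters))

letters = ["a", "b"]  # toy alphabet; index 2 plays the role of the blank
frame_probs = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, collapsed into the previous a)
    [0.1, 0.1, 0.8],  # blank
    [0.6, 0.3, 0.1],  # a (kept, because a blank separates it from the first a)
    [0.2, 0.7, 0.1],  # b
]
print(greedy_ctc_decode(frame_probs, letters))  # prints "aab"
```

The blank between the two runs of `a` is what allows a doubled letter to survive the collapsing step.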

In [None]:
def test_on_iam(show = True, partition='test', uncased=False, checkpoint="Empty"):
    
    print ("loading metadata...")
    training_data = data_loader(config.DATA_PATH, config.BATCH_SIZE)
    validation_data = data_loader(config.DATA_PATH, config.BATCH_SIZE)
    test_data = data_loader(config.DATA_PATH, config.BATCH_SIZE)

    training_data.trainSet()
    validation_data.validationSet()
    test_data.testSet()
    charlist = training_data.charList
    print ("loading checkpoint...")
    print ("calculating results...")
    
    model = load_easter_model(checkpoint)
    if model is None:
        # load_easter_model has already reported the failure
        return
    char_error = 0
    total_chars = 0
    
    batches = 1
    while batches > 0:
        batches = batches - 1
        if partition == 'validation':
            print ("Using Validation Partition")
            imgs, truths, _ = validation_data.getValidationImage()
        else:
            print ("Using Test Partition")
            imgs,truths,_ = test_data.getTestImage()

        print ("Number of Samples : ",len(imgs))
        for i in tqdm(range(0,len(imgs))):
            img = imgs[i]
            # Normalize whitespace in the ground truth and in the prediction
            truth = truths[i].strip(" ").replace("  "," ")
            output = model.predict(img)
            prediction = decoder(output, charlist)
            output = (prediction[0].strip(" ").replace("  ", " "))
            # Accumulate the character-level edit distance (optionally case-insensitive)
            if uncased:
                char_error += edit_distance(output.lower(),truth.lower())
            else:
                char_error += edit_distance(output,truth)
                
            total_chars += len(truth)
            if show:
                print ("Ground Truth :", truth)
                print("Prediction [",edit_distance(output,truth),"]  : ",output)
                print ("*"*50)
    # CER = total edit distance / total ground-truth characters, as a percentage
    print ("Character error rate is : ",(char_error/total_chars)*100)
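
The character error rate reported by `test_on_iam` is the summed edit distance divided by the total number of ground-truth characters. The sketch below reproduces that calculation with a plain-Python Levenshtein distance standing in for the `editdistance` package; the sample strings are invented for illustration:

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]

def character_error_rate(pairs):
    """CER in percent over (prediction, truth) pairs."""
    errors = sum(levenshtein(pred, truth) for pred, truth in pairs)
    total = sum(len(truth) for _, truth in pairs)
    return 100.0 * errors / total

# Invented (prediction, ground truth) pairs
pairs = [("helo world", "hello world"), ("abc", "abd")]
print(character_error_rate(pairs))
```

Because the denominator is the ground-truth length, a model that emits many spurious characters can score a CER above 100%.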

In [None]:
checkpoint_path = "/content/weights/EASTER2--10--31.49.hdf5"

In [None]:
test_on_iam(show=False, partition="validation", checkpoint=checkpoint_path, uncased=True)

loading metdata...
loading checkpoint...
calculating results...
Using Validation Partition
Number of Samples :  976


100%|██████████| 976/976 [00:50<00:00, 19.37it/s]


Character error rate is :  63.972951736946115


In [None]:
test_on_iam(show=False, partition="test", checkpoint=checkpoint_path, uncased=True)

loading metdata...
loading checkpoint...
calculating results...
Using Test Partition
Number of Samples :  2915


100%|██████████| 2915/2915 [02:47<00:00, 17.45it/s]


Character error rate is :  67.88227971336003
