## Importing Libraries and Setting Environment Variables

This cell imports the required libraries and sets one environment variable, `CUDA_DEVICE_ORDER`, so that CUDA device IDs follow PCI bus order. The imported libraries include:
- `numpy` and `matplotlib` for numerical operations and plotting.
- `os` for handling environment variables.
- `string`, `re`, `glob`, and `shutil` for text and file handling.
- `cv2` and `PIL` for image processing.
- `tensorflow.keras` for building and training the neural network model.


In [1]:
# Import necessary libraries and set environment variables for TensorFlow and Keras
import numpy as np
import matplotlib.pyplot as plt
from numpy.random import RandomState
import os

# Ensure correct GPU order for TensorFlow
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

# Additional libraries for file handling, image processing, and machine learning
import string
from shutil import copyfile, rmtree
import re
import cv2
from PIL import Image, ImageDraw
import glob
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Reshape, Bidirectional, LSTM, Dense, Lambda
import tensorflow.keras.backend as K





## Global Variables and Helper Functions

This cell defines a global list of dataset subset names (`sets`) and several helper functions for reading ground truth labels and images:
- `get_Word(name)`: Given an image path, reads the matching `.tru` ground truth file and returns the word as a list of character labels taken from its `AW2` field.
- `evaluate_word(name)`: Calls `get_Word` and replaces every character label ending in `1` or `2` with `-`.
- `get_lexicon_2(names)`: Builds the list of unique character labels found across a list of file names, preserving first-seen order.
- `get_lengths(names)`: Creates a dictionary mapping each file name (without directory or extension) to the number of character labels in its word.
- `open_image(name, img_size=[100, 300])`: Loads an image as grayscale, resizes it to `img_size` (height, width), binarizes it at the mid-gray threshold, inverts it, and returns it together with the word from `get_Word`.
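To make the label parsing concrete, here is a minimal sketch of what `get_Word` and `evaluate_word` do to a single ground truth line. The sample line and its character labels are hypothetical, written only to match the `AW2:...;` pattern the helpers search for; real `.tru` files may look different. The function names `parse_aw2` and `mask_suffixes` are illustrative and do not appear in the notebook.

```python
import re

# Hypothetical ground truth line; only the "AW2:...|;" shape is taken from the helpers.
label = "AW1:xyz;AW2:raA|alB|teE1|naM2|;QUA:YB1;"

def parse_aw2(label_line):
    # Same regex and split as get_Word: take the AW2 field, split on '|',
    # and drop the empty string left by the trailing '|'.
    return re.search(r"AW2:(.*?);", label_line).group(1).split("|")[:-1]

def mask_suffixes(chars):
    # Same rule as evaluate_word: labels ending in '1' or '2' become '-'.
    return ["-" if c[-1] in ("1", "2") else c for c in chars]

chars = parse_aw2(label)
print(chars)                 # ['raA', 'alB', 'teE1', 'naM2']
print(mask_suffixes(chars))  # ['raA', 'alB', '-', '-']
```

Note that `open_image` returns the unmasked list from `get_Word`, so the masking only affects the lexicon built by `get_lexicon_2`.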


In [2]:

sets = ['set_a','set_b','set_c' ]

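def get_Word(name):
    # The .tru file sits in a "tru" directory next to the image's own directory
    parts = name.split("/")
    file_name = parts[-1].split(".")[0]
    tru_path = "/".join(parts[:-2]) + "/tru/" + file_name + ".tru"
    # Use a context manager so the file handle is always closed
    with open(tru_path, "r", encoding="latin-1") as load_profile:
        label = load_profile.read().splitlines()[6]
    # The AW2 field holds the character labels separated (and terminated) by '|'
    word = re.search(r"AW2:(.*?);", label).group(1).split('|')[:-1]
    return word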

def evaluate_word(name):
    word = get_Word(name)
    for i, car in enumerate(word):
        if car[-1] == "1" or car[-1] == "2":
            word[i] = "-"
    return word

def get_lexicon_2(names):
    arabic_labels = []
    for name in names:
        arabic_labels = arabic_labels + evaluate_word(name)
    return list(dict.fromkeys(arabic_labels))

def get_lengths(names):
    d = {}
    for name in names:
        file_name = name.split("/")[-1].split(".")[0]
        word = get_Word(name)
        d[file_name] = len(word)
    return d

def open_image(name, img_size=[100, 300]):
    img = cv2.imread(name, 0)
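    # cv2.resize expects an OpenCV interpolation flag passed by keyword, not a PIL constant
    img = cv2.resize(img, (img_size[1], img_size[0]), interpolation=cv2.INTER_LANCZOS4)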
    img = cv2.threshold(img, 255 // 2, 255, cv2.THRESH_BINARY)[1]
    img = cv2.bitwise_not(img)
    word = get_Word(name)
    return img, word

## Data Reading and Preprocessing Class

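This cell defines the `Readf` class, which reads and preprocesses image and text data. The constructor (`__init__`) stores the image size, maximum label length, batch size, normalization statistics (`mean`, `std`), and the `classes` dictionary that maps character labels to integer indices. The index of `"-"` in `classes` is used as the blank (padding) value.

- `make_target`: Converts a list of character labels into an array of class indices; labels missing from `classes` map to the index of `"-"`.
- `get_labels`: Builds a `(len(names), max_len)` label matrix for the given file names, padded with the blank index.
- `get_blank_matrices`: Allocates an empty image batch, a label matrix filled with the blank index, and the input length and label length arrays.
- `get_label_from_path`: An example of deriving a label from the file name itself; it is not called by `run_generator`.
- `run_generator`: An infinite generator that yields `(inputs, outputs)` batches for training and skips samples whose word is empty.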
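The label encoding and padding done by `make_target` and `get_labels` can be sketched on their own, without any image files. The three-entry `classes` dictionary and the sample words below are made up for illustration; the real dictionary would come from the lexicon of the dataset.

```python
import numpy as np

# Hypothetical class dictionary; "-" doubles as the blank/padding class as in Readf.
classes = {"a": 0, "b": 1, "-": 2}
max_len = 5
blank = classes["-"]

def make_target(text):
    # Unknown labels fall back to the index of "-", as in Readf.make_target.
    return np.array([classes[ch] if ch in classes else blank for ch in text])

words = [["a", "b"], ["b", "x", "a"]]  # "x" is not in classes

# Start from a matrix filled with the blank index, then copy each encoded word in.
Y_data = np.full([len(words), max_len], blank)
for i, word in enumerate(words):
    target = make_target(word)
    Y_data[i, 0:len(target)] = target

print(Y_data)
# [[0 1 2 2 2]
#  [1 2 0 2 2]]
```

Because unknown labels and padding share the same index, the true length of each word has to be tracked separately, which is what `label_length` does in `run_generator`.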


In [3]:
class Readf:
    def __init__(self, img_size=(100, 300), max_len=17, normed=False, batch_size=64, classes={}, mean=118.2423, std=36.72):
        self.batch_size = batch_size
        self.img_size = img_size
        self.normed = normed
        self.classes = classes
        self.max_len = max_len
        self.mean = mean
        self.std = std
        self.voc = list(self.classes.keys())

        if type(classes) == dict:
            self.blank = classes["-"]

    def make_target(self, text):
        return np.array([self.classes[char] if char in self.voc else self.classes['-'] for char in text])

    def get_labels(self, names):
        Y_data = np.full([len(names), self.max_len], self.blank)
        for i, name in enumerate(names):
            img, word = open_image(name, self.img_size)
            word = self.make_target(word)
            Y_data[i, 0:len(word)] = word
        return Y_data

    def get_blank_matrices(self):
        shape = (self.batch_size,) + self.img_size
        X_data = np.empty(shape)
        Y_data = np.full([self.batch_size, self.max_len], self.blank)
        input_length = np.ones((self.batch_size, 1))
        label_length = np.zeros((self.batch_size, 1))
        return X_data, Y_data, input_length, label_length

    def get_label_from_path(self, img_path):
        # Example function to extract label from file path
        # You need to modify this according to how your labels are stored or derived
        file_name = os.path.basename(img_path)
        label_text = file_name.split('_')[0]  # Assuming the label is part of the file name
        
        # Convert label text to class indices, handle characters not in self.classes
        label = []
        for char in label_text:
            if char in self.classes:
                label.append(self.classes[char])
            else:
                print(f"Warning: Character '{char}' not found in classes.")
                label.append(self.blank)  # Append blank class or handle as needed

        return label

    def run_generator(self, names, downsample_factor=2):
        n_instances = len(names)
        N = n_instances // self.batch_size
        rem = n_instances % self.batch_size
    
        while True:
            X_data, Y_data, input_length, label_length = self.get_blank_matrices()
    
            i, n = 0, 0
    
            for name in names:
                img, word = open_image(name, self.img_size)
                word = self.make_target(word)
    
                # Skip if word length is zero
                if len(word) == 0:
                    continue
    
                Y_data[i, 0:len(word)] = word
                label_length[i] = len(word)
                input_length[i] = (self.img_size[0] + 4) // downsample_factor - 2
    
                X_data[i] = img[np.newaxis, :, :]
                i += 1
    
                if i == self.batch_size:
                    n += 1
                    inputs = {
                        'the_input': X_data,
                        'the_labels': Y_data,
                        'input_length': input_length,
                        'label_length': label_length,
                    }
                    outputs = {'ctc': np.zeros([self.batch_size])}
                    yield (inputs, outputs)
    
                    # Reset everything
                    X_data, Y_data, input_length, label_length = self.get_blank_matrices()
                    i = 0
    
            # Handle remaining instances
            if rem > 0:
                inputs = {
                    'the_input': X_data[:rem],
                    'the_labels': Y_data[:rem],
                    'input_length': input_length[:rem],
                    'label_length': label_length[:rem],
                }
                outputs = {'ctc': np.zeros([rem])}
                yield (inputs, outputs)


## CNN-LSTM Model Class

This cell defines the `CRNN` class, which builds a model combining convolutional layers (CNN) with a bidirectional Long Short-Term Memory (LSTM) layer. The constructor (`__init__`) stores the model parameters and calls `build_model`, which constructs the architecture: convolutional and pooling layers, a reshaping layer, a dense layer, a bidirectional LSTM layer, and a softmax output layer. The Connectionist Temporal Classification (CTC) loss is computed inside the model by a `Lambda` layer so that sequence predictions can be trained without per-time-step labels.

- `class CRNN`: Creates a CNN-LSTM model for OCR.
- `__init__`: Stores the image width and height, output size, and maximum label length, sets the network hyperparameters, and builds the model.
- `ctc_lambda_func`: Wraps `K.ctc_batch_cost` so the CTC loss can be used as a `Lambda` layer.
- `build_model`: Creates the CNN-LSTM model:
    - Defines the image input of shape `(img_h, img_w)` and adds a channel dimension.
    - Adds two convolutional layers with ReLU activation, each followed by max pooling.
    - Adds a reshaping layer to prepare the data for the RNN.
    - Adds a dense layer and a bidirectional LSTM layer.
    - Adds the output layer with softmax activation.
    - Defines the extra inputs for the CTC loss and constructs the final model.
    - Prints the model summary.
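The shapes flowing from the CNN into the RNN can be checked with plain arithmetic. The sketch below assumes the model is built with `img_h=100` and `img_w=300` (the default `img_size` of `Readf`; the notebook section shown here does not include the actual `CRNN(...)` call) and reuses the expressions from `build_model` and `run_generator`.

```python
# Assumed image size (Readf default) and the network parameters set in CRNN.__init__.
img_h, img_w = 100, 300
pool_size = 2
conv_filters = 16

# Same expressions as conv_to_rnn_dims in build_model: two 2x2 poolings divide
# each spatial dimension by pool_size * 2 = 4.
time_steps = img_w // (pool_size * 2)
features = img_h // (pool_size * 2) * conv_filters

# Same expression as in Readf.run_generator with its default downsample_factor=2.
downsample_factor = 2
input_length = (img_h + 4) // downsample_factor - 2

print(time_steps, features)  # 75 400
print(input_length)          # 50
```

Under these assumptions the RNN sees 75 time steps of 400 features each, and the generator tells the CTC loss to use the first 50 of them. The value passed as `input_length` should not exceed the number of time steps in `y_pred`, which holds here.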


In [4]:
class CRNN:
    def __init__(self, img_w, img_h, output_size, max_len):
        self.img_w = img_w
        self.img_h = img_h
        self.output_size = output_size
        self.max_len = max_len

        # Network parameters
        self.conv_filters = 16
        self.kernel_size = (3, 3)
        self.pool_size = 2
        self.time_dense_size = 32
        self.rnn_size = 512

        self.model = self.build_model()

    def ctc_lambda_func(self, args):
        y_pred, labels, input_length, label_length = args
        #y_pred = y_pred[:, 2:, :]

        return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

    def build_model(self):
        # Input layer
        input_data = Input(name='the_input', shape=(self.img_h ,self.img_w), dtype='float32')
        
        # Expand dimensions to include channel dimension
        expanded_input = Lambda(lambda x: K.expand_dims(x, axis=-1))(input_data)
        
        # Convolutional layers
        conv_1 = Conv2D(self.conv_filters, self.kernel_size, padding='same', activation='relu', name='conv1')(expanded_input)
        pool_1 = MaxPooling2D(pool_size=(self.pool_size, self.pool_size), name='pool1')(conv_1)
        
        conv_2 = Conv2D(self.conv_filters, self.kernel_size, padding='same', activation='relu', name='conv2')(pool_1)
        pool_2 = MaxPooling2D(pool_size=(self.pool_size, self.pool_size), name='pool2')(conv_2)
        

        # Reshape the CNN output into a sequence for the RNN:
        # (time steps, features per step) = (img_w // 4, img_h // 4 * conv_filters)
        conv_to_rnn_dims = (self.img_w // (self.pool_size * 2), self.img_h // (self.pool_size * 2) * self.conv_filters)
        reshaped = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(pool_2)
        
        # Dense layer
        dense = Dense(self.time_dense_size, activation='relu', name='dense')(reshaped)
        
        # RNN layers
        rnn = Bidirectional(LSTM(self.rnn_size, return_sequences=True), name='biLSTM')(dense)
        
        # Output layer
        y_pred = Dense(self.output_size, activation='softmax', name='softmax')(rnn)
        
        
       
        
        labels = Input(name='the_labels', shape=[self.max_len], dtype='float32')
        input_length = Input(name='input_length', shape=[1], dtype='int64')
        label_length = Input(name='label_length', shape=[1], dtype='int64')
        
        ctc_loss = Lambda(self.ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
        
        model = Model(inputs=[input_data, labels, input_length, label_length], outputs=[ctc_loss, y_pred])
        model.summary()



        return model

## CTC Loss Explained in the CRNN Model

The code above defines a custom CTC loss function (`ctc_lambda_func`) used within a CRNN model for Optical Character Recognition (OCR). Here's a breakdown of CTC and its role in this context:

**Connectionist Temporal Classification (CTC):**

* CTC is a loss function designed for sequence labeling tasks where the alignment between input and output is unknown, such as OCR.
* In OCR, the model emits one prediction per time step, so the raw output is usually longer than the text it represents.
* CTC bridges this gap with an extra blank symbol: an output path is collapsed by merging repeated symbols and then removing blanks, so many different paths map to the same text and no per-time-step annotation is needed.

**How CTC Works:**

* CTC considers every alignment (path) of the model output that collapses to the ground truth.
* The probability of a path is the product of the predicted probabilities of its symbols at each time step.
* The CTC loss is the negative log of the sum of these path probabilities, that is, the negative log-likelihood of the ground truth over all valid alignments, not only the most probable one. In practice the sum is computed with dynamic programming rather than by enumeration.
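For a tiny example, the sum over alignments can be computed by brute force. The sketch below is only meant to illustrate the definition; it enumerates every path, which is exponential in the number of time steps, and is not how `K.ctc_batch_cost` is implemented. The choice of three time steps, two characters, and blank index 2 is an assumption made for the example.

```python
import itertools
import math
import numpy as np

def collapse(path, blank):
    # Merge consecutive repeats, then drop blanks.
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

def ctc_neg_log_likelihood(probs, target, blank):
    # probs has shape (time steps, classes); each row sums to 1.
    n_steps, n_classes = probs.shape
    total = 0.0
    for path in itertools.product(range(n_classes), repeat=n_steps):
        if collapse(path, blank) == list(target):
            total += float(np.prod([probs[t, s] for t, s in enumerate(path)]))
    return -math.log(total)

# Uniform predictions over {0, 1, blank=2} for 3 time steps, target sequence [0].
probs = np.full((3, 3), 1.0 / 3.0)
loss = ctc_neg_log_likelihood(probs, target=[0], blank=2)
print(loss)
```

With uniform predictions, six of the 27 possible paths collapse to `[0]` (for example `0,0,2` and `2,0,2`, but not `0,2,0`, which collapses to `[0, 0]`), so the loss equals `-log(6/27)`.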

**Implementation in the Code:**

* `ctc_lambda_func` wraps the CTC loss so that it can be attached to the model as a `Lambda` layer named `ctc`.
* It receives a single list, which it unpacks into four tensors:
    * `y_pred`: The softmax output of the model, holding the predicted probability of each class at each time step.
    * `labels`: The ground truth as integer class indices (not one-hot encoded), padded to `max_len` with the blank index by `Readf`.
    * `input_length`: The number of time steps of `y_pred` to use for each sample; `run_generator` sets the same value for every sample.
    * `label_length`: The true number of labels in each ground truth sequence, which varies with the text length.
* The function calls Keras' `K.ctc_batch_cost` to compute one CTC loss value per sample from these arguments.

**Benefits of CTC:**

* CTC's ability to handle variable-length sequences makes it well-suited for OCR tasks.
* It removes the need for character-level segmentation of the training images: only each image and its transcription are required, because the loss accounts for all valid alignments.

**Overall, the CTC loss function plays a crucial role in training the CRNN model. It helps the model learn to predict the most likely character sequences for the input images by penalizing deviations from the ground truth while accounting for potential variations in sequence lengths.**