
##Objective:
To create a model that summarizes the details on any government ID. Given an ID image, it
outputs the ID's textual information and its picture in summarized form.

##Approach:
1. Data Collection and Pre-processing:
   a) Gather a large dataset of government ID images and their corresponding textual information. Ensure that the data is diverse and representative of various ID types and formats.
   b) Preprocess the images and textual data to prepare them for training. This includes resizing images to a consistent size, converting textual data into numerical representations, and normalizing the data. The `preprocess_image` function in the sample code is one example of an image pre-processing step.
2. Building the Model:
   a) Use TensorFlow to create a deep learning model that combines computer vision and natural language processing components.
   b) For the image processing part, consider using convolutional neural networks (CNNs) to extract features from the ID images. Options include pre-trained CNN architectures such as VGG or ResNet, or custom-designed CNN layers.
   c) For the text processing part, use techniques like word embeddings (e.g., Word2Vec, GloVe) to convert words into numerical representations, and feed them through recurrent neural networks (RNNs) or transformer-based models like BERT.
   d) Merge the outputs from the image and text processing components to create a joint representation.
3. Loss Function and Optimization:
   a) Define an appropriate loss function that takes into account both the image and textual components of the model.
   b) Use an optimization algorithm such as stochastic gradient descent (SGD), Adam, or RMSprop to train the model.
4. Training:
   a) Split the dataset into training and validation sets.
   b) Train the model on the training set and monitor its performance on the validation set. Adjust hyperparameters as needed.
5. Evaluation:
   a) Evaluate the model on a separate test dataset to assess how accurately it generates summaries.
6. Deployment:
   a) Once the model is trained and evaluated, deploy it in a production environment where it takes an input image of a government ID and returns both the textual information and a summarized image.
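
The training, validation, and test sets mentioned in steps 4 and 5 can be produced with a simple shuffled split. The sketch below is only an illustration: the function name `split_dataset`, the 80/10/10 proportions, the fixed seed, and the file names are all assumptions, not part of the original notebook.

```python
import random

def split_dataset(samples, val_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle samples and split them into train, validation, and test lists."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")

    # Shuffle a copy so the caller's list is left untouched
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)

    n_test = int(round(len(shuffled) * test_fraction))
    n_val = int(round(len(shuffled) * val_fraction))

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical (image file, text) pairs standing in for a real ID dataset
samples = [(f"id_{i:03d}.jpg", f"text {i}") for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 80 10 10
```

Keeping the test set separate from the validation set matters because hyperparameters are tuned against the validation set, so only the untouched test set gives an unbiased estimate of performance.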

##Sample Code:

Note: This code is for demonstration purposes only and is meant to convey the intuition behind the task.

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, LSTM, Dense
from tensorflow.keras.models import Model
import cv2


# Dummy data
# In a real application, replace these with actual ID images and their corresponding texts.
# The number of images must match the number of texts (6 words * 20 = 120 samples).
images = tf.random.normal((120, 224, 224, 3))
texts = ["hello", "world", "tensorflow", "example", "text", "extraction"] * 20

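def preprocess_image(image_path):
    # Read the image from the provided file path (OpenCV loads it in BGR channel order)
    image = cv2.imread(image_path)

    # cv2.imread returns None instead of raising when the file is missing or unreadable
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Convert the image to grayscale
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Apply Gaussian blur to reduce noise before thresholding
    blurred_image = cv2.GaussianBlur(gray_image, (3, 3), 0)

    # Apply Otsu thresholding to make the text more prominent
    _, threshold_image = cv2.threshold(blurred_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # The result is a single-channel uint8 image with values 0 or 255
    return threshold_image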

# This callback monitors training progress and halts training once the loss drops below a threshold
class myCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    '''
    Halts the training once the training loss falls below 0.1

    Args:
      epoch (integer) - index of epoch (required but unused in the function definition below)
      logs (dict) - metric results from the training epoch
    '''

    # Check the loss (it may be missing if logs is empty)
    loss = (logs or {}).get('loss')
    if loss is not None and loss < 0.1:

      # Stop if threshold is met
      print("\nLoss is lower than 0.1 so cancelling training!")
      self.model.stop_training = True

# Instantiate class
callbacks = myCallback()


"""
The disabled code below (kept inside this string so that it does not run) uses the Keras ImageDataGenerator class
to generate augmented image data for training.
ImageDataGenerator applies data augmentation on the fly during training (here: small rotations, shifts, zoom, shear, and brightness changes),
which exposes the model to a wider range of data, helps prevent overfitting, and improves generalization to unseen data.
The code defines an augmenting generator and a rescale-only generator, then creates two data generators from them: train_data_gen and val_data_gen.
Data augmentation is common practice when training deep learning models, especially with limited training data.
To run this code, first import ImageDataGenerator from tensorflow.keras.preprocessing.image and define batch_size, IMG_HEIGHT, and IMG_WIDTH.

augmented_image_gen = ImageDataGenerator(
    rescale = 1/255.0,
    rotation_range=2,
    width_shift_range=.1,
    height_shift_range=.1,
    zoom_range=0.1,
    shear_range=2,
    brightness_range=[0.9, 1.1],
    validation_split=0.2,
   )

normal_image_gen = ImageDataGenerator(
    rescale = 1/255.0,
    validation_split=0.2,
   )

train_data_gen = augmented_image_gen.flow_from_directory(batch_size=batch_size,
      directory="tensorflow_training_images",
      shuffle=True,
      target_size=(IMG_HEIGHT, IMG_WIDTH),
      class_mode="categorical",
      subset='training')
val_data_gen = normal_image_gen.flow_from_directory(batch_size=batch_size,
      directory="tensorflow_training_images",
      shuffle=True,
      target_size=(IMG_HEIGHT, IMG_WIDTH),
      class_mode="categorical",
      subset='validation')

"""


# Special tokens: padding (kept at index 0), start of text, and end of text
PAD_TOKEN, START_TOKEN, END_TOKEN = '<pad>', '^', '$'

# Wrap every text in start/end tokens so the decoder learns where a text begins and ends
tagged_texts = [START_TOKEN + text + END_TOKEN for text in texts]

# Create a vocabulary from all unique characters in the texts, reserving index 0 for padding
vocab = [PAD_TOKEN] + sorted(set("".join(tagged_texts)))
vocab_size = len(vocab)

# Create a mapping from characters to indices and vice versa
char_to_idx = {char: idx for idx, char in enumerate(vocab)}
idx_to_char = {idx: char for char, idx in char_to_idx.items()}

# Convert texts to numerical sequences
sequences = [[char_to_idx[char] for char in text] for text in tagged_texts]

# Pad sequences (with index 0) to have the same length
max_seq_length = max(len(seq) for seq in sequences)
padded_sequences = tf.keras.preprocessing.sequence.pad_sequences(
    sequences, maxlen=max_seq_length, padding='post'
)

# Teacher forcing: the decoder reads characters 0..T-2 and learns to predict characters 1..T-1
decoder_inputs = padded_sequences[:, :-1]
decoder_targets = padded_sequences[:, 1:]
decoder_length = max_seq_length - 1

# Model architecture
input_shape = (224, 224, 3)
num_units = 128

# CNN for image processing
image_input = tf.keras.layers.Input(shape=input_shape, name='image_input')
x = tf.keras.layers.Conv2D(32, (3, 3), activation='relu')(image_input)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(64, (3, 3), activation='relu')(x)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(128, (3, 3), activation='relu')(x)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)
x = tf.keras.layers.Flatten()(x)
image_features = tf.keras.layers.Dense(num_units, activation='relu', name='image_features')(x)

# RNN for text decoding; return_sequences=True yields one output per character position
text_input = tf.keras.layers.Input(shape=(decoder_length,), dtype='int32', name='text_input')
t = tf.keras.layers.Embedding(vocab_size, num_units)(text_input)
t = tf.keras.layers.LSTM(num_units, return_sequences=True)(t)
text_features = tf.keras.layers.Dense(num_units, activation='relu')(t)

# Combine CNN and RNN outputs: repeat the image features at every character position, then concatenate
repeated_image_features = tf.keras.layers.RepeatVector(decoder_length)(image_features)
combined_output = tf.keras.layers.concatenate([repeated_image_features, text_features])
output = tf.keras.layers.Dense(vocab_size, activation='softmax')(combined_output)

# Create the model
model = tf.keras.models.Model(inputs=[image_input, text_input], outputs=output)

# Compile the model; sparse categorical cross-entropy accepts integer character indices as targets
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
model.compile(optimizer=optimizer, loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=['sparse_categorical_accuracy'])

# Train the model
history = model.fit(x=[images, decoder_inputs], y=decoder_targets, batch_size=32, epochs=150, callbacks=[callbacks])

# Function to predict text from an image
def extract_text_from_image(image_array):
    # Bring the image to the model's input format: float32, (224, 224, 3), values in [0, 1]
    image = tf.cast(image_array, tf.float32)
    if len(image.shape) == 2:
        # Grayscale or thresholded input: add a channel axis and replicate it to three channels
        image = tf.image.grayscale_to_rgb(image[..., tf.newaxis])
    image = tf.image.resize(image, input_shape[:2]) / 255.0

    # Convert the image to batch format: (1, 224, 224, 3)
    image_batch = tf.expand_dims(image, 0)

    # Initialize the output sequence with the start token
    output_indices = [char_to_idx[START_TOKEN]]

    # Predict the text character by character using the RNN
    for step in range(decoder_length):
        # Pad the characters generated so far to the fixed decoder length
        decoder_input = tf.keras.preprocessing.sequence.pad_sequences(
            [output_indices], maxlen=decoder_length, padding='post'
        )
        prediction = model.predict([image_batch, decoder_input], verbose=0)

        # The output at position `step` is the distribution over the next character
        predicted_char_index = int(prediction[0, step].argmax())

        # Stop if the end token is predicted
        if predicted_char_index == char_to_idx[END_TOKEN]:
            break
        output_indices.append(predicted_char_index)

    # Convert the numerical sequence back to text, skipping the start token and any padding
    special_indices = {char_to_idx[PAD_TOKEN], char_to_idx[START_TOKEN]}
    return "".join(idx_to_char[idx] for idx in output_indices if idx not in special_indices)

# Example usage:
image_path = 'image_path.jpg'
image_array = preprocess_image(image_path)
extracted_text = extract_text_from_image(image_array)
print("Extracted Text:")
print(extracted_text)
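
For the evaluation step, one common way to score extracted text against the ground truth is the character error rate (CER): the edit distance between the two strings divided by the length of the reference. The notebook does not prescribe a metric, so this is only a sketch of one reasonable choice. The helper names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # previous_row[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current_row.append(min(
                previous_row[j] + 1,         # delete a reference character
                current_row[j - 1] + 1,      # insert a hypothesis character
                previous_row[j - 1] + cost,  # substitute or match
            ))
        previous_row = current_row
    return previous_row[-1]

def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length (0.0 means a perfect match)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

print(character_error_rate("tensorflow", "tensorflaw"))  # 0.1
print(character_error_rate("hello", "helo"))  # 0.2
```

Averaging this value over a held-out test set gives a single number that can be tracked across training runs.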
