# **Text Detection in Natural Images.**

**Task: Given an image, the model should predict bounding boxes for all text regions in the image.**

Model Input: Image

Model Output: Text bounding boxes



**Model Used** : Frozen EAST Model
<br>
The Frozen EAST (Efficient and Accurate Scene Text detection) model is a deep learning-based text detector designed to locate text efficiently and accurately in natural scenes. "Frozen" refers to how the model is distributed: a TensorFlow graph whose trained weights are stored as constants in a single `.pb` file, which OpenCV's `dnn` module can load directly. For each image, the model predicts where text is located and how it is oriented, in the form of rotated bounding boxes. Typical uses include reading text in photos or scanned documents.


## Methodology

The script processes a dataset of images using a text detection model. The steps involved include:

1. **Setting Up:**
   - Default dimensions (`inpWidth` and `inpHeight`) are established for image preprocessing.
   - Counters (`k` and `l`) are initialized for tracking processed images.

2. **Directory and Image Looping:**
   - It iterates through images in the specified directory.
   - Each image is read and its path is printed.

3. **Image Processing:**
   - Images are preprocessed for the neural network using OpenCV.
   - Preprocessed images are set as input to the neural network.

4. **Neural Network Inference:**
   - A forward pass through the neural network yields text detection results.
   - The `decode` function is applied for result decoding and post-processing, including Non-Maximum Suppression (NMS).

5. **Bounding Box Drawing:**
   - Ratios between input image dimensions and original image dimensions are computed.
   - Bounding boxes are drawn on the original images using OpenCV, connecting the 4 corners of rotated rectangles.

6. **Saving Processed Images:**
   - Processed images with drawn bounding boxes are saved to an output directory (for example `/content/gdrive/MyDrive/scene/Output`), with the prefix `processed_{k}` prepended to the original filename.

7. **Iteration and Counter Update:**
   - The counter (`k`) is incremented for each processed image.
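
Step 5 can be illustrated in isolation. The sketch below is a minimal, hypothetical helper (`scale_vertices` is not part of the original script) that maps box corners from the 320x320 network input back to the original image size, using the same width and height ratios that the script computes. The corner values and the 1280x960 image size are made up for illustration.

```python
def scale_vertices(vertices, orig_width, orig_height, inp_width=320, inp_height=320):
    """Map (x, y) corners from network-input coordinates to original-image coordinates."""
    r_w = orig_width / float(inp_width)    # horizontal ratio (rW in the script)
    r_h = orig_height / float(inp_height)  # vertical ratio (rH in the script)
    return [(x * r_w, y * r_h) for x, y in vertices]


# Hypothetical example: a box found in the 320x320 input of a 1280x960 photo.
corners = [(80, 40), (160, 40), (160, 80), (80, 80)]
scaled = scale_vertices(corners, orig_width=1280, orig_height=960)
print(scaled)  # [(320.0, 120.0), (640.0, 120.0), (640.0, 240.0), (320.0, 240.0)]
```

Because the image is resized to a square input without preserving its aspect ratio, the horizontal and vertical ratios generally differ, which is why they are applied separately.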

Uploading Dataset

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
%cd

/root


In [None]:
# !unzip /content/gdrive/MyDrive/Internship/scene.zip

Archive:  /content/gdrive/MyDrive/Internship/scene.zip
   creating: scene/SceneTrialTrain/
   creating: scene/SceneTrialTrain/apanar_06.08.2002/
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1247.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1252.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1253.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1255.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1259.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1261.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1263.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1265.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1269.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1281.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1282.JPG  
  inflating: scene/SceneTrialTrain/apanar_06.08.2002/IMG_1283.JPG  
  inflating: scene/SceneTrialTrain/apan

In [None]:
# The archive is a gzipped tarball, not a zip file, so it is extracted with tar rather than unzip.
!tar -xzf /content/gdrive/MyDrive/Internship/frozen_east_text_detection.tar.gz -C /content/gdrive/MyDrive/Internship
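
The model archive has a `.tar.gz` extension, so it is a gzipped tarball rather than a zip file. As a minimal sketch, it could also be extracted from Python with the standard `tarfile` module. The helper name `extract_model` is hypothetical, and the sketch assumes the archive contains `frozen_east_text_detection.pb`, the file loaded later in this notebook.

```python
import tarfile


def extract_model(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Possible usage with the paths from this notebook:
# extract_model(
#     "/content/gdrive/MyDrive/Internship/frozen_east_text_detection.tar.gz",
#     "/content/gdrive/MyDrive/Internship",
# )
```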


Import libraries

In [None]:
import os
import cv2
import math
import numpy as np


In [None]:
# the path to the directory containing the dataset
dataset_path = '/content/gdrive/MyDrive/scene/SceneTrialTrain/apanar_06.08.2002'  # Update with the correct path

# Print the dataset directory
print(os.listdir(dataset_path))


['IMG_1247.JPG', 'IMG_1304.JPG', 'IMG_1284.JPG', 'IMG_1252.JPG', 'IMG_1308.JPG', 'IMG_1289.JPG', 'IMG_1306.JPG', 'IMG_1263.JPG', 'IMG_1312.JPG', 'IMG_1311.JPG', 'IMG_1294.JPG', 'IMG_1299.JPG', 'IMG_1315.JPG', 'IMG_1303.JPG', 'IMG_1255.JPG', 'IMG_1286.JPG', 'IMG_1293.JPG', 'IMG_1282.JPG', 'IMG_1301.JPG', 'IMG_1288.JPG', 'IMG_1281.JPG', 'IMG_1292.JPG', 'IMG_1269.JPG', 'IMG_1307.JPG', 'IMG_1253.JPG', 'IMG_1285.JPG', 'IMG_1265.JPG', 'IMG_1317.JPG', 'IMG_1261.JPG', 'IMG_1283.JPG', 'Img_1305.jpg', 'IMG_1302.JPG', 'IMG_1316.JPG', 'IMG_1300.JPG', 'IMG_1298.JPG', 'IMG_1290.JPG', 'Img_1313.jpg', 'IMG_1291.JPG', 'IMG_1259.JPG']
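
The listing above mixes upper-case (`.JPG`) and lower-case (`.jpg`) extensions. A minimal sketch of a case-insensitive image filter follows; `is_image` and `IMAGE_EXTENSIONS` are hypothetical names, and `notes.txt` and `scanJPG` are made-up entries added to show what gets rejected.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_image(filename):
    """Return True if the file extension, ignoring case, is a known image type."""
    return os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS


names = ["IMG_1247.JPG", "Img_1305.jpg", "notes.txt", "scanJPG"]
print([n for n in names if is_image(n)])  # ['IMG_1247.JPG', 'Img_1305.jpg']
```

Comparing the lower-cased extension avoids listing every capitalisation and does not match names that merely end in the letters `JPG` without a dot.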


In [None]:
net = cv2.dnn.readNet("/content/gdrive/MyDrive/Internship/frozen_east_text_detection.pb")   #This is the model we get after extraction
frame = cv2.imread("/content/gdrive/MyDrive/scene/SceneTrialTrain/apanar_06.08.2002/IMG_1247.JPG")
inpWidth = inpHeight = 320  # A default dimension
# Preparing a blob to pass the image through the neural network
# Subtracting mean values used while training the model.
image_blob = cv2.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (123.68, 116.78, 103.94), True, False)
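
To make the `blobFromImage` arguments concrete, the sketch below mimics what happens to a single pixel. It assumes OpenCV's documented behaviour that, with `swapRB=True`, the mean values are given in (R, G, B) order and are subtracted before the scale factor is applied. `preprocess_pixel` is a hypothetical helper, and the pixel value is made up.

```python
MEAN_RGB = (123.68, 116.78, 103.94)  # mean values passed to blobFromImage above


def preprocess_pixel(bgr_pixel, scale=1.0, mean_rgb=MEAN_RGB):
    """Mimic, for one pixel, swapRB=True followed by mean subtraction and scaling."""
    b, g, r = bgr_pixel
    rgb = (r, g, b)  # swapRB=True: OpenCV loads images as BGR, the mean is given as RGB
    return tuple(scale * (value - mean) for value, mean in zip(rgb, mean_rgb))


# Hypothetical BGR pixel; the result is ordered (R, G, B).
print(preprocess_pixel((100, 150, 200)))
```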

**"feature_fusion/Conv_7/Sigmoid":**

- This is the part of the neural network responsible for determining potential locations of text in an image.
- It acts as a filter that examines each area of the image and assigns a score (confidence) to each location, indicating the likelihood that it contains text.
- The "Sigmoid" component ensures that the scores fall within the range 0 to 1. Higher scores signify greater confidence that text is present.
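
As a small illustration of the sigmoid step, the sketch below squashes a few made-up raw activations into the range 0 to 1 and keeps those that reach a threshold of 0.5, the value used for `confThreshold` later in this notebook. The variable names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


raw_activations = [-4.0, 0.0, 2.5]  # hypothetical pre-sigmoid values
example_scores = [sigmoid(v) for v in raw_activations]
kept = [s for s in example_scores if s >= 0.5]  # scores below the threshold are skipped
print(example_scores, kept)
```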


**"feature_fusion/concat_3":**

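- This layer outputs the geometry of the text regions: for every location it provides the values needed to reconstruct a rotated bounding box, describing the layout and orientation of the text.

- The output has five channels per location, which is what the `decode` function below checks for: four distances that determine the extent of the box around the location, and one rotation angle. The `_3` suffix is most likely just an index that TensorFlow appends to distinguish this concatenation from earlier ones in the graph; it does not mean that three sources are merged.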
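
The `decode` function later in this notebook reads five geometry channels per location. The sketch below applies the same arithmetic to a single cell so the numbers can be followed by hand. `decode_cell` is a hypothetical helper and the input values are made up.

```python
import math


def decode_cell(x, y, d0, d1, d2, d3, angle, stride=4.0):
    """Turn one feature-map cell into (center, (w, h), angle_degrees).

    Mirrors the arithmetic of the notebook's decode(): d0..d3 are geometry
    channels 0..3 and angle is channel 4, in radians.
    """
    offset_x, offset_y = x * stride, y * stride  # cell position in input-image pixels
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    h = d0 + d2
    w = d1 + d3
    off = (offset_x + cos_a * d1 + sin_a * d2,
           offset_y - sin_a * d1 + cos_a * d2)
    p1 = (-sin_a * h + off[0], -cos_a * h + off[1])
    p3 = (-cos_a * w + off[0], sin_a * w + off[1])
    center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
    return center, (w, h), -angle * 180.0 / math.pi


# Hypothetical cell (x=10, y=5) with an unrotated box.
center, size, angle_deg = decode_cell(10, 5, d0=8.0, d1=30.0, d2=12.0, d3=20.0, angle=0.0)
print(center, size, angle_deg)
```

With a zero angle the four distances behave as top, right, bottom and left offsets from the cell position (40, 20): the box spans x from 20 to 70 and y from 12 to 32, giving a center of (45, 22), a width of 50 and a height of 20.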


In [None]:
output_layer = [
    "feature_fusion/Conv_7/Sigmoid",  # text confidence scores
    "feature_fusion/concat_3",        # box geometry
]

In [None]:
net.setInput(image_blob)
output = net.forward(output_layer)
scores = output[0]
geometry = output[1]

In [None]:
def decode(scores, geometry, scoreThresh):
    detections = []
    confidences = []

    ############ CHECK DIMENSIONS AND SHAPES OF geometry AND scores ############
    assert len(scores.shape) == 4, "Incorrect dimensions of scores"
    assert len(geometry.shape) == 4, "Incorrect dimensions of geometry"
    assert scores.shape[0] == 1, "Invalid dimensions of scores"
    assert geometry.shape[0] == 1, "Invalid dimensions of geometry"
    assert scores.shape[1] == 1, "Invalid dimensions of scores"
    assert geometry.shape[1] == 5, "Invalid dimensions of geometry"
    assert scores.shape[2] == geometry.shape[2], "Invalid dimensions of scores and geometry"
    assert scores.shape[3] == geometry.shape[3], "Invalid dimensions of scores and geometry"
    height = scores.shape[2]
    width = scores.shape[3]
    for y in range(0, height):

        # Extract the score row and the five geometry rows for this y
        scoresData = scores[0][0][y]
        x0_data = geometry[0][0][y]
        x1_data = geometry[0][1][y]
        x2_data = geometry[0][2][y]
        x3_data = geometry[0][3][y]
        anglesData = geometry[0][4][y]
        for x in range(0, width):
            score = scoresData[x]

            # If score is lower than threshold score, move to next x
            if (score < scoreThresh):
                continue

            # Map the feature-map cell back to input-image coordinates
            # (the output maps are 4x smaller than the network input)
            offsetX = x * 4.0
            offsetY = y * 4.0
            angle = anglesData[x]

            # Calculate cos and sin of angle
            cosA = math.cos(angle)
            sinA = math.sin(angle)
            h = x0_data[x] + x2_data[x]
            w = x1_data[x] + x3_data[x]

            # Corner of the box from which the rectangle is built
            # (the bottom-right corner when the angle is zero)
            offset = ([offsetX + cosA * x1_data[x] + sinA * x2_data[x], offsetY - sinA * x1_data[x] + cosA * x2_data[x]])

            # p1 and p3 are opposite corners of the rectangle; their midpoint is the box center
            p1 = (-sinA * h + offset[0], -cosA * h + offset[1])
            p3 = (-cosA * w + offset[0], sinA * w + offset[1])
            center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
            detections.append((center, (w, h), -1 * angle * 180.0 / math.pi))
            confidences.append(float(score))

    # Return detections and confidences
    return [detections, confidences]

In [None]:
confThreshold = 0.5
nmsThreshold = 0.3
[boxes, confidences] = decode(scores, geometry, confThreshold)
print("Before NMS - Boxes:", boxes)

indices = cv2.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)
print("After NMS - Indices:", indices)


Before NMS - Boxes: [((161.9157034615821, 126.76009177596072), (70.32239, 234.0636), -2.1353971723019565), ((164.93022602236312, 125.41121802325212), (70.495255, 234.05621), -2.502564236419747), ((160.92649868650665, 121.06581245211466), (66.555565, 212.1246), -1.9601857446182045), ((163.185644027931, 119.29403321174371), (66.61362, 214.11603), -2.3956103042065173), ((165.83314089460546, 119.38563417710002), (67.63495, 218.023), -2.644031114509051), ((167.85450328300774, 118.93717710933227), (68.709175, 218.36014), -3.3362878623627683), ((161.21534947540727, 119.41888125497927), (64.538376, 199.77347), -2.032944334751765), ((162.36764892336294, 113.18743150738108), (63.45588, 191.17123), -2.269241559277526), ((163.99697781784545, 112.52121118268353), (64.09837, 197.59515), -2.1343897193993797), ((166.46078314441604, 113.3611904777438), (65.39198, 201.9715), -2.920899662704688), ((166.95614593794204, 113.28275523064343), (66.61212, 202.83267), -3.396584772355641), ((162.24050915882572, 
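
The printed list shows many near-identical boxes around the same text, which is what Non-Maximum Suppression removes. The sketch below is a simplified, pure-Python illustration of the greedy idea on axis-aligned boxes; it is not how `cv2.dnn.NMSBoxesRotated` is implemented, since that function measures overlap between rotated rectangles. The function names and example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, confidences, score_thresh, nms_thresh):
    """Return indices of boxes kept after greedy non-maximum suppression."""
    order = sorted(
        (i for i, c in enumerate(confidences) if c > score_thresh),
        key=lambda i: confidences[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a stronger kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in kept):
            kept.append(i)
    return kept


example_boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 50, 260, 80)]
example_conf = [0.9, 0.8, 0.7]
print(greedy_nms(example_boxes, example_conf, score_thresh=0.5, nms_thresh=0.3))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.89, well above the 0.3 threshold, so only the higher-scoring one survives; the third box does not overlap either and is kept.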

In [None]:
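# Inspect a single detection (index 77 of the boxes returned by decode)
# Get the rotated bounding box parameters
(cx, cy), (w, h), angle = boxes[77]

# Get the 4 corners of the rotated rectangle
rect = cv2.boxPoints(((cx, cy), (w, h), angle))
rect = rect.astype(np.int32)  # np.int0 was removed in NumPy 2.0

# Draw the rotated bounding box on the image.
# Note: the coordinates are still in the 320x320 network input space,
# not yet rescaled to the original image size.
cv2.drawContours(frame, [rect], 0, (0, 255, 0), 2)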




array([[[122, 126, 127],
        [121, 125, 126],
        [122, 125, 129],
        ...,
        [135, 138, 142],
        [135, 138, 142],
        [135, 138, 143]],

       [[123, 128, 127],
        [123, 127, 128],
        [125, 129, 130],
        ...,
        [133, 136, 141],
        [133, 136, 141],
        [132, 134, 142]],

       [[123, 128, 126],
        [123, 128, 127],
        [125, 130, 129],
        ...,
        [135, 137, 145],
        [135, 136, 146],
        [136, 137, 147]],

       ...,

       [[ 94,  99, 108],
        [ 95, 100, 109],
        [ 95, 100, 109],
        ...,
        [ 94, 106, 112],
        [ 96, 107, 111],
        [ 94, 105, 109]],

       [[ 95,  98, 106],
        [ 97, 100, 108],
        [ 97, 100, 108],
        ...,
        [ 94, 108, 114],
        [ 96, 108, 112],
        [ 93, 105, 109]],

       [[ 95,  98, 106],
        [ 98, 101, 109],
        [ 99, 102, 110],
        ...,
        [ 96, 110, 116],
        [ 98, 110, 114],
        [ 95, 107, 111]]

In [None]:
# Set the default dimensions
inpWidth, inpHeight = 320, 320
k = 0
l = 0

dir = "/content/gdrive/MyDrive/scene/SceneTrialTrain"
# Loop through each image in the dataset
for path in [x[1] for x in os.walk("/content/gdrive/MyDrive/scene/SceneTrialTrain")]:
  for pt in path:
    for filename in os.listdir(os.path.join(dir,pt)):
        if filename.lower().endswith(('.jpg', '.jpeg', '.png')):
            # Read the image
            img_path = os.path.join(os.path.join(dir,pt), filename)
            print(img_path)
            frame = cv2.imread(img_path)
            # Preprocess the image
            image_blob = cv2.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (123.68, 116.78, 103.94), True, False)

            # Set the input for the neural network
            net.setInput(image_blob)

            # Forward pass through the neural network
            output = net.forward(["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"])
            scores = output[0]
            geometry = output[1]

            # Decode the network output and apply Non-Maximum Suppression (NMS)
            confThreshold = 0.5
            nmsThreshold = 0.3
            [boxes, confidences] = decode(scores, geometry, confThreshold)
            indices = cv2.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)

            # Get image dimensions
            height_ = frame.shape[0]
            width_ = frame.shape[1]
            rW = width_ / float(inpWidth)
            rH = height_ / float(inpHeight)

            # Draw bounding boxes on the original image
            for i in indices:
                # Get 4 corners of the rotated rect
                vertices = cv2.boxPoints(boxes[i])

                # Scale the bounding box coordinates based on the respective ratios
                for j in range(4):
                    vertices[j][0] *= rW
                    vertices[j][1] *= rH

                # Draw lines connecting the 4 corners to form the rotated rectangle
                for j in range(4):
                    p1 = (int(vertices[j][0]), int(vertices[j][1]))
                    p2 = (int(vertices[(j + 1) % 4][0]), int(vertices[(j + 1) % 4][1]))
                    cv2.line(frame, p1, p2, (0, 255, 0), 3)

            # Save the processed image
            cv2.imwrite(os.path.join('/content/gdrive/MyDrive/scene/Output', f"processed_{k}" + filename), frame)
            k = k + 1




/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2504.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2468.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2509.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2466.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2510.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2484.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2473.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2518.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2520.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2461.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2511.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2506.JPG
/content/gdrive/MyDrive/scene/SceneTrialTrain/lfsosa_12.08.2002/IMG_2478.JPG

**Using Testing Data**

In [None]:
# Set the default dimensions
inpWidth, inpHeight = 320, 320
k = 0
l = 0

dir = "/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest"
# Loop through each image in the dataset
for path in [x[1] for x in os.walk("/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest")]:
  for pt in path:
    for filename in os.listdir(os.path.join(dir,pt)):
        if filename.lower().endswith(('.jpg', '.jpeg', '.png')):  # Add more image extensions if needed
            # Read the image
            img_path = os.path.join(os.path.join(dir,pt), filename)
            print(img_path)
            frame = cv2.imread(img_path)
            # Preprocess the image
            image_blob = cv2.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (123.68, 116.78, 103.94), True, False)

            # Set the input for the neural network
            net.setInput(image_blob)

            # Forward pass through the neural network
            output = net.forward(["feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"])
            scores = output[0]
            geometry = output[1]

            # Decode the network output and apply Non-Maximum Suppression (NMS)
            confThreshold = 0.5
            nmsThreshold = 0.3
            [boxes, confidences] = decode(scores, geometry, confThreshold)
            indices = cv2.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)

            # Get image dimensions
            height_ = frame.shape[0]
            width_ = frame.shape[1]
            rW = width_ / float(inpWidth)
            rH = height_ / float(inpHeight)

            # Draw bounding boxes on the original image
            for i in indices:
                # Get 4 corners of the rotated rect
                vertices = cv2.boxPoints(boxes[i])

                # Scale the bounding box coordinates based on the respective ratios
                for j in range(4):
                    vertices[j][0] *= rW
                    vertices[j][1] *= rH

                # Draw lines connecting the 4 corners to form the rotated rectangle
                for j in range(4):
                    p1 = (int(vertices[j][0]), int(vertices[j][1]))
                    p2 = (int(vertices[(j + 1) % 4][0]), int(vertices[(j + 1) % 4][1]))
                    cv2.line(frame, p1, p2, (0, 255, 0), 3)

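            # Save the processed image
            # NOTE: this test run writes to 'Train_Output'; change the folder if test results should be kept separate
            cv2.imwrite(os.path.join('/content/gdrive/MyDrive/scene/Train_Output', f"processed_{k}" + filename), frame)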
            k = k + 1

/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0010.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0050.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0045.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICTs0004.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0015.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0051.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0040.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0014.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0029.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/PICT0048.JPG
/content/gdrive/MyDrive/Internship/Test data/SceneTrialTest/ryoungt_05.08.2002/
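
Both loops write to a fixed output folder. `cv2.imwrite` does not create missing folders and typically reports a failed write through its return value rather than by raising an error, so it can help to create the folder first. The sketch below is one possible way to do that; `output_path_for` is a hypothetical helper, not part of the original script.

```python
import os


def output_path_for(output_dir, index, filename):
    """Create output_dir if needed and return the path for a processed image."""
    os.makedirs(output_dir, exist_ok=True)  # no error if the folder already exists
    return os.path.join(output_dir, f"processed_{index}" + filename)


# Possible usage inside the loops above:
# cv2.imwrite(output_path_for('/content/gdrive/MyDrive/scene/Output', k, filename), frame)
```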

## Results


[**Train_Output**](https://drive.google.com/drive/folders/1SgWaD2dJnHdO6UMq-aiEUy515DzCsRSE?usp=sharing)

[**Test_Output**](https://drive.google.com/drive/folders/1SgWaD2dJnHdO6UMq-aiEUy515DzCsRSE?usp=sharing)



## Conclusion

The script applies the EAST text detector to every image in the training and test folders and saves a copy of each image with the detected text regions outlined by rotated bounding boxes.

