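# **Connected component labeling**
Connected component labeling extracts regions from a binary image, much like ordinary region extraction, except that it groups foreground pixels into components according to whether they are “connected”. The notion of connectivity comes from graph theory: treat each foreground pixel as a vertex and join adjacent pixels with an edge, and each connected component of that graph becomes one labeled object. Adjacency is usually defined with 4-connectivity (the pixels above, below, left and right) or 8-connectivity (those four plus the diagonals).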
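A minimal sketch of the idea in plain Python rather than OpenCV: a breadth-first search floods each group of connected foreground pixels with its own label, using either 4- or 8-connectivity. The function `label_components` and the toy grid are illustrative only; OpenCV implements labeling with its own optimized algorithms, so treat this as a model of the result, not of OpenCV's internals.

```python
from collections import deque

# Neighbour offsets (row, col) for the two usual definitions of adjacency.
NEIGHBOURS = {
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],
    8: [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)],
}

def label_components(grid, connectivity=4):
    """Return (num_labels, labels); label 0 is the background (hypothetical helper)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 or labels[r][c] != 0:
                continue  # background pixel, or already part of a component
            # Breadth-first search floods this whole component with next_label.
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS[connectivity]:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] != 0 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return next_label, labels  # the count includes the background label

grid = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(label_components(grid, 4)[0])  # 4: background + 3 components
print(label_components(grid, 8)[0])  # 3: the diagonal pixels merge
```

On the toy grid, the two diagonal pixels are separate components under 4-connectivity but merge into one under 8-connectivity, which is why the connectivity setting can change the number of labels.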



### Step 1: Image Loading and Preprocessing

Let’s first load our image and convert it to grayscale. This reduces the three color channels to a single intensity channel, which is the input the thresholding step expects, and leaves less data to process.

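```python
import cv2
import numpy as np
# cv2_imshow is Colab's replacement for cv2.imshow; use cv2.imshow elsewhere
from google.colab.patches import cv2_imshow

# Load the image (OpenCV reads color images in BGR channel order)
img = cv2.imread('images/img5.png')
if img is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError("Could not read images/img5.png")

# Convert to a single-channel grayscale image
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

#cv2_imshow(gray_img)
```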
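For reference, OpenCV documents the color-to-gray conversion as the weighted sum `Y = 0.299*R + 0.587*G + 0.114*B`, applied to the B, G, R channels of a BGR image. The hypothetical helpers below reproduce that formula in plain Python on a tiny nested-list image; OpenCV computes it with fixed-point arithmetic, so individual values could differ by one gray level.

```python
def bgr_to_gray(pixel):
    """Approximate cv2.COLOR_BGR2GRAY for one (B, G, R) pixel (hypothetical helper)."""
    b, g, r = pixel  # OpenCV stores color images in B, G, R order
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def image_to_gray(image):
    """Apply bgr_to_gray to every pixel of a nested-list BGR image."""
    return [[bgr_to_gray(px) for px in row] for row in image]

tiny_bgr = [
    [(255, 0, 0), (0, 255, 0)],      # pure blue, pure green
    [(0, 0, 255), (255, 255, 255)],  # pure red, white
]
print(image_to_gray(tiny_bgr))  # [[29, 150], [76, 255]]
```

Green contributes most to perceived brightness, which is why a pure green pixel ends up far lighter than a pure blue one.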

After this, we also apply a 7×7 Gaussian blur. Blurring smooths out noise and fine texture that would otherwise create spurious edges and small specks after thresholding, so the segmentation in the next step comes out cleaner.

```python
# Apply a 7x7 Gaussian blur; sigma=0 lets OpenCV derive sigma from the kernel size
blurred = cv2.GaussianBlur(gray_img, (7, 7), 0)

#cv2_imshow(blurred)
```
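When the sigma argument is 0, as above, OpenCV documents that sigma is derived from the kernel size as `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8`, which gives sigma = 1.4 for a 7×7 kernel. The sketch below builds the corresponding normalized 1D Gaussian weights in plain Python; because the Gaussian is separable, the 2D blur amounts to applying this kernel along the rows and then along the columns. The helper name is hypothetical, and OpenCV may use precomputed weights for small kernels, so its exact coefficients can differ slightly.

```python
import math

def gaussian_kernel_1d(ksize, sigma=0.0):
    """Normalized 1D Gaussian weights; sigma <= 0 uses OpenCV's documented default."""
    if sigma <= 0:
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    half = ksize // 2
    weights = [math.exp(-((i - half) ** 2) / (2 * sigma ** 2)) for i in range(ksize)]
    total = sum(weights)
    return sigma, [w / total for w in weights]  # normalized so the weights sum to 1

sigma, kernel = gaussian_kernel_1d(7)
print(round(sigma, 2))                # 1.4
print([round(w, 3) for w in kernel])  # symmetric, largest weight in the middle
```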

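### Step 2: Thresholding
Thresholding is a basic image segmentation technique that separates the foreground objects we are interested in from the background. After blurring, we call cv2.threshold with cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU. Otsu’s method picks the threshold value automatically from the image histogram, so the 0 we pass as the threshold is ignored. The inverted binary mode then turns pixels darker than the threshold white, so dark objects on a light background become the white (nonzero) foreground that the component analysis labels.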

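```python
# Otsu's method chooses the threshold (the 0 is ignored); THRESH_BINARY_INV
# makes dark pixels white. cv2.threshold returns (threshold_used, image),
# so [1] keeps only the binary image.
threshold = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

#cv2_imshow(threshold)
```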
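To illustrate what the THRESH_OTSU flag does, here is a plain-Python sketch of Otsu's method: it tries every threshold t and keeps the one that maximizes the between-class variance `w0 * w1 * (mu0 - mu1)**2` of the pixels at or below t versus those above it. cv2.threshold returns this chosen value as its first result, which the `[1]` index in the cell above discards. The helper names and sample pixels are made up for illustration, and OpenCV's handling of ties may differ.

```python
def otsu_threshold(pixels):
    """Return the threshold t in 0..255 maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # count and sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is nothing to separate
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict '>' keeps the first maximum
            best_t, best_var = t, var_between
    return best_t

def threshold_binary_inv(pixels, t, maxval=255):
    # Inverted binary: pixels above t become 0, all others become maxval
    return [0 if p > t else maxval for p in pixels]

pixels = [10, 12, 15, 11, 200, 210, 205, 198]  # synthetic dark/light samples
t = otsu_threshold(pixels)
print(t)                               # 15
print(threshold_binary_inv(pixels, t))  # dark samples -> 255, light -> 0
```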

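### Step 3: Applying the Component Analysis Method
We first call cv2.connectedComponentsWithStats on the thresholded image, using 4-connectivity and 32-bit signed integer labels (cv2.CV_32S), and unpack the four values it returns into separate variables for the following steps: the number of labels (including label 0 for the background), a label image the same size as the input, a statistics array with each label’s bounding box and area, and each label’s centroid. We also create an empty (all-black) image to collect the components we keep.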


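```python
# Apply the connected component analysis.
# Pass connectivity and ltype by keyword: positionally, the arguments after
# the image are the optional output arrays (labels, stats, centroids).
analysis = cv2.connectedComponentsWithStats(threshold,
                                            connectivity=4,
                                            ltype=cv2.CV_32S)
(totalLabels, label_ids, values, centroid) = analysis

# Initialize a new image to store
# all the output components
output = np.zeros(gray_img.shape, dtype="uint8")

#cv2_imshow(output)
```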
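The statistics array holds one row per label with the columns LEFT, TOP, WIDTH, HEIGHT and AREA (addressable as cv2.CC_STAT_LEFT through cv2.CC_STAT_AREA), and each centroid is an (x, y) pair. As a sketch of how those numbers relate to the label image, the hypothetical helper below recomputes them from a small hand-made label grid in plain Python.

```python
def component_stats(labels, num_labels):
    """Return (stats, centroids) for a label grid (hypothetical helper).

    stats[i] is [left, top, width, height, area], mirroring the column order of
    OpenCV's stats array; centroids[i] is the (x, y) mean of the label's pixels.
    """
    pixels = [[] for _ in range(num_labels)]  # (x, y) coordinates per label
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            pixels[label].append((x, y))

    stats, centroids = [], []
    for coords in pixels:
        if not coords:  # placeholder for a label that has no pixels
            stats.append([0, 0, 0, 0, 0])
            centroids.append(None)
            continue
        xs = [x for x, _ in coords]
        ys = [y for _, y in coords]
        left, top = min(xs), min(ys)
        width = max(xs) - left + 1
        height = max(ys) - top + 1
        stats.append([left, top, width, height, len(coords)])
        centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return stats, centroids

labels = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [2, 0, 0, 0],
    [2, 2, 0, 0],
]
stats, centroids = component_stats(labels, 3)
print(stats[1])  # [1, 0, 2, 2, 3]
print(stats[2])  # [0, 2, 2, 2, 3]
```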

Now that we have the components and their statistics, we can decide which ones to keep.

### Step 4: Filter Out Useful Components
We loop through the components, starting at label 1 because label 0 is the background, and use the statistics from the last step to decide which ones are useful. Here I use the area (the component’s pixel count, values[i, cv2.CC_STAT_AREA]) to keep only the characters in the image, namely components with an area strictly between 140 and 400 pixels; these bounds are specific to this image and will likely need adjusting for others. For each component that passes the filter, comparing label_ids with its label produces a mask of just that component, and cv2.bitwise_or merges the mask into the final output.


```python
# Loop through each component, skipping label 0 (the background)
for i in range(1, totalLabels):
    area = values[i, cv2.CC_STAT_AREA]

    # Keep only components whose pixel area falls inside this range
    if (area > 140) and (area < 400):

        # label_ids stores the component ID of every pixel and has the same
        # dimensions as the thresholded image, so (label_ids == i) selects
        # this component; multiplying by 255 marks it white
        componentMask = (label_ids == i).astype("uint8") * 255

        # Add this component to the final output mask
        output = cv2.bitwise_or(output, componentMask)
```
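Because the component masks are disjoint, OR-ing them together is simply a union, so the same output can also be built in a single pass by checking whether each pixel's label belongs to the set of kept labels (in NumPy, `np.isin(label_ids, keep)` performs that membership test). The plain-Python sketch below compares both approaches on a toy label grid; the helper names, the grid, and the area bounds are illustrative and are not the 140/400 bounds used above.

```python
def filter_by_area_or(labels, areas, lo, hi):
    """Loop version: OR one 0/255 mask per kept component into the output."""
    output = [[0] * len(row) for row in labels]
    for i in range(1, len(areas)):  # label 0 is the background
        if lo < areas[i] < hi:
            for y, row in enumerate(labels):
                for x, label in enumerate(row):
                    mask_value = 255 if label == i else 0
                    output[y][x] |= mask_value
    return output

def filter_by_area_set(labels, areas, lo, hi):
    """Single pass: keep a pixel if its label is in the set of accepted labels."""
    keep = {i for i in range(1, len(areas)) if lo < areas[i] < hi}
    return [[255 if label in keep else 0 for label in row] for row in labels]

labels = [
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [0, 3, 0, 0],
]
areas = [6, 3, 2, 1]  # pixel counts for labels 0..3 in the grid above
print(filter_by_area_set(labels, areas, lo=1, hi=4))
# [[255, 255, 0, 255], [255, 0, 0, 255], [0, 0, 0, 0]]
```

The single-pass version touches every pixel once regardless of how many components are kept, whereas the loop rescans the label image for each kept component.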



### Step 5: Visualize The Final Output
Our final step is simply to display the original image and the final mask of filtered components.


```python
# Original image
cv2_imshow(img)
# Filtered components
cv2_imshow(output)
```