# Task 1: Building Sign Detection
This notebook implements a machine perception pipeline that detects building signs and their digits, marks each detection with a bounding box, and saves the cropped regions. OpenCV is used as the computer vision library.

### Download Data
The data is available for download through a public link. After downloading, unzip the archives to get access to the data.

In [None]:
import gdown

# Download training and validation set
url = 'https://drive.google.com/uc?id=1Gdxb0R8ohGqI4yB4KufWYESl0wIc8r8o'
output = 'Data_2022_assignment_COMP3007.zip'
gdown.download(url, output)

In [2]:
!unzip {output} >/dev/null

In [None]:
# Download testing set
url = 'https://drive.google.com/uc?id=1vc5avjn2lRfnIDC2i7XOq22R70m6UTrH'
output = 'Testing_Data_2022.zip'
gdown.download(url, output)

In [4]:
!unzip {output} >/dev/null
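
The `!unzip` shell escape relies on an `unzip` binary being available, as it is on Colab. Where that is not the case, the archives could be extracted with Python's standard `zipfile` module instead. The following is a minimal sketch; `extract_archive` is a hypothetical helper name that does not appear elsewhere in this notebook.

```python
import os
import zipfile

def extract_archive(zip_path, dest_dir='.'):
  """Extract every member of zip_path into dest_dir and return their names."""
  with zipfile.ZipFile(zip_path) as archive:
    archive.extractall(dest_dir)
    return archive.namelist()

# Example call, assuming the archive was downloaded by the cells above:
# extract_archive('Testing_Data_2022.zip')
```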

### Define Directories
To access the data, various directories need to be defined. The data directory contains two subdirectories that correspond to the training and validation set. The test data directory contains the testing set. An output directory is also created in order to store the results of the extraction.

In [9]:
import os

# Helper function to create a directory (and any missing parents)
def create_dir(path):
  os.makedirs(path, exist_ok=True)

# Train and valid directories
DATA_PATH = os.path.join('Data', 'BuildingSignageDetection')
TRAIN_PATH = os.path.join(DATA_PATH, 'train')
VALID_PATH = os.path.join(DATA_PATH, 'val')

# Test directory
TEST_PATH = os.path.join('TestData', 'BuildingSignageDetection')

# Output directories
OUT_PATH = 'task1_result'
TRAIN_OUT_PATH = os.path.join(OUT_PATH, 'train')
VALID_OUT_PATH = os.path.join(OUT_PATH, 'val')
TEST_OUT_PATH = os.path.join(OUT_PATH, 'test')

# Create the output directories
create_dir(OUT_PATH)
create_dir(TRAIN_OUT_PATH)
create_dir(VALID_OUT_PATH)
create_dir(TEST_OUT_PATH)

### The Detection Pipeline
The `extract_sign` function performs the detection of the building sign. It takes the path to an image and an optional output path for storing the results. Setting `visualize=True` displays the bounding boxes directly in the notebook.

The detection pipeline can be summarised in 5 steps:
1. Perform basic preprocessing to remove noise and standardise the image
2. Extract blobs using connected component labeling
3. Find digit candidates based on their dimension
4. Filter out wrong candidates by checking the closeness of their y-coordinates and heights
5. Merge the contours of the digits to detect the entire building sign
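
To make step 2 concrete, here is a minimal pure-Python sketch of connected component labeling with 4-connectivity. It is only an illustration of the idea, not the OpenCV implementation that the pipeline calls. The function name `label_components` and the tiny test image are assumptions made for this example; the tuple layout (left, top, width, height, area) mirrors the column order of OpenCV's stats array, with the background row left out.

```python
from collections import deque

def label_components(binary):
  """Find 4-connected foreground blobs in a 2D list of 0/1 values.

  Returns one (left, top, width, height, area) tuple per blob.
  """
  rows, cols = len(binary), len(binary[0])
  seen = [[False] * cols for _ in range(rows)]
  stats = []
  for r in range(rows):
    for c in range(cols):
      if not binary[r][c] or seen[r][c]:
        continue
      # Flood-fill one blob with a breadth-first search
      queue = deque([(r, c)])
      seen[r][c] = True
      min_r = max_r = r
      min_c = max_c = c
      area = 0
      while queue:
        cr, cc = queue.popleft()
        area += 1
        min_r, max_r = min(min_r, cr), max(max_r, cr)
        min_c, max_c = min(min_c, cc), max(max_c, cc)
        # Visit the up, down, left and right neighbours only
        for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
          if (0 <= nr < rows and 0 <= nc < cols
              and binary[nr][nc] and not seen[nr][nc]):
            seen[nr][nc] = True
            queue.append((nr, nc))
      stats.append((min_c, min_r, max_c - min_c + 1, max_r - min_r + 1, area))
  return stats

image = [
  [1, 1, 0, 0, 0],
  [1, 0, 0, 1, 0],
  [0, 0, 0, 1, 0],
  [0, 1, 0, 1, 0],
]
print(label_components(image))
# [(0, 0, 2, 2, 3), (3, 1, 1, 3, 3), (1, 3, 1, 1, 1)]
```

The tall, narrow blob `(3, 1, 1, 3, 3)` is the kind of shape that the dimension filter in step 3 is looking for, only at a much smaller scale.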

Because the digits are well defined in the image, with strong contrast and consistent dimensions, they are easier to detect than the sign itself, which can have low contrast against shadows and dark walls. The pipeline therefore detects the digits first and derives the sign's bounding box from them, rather than taking a sign-first approach. This digits-first order has proven to be successful.
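
The digits-first idea (steps 3 and 4) can be sketched without OpenCV on plain bounding boxes. The thresholds below are the ones used in `extract_sign`; the helper names and the sample boxes are hypothetical and exist only for this illustration. `statistics.pstdev` computes the population standard deviation, which is what `np.std` returns by default.

```python
from statistics import pstdev

# Each box is (left, top, width, height), a stand-in for one row of the
# stats array produced by connected component labeling.
def is_digit_candidate(box):
  """Step 3: keep small blobs that are taller than they are wide."""
  _, _, w, h = box
  return 5 <= w <= 45 and 15 <= h <= 65 and w < h

def is_aligned(boxes, max_top_std=10, max_height_std=5):
  """Step 4: a group is aligned when its tops and heights barely vary."""
  tops = [top for _, top, _, _ in boxes]
  heights = [h for _, _, _, h in boxes]
  return pstdev(tops) < max_top_std and pstdev(heights) < max_height_std

def find_digit_triplet(boxes):
  """Return the first 3 consecutive aligned candidates, sorted left to right."""
  candidates = [box for box in boxes if is_digit_candidate(box)]
  for i in range(len(candidates) - 2):
    window = candidates[i:i + 3]
    if is_aligned(window):
      return sorted(window, key=lambda box: box[0])
  return []

boxes = [
  (300, 12, 200, 8),    # wide, flat blob: rejected by the size filter
  (140, 101, 20, 41),   # digit
  (100, 100, 20, 40),   # digit
  (120, 102, 18, 39),   # digit
  (50, 400, 10, 30),    # digit-sized blob far away from the row of digits
]
print(find_digit_triplet(boxes))
# [(100, 100, 20, 40), (120, 102, 18, 39), (140, 101, 20, 41)]
```

Note that the window only considers candidates that are adjacent in the candidate list, so this sketch, like the pipeline, depends on the three digits being labeled one after another.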

https://docs.opencv.org/4.x/d7/d4d/tutorial_py_thresholding.html

https://pyimagesearch.com/2021/02/22/opencv-connected-component-labeling-and-analysis/

https://docs.opencv.org/3.4/dd/d49/tutorial_py_contour_features.html
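
Step 5 turns the digit boxes into a sign box. The sketch below approximates it on plain (left, top, width, height) tuples: take the box that encloses all digits, then grow it by 25% of its width and 50% of its height on each side, the same ratios that `extract_sign` uses. The helper names are hypothetical, and clamping the result to the image bounds is an added safeguard assumed for this example. The pipeline itself works on contour points with `cv2.boundingRect`, so its numbers can differ slightly from this box-based version.

```python
def union_box(boxes):
  """Smallest (left, top, width, height) box that encloses all boxes."""
  left = min(x for x, _, _, _ in boxes)
  top = min(y for _, y, _, _ in boxes)
  right = max(x + w for x, _, w, _ in boxes)
  bottom = max(y + h for _, y, _, h in boxes)
  return left, top, right - left, bottom - top

def pad_box(box, img_w, img_h, w_ratio=0.25, h_ratio=0.5):
  """Grow a box on every side, then clamp it to the image."""
  x, y, w, h = box
  w_offset, h_offset = int(w * w_ratio), int(h * h_ratio)
  x0, y0 = max(x - w_offset, 0), max(y - h_offset, 0)
  x1 = min(x + w + w_offset, img_w)
  y1 = min(y + h + h_offset, img_h)
  return x0, y0, x1 - x0, y1 - y0

digit_boxes = [(100, 100, 20, 40), (120, 102, 18, 39), (140, 101, 20, 41)]
sign_box = pad_box(union_box(digit_boxes), img_w=640, img_h=480)
print(sign_box)
# (85, 79, 90, 84)
```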

In [14]:
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

def extract_sign(path, out_path=None, visualize=False):
  # Create a directory to store the outputs
  if out_path is not None:
    file_name = os.path.basename(path)
    img_name = os.path.splitext(file_name)[0]
    sign_out_path = os.path.join(out_path, img_name)
    create_dir(sign_out_path)

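  # Get the image (cv2.imread returns None if the file cannot be read)
  img = cv2.imread(path)
  if img is None:
    raise FileNotFoundError('Could not read image: {}'.format(path))
  img_copy = img.copy()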

  # Preprocess the image
  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  blur = cv2.GaussianBlur(gray, (5, 5), 0)
  thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[-1]

  # Create an empty image to store the digits
  digits = np.zeros(thresh.shape)

  # Perform blob extraction with connected component labeling
  output = cv2.connectedComponentsWithStats(thresh, 4, cv2.CV_32S)
  num_labels, labels, stats, centroids = output

  # Find possible digits
  good_stats = []
  for stat in stats:
    # Get the dimensions
    w, h = stat[cv2.CC_STAT_WIDTH], stat[cv2.CC_STAT_HEIGHT]

    # Assume the digits are relatively small and are tall in dimension
    if 5 <= w <= 45 and 15 <= h <= 65 and w < h:
      good_stats.append(stat)

  # Slide a window over 3 consecutive candidates to find the 3 digits of the sign
  for i in range(len(good_stats) - 2):
    # Get a list of 3 candidates
    candidates = good_stats[i:i+3]

    # Calculate the closeness of the candidates' y-coordinates
    y_coords = [candidate[cv2.CC_STAT_TOP] for candidate in candidates]
    y_coords_std = np.std(y_coords)

    # Calculate the closeness of the candidates' heights
    heights = [candidate[cv2.CC_STAT_HEIGHT] for candidate in candidates]
    heights_std = np.std(heights)

    # Extract digits if the candidates have similar y-coordinates and heights
    if y_coords_std < 10 and heights_std < 5:
      # Sort the digits from left to right
      candidates = sorted(candidates, key=lambda x:int(x[cv2.CC_STAT_LEFT]))

      # Extract the digits
      for j, candidate in enumerate(candidates):
        # Get the coordinates and dimensions
        x, y = candidate[cv2.CC_STAT_LEFT], candidate[cv2.CC_STAT_TOP]
        w, h = candidate[cv2.CC_STAT_WIDTH], candidate[cv2.CC_STAT_HEIGHT]

        # Add the digit to the digits image
        digit = np.zeros(thresh.shape)
        digit[y:y+h, x:x+w] = thresh[y:y+h, x:x+w]
        digits += digit

        # Save the cropped digit
        if out_path is not None:
          digit_path = os.path.join(sign_out_path, 'digit{}.jpg'.format(j))
          digit_crop = img_copy[y:y+h, x:x+w]
          cv2.imwrite(digit_path, digit_crop)

        # Draw the bounding box of the digit
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
      
      # Stop finding more digits
      break
    
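  # Get all digit contours ([-2] selects them in both OpenCV 3 and 4)
  digits = np.uint8(digits)
  cnts = cv2.findContours(digits, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

  # Stop early if no digits were found, as there is nothing to merge
  if len(cnts) == 0:
    print('No digits were detected in {}'.format(path))
    return

  # Merge the contours into a single array of points
  merged_cnt = np.vstack(cnts)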

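  # Get the bounding box of the sign from the merged contour
  x, y, w, h = cv2.boundingRect(merged_cnt)
  w_offset, h_offset = int(w * 0.25), int(h * 0.5)

  # Expand the box around the digits and clamp it to the image bounds so
  # that negative coordinates do not wrap around when cropping
  img_h, img_w = img.shape[:2]
  x0, y0 = max(x - w_offset, 0), max(y - h_offset, 0)
  x1, y1 = min(x + w + w_offset, img_w), min(y + h + h_offset, img_h)
  x, y, w, h = x0, y0, x1 - x0, y1 - y0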

  # Save the cropped sign
  if out_path is not None:
    sign_path = os.path.join(sign_out_path, 'sign.jpg')
    sign_crop = img_copy[y:y+h, x:x+w]
    cv2.imwrite(sign_path, sign_crop)

  # Draw the bounding box of the sign
  cv2.rectangle(img, (x, y), (x+w, y+h), (0, 0, 255), 2)

  # Save the bounding box image
  if out_path is not None:
    print('Output of {} has been saved to {}'.format(file_name, sign_out_path))
    boxes_path = os.path.join(sign_out_path, 'bounding_boxes.jpg')
    cv2.imwrite(boxes_path, img)

  # Visualize the bounding boxes
  if visualize:
    cv2_imshow(img)

### Detect Train Images
This cell runs the detection on every image in the `train` subdirectory.

In [None]:
train_img_paths = sorted([os.path.join(TRAIN_PATH, img_path)
                          for img_path in os.listdir(TRAIN_PATH)])

for img_path in train_img_paths:
  extract_sign(img_path, out_path=TRAIN_OUT_PATH, visualize=True)
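
`os.listdir` returns every entry in a directory, so a stray non-image file would also be passed to `extract_sign`. If that is a concern, the listing could be restricted to image files. The sketch below is one possible way to do so; `list_images` and the `IMAGE_EXTENSIONS` tuple are hypothetical names, and the set of extensions is an assumption about the data.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')  # assumed image types

def list_images(directory):
  """Return the sorted paths of the image files directly inside directory."""
  return sorted(
    os.path.join(directory, name)
    for name in os.listdir(directory)
    if name.lower().endswith(IMAGE_EXTENSIONS)
  )

# Example call, assuming TRAIN_PATH is defined as above:
# train_img_paths = list_images(TRAIN_PATH)
```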

### Detect Validation Images
This cell runs the detection on every image in the `val` subdirectory.

In [None]:
valid_img_paths = sorted([os.path.join(VALID_PATH, img_path)
                          for img_path in os.listdir(VALID_PATH)])

for img_path in valid_img_paths:
  extract_sign(img_path, out_path=VALID_OUT_PATH, visualize=True)

### Detect Test Images
This cell runs the detection on every image in the test data directory.

In [None]:
test_img_paths = sorted([os.path.join(TEST_PATH, img_path)
                         for img_path in os.listdir(TEST_PATH)])

for img_path in test_img_paths:
  extract_sign(img_path, out_path=TEST_OUT_PATH, visualize=True)

### Detect Single Image
This cell detects a building sign from a single image.

In [None]:
img_path = os.path.join(TEST_PATH, 'test04.jpg')
extract_sign(img_path, visualize=True)