# Assignment 2 by Josh Thyng


## Part 1: Object Size Measurement

For this task, work on “book.jpg” to measure the size of a book placed on an A4 paper
with dimensions 27.8cm by 21.5cm.

1. Detection and Measurement: Utilize appropriate techniques to detect the book in
   the image and measure its dimensions (height and width in centimeters). Describe
   the methodology and rationale for your approach in your report.
2. Annotation: After detecting the book, annotate the image by drawing a rectangle
   around the book. Clearly display the measured width and height on the image.
3. Comparison and Documentation: The actual size of the book is 8 cm by 10.6 cm.
   Compare your measurements to these dimensions and document the results in
   your report. Aim for an error rate of less than 10% with the ideal method.
   Ensure that your report is detailed, including descriptions of the methods used, your
   results (screenshots), and any challenges you encountered.


## Write-Up

This project detects a book placed on an A4 sheet of paper, outlines it with a rectangle, and calculates its real-world dimensions (the actual size is 8 cm x 10.6 cm). The process was difficult and involved several steps, including contour detection, _pixel-to-centimeter conversion_, and ordering of the book's corner points. Below is an image showing the final result:

![Book Detection Results](book_results.png)

## Challenges and Solutions

### 1. Contour Detection

Detecting the contours of the book was one of the most difficult parts of the project. Noise, lighting conditions, and irregularities in the image made it challenging to isolate the book's outline.

- **How I Did It**: I converted the image to the HSV color space for better color segmentation, created a mask based on the color of the book, and applied [morphological operations](https://docs.opencv.org/4.x/d9/d61/tutorial_py_morphological_ops.html) to clean up the mask.
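
The color mask step comes down to a per-pixel range check in HSV space. Below is a small dependency-free sketch of that check, assuming OpenCV's 8-bit HSV scale (H from 0 to 179, S and V from 0 to 255) and the navy-blue bounds used in the notebook code. The helpers `rgb_to_opencv_hsv` and `in_range` are hypothetical names for illustration, and OpenCV's own rounding may differ by a unit.

```python
import colorsys

# Bounds copied from the notebook's navy-blue range (OpenCV 8-bit HSV scale).
LOWER_BLUE = (90, 50, 20)
UPPER_BLUE = (130, 255, 255)


def rgb_to_opencv_hsv(r: int, g: int, b: int) -> tuple:
    """Convert 8-bit RGB to an approximation of OpenCV's 8-bit HSV scale."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # OpenCV stores hue as degrees / 2, so a full circle maps to 0..179.
    return round(h * 180) % 180, round(s * 255), round(v * 255)


def in_range(hsv, lower, upper) -> bool:
    """Mimic the per-pixel decision of cv2.inRange(): every channel within bounds."""
    return all(lo <= x <= hi for x, lo, hi in zip(hsv, lower, upper))


navy = rgb_to_opencv_hsv(0, 0, 128)       # a navy-blue pixel
white = rgb_to_opencv_hsv(255, 255, 255)  # the paper background

print(navy, in_range(navy, LOWER_BLUE, UPPER_BLUE))
print(white, in_range(white, LOWER_BLUE, UPPER_BLUE))
```

The navy pixel falls inside the range and the white paper does not, which is why the mask isolates the book cover from the sheet.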

### 2. Outlining the Book

After detecting the contours, outlining the book with a rectangle seemed to me like the easiest way to get its actual size.

- **How I Did It**: I used the `cv2.minAreaRect()` function to handle rotated objects and extract the smallest rectangle around the book. The rectangle was then drawn using the corner points calculated by `cv2.boxPoints()`. This part was not too bad.

### 3. Measuring Size in Pixels

Once the book was outlined, calculating its dimensions in pixels was the next challenge. Because the book could be tilted, the measurement needed to account for any angle.

- **How I Did It**: I used the Euclidean distance between adjacent corners of the rectangle to calculate the width and height of the book in pixels, averaging each pair of opposite sides.
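
As a worked example of the measurement, the sketch below computes the side lengths of a tilted rectangle from its four ordered corners and averages opposite sides, in the same spirit as `compute_dimensions()`. The corner coordinates and the `pixels_per_cm` value are made up for illustration (they are not taken from `book.jpg`), and `side_lengths` is a hypothetical helper name.

```python
import math


def side_lengths(tl, tr, br, bl):
    """Return (width, height) in pixels, averaging each pair of opposite sides."""
    width = (math.dist(tl, tr) + math.dist(bl, br)) / 2.0
    height = (math.dist(tl, bl) + math.dist(tr, br)) / 2.0
    return width, height


# Hypothetical corners of a rectangle tilted by roughly 22.6 degrees.
tl, tr, br, bl = (100, 0), (220, 50), (120, 290), (0, 240)
width_px, height_px = side_lengths(tl, tr, br, bl)

# Hypothetical scale; in the notebook this value comes from the paper detection.
pixels_per_cm = 13.0
width_cm = width_px / pixels_per_cm
height_cm = height_px / pixels_per_cm

print(f"{width_px} px x {height_px} px")
print(f"{width_cm} cm x {height_cm} cm")
```

Because the distances are measured along the rectangle's own edges, the tilt does not change the result.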

### 4. The `order_points()` Function

The detected corner points were not always ordered in a consistent way, which made the measurement more complicated.

- **How I Did It**: I wrote the `order_points()` function to reorder the four points into a predictable sequence: top-left, top-right, bottom-right, and bottom-left. This made the computation of the book's dimensions reliable.
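
The ordering relies on two simple quantities per corner: `x + y` is smallest at the top-left and largest at the bottom-right, while `y - x` is smallest at the top-right and largest at the bottom-left. Here is a plain-Python sketch of the same idea, assuming image coordinates (y grows downward) and made-up corner values. It is only an illustration of the heuristic, which can mislabel corners when the rectangle is rotated close to 45 degrees.

```python
def order_points(points):
    """Return corners as [top-left, top-right, bottom-right, bottom-left].

    Assumes image coordinates: x grows to the right, y grows downward.
    """
    top_left = min(points, key=lambda p: p[0] + p[1])      # smallest x + y
    bottom_right = max(points, key=lambda p: p[0] + p[1])  # largest x + y
    top_right = min(points, key=lambda p: p[1] - p[0])     # smallest y - x
    bottom_left = max(points, key=lambda p: p[1] - p[0])   # largest y - x
    return [top_left, top_right, bottom_right, bottom_left]


# Hypothetical corners in arbitrary order, as a box-points call might return them.
shuffled = [(120, 290), (100, 0), (0, 240), (220, 50)]
ordered = order_points(shuffled)
print(ordered)
```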

## Conclusion

The pipeline detected and measured the dimensions of a book placed on an A4 sheet of paper, although the measurements were not 100% accurate.
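
The comparison against the actual size (8 cm by 10.6 cm) can be expressed as a percent error per dimension. The sketch below shows the arithmetic only: `percent_error` is a hypothetical helper that is not part of the notebook code, and the measured values are placeholders rather than the results in the screenshot.

```python
def percent_error(measured_cm: float, actual_cm: float) -> float:
    """Return the absolute error as a percentage of the actual value."""
    return abs(measured_cm - actual_cm) / actual_cm * 100.0


ACTUAL_WIDTH_CM, ACTUAL_HEIGHT_CM = 8.0, 10.6  # from the assignment text

# Placeholder measurements for illustration only (not results from book.jpg).
measured_width_cm, measured_height_cm = 8.4, 10.2

width_error = percent_error(measured_width_cm, ACTUAL_WIDTH_CM)
height_error = percent_error(measured_height_cm, ACTUAL_HEIGHT_CM)

print(f"width error: {width_error:.1f}%")
print(f"height error: {height_error:.1f}%")
print("within 10% target:", width_error < 10 and height_error < 10)
```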


In [1]:
import cv2
import numpy as np
import pytesseract


In [None]:
def load_and_preprocess_image(image_path):
    img = cv2.imread(image_path)  # reads in the 'book.jpg'
    hsv = cv2.cvtColor(
        img, cv2.COLOR_BGR2HSV
    )  # converts the image from BGR to HSV, which helps isolate the book by color
    return img, hsv


def create_mask(hsv, lower_color, upper_color, kernel_size=(7, 7)):
    mask = cv2.inRange(
        hsv, lower_color, upper_color
    )  # creates a binary image: pixels inside the HSV color range become white, all others black
    kernel = np.ones(
        kernel_size, np.uint8
    )  # will be used later to perform transformations
    mask = cv2.morphologyEx(
        mask, cv2.MORPH_CLOSE, kernel
    )  # this makes our mask even clearer by filling in black holes if there are any
    mask = cv2.dilate(
        mask, kernel, iterations=1
    )  # optional: slightly enlarges the white region so the contours come out more complete
    return mask


def find_largest_contour(mask, min_area=5000):
    contours, _ = cv2.findContours(
        mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )  # this just gets us the boundaries of the white regions in the binary mask (what we did in previous function)
    large_contours = [
        cnt for cnt in contours if cv2.contourArea(cnt) > min_area
    ]  # just filters out smaller contours
    return (
        max(large_contours, key=cv2.contourArea) if large_contours else None
    )  # returns the contour with the largest area from filtered list


# Finds the smallest (possibly rotated) rectangle that encloses the object represented by the contour and draws it on the image.
def draw_rectangle(img, contour, color=(0, 0, 255), thickness=2):
    rect = cv2.minAreaRect(contour)
    box = cv2.boxPoints(rect)
    box = box.astype(np.int32)  # np.int0 was removed in NumPy 2.0
    cv2.drawContours(img, [box], 0, color, thickness)
    return box


# Takes the four unordered corner points and puts them in a consistent order:
# top-left, top-right, bottom-right, and bottom-left.
def order_points(pts):
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    diff = np.diff(pts, axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect


# Calculates the Euclidean distance between adjacent corners, then averages opposite sides to account for any irregularities in the measurement,
# which can happen if the object is distorted or rotated.
def compute_dimensions(ordered_pts):
    (tl, tr, br, bl) = (
        ordered_pts  # unpacks the ordered points into four separate variables, representing the four corners.
    )
    widthA = np.linalg.norm(br - bl)
    widthB = np.linalg.norm(tr - tl)
    heightA = np.linalg.norm(tr - br)
    heightB = np.linalg.norm(tl - bl)
    return (widthA + widthB) / 2.0, (heightA + heightB) / 2.0


# this function is used to get the scale of the image by calculating the number of pixels per centimeter.
def get_paper_dimensions(image, known_dims_cm=(21.5, 27.8)):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)
    contours, _ = cv2.findContours(
        edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )  # finds contours in the binary edge-detected image

    max_area, paper_contour = 0, None
    for cnt in contours:
        area = cv2.contourArea(cnt)
        peri = cv2.arcLength(cnt, True)
        approx = cv2.approxPolyDP(cnt, 0.02 * peri, True)
        if len(approx) == 4 and area > max_area:
            paper_contour, max_area = approx, area

    if paper_contour is None:
        # raise instead of calling exit() so a notebook kernel is not shut down
        raise ValueError("no paper contour found")

    ordered_pts = order_points(paper_contour.reshape(4, 2))
    width_px, height_px = compute_dimensions(ordered_pts)

    # compute pixels per cm
    pixels_per_cm = (
        (width_px / known_dims_cm[0]) + (height_px / known_dims_cm[1])
    ) / 2.0
    return pixels_per_cm


def detect_book_size(img, hsv, pixels_per_cm):
    # define color range for navy blue book cover
    lower_blue = np.array([90, 50, 20])
    upper_blue = np.array([130, 255, 255])

    mask = create_mask(hsv, lower_blue, upper_blue)
    book_contour = find_largest_contour(mask)

    if book_contour is not None:
        box = draw_rectangle(img, book_contour)
        ordered_box = order_points(box)
        book_width_px, book_height_px = compute_dimensions(ordered_box)

        # calculate real-world book dimensions
        width_cm = round(book_width_px / pixels_per_cm, 1)
        height_cm = round(book_height_px / pixels_per_cm, 1)

        # display book dimensions on the image
        size_text = f"{width_cm} cm x {height_cm} cm"
        cv2.putText(
            img,
            size_text,
            (int(ordered_box[0][0]), int(ordered_box[0][1] - 250)),  # x from the top-left corner, y shifted 250 px above it
            cv2.FONT_HERSHEY_COMPLEX,
            1,
            (0, 255, 255),
            2,
        )
    else:
        print("no contours found for the book")


def main():
    img, hsv = load_and_preprocess_image("book.jpg")

    # Get paper's pixels per cm ratio
    pixels_per_cm = get_paper_dimensions(img.copy())

    # Detect and measure the book on the paper
    detect_book_size(img, hsv, pixels_per_cm)

    # Display the final image with book size
    cv2.imshow("book with size", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


if __name__ == "__main__":
    main()

## Part 2: Text Recognition

## Write-Up

The goal was to detect and recognize text on traffic signs, but the OCR system did not succeed in producing accurate results for all signs.

![Sign](test_results.png)

## Lessons Learned

One of the main takeaways from this project was how complex it is to detect text with computer vision techniques. Although the model did not recognize the text in every image, **using Matplotlib** to visualize the image processing stages was helpful in understanding how to improve each case.

While I could not achieve high accuracy, Matplotlib helped me analyze how preprocessing affected the input images and showed what went wrong with certain transformations.

## Methods Used

### 1. **Preprocessing Techniques**

The first step to improving OCR accuracy was to preprocess the images:

- **Grayscale Conversion:** Simplified the images by reducing color variations.
- **Gaussian Blurring:** Reduced noise while preserving the edges of text characters.
- **Histogram Equalization:** Improved contrast between text and the background.
- **Adaptive Thresholding:** Improved binarization, particularly under uneven lighting conditions.
- **Morphological Transformations:** Removed small noise and filled gaps in text characters.
- **Image Resizing:** Enlarged the image to increase OCR performance on small text.
- **Sharpening:** Applied to improve the edges of text characters.

### 2. **Tesseract OCR Engine Configurations**

- **PSM (Page Segmentation Mode):** Depending on the expected layout of the text, different PSM modes were applied:
  - `--psm 8`: Treats the image as a single word.
  - `--psm 6`: Assumes a single uniform block of text.
- **OEM (OCR Engine Mode):** The LSTM OCR engine (`--oem 1`) was tried for better recognition through deep learning models, although I don't think it helped in my case. The code below uses `--oem 3`, which lets Tesseract pick the default engine.
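
To make the configuration concrete, here is a minimal sketch of how the PSM choice and the config string can be assembled. It uses only the flags that appear in the code below (`--oem`, `--psm`, and `-c tessedit_char_whitelist=`). The helper names `choose_psm` and `build_tesseract_config` are hypothetical and not part of pytesseract.

```python
UPPERCASE_AND_DIGITS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def choose_psm(expected_text: str) -> int:
    """Pick PSM 8 (single word) for one word, PSM 6 (uniform block) otherwise."""
    return 8 if len(expected_text.split()) <= 1 else 6


def build_tesseract_config(psm: int, oem: int = 3,
                           whitelist: str = UPPERCASE_AND_DIGITS) -> str:
    """Assemble the config string that would be passed to pytesseract."""
    return f"--oem {oem} --psm {psm} -c tessedit_char_whitelist={whitelist}"


single_word_config = build_tesseract_config(choose_psm("AHEAD"))
multi_word_config = build_tesseract_config(choose_psm("NO PASSING ZONE"), oem=1)

print(single_word_config)
print(multi_word_config)
```

The resulting string is what would go into the `config=` argument of `pytesseract.image_to_string()`.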

### 3. **Matplotlib for Testing and Debugging**

The most important factor in improving the model was using **Matplotlib** to display the images at various stages of preprocessing. By visualizing the changes applied at each step (blurring, thresholding, morphological operations), I got insight into how the changes were affecting the recognition results.

---

## Conclusion

Improving OCR accuracy for traffic sign recognition was very hard. Some preprocessing techniques, like adaptive thresholding and morphological transformations, helped improve text detection, but the task was complicated by variations in image quality, text size, and environmental conditions.

So, in conclusion, I should have used a different model for each graphic instead of using one for all images.



In [3]:
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def preprocess_image(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_COLOR)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    filtered = cv2.bilateralFilter(gray, 9, 75, 75)

    _, thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=1)

    resized = cv2.resize(morph, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

    kernel_sharpening = np.array([[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]])
    cleaned_up = cv2.filter2D(resized, -1, kernel_sharpening)

    return cleaned_up


def extract_text(image, psm=6):

    custom_config = r"--oem 3 --psm {} -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".format(
        psm
    )
    text = pytesseract.image_to_string(image, config=custom_config)
    return text.strip()


def calculate_accuracy(expected_text, recognized_text):
    expected_text = expected_text.lower()
    recognized_text = recognized_text.lower()

    correct_chars = sum(1 for e, r in zip(expected_text, recognized_text) if e == r)
    total_chars = len(expected_text)

    return correct_chars / total_chars * 100 if total_chars > 0 else 0


def process_sign_image(image_path, expected_text):
    processed_img = preprocess_image(image_path)
    if len(expected_text.split()) <= 1:
        psm = 8
    else:
        psm = 6

    recognized_text = extract_text(processed_img, psm=psm)
    accuracy = calculate_accuracy(expected_text, recognized_text)

    print(f"Expected Text: {expected_text}")
    print(f"Recognized Text: {recognized_text}")
    print(f"Accuracy: {accuracy:.2f}%\n")

    return recognized_text, accuracy


sign_images = [
    ("sign1.jpg", "DRIVE CAREFULLY"),
    ("sign2.jpg", "AHEAD"),
    ("sign3.jpg", "UTILITY WORK AHEAD"),
    ("sign4.jpg", "NO PASSING ZONE"),
]

for img_path, expected_text in sign_images:
    process_sign_image(img_path, expected_text)

Expected Text: DRIVE CAREFULLY
Recognized Text: DRIVE
CAREFULLY
Accuracy: 93.33%

Expected Text: AHEAD
Recognized Text: BO
Accuracy: 0.00%

Expected Text: UTILITY WORK AHEAD
Recognized Text: SO
OP 4
Accuracy: 0.00%

Expected Text: NO PASSING ZONE
Recognized Text: NQ
PASSING
ZONE
Accuracy: 80.00%
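
One detail in the output above: the score compares characters position by position, so a line break where a space was expected counts as a mismatch. The sketch below reproduces the scoring and shows the effect of collapsing whitespace first. `normalize_whitespace` is a hypothetical helper, and this is only one possible way to score the output.

```python
def calculate_accuracy(expected_text: str, recognized_text: str) -> float:
    """Position-by-position character match, as in the notebook code."""
    expected = expected_text.lower()
    recognized = recognized_text.lower()
    correct = sum(1 for e, r in zip(expected, recognized) if e == r)
    return correct / len(expected) * 100 if expected else 0.0


def normalize_whitespace(text: str) -> str:
    """Collapse newlines and repeated spaces into single spaces."""
    return " ".join(text.split())


raw = "DRIVE\nCAREFULLY"  # recognized text for sign1.jpg, copied from the output above
raw_score = calculate_accuracy("DRIVE CAREFULLY", raw)
normalized_score = calculate_accuracy("DRIVE CAREFULLY", normalize_whitespace(raw))

print(f"raw: {raw_score:.2f}%")
print(f"normalized: {normalized_score:.2f}%")
```

With the newline treated as a space, the first sign scores a full match, so the remaining errors are the actual misread characters.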

