# Assignment 4: Where's Waldo?
### Name: Eileanor LaRocco
In this assignment, you will develop an object detection algorithm to locate Waldo in a set of images. You will develop a model to detect the bounding box around Waldo. Your final task is to submit your predictions on Kaggle for evaluation.

### Process/Issues
- Double-checked that the bounding boxes we were given were correct by drawing them on the images (they look good!).
- Complication: when I first created augmented images, the bounding box labels were not transformed along with them. I also had to try a few types of augmentation to see what made sense for Waldo. The augmented images may still be too similar to one another, which could let the model favor the source images that appear most often in the training set.
- Complication: similarly, when resizing the images, the bounding boxes have to be adjusted to match, without being cut off and without stretching or shrinking the image too much.
- Tried a YOLO architecture first, but it produced too many boxes and did not work well. Tried a Faster R-CNN architecture next; the inputs, outputs, and processing steps for the two were very different, which was frustrating.
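
Both complications above come down to applying the same geometric transform to the box as to the image. A minimal sketch, assuming boxes are `(x_min, y_min, x_max, y_max)` in pixels; the helper names `hflip_box` and `letterbox_box` are hypothetical and are not the code used later in this notebook:

```python
def hflip_box(box, image_width):
    """Mirror a box to match a horizontally flipped image."""
    x_min, y_min, x_max, y_max = box
    # After a flip, the old right edge becomes the new left edge.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def letterbox_box(box, image_size, target_size):
    """Map a box through an aspect-preserving resize plus centered padding."""
    width, height = image_size
    target_w, target_h = target_size
    # One scale for both axes, so the image is never stretched.
    scale = min(target_w / width, target_h / height)
    # Padding added on the left/top to center the resized image.
    pad_x = (target_w - width * scale) / 2
    pad_y = (target_h - height * scale) / 2
    x_min, y_min, x_max, y_max = box
    return (x_min * scale + pad_x, y_min * scale + pad_y,
            x_max * scale + pad_x, y_max * scale + pad_y)


flipped = hflip_box((100, 50, 160, 130), image_width=2048)
resized = letterbox_box((100, 50, 160, 130), (2048, 1024), (512, 512))
print(flipped)
print(resized)
```

Using a single scale factor and padding the shorter side keeps Waldo's proportions intact, and because the box goes through the same arithmetic as the pixels it cannot be cut off by the resize.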

### Imports

In [5]:
import os
import opendatasets as od
import pandas as pd
import numpy as np
import random
import csv
import matplotlib.pyplot as plt
import cv2

import shutil
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
from torch.utils.data import DataLoader
import torchvision
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image

In [2]:
SEED = 1

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

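# Prefer Apple's MPS backend, then CUDA, then fall back to the CPU
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")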
print(device)

mps


### Download Data

In [3]:
od.download('https://www.kaggle.com/competitions/2024-fall-ml-3-hw-4-wheres-waldo/data')

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username:
Your Kaggle Key:
Downloading 2024-fall-ml-3-hw-4-wheres-waldo.zip to ./2024-fall-ml-3-hw-4-wheres-waldo
100%|██████████| 38.2M/38.2M [00:01<00:00, 36.8MB/s]
Extracting archive ./2024-fall-ml-3-hw-4-wheres-waldo/2024-fall-ml-3-hw-4-wheres-waldo.zip to ./2024-fall-ml-3-hw-4-wheres-waldo


### Paths

In [3]:
train_folder = "2024-fall-ml-3-hw-4-wheres-waldo/train/train" # Original Train Images
test_folder = "2024-fall-ml-3-hw-4-wheres-waldo/test/test" # Original Test Images
annotations_file = "2024-fall-ml-3-hw-4-wheres-waldo/annotations.csv" # Original Annotations File

# Preprocessing

### Check size of each training image

In [6]:
# Train Images

# Iterate over all images in the folder
for image_name in os.listdir(train_folder):
    if image_name.endswith(".jpg"):  # Only .jpg files are processed
        image_path = os.path.join(train_folder, image_name)
        
        # Read the image using OpenCV
        img = cv2.imread(image_path)
        if img is not None:
            height, width, channels = img.shape  # Get image size (height, width, channels)
            print(f"Image: {image_name}, Width: {width}, Height: {height}")
        else:
            print(f"Could not read image: {image_name}")

# Image 27 contains Waldo 6 times; images 10 and 8 each contain him twice
# Images 16 and 22-27 do not show Waldo against a busy background

Image: 8.jpg, Width: 2800, Height: 1760
Image: 9.jpg, Width: 1298, Height: 951
Image: 14.jpg, Width: 1700, Height: 2340
Image: 15.jpg, Width: 1600, Height: 1006
Image: 17.jpg, Width: 1599, Height: 1230
Image: 12.jpg, Width: 1276, Height: 1754
Image: 13.jpg, Width: 1280, Height: 864
Image: 11.jpg, Width: 2828, Height: 1828
Image: 10.jpg, Width: 1600, Height: 980
Image: 21.jpg, Width: 2048, Height: 1515
Image: 20.jpg, Width: 2953, Height: 2088
Image: 18.jpg, Width: 1590, Height: 981
Image: 19.jpg, Width: 1280, Height: 864
Image: 4.jpg, Width: 2048, Height: 1272
Image: 5.jpg, Width: 2100, Height: 1760
Image: 7.jpg, Width: 1949, Height: 1419
Image: 6.jpg, Width: 2048, Height: 1454
Image: 2.jpg, Width: 1286, Height: 946
Image: 3.jpg, Width: 2048, Height: 1346
Image: 1.jpg, Width: 2048, Height: 1251
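
The note about images that contain Waldo more than once can also be checked from the annotations rather than by eye. A small sketch, assuming the `filename, xmin, ymin, xmax, ymax` column layout of `annotations.csv`; the rows below are made-up sample data and `boxes_per_image` is a hypothetical helper:

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the same layout as annotations.csv (values are made up).
sample_csv = io.StringIO(
    "filename,xmin,ymin,xmax,ymax\n"
    "8.jpg,10,10,50,60\n"
    "8.jpg,200,40,240,90\n"
    "9.jpg,30,30,70,80\n"
)


def boxes_per_image(csv_file):
    """Count how many annotation rows each filename has."""
    return Counter(row["filename"] for row in csv.DictReader(csv_file))


counts = boxes_per_image(sample_csv)
# Keep only the images that are annotated more than once.
multi = {name: n for name, n in counts.items() if n > 1}
print(multi)
```

To run it on the real data, pass an open handle to `annotations_file` instead of the in-memory sample.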


### Draw bounding boxes on each training image to check accuracy

In [6]:
# Paths
output_folder = "2024-fall-ml-3-hw-4-wheres-waldo/checks"  # Folder to save images with drawn boxes

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Read the CSV file
# Assumes the CSV columns are: filename, xmin, ymin, xmax, ymax
annotations = pd.read_csv(annotations_file)

# Iterate through each image in the annotations
for _, row in annotations.iterrows():
    image_name = row["filename"]
    x_min, y_min, x_max, y_max = row["xmin"], row["ymin"], row["xmax"], row["ymax"]
    
    # Load the image
    image_path = os.path.join(train_folder, image_name)
    if not os.path.exists(image_path):
        print(f"Image {image_path} not found. Skipping...")
        continue
    image = cv2.imread(image_path)
    
    # Draw the bounding box
    # cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (B, G, R), thickness)
    cv2.rectangle(image, (int(x_min), int(y_min)), (int(x_max), int(y_max)), (0, 255, 0), 4)
    
    # Optionally, add a label or text
    label = "Waldo"
    cv2.putText(image, label, (int(x_min), int(y_min) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    
    # Save the image
    output_path = os.path.join(output_folder, image_name)
    cv2.imwrite(output_path, image)

    print(f"Annotated image saved to {output_path}")

print("All bounding boxes have been drawn and saved.")

Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/1.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/10.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/11.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/12.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/13.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/14.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/15.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/17.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/18.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/19.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/2.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/3.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/4.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/checks/5.j
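
Visual inspection can be backed up by a programmatic check that every box is well formed and lies inside its image. A minimal sketch under the same `(x_min, y_min, x_max, y_max)` convention; `box_problems` is a hypothetical helper, not part of the assignment code:

```python
def box_problems(box, image_size):
    """Return a list of problems found with one (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    problems = []
    # A valid box has positive width and height.
    if x_min >= x_max or y_min >= y_max:
        problems.append("empty or inverted box")
    # A valid box does not leave the image area.
    if x_min < 0 or y_min < 0 or x_max > width or y_max > height:
        problems.append("box extends outside the image")
    return problems


print(box_problems((10, 10, 50, 60), (2048, 1251)))
print(box_problems((10, 10, 120, 60), (100, 100)))
```

In the drawing loop above, the image size is available as `image.shape[1], image.shape[0]` once `cv2.imread` has loaded the file.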

### Create More Training Images

In [7]:
# Crop each Waldo bounding box and save it to the waldo folder to build a set of Waldo cutouts

# Define the paths
image_folder = train_folder
csv_path = annotations_file
output_folder = "2024-fall-ml-3-hw-4-wheres-waldo/train/waldo"


# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Read the CSV file
with open(csv_path, mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)  # Use DictReader for column names
    counter = 1

    for row in csv_reader:
        # Extract information from the CSV
        filename = row["filename"]
        x_min = int(row["xmin"])
        y_min = int(row["ymin"])
        x_max = int(row["xmax"])
        y_max = int(row["ymax"])

        base, ext = os.path.splitext(filename)  # Split filename into base and extension
        counter += 1  # Row counter (the first row gets 2); used as a suffix for repeated filenames

        # Construct the full path to the image
        image_path = os.path.join(image_folder, filename)

        # Check if the image exists
        if os.path.exists(image_path):
            with Image.open(image_path) as img:
                # Crop the image using bounding box coordinates
                cropped_img = img.crop((x_min, y_min, x_max, y_max))

                # Save the cropped image; if this image already produced a crop,
                # add the row counter as a suffix so the earlier crop is not overwritten
                if os.path.exists(os.path.join(output_folder, f"cropped_{filename}")):
                    filename = f"{base}_{counter}{ext}"
                output_path = os.path.join(output_folder, f"cropped_{filename}")
                cropped_img.save(output_path)
        else:
            print(f"Warning: {filename} not found in {image_folder}")
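
The cell above names repeated crops with a running row counter, which produces suffixes such as `_20` or `_35`. An alternative, shown only as a sketch and not the naming this notebook uses (the delete list in the next cell relies on the names produced above), is to number crops per source image; `crop_names` is a hypothetical helper:

```python
import os
from collections import defaultdict


def crop_names(filenames):
    """Yield a unique crop name for every annotation row, in order."""
    seen = defaultdict(int)  # how many crops each source image has produced so far
    for filename in filenames:
        base, ext = os.path.splitext(filename)
        seen[filename] += 1
        yield f"cropped_{base}_{seen[filename]}{ext}"


names = list(crop_names(["8.jpg", "10.jpg", "8.jpg"]))
print(names)
```

With this scheme the suffix tells you which Waldo within an image a crop came from, independent of the row order in the CSV.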

In [8]:
### Remove Waldo images that are wonky (too big)

def delete_files_from_list(folder_path, file_names_to_delete):
    """Deletes files in the given folder if their name is in the provided list."""

    for filename in os.listdir(folder_path):
        if filename in file_names_to_delete:
            file_path = os.path.join(folder_path, filename)
            os.remove(file_path)
            print(f"Deleted: {file_path}")

if __name__ == "__main__":
    folder_path =  "2024-fall-ml-3-hw-4-wheres-waldo/train/waldo"
    file_names_to_delete = ["cropped_8_20.jpg", "cropped_10_21.jpg", "cropped_16.jpg", "cropped_24.jpg", "cropped_25.jpg", "cropped_26.jpg", "cropped_27_35.jpg"]
    delete_files_from_list(folder_path, file_names_to_delete)

Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/waldo/cropped_16.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/waldo/cropped_8_20.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/waldo/cropped_10_21.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/waldo/cropped_25.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/waldo/cropped_24.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/waldo/cropped_26.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/waldo/cropped_27_35.jpg


In [None]:
# Exclude images without a busy background (16, 22-27), since that kind of image does not appear in the test set

if __name__ == "__main__":
    folder_path = train_folder
    file_names_to_delete = ["16.jpg", "22.jpg", "23.jpg", "24.jpg", "25.jpg", "26.jpg", "27.jpg"]
    delete_files_from_list(folder_path, file_names_to_delete)

Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/train/16.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/train/22.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/train/23.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/train/27.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/train/26.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/train/24.jpg
Deleted: 2024-fall-ml-3-hw-4-wheres-waldo/train/train/25.jpg


In [9]:
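# Build synthetic training chunks: cut each large image into tiles and
# paste a randomly chosen, randomly sized Waldo crop onto every tile

# Define the paths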
input_folder = train_folder
overlay_folder = "2024-fall-ml-3-hw-4-wheres-waldo/train/waldo"
output_folder = "2024-fall-ml-3-hw-4-wheres-waldo/train/chunks"
csv_path = "2024-fall-ml-3-hw-4-wheres-waldo/train/annotations_chunks.csv"

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Open the CSV file for saving bounding box annotations
with open(csv_path, mode='w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file)
    # Write header
    csv_writer.writerow(["filename", "x_min", "y_min", "x_max", "y_max"])

    # Parameters
    chunk_size = 512

    # Get the list of image files
    large_images = [os.path.join(input_folder, f) for f in os.listdir(input_folder) if f.endswith('.jpg')]
    smaller_images = [os.path.join(overlay_folder, f) for f in os.listdir(overlay_folder) if f.endswith('.jpg')]

    # Process each large image
    for img_idx, large_image_path in enumerate(large_images):
        with Image.open(large_image_path) as img:
            img = img.convert("RGBA")  # Ensure RGBA mode for transparency
            width, height = img.size

            # Chop into chunk_size x chunk_size (512x512) chunks
            for top in range(0, height, chunk_size):
                for left in range(0, width, chunk_size):
                    box = (left, top, left + chunk_size, top + chunk_size)
                    chunk = img.crop(box)

                    # If the chunk is smaller than chunk_size x chunk_size (at the right/bottom edges), pad it
                    if chunk.size != (chunk_size, chunk_size):
                        padded_chunk = Image.new("RGBA", (chunk_size, chunk_size), (255, 255, 255, 0))
                        padded_chunk.paste(chunk, (0, 0))
                        chunk = padded_chunk

                    # Randomly select a smaller image
                    overlay_path = random.choice(smaller_images)
                    try:
                        with Image.open(overlay_path) as overlay:
                            overlay = overlay.convert("RGBA")  # Ensure RGBA mode

                            # Resize overlay to fit within the chunk
                            max_overlay_size = (random.randint(32, chunk_size), random.randint(32, chunk_size))
                            overlay.thumbnail(max_overlay_size, Image.Resampling.LANCZOS)

                            # Random position for overlay
                            overlay_x = random.randint(0, chunk_size - overlay.width)
                            overlay_y = random.randint(0, chunk_size - overlay.height)

                            # Superimpose the overlay onto the chunk
                            chunk.paste(overlay, (overlay_x, overlay_y), overlay)

                            # Calculate bounding box
                            x_min = overlay_x
                            y_min = overlay_y
                            x_max = overlay_x + overlay.width
                            y_max = overlay_y + overlay.height

                            # Save bounding box to CSV
                            output_filename = f"chunk_{img_idx}_{top}_{left}.jpg"
                            csv_writer.writerow([output_filename, x_min, y_min, x_max, y_max])

                            # Save the resulting chunk_size x chunk_size image
                            output_path = os.path.join(output_folder, output_filename)
                            chunk.convert("RGB").save(output_path, "JPEG")

                    except Exception as e:
                        print(f"Error processing overlay {overlay_path}: {e}")


# Train/Validation Split
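
The split in this section is drawn over individual chunk filenames, so tiles cut from the same source image can end up in both the training and validation sets. One alternative, sketched here as an assumption rather than what this notebook does, is to hold out whole source images using the `img_idx` part of the `chunk_{img_idx}_{top}_{left}.jpg` names; `source_index` and `split_by_source_image` are hypothetical helpers:

```python
import random


def source_index(chunk_name):
    """Extract img_idx from a name like 'chunk_18_1024_1024.jpg'."""
    return int(chunk_name.split("_")[1])


def split_by_source_image(chunk_names, val_fraction=0.2, seed=1):
    """Hold out whole source images so no image contributes to both sets."""
    sources = sorted({source_index(name) for name in chunk_names})
    rng = random.Random(seed)
    rng.shuffle(sources)
    # Always hold out at least one source image.
    n_val = max(1, round(len(sources) * val_fraction))
    val_sources = set(sources[:n_val])
    train = [n for n in chunk_names if source_index(n) not in val_sources]
    val = [n for n in chunk_names if source_index(n) in val_sources]
    return train, val


# Made-up chunk names: 5 source images with 2 tiles each.
names = [f"chunk_{i}_{top}_0.jpg" for i in range(5) for top in (0, 512)]
train, val = split_by_source_image(names)
print(len(train), len(val))
```

A validation score measured this way reflects how the model handles scenes it has not seen, which is closer to the Kaggle test setting.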

In [10]:
# Split training data into train and validation sets
annotations = pd.read_csv("2024-fall-ml-3-hw-4-wheres-waldo/train/annotations_chunks.csv")
image_files = annotations["filename"].unique()
train_images, val_images = train_test_split(image_files, test_size=0.2, random_state=42)

def filter_csv_by_column(input_csv, output_csv, column_name, values_list):
    """Writes the rows of input_csv whose column_name value is in values_list to output_csv."""
    # Load the CSV into a DataFrame
    df = pd.read_csv(input_csv)

    # Filter the DataFrame
    filtered_df = df[df[column_name].isin(values_list)]

    # Save the filtered DataFrame to a new CSV file
    filtered_df.to_csv(output_csv, index=False)

# Train annotations
values_list = list(train_images)
output_csv = "2024-fall-ml-3-hw-4-wheres-waldo/train_annotations.csv"
column_name = "filename"
filter_csv_by_column("2024-fall-ml-3-hw-4-wheres-waldo/train/annotations_chunks.csv", output_csv, column_name, values_list)

# Validation annotations (saved as test_annotations.csv)
values_list = list(val_images)
output_csv = "2024-fall-ml-3-hw-4-wheres-waldo/test_annotations.csv"
column_name = "filename"
filter_csv_by_column("2024-fall-ml-3-hw-4-wheres-waldo/train/annotations_chunks.csv", output_csv, column_name, values_list)

# Train/validation split (80/20): move the validation chunks into their own folder
def split_directory(source_dir, target_dir, file_list):
    """Moves the files named in file_list from source_dir to target_dir."""

    os.makedirs(target_dir, exist_ok=True)

    for file_name in file_list:
        source_path = os.path.join(source_dir, file_name)
        target_path = os.path.join(target_dir, file_name)

        if os.path.exists(source_path):
            shutil.move(source_path, target_path)
            print(f"Moved: {file_name}")
        else:
            print(f"File not found: {file_name}")

if __name__ == "__main__":
    source_dir = "2024-fall-ml-3-hw-4-wheres-waldo/train/chunks"
    target_dir = "2024-fall-ml-3-hw-4-wheres-waldo/train/val"
    file_list = list(val_images)

    split_directory(source_dir, target_dir, file_list)

Moved: chunk_18_1024_1024.jpg
Moved: chunk_15_512_1536.jpg
Moved: chunk_14_512_512.jpg
Moved: chunk_11_512_1024.jpg
Moved: chunk_4_1024_0.jpg
Moved: chunk_10_0_2560.jpg
Moved: chunk_12_512_1024.jpg
Moved: chunk_19_1024_1024.jpg
Moved: chunk_0_512_1536.jpg
Moved: chunk_8_0_1024.jpg
Moved: chunk_1_0_0.jpg
Moved: chunk_16_512_512.jpg
Moved: chunk_2_2048_0.jpg
Moved: chunk_14_1536_1536.jpg
Moved: chunk_14_0_2048.jpg
Moved: chunk_19_0_1536.jpg
Moved: chunk_5_1536_0.jpg
Moved: chunk_0_512_0.jpg
Moved: chunk_15_0_512.jpg
Moved: chunk_4_1024_512.jpg
Moved: chunk_10_512_1024.jpg
Moved: chunk_10_1024_1024.jpg
Moved: chunk_0_1536_512.jpg
Moved: chunk_15_512_512.jpg
Moved: chunk_7_1536_2560.jpg
Moved: chunk_10_1536_0.jpg
Moved: chunk_5_1024_1024.jpg
Moved: chunk_4_1024_1024.jpg
Moved: chunk_7_0_1024.jpg
Moved: chunk_0_1024_1536.jpg
Moved: chunk_2_0_0.jpg
Moved: chunk_14_512_0.jpg
Moved: chunk_16_1024_1024.jpg
Moved: chunk_7_1536_1536.jpg
Moved: chunk_2_0_1536.jpg
Moved: chunk_13_1024_512.jpg
Moved