# Assignment 4: Where's Waldo?
### Name: Eileanor LaRocco
In this assignment, you will develop an object detection algorithm to locate Waldo in a set of images. You will develop a model to detect the bounding box around Waldo. Your final task is to submit your predictions on Kaggle for evaluation.

### Process/Issues
- Double-checked that the images we were given were correctly bounded by drawing the boxes on the images (they look good!).
- Complication: When I first created augmented images, the bounding box labels were not augmented along with them. I also had to try a few types of augmentation to see what made sense for Waldo. The augmented images may still be too similar to one another, which could let the model favor the training images that occur more frequently.
- Complication: Similarly, when resizing the images, the bounding boxes have to be rescaled to match, they must not get cut off, and the image should not be stretched or shrunk too much.
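
The first complication comes down to applying the same geometric transform to the box as to the pixels. Here is a minimal sketch for a horizontal flip, assuming Pascal VOC pixel coordinates (`xmin, ymin, xmax, ymax`). `hflip_box` is a hypothetical helper for illustration and is not part of the assignment code.

```python
def hflip_box(box, img_width):
    """Mirror an (xmin, ymin, xmax, ymax) box across the vertical centre line."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge and vice versa; y is unchanged.
    return (img_width - xmax, ymin, img_width - xmin, ymax)

box = (100, 50, 160, 120)
flipped = hflip_box(box, img_width=1500)
print(flipped)  # (1340, 50, 1400, 120)
```

Flipping twice should return the original box, which makes a cheap sanity check for any box transform.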

### Imports

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import torch
from torchvision.io import read_image
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.transforms import functional as F
from tqdm import tqdm
import csv
import opendatasets as od
import cv2
import albumentations as A
import random
import shutil
from sklearn.model_selection import train_test_split
from ultralytics import YOLO
import torch.nn as nn



In [2]:
SEED = 1

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

device = torch.device("mps")
print(device)

mps


### Download Data

In [3]:
od.download('https://www.kaggle.com/competitions/2024-fall-ml-3-hw-4-wheres-waldo/data')

Skipping, found downloaded files in "./2024-fall-ml-3-hw-4-wheres-waldo" (use force=True to force download)


### Paths

In [4]:
train_folder = "2024-fall-ml-3-hw-4-wheres-waldo/train/train" # Original Train Images
test_folder = "2024-fall-ml-3-hw-4-wheres-waldo/test/test" # Original Test Images
annotations_file = "2024-fall-ml-3-hw-4-wheres-waldo/annotations.csv" # Original Annotations File

# Preprocess Images (Crop/Augment)

### Check Image Sizes

In [5]:
def print_image_sizes(folder):
    """Print the width and height of every .jpg image in `folder`."""
    for image_name in os.listdir(folder):
        if image_name.endswith(".jpg"):
            image_path = os.path.join(folder, image_name)

            # Read the image using OpenCV (returns None if the file cannot be decoded)
            img = cv2.imread(image_path)
            if img is not None:
                height, width, channels = img.shape  # OpenCV shape is (height, width, channels)
                print(f"Image: {image_name}, Width: {width}, Height: {height}")
            else:
                print(f"Could not read image: {image_name}")

# Train Images
print_image_sizes(train_folder)

# Test Images
print_image_sizes(test_folder)


Image: 8.jpg, Width: 2800, Height: 1760
Image: 9.jpg, Width: 1298, Height: 951
Image: 14.jpg, Width: 1700, Height: 2340
Image: 15.jpg, Width: 1600, Height: 1006
Image: 17.jpg, Width: 1599, Height: 1230
Image: 16.jpg, Width: 1525, Height: 3415
Image: 12.jpg, Width: 1276, Height: 1754
Image: 13.jpg, Width: 1280, Height: 864
Image: 11.jpg, Width: 2828, Height: 1828
Image: 10.jpg, Width: 1600, Height: 980
Image: 21.jpg, Width: 2048, Height: 1515
Image: 20.jpg, Width: 2953, Height: 2088
Image: 22.jpg, Width: 500, Height: 256
Image: 23.jpg, Width: 325, Height: 300
Image: 27.jpg, Width: 591, Height: 629
Image: 26.jpg, Width: 600, Height: 374
Image: 18.jpg, Width: 1590, Height: 981
Image: 24.jpg, Width: 456, Height: 256
Image: 25.jpg, Width: 413, Height: 500
Image: 19.jpg, Width: 1280, Height: 864
Image: 4.jpg, Width: 2048, Height: 1272
Image: 5.jpg, Width: 2100, Height: 1760
Image: 7.jpg, Width: 1949, Height: 1419
Image: 6.jpg, Width: 2048, Height: 1454
Image: 2.jpg, Width: 1286, Height: 946


### Resize Images

In [6]:
# Paths
resized_folder = "2024-fall-ml-3-hw-4-wheres-waldo/train/train/resized"
resized_annotations_file = "2024-fall-ml-3-hw-4-wheres-waldo/resized_annotations.csv"
target_size = (1500, 1000)  # Target (width, height) for resizing images

# Read the annotations CSV file
annotations_df = pd.read_csv(annotations_file)

# Create the output folder if it doesn't exist
os.makedirs(resized_folder, exist_ok=True)

# List to store updated bounding boxes
updated_annotations = []

# Iterate over all images in the annotation file
for index, row in annotations_df.iterrows():
    image_name = row["filename"]
    xmin, ymin, xmax, ymax = row["xmin"], row["ymin"], row["xmax"], row["ymax"]
    
    # Load the image
    image_path = os.path.join(train_folder, image_name)
    img = cv2.imread(image_path)
    
    if img is not None:
        original_height, original_width = img.shape[:2]
        
        # Calculate the resizing scale factors
        scale_x = target_size[0] / original_width
        scale_y = target_size[1] / original_height
        
        # Resize the image
        resized_img = cv2.resize(img, target_size)
        
        # Adjust bounding boxes based on the scaling factors
        xmin_new = int(xmin * scale_x)
        ymin_new = int(ymin * scale_y)
        xmax_new = int(xmax * scale_x)
        ymax_new = int(ymax * scale_y)
        
        # Save the resized image
        resized_image_path = os.path.join(resized_folder, image_name)
        cv2.imwrite(resized_image_path, resized_img)
        
        # Add the updated annotation to the list
        updated_annotations.append([image_name, xmin_new, ymin_new, xmax_new, ymax_new])

# Save the updated annotations to a new CSV file
updated_annotations_df = pd.DataFrame(updated_annotations, columns=["filename", "xmin", "ymin", "xmax", "ymax"])
updated_annotations_df.to_csv(resized_annotations_file, index=False)

print("Images and annotations resized and saved.")


Images and annotations resized and saved.
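
The resize above scales x and y independently, so images whose aspect ratio differs from 1500:1000 get stretched. One alternative, sketched here as an assumption rather than what this notebook does, is letterboxing: use a single scale factor and pad the remainder. `letterbox_params` and `letterbox_box` are hypothetical helpers, and only the arithmetic is shown (no pixel operations).

```python
def letterbox_params(orig_w, orig_h, target_w=1500, target_h=1000):
    """Return (scale, pad_x, pad_y) for an aspect-preserving resize with centred padding."""
    scale = min(target_w / orig_w, target_h / orig_h)
    new_w, new_h = round(orig_w * scale), round(orig_h * scale)
    pad_x = (target_w - new_w) // 2  # padding added on the left
    pad_y = (target_h - new_h) // 2  # padding added on the top
    return scale, pad_x, pad_y

def letterbox_box(box, scale, pad_x, pad_y):
    """Map an (xmin, ymin, xmax, ymax) box into the letterboxed image."""
    xmin, ymin, xmax, ymax = box
    return (xmin * scale + pad_x, ymin * scale + pad_y,
            xmax * scale + pad_x, ymax * scale + pad_y)

# 14.jpg is 1700x2340 (portrait), so it is limited by the target height.
scale, pad_x, pad_y = letterbox_params(1700, 2340)
full_frame = letterbox_box((0, 0, 1700, 2340), scale, pad_x, pad_y)
print(scale, pad_x, pad_y)
```

With a single scale factor Waldo keeps his proportions, at the cost of some padded area that carries no information.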


### Augment Images

In [8]:
import os
import cv2
import pandas as pd
import random
import albumentations as A

# Paths
augmented_folder = "2024-fall-ml-3-hw-4-wheres-waldo/train/train/resized_augmented_images"  # Folder to save augmented images
os.makedirs(augmented_folder, exist_ok=True)

# Load annotations
annotations = pd.read_csv(resized_annotations_file)

# Define augmentation pipeline
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.7),  # Randomly flip the image horizontally
        A.RandomBrightnessContrast(p=0.7),  # Adjust brightness and contrast
        A.Rotate(limit=15, p=0.5),  # Randomly rotate the image
    ],
    bbox_params=A.BboxParams(format='pascal_voc', label_fields=['class_labels'])
)

# Process each image
augmented_annotations = []

for _, row in annotations.iterrows():
    image_name = row['filename']
    x_min, y_min, x_max, y_max = row['xmin'], row['ymin'], row['xmax'], row['ymax']
    
    # Load image
    image_path = os.path.join(resized_folder, image_name)
    image = cv2.imread(image_path)
    if image is None:
        print(f"Image {image_path} not found. Skipping...")
        continue
    
    # Prepare bounding boxes and labels
    bboxes = [[x_min, y_min, x_max, y_max]]
    class_labels = ["waldo"]
    
    for i in range(5):
        # Apply augmentation
        augmented = transform(image=image, bboxes=bboxes, class_labels=class_labels)
        
        # Save augmented image
        aug_image_name = f"aug_{random.randint(1000, 9999)}_{image_name}"
        aug_image_path = os.path.join(augmented_folder, aug_image_name)
        cv2.imwrite(aug_image_path, augmented['image'])
        
        # Save augmented bounding boxes
        for bbox, label in zip(augmented['bboxes'], augmented['class_labels']):
            x_min, y_min, x_max, y_max = bbox  # Bounding boxes are already in Pascal VOC format
            augmented_annotations.append([aug_image_name, x_min, y_min, x_max, y_max])

# Save augmented annotations to CSV
augmented_csv_path = os.path.join("2024-fall-ml-3-hw-4-wheres-waldo", "resized_augmented_annotations.csv")
augmented_df = pd.DataFrame(augmented_annotations, columns=["filename", "xmin", "ymin", "xmax", "ymax"])
augmented_df.to_csv(augmented_csv_path, index=False)

print(f"Augmented images and annotations saved to {augmented_folder}")


Augmented images and annotations saved to 2024-fall-ml-3-hw-4-wheres-waldo/train/train/resized_augmented_images


### Draw bounding boxes on train images to check accuracy

In [9]:
# Paths
output_folder = "2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images"  # Folder to save images with drawn boxes

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Read the CSV file
# Assumes the CSV columns are: filename, xmin, ymin, xmax, ymax
annotations = pd.read_csv(augmented_csv_path)

# Iterate through each image in the annotations
for _, row in annotations.iterrows():
    image_name = row["filename"]
    x_min, y_min, x_max, y_max = row["xmin"], row["ymin"], row["xmax"], row["ymax"]
    
    # Load the image
    image_path = os.path.join(augmented_folder, image_name)
    if not os.path.exists(image_path):
        print(f"Image {image_path} not found. Skipping...")
        continue
    image = cv2.imread(image_path)
    
    # Draw the bounding box
    # cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (B, G, R), thickness)
    cv2.rectangle(image, (int(x_min), int(y_min)), (int(x_max), int(y_max)), (0, 255, 0), 4)
    
    # Optionally, add a label or text
    label = "Waldo"
    cv2.putText(image, label, (int(x_min), int(y_min) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    
    # Save the image
    output_path = os.path.join(output_folder, image_name)
    cv2.imwrite(output_path, image)

    print(f"Annotated image saved to {output_path}")

print("All bounding boxes have been drawn and saved.")


Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_3201_1.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_2033_1.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_5179_1.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_2931_1.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_9117_1.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_8364_10.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_8737_10.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_7219_10.jpg
Annotated image saved to 2024-fall-ml-3-hw-4-wheres-waldo/annotated_resized_augmented_images/aug_4439_10.jpg
Annotated image saved to

# Preprocessing (for model)

In [10]:
# Paths
yolo_train_dir = "datasets/yolo_dataset/train"
yolo_val_dir = "datasets/yolo_dataset/val"

#Saved Predictions
yolo_test_dir = "yolo_test_predictions"

# Create necessary folders
os.makedirs(yolo_train_dir, exist_ok=True)
os.makedirs(yolo_val_dir, exist_ok=True)
os.makedirs(yolo_test_dir, exist_ok=True)

# Load annotations
annotations = pd.read_csv(augmented_csv_path)

# Function to convert annotations to YOLO format
def convert_to_yolo_format(row, img_width, img_height):
    x_center = (row["xmin"] + row["xmax"]) / 2 / img_width
    y_center = (row["ymin"] + row["ymax"]) / 2 / img_height
    width = (row["xmax"] - row["xmin"]) / img_width
    height = (row["ymax"] - row["ymin"]) / img_height
    return f"0 {x_center} {y_center} {width} {height}"

# Split training data into train and validation sets
image_files = annotations["filename"].unique()
train_images, val_images = train_test_split(image_files, test_size=0.2, random_state=42)

# Function to prepare YOLO format data
def prepare_yolo_data(image_list, output_dir):
    for img_name in image_list:
        img_path = os.path.join(augmented_folder, img_name)
        img = cv2.imread(img_path)
        if img is None:
            continue
        img_height, img_width, _ = img.shape

        # Filter annotations for this image
        image_annotations = annotations[annotations["filename"] == img_name]

        # YOLO annotations file
        yolo_annotations = []
        for _, row in image_annotations.iterrows():
            yolo_line = convert_to_yolo_format(row, img_width, img_height)
            yolo_annotations.append(yolo_line)

        # Save image and annotation
        base_name = os.path.splitext(img_name)[0]
        shutil.copy(img_path, os.path.join(output_dir, f"{base_name}.jpg"))
        with open(os.path.join(output_dir, f"{base_name}.txt"), "w") as f:
            f.write("\n".join(yolo_annotations))

# Prepare training and validation data
prepare_yolo_data(train_images, yolo_train_dir)
prepare_yolo_data(val_images, yolo_val_dir)
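
A quick way to check the YOLO conversion is to invert it and confirm the original corners come back. The sketch below restates the conversion with plain numbers instead of a DataFrame row. `voc_to_yolo` and `yolo_to_voc` are hypothetical names used only for this illustration.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Pixel corners -> normalized (x_center, y_center, width, height)."""
    x_c = (xmin + xmax) / 2 / img_w
    y_c = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return x_c, y_c, w, h

def yolo_to_voc(x_c, y_c, w, h, img_w, img_h):
    """Normalized (x_center, y_center, width, height) -> pixel corners."""
    xmin = (x_c - w / 2) * img_w
    ymin = (y_c - h / 2) * img_h
    xmax = (x_c + w / 2) * img_w
    ymax = (y_c + h / 2) * img_h
    return xmin, ymin, xmax, ymax

yolo_box = voc_to_yolo(300, 200, 360, 280, img_w=1500, img_h=1000)
round_trip = yolo_to_voc(*yolo_box, img_w=1500, img_h=1000)
print(yolo_box)
print(round_trip)
```

All four YOLO values should fall between 0 and 1; a value outside that range usually means the box and the image size came from different versions of the image.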


In [49]:
def filter_csv_by_column(input_csv, output_csv, column_name, values_list):
    """
    Filters rows in a CSV file and keeps only those where the specified column's value is in a given list.

    Parameters:
        input_csv (str): Path to the input CSV file.
        output_csv (str): Path to save the filtered CSV file.
        column_name (str): Column to filter on.
        values_list (list): List of values to keep.
    """
    # Load the CSV into a DataFrame
    df = pd.read_csv(input_csv)

    # Filter the DataFrame
    filtered_df = df[df[column_name].isin(values_list)]

    # Save the filtered DataFrame to a new CSV file
    filtered_df.to_csv(output_csv, index=False)

In [50]:
# Split the annotations CSV into train and val files that match the YOLO folders
input_csv = "2024-fall-ml-3-hw-4-wheres-waldo/resized_augmented_annotations.csv"
column_name = "filename"

splits = {
    "datasets/yolo_dataset/train": "2024-fall-ml-3-hw-4-wheres-waldo/train_annotations.csv",
    # The validation split is saved as test_annotations.csv
    "datasets/yolo_dataset/val": "2024-fall-ml-3-hw-4-wheres-waldo/test_annotations.csv",
}

for directory, output_csv in splits.items():
    # Keep only the rows whose image was copied into this split's folder
    values_list = [filename for filename in os.listdir(directory) if filename.endswith(".jpg")]
    filter_csv_by_column(input_csv, output_csv, column_name, values_list)
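
Because each source image contributes five augmented copies, a split over augmented filenames can place copies of the same scene in both train and validation. Below is a sketch of splitting by source image instead, assuming the `aug_<digits>_<source>.jpg` naming used in the augmentation cell. `source_of` and `group_split` are hypothetical helpers, and the file list is synthetic.

```python
import random
import re
from collections import defaultdict

def source_of(aug_name):
    """Strip the 'aug_<digits>_' prefix to recover the source image name."""
    return re.sub(r"^aug_\d+_", "", aug_name)

def group_split(filenames, val_fraction=0.2, seed=42):
    """Split so that all augmented copies of a source image land in the same split."""
    groups = defaultdict(list)
    for name in filenames:
        groups[source_of(name)].append(name)
    sources = sorted(groups)
    random.Random(seed).shuffle(sources)
    n_val = max(1, round(len(sources) * val_fraction))
    val_sources = sources[:n_val]
    train_sources = sources[n_val:]
    train = [f for s in train_sources for f in groups[s]]
    val = [f for s in val_sources for f in groups[s]]
    return train, val

# Synthetic example: 10 source images with 5 augmented copies each.
files = [f"aug_{1000 + i}_{src}.jpg" for src in range(1, 11) for i in range(5)]
train_files, val_files = group_split(files)
print(len(train_files), len(val_files))  # 40 10
```

A validation score computed this way is a more honest estimate of how the model handles scenes it has never seen.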



In [77]:
from PIL import Image
import torchvision

class WaldoDataset(torch.utils.data.Dataset):
    def __init__(self, annotations_file, img_dir):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = torchvision.transforms.Compose([
            torchvision.transforms.ToPILImage(),
            torchvision.transforms.ToTensor()
        ])

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        # Load image
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = Image.open(img_path).convert("RGB")
        image = F.to_tensor(image)
        image = np.array(image)
        
        # Read bounding box data, ensuring all are converted to float
        box_data = self.img_labels.iloc[idx, 1:5].values
        boxes = []
        for item in box_data:
            try:
                boxes.append(float(item))
            except ValueError as e:
                raise ValueError(f"Error converting bounding box data to float: {e}")

        # Create tensors
        boxes = torch.as_tensor([boxes], dtype=torch.float32)
        labels = torch.ones((1,), dtype=torch.int64)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        iscrowd = torch.zeros((1,), dtype=torch.int64)
        
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

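        # NOTE: torchvision's Compose accepts a single input, so passing
        # (image, target) raises the TypeError shown in the training output.
        image, target = self.transform(image, target)

        # NOTE: `target` is a dict, which F.to_tensor does not accept.
        target = F.to_tensor(target)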

        return image, target


# Example usage:
# Create the dataset
train_dataset = WaldoDataset(annotations_file= "2024-fall-ml-3-hw-4-wheres-waldo/train_annotations.csv", img_dir="datasets/yolo_dataset/train")
test_dataset = WaldoDataset(annotations_file= "2024-fall-ml-3-hw-4-wheres-waldo/test_annotations.csv", img_dir="datasets/yolo_dataset/val")

# Now, you can use this dataset with a DataLoader to train your model
from torch.utils.data import DataLoader

train_loader = DataLoader(
    train_dataset,
    batch_size=2,
    shuffle=True,
    #collate_fn=lambda x: tuple(zip(*x))
)

test_loader = DataLoader(
    test_dataset,
    batch_size=2,
    shuffle=True,
    #collate_fn=lambda x: tuple(zip(*x))
)
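
The commented-out `collate_fn=lambda x: tuple(zip(*x))` matters for detection data. Default collation tries to stack samples into batched tensors, which only works when every image and every target has the same shape. The lambda instead keeps the batch as a tuple of images and a tuple of target dicts. Here is a plain-Python sketch, with strings standing in for image tensors (an assumption made so the example runs without PyTorch).

```python
def detection_collate(batch):
    """Turn [(image, target), ...] into ((image, ...), (target, ...))."""
    return tuple(zip(*batch))

batch = [
    ("image_0", {"boxes": [[10, 20, 50, 80]], "labels": [1]}),
    ("image_1", {"boxes": [[5, 5, 30, 40]], "labels": [1]}),
]
images, targets = detection_collate(batch)
print(images)      # ('image_0', 'image_1')
print(targets[1])  # {'boxes': [[5, 5, 30, 40]], 'labels': [1]}
```

With this layout the training loop has to move each image and each target dict to the device individually rather than calling `.to(device)` on the whole batch.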

# Model

### Architecture

In [17]:
import torch
import torch.nn as nn

class SimpleYOLOv3(nn.Module):
    def __init__(self, num_classes):
        super(SimpleYOLOv3, self).__init__()

        # Backbone: Feature extractor (simplified)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            # Add more convolutional layers as needed
        )

        # Detection head (simplified)
        self.head = nn.Sequential(
            nn.Conv2d(16, 32, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, (5 + num_classes) * 3, 1),  # 3 bounding boxes per cell
        )

    def forward(self, x):
        x = self.backbone(x)
        print(x.size())
        x = self.head(x)
        print(x.size())
        return x 

# Instantiate and check the model
model = SimpleYOLOv3(num_classes=1)
input_image = torch.randn(1, 3, 1500, 1000)  # Example batch
output = model(input_image)
print(output.shape)


torch.Size([1, 16, 750, 500])
torch.Size([1, 18, 750, 500])
torch.Size([1, 18, 750, 500])
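
The printed shapes can be derived by hand. Each grid cell predicts 3 boxes, and each box carries 4 coordinates, 1 objectness score, and `num_classes` class scores, giving 3 * (5 + 1) = 18 channels. The 3x3 convolutions use `padding=1`, so only the single 2x2 max-pool changes the spatial size. A small sketch of that arithmetic follows; `yolo_head_shape` is a hypothetical helper.

```python
def yolo_head_shape(height, width, num_classes, boxes_per_cell=3, num_pools=1):
    """Return (channels, grid_h, grid_w) for the simplified detection head."""
    channels = boxes_per_cell * (5 + num_classes)  # 4 coords + 1 objectness + classes
    stride = 2 ** num_pools                        # each 2x2 max-pool halves H and W
    return channels, height // stride, width // stride

print(yolo_head_shape(1500, 1000, num_classes=1))  # (18, 750, 500)
```

Each of the 750 x 500 cells therefore emits 18 numbers, so the raw output still needs to be decoded into boxes before it can be compared with a ground-truth box.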


In [78]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import numpy as np
import random

model = SimpleYOLOv3(num_classes=1)

# IoU calculation
def compute_iou(pred_boxes, true_boxes):
    # pred_boxes and true_boxes should be in (x_min, y_min, x_max, y_max)
    inter_xmin = torch.max(pred_boxes[:, 0], true_boxes[:, 0])
    inter_ymin = torch.max(pred_boxes[:, 1], true_boxes[:, 1])
    inter_xmax = torch.min(pred_boxes[:, 2], true_boxes[:, 2])
    inter_ymax = torch.min(pred_boxes[:, 3], true_boxes[:, 3])

    inter_area = torch.clamp(inter_xmax - inter_xmin, min=0) * torch.clamp(inter_ymax - inter_ymin, min=0)
    pred_area = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    true_area = (true_boxes[:, 2] - true_boxes[:, 0]) * (true_boxes[:, 3] - true_boxes[:, 1])

    union_area = pred_area + true_area - inter_area
    iou = inter_area / (union_area + 1e-6)  # small epsilon avoids division by zero
    return iou

# Simple IoU loss function
def iou_loss(pred_boxes, true_boxes):
    iou = compute_iou(pred_boxes, true_boxes)
    return 1 - iou.mean()  # We want to maximize IoU, so minimize 1 - IoU

# Custom YOLOv3 training loop
def train(model, train_loader, optimizer, device):
    model.train()
    total_loss = 0
    for images, targets in train_loader:
        print(images.shape)  # .shape is an attribute, not a method
        print(targets.shape)
        images = images.to(device)
        targets = targets.to(device)

        # Forward pass
        predictions = model(images)

        # Extract predicted boxes and target boxes (for simplicity, assuming one grid cell)
        pred_boxes = predictions[:, :4]  # first 4 are bounding box coordinates
        pred_conf = predictions[:, 4]    # 5th is objectness confidence
        pred_class = predictions[:, 5:]  # remaining are class predictions

        true_boxes = targets[:, :4]  # Ground truth boxes
        true_conf = targets[:, 4]    # Objectness confidence
        true_class = targets[:, 5:]  # Ground truth class

        # Losses
        loss_loc = iou_loss(pred_boxes, true_boxes)  # IoU loss
        loss_conf = torch.nn.BCEWithLogitsLoss()(pred_conf, true_conf)  # Confidence loss
        loss_class = torch.nn.BCEWithLogitsLoss()(pred_class, true_class)  # Classification loss

        # Total loss (sum or weighted sum)
        loss = loss_loc + loss_conf + loss_class
        total_loss += loss.item()

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    avg_loss = total_loss / len(train_loader)
    print(f"Training loss: {avg_loss}")

# Evaluation (testing) function
def evaluate(model, test_loader, device):
    model.eval()
    total_iou = 0
    with torch.no_grad():
        for images, targets in test_loader:
            images = images.to(device)
            targets = targets.to(device)

            predictions = model(images)

            # Extract predicted boxes and target boxes
            pred_boxes = predictions[:, :4]
            true_boxes = targets[:, :4]

            # Calculate IoU for the batch
            iou = compute_iou(pred_boxes, true_boxes)
            total_iou += iou.mean().item()

    avg_iou = total_iou / len(test_loader)
    print(f"Average IoU on test set: {avg_iou}")

# Initialize model, optimizer, and device
model = SimpleYOLOv3(num_classes=1).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# Train the model
epochs = 10
for epoch in range(epochs):
    print(f"Epoch {epoch+1}/{epochs}")
    train(model, train_loader, optimizer, device)
    evaluate(model, test_loader, device)


Epoch 1/10


TypeError: Compose.__call__() takes 2 positional arguments but 3 were given
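
The `TypeError` comes from `self.transform(image, target)` in `WaldoDataset`: `torchvision.transforms.Compose` is called with one input, so the extra `target` argument is rejected. Detection pipelines usually get around this with a compose that threads the image and the target through every step together. Below is a plain-Python sketch of that idea. `PairCompose` and `ClampBoxes` are hypothetical classes, and the image is a list of rows so the example runs without PyTorch.

```python
class PairCompose:
    """Apply transforms that each take and return (image, target)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target


class ClampBoxes:
    """Clip (xmin, ymin, xmax, ymax) boxes so they stay inside the image."""

    def __call__(self, image, target):
        height, width = len(image), len(image[0])
        boxes = [
            [max(0, xmin), max(0, ymin), min(width, xmax), min(height, ymax)]
            for xmin, ymin, xmax, ymax in target["boxes"]
        ]
        return image, {**target, "boxes": boxes}


pair_transform = PairCompose([ClampBoxes()])
image = [[0, 1, 2, 3], [4, 5, 6, 7]]  # 2 rows x 4 columns
target = {"boxes": [[-1, 0, 6, 1]], "labels": [1]}
new_image, new_target = pair_transform(image, target)
print(new_target["boxes"])  # [[0, 0, 4, 1]]
```

The same pattern extends to flips or resizes, as long as each step updates the boxes whenever it changes the pixels.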

### Training

In [11]:
# Train YOLO model
#model = YOLO("yolov5su.pt")  # Load pretrained weights
#model.train(data="yolo.yaml", epochs=15, imgsz=640, pretrained=True, augment=True,)
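
The training call references `yolo.yaml`, which is not shown in this notebook. As a sketch only, the snippet below builds a dataset file of the kind Ultralytics expects, using the `path`, `train`, `val`, and `names` keys. The directory names come from the preprocessing cell, the single class name `waldo` is an assumption, and the actual file used may differ. `build_yolo_yaml` is a hypothetical helper.

```python
def build_yolo_yaml(root="datasets/yolo_dataset", names=("waldo",)):
    """Return the text of a minimal dataset YAML for a YOLO-style trainer."""
    lines = [
        f"path: {root}",  # dataset root directory
        "train: train",   # training images, relative to `path`
        "val: val",       # validation images, relative to `path`
        "names:",         # class index -> class name
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"

print(build_yolo_yaml())
```

Writing the returned text to `yolo.yaml` would give the trainer something to load. How a relative `path` is resolved depends on the local Ultralytics settings, so an absolute path may be safer.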

# Submission File 

In [None]:
test_folder = "2024-fall-ml-3-hw-4-wheres-waldo/test/test"

# Predict on test images
test_images = [os.path.join(test_folder, img) for img in os.listdir(test_folder) if img.endswith(".jpg")]
results = model.predict(source=test_images, save=True, save_txt=True, project="yolo_test_predictions")

# Prepare to save the predictions
output_csv_path = os.path.join("yolo_test_predictions", "predictions.csv")
predictions = []

# Process results
for result in results:
    image_name = os.path.basename(result.path)  # Get the image name
    if result.boxes is not None and len(result.boxes) > 0:  # Check if there are predictions
        # Move boxes and confidences to NumPy arrays for easier access
        boxes = result.boxes.xyxy.cpu().numpy()  # Bounding boxes as (xmin, ymin, xmax, ymax)
        confidences = result.boxes.conf.cpu().numpy()  # Confidence scores

        # Find the index of the box with the highest confidence
        best_idx = confidences.argmax()
        best_box = boxes[best_idx]
        conf = confidences[best_idx]

        # Extract bounding box coordinates
        x_min, y_min, x_max, y_max = best_box
        predictions.append([image_name, x_min, y_min, x_max, y_max, conf])
    else:
        # No predictions for this image
        predictions.append([image_name, None, None, None, None, None])

# Save predictions to CSV
df = pd.DataFrame(predictions, columns=["filename", "xmin", "ymin", "xmax", "ymax", "confidence"])
df.to_csv(output_csv_path, index=False)

print(f"Predictions saved to {output_csv_path}")



0: 640x640 (no detections), 210.5ms
1: 640x640 (no detections), 210.5ms
2: 640x640 (no detections), 210.5ms
3: 640x640 (no detections), 210.5ms
4: 640x640 (no detections), 210.5ms
5: 640x640 (no detections), 210.5ms
6: 640x640 (no detections), 210.5ms
7: 640x640 (no detections), 210.5ms
8: 640x640 (no detections), 210.5ms
Speed: 1.5ms preprocess, 210.5ms inference, 0.1ms postprocess per image at shape (1, 3, 640, 640)
Results saved to yolo_test_predictions/train2
0 label saved to yolo_test_predictions/train2/labels
Predictions saved to yolo_test_predictions/predictions.csv
