<!DOCTYPE html>
<html>
<head>
    <title>Problem 5</title>
</head>
<body>
    <h1>Object Detection with YOLO - Documentation</h1>
    <p>Below is the documentation for the Python script using the Ultralytics YOLO library to perform object detection on images in a specified folder:</p>
    
<h2>Code Overview</h2>
<ol>
    <li>Import necessary libraries including <code>os</code>, <code>shutil</code>, <code>itertools</code>, <code>matplotlib.pyplot</code>, <code>YOLO</code> from Ultralytics, and <code>clip</code> from OpenAI.</li>
    <li>Define the function <code>detect_and_save_objects(yolo_model, input_folder, output_folder)</code> to perform object detection on images and save detected object crops.</li>
    <li>Define the function <code>compare_images_using_clip(clip_model, output_folder)</code> to compare images using the CLIP model and save similar object crops.</li>
    <li>Define the main function <code>main()</code> to orchestrate the execution of object detection and image comparison.</li>
    <li>Call the <code>main()</code> function when the script is run as the main module.</li>
</ol>

<h2>Execution Flow</h2>
<ol>
    <li>Create a YOLO model instance using the pretrained model file <code>yolov8m.pt</code>.</li>
    <li>Specify input and output directories for images and results.</li>
    <li>Call the <code>detect_and_save_objects</code> function to perform object detection and save object crops.</li>
    <li>Call the <code>compare_images_using_clip</code> function to compare images and save similar object crops.</li>
</ol>

<h2>Main Execution</h2>
<ul>
    <li>Check if the script is being run as the main module using <code>if __name__ == "__main__":</code></li>
    <li>Call the <code>main()</code> function to initiate the execution of object detection and comparison.</li>
    <li>Handle exceptions and print error messages in case of failures.</li>
</ul>
    
<p>This documentation provides an overview of the code's functionality and execution process.</p>
</body>
</html>


In [None]:
!pip install matplotlib
!pip install ultralytics
!pip install torch
!pip install clip

In [14]:
import os
import shutil
import itertools
import matplotlib.pyplot as plt
from ultralytics import YOLO
import clip


def main():
    input_folder = "./All_Images"
    output_folder = "output/problem5"
    model = YOLO("yolov8n.pt")
    detect_objects_yolo(model, input_folder, output_folder)
    compare_images_clip(output_folder)


def detect_objects_yolo(model ,input_folder, output_folder):
    
    image_paths = [os.path.join(input_folder, img) for img in os.listdir(input_folder) if img.lower().endswith(('.jpeg', '.jpg'))]

    for img_path in image_paths:
        image_name = os.path.splitext(os.path.basename(img_path))[0]
        image_output_folder = os.path.join(output_folder, image_name)
        os.makedirs(image_output_folder, exist_ok=True)

        results = model.predict(img_path, save=False)
        temp_store_label_no = {}

        for box in results[0].boxes:
            label_idx = box.cls[0].item()
            label = results[0].names[label_idx]
            entity_folder = os.path.join(image_output_folder, label)
            temp_store_label_no[label] = temp_store_label_no.get(label, 0) + 1
            os.makedirs(entity_folder, exist_ok=True)

            x_min, y_min, x_max, y_max = box.xyxy[0]
            x_min, y_min, x_max, y_max = x_min.item(), y_min.item(), x_max.item(), y_max.item()

            cutout_img = plt.imread(img_path)[int(y_min):int(y_max), int(x_min):int(x_max)]
            output_filename = f"{label}-{temp_store_label_no[label]}-crop.jpg"
            plt.imsave(os.path.join(entity_folder, output_filename), cutout_img)

def compare_images_clip(output_folder):
    clip_model, clip_preprocess = clip.load("ViT-B/32")
    matched_paths = [os.path.join(root, file) for root, _, files in os.walk(output_folder) for file in files if file.endswith('.jpg')]

    for root, _, files in os.walk(output_folder):
        if files:
            complete_img_path = os.path.join(root, files[0])
            entity = ''.join(filter(lambda z: not z.isdigit(), files[0].split('-')[0]))
            input_image = clip_preprocess(plt.imread(complete_img_path))[None]
            input_image_features = clip_model.encode_image(input_image)
            results = {}

            for img_path in matched_paths:
                if complete_img_path == img_path:
                    continue
                image_preprocess = clip_preprocess(plt.imread(img_path))[None]
                image_features = clip_model.encode_image(image_preprocess)
                similarity_score = (1 + (image_features @ input_image_features.T) / 2).item()

                results[img_path] = similarity_score

            sorted_results = dict(sorted(results.items(), key=lambda x: x[1], reverse=True))
            top_3_results = dict(itertools.islice(sorted_results.items(), 3))

            print(top_3_results)
            temp_count = 1

            for image_path in top_3_results.keys():
                new_filename = f"top{temp_count}-crop.jpeg"
                destination_path = os.path.join(root, new_filename)
                shutil.copy(image_path, destination_path)
                temp_count += 1


if __name__ == "__main__":
    main()



image 1/1 d:\Adobe\Aithon\Aithon\All_Images\1.jpg: 384x640 4 persons, 1 couch, 285.4ms
Speed: 7.5ms preprocess, 285.4ms inference, 3.0ms postprocess per image at shape (1, 3, 384, 640)

image 1/1 d:\Adobe\Aithon\Aithon\All_Images\2.jpg: 448x640 3 persons, 1 bed, 194.2ms
Speed: 8.2ms preprocess, 194.2ms inference, 3.5ms postprocess per image at shape (1, 3, 448, 640)

image 1/1 d:\Adobe\Aithon\Aithon\All_Images\3.jpg: 448x640 3 persons, 213.3ms
Speed: 7.5ms preprocess, 213.3ms inference, 5.2ms postprocess per image at shape (1, 3, 448, 640)

image 1/1 d:\Adobe\Aithon\Aithon\All_Images\AdobeStock_112814949.jpeg: 448x640 1 spoon, 1 bowl, 1 sandwich, 320.2ms
Speed: 7.1ms preprocess, 320.2ms inference, 4.6ms postprocess per image at shape (1, 3, 448, 640)

image 1/1 d:\Adobe\Aithon\Aithon\All_Images\AdobeStock_119085612.jpeg: 480x640 1 cup, 1 toothbrush, 289.4ms
Speed: 6.0ms preprocess, 289.4ms inference, 3.0ms postprocess per image at shape (1, 3, 480, 640)

image 1/1 d:\Adobe\Aithon\Aith

TypeError: Unexpected type <class 'numpy.ndarray'>