### **Author**
Shivansh Gupta

## **Importing All the Modules**

- **`YOLO` (from `ultralytics`)**
  - Load and run a **YOLO model**.

- **`cv2`**
  - Read/write images.
  - Draw bounding boxes and other shapes.
  - Perform color space conversions (e.g., BGR ↔ RGB).

- **`numpy as np`**
  - Work with image arrays.
  - Perform numeric and matrix operations efficiently.

- **`typing.List` / `typing.Dict`**
  - Add **type annotations** for better code clarity and readability.

- **`torch`**
  - Detect GPU availability using `torch.cuda.is_available()`.
  - Perform tensor operations if needed.

- **`os`**
  - Read environment variables.
  - Check file paths and manage the filesystem.

In [2]:
from ultralytics import YOLO
import cv2
import numpy as np
from typing import List, Dict
import torch

### Loading Environment Variables

Load environment variables from a `.env` file.


In [3]:
import os
from dotenv import load_dotenv
load_dotenv(dotenv_path=r"D:\PycharmProjects\Eco_Vision\Backend\.env")  # raw string avoids invalid-escape warnings


True

### YOLO Model Setup

- **Device Selection:**
  You can **force the use of CPU or GPU** manually.
  If not specified, the device will be **auto-detected** later.

- **Model Path (`model_path`):**
  - Uses the environment variable `MODEL_PATH` if it exists.
  - Otherwise, defaults to `"yolo12n.pt"`.

- **`YOLO(model_path)`**:
  - Calls the **YOLO class** from Ultralytics with the given `model_path`.
  - Loads the **pretrained model into memory**.
  - **Important:** The model is loaded **once**, so you **don’t need to reload it** every time you perform detection.
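
The selection logic described above can be sketched without loading any model. This is a minimal illustration rather than the notebook's actual code: `resolve_model_path`, `pick_device`, and the `forced` parameter are hypothetical names, and the fallback name `yolo12n.pt` is an assumption based on the model name printed later in this notebook.

```python
import os
from typing import Optional


def resolve_model_path(default: str = "yolo12n.pt") -> str:
    """Return MODEL_PATH from the environment, or the fallback weights name."""
    # An empty value is treated like a missing one.
    return os.getenv("MODEL_PATH") or default


def pick_device(cuda_available: bool, forced: Optional[str] = None) -> str:
    """Honor a manually forced device, otherwise auto-detect."""
    if forced is not None:
        return forced
    return "cuda" if cuda_available else "cpu"


# In the notebook, cuda_available would be torch.cuda.is_available().
print(resolve_model_path(), pick_device(cuda_available=False))
```

Passing `forced="cpu"` would keep inference on the CPU even when a GPU is present, which can be handy for debugging.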


In [4]:
model_path = os.getenv("MODEL_PATH", "yolo12n.pt")  # fall back to "yolo12n.pt" if MODEL_PATH is not set
device = "cuda" if torch.cuda.is_available() else "cpu"
model = YOLO(model_path)

#### As shown below, the model is locked and loaded on the CPU, not the GPU, since I don't have a dedicated graphics card 💀.

In [5]:
print(f"[YOLODetector] Loaded model: {model_path} on {device}")

[YOLODetector] Loaded model: yolo12n.pt on cpu


### **COCO 2017 Dataset Classes**

- These are the object classes we will use from the **COCO 2017 dataset**.
- The model will detect only these classes.

> **Note:** COCO 2017 has 80 classes in total, but for our task we select a subset relevant to our needs.

- When a detection is made:
  - If the **class ID** is **in** `reusable_classes` → ✅ keep it.
  - If the **class ID** is **not in** `reusable_classes` → ❌ ignore it.


This keeps the detection system **focused only on relevant items**, reducing noise from unnecessary classes.
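
The keep/ignore rule above can be sketched as a plain membership test. In this notebook the filtering is actually delegated to the model through its `classes=` argument, so the snippet below is only an illustration; `keep_classes`, `filter_detections`, and the sample detections are hypothetical.

```python
from typing import Dict, List

# Hypothetical subset, using Ultralytics' 0-based COCO indices.
keep_classes: Dict[int, str] = {39: "bottle", 41: "cup", 67: "cell phone"}


def filter_detections(detections: List[Dict], allowed: Dict[int, str]) -> List[Dict]:
    """Keep only detections whose class_id is a key of `allowed`."""
    return [det for det in detections if det["class_id"] in allowed]


raw = [
    {"class_id": 39, "confidence": 0.95},  # bottle -> kept
    {"class_id": 0, "confidence": 0.88},   # person -> ignored
    {"class_id": 41, "confidence": 0.61},  # cup -> kept
]
kept = filter_detections(raw, keep_classes)
print([keep_classes[d["class_id"]] for d in kept])  # ['bottle', 'cup']
```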


In [6]:
reusable_classes = {
    39: 'bottle',
    41: 'cup',
    42: 'fork',
    43: 'knife',
    44: 'spoon',
    45: 'bowl',
    46: 'banana',
    47: 'apple',
    # IDs follow Ultralytics' 0-based COCO indexing (0-79)
    49: 'orange',
    63: 'laptop',
    66: 'keyboard',
    67: 'cell phone',
    73: 'book',
}

### **Model Inference**

- This is where the **actual inference happens**.
- The input image/frame is passed to the YOLO model.
- The model runs its **forward pass** and returns detections:
  - **Bounding boxes** (location of objects).
  - **Class IDs** (what the object is).
  - **Confidence scores** (how sure the model is).

> **Note:** Inference = the stage where the trained model is **applied to new data** to make predictions.

In [7]:
def detect_objects(image_path: str, conf: float = 0.5, imgsz: int = 640) -> List[Dict]:
    if not os.path.exists(image_path):
        raise FileNotFoundError(f"Image not found: {image_path}")
    # Collect the class IDs (39, 41, 42, ...) from reusable_classes so YOLO only reports those classes.
    classes = list(reusable_classes.keys())
    results = model(
        image_path,
        conf=conf,
        imgsz=imgsz,
        device=device,
        classes=classes,
    )
    detections = []
    for result in results:
        if not getattr(result, "boxes", None):
            continue
        for box in result.boxes:
            class_id = int(box.cls[0])  # box.cls is a PyTorch tensor, so convert it to a plain int
            confidence = float(box.conf[0])
            x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
            detections.append({
                "class_id": class_id,
                "class_name": reusable_classes[class_id],
                "confidence": {
                    "score": round(confidence, 4),         # raw score (0..1)
                    "percent": round(confidence * 100, 1)  # human-friendly %
                },
                "bbox": [x1, y1, x2, y2]
            })
    return detections

`results` is a list of `Results` objects (one per input image). Each contains the detected boxes, class IDs, confidences, etc.

### **YOLO Inference Output (Ultralytics)**

When we run inference, **Ultralytics YOLO** returns a **list of `Results` objects**
👉 one `Results` object **per input image**.

---

#### **Step 1: What is a `Results` object?**
Each `Results` object contains:
- The **input image** (possibly resized).
- All **detections** found in that image.
- **Helper methods** (e.g., `.plot()`, `.save()`).

➡️ In short: **`result` = container for one image’s predictions**.

---

#### **Step 2: What is `.boxes` inside a result?**
- `result.boxes` → an attribute of the `Results` object.
- It is a **`Boxes` object** (Ultralytics’ custom class).
- Stores **all bounding boxes YOLO predicted** for that image.
- Each entry in `result.boxes` = **one detection**.

---

#### **Step 3: What does each box contain?**
A single box has:
- `.cls` → predicted **class id** (e.g., `tensor([39.])`).
- `.conf` → **confidence score** (e.g., `tensor([0.872])`).
- `.xyxy` → bounding box in **[x1, y1, x2, y2]** format (absolute pixel values).
- `.xywh` → bounding box in **[x_center, y_center, width, height]** format.
- `.data` → raw tensor with all values stacked.
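
To make the relationship between the `.xyxy` and `.xywh` formats concrete, here is a small pure-Python conversion. It is a sketch with illustrative pixel values; `xyxy_to_xywh` is a hypothetical helper, not an Ultralytics function.

```python
from typing import Sequence, Tuple


def xyxy_to_xywh(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert [x1, y1, x2, y2] corners to (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    width = x2 - x1
    height = y2 - y1
    return (x1 + width / 2, y1 + height / 2, width, height)


print(xyxy_to_xywh([190, 67, 222, 181]))  # (206.0, 124.0, 32, 114)
```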

---

✅ Example:
If YOLO finds **3 objects** in an image, then `result.boxes` will contain **3 box objects**,
each with its own class, confidence, and coordinates.

### **Let's run it & see how this works**

In [8]:
detect = detect_objects("test.jpg")
print(detect)


image 1/1 D:\PycharmProjects\Eco_Vision\Backend\ML notebook\test.jpg: 576x640 8 bottles, 231.6ms
Speed: 68.2ms preprocess, 231.6ms inference, 5.2ms postprocess per image at shape (1, 3, 576, 640)
[{'class_id': 39, 'class_name': 'bottle', 'confidence': {'score': 0.9479, 'percent': 94.8}, 'bbox': [190, 67, 222, 181]}, {'class_id': 39, 'class_name': 'bottle', 'confidence': {'score': 0.7831, 'percent': 78.3}, 'bbox': [26, 62, 51, 155]}, {'class_id': 39, 'class_name': 'bottle', 'confidence': {'score': 0.7175, 'percent': 71.7}, 'bbox': [45, 81, 79, 153]}, {'class_id': 39, 'class_name': 'bottle', 'confidence': {'score': 0.663, 'percent': 66.3}, 'bbox': [26, 62, 51, 127]}, {'class_id': 39, 'class_name': 'bottle', 'confidence': {'score': 0.6598, 'percent': 66.0}, 'bbox': [145, 59, 169, 156]}, {'class_id': 39, 'class_name': 'bottle', 'confidence': {'score': 0.6534, 'percent': 65.3}, 'bbox': [156, 61, 186, 178]}, {'class_id': 39, 'class_name': 'bottle', 'confidence': {'score': 0.6361, 'percent':

We get a list of detections; here the model detects 8 of the 10 bottles in our image.
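
Since each detection is a plain dictionary, a per-class tally is easy to build. The sketch below assumes the dictionary layout returned by `detect_objects`; `count_by_class` is a hypothetical helper and `sample` holds just two entries copied from the output above.

```python
from collections import Counter
from typing import Dict, List


def count_by_class(detections: List[Dict]) -> Dict[str, int]:
    """Count detections per class name."""
    return dict(Counter(det["class_name"] for det in detections))


# Two sample entries in the same shape as the output above.
sample = [
    {"class_id": 39, "class_name": "bottle",
     "confidence": {"score": 0.9479, "percent": 94.8}, "bbox": [190, 67, 222, 181]},
    {"class_id": 39, "class_name": "bottle",
     "confidence": {"score": 0.7831, "percent": 78.3}, "bbox": [26, 62, 51, 155]},
]
print(count_by_class(sample))  # {'bottle': 2}
```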

### **YOLO Inference Timing Flow**

The YOLO output line:

```

test.jpg: 576x640 8 bottles, 214.9ms
Speed: 8.5ms preprocess, 214.9ms inference, 3.8ms postprocess per image at shape (1, 3, 576, 640)

```

can be visualized as:

```

Input image: test.jpg (resized to 576x640)
│
▼
Preprocess: 8.5ms
  * Load image
  * Resize & normalize
  * Convert to tensor
  * Send to device (CPU or GPU)
│
▼
Inference: 214.9ms
  * YOLO model predicts bounding boxes & class scores
│
▼
Postprocess: 3.8ms
  * Non-Max Suppression (NMS)
  * Filter overlapping boxes
  * Scale boxes to original image
│
▼
Output: 8 bottles detected
Total time: 8.5 + 214.9 + 3.8 = 227.2ms

```

**Explanation:**
- **Preprocess:** Preparation before model runs.
- **Inference:** Model does all predictions.
- **Postprocess:** Refines and formats predictions.

> **Note:** Most of the time is spent in **inference**, which grows with image size or model complexity.
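
Summing the three stages gives the end-to-end time per image and shows how dominant inference is. This is plain arithmetic on the numbers from the log line above; the variable names are illustrative.

```python
# Stage timings in milliseconds, copied from the log line above.
timings_ms = {"preprocess": 8.5, "inference": 214.9, "postprocess": 3.8}

total_ms = round(sum(timings_ms.values()), 1)
inference_share = round(100 * timings_ms["inference"] / total_ms, 1)

print(f"total: {total_ms} ms, inference share: {inference_share}%")
# total: 227.2 ms, inference share: 94.6%
```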


### **Input Tensor Shape**

The YOLO model input tensor has the shape:

```

(1, 3, 576, 640)

```

**Breakdown:**

| Dimension | Meaning |
|-----------|---------|
| 1         | Batch size (**one image**) |
| 3         | Number of color channels (**RGB**) |
| 576       | Image height in pixels |
| 640       | Image width in pixels |

> **Note:** YOLO expects its input as a tensor of shape `(batch, channels, height, width)`. Ultralytics resizes and converts the image automatically when you pass an image path to the model.
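
The layout change from an image array to this tensor shape can be illustrated with NumPy alone. This is a simplified sketch, not Ultralytics' actual preprocessing (which also handles resizing internally); the all-zero `image_hwc` array is a stand-in for a real image.

```python
import numpy as np

# Stand-in for an already resized RGB image: height 576, width 640, 3 channels.
image_hwc = np.zeros((576, 640, 3), dtype=np.uint8)

# HWC -> CHW, scale to [0, 1], then add a leading batch dimension.
tensor_chw = image_hwc.transpose(2, 0, 1).astype(np.float32) / 255.0
batch = tensor_chw[np.newaxis, ...]

print(batch.shape)  # (1, 3, 576, 640)
```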

In [11]:
def annotate_image(image_path: str, detections: List[Dict]) -> np.ndarray:
    img_bgr = cv2.imread(image_path)
    if img_bgr is None:
        raise FileNotFoundError(f"Failed to read image: {image_path}")
    for det in detections:
        x1, y1, x2, y2 = det["bbox"]
        label = f"{det['class_name']} {det['confidence']['percent']}%"
        cv2.rectangle(img_bgr, (x1, y1), (x2, y2), (16, 185, 129), 2)
        cv2.putText(img_bgr, label, (x1, y1 - 6),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.2,
                    (0, 0, 0), 1, cv2.LINE_AA)
    cv2.imshow("Annotated Image", img_bgr)  # cv2.imshow expects BGR, so show the BGR image
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  # RGB copy for matplotlib / web use
    return img_rgb

### **`cv2.imread()`**

- **What:**
  OpenCV function to **read an image from disk**.

- **Why:**
  Loads the image into a **NumPy array** in **BGR format**.
  > Note: OpenCV uses **Blue-Green-Red (BGR) channel order** by default, not RGB.


### **`cv2.rectangle()`**

Draws a rectangle on an image, typically used for **bounding boxes** around detected objects.

```python
cv2.rectangle(img_bgr, (x1, y1), (x2, y2), (16, 185, 129), 2)
```

**Breakdown of parameters:**

| Parameter        | What it is                           | Why                                                            |
| ---------------- | ------------------------------------ | -------------------------------------------------------------- |
| `img_bgr`        | The image array (in BGR format)      | Rectangle will be drawn directly on this image                 |
| `(x1, y1)`       | Top-left corner of the rectangle     | Defines where the rectangle starts                             |
| `(x2, y2)`       | Bottom-right corner of the rectangle | Defines where the rectangle ends                               |
| `(16, 185, 129)` | BGR color tuple for the rectangle    | Chooses a visible color (here, a shade of green)               |
| `2`              | Line thickness in pixels             | Determines how thick the rectangle border appears on the image |

> **Note:** OpenCV uses **BGR** order, not RGB, for colors.

### **`cv2.putText()`**

Draws text on an image, typically used to **display the class name and confidence** above a bounding box.

```python
cv2.putText(img_bgr, label, (x1, y1 - 6),
            cv2.FONT_HERSHEY_SIMPLEX, 0.2,
            (0, 0, 0), 1, cv2.LINE_AA)
```

**Breakdown of parameters:**

| Parameter                  | What it is                                            | Why                                                            |
|----------------------------| ----------------------------------------------------- | -------------------------------------------------------------- |
| `img_bgr`                  | The image array where text will be drawn (BGR format) | Text appears directly on this image alongside the bounding box |
| `label`                    | Text string (e.g., `"bottle 92.1%"`)                  | Displays the **object name and confidence**                    |
| `(x1, y1 - 6)`             | Bottom-left corner coordinates of the text            | Slightly above the bounding box to avoid overlap               |
| `cv2.FONT_HERSHEY_SIMPLEX` | Predefined OpenCV font type                           | Determines the **style of the text**                           |
| `0.2`                      | Font scale                                            | Controls **text size** (0.2 = very small)                      |
| `(0, 0, 0)`                | Text color in BGR (black)                             | Readable against light backgrounds                             |
| `1`                        | Thickness of the text stroke                          | Determines how bold the text appears                           |
| `cv2.LINE_AA`              | Anti-aliased line type                                | Smooths edges for better readability                           |

> **Note:** Using anti-aliasing (`cv2.LINE_AA`) makes the text **look smoother and more professional**.
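
One caveat with the `(x1, y1 - 6)` position: for a box touching the top of the image, the text baseline can end up above the visible area. A possible guard, shown here as a hedged sketch, is to clamp the y coordinate; `label_origin` and its `min_y` default are hypothetical and not part of the notebook or of OpenCV.

```python
from typing import Tuple


def label_origin(x1: int, y1: int, offset: int = 6, min_y: int = 12) -> Tuple[int, int]:
    """Place the text baseline `offset` px above the box, but never above `min_y`."""
    return (x1, max(y1 - offset, min_y))


print(label_origin(190, 67))  # (190, 61)
print(label_origin(26, 3))    # (26, 12)
```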

### **Converting BGR to RGB**

```python
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
```

**Breakdown of components:**

| Component           | What it is                                      | Why                                                               |
| ------------------- | ----------------------------------------------- | ----------------------------------------------------------------- |
| `img_rgb`           | New variable storing the converted image        | Needed in **RGB format** for consistent display (matplotlib, web) |
| `cv2.cvtColor()`    | OpenCV function to convert image color spaces   | Converts BGR → RGB                                                |
| `img_bgr`           | Input image in **BGR format**                   | Source image with drawn bounding boxes and text                   |
| `cv2.COLOR_BGR2RGB` | OpenCV constant specifying BGR → RGB conversion | Ensures red, green, and blue channels are correctly reordered     |

> **Note:** OpenCV uses **BGR by default**, while most display libraries (matplotlib, PIL, browsers) expect **RGB**. This conversion prevents color distortion.
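
For a 3-channel 8-bit image, the BGR to RGB conversion amounts to reversing the channel axis. The NumPy-only sketch below illustrates this on a single pixel holding the box color used in this notebook; it is meant as an explanation of what the conversion does, not as a replacement for `cv2.cvtColor`.

```python
import numpy as np

# A 1x1 "image" holding the box color from this notebook in BGR order.
pixel_bgr = np.array([[[16, 185, 129]]], dtype=np.uint8)

# Reversing the last axis swaps the blue and red channels.
pixel_rgb = pixel_bgr[..., ::-1]

print(pixel_rgb[0, 0].tolist())  # [129, 185, 16]
```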

### Now annotation is also working

In [12]:
annotate_image("test.jpg",detect)

array([[[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       ...,

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        ...,
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255]]