## 📚 Libraries

In [16]:
import os
import cv2
import shutil
from ultralytics import YOLO
from pyzbar.pyzbar import decode
from sklearn.model_selection import train_test_split

The following code organizes the dataset for training and validation in a YOLOv8 project. It defines the directory paths for the training and validation images and labels, and creates the validation directories if they do not exist. It then lists all .jpg files in the training images directory and splits them 80/20 into training and validation sets using a fixed random seed. Each validation image and its label file (a .txt annotation with the same base name) is moved to the corresponding validation directory. Finally, a message confirms that the validation dataset has been created. Because the cell moves files rather than copying them, running it a second time splits off another 20% of the remaining training images.

In [17]:
data_dir = "../data/"
images_dir = os.path.join(data_dir, "train/images")
labels_dir = os.path.join(data_dir, "train/labels")
valid_images_dir = os.path.join(data_dir, "valid/images")
valid_labels_dir = os.path.join(data_dir, "valid/labels")

os.makedirs(valid_images_dir, exist_ok=True)
os.makedirs(valid_labels_dir, exist_ok=True)

image_files = [f for f in os.listdir(images_dir) if f.endswith(".jpg")]

train_images, valid_images = train_test_split(image_files, test_size=0.2, random_state=42)

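for img_file in valid_images:
    shutil.move(os.path.join(images_dir, img_file), os.path.join(valid_images_dir, img_file))
    # Derive the label name from the image's base name
    label_file = os.path.splitext(img_file)[0] + ".txt"
    label_path = os.path.join(labels_dir, label_file)
    # Images without annotations have no label file, so only move it if it exists
    if os.path.exists(label_path):
        shutil.move(label_path, os.path.join(valid_labels_dir, label_file))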

print("Validation dataset created successfully.")

Validation dataset created successfully.
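
Since the split moves files, it can be worth checking that images and labels still pair up afterwards. The sketch below is one way to do that, assuming images and labels are matched by base name; `find_unpaired` is a hypothetical helper, not part of the notebook.

```python
import os

def find_unpaired(image_files, label_files, image_exts=(".jpg", ".jpeg", ".png")):
    """Return (images_without_label, labels_without_image) as sorted lists of base names."""
    image_stems = {
        os.path.splitext(f)[0]
        for f in image_files
        if f.lower().endswith(image_exts)
    }
    label_stems = {
        os.path.splitext(f)[0]
        for f in label_files
        if f.lower().endswith(".txt")
    }
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)

# Made-up file names for illustration
missing, orphaned = find_unpaired(["a.jpg", "b.jpg"], ["a.txt", "c.txt"])
print(missing)   # ['b']
print(orphaned)  # ['c']
```

In the notebook this could be called as `find_unpaired(os.listdir(valid_images_dir), os.listdir(valid_labels_dir))`. An image without a label is not necessarily an error, since it may be an intentional background image, but a label without an image usually is.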


The next code block loads and fine-tunes a YOLOv8 model for object detection. It imports the YOLO class from the Ultralytics library and loads the pre-trained YOLOv8n weights as the starting point. The train method then fine-tunes the model on the custom dataset described in the data.yaml file. The key training parameters are 50 epochs, an image size of 640 pixels, and a batch size of 16, which set the training duration, the input image resolution, and the number of images processed in each training batch, respectively.

In [18]:
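from ultralytics import YOLO

# Load the pre-trained YOLOv8n (nano) weights as the starting point
model = YOLO("yolov8n.pt")

# Fine-tune on the custom dataset described in data.yaml
model.train(
    data="../data/data.yaml",
    epochs=50,   # passes over the training set
    imgsz=640,   # input image size in pixels
    batch=16,    # images per training batch
)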

New https://pypi.org/project/ultralytics/8.3.150 available  Update with 'pip install -U ultralytics'
Ultralytics 8.3.144  Python-3.10.6 torch-2.7.0+cpu CPU (11th Gen Intel Core(TM) i7-1165G7 2.80GHz)
engine\trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=../data/data.yaml, degrees=0.0, deterministic=True, device=cpu, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=50, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train8, nbs=64, nms=False, opset=No

train: Scanning C:\Users\Kaloyan\Documents\GitHub\product-tracking\data\train\labels... 336 images, 1 backgrounds, 0 corrupt: 100%|██████████| 336/336 [00:03<00:00, 104.76it/s]

train: New cache created: C:\Users\Kaloyan\Documents\GitHub\product-tracking\data\train\labels.cache

val: Fast image access  (ping: 0.3±0.1 ms, read: 2.4±0.5 MB/s, size: 51.8 KB)

val: Scanning C:\Users\Kaloyan\Documents\GitHub\product-tracking\data\valid\labels... 733 images, 18 backgrounds, 0 corrupt: 100%|██████████| 733/733 [00:07<00:00, 104.05it/s]

val: New cache created: C:\Users\Kaloyan\Documents\GitHub\product-tracking\data\valid\labels.cache

Plotting labels to runs\detect\train8\labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.001667, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\detect\train8
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


  0%|          | 0/21 [00:01<?, ?it/s]


KeyboardInterrupt: 
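
The `0/21` in the progress bar is the number of batches in one epoch. As a quick sanity check, it follows from the 336 training images reported by the scan and the batch size of 16. The helper name is hypothetical, and the calculation assumes a final partial batch is kept rather than dropped.

```python
import math

def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to cover the dataset once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)

steps = batches_per_epoch(336, 16)
print(steps)       # 21, matching the 0/21 progress bar
print(steps * 50)  # 1050 batches over the full 50 epochs
```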

# Model Evaluation (Company-Provided Data Only)

The model improves consistently over the course of training. The bounding-box, classification, and distribution focal loss values decrease steadily, which indicates that the model is learning effectively. Precision, recall, and mAP improve as well, showing better detection performance with more epochs. Early results are modest, but as training continues the model becomes more accurate and produces fewer false positives. GPU memory usage remains low and stable throughout training. Overall, the model performs reliably and shows clear progress as training proceeds.
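
These trends can be checked against the per-epoch numbers that Ultralytics logs. The sketch below assumes the run directory contains a `results.csv` with columns such as `train/box_loss` and `metrics/mAP50(B)`; the exact column names can differ between versions, and `summarize_results` is a hypothetical helper. The sample text is made up purely to show the shape of the output.

```python
import csv
import io

def summarize_results(csv_text, columns):
    """Return {column: (first_epoch_value, last_epoch_value)} for the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Some versions pad headers and values with spaces, so strip both.
    rows = [
        {k.strip(): (v or "").strip() for k, v in row.items() if k}
        for row in reader
    ]
    if not rows:
        return {}
    return {
        col: (float(rows[0][col]), float(rows[-1][col]))
        for col in columns
        if col in rows[0]
    }

# Made-up sample in the assumed format, not real training results
sample = "epoch, train/box_loss, metrics/mAP50(B)\n1, 1.50, 0.20\n2, 1.00, 0.60\n"
print(summarize_results(sample, ["train/box_loss", "metrics/mAP50(B)"]))
```

For a real run, the text would come from reading the `results.csv` file inside the run folder (for example a folder under `runs/detect/`).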

# Model Evaluation (Data Collected on the Last Logicall Trip)

This updated model performs noticeably better than the previous one, largely because it was trained on a larger dataset: the company-provided data plus the footage from the 2 cameras on our latest Logicall trip. With more examples to learn from, the model picks up patterns faster and more accurately. The loss values (bounding box, classification, and focal loss) drop more smoothly and settle lower than before, showing that the model is learning efficiently. Precision, recall, and mAP also improve solidly, meaning the model identifies objects more reliably and makes fewer mistakes. Compared to earlier results, it is more accurate and consistent, even in trickier cases. The main drawback was training time: this run took 10 hours and 45 minutes. Despite the larger dataset, GPU usage stayed stable. Overall, adding more data has made the model more accurate and more reliable.
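
For reference, precision and recall are computed from true positives (TP), false positives (FP), and false negatives (FN). The snippet below is a generic illustration with made-up counts, not results from this model.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up counts: 80 correct detections, 20 false alarms, 10 missed objects
p, r = precision_recall(tp=80, fp=20, fn=10)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.80, recall=0.89
```

Higher precision means fewer false alarms, and higher recall means fewer missed objects.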



## 🎯 Barcode Reading

In [None]:
def scan_barcode():
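    # Open the default webcam (device index 0)
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print("Error: Could not open the webcam.")
        return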
    scanned_barcodes = set()
    print("Press 'x' to exit the barcode scanner.")

    while True:
        ret, frame = cap.read()
        if not ret:
            print("Error: Failed to capture frame.")
            break

        barcodes = decode(frame)
        for barcode in barcodes:
            # extract barcode data
            barcode_data = barcode.data.decode('utf-8')

            # skip if the barcode has already been scanned
            if barcode_data in scanned_barcodes:
                continue

            # add new barcode to the set
            scanned_barcodes.add(barcode_data)

            # draw a rectangle around the detected barcode
            (x, y, w, h) = barcode.rect
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

            text = f"{barcode_data}"
            cv2.putText(frame, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

            print(f"Barcode detected: {barcode_data}")

        cv2.imshow("Barcode Scanner", frame)

        # break the loop when 'x' is pressed
        if cv2.waitKey(1) & 0xFF == ord('x'):
            break

    cap.release()
    cv2.destroyAllWindows()

scan_barcode()
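
The de-duplication step in `scan_barcode` can also be exercised without a webcam. This is a minimal sketch, assuming decoded results expose a bytes `.data` attribute as in the loop above; `filter_new_barcodes` is a hypothetical helper, and the barcode values and stand-in objects are made up.

```python
from types import SimpleNamespace

def filter_new_barcodes(decoded, seen):
    """Return payloads not seen before, in order, and record them in `seen`."""
    fresh = []
    for item in decoded:
        payload = item.data.decode("utf-8")
        if payload not in seen:
            seen.add(payload)
            fresh.append(payload)
    return fresh

# Stand-ins for decode() results from two consecutive frames
seen = set()
frame_1 = [SimpleNamespace(data=b"4006381333931")]
frame_2 = [SimpleNamespace(data=b"4006381333931"), SimpleNamespace(data=b"5901234123457")]
first = filter_new_barcodes(frame_1, seen)
second = filter_new_barcodes(frame_2, seen)
print(first)   # ['4006381333931']
print(second)  # ['5901234123457']
```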

# Box/Product Detection
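
The cell below assigns IDs by matching each detection to the nearest previously seen object of the same class, provided it lies within a distance threshold. The matching step on its own can be sketched as follows. `match_detections` is a hypothetical helper working on plain center points, and in this sketch a track that has already been matched in the current frame is skipped so that two detections cannot share an ID.

```python
import math

DISTANCE_THRESHOLD = 150  # pixels, same value as in the configuration below

def match_detections(centers, tracks, next_id, threshold=DISTANCE_THRESHOLD):
    """Greedy nearest-centroid matching for one class in one frame.

    centers: list of (x, y) detection centers.
    tracks:  dict {track_id: (x, y)} of last known centers, updated in place.
    Returns (assigned_ids, next_id), where assigned_ids[i] belongs to centers[i].
    """
    assigned = []
    used = set()  # tracks already matched in this frame
    for center in centers:
        best_id, best_dist = None, float("inf")
        for track_id, track_center in tracks.items():
            if track_id in used:
                continue
            dist = math.dist(center, track_center)
            if dist < threshold and dist < best_dist:
                best_id, best_dist = track_id, dist
        if best_id is None:
            best_id = next_id  # nothing close enough: start a new track
            next_id += 1
        tracks[best_id] = center
        used.add(best_id)
        assigned.append(best_id)
    return assigned, next_id

tracks, next_id = {}, 0
ids_1, next_id = match_detections([(100, 100), (400, 300)], tracks, next_id)
ids_2, next_id = match_detections([(110, 105), (900, 500)], tracks, next_id)
print(ids_1)  # [0, 1]
print(ids_2)  # [0, 2]
```

Greedy matching depends on the order of detections, so it is a simple baseline rather than an optimal assignment. It works best when objects move less than the threshold between frames and stay well separated.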

In [24]:
import cv2
import os
from ultralytics import YOLO
from datetime import timedelta

# === CONFIGURATION ===
model_path = "../runs/detect/train9/weights/best.pt"
input_video_path = "../videos/before/_2025-05-28_11_44_21_572.mp4"
output_dir = "../videos/test/"
output_filename = "testtest.mp4"
output_video_path = os.path.join(output_dir, output_filename)

DISTANCE_THRESHOLD = 150  # pixels
DISAPPEAR_TIME_THRESHOLD = 2.0  # seconds

# === SETUP ===
os.makedirs(output_dir, exist_ok=True)
model = YOLO(model_path)
cap = cv2.VideoCapture(input_video_path)

width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)

# Initialize video writer
save_output = True
if save_output:
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    out = cv2.VideoWriter(output_video_path, fourcc, fps, (width, height))

# Color map per class
color_map = {
    0: (151, 86, 4),     # Box
    1: (176, 42, 176),   # Product
}

# Tracking containers
id_counters = {0: 0, 1: 0}
tracked_objects = {0: {}, 1: {}}

# === HELPER FUNCTIONS ===
def get_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) // 2, (y1 + y2) // 2)

def euclidean_distance(p1, p2):
    return ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5

# === MAIN LOOP ===
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    timestamp_sec = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0
    timestamp_str = str(timedelta(seconds=timestamp_sec)).split('.')[0]

    # Draw timestamp on frame
    cv2.putText(frame, f"Time: {timestamp_str}", (10, height - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

    # Mark all tracked objects as unseen
    for cls_id in tracked_objects:
        for obj in tracked_objects[cls_id].values():
            obj["seen"] = False

    # Run YOLOv8 detection
    results = model(frame)[0]
    boxes = results.boxes.xyxy.cpu().numpy()
    scores = results.boxes.conf.cpu().numpy()
    class_ids = results.boxes.cls.cpu().numpy().astype(int)

    # Process each detection
    for box, score, cls_id in zip(boxes, scores, class_ids):
        if cls_id not in (0, 1):
            continue

        x1, y1, x2, y2 = map(int, box)
        center = get_center((x1, y1, x2, y2))

        best_match_id = None
        min_distance = float('inf')

        for obj_id, obj_data in tracked_objects[cls_id].items():
            # Skip objects already matched in this frame so two detections never share an ID
            if obj_data["seen"]:
                continue
            dist = euclidean_distance(center, obj_data["center"])
            print(f"[DEBUG] Matching class {cls_id}: {center} ↔ {obj_data['center']} = {dist:.2f}")
            if dist < DISTANCE_THRESHOLD and dist < min_distance:
                best_match_id = obj_id
                min_distance = dist

        if best_match_id is not None:
            # Update existing tracked object
            tracked_objects[cls_id][best_match_id].update({
                "center": center,
                "last_seen": timestamp_sec,
                "seen": True
            })
            assigned_id = best_match_id
            print(f"[DEBUG] 🔄 Updated ID {assigned_id} for class {cls_id} at {center}")
        else:
            # Assign new ID
            assigned_id = id_counters[cls_id]
            tracked_objects[cls_id][assigned_id] = {
                "center": center,
                "first_seen": timestamp_sec,
                "last_seen": timestamp_sec,
                "seen": True
            }
            id_counters[cls_id] += 1
            print(f"[DEBUG] ➕ New ID {assigned_id} for class {cls_id} at {center}")

        # Draw box and label
        detect_time_str = str(timedelta(seconds=tracked_objects[cls_id][assigned_id]["first_seen"])).split('.')[0]
        label = f"{'Box' if cls_id == 0 else 'Product'} ID {assigned_id} ({detect_time_str})"
        color = color_map.get(cls_id, (255, 255, 255))

        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 2)
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.55, color, 2)

    # Cleanup disappeared objects
    current_time = timestamp_sec
    for cls_id in tracked_objects:
        to_remove = []
        for obj_id, obj in tracked_objects[cls_id].items():
            if not obj["seen"] and (current_time - obj["last_seen"]) > DISAPPEAR_TIME_THRESHOLD:
                print(f"[DEBUG] ❌ Removing ID {obj_id} for class {cls_id} (unseen for {current_time - obj['last_seen']:.2f}s)")
                to_remove.append(obj_id)
        for obj_id in to_remove:
            del tracked_objects[cls_id][obj_id]

    # Show and save output
    cv2.imshow("YOLOv8 Product Detection", frame)
    if save_output:
        out.write(frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# === CLEANUP ===
cap.release()
if save_output:
    out.release()
cv2.destroyAllWindows()



0: 480x640 (no detections), 55.4ms
Speed: 3.5ms preprocess, 55.4ms inference, 0.5ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 (no detections), 59.3ms
Speed: 2.5ms preprocess, 59.3ms inference, 0.6ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 (no detections), 69.2ms
Speed: 2.7ms preprocess, 69.2ms inference, 0.5ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 (no detections), 56.4ms
Speed: 2.7ms preprocess, 56.4ms inference, 0.5ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 (no detections), 52.8ms
Speed: 2.4ms preprocess, 52.8ms inference, 0.6ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 (no detections), 50.2ms
Speed: 2.5ms preprocess, 50.2ms inference, 0.5ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 (no detections), 52.0ms
Speed: 3.0ms preprocess, 52.0ms inference, 0.5ms postprocess per image at shape (1, 3, 480, 640)

0: 480x640 (no detections), 51.0ms
Speed: 2.5ms preprocess, 51.0ms i