## Robot@Home 2 - Processing images with YOLO `v0.1`

`R@H2 notebook series`   

<a href="https://colab.research.google.com/github/goyoambrosio/RobotAtHome2/blob/master/notebooks/130-Processing-images-with-YOLO.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>



### Getting started



Install the Robot@Home2 Toolbox with pip, the Python package manager:



In [None]:
!pip install robotathome

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting robotathome
  Downloading robotathome-1.1.3-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 14.5 MB/s eta 0:00:00
Collecting loguru
  Downloading loguru-0.7.0-py3-none-any.whl (59 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.0/60.0 kB 4.6 MB/s eta 0:00:00
Installing collected packages: loguru, robotathome
Successfully installed loguru-0.7.0 robotathome-1.1.3


Now, let's mount Google Drive (more info in [this notebook](https://colab.research.google.com/github/goyoambrosio/RobotAtHome2/blob/master/notebooks/05-Google-colab-drive.ipynb)) and instantiate the RobotAtHome class.

In [None]:
from google.colab import drive
from robotathome import RobotAtHome
from robotathome import logger, log, set_log_level

# Let's mount Google Drive
drive.mount('/content/drive')

# Then copy the provided configuration file to the current directory (/content)
!cp /content/drive/MyDrive/R@H2-2.0.3/notebooks/.rh .

# And create an instance of the RobotAtHome class
try:
    rh = RobotAtHome()
except Exception as e:
    logger.error(f"Something went wrong: {e}")


Mounted at /content/drive


2023-04-17 18:34:01.489 | SUCCESS  | robotathome.core.reader:__open_dataset:141 - Connection is established: rh.db


### Iterating over RGBD images coming from multiple cameras




We already know how to iterate over RGBD images coming from multiple cameras. We continue from the previous example, where we built a main loop to iterate over and concatenate images.

In [None]:
from robotathome import filter_sensor_observations
from robotathome import composed_RGBD_images
from robotathome import concat_images
from robotathome import process_image

# cv2.imshow does not work in Colab, so Colab provides a patched version
# Outside Colab you would usually write: from cv2 import imshow
from google.colab.patches import cv2_imshow

log.set_log_level('INFO')  # SUCCESS is the default

# Set the variables that make up the selection filter
home_session_name = 'alma-s1'
home_subsession = 0
room_name = 'alma_masterroom1'
sensor_list = ['RGBD_3', 'RGBD_4', 'RGBD_1', 'RGBD_2'] # Left to right order

# Get the labeled RGB-D observations dataframe
df = rh.get_sensor_observations('lblrgbd')

# Filter the dataframe and get a dictionary with a dataframe per sensor
df_dict = filter_sensor_observations(rh, df,
                                     home_session_name,
                                     home_subsession,
                                     room_name,
                                     sensor_list)

logger.info(f"Labeled RGBD set: {len(df)} observations")
for sensor_name in sensor_list:
    logger.info(f"No. of RGBD observations in the filtered subset for the sensor {sensor_name}: {len(df_dict[sensor_name])} observations")

2023-04-17 18:34:07.167 | INFO     | __main__:<cell line: 28>:28 - Labeled RGBD set: 32937 observations
2023-04-17 18:34:07.168 | INFO     | __main__:<cell line: 29>:30 - No. of RGBD observations in the filtered subset for the sensor RGBD_3: 299 observations
2023-04-17 18:34:07.174 | INFO     | __main__:<cell line: 29>:30 - No. of RGBD observations in the filtered subset for the sensor RGBD_4: 299 observations
2023-04-17 18:34:07.175 | INFO     | __main__:<cell line: 29>:30 - No. of RGBD observations in the filtered subset for the sensor RGBD_1: 299 observations
2023-04-17 18:34:07.178 | INFO     | __main__:<cell line: 29>:30 - No. of RGBD observations in the filtered subset for the sensor RGBD_2: 299 observations


In Google Colab, reading files from the mounted drive takes time, so for this example we will select only a few images.

In [None]:
# For this example, we will only select a few images
df_RGBD_N_only_some_frames = {}
for sensor_name in sensor_list:
    df_RGBD_N_only_some_frames[sensor_name] = df_dict[sensor_name][30:40] # ~ 1 sec
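
The slice above takes the same row range from every sensor, which keeps the frames aligned across cameras. A minimal sketch of that idea, using plain lists as stand-ins for the per-sensor dataframes. The `select_frames` helper and the sample data are hypothetical and not part of the toolbox:

```python
def select_frames(frames_by_sensor, start, stop):
    """Return the same [start:stop) range for every sensor.

    Hypothetical helper: lists stand in for the per-sensor dataframes.
    """
    lengths = {name: len(frames) for name, frames in frames_by_sensor.items()}
    if len(set(lengths.values())) > 1:
        # Different lengths would misalign the cameras after slicing
        raise ValueError(f"Sensors have different numbers of frames: {lengths}")
    return {name: frames[start:stop] for name, frames in frames_by_sensor.items()}


# Made-up data: 299 frame ids per sensor, as in the filtered subset above
sensors = ['RGBD_3', 'RGBD_4', 'RGBD_1', 'RGBD_2']
frames_by_sensor = {name: list(range(299)) for name in sensors}
subset = select_frames(frames_by_sensor, 30, 40)
```

The length check is an extra safeguard for illustration; the notebook relies on the filtered subsets having the same number of observations per sensor.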

### Processing with YOLO

As we know from the previous example, to make a video we first need a function that returns a video handler:

In [None]:
import cv2 as cv

def get_rh_video_handler(rh_dataset, filename, sensor_names):
    """Return a video handler."""
    sensor_size = rh_dataset.get_RGBD_sensor_size()
    fourcc = cv.VideoWriter_fourcc(*'MJPG')
    out = cv.VideoWriter(filename,
                         fourcc,
                         rh_dataset.get_RGBD_fps(),
                         (len(sensor_names) * sensor_size['w'], sensor_size['h']))
    return out
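
The handler is created for frames that are as wide as all sensor images placed side by side. In our experience, `cv.VideoWriter` typically drops frames whose size differs from the one given at construction without raising an error, so it can be worth checking the size before writing. A small sketch of that check, where the helper names and the sensor size values are hypothetical:

```python
def composed_frame_size(sensor_size, n_sensors):
    """Return (width, height) of n_sensors images placed side by side."""
    return (n_sensors * sensor_size['w'], sensor_size['h'])


def frame_matches(frame_shape, expected_size):
    """Check a (rows, cols, channels) image shape against a (width, height) size."""
    rows, cols = frame_shape[:2]
    return (cols, rows) == expected_size


# Hypothetical sensor size; the real one comes from rh.get_RGBD_sensor_size()
sensor_size = {'w': 240, 'h': 320}
size = composed_frame_size(sensor_size, 4)
```

Note that OpenCV sizes are given as (width, height), while image array shapes are (rows, cols, channels), which is why the helper swaps the first two values.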


We want to draw the detection boxes that YOLO returns, so we need a couple of helper functions:

In [None]:
def box_label(image, box, label='', color=(128, 128, 128), txt_color=(255, 255, 255)):
    # https://inside-machinelearning.com/en/bounding-boxes-python-function/
    lw = max(round(sum(image.shape) / 2 * 0.003), 2)
    p1, p2 = (int(box[0]), int(box[1])), (int(box[2]), int(box[3]))
    cv.rectangle(image, p1, p2, color, thickness=lw, lineType=cv.LINE_AA)
    if label:
        tf = max(lw - 1, 1)  # font thickness
        w, h = cv.getTextSize(label, 0, fontScale=lw / 3, thickness=tf)[0]  # text width, height
        outside = p1[1] - h >= 3
        p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
        cv.rectangle(image, p1, p2, color, -1, cv.LINE_AA)  # filled
        cv.putText(image,
                   label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                   0,
                   lw / 3,
                   txt_color,
                   thickness=tf,
                   lineType=cv.LINE_AA)


def plot_bboxes(image, boxes, labels=[], colors=[], score=True, conf=None):
    # https://inside-machinelearning.com/en/bounding-boxes-python-function/
    # Define COCO Labels
    if labels == []:
        labels = {0: u'__background__', 1: u'person', 2: u'bicycle',3: u'car', 4: u'motorcycle', 5: u'airplane', 6: u'bus', 7: u'train', 8: u'truck', 9: u'boat', 10: u'traffic light', 11: u'fire hydrant', 12: u'stop sign', 13: u'parking meter', 14: u'bench', 15: u'bird', 16: u'cat', 17: u'dog', 18: u'horse', 19: u'sheep', 20: u'cow', 21: u'elephant', 22: u'bear', 23: u'zebra', 24: u'giraffe', 25: u'backpack', 26: u'umbrella', 27: u'handbag', 28: u'tie', 29: u'suitcase', 30: u'frisbee', 31: u'skis', 32: u'snowboard', 33: u'sports ball', 34: u'kite', 35: u'baseball bat', 36: u'baseball glove', 37: u'skateboard', 38: u'surfboard', 39: u'tennis racket', 40: u'bottle', 41: u'wine glass', 42: u'cup', 43: u'fork', 44: u'knife', 45: u'spoon', 46: u'bowl', 47: u'banana', 48: u'apple', 49: u'sandwich', 50: u'orange', 51: u'broccoli', 52: u'carrot', 53: u'hot dog', 54: u'pizza', 55: u'donut', 56: u'cake', 57: u'chair', 58: u'couch', 59: u'potted plant', 60: u'bed', 61: u'dining table', 62: u'toilet', 63: u'tv', 64: u'laptop', 65: u'mouse', 66: u'remote', 67: u'keyboard', 68: u'cell phone', 69: u'microwave', 70: u'oven', 71: u'toaster', 72: u'sink', 73: u'refrigerator', 74: u'book', 75: u'clock', 76: u'vase', 77: u'scissors', 78: u'teddy bear', 79: u'hair drier', 80: u'toothbrush'}
    # Define colors
    if colors == []:
        # colors = [(6, 112, 83), (253, 246, 160), (40, 132, 70), (205, 97, 162), (149, 196, 30), (106, 19, 161), (127, 175, 225), (115, 133, 176), (83, 156, 8), (182, 29, 77), (180, 11, 251), (31, 12, 123), (23, 6, 115), (167, 34, 31), (176, 216, 69), (110, 229, 222), (72, 183, 159), (90, 168, 209), (195, 4, 209), (135, 236, 21), (62, 209, 199), (87, 1, 70), (75, 40, 168), (121, 90, 126), (11, 86, 86), (40, 218, 53), (234, 76, 20), (129, 174, 192), (13, 18, 254), (45, 183, 149), (77, 234, 120), (182, 83, 207), (172, 138, 252), (201, 7, 159), (147, 240, 17), (134, 19, 233), (202, 61, 206), (177, 253, 26), (10, 139, 17), (130, 148, 106), (174, 197, 128), (106, 59, 168), (124, 180, 83), (78, 169, 4), (26, 79, 176), (185, 149, 150), (165, 253, 206), (220, 87, 0), (72, 22, 226), (64, 174, 4), (245, 131, 96), (35, 217, 142), (89, 86, 32), (80, 56, 196), (222, 136, 159), (145, 6, 219), (143, 132, 162), (175, 97, 221), (72, 3, 79), (196, 184, 237), (18, 210, 116), (8, 185, 81), (99, 181, 254), (9, 127, 123), (140, 94, 215), (39, 229, 121), (230, 51, 96), (84, 225, 33), (218, 202, 139), (129, 223, 182), (167, 46, 157), (15, 252, 5), (128, 103, 203), (197, 223, 199), (19, 238, 181), (64, 142, 167), (12, 203, 242), (69, 21, 41), (177, 184, 2), (35, 97, 56), (241, 22, 161)]
        colors = [(89, 161, 197),(67, 161, 255),(19, 222, 24),(186, 55, 2),(167, 146, 11),(190, 76, 98),(130, 172, 179),(115, 209, 128),(204, 79, 135),(136, 126, 185),(209, 213, 45),(44, 52, 10),(101, 158, 121),(179, 124, 12),(25, 33, 189),(45, 115, 11),(73, 197, 184),(62, 225, 221),(32, 46, 52),(20, 165, 16),(54, 15, 57),(12, 150, 9),(10, 46, 99),(94, 89, 46),(48, 37, 106),(42, 10, 96),(7, 164, 128),(98, 213, 120),(40, 5, 219),(54, 25, 150),(251, 74, 172),(0, 236, 196),(21, 104, 190),(226, 74, 232),(120, 67, 25),(191, 106, 197),(8, 15, 134),(21, 2, 1),(142, 63, 109),(133, 148, 146),(187, 77, 253),(155, 22, 122),(218, 130, 77),(164, 102, 79),(43, 152, 125),(185, 124, 151),(95, 159, 238),(128, 89, 85),(228, 6, 60),(6, 41, 210),(11, 1, 133),(30, 96, 58),(230, 136, 109),(126, 45, 174),(164, 63, 165),(32, 111, 29),(232, 40, 70),(55, 31, 198),(148, 211, 129),(10, 186, 211),(181, 201, 94),(55, 35, 92),(129, 140, 233),(70, 250, 116),(61, 209, 152),(216, 21, 138),(100, 0, 176),(3, 42, 70),(151, 13, 44),(216, 102, 88),(125, 216, 93),(171, 236, 47),(253, 127, 103),(205, 137, 244),(193, 137, 224),(36, 152, 214),(17, 50, 238),(154, 165, 67),(114, 129, 60),(119, 24, 48),(73, 8, 110)]

    # Plot each box
    for box in boxes:
        # Add the score to the label if score=True
        if score:
            label = labels[int(box[-1])+1] + " " + str(round(100 * float(box[-2]),1)) + "%"
        else:
            label = labels[int(box[-1])+1]
        # Skip every box at or below the confidence threshold, if one is set
        if conf:
            if box[-2] > conf:
                color = colors[int(box[-1])]
                box_label(image, box, label, color)
        else:
            color = colors[int(box[-1])]
            box_label(image, box, label, color)
    return image
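
`plot_bboxes` reads each detection as a row whose last two values are the confidence and the class index, preceded by the box corners. This is the layout of `results[0].boxes.data` in ultralytics: x1, y1, x2, y2, confidence, class. The class index is 0-based, whereas the label table above starts with `__background__`, hence the `+1`. A minimal sketch of the labelling and filtering logic with plain lists, where `describe_detection`, `filter_by_confidence` and the sample rows are hypothetical:

```python
# Shortened, 1-based label table in the style of the one used by plot_bboxes
LABELS = {0: '__background__', 1: 'person', 2: 'bicycle', 3: 'car'}


def describe_detection(box, labels=LABELS, score=True):
    """Build the text label for one detection row.

    box is assumed to be [x1, y1, x2, y2, confidence, class_index],
    with a 0-based class_index.
    """
    name = labels[int(box[-1]) + 1]  # shift to the 1-based table
    if score:
        return f"{name} {round(100 * float(box[-2]), 1)}%"
    return name


def filter_by_confidence(boxes, conf=None):
    """Keep only rows whose confidence is above conf (all rows if conf is None)."""
    if conf is None:
        return list(boxes)
    return [box for box in boxes if box[-2] > conf]


# Made-up detections
boxes = [[10, 20, 110, 220, 0.874, 0], [5, 5, 50, 40, 0.31, 2]]
kept = filter_by_confidence(boxes, conf=0.5)
labels_text = [describe_detection(box) for box in kept]
```

Unlike `plot_bboxes`, which treats any falsy `conf` as "no threshold", this sketch only disables filtering when `conf` is `None`.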

For YOLO processing we need to install the ultralytics package:

In [None]:
!pip install ultralytics

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ultralytics
  Downloading ultralytics-8.0.81-py3-none-any.whl (527 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 527.0/527.0 kB 20.1 MB/s eta 0:00:00
Collecting thop>=0.1.1
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Collecting sentry-sdk
  Downloading sentry_sdk-1.19.1-py2.py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.2/199.2 kB 26.7 MB/s eta 0:00:00
Installing collected packages: sentry-sdk, thop, ultralytics
Successfully installed sentry-sdk-1.19.1 thop-0.1.1.post2209072238 ultralytics-8.0.81


We are ready to write the function that we will pass to our main loop. The function `apply_model_ultralytics` detects objects in each image, draws their bounding boxes, and concatenates the results horizontally into a single image.

In [None]:
def apply_model_ultralytics(img_dict, model):
    img_list = list(img_dict.values())
    img_list_result = []
    for img in img_list:
        results = model.predict(img, verbose=False)
        img_result = plot_bboxes(img, results[0].boxes.data)
        img_list_result.append(img_result)
    composed_img = cv.hconcat(img_list_result)
    return composed_img
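
The pattern here is to process every image in the dictionary and then join the results left to right, in the order in which the sensors appear in the dictionary. The following sketch illustrates this without OpenCV or YOLO, using nested lists as stand-in images. `hconcat_rows` and `apply_to_each` are hypothetical names, and the identity function takes the place of the detector:

```python
def hconcat_rows(images):
    """Concatenate images (lists of rows) side by side, like cv.hconcat."""
    heights = {len(img) for img in images}
    if len(heights) != 1:
        raise ValueError("All images must have the same number of rows")
    # zip(*images) groups the i-th row of every image; sum(..., []) joins them
    return [sum(rows, []) for rows in zip(*images)]


def apply_to_each(img_dict, process):
    """Apply process to every image, then join the results left to right."""
    return hconcat_rows([process(img) for img in img_dict.values()])


# Tiny 2x2 single-channel stand-in images, keyed in left-to-right order
img_dict = {'RGBD_3': [[1, 1], [1, 1]], 'RGBD_4': [[2, 2], [2, 2]]}
composed = apply_to_each(img_dict, lambda img: img)  # identity in place of YOLO
```

As with `cv.hconcat`, all images must have the same height for the concatenation to make sense.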

The main loop iterates over the images applying the prediction model. 

In [None]:
from ultralytics import YOLO
import torch

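# torch.device() alone only creates a device object; it does not move the model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = YOLO("yolov8n.pt")
model.to(device)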

video_filename = 'myYOLOvideo.avi'

# Get the video handler
video_handler = get_rh_video_handler(rh,
                                     video_filename,
                                     sensor_list)

# Wrapper that binds the extra parameters (here, the model); defined once, outside the loop
def f(img_dict):
    # return my_function(img_dict, par1,... parn)
    return apply_model_ultralytics(img_dict, model)

# Iterate over the dictionary of dataframes, i.e. frame by frame
for (RGB_image_dict, D_image_dict) in composed_RGBD_images(rh, df_RGBD_N_only_some_frames):
    # Apply f to RGB_image_dict
    resulting_img = process_image(f, RGB_image_dict)
    # Add the resulting image to the video
    video_handler.write(resulting_img)
# Close the video file
video_handler.release()
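
The wrapper `f` only serves to bind `model`, so that the function handed to `process_image` takes a single argument. The same binding can be sketched with `functools.partial`. The stand-ins below (`process_image_stub`, `apply_model_stub` and the string used as a model) are hypothetical, and we assume that `process_image` simply calls its first argument with the image dictionary:

```python
from functools import partial


def process_image_stub(func, img_dict):
    """Stand-in for process_image: assumed to call func on the image dictionary."""
    return func(img_dict)


def apply_model_stub(img_dict, model):
    """Stand-in for apply_model_ultralytics: records which model saw which sensors."""
    return [(model, sensor_name) for sensor_name in img_dict]


# Bind the extra argument once, outside the loop
f = partial(apply_model_stub, model='yolov8n')
result = process_image_stub(f, {'RGBD_3': None, 'RGBD_4': None})
```

Either form works; an explicit wrapper function is easier to extend when several parameters have to be bound.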

Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to yolov8n.pt...
100%|██████████| 6.23M/6.23M [00:00<00:00, 154MB/s]


In [None]:
[_, _, _, wspc_path, _] = rh.get_path_vars()
!cp $video_filename $wspc_path
!ls $wspc_path/*.avi

'/content/drive/MyDrive/Colab Notebooks/myvideo.avi'
'/content/drive/MyDrive/Colab Notebooks/myYOLOvideo.avi'
