# Intelligent Queue Management with OpenVINO™

An intelligent queue management system gives businesses real-time insight into their checkout queues so that they can manage them efficiently. It can show customers the current queue lengths and average customer counts, allowing them to plan their time better. It can also optimize the use of resources, such as staff and equipment, to reduce wait times and increase customer satisfaction.

By analyzing data such as wait times, queue lengths, and customer behavior, businesses can make informed decisions to optimize their operations, reduce wait times, and improve the overall customer experience. 

This notebook demonstrates an intelligent queue management system that utilizes the YOLOv8 object detection model, which can be found in this [repository](https://github.com/ultralytics/ultralytics). The system is designed to detect people in a video stream and count the number of individuals in a queue at any given time. By leveraging this real-time data on queue lengths and average customer count, the system can optimize queue management and reduce waiting times, ultimately leading to improved customer experiences.

> **NOTE**: To use this notebook with a webcam, you need to run it on a computer with a built-in or external webcam. If you run the notebook on a server, the webcam will not work. However, you can still run inference on a pre-recorded video file.

## Imports

In [None]:
import time
import sys
import json
import logging as log
from collections import defaultdict, deque
from typing import Tuple, List, Union

import cv2
import numpy as np
import supervision as sv
import torch
from openvino import runtime as ov
from ultralytics.yolo.utils import ops

sys.path.append('../')
import utils

log.basicConfig(level=log.INFO)

### Preprocessing

The model input is a tensor with the shape `[1, 3, 640, 640]` in the `N, C, H, W` format, where:
* `N` - the number of images in a batch (batch size)
* `C` - the number of image channels
* `H` - the image height
* `W` - the image width

The model expects images in RGB channel format and normalized in the `[0, 1]` range. Although the YOLOv8 model itself supports dynamic input shapes while preserving input divisibility by 32, it is recommended to use static shapes, such as `640x640`, for better efficiency. To resize images to fit the model size, the `letterbox` resize approach is used, where the aspect ratio of width and height is preserved.

To maintain a specific shape, preprocessing automatically enables padding.

In [None]:
def letterbox(img: np.ndarray, new_shape: Tuple[int, int]) -> Tuple[np.ndarray, Tuple[float, float], Tuple[float, float]]:
    """
        Resize and pad an image for detection. Takes an image as input,
         resizes it to fit into the new shape while keeping the original aspect ratio, and pads it to exactly match the new shape

        Parameters:
          img: image for preprocessing
          new_shape: image size after preprocessing in format [width, height]
        Returns:
          img: image after preprocessing
          ratio: width and height scaling ratio
          padding_size: width and height padding size (per side)
    """
    # Resize and pad the image to the target shape
    shape = img.shape[1::-1]  # current shape [width, height]

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[0] * r)), int(round(shape[1] * r))
    dw, dh = new_shape[0] - new_unpad[0], new_shape[1] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape != new_unpad:  # resize
        img = cv2.resize(img, dsize=new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(128, 128, 128))  # add border
    return img, ratio, (dw, dh)
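To make the letterbox arithmetic concrete, the following sketch reproduces only the geometry of the function above (scale factor, resized size, and per-side padding) in plain Python, without touching any pixels. The helper name `letterbox_geometry` is hypothetical and exists only for illustration.

```python
from typing import Tuple


def letterbox_geometry(src_wh: Tuple[int, int], dst_wh: Tuple[int, int]):
    """Return (scale, resized_wh, (left, top, right, bottom)) for a letterbox resize."""
    src_w, src_h = src_wh
    dst_w, dst_h = dst_wh
    # a single scale factor for both axes preserves the aspect ratio
    r = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = int(round(src_w * r)), int(round(src_h * r))
    # the leftover space is split between the two sides
    dw, dh = (dst_w - new_w) / 2, (dst_h - new_h) / 2
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    return r, (new_w, new_h), (left, top, right, bottom)


scale, resized, padding = letterbox_geometry((1280, 720), (640, 640))
print(scale, resized, padding)  # 0.5 (640, 360) (0, 140, 0, 140)
```

A 1280x720 frame is therefore halved to 640x360 and receives 140 pixels of padding above and below to reach 640x640.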

The `preprocess` function prepares an image for detection using a YOLOv8 model. First, it adds padding to the image to meet the input size requirements of the model. Because OpenCV delivers frames in BGR order while the model expects RGB, the channels are then reordered. Next, the function converts the image to float32 format and normalizes it to the `[0, 1]` range. Finally, the data layout of the image is changed from HWC to CHW, and a batch dimension is added before the image is returned for inference.

In [None]:
def preprocess(image: np.ndarray, input_size: Tuple[int, int]) -> np.ndarray:
    """
        Preprocess image according to YOLOv8 input requirements.

        Parameters:
          image: BGR image (as delivered by OpenCV) for preprocessing
          input_size: image size after preprocessing in format [width, height]
        Returns:
          img: image after preprocessing
    """
    # add padding to the image
    image = letterbox(image, new_shape=input_size)[0]
    # OpenCV frames are BGR, while the model expects RGB
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # convert to float32
    image = image.astype(np.float32)
    # normalize to [0, 1]
    image /= 255.0
    # change data layout from HWC to CHW
    image = image.transpose((2, 0, 1))
    # add the batch dimension
    image = np.expand_dims(image, axis=0)
    return image

### Postprocessing

The model output contains detection box candidates. It is a tensor with the shape `[1, 84, 8400]` in the `B, 84, N` format, where:

- `B` - batch size
- `N` - number of detection boxes

Each detection box (second dimension) has the format [`x`, `y`, `w`, `h`, `class_no_1`, ..., `class_no_80`], where:

- (`x`, `y`) - raw coordinates of the box center
- `w`, `h` - raw width and height of the box
- `class_no_1`, ..., `class_no_80` - confidence scores for each of the 80 classes.
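As an illustration of this layout, the sketch below decodes a single candidate into a corner-format box, a class ID, and a confidence. It assumes the `x, y, w, h` (center and size) ordering used by Ultralytics YOLOv8; the helper `decode_candidate` is hypothetical, and the example uses three classes instead of 80 to stay short.

```python
from typing import List, Tuple


def decode_candidate(candidate: List[float]) -> Tuple[Tuple[float, float, float, float], int, float]:
    """Convert [x, y, w, h, score_0, ..., score_k] to (xyxy box, class id, confidence)."""
    x, y, w, h = candidate[:4]
    scores = candidate[4:]
    # the best-scoring class provides both the label and the confidence
    class_id = max(range(len(scores)), key=scores.__getitem__)
    # center/size -> corner coordinates
    box = (x - w / 2, y - h / 2, x + w / 2, y + h / 2)
    return box, class_id, scores[class_id]


# hypothetical candidate with three classes, class 0 being "person"
print(decode_candidate([320.0, 240.0, 100.0, 200.0, 0.9, 0.05, 0.01]))
# ((270.0, 140.0, 370.0, 340.0), 0, 0.9)
```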

To get the final prediction, we apply a non-maximum suppression (NMS) algorithm and rescale the box coordinates to the original image size. The function also filters out all predictions other than people and returns the detected boxes in the format of the `supervision` library (a dependency).

The function takes the following parameters:

- `pred_boxes`: model output prediction boxes
- `input_size`: image size after preprocessing in format [width, height]
- `orig_img`: image before preprocessing
- `min_conf_threshold`: minimal accepted confidence for object filtering (default: 0.25)
- `nms_iou_threshold`: minimal overlap score for removing duplicate objects in NMS (default: 0.75)
- `agnostic_nms`: apply class agnostic NMS approach or not (default: False)
- `max_detections`: maximum detections after NMS (default: 100)

The function returns a list of detected boxes in supervision format (sv.Detections), with the following attributes:

- `xyxy`: a numpy array of shape `(N, 4)` containing the coordinates of the detected boxes in format `[x0, y0, x1, y1]`
- `confidence`: a numpy array of shape `(N,)` containing the confidence scores of the detected boxes
- `class_id`: a numpy array of shape `(N,)` containing the class IDs of the detected boxes (only `0` for people in this case).
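To show what `nms_iou_threshold` controls, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is a simplified illustration, not the Ultralytics implementation called in the cell below; the names `iou` and `greedy_nms` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x0, y0, x1, y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.75) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # keep a box only if it does not overlap too much with a better-scoring kept box
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, which is above the 0.75 threshold, so it is treated as a duplicate and dropped.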

In [None]:
def postprocess(pred_boxes: np.ndarray, input_size: Tuple[int, int], orig_img, min_conf_threshold=0.25, nms_iou_threshold=0.75, agnostic_nms=False, max_detections=100) -> sv.Detections:
    """
        YOLOv8 model postprocessing function. Applies the non-maximum suppression algorithm to detections and rescales boxes to the original image size,
         filtering out classes other than person

         Parameters:
            pred_boxes: model output prediction boxes
            input_size: image size after preprocessing in format [width, height]
            orig_img: image before preprocessing
            min_conf_threshold: minimal accepted confidence for object filtering
            nms_iou_threshold: minimal overlap score for removing duplicate objects in NMS
            agnostic_nms: apply class agnostic NMS approach or not
            max_detections:  maximum detections after NMS
        Returns:
           det: detected boxes in sv.Detections format
    """
    nms_kwargs = {"agnostic": agnostic_nms, "max_det": max_detections}
    # non-maximum suppression
    pred = ops.non_max_suppression(torch.from_numpy(pred_boxes), min_conf_threshold, nms_iou_threshold, nc=80, **nms_kwargs)[0]

    # no predictions in the image
    if not len(pred):
        return sv.Detections(xyxy=np.empty((0, 4), dtype=np.float32), confidence=np.array([], dtype=np.float32), class_id=np.array([], dtype=int))

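    # transform boxes to pixel coordinates of the original image (scale_boxes expects (height, width))
    pred[:, :4] = ops.scale_boxes(input_size[::-1], pred[:, :4], orig_img.shape).round()
    # numpy array from torch tensor
    pred = np.array(pred)
    # create detections in supervision format; pred columns are [x0, y0, x1, y1, confidence, class_id]
    det = sv.Detections(xyxy=pred[:, :4], confidence=pred[:, 4], class_id=pred[:, 5].astype(int))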
    # filter out other predictions than people
    return det[det.class_id == 0]
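The rescaling step undoes the letterbox transform: the padding is subtracted and the result is divided by the scale factor. The sketch below shows this for a single box under simplifying assumptions (centered padding, no rounding of the padding); `unletterbox_box` is a hypothetical helper, not the `ops.scale_boxes` implementation.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # x0, y0, x1, y1


def unletterbox_box(box: Box, input_wh: Tuple[int, int], orig_wh: Tuple[int, int]) -> Box:
    """Map an xyxy box from letterboxed input coordinates to original image pixels."""
    in_w, in_h = input_wh
    orig_w, orig_h = orig_wh
    # the same scale factor that the letterbox resize applied
    gain = min(in_w / orig_w, in_h / orig_h)
    # padding that was added on each side
    pad_x = (in_w - orig_w * gain) / 2
    pad_y = (in_h - orig_h * gain) / 2

    def clip(value: float, upper: int) -> float:
        return min(max(value, 0.0), float(upper))

    x0, y0, x1, y1 = box
    # remove the padding first, then undo the scaling, then clip to the image
    return (
        clip((x0 - pad_x) / gain, orig_w),
        clip((y0 - pad_y) / gain, orig_h),
        clip((x1 - pad_x) / gain, orig_w),
        clip((y1 - pad_y) / gain, orig_h),
    )


print(unletterbox_box((100, 190, 200, 390), (640, 640), (1280, 720)))
# (200.0, 100.0, 400.0, 500.0)
```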

### Load the Model

Only a few lines of code are required to run the model. First, initialize OpenVINO Runtime. Then, read the network architecture and model weights from the `.bin` and `.xml` files to compile for the desired device. If you choose `GPU` you need to wait for a while, as the startup time is a little longer than in the case of `CPU`.

You can also let OpenVINO decide which hardware offers the best performance. In that case, use `AUTO` to automatically select the best available device for the model based on the system configuration.

The `config` parameter specifies additional options for model compilation. In this case, the `PERFORMANCE_HINT` option is set to `LATENCY`, which optimizes the model for low-latency inference. The other available modes are `THROUGHPUT` and `CUMULATIVE_THROUGHPUT`.
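If you want to switch between these modes, the configuration dictionary could be built by a small helper such as the one sketched here. `build_compile_config` and `VALID_HINTS` are hypothetical names, and the restriction of `CUMULATIVE_THROUGHPUT` to the `AUTO` device is an assumption based on how the OpenVINO documentation describes that mode.

```python
from typing import Dict

# performance hints mentioned in the text above
VALID_HINTS = ("LATENCY", "THROUGHPUT", "CUMULATIVE_THROUGHPUT")


def build_compile_config(hint: str = "LATENCY", device: str = "AUTO") -> Dict[str, str]:
    """Build a config dictionary suitable for the `config` argument of compile_model."""
    hint = hint.upper()
    if hint not in VALID_HINTS:
        raise ValueError(f"Unknown performance hint: {hint}")
    # assumption: CUMULATIVE_THROUGHPUT is only meaningful for the AUTO device
    if hint == "CUMULATIVE_THROUGHPUT" and not device.upper().startswith("AUTO"):
        raise ValueError("CUMULATIVE_THROUGHPUT is assumed to require the AUTO device")
    return {"PERFORMANCE_HINT": hint}


print(build_compile_config("throughput"))  # {'PERFORMANCE_HINT': 'THROUGHPUT'}
```

The returned dictionary would then be passed as `config=` in the `core.compile_model(...)` call.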

In [None]:
def get_model(model_path: str) -> ov.CompiledModel:
    """
        Initialize OpenVINO and compile model for latency processing

        Parameters:
            model_path: path to the model to load
        Returns:
           model: compiled and ready OpenVINO model
    """
    # initialize OpenVINO
    core = ov.Core()
    # read the model from file
    model = core.read_model(model_path)
    # compile the model for latency mode
    model = core.compile_model(model, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})

    return model

The `load_zones` function reads a JSON file describing the zones: the keys are the zone names, and each value is an object whose `points` entry lists the points defining the zone. It returns a list of numpy arrays, one per zone, each with the shape `(n, 2)`, where `n` is the number of points and the second dimension holds the x and y coordinates of each point.
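For reference, a zones file compatible with the loader below could look like the following sketch. The zone names and coordinates are made up for illustration; only the `points` key is taken from the code.

```python
import json

# hypothetical zones file content: zone name -> {"points": [[x, y], ...]}
zones_json = """
{
    "zone1": {"points": [[100, 200], [400, 200], [400, 600], [100, 600]]},
    "zone2": {"points": [[500, 200], [800, 200], [800, 600], [500, 600]]}
}
"""

zones_dict = json.loads(zones_json)
# one list of [x, y] points per zone, in file order
polygons = [zone["points"] for zone in zones_dict.values()]
for name, points in zip(zones_dict, polygons):
    print(name, len(points), "points, first:", points[0])
```

Each polygon is given in pixel coordinates of the video frame, so the zones need to be redefined when the camera position or resolution changes.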

In [None]:
def load_zones(json_path: str) -> List[np.ndarray]:
    """
        Load zones specified in an external json file

        Parameters:
            json_path: path to the json file with defined zones
        Returns:
           zones: a list of arrays with zone points
    """
    # load json file
    with open(json_path) as f:
        zones_dict = json.load(f)

    # return a list of zones defined by points
    return [np.array(zone["points"], np.int32) for zone in zones_dict.values()]

The `get_annotators` function takes a path to a JSON file that defines zones and their boundaries, as well as the resolution of the frame. It loads the zones, assigns colors to them, and creates `PolygonZone`, `PolygonZoneAnnotator`, and `BoxAnnotator` objects for each zone. These objects are returned in three lists: `zones`, `zone_annotators`, and `box_annotators`. The objects are used for visualizing and counting people in the specified zones.

In [None]:
def get_annotators(json_path: str, resolution_wh: Tuple[int, int]) -> Tuple[List, List, List]:
    """
        Load zones specified in an external json file and create zone and box annotators for them

        Parameters:
            json_path: path to the json file with defined zones
            resolution_wh: width and height of the frame
        Returns:
           zones, zone_annotators, box_annotators: lists of zones and their annotators
    """
    # list of points
    polygons = load_zones(json_path)

    # colors for zones
    colors = sv.ColorPalette.default()

    zones = []
    zone_annotators = []
    box_annotators = []
    for index, polygon in enumerate(polygons):
        # a zone to count people in
        zone = sv.PolygonZone(polygon=polygon, frame_resolution_wh=resolution_wh)
        zones.append(zone)
        # the annotator - visual part of the zone
        zone_annotators.append(sv.PolygonZoneAnnotator(zone=zone, color=colors.by_idx(index), thickness=4))
        # box annotator, showing boxes around people
        box_annotators.append(sv.BoxAnnotator(color=colors.by_idx(index)))

    return zones, zone_annotators, box_annotators
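Conceptually, counting people in a zone comes down to testing whether a reference point of each bounding box lies inside the polygon. The sketch below uses a ray-casting test on the bottom-center point of each box. This is an assumption made for illustration: the actual `sv.PolygonZone` logic may differ between `supervision` versions, and `point_in_polygon` and `count_in_zone` are hypothetical helpers.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x0, y0, x1, y1


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: toggle on every polygon edge crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # the edge straddles the horizontal line through the point
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(boxes: Sequence[Box], polygon: Sequence[Point]) -> int:
    """Count the boxes whose bottom-center point falls inside the polygon."""
    count = 0
    for x0, y0, x1, y1 in boxes:
        anchor = ((x0 + x1) / 2, y1)  # roughly where the person stands
        if point_in_polygon(anchor, polygon):
            count += 1
    return count


zone = [(100, 200), (400, 200), (400, 600), (100, 600)]
boxes = [(150, 250, 250, 500), (350, 100, 450, 300), (600, 300, 700, 550)]
print(count_in_zone(boxes, zone))  # 1
```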

The `draw_text` function calculates the size of the text and the size of the rectangle that will be drawn around the text based on the image size. It uses the `cv2.rectangle()` function to draw the rectangle and the `cv2.putText()` function to draw the text.

In [None]:
def draw_text(image, text, point, color=(255, 255, 255)) -> None:
    """
    Draws text on a filled black rectangle at the given position

    Parameters:
        image: image to draw on
        text: text to draw
        point: top left corner of the background rectangle
        color: text color
    """
    _, f_width = image.shape[:2]
    text_size, _ = cv2.getTextSize(text, fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=f_width / 1500, thickness=2)

    rect_width = text_size[0] + 20
    rect_height = text_size[1] + 20
    rect_x, rect_y = point

    cv2.rectangle(image, pt1=(rect_x, rect_y), pt2=(rect_x + rect_width, rect_y + rect_height), color=(0, 0, 0), thickness=cv2.FILLED)

    text_x = rect_x + (rect_width - text_size[0]) // 2
    text_y = rect_y + (rect_height + text_size[1]) // 2

    cv2.putText(image, text=text, org=(text_x, text_y), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=f_width / 1500, color=color, thickness=2, lineType=cv2.LINE_AA)

### Main Processing Function

Run queue management on the specified source, which is either a webcam or a video file.

In [None]:
def queue_management(video_path: Union[str, int], model_path: str = "../model/yolov8m_openvino_model/yolov8m.xml", zones_config_file: str = "zones.json", customers_limit: int = 3) -> None:
    """
    Main processing function.

    Parameters:
        video_path: path to the video file or camera number
        model_path: path to the object detection OV model
        zones_config_file: path to zones config JSON file
        customers_limit: limit of customers in every queue
    """
    # initialize and load model
    model = get_model(model_path)
    # input shape of the model (w, h, d)
    input_shape = tuple(model.inputs[0].shape)[:0:-1]

    # initialize video player to deliver frames
    if isinstance(video_path, str) and video_path.isnumeric():
        video_path = int(video_path)
    player = utils.VideoPlayer(video_path, fps=60)

    # get zones, and zone and box annotators for zones
    zones, zone_annotators, box_annotators = get_annotators(json_path=zones_config_file,
                                                            resolution_wh=(player.width, player.height))
    
    # people counter
    queue_count = defaultdict(deque)
    # keep at most 100 last times
    processing_times = deque(maxlen=100)
    
    # initialize a video writer with codec and fps
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter('output.mp4', fourcc, player.fps, (player.width, player.height))


    # start a video stream
    player.start()
    while True:
        # Grab the frame.
        frame = player.next()
        if frame is None:
            print("Source ended")
            break
        # If the longer side of the frame exceeds 1280 px, reduce the size to improve the performance.
        scale = 1280 / max(frame.shape)
        if scale < 1:
            frame = cv2.resize(
                src=frame,
                dsize=None,
                fx=scale,
                fy=scale,
                interpolation=cv2.INTER_AREA,
            )
        # Get the results.
        frame = np.array(frame)
        f_height, f_width = frame.shape[:2]
        
        start_time = time.time()
        # preprocessing
        input_image = preprocess(image=frame, input_size=input_shape[:2])
        # prediction
        prediction = model(input_image)[model.outputs[0]]
        # postprocessing
        detections = postprocess(pred_boxes=prediction, input_size=input_shape[:2], orig_img=frame)
        processing_times.append(time.time() - start_time)

        # annotate the frame with the detected persons within each zone
        for zone_id, (zone, zone_annotator, box_annotator) in enumerate(zip(zones, zone_annotators, box_annotators), start=1):
            # visualize polygon for the zone
            frame = zone_annotator.annotate(scene=frame)

            # get detections relevant only for the zone
            mask = zone.trigger(detections=detections)
            detections_filtered = detections[mask]
            # visualize boxes around people in the zone
            frame = box_annotator.annotate(scene=frame, detections=detections_filtered, skip_label=True)
            # count how many people detected
            det_count = len(detections_filtered)

            # add the count to the list
            queue_count[zone_id].append(det_count)
            # store the results from last 300 frames (approx. 5-10s)
            if len(queue_count[zone_id]) > 300:
                queue_count[zone_id].popleft()
            # calculate the mean number of customers in the queue
            mean_customer_count = np.mean(queue_count[zone_id], dtype=np.int32)
            
            # add alert text to the frame if necessary
            if mean_customer_count > customers_limit:
                draw_text(frame, text=f"Store assistant required on cash desk {zone_id}!", point=(f_width // 2, f_height - 50), color=(0, 0, 255))

            # log the number of customers in the queue and request more assistants if required
            log.info(f"Checkout queue: {zone_id}, avg customer count: {mean_customer_count} {'Store assistant required!' if mean_customer_count > customers_limit else ''}")

        # Mean processing time [ms].
        processing_time = np.mean(processing_times) * 1000
        fps = 1000 / processing_time

        draw_text(frame, text=f"Inference time: {processing_time:.0f}ms ({fps:.1f} FPS)", point=(f_width * 3 // 5, 10))

        # write frame to the output video
        out.write(frame)
        # show the output live
        cv2.imshow("Intelligent Queue Management System", frame)
        key = cv2.waitKey(1)
        # escape = 27 or 'q' to close the app
        if key == 27 or key == ord('q'):
            break

    # stop the stream
    player.stop()
    # finalize the output video file
    out.release()
    # clean-up windows
    cv2.destroyAllWindows()
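The alert logic inside the loop can be isolated into a few lines. The sketch below keeps a fixed-size window of per-frame counts with `deque(maxlen=...)` and raises an alert when the integer mean exceeds the limit. The function name `rolling_alerts` is hypothetical, and the small window in the example is chosen only to keep the output short.

```python
from collections import deque
from typing import Deque, Iterable, List


def rolling_alerts(counts: Iterable[int], window: int = 300, customers_limit: int = 3) -> List[bool]:
    """For each new per-frame count, report whether the windowed mean exceeds the limit."""
    history: Deque[int] = deque(maxlen=window)  # old entries drop out automatically
    alerts = []
    for count in counts:
        history.append(count)
        # integer mean (fraction dropped), analogous to the int32 mean in the loop above
        mean_count = sum(history) // len(history)
        alerts.append(mean_count > customers_limit)
    return alerts


print(rolling_alerts([2, 3, 9, 9, 9], window=3, customers_limit=3))
# [False, False, True, True, True]
```

Averaging over a window keeps a single noisy frame, for example one missed or duplicated detection, from toggling the alert.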


### Run

#### Run Intelligent Queue Management

To run Intelligent Queue Management, use a webcam or a video file as the video input. The primary webcam is selected with `video_path=0`. If you have multiple webcams, each one is assigned a consecutive number starting at 0.

If you do not have a webcam, you can still run this demo with a video file. Any [format supported by OpenCV](https://docs.opencv.org/4.5.1/dd/d43/tutorial_py_video_display.html) will work.
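The source can therefore be given either as a file path or as a camera number, including a numeric string such as `"0"`. The following sketch mirrors the check performed at the start of `queue_management`; the helper name `resolve_source` is hypothetical.

```python
from typing import Union


def resolve_source(video_path: Union[str, int]) -> Union[str, int]:
    """Turn numeric strings into camera indices and leave file paths untouched."""
    if isinstance(video_path, str) and video_path.isnumeric():
        return int(video_path)
    return video_path


print(resolve_source("0"), resolve_source(1), resolve_source("../video_file.mp4"))
# 0 1 ../video_file.mp4
```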

`queue_management()` uses OpenVINO to detect people in the video source and count the number of customers in each zone. It takes four arguments: the path to the video file (or the camera number), the path to the model file, the path to the JSON file containing the zone definitions, and the customer limit for each queue. The function displays the output video with the detected people and zone annotations overlaid and writes it to `output.mp4`. Whenever the average customer count in a zone exceeds the limit, it draws an on-screen alert and logs a message.

> **NOTE**: To use this notebook with a webcam, you need to run the notebook on a computer with a webcam. If you run the notebook on a server (for example, Binder), the webcam will not work. Popup mode may not work if you run this notebook on a remote computer.

In [None]:
video_path = "../video_file.mp4"  # path to the video file or camera number (0, 1, 2, etc.)
model_path = "../model/yolov8m_openvino_model/yolov8m.xml"
zones_config_file = "../zones.json"
customers_limit = 3

queue_management(video_path, model_path, zones_config_file, customers_limit)