### This notebook is optionally accelerated with a GPU runtime.
### If you would like to use this acceleration, please select the menu option "Runtime" -> "Change runtime type", select "Hardware Accelerator" -> "GPU" and click "SAVE"

----------------------------------------------------------------------

# YOLOv5

*Author: Ultralytics*

**Ultralytics YOLOv5 🚀 for object detection, instance segmentation and image classification.**

_ | _
- | -
![alt](https://pytorch.org/assets/images/ultralytics_yolov5_img1.png) | ![alt](https://pytorch.org/assets/images/ultralytics_yolov5_img2.png)


## Before You Start

Start from a **Python>=3.8** environment with **PyTorch>=1.7** installed. To install PyTorch see [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/). To install YOLOv5 dependencies:

In [39]:
%%bash
pip install -U ultralytics



In [40]:
# Install YOLOv8 from ultralytics
!pip install ultralytics

# Install Flask and Flask-Ngrok for serving the app for future work
#!pip install flask flask-ngrok

# Install OpenCV for image processing
!pip install opencv-python

# Install Pillow for image handling
!pip install pillow




In [41]:
import cv2
import os
from tqdm import tqdm
import requests
import logging
import base64


In [42]:
## HERE IS THE INPUT VIDEO. YOU SET THIS.
input_video_path = '/content/Calssroom.MOV'

# API Key from platform.openai.com. YOU SET THIS TO YOUR API KEY.
api_key="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"

In [43]:
## OUTPUT FILES. AUTOMATICALLY SET.
output_framestext_file=input_video_path+"_frametext_out.txt"
output_summary_file=input_video_path+"_summary_out.txt"

## Frames folder. AUTOMATICALLY SET.
output_frames_dir="frames"

In [44]:
# Set up logging
logging.basicConfig(level=logging.DEBUG, filename='video_labeler.log', filemode='w', format='%(asctime)s - %(levelname)s - %(message)s')
console = logging.StreamHandler()
console.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)

In [47]:
## Given an image the following function
## returns OpenAI's response.
## The function uses the global variable api_key
## to communicate with OpenAI

def generate_image_description(image_path):
    with requests.Session() as session:
        # Convert image to base64
        with open(image_path, "rb") as image_file:
            image_base64 = base64.b64encode(image_file.read()).decode('utf-8')
        headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
        }

        payload = {
          "model": "gpt-4o",
          "messages": [
            {
              "role": "user",
              "content": [
                {
                  "type": "text",
                  "text": "What’s in this image?"
                },
                {
                  "type": "image_url",
                  "image_url": {
                    "url": f"data:image/jpeg;base64,{image_base64}"
                  }
                }
              ]
            }
          ],
          "max_tokens": 300
        }

        #response = requests.post("https://api.openai.com/v1/chat/completions",
        #                         headers=headers, json=payload)
        # I will create a session for each image.
        response = session.post("https://api.openai.com/v1/chat/completions",
                                    headers=headers, json=payload)

        if response.status_code == 200:
            description = response.json()['choices'][0]['message']['content'].strip()
            return description
        else:
            logging.error(f"OpenAI API error: {response.status_code}, {response.text}")
            return "Error in getting description"

In [48]:
# Given the output_framestext_file as the input parameter
# the function returns the summary text for all the frames.
# The function uses OpenAI API Key (in global variable api_key)

def summarizeViaOpenAI(frames_description_file):
    with requests.Session() as session:
        try:
            with open(frames_description_file,
                      'r', encoding='utf-8',
                      errors='replace') as file:
                file_content = file.read()
        except Exception as e:
            return f"An error occurred: {e}"

        headers = {
          "Content-Type": "application/json",
          "Authorization": f"Bearer {api_key}"
        }

        payload = {
          "model": "gpt-4o",
          "messages": [
            {
              "role": "user",
              "content": [
                {
                  "type": "text",
                  "text": "The content has descriptions of frames in a video. Please summarize the video."
                },
                {
                  "type": "text",
                  "text": file_content
                }
              ]
            }
          ],
          "max_tokens": 300
        }

        response = session.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
        return response.json()['choices'][0]['message']['content']

In [49]:
# Load video
cap = cv2.VideoCapture(input_video_path)
if not cap.isOpened():
    raise Exception(
        f"Error: Could not open input video file: {input_video_path}")

In [50]:
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
frame_interval = int(fps * 2)  # 2 seconds interval

In [52]:
# Create an output directory for frames
os.makedirs(output_frames_dir, exist_ok=True)

In [53]:
# Open the text file to write detected objects.
# Also, save the frames that are sent to OpenAI
with open(output_framestext_file,
          "w") as file, tqdm(total=int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
         desc="Processing frames") as pbar:
    frame_count = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        if frame_count % frame_interval == 0:
            frame_path = os.path.join(output_frames_dir, f"frame_{frame_count}.jpg")
            cv2.imwrite(frame_path, frame)

            # Send frame to OpenAI API
            try:
                description = generate_image_description(frame_path)
                # Write object descriptions to file
                file.write(f"Frame {frame_count}: {description}\n")
                file.flush()
                logging.info(f"Processed frame {frame_count}: {description}")
            except Exception as e:
                logging.error(f"Error processing frame {frame_count} with OpenAI API: {e}")

        frame_count += 1
        pbar.update(1)
    cap.release()
    cv2.destroyAllWindows()

Processing frames: 100%|██████████| 240/240 [00:28<00:00,  8.54it/s]


error: OpenCV(4.10.0) /io/opencv/modules/highgui/src/window.cpp:1295: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvDestroyAllWindows'


In [54]:
summary = summarizeViaOpenAI(output_framestext_file)
print("THE SUMMARY OF THE VIDEO IS AS FOLLOWS: ")
print(summary)

THE SUMMARY OF THE VIDEO IS AS FOLLOWS: 
The video appears to depict a casual gathering of people in an indoor setting that resembles a classroom or office. Throughout the frames, individuals are shown engaged in interactions, either sitting or standing around tables filled with personal belongings and office supplies, such as notebooks, smartphones, disinfecting wipes, and keys. The environment features elements like bulletin boards, computer monitors, and posters, contributing to a professional yet relaxed atmosphere. The people in the video appear to be having informal conversations and some are seen eating snacks, suggesting a friendly and casual meeting or group activity.


In [55]:
## Write the summary to the file output_summary_file
try:
    with open(output_summary_file,
              'w', encoding='utf-8') as summaryfile:
        summaryfile.write(summary)
        summaryfile.flush()
    print(f"Content successfully written to {output_summary_file}")
except Exception as e:
    print(f"An error occurred: {e}")

Content successfully written to /content/Calssroom.MOV_summary_out.txt


In [57]:
import cv2
from tqdm import tqdm
!pip install ultralytics
from ultralytics import YOLO
import torchvision
import logging
import gc



In [59]:
## HERE IS THE INPUT VIDEO. SET THIS
input_video_path = '/content/Calssroom.MOV'

## OUTPUT FILES. AUTOMATICALLY SET.
output_video_path = input_video_path+"_out.avi"
output_text_file=input_video_path+"_out.txt"


In [60]:
# Set up logging
logging.basicConfig(
    level=logging.DEBUG,
    filename='video_labeler.log',
    filemode='w',
    format='%(asctime)s - %(levelname)s - %(message)s')
console = logging.StreamHandler()
console.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    '%(asctime)s - %(levelname)s - %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)


In [61]:
try:
    # Load YOLOv8 model
    model = YOLO("yolov8n.pt")  # You can replace "YOLOv8x.pt"  with other YOLOv8 models like yolov8s.pt, etc. https://docs.ultralytics.com/tasks/detect/#models
    logging.info("YOLOv8 model loaded successfully.")

    # Load video
    cap = cv2.VideoCapture(input_video_path)
    if not cap.isOpened():
        raise Exception(
            f"Error: Could not open input video file: {input_video_path}")

    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)

    # Define codec and create VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*'XVID')

    out = cv2.VideoWriter(
        output_video_path, fourcc, fps, (width, height))
    if not out.isOpened():
        raise Exception(
            f"Error: Could not open output video file for writing: {output_video_path}")

    logging.info(
        f"VideoWriter opened successfully: {out.isOpened()}")

    # Get total number of frames in the video
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    # Open the text file to write detected objects
    with open(output_text_file, "w") as file, tqdm(total=total_frames,
         desc="Processing frames") as pbar:
        frame_count = 0
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            # Detecting objects using YOLOv8
            try:
                results = model(frame, device='cpu')  # Use CPU for inference
                detected_objects = []

                for result in results:
                    boxes = result.boxes  # Boxes object for bbox outputs

                    if len(boxes.xyxy) > 0:
                        boxes_tensor = boxes.xyxy.to('cpu')
                        scores = boxes.conf.to('cpu')
                        nms_indices = torchvision.ops.nms(boxes_tensor, scores, 0.5)

                        for idx in nms_indices:
                            x1, y1, x2, y2 = boxes.xyxy[idx].tolist()
                            confidence = boxes.conf[idx].item()
                            class_id = int(boxes.cls[idx].item())
                            label = model.names[class_id]
                            detected_objects.append(label)

                            # Draw bounding box and label on the frame
                            color = (0, 255, 0)
                            cv2.rectangle(frame,
                                          (int(x1), int(y1)),
                                          (int(x2), int(y2)),
                                          color, 2)
                            cv2.putText(frame, label,
                                        (int(x1), int(y1) - 10),
                                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

                out.write(frame)  # Write the frame to the video file

                # Write detected objects to file
                file.write(
                    f"Frame {int(cap.get(cv2.CAP_PROP_POS_FRAMES))}: {', '.join(detected_objects)}\n")
                file.flush()  # Flush the file buffer to ensure data is written to disk


                frame_count += 1
                pbar.update(1)  # Update the progress bar

                # Debugging output every 100 frames
                if frame_count % 100 == 0:
                    logging.info(f"Processed {frame_count}/{total_frames} frames.")

                # Clear memory
                del results, detected_objects
                gc.collect()

            except Exception as e:
                logging.error(f"Error processing frame {frame_count}: {e}")
                continue

    cap.release()
    out.release()
    cv2.destroyAllWindows()

    print("Output video file: ", output_video_path)
    print("Output object list file: ", output_text_file)

    logging.info("Processing complete. Output video and detected objects file have been created.")
except Exception as e:
    logging.error(f"An error occurred: {e}")

Processing frames:   0%|          | 0/240 [00:00<?, ?it/s]


0: 640x384 4 persons, 1 bottle, 1 cup, 1 dining table, 1 book, 258.4ms
Speed: 19.2ms preprocess, 258.4ms inference, 28.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   0%|          | 1/240 [00:02<08:04,  2.03s/it]


0: 640x384 4 persons, 1 backpack, 1 bottle, 1 dining table, 1 book, 105.2ms
Speed: 7.5ms preprocess, 105.2ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   1%|          | 2/240 [00:02<03:59,  1.01s/it]


0: 640x384 4 persons, 1 backpack, 1 bottle, 1 dining table, 1 book, 108.3ms
Speed: 3.7ms preprocess, 108.3ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   1%|▏         | 3/240 [00:02<02:40,  1.47it/s]


0: 640x384 4 persons, 1 backpack, 1 bottle, 1 bowl, 1 dining table, 1 remote, 1 cell phone, 1 book, 105.1ms
Speed: 3.2ms preprocess, 105.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   2%|▏         | 4/240 [00:02<02:02,  1.92it/s]


0: 640x384 4 persons, 1 backpack, 1 bottle, 1 dining table, 1 book, 114.9ms
Speed: 3.1ms preprocess, 114.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   2%|▏         | 5/240 [00:03<01:43,  2.28it/s]


0: 640x384 3 persons, 1 backpack, 1 bottle, 1 dining table, 2 books, 104.0ms
Speed: 3.3ms preprocess, 104.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   2%|▎         | 6/240 [00:03<01:29,  2.60it/s]


0: 640x384 3 persons, 1 backpack, 1 bottle, 1 dining table, 1 cell phone, 1 book, 102.0ms
Speed: 5.5ms preprocess, 102.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   3%|▎         | 7/240 [00:03<01:21,  2.85it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 bottle, 1 dining table, 1 book, 95.3ms
Speed: 3.3ms preprocess, 95.3ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   3%|▎         | 8/240 [00:04<01:15,  3.08it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 bottle, 1 dining table, 1 cell phone, 2 books, 90.3ms
Speed: 4.0ms preprocess, 90.3ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   4%|▍         | 9/240 [00:04<01:11,  3.24it/s]


0: 640x384 4 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 1 bowl, 1 dining table, 1 cell phone, 1 book, 95.7ms
Speed: 4.8ms preprocess, 95.7ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   4%|▍         | 10/240 [00:04<01:09,  3.32it/s]


0: 640x384 4 persons, 1 backpack, 1 handbag, 1 bottle, 1 dining table, 1 remote, 1 cell phone, 1 book, 88.9ms
Speed: 5.4ms preprocess, 88.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   5%|▍         | 11/240 [00:04<01:06,  3.42it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 bottle, 1 dining table, 1 cell phone, 1 book, 100.2ms
Speed: 2.9ms preprocess, 100.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   5%|▌         | 12/240 [00:05<01:05,  3.49it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 bottle, 1 dining table, 1 cell phone, 2 books, 102.4ms
Speed: 3.2ms preprocess, 102.4ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   5%|▌         | 13/240 [00:05<01:05,  3.48it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 cup, 1 dining table, 1 cell phone, 1 book, 106.7ms
Speed: 3.5ms preprocess, 106.7ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   6%|▌         | 14/240 [00:05<01:04,  3.48it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 1 dining table, 2 cell phones, 1 book, 91.0ms
Speed: 3.5ms preprocess, 91.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   6%|▋         | 15/240 [00:05<01:03,  3.54it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 2 bottles, 1 cup, 1 dining table, 1 cell phone, 2 books, 118.2ms
Speed: 3.0ms preprocess, 118.2ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   7%|▋         | 16/240 [00:06<01:04,  3.50it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 1 dining table, 2 books, 110.8ms
Speed: 3.9ms preprocess, 110.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   7%|▋         | 17/240 [00:06<01:04,  3.48it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 1 dining table, 2 books, 93.0ms
Speed: 3.3ms preprocess, 93.0ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   8%|▊         | 18/240 [00:06<01:02,  3.55it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 cup, 1 dining table, 1 cell phone, 1 book, 100.4ms
Speed: 5.1ms preprocess, 100.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   8%|▊         | 19/240 [00:07<01:02,  3.55it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 1 remote, 1 book, 96.7ms
Speed: 3.3ms preprocess, 96.7ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   8%|▊         | 20/240 [00:07<01:02,  3.54it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 cup, 1 dining table, 1 remote, 1 book, 111.8ms
Speed: 3.4ms preprocess, 111.8ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   9%|▉         | 21/240 [00:07<01:03,  3.45it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 cup, 1 dining table, 1 remote, 1 book, 88.6ms
Speed: 4.4ms preprocess, 88.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:   9%|▉         | 22/240 [00:07<01:01,  3.54it/s]


0: 640x384 2 persons, 1 handbag, 1 cup, 1 remote, 100.4ms
Speed: 3.4ms preprocess, 100.4ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  10%|▉         | 23/240 [00:08<01:01,  3.53it/s]


0: 640x384 2 persons, 1 handbag, 1 bottle, 1 cup, 1 remote, 1 book, 140.0ms
Speed: 3.9ms preprocess, 140.0ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  10%|█         | 24/240 [00:08<01:03,  3.41it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 99.3ms
Speed: 2.9ms preprocess, 99.3ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  10%|█         | 25/240 [00:08<01:01,  3.48it/s]


0: 640x384 2 persons, 1 handbag, 1 bottle, 1 cup, 1 book, 85.0ms
Speed: 3.9ms preprocess, 85.0ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  11%|█         | 26/240 [00:09<00:59,  3.58it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 cup, 102.6ms
Speed: 2.9ms preprocess, 102.6ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  11%|█▏        | 27/240 [00:09<01:00,  3.53it/s]


0: 640x384 2 persons, 1 handbag, 1 cup, 1 book, 101.5ms
Speed: 3.5ms preprocess, 101.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  12%|█▏        | 28/240 [00:09<01:00,  3.52it/s]


0: 640x384 2 persons, 1 handbag, 1 bottle, 1 cup, 1 remote, 1 cell phone, 1 book, 95.1ms
Speed: 3.3ms preprocess, 95.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  12%|█▏        | 29/240 [00:09<00:59,  3.55it/s]


0: 640x384 2 persons, 1 handbag, 1 bottle, 1 cup, 1 book, 85.2ms
Speed: 3.2ms preprocess, 85.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  12%|█▎        | 30/240 [00:10<00:57,  3.65it/s]


0: 640x384 3 persons, 1 handbag, 1 bottle, 1 cup, 1 dining table, 101.9ms
Speed: 5.6ms preprocess, 101.9ms inference, 1.8ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  13%|█▎        | 31/240 [00:10<00:58,  3.56it/s]


0: 640x384 2 persons, 1 backpack, 1 handbag, 1 cup, 1 dining table, 98.8ms
Speed: 5.0ms preprocess, 98.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  13%|█▎        | 32/240 [00:10<00:58,  3.58it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 bottle, 1 dining table, 1 remote, 1 book, 96.4ms
Speed: 3.3ms preprocess, 96.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  14%|█▍        | 33/240 [00:11<00:57,  3.60it/s]


0: 640x384 2 persons, 1 handbag, 1 cup, 1 dining table, 1 remote, 1 book, 94.1ms
Speed: 3.9ms preprocess, 94.1ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  14%|█▍        | 34/240 [00:11<00:56,  3.62it/s]


0: 640x384 3 persons, 1 handbag, 1 bottle, 1 dining table, 1 remote, 1 book, 147.8ms
Speed: 3.2ms preprocess, 147.8ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  15%|█▍        | 35/240 [00:11<01:03,  3.20it/s]


0: 640x384 3 persons, 1 handbag, 1 bottle, 1 dining table, 1 book, 139.8ms
Speed: 3.0ms preprocess, 139.8ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  15%|█▌        | 36/240 [00:12<01:07,  3.01it/s]


0: 640x384 3 persons, 1 handbag, 1 bottle, 1 dining table, 1 book, 147.8ms
Speed: 6.4ms preprocess, 147.8ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  15%|█▌        | 37/240 [00:12<01:10,  2.86it/s]


0: 640x384 4 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 1 dining table, 1 book, 126.3ms
Speed: 6.8ms preprocess, 126.3ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  16%|█▌        | 38/240 [00:12<01:11,  2.81it/s]


0: 640x384 5 persons, 1 backpack, 1 handbag, 1 cup, 1 dining table, 1 book, 131.9ms
Speed: 3.3ms preprocess, 131.9ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  16%|█▋        | 39/240 [00:13<01:13,  2.73it/s]


0: 640x384 4 persons, 1 backpack, 1 handbag, 1 cup, 1 dining table, 1 book, 159.0ms
Speed: 3.1ms preprocess, 159.0ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  17%|█▋        | 40/240 [00:13<01:16,  2.61it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 1 dining table, 1 book, 146.0ms
Speed: 3.1ms preprocess, 146.0ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  17%|█▋        | 41/240 [00:14<01:16,  2.59it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 bottle, 1 cup, 1 dining table, 1 book, 99.3ms
Speed: 4.0ms preprocess, 99.3ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  18%|█▊        | 42/240 [00:14<01:10,  2.82it/s]


0: 640x384 3 persons, 1 backpack, 1 bottle, 1 dining table, 1 book, 101.7ms
Speed: 2.9ms preprocess, 101.7ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  18%|█▊        | 43/240 [00:14<01:06,  2.97it/s]


0: 640x384 3 persons, 1 backpack, 1 bottle, 1 dining table, 1 tv, 1 book, 94.2ms
Speed: 2.9ms preprocess, 94.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  18%|█▊        | 44/240 [00:14<01:02,  3.15it/s]


0: 640x384 5 persons, 1 backpack, 1 bottle, 1 dining table, 1 tv, 1 book, 94.8ms
Speed: 3.3ms preprocess, 94.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  19%|█▉        | 45/240 [00:15<00:59,  3.28it/s]


0: 640x384 4 persons, 1 backpack, 1 bottle, 1 dining table, 1 tv, 1 book, 112.4ms
Speed: 3.6ms preprocess, 112.4ms inference, 2.8ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  19%|█▉        | 46/240 [00:15<00:59,  3.28it/s]


0: 640x384 4 persons, 1 backpack, 1 cup, 1 dining table, 1 book, 111.5ms
Speed: 2.8ms preprocess, 111.5ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  20%|█▉        | 47/240 [00:15<00:58,  3.32it/s]


0: 640x384 4 persons, 1 backpack, 1 bottle, 1 dining table, 1 cell phone, 1 book, 110.7ms
Speed: 3.6ms preprocess, 110.7ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  20%|██        | 48/240 [00:16<00:57,  3.34it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 1 book, 96.4ms
Speed: 3.5ms preprocess, 96.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  20%|██        | 49/240 [00:16<00:56,  3.40it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 92.0ms
Speed: 3.4ms preprocess, 92.0ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  21%|██        | 50/240 [00:16<00:55,  3.42it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 103.1ms
Speed: 2.9ms preprocess, 103.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  21%|██▏       | 51/240 [00:16<00:54,  3.46it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 1 book, 92.5ms
Speed: 4.0ms preprocess, 92.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  22%|██▏       | 52/240 [00:17<00:53,  3.53it/s]


0: 640x384 4 persons, 1 cup, 1 dining table, 1 book, 115.2ms
Speed: 2.9ms preprocess, 115.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  22%|██▏       | 53/240 [00:17<00:53,  3.50it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 98.6ms
Speed: 3.5ms preprocess, 98.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  22%|██▎       | 54/240 [00:17<00:53,  3.50it/s]


0: 640x384 5 persons, 1 bottle, 1 chair, 1 dining table, 1 book, 99.6ms
Speed: 3.3ms preprocess, 99.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  23%|██▎       | 55/240 [00:18<00:52,  3.50it/s]


0: 640x384 5 persons, 1 bottle, 1 chair, 1 dining table, 1 book, 119.0ms
Speed: 6.0ms preprocess, 119.0ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  23%|██▎       | 56/240 [00:18<00:52,  3.48it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 1 book, 122.2ms
Speed: 4.0ms preprocess, 122.2ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  24%|██▍       | 57/240 [00:18<00:54,  3.35it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 96.5ms
Speed: 3.5ms preprocess, 96.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  24%|██▍       | 58/240 [00:18<00:52,  3.44it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 104.2ms
Speed: 4.4ms preprocess, 104.2ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  25%|██▍       | 59/240 [00:19<00:52,  3.43it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 110.4ms
Speed: 3.9ms preprocess, 110.4ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  25%|██▌       | 60/240 [00:19<00:52,  3.45it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 99.0ms
Speed: 3.8ms preprocess, 99.0ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  25%|██▌       | 61/240 [00:19<00:51,  3.47it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 87.7ms
Speed: 3.2ms preprocess, 87.7ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  26%|██▌       | 62/240 [00:20<00:49,  3.58it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 96.5ms
Speed: 2.9ms preprocess, 96.5ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  26%|██▋       | 63/240 [00:20<00:48,  3.62it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 keyboard, 112.9ms
Speed: 3.6ms preprocess, 112.9ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  27%|██▋       | 64/240 [00:20<00:50,  3.51it/s]


0: 640x384 5 persons, 1 bottle, 1 cup, 1 dining table, 101.1ms
Speed: 3.0ms preprocess, 101.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  27%|██▋       | 65/240 [00:20<00:49,  3.53it/s]


0: 640x384 4 persons, 1 bottle, 1 cup, 1 dining table, 1 keyboard, 86.8ms
Speed: 4.1ms preprocess, 86.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  28%|██▊       | 66/240 [00:21<00:48,  3.61it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 111.7ms
Speed: 3.3ms preprocess, 111.7ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  28%|██▊       | 67/240 [00:21<00:48,  3.56it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 122.9ms
Speed: 9.4ms preprocess, 122.9ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  28%|██▊       | 68/240 [00:21<00:49,  3.46it/s]


0: 640x384 4 persons, 1 bottle, 2 chairs, 1 dining table, 95.5ms
Speed: 4.1ms preprocess, 95.5ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  29%|██▉       | 69/240 [00:22<00:48,  3.52it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 92.7ms
Speed: 3.9ms preprocess, 92.7ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  29%|██▉       | 70/240 [00:22<00:47,  3.59it/s]


0: 640x384 4 persons, 1 bottle, 1 cup, 1 knife, 1 chair, 1 dining table, 1 cell phone, 101.8ms
Speed: 3.2ms preprocess, 101.8ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  30%|██▉       | 71/240 [00:22<00:48,  3.52it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 99.7ms
Speed: 3.7ms preprocess, 99.7ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  30%|███       | 72/240 [00:22<00:47,  3.51it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 87.6ms
Speed: 4.0ms preprocess, 87.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  30%|███       | 73/240 [00:23<00:46,  3.59it/s]


0: 640x384 4 persons, 1 handbag, 1 bottle, 2 chairs, 1 dining table, 1 tv, 113.9ms
Speed: 3.6ms preprocess, 113.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  31%|███       | 74/240 [00:23<00:47,  3.53it/s]


0: 640x384 5 persons, 1 bottle, 1 chair, 1 tv, 112.4ms
Speed: 4.1ms preprocess, 112.4ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  31%|███▏      | 75/240 [00:23<00:47,  3.47it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 1 tv, 95.5ms
Speed: 3.4ms preprocess, 95.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  32%|███▏      | 76/240 [00:24<00:46,  3.54it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 154.0ms
Speed: 3.4ms preprocess, 154.0ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  32%|███▏      | 77/240 [00:24<00:50,  3.22it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 129.3ms
Speed: 3.2ms preprocess, 129.3ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  32%|███▎      | 78/240 [00:24<00:54,  3.00it/s]


0: 640x384 5 persons, 1 bottle, 1 cup, 1 chair, 1 remote, 141.1ms
Speed: 3.4ms preprocess, 141.1ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  33%|███▎      | 79/240 [00:25<00:55,  2.89it/s]


0: 640x384 5 persons, 1 handbag, 1 cup, 2 chairs, 1 dining table, 153.4ms
Speed: 3.6ms preprocess, 153.4ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  33%|███▎      | 80/240 [00:25<00:56,  2.83it/s]


0: 640x384 5 persons, 1 handbag, 1 cup, 2 chairs, 1 book, 152.4ms
Speed: 3.3ms preprocess, 152.4ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  34%|███▍      | 81/240 [00:25<00:57,  2.75it/s]


0: 640x384 5 persons, 1 handbag, 1 bottle, 1 chair, 1 dining table, 1 book, 145.1ms
Speed: 3.3ms preprocess, 145.1ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  34%|███▍      | 82/240 [00:26<00:58,  2.70it/s]


0: 640x384 5 persons, 1 backpack, 1 chair, 148.7ms
Speed: 5.3ms preprocess, 148.7ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  35%|███▍      | 83/240 [00:26<01:00,  2.60it/s]


0: 640x384 5 persons, 1 backpack, 1 handbag, 1 chair, 1 book, 98.9ms
Speed: 4.8ms preprocess, 98.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  35%|███▌      | 84/240 [00:27<00:56,  2.77it/s]


0: 640x384 5 persons, 1 backpack, 1 chair, 98.8ms
Speed: 3.6ms preprocess, 98.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  35%|███▌      | 85/240 [00:27<00:52,  2.98it/s]


0: 640x384 4 persons, 1 backpack, 1 chair, 1 tv, 1 book, 102.0ms
Speed: 3.2ms preprocess, 102.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  36%|███▌      | 86/240 [00:27<00:49,  3.11it/s]


0: 640x384 5 persons, 1 backpack, 1 handbag, 1 chair, 1 tv, 110.8ms
Speed: 3.6ms preprocess, 110.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  36%|███▋      | 87/240 [00:27<00:48,  3.15it/s]


0: 640x384 5 persons, 1 handbag, 1 chair, 114.2ms
Speed: 5.5ms preprocess, 114.2ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  37%|███▋      | 88/240 [00:28<00:46,  3.23it/s]


0: 640x384 6 persons, 1 handbag, 1 chair, 1 book, 115.0ms
Speed: 4.6ms preprocess, 115.0ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  37%|███▋      | 89/240 [00:28<00:46,  3.27it/s]


0: 640x384 6 persons, 1 handbag, 1 suitcase, 1 chair, 1 book, 95.1ms
Speed: 2.8ms preprocess, 95.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  38%|███▊      | 90/240 [00:28<00:44,  3.37it/s]


0: 640x384 6 persons, 1 handbag, 1 chair, 103.4ms
Speed: 3.2ms preprocess, 103.4ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  38%|███▊      | 91/240 [00:29<00:44,  3.38it/s]


0: 640x384 6 persons, 1 backpack, 1 suitcase, 1 chair, 104.8ms
Speed: 2.8ms preprocess, 104.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  38%|███▊      | 92/240 [00:29<00:42,  3.47it/s]


0: 640x384 7 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 104.3ms
Speed: 3.4ms preprocess, 104.3ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  39%|███▉      | 93/240 [00:29<00:43,  3.40it/s]


0: 640x384 7 persons, 1 suitcase, 1 chair, 1 tv, 93.4ms
Speed: 7.3ms preprocess, 93.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  39%|███▉      | 94/240 [00:29<00:42,  3.47it/s]


0: 640x384 6 persons, 1 suitcase, 1 chair, 1 tv, 1 book, 94.8ms
Speed: 4.1ms preprocess, 94.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  40%|███▉      | 95/240 [00:30<00:41,  3.46it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 1 book, 117.2ms
Speed: 2.8ms preprocess, 117.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  40%|████      | 96/240 [00:30<00:41,  3.45it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 2 tvs, 2 laptops, 99.9ms
Speed: 3.7ms preprocess, 99.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  40%|████      | 97/240 [00:30<00:41,  3.47it/s]


0: 640x384 4 persons, 1 suitcase, 3 chairs, 104.5ms
Speed: 3.5ms preprocess, 104.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  41%|████      | 98/240 [00:31<00:40,  3.52it/s]


0: 640x384 4 persons, 1 suitcase, 2 chairs, 1 tv, 1 laptop, 98.8ms
Speed: 3.1ms preprocess, 98.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  41%|████▏     | 99/240 [00:31<00:39,  3.55it/s]


0: 640x384 5 persons, 1 suitcase, 2 chairs, 2 laptops, 107.1ms
Speed: 3.2ms preprocess, 107.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  42%|████▏     | 100/240 [00:31<00:40,  3.49it/s]


0: 640x384 5 persons, 2 handbags, 1 suitcase, 2 chairs, 1 tv, 1 laptop, 112.0ms
Speed: 3.2ms preprocess, 112.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  42%|████▏     | 101/240 [00:31<00:40,  3.46it/s]


0: 640x384 7 persons, 1 handbag, 1 suitcase, 2 chairs, 2 tvs, 106.3ms
Speed: 3.6ms preprocess, 106.3ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  42%|████▎     | 102/240 [00:32<00:39,  3.46it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 103.7ms
Speed: 4.8ms preprocess, 103.7ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  43%|████▎     | 103/240 [00:32<00:39,  3.47it/s]


0: 640x384 6 persons, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 104.6ms
Speed: 3.3ms preprocess, 104.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  43%|████▎     | 104/240 [00:32<00:39,  3.49it/s]


0: 640x384 6 persons, 1 suitcase, 2 chairs, 2 tvs, 2 laptops, 96.1ms
Speed: 3.5ms preprocess, 96.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  44%|████▍     | 105/240 [00:33<00:38,  3.49it/s]


0: 640x384 5 persons, 1 suitcase, 3 chairs, 2 tvs, 2 laptops, 93.2ms
Speed: 4.6ms preprocess, 93.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  44%|████▍     | 106/240 [00:33<00:37,  3.55it/s]


0: 640x384 5 persons, 1 handbag, 1 suitcase, 2 chairs, 1 tv, 2 laptops, 105.1ms
Speed: 3.6ms preprocess, 105.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  45%|████▍     | 107/240 [00:33<00:38,  3.47it/s]


0: 640x384 6 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 100.6ms
Speed: 3.6ms preprocess, 100.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  45%|████▌     | 108/240 [00:33<00:37,  3.52it/s]


0: 640x384 6 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 122.7ms
Speed: 3.7ms preprocess, 122.7ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  45%|████▌     | 109/240 [00:34<00:37,  3.48it/s]


0: 640x384 5 persons, 1 suitcase, 2 chairs, 2 tvs, 2 laptops, 103.2ms
Speed: 3.9ms preprocess, 103.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  46%|████▌     | 110/240 [00:34<00:37,  3.50it/s]


0: 640x384 6 persons, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 93.4ms
Speed: 3.6ms preprocess, 93.4ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  46%|████▋     | 111/240 [00:34<00:36,  3.50it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 2 tvs, 2 laptops, 93.4ms
Speed: 3.3ms preprocess, 93.4ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  47%|████▋     | 112/240 [00:35<00:36,  3.55it/s]


0: 640x384 3 persons, 1 suitcase, 2 chairs, 2 tvs, 2 laptops, 103.6ms
Speed: 3.9ms preprocess, 103.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  47%|████▋     | 113/240 [00:35<00:35,  3.54it/s]


0: 640x384 3 persons, 1 suitcase, 1 tv, 1 laptop, 99.7ms
Speed: 6.0ms preprocess, 99.7ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  48%|████▊     | 114/240 [00:35<00:35,  3.52it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 87.2ms
Speed: 3.2ms preprocess, 87.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  48%|████▊     | 115/240 [00:35<00:35,  3.55it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 1 laptop, 101.0ms
Speed: 3.5ms preprocess, 101.0ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  48%|████▊     | 116/240 [00:36<00:34,  3.58it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 2 laptops, 109.2ms
Speed: 3.5ms preprocess, 109.2ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  49%|████▉     | 117/240 [00:36<00:34,  3.55it/s]


0: 640x384 3 persons, 1 suitcase, 1 cup, 1 chair, 1 tv, 1 laptop, 91.3ms
Speed: 3.2ms preprocess, 91.3ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  49%|████▉     | 118/240 [00:36<00:34,  3.58it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 131.4ms
Speed: 7.6ms preprocess, 131.4ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  50%|████▉     | 119/240 [00:37<00:36,  3.36it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 158.9ms
Speed: 5.7ms preprocess, 158.9ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  50%|█████     | 120/240 [00:37<00:39,  3.02it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 128.4ms
Speed: 11.6ms preprocess, 128.4ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  50%|█████     | 121/240 [00:37<00:40,  2.95it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 2 tvs, 127.2ms
Speed: 3.2ms preprocess, 127.2ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  51%|█████     | 122/240 [00:38<00:39,  2.95it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 133.2ms
Speed: 3.4ms preprocess, 133.2ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  51%|█████▏    | 123/240 [00:38<00:40,  2.85it/s]


0: 640x384 3 persons, 1 suitcase, 1 couch, 1 tv, 2 laptops, 153.1ms
Speed: 3.4ms preprocess, 153.1ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  52%|█████▏    | 124/240 [00:38<00:42,  2.76it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 143.4ms
Speed: 3.5ms preprocess, 143.4ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  52%|█████▏    | 125/240 [00:39<00:42,  2.70it/s]


0: 640x384 3 persons, 1 suitcase, 2 chairs, 2 tvs, 1 laptop, 130.3ms
Speed: 5.7ms preprocess, 130.3ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  52%|█████▎    | 126/240 [00:39<00:42,  2.68it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 2 laptops, 87.6ms
Speed: 3.0ms preprocess, 87.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  53%|█████▎    | 127/240 [00:40<00:39,  2.86it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 1 tv, 2 laptops, 101.5ms
Speed: 3.5ms preprocess, 101.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  53%|█████▎    | 128/240 [00:40<00:36,  3.05it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 100.4ms
Speed: 2.9ms preprocess, 100.4ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  54%|█████▍    | 129/240 [00:40<00:35,  3.17it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 98.3ms
Speed: 5.6ms preprocess, 98.3ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  54%|█████▍    | 130/240 [00:40<00:33,  3.28it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 1 tv, 2 laptops, 93.0ms
Speed: 3.2ms preprocess, 93.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  55%|█████▍    | 131/240 [00:41<00:31,  3.42it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 1 mouse, 116.3ms
Speed: 4.3ms preprocess, 116.3ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  55%|█████▌    | 132/240 [00:41<00:31,  3.38it/s]


0: 640x384 5 persons, 1 handbag, 1 suitcase, 1 chair, 2 tvs, 2 laptops, 1 mouse, 1 book, 94.1ms
Speed: 3.8ms preprocess, 94.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  55%|█████▌    | 133/240 [00:41<00:31,  3.44it/s]


0: 640x384 5 persons, 1 handbag, 1 suitcase, 1 chair, 2 tvs, 2 laptops, 1 mouse, 1 book, 87.6ms
Speed: 3.2ms preprocess, 87.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  56%|█████▌    | 134/240 [00:41<00:30,  3.52it/s]


0: 640x384 3 persons, 1 suitcase, 1 cup, 3 tvs, 3 laptops, 1 mouse, 96.8ms
Speed: 2.8ms preprocess, 96.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  56%|█████▋    | 135/240 [00:42<00:29,  3.58it/s]


0: 640x384 3 persons, 1 suitcase, 3 tvs, 3 laptops, 1 mouse, 127.3ms
Speed: 3.5ms preprocess, 127.3ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  57%|█████▋    | 136/240 [00:42<00:30,  3.45it/s]


0: 640x384 3 persons, 1 suitcase, 3 tvs, 2 laptops, 1 mouse, 109.0ms
Speed: 3.1ms preprocess, 109.0ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  57%|█████▋    | 137/240 [00:42<00:29,  3.45it/s]


0: 640x384 3 persons, 1 suitcase, 3 tvs, 1 laptop, 1 mouse, 91.5ms
Speed: 3.4ms preprocess, 91.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  57%|█████▊    | 138/240 [00:43<00:28,  3.52it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 1 chair, 4 tvs, 2 laptops, 1 mouse, 113.4ms
Speed: 6.2ms preprocess, 113.4ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  58%|█████▊    | 139/240 [00:43<00:29,  3.45it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 3 tvs, 2 laptops, 1 mouse, 95.0ms
Speed: 3.3ms preprocess, 95.0ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  58%|█████▊    | 140/240 [00:43<00:28,  3.48it/s]


0: 640x384 3 persons, 1 suitcase, 1 cup, 3 tvs, 2 laptops, 1 mouse, 95.3ms
Speed: 4.1ms preprocess, 95.3ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  59%|█████▉    | 141/240 [00:43<00:28,  3.51it/s]


0: 640x384 3 persons, 1 suitcase, 1 cup, 3 tvs, 1 laptop, 1 mouse, 93.8ms
Speed: 3.0ms preprocess, 93.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  59%|█████▉    | 142/240 [00:44<00:27,  3.58it/s]


0: 640x384 3 persons, 1 suitcase, 3 tvs, 2 laptops, 116.3ms
Speed: 3.2ms preprocess, 116.3ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  60%|█████▉    | 143/240 [00:44<00:27,  3.47it/s]


0: 640x384 2 persons, 1 suitcase, 2 tvs, 2 laptops, 97.0ms
Speed: 3.4ms preprocess, 97.0ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  60%|██████    | 144/240 [00:44<00:27,  3.53it/s]


0: 640x384 2 persons, 1 suitcase, 2 tvs, 1 laptop, 95.6ms
Speed: 5.7ms preprocess, 95.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  60%|██████    | 145/240 [00:45<00:26,  3.58it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 2 tvs, 1 laptop, 1 mouse, 95.2ms
Speed: 4.0ms preprocess, 95.2ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  61%|██████    | 146/240 [00:45<00:26,  3.60it/s]


0: 640x384 3 persons, 1 suitcase, 1 tv, 1 laptop, 1 mouse, 111.1ms
Speed: 6.0ms preprocess, 111.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  61%|██████▏   | 147/240 [00:45<00:26,  3.52it/s]


0: 640x384 3 persons, 1 suitcase, 1 cup, 3 tvs, 1 laptop, 1 mouse, 118.6ms
Speed: 3.8ms preprocess, 118.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  62%|██████▏   | 148/240 [00:45<00:26,  3.49it/s]


0: 640x384 3 persons, 1 suitcase, 1 cup, 3 tvs, 2 laptops, 104.9ms
Speed: 3.1ms preprocess, 104.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  62%|██████▏   | 149/240 [00:46<00:25,  3.51it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 111.1ms
Speed: 3.3ms preprocess, 111.1ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  62%|██████▎   | 150/240 [00:46<00:25,  3.47it/s]


0: 640x384 3 persons, 1 backpack, 1 handbag, 1 suitcase, 3 tvs, 89.4ms
Speed: 3.4ms preprocess, 89.4ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  63%|██████▎   | 151/240 [00:46<00:25,  3.51it/s]


0: 640x384 3 persons, 1 suitcase, 1 chair, 3 tvs, 1 laptop, 105.7ms
Speed: 3.2ms preprocess, 105.7ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  63%|██████▎   | 152/240 [00:47<00:24,  3.54it/s]


0: 640x384 2 persons, 1 suitcase, 3 chairs, 3 tvs, 1 laptop, 126.5ms
Speed: 4.6ms preprocess, 126.5ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  64%|██████▍   | 153/240 [00:47<00:25,  3.41it/s]


0: 640x384 3 persons, 1 backpack, 1 suitcase, 1 chair, 3 tvs, 1 laptop, 112.1ms
Speed: 5.3ms preprocess, 112.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  64%|██████▍   | 154/240 [00:47<00:25,  3.38it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 1 chair, 3 tvs, 1 laptop, 104.8ms
Speed: 4.2ms preprocess, 104.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  65%|██████▍   | 155/240 [00:48<00:24,  3.41it/s]


0: 640x384 2 persons, 1 suitcase, 1 chair, 3 tvs, 2 laptops, 2 mouses, 117.4ms
Speed: 3.7ms preprocess, 117.4ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  65%|██████▌   | 156/240 [00:48<00:24,  3.37it/s]


0: 640x384 2 persons, 1 suitcase, 3 tvs, 2 laptops, 2 mouses, 97.2ms
Speed: 4.4ms preprocess, 97.2ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  65%|██████▌   | 157/240 [00:48<00:24,  3.40it/s]


0: 640x384 2 persons, 1 suitcase, 2 chairs, 2 tvs, 1 laptop, 1 mouse, 89.0ms
Speed: 3.4ms preprocess, 89.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  66%|██████▌   | 158/240 [00:48<00:23,  3.48it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 2 chairs, 1 tv, 1 laptop, 1 mouse, 89.1ms
Speed: 3.3ms preprocess, 89.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  66%|██████▋   | 159/240 [00:49<00:23,  3.52it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 1 mouse, 112.6ms
Speed: 2.9ms preprocess, 112.6ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  67%|██████▋   | 160/240 [00:49<00:22,  3.53it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 1 chair, 3 tvs, 1 laptop, 1 mouse, 115.7ms
Speed: 4.2ms preprocess, 115.7ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  67%|██████▋   | 161/240 [00:49<00:22,  3.47it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 3 tvs, 1 laptop, 148.9ms
Speed: 3.3ms preprocess, 148.9ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  68%|██████▊   | 162/240 [00:50<00:23,  3.30it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 1 chair, 3 tvs, 1 laptop, 162.2ms
Speed: 4.7ms preprocess, 162.2ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  68%|██████▊   | 163/240 [00:50<00:25,  2.99it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 3 tvs, 1 laptop, 163.9ms
Speed: 9.0ms preprocess, 163.9ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  68%|██████▊   | 164/240 [00:50<00:26,  2.83it/s]


0: 640x384 2 persons, 1 handbag, 1 suitcase, 1 chair, 3 tvs, 1 laptop, 132.3ms
Speed: 6.4ms preprocess, 132.3ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  69%|██████▉   | 165/240 [00:51<00:26,  2.81it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 3 tvs, 1 laptop, 141.1ms
Speed: 3.8ms preprocess, 141.1ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  69%|██████▉   | 166/240 [00:51<00:27,  2.73it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 1 chair, 3 tvs, 1 laptop, 136.1ms
Speed: 3.4ms preprocess, 136.1ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  70%|██████▉   | 167/240 [00:51<00:26,  2.74it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 2 chairs, 3 tvs, 1 laptop, 153.5ms
Speed: 4.2ms preprocess, 153.5ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  70%|███████   | 168/240 [00:52<00:26,  2.72it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 1 chair, 3 tvs, 1 laptop, 144.0ms
Speed: 3.4ms preprocess, 144.0ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  70%|███████   | 169/240 [00:52<00:26,  2.71it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 2 chairs, 2 tvs, 98.4ms
Speed: 4.5ms preprocess, 98.4ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  71%|███████   | 170/240 [00:53<00:24,  2.85it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 2 tvs, 2 laptops, 95.6ms
Speed: 3.3ms preprocess, 95.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  71%|███████▏  | 171/240 [00:53<00:22,  3.09it/s]


0: 640x384 4 persons, 1 handbag, 1 suitcase, 2 tvs, 1 laptop, 104.4ms
Speed: 3.3ms preprocess, 104.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  72%|███████▏  | 172/240 [00:53<00:21,  3.19it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 2 tvs, 2 laptops, 93.5ms
Speed: 4.2ms preprocess, 93.5ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  72%|███████▏  | 173/240 [00:53<00:20,  3.31it/s]


0: 640x384 4 persons, 1 handbag, 1 suitcase, 1 chair, 2 tvs, 1 laptop, 1 mouse, 87.9ms
Speed: 5.9ms preprocess, 87.9ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  72%|███████▎  | 174/240 [00:54<00:19,  3.43it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 2 chairs, 2 tvs, 2 laptops, 1 mouse, 95.6ms
Speed: 4.0ms preprocess, 95.6ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  73%|███████▎  | 175/240 [00:54<00:18,  3.47it/s]


0: 640x384 4 persons, 1 handbag, 1 suitcase, 2 chairs, 1 tv, 1 laptop, 1 mouse, 96.5ms
Speed: 3.6ms preprocess, 96.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  73%|███████▎  | 176/240 [00:54<00:18,  3.44it/s]


0: 640x384 4 persons, 1 backpack, 1 handbag, 1 suitcase, 3 chairs, 2 tvs, 2 laptops, 1 mouse, 97.6ms
Speed: 3.8ms preprocess, 97.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  74%|███████▍  | 177/240 [00:54<00:18,  3.49it/s]


0: 640x384 4 persons, 1 handbag, 1 suitcase, 1 chair, 1 tv, 1 laptop, 1 mouse, 94.0ms
Speed: 4.3ms preprocess, 94.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  74%|███████▍  | 178/240 [00:55<00:17,  3.56it/s]


0: 640x384 3 persons, 1 handbag, 1 suitcase, 1 cup, 2 laptops, 2 mouses, 98.6ms
Speed: 4.1ms preprocess, 98.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  75%|███████▍  | 179/240 [00:55<00:17,  3.43it/s]


0: 640x384 4 persons, 1 suitcase, 1 tv, 1 laptop, 116.5ms
Speed: 5.4ms preprocess, 116.5ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  75%|███████▌  | 180/240 [00:55<00:17,  3.43it/s]


0: 640x384 4 persons, 1 suitcase, 2 tvs, 2 laptops, 97.5ms
Speed: 5.4ms preprocess, 97.5ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  75%|███████▌  | 181/240 [00:56<00:17,  3.45it/s]


0: 640x384 5 persons, 1 suitcase, 2 chairs, 2 tvs, 114.3ms
Speed: 5.3ms preprocess, 114.3ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  76%|███████▌  | 182/240 [00:56<00:16,  3.45it/s]


0: 640x384 5 persons, 1 suitcase, 1 tv, 1 laptop, 89.8ms
Speed: 6.7ms preprocess, 89.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  76%|███████▋  | 183/240 [00:56<00:16,  3.49it/s]


0: 640x384 5 persons, 1 handbag, 1 suitcase, 1 tv, 1 laptop, 113.9ms
Speed: 4.1ms preprocess, 113.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  77%|███████▋  | 184/240 [00:57<00:16,  3.48it/s]


0: 640x384 5 persons, 1 handbag, 1 suitcase, 1 chair, 1 tv, 1 laptop, 94.2ms
Speed: 7.8ms preprocess, 94.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  77%|███████▋  | 185/240 [00:57<00:15,  3.50it/s]


0: 640x384 5 persons, 1 suitcase, 2 tvs, 1 laptop, 105.4ms
Speed: 3.4ms preprocess, 105.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  78%|███████▊  | 186/240 [00:57<00:15,  3.50it/s]


0: 640x384 5 persons, 1 suitcase, 2 tvs, 1 laptop, 100.7ms
Speed: 4.0ms preprocess, 100.7ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  78%|███████▊  | 187/240 [00:57<00:15,  3.51it/s]


0: 640x384 5 persons, 1 handbag, 1 suitcase, 1 chair, 1 tv, 1 laptop, 99.5ms
Speed: 3.5ms preprocess, 99.5ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  78%|███████▊  | 188/240 [00:58<00:14,  3.50it/s]


0: 640x384 4 persons, 1 handbag, 1 suitcase, 1 chair, 1 laptop, 113.1ms
Speed: 3.7ms preprocess, 113.1ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  79%|███████▉  | 189/240 [00:58<00:14,  3.44it/s]


0: 640x384 6 persons, 1 handbag, 1 suitcase, 87.7ms
Speed: 3.5ms preprocess, 87.7ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  79%|███████▉  | 190/240 [00:58<00:14,  3.48it/s]


0: 640x384 4 persons, 1 suitcase, 1 chair, 101.1ms
Speed: 3.1ms preprocess, 101.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  80%|███████▉  | 191/240 [00:59<00:14,  3.48it/s]


0: 640x384 4 persons, 1 backpack, 1 handbag, 1 suitcase, 1 tv, 1 laptop, 100.8ms
Speed: 3.2ms preprocess, 100.8ms inference, 1.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  80%|████████  | 192/240 [00:59<00:13,  3.49it/s]


0: 640x384 5 persons, 1 handbag, 1 chair, 1 laptop, 98.6ms
Speed: 3.6ms preprocess, 98.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  80%|████████  | 193/240 [00:59<00:13,  3.49it/s]


0: 640x384 7 persons, 1 handbag, 1 suitcase, 1 laptop, 94.7ms
Speed: 3.9ms preprocess, 94.7ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  81%|████████  | 194/240 [00:59<00:12,  3.57it/s]


0: 640x384 5 persons, 1 suitcase, 1 chair, 1 tv, 1 laptop, 101.1ms
Speed: 3.4ms preprocess, 101.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  81%|████████▏ | 195/240 [01:00<00:12,  3.52it/s]


0: 640x384 5 persons, 1 handbag, 1 suitcase, 111.9ms
Speed: 3.3ms preprocess, 111.9ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  82%|████████▏ | 196/240 [01:00<00:12,  3.49it/s]


0: 640x384 5 persons, 1 handbag, 1 suitcase, 1 laptop, 1 book, 92.2ms
Speed: 3.7ms preprocess, 92.2ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  82%|████████▏ | 197/240 [01:00<00:12,  3.46it/s]


0: 640x384 4 persons, 1 handbag, 1 suitcase, 108.1ms
Speed: 3.0ms preprocess, 108.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  82%|████████▎ | 198/240 [01:01<00:12,  3.50it/s]


0: 640x384 5 persons, 1 handbag, 98.2ms
Speed: 3.8ms preprocess, 98.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  83%|████████▎ | 199/240 [01:01<00:11,  3.48it/s]


0: 640x384 5 persons, 1 handbag, 95.7ms
Speed: 3.4ms preprocess, 95.7ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  83%|████████▎ | 200/240 [01:01<00:11,  3.49it/s]


0: 640x384 4 persons, 1 handbag, 97.8ms
Speed: 3.3ms preprocess, 97.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  84%|████████▍ | 201/240 [01:01<00:11,  3.52it/s]


0: 640x384 4 persons, 1 handbag, 104.2ms
Speed: 3.4ms preprocess, 104.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  84%|████████▍ | 202/240 [01:02<00:10,  3.54it/s]


0: 640x384 4 persons, 1 handbag, 97.4ms
Speed: 3.4ms preprocess, 97.4ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  85%|████████▍ | 203/240 [01:02<00:10,  3.58it/s]


0: 640x384 4 persons, 1 handbag, 97.8ms
Speed: 3.6ms preprocess, 97.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  85%|████████▌ | 204/240 [01:02<00:10,  3.55it/s]


0: 640x384 4 persons, 1 handbag, 140.6ms
Speed: 8.3ms preprocess, 140.6ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  85%|████████▌ | 205/240 [01:03<00:10,  3.28it/s]


0: 640x384 5 persons, 1 handbag, 147.4ms
Speed: 3.1ms preprocess, 147.4ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  86%|████████▌ | 206/240 [01:03<00:11,  3.06it/s]


0: 640x384 4 persons, 129.6ms
Speed: 3.3ms preprocess, 129.6ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  86%|████████▋ | 207/240 [01:03<00:11,  2.94it/s]


0: 640x384 4 persons, 1 cup, 1 dining table, 1 tv, 156.3ms
Speed: 11.4ms preprocess, 156.3ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  87%|████████▋ | 208/240 [01:04<00:11,  2.81it/s]


0: 640x384 4 persons, 1 cup, 1 tv, 148.0ms
Speed: 4.6ms preprocess, 148.0ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  87%|████████▋ | 209/240 [01:04<00:11,  2.74it/s]


0: 640x384 4 persons, 1 bottle, 1 cup, 144.7ms
Speed: 4.0ms preprocess, 144.7ms inference, 1.5ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  88%|████████▊ | 210/240 [01:04<00:11,  2.68it/s]


0: 640x384 4 persons, 1 bottle, 1 tv, 171.4ms
Speed: 3.5ms preprocess, 171.4ms inference, 1.6ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  88%|████████▊ | 211/240 [01:05<00:11,  2.61it/s]


0: 640x384 4 persons, 1 bottle, 1 cup, 1 dining table, 124.8ms
Speed: 3.2ms preprocess, 124.8ms inference, 1.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  88%|████████▊ | 212/240 [01:05<00:10,  2.66it/s]


0: 640x384 4 persons, 1 bottle, 1 cup, 1 chair, 1 dining table, 98.6ms
Speed: 3.9ms preprocess, 98.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  89%|████████▉ | 213/240 [01:06<00:09,  2.89it/s]


0: 640x384 4 persons, 1 cup, 1 dining table, 87.0ms
Speed: 3.6ms preprocess, 87.0ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  89%|████████▉ | 214/240 [01:06<00:08,  3.13it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 1 book, 97.2ms
Speed: 3.4ms preprocess, 97.2ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  90%|████████▉ | 215/240 [01:06<00:07,  3.19it/s]


0: 640x384 5 persons, 1 bottle, 1 chair, 1 dining table, 103.0ms
Speed: 2.9ms preprocess, 103.0ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  90%|█████████ | 216/240 [01:06<00:07,  3.31it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 105.3ms
Speed: 2.9ms preprocess, 105.3ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  90%|█████████ | 217/240 [01:07<00:06,  3.40it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 119.5ms
Speed: 3.6ms preprocess, 119.5ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  91%|█████████ | 218/240 [01:07<00:06,  3.35it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 94.1ms
Speed: 6.6ms preprocess, 94.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  91%|█████████▏| 219/240 [01:07<00:06,  3.40it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 96.4ms
Speed: 3.3ms preprocess, 96.4ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  92%|█████████▏| 220/240 [01:07<00:05,  3.48it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 98.4ms
Speed: 3.9ms preprocess, 98.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  92%|█████████▏| 221/240 [01:08<00:05,  3.48it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 90.8ms
Speed: 3.6ms preprocess, 90.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  92%|█████████▎| 222/240 [01:08<00:05,  3.46it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 1 book, 90.8ms
Speed: 7.6ms preprocess, 90.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  93%|█████████▎| 223/240 [01:08<00:04,  3.48it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 103.8ms
Speed: 3.6ms preprocess, 103.8ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  93%|█████████▎| 224/240 [01:09<00:04,  3.52it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 1 book, 135.0ms
Speed: 4.3ms preprocess, 135.0ms inference, 2.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  94%|█████████▍| 225/240 [01:09<00:04,  3.37it/s]


0: 640x384 5 persons, 1 bottle, 1 dining table, 1 book, 89.6ms
Speed: 3.2ms preprocess, 89.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  94%|█████████▍| 226/240 [01:09<00:04,  3.43it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 107.1ms
Speed: 3.3ms preprocess, 107.1ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  95%|█████████▍| 227/240 [01:10<00:03,  3.44it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 refrigerator, 1 book, 102.9ms
Speed: 3.2ms preprocess, 102.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  95%|█████████▌| 228/240 [01:10<00:03,  3.46it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 refrigerator, 1 book, 101.5ms
Speed: 3.4ms preprocess, 101.5ms inference, 1.2ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  95%|█████████▌| 229/240 [01:10<00:03,  3.39it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 refrigerator, 1 book, 91.1ms
Speed: 3.6ms preprocess, 91.1ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  96%|█████████▌| 230/240 [01:10<00:02,  3.48it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 refrigerator, 1 book, 92.7ms
Speed: 5.9ms preprocess, 92.7ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  96%|█████████▋| 231/240 [01:11<00:02,  3.52it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 refrigerator, 1 book, 120.3ms
Speed: 2.9ms preprocess, 120.3ms inference, 1.4ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  97%|█████████▋| 232/240 [01:11<00:02,  3.47it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 91.9ms
Speed: 3.4ms preprocess, 91.9ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  97%|█████████▋| 233/240 [01:11<00:01,  3.53it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 86.2ms
Speed: 3.4ms preprocess, 86.2ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  98%|█████████▊| 234/240 [01:12<00:01,  3.60it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 1 book, 96.0ms
Speed: 4.7ms preprocess, 96.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  98%|█████████▊| 235/240 [01:12<00:01,  3.59it/s]


0: 640x384 4 persons, 1 bottle, 1 chair, 1 dining table, 1 book, 130.6ms
Speed: 3.4ms preprocess, 130.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  98%|█████████▊| 236/240 [01:12<00:01,  3.41it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 98.2ms
Speed: 5.7ms preprocess, 98.2ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  99%|█████████▉| 237/240 [01:12<00:00,  3.47it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 88.6ms
Speed: 4.5ms preprocess, 88.6ms inference, 0.9ms postprocess per image at shape (1, 3, 640, 384)


Processing frames:  99%|█████████▉| 238/240 [01:13<00:00,  3.57it/s]


0: 640x384 4 persons, 1 bottle, 1 dining table, 1 book, 103.4ms
Speed: 3.7ms preprocess, 103.4ms inference, 1.1ms postprocess per image at shape (1, 3, 640, 384)


Processing frames: 100%|█████████▉| 239/240 [01:13<00:00,  3.58it/s]


0: 640x384 3 persons, 1 bottle, 1 couch, 1 dining table, 1 book, 87.4ms
Speed: 3.9ms preprocess, 87.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 384)


Processing frames: 100%|██████████| 240/240 [01:13<00:00,  3.25it/s]
ERROR:root:An error occurred: OpenCV(4.10.0) /io/opencv/modules/highgui/src/window.cpp:1295: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvDestroyAllWindows'

2024-12-12 19:58:46,668 - ERROR - An error occurred: OpenCV(4.10.0) /io/opencv/modules/highgui/src/window.cpp:1295: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvDestroyAllWindows'

2024-12-12 19:58:46,668 - ERROR - An error occurred: OpenCV(4.10.0) /io/opencv/modules/highgui/src/window.cpp:1295: error: (-2:Unspecified error) The function is not implemented. Rebuild t

In [38]:
 # Ist future work need to show accuracy as percentage
 # 2nd Future work
import os
from flask import Flask, render_template, request, redirect, url_for
from flask_ngrok import run_with_ngrok
from ultralytics import YOLO
import cv2
from PIL import Image

# Setup Flask app
app = Flask(__name__)
run_with_ngrok(app)  # Start ngrok when the app runs

# Paths
UPLOAD_FOLDER = '/content/uploads/'
RESULT_FOLDER = '/content/results/'
MODEL_PATH = '/content/drive/MyDrive/royal_detector.pt'  # Update this to your model's path

os.makedirs(UPLOAD_FOLDER, exist_ok=True)
os.makedirs(RESULT_FOLDER, exist_ok=True)

# Load YOLO model
model = YOLO(MODEL_PATH)

# Home Page
@app.route('/')
def index():
    return '''
    <!DOCTYPE html>
    <html>
    <head>
        <title>Royal Detection</title>
    </head>
    <body>
        <h1>Royal Detection - Upload an Image</h1>
        <form action="/upload" method="POST" enctype="multipart/form-data">
            <input type="file" name="file">
            <button type="submit">Upload</button>
        </form>
    </body>
    </html>
    '''

# Upload and Process Image
@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return "No file part"
    file = request.files['file']
    if file.filename == '':
        return "No selected file"
    if file:
        # Save the uploaded image
        filepath = os.path.join(UPLOAD_FOLDER, file.filename)
        file.save(filepath)

        # Run YOLO on the image
        results = model.predict(source=filepath, save=True, conf=0.25)

        # Save the result image
        result_filepath = os.path.join(RESULT_FOLDER, file.filename)
        results[0].save(save_dir=RESULT_FOLDER)

        return f'''
        <!DOCTYPE html>
        <html>
        <head>
            <title>Royal Detection Result</title>
        </head>
        <body>
            <h1>Detection Result</h1>
            <img src="/result/{file.filename}" alt="Result Image">
            <br><br>
            <a href="/">Go Back</a>
        </body>
        </html>
        '''

# Serve Result Images
@app.route('/result/<filename>')
def result_file(filename):
    return redirect(f'/content/results/{filename}')

# Start the app
if __name__ == '__main__':
    app.run()


ModuleNotFoundError: No module named 'flask_ngrok'

## Model Description

<img width="800" alt="YOLO Model Comparison" src="https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/yolo-comparison-plots.png">

Ultralytics YOLOv5 🚀 is a cutting-edge, state-of-the-art (SOTA) model that builds upon the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv5 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection, instance segmentation and image classification tasks.

We hope that the resources here will help you get the most out of YOLOv5. Please browse the YOLOv5 [Docs](https://docs.ultralytics.com/yolov5) for details, raise an issue on [GitHub](https://github.com/ultralytics/yolov5/issues/new/choose) for support, and join our [Discord](https://discord.gg/n6cFeSPZdD) community for questions and discussions!

| Model                                                                                           | size<br><sup>(pixels) | mAP<sup>val<br>50-95 | mAP<sup>val<br>50 | Speed<br><sup>CPU b1<br>(ms) | Speed<br><sup>V100 b1<br>(ms) | Speed<br><sup>V100 b32<br>(ms) | params<br><sup>(M) | FLOPs<br><sup>@640 (B) |
|-------------------------------------------------------------------------------------------------|-----------------------|----------------------|-------------------|------------------------------|-------------------------------|--------------------------------|--------------------|------------------------|
| [YOLOv5n](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5n.pt)              | 640                   | 28.0                 | 45.7              | **45**                       | **6.3**                       | **0.6**                        | **1.9**            | **4.5**                |
| [YOLOv5s](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt)              | 640                   | 37.4                 | 56.8              | 98                           | 6.4                           | 0.9                            | 7.2                | 16.5                   |
| [YOLOv5m](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5m.pt)              | 640                   | 45.4                 | 64.1              | 224                          | 8.2                           | 1.7                            | 21.2               | 49.0                   |
| [YOLOv5l](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5l.pt)              | 640                   | 49.0                 | 67.3              | 430                          | 10.1                          | 2.7                            | 46.5               | 109.1                  |
| [YOLOv5x](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5x.pt)              | 640                   | 50.7                 | 68.9              | 766                          | 12.1                          | 4.8                            | 86.7               | 205.7                  |
|                                                                                                 |                       |                      |                   |                              |                               |                                |                    |                        |
| [YOLOv5n6](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5n6.pt)            | 1280                  | 36.0                 | 54.4              | 153                          | 8.1                           | 2.1                            | 3.2                | 4.6                    |
| [YOLOv5s6](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s6.pt)            | 1280                  | 44.8                 | 63.7              | 385                          | 8.2                           | 3.6                            | 12.6               | 16.8                   |
| [YOLOv5m6](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5m6.pt)            | 1280                  | 51.3                 | 69.3              | 887                          | 11.1                          | 6.8                            | 35.7               | 50.0                   |
| [YOLOv5l6](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5l6.pt)            | 1280                  | 53.7                 | 71.3              | 1784                         | 15.8                          | 10.5                           | 76.8               | 111.4                  |
| [YOLOv5x6](https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5x6.pt)<br>+ [TTA] | 1280<br>1536          | 55.0<br>**55.8**     | 72.7<br>**72.7**  | 3136<br>-                    | 26.2<br>-                     | 19.4<br>-                      | 140.7<br>-         | 209.8<br>-             |

<details>
  <summary>Table Notes</summary>

- All checkpoints are trained to 300 epochs with default settings. Nano and Small models use [hyp.scratch-low.yaml](https://github.com/ultralytics/yolov5/blob/master/data/hyps/hyp.scratch-low.yaml) hyps, all others use [hyp.scratch-high.yaml](https://github.com/ultralytics/yolov5/blob/master/data/hyps/hyp.scratch-high.yaml).
- **mAP<sup>val</sup>** values are for single-model single-scale on [COCO val2017](http://cocodataset.org) dataset.<br>Reproduce by `python val.py --data coco.yaml --img 640 --conf 0.001 --iou 0.65`
- **Speed** averaged over COCO val images using a [AWS p3.2xlarge](https://aws.amazon.com/ec2/instance-types/p3/) instance. NMS times (~1 ms/img) not included.<br>Reproduce by `python val.py --data coco.yaml --img 640 --task speed --batch 1`
- **TTA** [Test Time Augmentation](https://docs.ultralytics.com/yolov5/tutorials/test_time_augmentation) includes reflection and scale augmentations.<br>Reproduce by `python val.py --data coco.yaml --img 1536 --iou 0.7 --augment`

</details>

## Load From PyTorch Hub

This example loads a pretrained **YOLOv5s** model and passes an image for inference. YOLOv5 accepts **URL**, **Filename**, **PIL**, **OpenCV**, **Numpy** and **PyTorch** inputs, and returns detections in **torch**, **pandas**, and **JSON** output formats. See the [YOLOv5 PyTorch Hub Tutorial](https://docs.ultralytics.com/yolov5/tutorials/pytorch_hub_model_loading/) for details.

In [None]:
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Images
imgs = ['/content/bus.jpg']  # batch of images

# Inference
results = model(imgs)

# Results
results.print()
results.save()  # or .show()

results.xyxy[0]  # img1 predictions (tensor)
results.pandas().xyxy[0]  # img1 predictions (pandas)
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-9-15 Python-3.10.12 torch-2.4.0+cu121 CUDA:0 (Tesla T4, 15102MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


UnidentifiedImageError: cannot identify image file '/content/bus.jpg'

## Citation

If you use YOLOv5 or YOLOv5u in your research, please cite the Ultralytics YOLOv5 repository as follows:

[![DOI](https://zenodo.org/badge/264818686.svg)](https://zenodo.org/badge/latestdoi/264818686)

In [None]:
@software{yolov5,
  title = {YOLOv5 by Ultralytics},
  author = {Glenn Jocher},
  year = {2020},
  version = {7.0},
  license = {AGPL-3.0},
  url = {https://github.com/ultralytics/yolov5},
  doi = {10.5281/zenodo.3908559},
  orcid = {0000-0001-5950-6979}
}

## Contact

For YOLOv5 bug reports and feature requests please visit [GitHub Issues](https://github.com/ultralytics/yolov5/issues), and join our [Discord](https://discord.gg/n6cFeSPZdD) community for questions and discussions!

&nbsp;