<a href="https://colab.research.google.com/github/isavida/football-task/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install this specific YOLO branch, which includes a weighted loss function
!git clone --branch fix#8578 https://github.com/hulkds/ultralytics.git -q
!pip install /content/ultralytics/ lapx==0.5.5 -q

# Fine-tuning dataset
!wget -q -O dataset.zip https://universe.roboflow.com/ds/91Soi5QkdU?key=E6tIgxhinz
!unzip -q dataset.zip -d dataset

# Download drive folder
!pip install -U --no-cache-dir gdown --pre -q
!gdown --id 1AXgq-cQtfJdeinnDD8mJQAN26HsJWE99 -O task.zip -q
!unzip -q task.zip

!pip install easyocr -q
!pip install jsonlines -q


In [None]:
import colorsys
import copy
import cv2
import easyocr
import imutils
import jsonlines
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import time

from collections import OrderedDict
from google.colab import drive
from google.colab.patches import cv2_imshow
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from ultralytics import YOLO
from ultralytics.data.augment import Albumentations
from ultralytics.utils import LOGGER, colorstr

# Data Augmentation

In [None]:
def __init__(self, p=1.0):
        """Initialize the transform object for YOLO bbox formatted params."""
        self.p = p
        self.transform = None
        prefix = colorstr("albumentations: ")
        try:
            import albumentations as A

            # check_version(A.__version__, "1.0.3", hard=True)  # version requirement

            # Transforms
            T = [
                A.MotionBlur(p=0.8),
                A.Affine(scale=0.8, rotate = [-15,15], shear=[-30,30], p=0.5, mode=cv2.BORDER_REFLECT),
                A.Blur(p=0.3),
                A.GaussNoise(p=0.3),
                A.RandomBrightnessContrast(p=0.3),
                A.GridDistortion(p=0.2),
                A.RandomGamma(p=0.2)
            ]
            self.transform = A.Compose(T, bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]))

            LOGGER.info(prefix + ", ".join(f"{x}".replace("always_apply=False, ", "") for x in T if x.p))
        except ImportError:  # package not installed, skip
            pass
        except Exception as e:
            LOGGER.info(f"{prefix}{e}")

Albumentations.__init__ = __init__


# Define inference and fine-tuning

In [None]:
def finetune_model(model, datapath, pos_weight, imgsz=640, epochs=100, patience=10):
    return model.train(data=datapath, pos_weight=pos_weight, imgsz=imgsz, epochs=epochs, patience=patience, dropout=0.2)

def inference_video(model, filename, persist=False, classes=[0,1,2]):
    return model.track(source=filename, save = False, conf=0.1, persist=persist, verbose = False, classes=classes)

# Players detection and team classification

## Getting colors from scoreboard

To classify each player into the correct team, the team colors must first be found.

The first approach was to apply the **k-means clustering algorithm** to the players' bounding boxes to detect the jersey colors, after removing the green background so that as much of the remaining image as possible is relevant.

This approach did not give the expected results, and it cannot tell which kit belongs to the home team and which to the visiting team, so all of this information is obtained from the **scoreboard** instead.

*For readability, this notebook does not keep all the code that was tested; it can be consulted in previous commits.*
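
For reference, the discarded first approach can be sketched as follows. This is a minimal illustration and not the code from the earlier commits: the synthetic crop, the crude green test and the small NumPy k-means with a deterministic farthest-first start are all assumptions made so that the example runs on its own.

```python
import numpy as np

def dominant_colors(pixels, k=2, n_iter=10):
    """Tiny k-means (Lloyd's algorithm) over an (N, 3) array of BGR pixels.
    Returns the k cluster centres, largest cluster first."""
    pixels = pixels.astype(np.float32)
    # Deterministic farthest-first initialisation (an assumption of this sketch)
    centers = [pixels[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(pixels - c, axis=1) for c in centers], axis=0)
        centers.append(pixels[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each pixel to its nearest centre
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its pixels (empty clusters keep their centre)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return centers[np.argsort(-counts)]

# Synthetic 20x10 BGR player crop: grass, a red jersey and white shorts
crop = np.full((20, 10, 3), (40, 160, 40), dtype=np.uint8)
crop[2:12, 2:8] = (0, 0, 200)
crop[12:16, 2:8] = (255, 255, 255)

# Crude green removal: drop pixels whose green channel clearly dominates
b, g, r = (crop[..., i].astype(np.int16) for i in range(3))
not_green = ~((g > r + 20) & (g > b + 20))
colors = dominant_colors(crop[not_green], k=2)
print(colors)
```

Even when the jersey color is recovered like this, nothing in a player crop says whether it belongs to the home or the away team, which is the limitation that motivates the scoreboard approach.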


### Scoreboard recognition
The first step in obtaining the scoreboard colors is to locate the scoreboard itself. To do this, we find the pixels that remain **static** across several random frames, run a **text detector** on them, and crop the rectangle that encloses the detected team initials.
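
The static-pixel idea can be tried on synthetic data before running it on a real clip. The sketch below assumes 30 random 8x8 frames with a constant patch standing in for the scoreboard; the frame size, patch color and threshold of 0.04 are illustrative choices, and unlike the function in this notebook it does not cast the standard deviation to `uint8`.

```python
import numpy as np

rng = np.random.default_rng(0)

# 30 synthetic 8x8 BGR frames filled with random "play" content...
frames = rng.integers(0, 256, size=(30, 8, 8, 3), dtype=np.uint8)
# ...plus a constant 2x4 "scoreboard" patch in the upper-left corner
frames[:, 0:2, 0:4] = (200, 50, 50)

# Per-pixel standard deviation over time, scaled to [0, 1] and averaged over channels
std = frames.astype(np.float32).std(axis=0) / 255.0
static_mask = std.mean(axis=2) < 0.04

# Keep the median color where the pixel is static, black elsewhere
median = np.median(frames, axis=0).astype(np.uint8)
background = np.where(static_mask[:, :, None], median, 0)
print(static_mask.astype(int))
```

Only the constant patch survives the threshold, because the changing pixels have a much larger standard deviation over time.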

In [None]:
def get_static_pixels_from_video(filepath, n_samples=100, std_ratio = 0.04, crop_x_ratio = 0.4, crop_y_ratio = 0.2):
    ''' The crop parameters only speed up the workflow, since we know that the
    scoreboard is located in the upper-left corner. The reader can test
    this function with both crop ratios = 1, which takes around 30 secs using CPU'''
    cap = cv2.VideoCapture(filepath)

    # Randomly select n sample frames (the frame count is returned as a float)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    sample_frames_index = [np.random.randint(0, n_frames) for i in range(n_samples)]

    # Store selected frames in an array
    sample_frames = []
    for sfi in sample_frames_index:
        cap.set(cv2.CAP_PROP_POS_FRAMES, sfi)
        _, frame = cap.read()
        if frame is not None:
            sample_frames.append(frame[0:int(crop_y_ratio * frame.shape[0]),
                                0:int(crop_x_ratio * frame.shape[1])])

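    # All sample frames have been read, so the capture can be released
    cap.release()

    # The per-pixel std identifies static pixels (low variation over time);
    # the median gives a clean scoreboard even if it is corrupted in some frames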
    std_frames = np.std(sample_frames, axis=0).astype(dtype=np.uint8)
    median_frames = np.median(sample_frames, axis=0).astype(dtype=np.uint8)

    # get mean over color channels
    std_frame_mean = np.mean(std_frames/255, axis=2)
    std_frame_mean_3D = np.repeat(std_frame_mean[:,:,np.newaxis], 3, axis=2)

    # filter static pixels
    background = np.where(std_frame_mean_3D < std_ratio, median_frames, 0)

    return background

def xywh_from_points_with_scale(points_2d, scale=1.2):
    ''' Compute center_x, center_y, width and height given N 2d points '''
    x_min = np.min(points_2d[:,0], axis=0)
    x_max = np.max(points_2d[:,0], axis=0)
    y_min = np.min(points_2d[:,1], axis=0)
    y_max = np.max(points_2d[:,1], axis=0)

    return [(x_max+x_min)/2,
            (y_max+y_min)/2,
            (x_max-x_min) * scale,
            (y_max-y_min) * scale]

def crop_image_given_xywh(image, xywh):
    ''' Crop the input image based on given bounding box coordinates in xywh format '''
    x_min = int(xywh[0] - xywh[2]/2)
    x_max = int(xywh[0] + xywh[2]/2)
    y_min = int(xywh[1] - xywh[3]/2)
    y_max = int(xywh[1] + xywh[3]/2)

    return image[y_min:y_max, x_min:x_max]

def detect_team_scoreboard_and_crop_image(background):
    ''' Detect team initials on a scoreboard image and crop the image around the detected initials '''
    img = copy.deepcopy(background)

    # Initialize the text detector
    reader = easyocr.Reader(['en'], gpu=True)

    # Detect text on the image
    text_ = reader.readtext(img)

    # Set threshold for text detection confidence
    threshold = 0.25
    initials = []

    # Iterate over detected text and keep the boxes whose text looks like team initials
    # (exactly three characters, none of them a digit)
    for bbox, text, score in text_:
        if score > threshold and len(text) == 3 and not any(char.isdigit() for char in text):
            initials.append(bbox)

    # Convert bounding box points to xywh format and crop image based on the detected bounding boxes
    initials_np = np.array(initials)
    initials_np = initials_np.reshape(initials_np.shape[0] * initials_np.shape[1], initials_np.shape[2])
    xywh = xywh_from_points_with_scale(initials_np)
    crop_img = crop_image_given_xywh(img, xywh)

    return crop_img


In [None]:
def split_scoreboard_per_team(scoreboard):
    ''' Split image in 2 by x = w//2'''
    img_width = scoreboard.shape[1]
    return scoreboard[:,:img_width//2,:], scoreboard[:,img_width//2:,:]

### Color extraction
The second step is to obtain the colors of the scoreboard (the left half belongs to the home team and the right half to the away team).

To do this, we use the k-means clustering algorithm to keep only the main colors of the scoreboard. This **quantization** groups similar colors together and makes the subsequent comparison easier. The approach takes advantage of the fact that the two teams in a match wear colors that are easy to tell apart.

Once this quantization is done, we divide the scoreboard in half and compare the **frequency** of each color on each side to identify which colors are the most distinctive on each side.
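
The frequency comparison can be illustrated with plain Python counters. The pixel lists below are hypothetical quantized colors for each half of a scoreboard, and `most_distinctive` is a simplified helper written for this example: unlike the pandas version in this notebook, it does not relax the ratio when no color qualifies.

```python
from collections import Counter

# Hypothetical quantized pixels (BGR tuples) for each half of a scoreboard
home_half = [(0, 0, 200)] * 40 + [(255, 255, 255)] * 30 + [(20, 20, 20)] * 30
away_half = [(200, 120, 0)] * 45 + [(255, 255, 255)] * 28 + [(20, 20, 20)] * 27

def most_distinctive(own, other, ratio=10):
    """Return the most frequent color in `own` that is more than `ratio` times
    as frequent as in `other` (absent colors count as 1 to avoid dividing by zero)."""
    own_freq, other_freq = Counter(own), Counter(other)
    candidates = {c: n for c, n in own_freq.items() if n / other_freq.get(c, 1) > ratio}
    return max(candidates, key=candidates.get) if candidates else None

home_color = most_distinctive(home_half, away_half)
away_color = most_distinctive(away_half, home_half)
print(home_color, away_color)
```

White and dark gray appear on both sides with similar counts, so they are rejected, and each team keeps the color that is frequent only on its own half.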

In [None]:
def quantize_img(img, K=32):
    ''' This function quantizes the input image with the k-means algorithm,
        reducing it to its K most predominant colors'''
    # Preprocess input img
    Z = img.reshape((-1,3))
    Z = np.float32(Z)

    # Specify stopping criteria, max_iters and desired-accuracy
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    #(samples,nclusters,None,criteria,attempts,flags)
    ret,label,center=cv2.kmeans(Z,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)

    # Postprocess back img to original structure
    center = np.uint8(center)
    res = center[label.flatten()]
    quantized_img = res.reshape((img.shape))

    return quantized_img

In [None]:
def display_colors(colors):
    # Create a blank (black) image
    bar = np.zeros((50, 300, 3), dtype=np.uint8)
    startX = 0

    # For each dominant color, draw a rectangle on the blank image
    for color in colors:
        endX = startX + (300 // len(colors))
        cv2.rectangle(bar, (int(startX), 0), (int(endX), 50), color.astype(int).tolist(), -1)
        startX = endX

    # Display the image
    cv2_imshow(bar)

In [None]:
def get_color_frequencies(img):
    ''' Get color frequencies and return pd dataframe '''

    img_nx3 = np.float32(img.reshape(-1, 3))

    unique_pixels, counts = np.unique(img_nx3, axis=0, return_counts=True)

    color_dict = {'color': [tuple(color) for color in unique_pixels],
                  'frequency': counts}

    return pd.DataFrame(color_dict)

def get_most_distinctive_color(color_freq_home, color_freq_away, ratio=10):
    ''' Given the color frequency per team, get the most distinctive color per team '''

    df_merged = pd.merge(color_freq_home, color_freq_away, on='color', suffixes=('_df1', '_df2'), how='outer')
    # A color missing on one side yields NaN; it is filled with 1 so that the ratio
    # below stays finite. This is an approximation, but it works well in our results
    df_merged['frequency_df1'] = df_merged['frequency_df1'].fillna(1)
    df_merged['frequency_df2'] = df_merged['frequency_df2'].fillna(1)

    # Compute the ratio between the frequencies in both dataframes
    df_merged['frequency_difference'] = abs(df_merged['frequency_df1'] / df_merged['frequency_df2'])

    # A color is considered distinctive if the ratio is bigger than `ratio` or
    # lower than 1/`ratio` (meaning that it is much more frequent on one side).
    # If no such color exists for both teams, the ratio is relaxed and we retry

    while True:
        outstanding_differences = df_merged[
            (df_merged['frequency_difference'] > ratio) |
            (df_merged['frequency_difference'] < 1/ratio)
        ]
        # There has to be at least one color per team
        if (outstanding_differences['frequency_difference'] > 1).any() and (outstanding_differences['frequency_difference'] < 1).any():
            break
        ratio /= 1.2

    # Retrieve the rows with the most frequency for each df from these differences
    max_frequency_df1 = outstanding_differences.loc[outstanding_differences['frequency_df1'].idxmax()]
    max_frequency_df2 = outstanding_differences.loc[outstanding_differences['frequency_df2'].idxmax()]

    return max_frequency_df1['color'], max_frequency_df2['color']


In [None]:
def get_color_per_team_from_video(filepath, kmeans_nclusters=32, color_freq_ratio=10, debug_color = False):
    ''' Workflow which takes a video as input and return the color assigned to home and away team '''
    # Extract scoreboard from video
    background = get_static_pixels_from_video(filepath)
    scoreboard = detect_team_scoreboard_and_crop_image(background)

    # Quantize scoreboard and split in home-away teams
    quantized_scoreboard = quantize_img(scoreboard, kmeans_nclusters)
    quantized_home_scoreboard, quantized_away_scoreboard = split_scoreboard_per_team(quantized_scoreboard)

    # Get color frequency per team
    color_frequencies_home_scoreboard = get_color_frequencies(quantized_home_scoreboard)
    color_frequencies_away_scoreboard = get_color_frequencies(quantized_away_scoreboard)

    # Get the most used color of each team that is rarely used by the other team
    home_color, away_color = get_most_distinctive_color(color_frequencies_home_scoreboard, color_frequencies_away_scoreboard, color_freq_ratio)

    if debug_color:
        print('Static pixels in sample:')
        cv2_imshow(background)
        print('Detect scoreboard')
        cv2_imshow(scoreboard)
        print('Quantize scoreboard per team')
        cv2_imshow(quantized_home_scoreboard)
        cv2_imshow(quantized_away_scoreboard)
        print('Most distinctive color per team')
        display_colors([np.array(home_color), np.array(away_color)])

    return home_color, away_color

## Color filtering based on team jerseys
Once the home and away colors have been obtained from the scoreboard, we use the OpenCV *cv2.inRange* method to detect the jersey pixels that fall within a **range** around each team color.

This process is performed in the **HSV color space**, which separates colors better and exposes hue, saturation and brightness as separate channels, all of which matter for color detection.
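
One detail of working in HSV is that hue is circular, so a range around a reddish color can wrap past zero and has to be split into two intervals. The sketch below shows this with the standard-library `colorsys` module and OpenCV's convention of storing hue in 0..180. The helper names `hue_intervals` and `hue_in_intervals` are hypothetical, and `round` is used here where the notebook function uses `int`.

```python
import colorsys

def hue_intervals(bgr, h_range=20):
    """Return the hue interval(s), on OpenCV's 0..180 scale, around a BGR color."""
    b, g, r = (c / 255.0 for c in bgr)
    h, _, _ = colorsys.rgb_to_hsv(r, g, b)
    h = round(180 * h)  # OpenCV stores hue as degrees / 2
    low, top = h - h_range, h + h_range
    if low < 0:
        # The range wraps below 0: add the tail at the top of the scale
        return [(0, top), (low + 180, 180)]
    if top > 180:
        # The range wraps above 180: add the tail at the bottom of the scale
        return [(low, 180), (0, top - 180)]
    return [(low, top)]

def hue_in_intervals(hue, intervals):
    """True if `hue` falls inside any of the (low, high) intervals."""
    return any(lo <= hue <= hi for lo, hi in intervals)

red_intervals = hue_intervals((0, 0, 255))   # pure red in BGR
blue_intervals = hue_intervals((255, 0, 0))  # pure blue in BGR
print(red_intervals, blue_intervals)
```

A hue of 170 is accepted as red even though it is numerically far from 0, which is why a second mask is needed for such colors.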

In [None]:
def get_color_range(color, h_range=20, s_range=90, v_range=90):
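    ''' Calculate a range of colors in the HSV color space around a given BGR color.
        Only h_range is applied; saturation and value use fixed bounds (10 to 255),
        so s_range and v_range are currently unused '''
    # color is in BGR order, while colorsys expects (r, g, b)
    h, s, v= colorsys.rgb_to_hsv(color[2], color[1], color[0])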

    # translate hsv into opencv space
    h = int(180*h)
    s = int(255*s)
    v = int(v)

    low_h=h-h_range
    top_h=h+h_range
    additionalMask = False

    # h coordinates are circular, we have to consider an additional mask in this case
    if low_h < 0:
        new_low_h = low_h + 180
        new_top_h = 180
        low_h = 0
        additionalMask = True
    elif top_h>180:
        new_top_h = top_h - 180
        new_low_h = 0
        top_h = 180
        additionalMask = True

    lower_bound = np.array([low_h, 10, 10])
    upper_bound = np.array([top_h, 255, 255])

    if additionalMask:
        additional_lower_bound = np.array([new_low_h, 10, 10])
        additional_upper_bound = np.array([new_top_h, 255, 255])
        return lower_bound, upper_bound, additional_lower_bound, additional_upper_bound

    return lower_bound, upper_bound

def get_mask(player_hsv, boundaries):
    ''' Filter player via HSV mask '''
    mask = cv2.inRange(player_hsv, boundaries[0], boundaries[1])

    # A second mask is needed when the hue range wraps around (e.g. for red)
    if len(boundaries)==4:
        mask1 = cv2.inRange(player_hsv, boundaries[2], boundaries[3])
        mask = cv2.bitwise_or(mask, mask1)

    return mask

def team_classification(player, config_color, debug=False):
    ''' Classify the team of a player based on the color distribution in the player image '''
    player_hsv = cv2.cvtColor(player,cv2.COLOR_BGR2HSV)

    # Mask and count for home team color
    mask_home_color = cv2.bitwise_and(player_hsv,player_hsv,mask=get_mask(player_hsv, config_color['home_color_range']))
    count_mask_home_color = np.count_nonzero(mask_home_color)

    # Mask and count for away team color
    mask_away_color = cv2.bitwise_and(player_hsv,player_hsv,mask=get_mask(player_hsv, config_color['away_color_range']))
    count_mask_away_color = np.count_nonzero(mask_away_color)

    # Calculate percentages of home and away team colors
    player_count = np.count_nonzero(player_hsv)
    home_percentage = count_mask_home_color / player_count
    away_percentage = count_mask_away_color / player_count

    if debug:
        cv2_imshow(player)
        cv2_imshow(player_hsv)

        print('Percentage of pixels home team : ', home_percentage)
        cv2_imshow(mask_home_color)

        print('Percentage of pixels away team  : ', away_percentage)
        cv2_imshow(mask_away_color)


    # Determine team classification based on color percentages
    if home_percentage > 0.01 and home_percentage > away_percentage:
        return 'Home', config_color['home_color']
    elif away_percentage>0.01 and away_percentage > home_percentage:
        return 'Away', config_color['away_color']
    else:
        return 'Not sure', (0.0, 0.0, 0.0)


In [None]:
def process_ball(ball_img, ball_bounding_boxes):
    '''Process the image of a ball and its bounding boxes'''
    if ball_bounding_boxes:
        # get most confident ball detected
        box = ball_bounding_boxes[ball_bounding_boxes.conf.argmax()]
        x, y, w, h = map(int, box.xywh.tolist()[0])
        ball_img = cv2.rectangle(ball_img, (x-w//2, y-h//2), (x+w//2,y+h//2), (255,255,0), 2)
        ball_img = cv2.putText(ball_img, "Ball",(x, y-h//2-10), cv2.FONT_HERSHEY_SIMPLEX, 0.50, (0, 0, 0), 2)
        return ball_img, (x, y, w, h)
    else:
        return ball_img, None

def process_persons(orig_img, bounding_boxes, config_color):
    ''' Postprocess detected persons. If it is a player, assign it to a team. If it is a referee, just label it. '''

    hsv = cv2.cvtColor(orig_img, cv2.COLOR_BGR2HSV)
    final_image = np.copy(orig_img)

    # Dictionary to count persons belonging to each team and referees
    count_persons = {'home': 0, 'away': 0, 'referees': 0}

    # Mask to remove green color (for field)
    green_mask = cv2.inRange(hsv, (35, 35, 35), (70, 255,255))
    inverted_green_mask= cv2.bitwise_not(green_mask)
    img_without_green = cv2.bitwise_and(orig_img, orig_img, mask=inverted_green_mask)

    for box in bounding_boxes:
        # Box position and dimensions
        x, y, w, h = map(int, box.xywh.tolist()[0])

        # If the box corresponds to a person (cls 1)
        if box.cls==1:
            # Crop image to only have the torso
            player_torso = img_without_green[y-h//2:y,x-w//2:x+w//2]

            # Classify the team of the person
            team_text, color_float = team_classification(player_torso, config_color)

            # Process image
            final_image = cv2.rectangle (final_image, (x-w//2, y-h//2), (x+w//2,y+h//2), (int(color_float[0]), int(color_float[1]), int(color_float[2])), 2)
            final_image = cv2.putText(final_image, team_text + ' ID:'+str(int(box.id.item())), (x-w//2, y-h//2-10), cv2.FONT_HERSHEY_SIMPLEX, 0.50, (0, 0, 0), 2)

            # Update person count based on team classification
            if team_text == 'Home':
                count_persons['home'] += 1
            elif team_text == 'Away':
                count_persons['away'] += 1

        # If the box corresponds to a referee (cls 2)
        elif box.cls==2:
            # Process image
            final_image = cv2.rectangle (final_image, (x-w//2, y-h//2), (x+w//2,y+h//2), (0,0,0), 2)
            final_image = cv2.putText(final_image,'Referee', (x-w//2, y-h//2-10), cv2.FONT_HERSHEY_SIMPLEX, 0.50, (0, 0, 0), 2)

            # Update referee count
            count_persons['referees'] += 1

    return final_image, count_persons

# JSONL
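
The output file holds one JSON object per line, with the keys built in `get_json_output_per_frame`. The sketch below shows what such lines look like; the counts and ball coordinates are made-up values, and it uses the standard-library `json` module with an in-memory buffer instead of the `jsonlines` package so that it runs without extra dependencies.

```python
import io
import json

# Hypothetical per-frame records (ball_loc is [x, y, w, h], or None if no ball was detected)
records = [
    {"frame": 0, "home_team": 9, "away_team": 10, "refs": 1, "ball_loc": [640, 360, 12, 12]},
    {"frame": 5, "home_team": 10, "away_team": 10, "refs": 1, "ball_loc": None},
]

# Write one JSON object per line
buffer = io.StringIO()
for record in records:
    buffer.write(json.dumps(record) + "\n")

# Read the lines back, one object at a time
lines = buffer.getvalue().splitlines()
parsed = [json.loads(line) for line in lines]
print(buffer.getvalue())
```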

In [None]:
def get_json_output_per_frame(frame, count_persons, ball_location, output_each_n_frames=5):
    ''' Ordered dictionary based on the output requirement '''
    return OrderedDict([('frame', frame), ('home_team', count_persons['home']), ('away_team', count_persons['away']), ('refs', count_persons['referees']), ('ball_loc', ball_location)])

In [None]:
def write_output_json(filename, json_output):
    ''' Creates jsonl based on a dictionary list '''
    with jsonlines.open(filename,'w') as writer:
        for elem in json_output:
            writer.write(elem)


## Players detection and classification
The last step is to run through the video clip frame by frame and use the **YOLO model** to obtain the player, referee and ball detections.

In [None]:
def match_detection_pipeline(config):
    ''' Perform object detection using YOLO model.
        Obtain team color by locating the scoreboard in the image.
        Assign a team to a player based on the color.
        Generate video and json output. '''

    time_init = time.time()

    # Init parameters
    json_output = []
    frame_index = 0
    model = YOLO(config['model_checkpoint_path'])

    # Get input video and create output video
    input_video = cv2.VideoCapture(config['input_video_path'])

    width = int(input_video.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(input_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(input_video.get(cv2.CAP_PROP_FPS))

    output_video = cv2.VideoWriter(config['output_video_path'], cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    # Get most distinctive color per team from input video
    home_color, away_color = get_color_per_team_from_video(config['input_video_path'], debug_color = config['debug_colors'])

    # Get color range to identify each player team
    home_color_range = get_color_range(home_color)
    away_color_range = get_color_range(away_color)

    config['color'] = {'home_color': home_color, 'home_color_range': home_color_range, 'away_color': away_color, 'away_color_range': away_color_range}

    time_preprocessing = time.time()
    time_list_inference = []
    time_list_postprocessing = []

    while input_video.isOpened():
        # Read each frame
        success, frame = input_video.read()

        time_loop_init = time.time()
        if success:
            # Persons are tracked with persist=True so that their identifiers do not change between frames
            results_persons = inference_video(model, frame, persist=True, classes=[config['class_player_index'], config['class_referee_index']])[0]
            # The ball is tracked with persist=False, since per-frame detection works better for it
            results_ball = inference_video(model, frame, persist=False, classes=[config['class_ball_index']])[0]

            time_loop_inference = time.time()
            # Post process detections
            processed_img, ball_coords = process_ball(results_ball.orig_img, results_ball.boxes)
            new_frame, count_persons = process_persons(processed_img, results_persons.boxes, config['color'])

            # Frame video output
            output_video.write(new_frame)

            # Frame JSON output
            if frame_index % config.get('output_each_n_frames', 5) == 0:
                json_output.append(get_json_output_per_frame(frame_index, count_persons, ball_coords))

            time_loop_postprocessing = time.time()
            time_list_inference.append(time_loop_inference - time_loop_init)
            time_list_postprocessing.append(time_loop_postprocessing - time_loop_inference)

            frame_index += 1

            #cv2_imshow(new_frame)
        else:
            break

    time_end = time.time()
    write_output_json(config['output_json_path'], json_output)

    input_video.release()
    output_video.release()

    computational_time = {}
    computational_time['preprocessing'] = round(time_preprocessing - time_init, 2)
    computational_time['inference'] = round(sum(time_list_inference), 2)
    computational_time['postprocessing'] = round(sum(time_list_postprocessing), 2)
    computational_time['inference_mean'] = round(sum(time_list_inference) / len(time_list_inference), 2)
    computational_time['postprocessing_mean'] = round(sum(time_list_postprocessing) / len(time_list_postprocessing), 2)
    computational_time['total'] = round(time_end - time_init, 2)

    return computational_time

# Pipeline

In [None]:
def main():
    config = {}
    config['debug_colors'] = False
    config['output_each_n_frames'] = 5
    config['class_ball_index'] = 0
    config['class_player_index'] = 1
    config['class_referee_index'] = 2

    config['model_checkpoint_path'] = '/content/football-task-delivery/yolov8n-1088p-motionblur-6xball3xreferee/weights/best.pt'

    for i in range(1,4):
        print(f'Starting clip_{i} ...')
        config['input_video_path'] = f'/content/football-task-delivery/clip_{i}.mp4'
        config['output_video_path'] = f'/content/football-task-delivery/output/clip_{i}_output.mp4'
        config['output_json_path'] = f'/content/football-task-delivery/output/clip_{i}_output.json'

        computational_time = match_detection_pipeline(config)

        print('\n')
        print(f'Total time for clip_{i}: ' + str(computational_time['total']) + ' seconds')
        print('Preprocessing time: ' + str(computational_time['preprocessing']) + ' seconds')
        print('Inference total time: ' + str(computational_time['inference']) + ' seconds')
        print('Inference mean time per frame: ' + str(computational_time['inference_mean']) + ' seconds')
        print('Postprocessing total time: ' + str(computational_time['postprocessing']) + ' seconds')
        print('Postprocessing mean time per frame: ' + str(computational_time['postprocessing_mean']) + ' seconds')
        print('\n')


In [None]:
main()

Starting clip_1 ...





Total time for clip_1: 84.21 seconds
Preprocessing time: 19.65 seconds
Inference total time: 46.91 seconds
Inference mean time per frame: 0.09 seconds
Postprocessing total time: 16.29 seconds
Postprocessing mean time per frame: 0.03 seconds


Starting clip_2 ...


Total time for clip_2: 44.07 seconds
Preprocessing time: 9.01 seconds
Inference total time: 24.85 seconds
Inference mean time per frame: 0.09 seconds
Postprocessing total time: 9.48 seconds
Postprocessing mean time per frame: 0.04 seconds


Starting clip_3 ...


Total time for clip_3: 48.66 seconds
Preprocessing time: 9.91 seconds
Inference total time: 27.98 seconds
Inference mean time per frame: 0.09 seconds
Postprocessing total time: 9.99 seconds
Postprocessing mean time per frame: 0.03 seconds




# Fine-tuning

In [None]:
training = False
model_checkpoint = 'yolov8n.pt'
model = YOLO(model_checkpoint)

Downloading https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8n.pt to 'yolov8n.pt'...




In [None]:
if training:
    # Paste the yaml dataset location in the path
    finetune_model(model, '/content/dataset/data.yaml', pos_weight=[6.0, 1.0, 3.0], imgsz=1088)