# Player Re-Identification in Sports Video using YOLOv11


## Goal

Given a 15-second sports video, detect all players and assign them unique IDs, ensuring each player keeps the same ID even if they leave and re-enter the frame.

## Step-by-Step Process

### Step 1: Load the Video and Object Detection Model

What to do:
- Load the input video (e.g., `15sec_input_720p.mp4`) using OpenCV.
- Load the pretrained YOLOv11 model to detect players.

Explanation:
Read the video frame by frame, run the YOLO model on each frame to detect players, and draw a bounding box around each detection.

### Step 2: Detect Players in Each Frame

What to do:
- For every frame:
  - Run the YOLO model.
  - Extract bounding box coordinates `[x1, y1, x2, y2]` (top-left and bottom-right corners)
  - Extract class label (class ID 2 for player)

Explanation:
Identify player locations in each frame using the output of the object detector.
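
The detector output used later in this notebook stores each box as corner coordinates `[x1, y1, x2, y2]`, while the Deep SORT cell passes boxes as `[x, y, width, height]`. A minimal sketch of converting between the two formats follows; the helper names `xyxy_to_xywh` and `xywh_to_xyxy` are illustrative and not part of any library.

```python
def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] (corners) to [x, y, w, h] (top-left corner plus size)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def xywh_to_xyxy(box):
    """Convert [x, y, w, h] (top-left corner plus size) back to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


# Example box in corner format (made-up pixel values)
corner_box = [100, 50, 180, 210]
size_box = xyxy_to_xywh(corner_box)
round_trip = xywh_to_xyxy(size_box)
print(size_box, round_trip)
```

Keeping the conversion in one place makes it harder to feed a tracker the wrong format, which typically shows up as boxes that are far too large or drift off the players.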

### Step 3: Initialize a Player Tracker

What to use:
- Use a tracker to maintain consistent IDs across frames:
  - SORT (Simple Online and Realtime Tracking) - fast, but IDs may change on re-entry
  - Deep SORT - uses appearance features, so IDs are more stable on re-entry

### Importing the Libraries

In [17]:
import cv2
from ultralytics import YOLO

### Loading the YOLO Model

In [18]:
model = YOLO("best.pt")

### Step 1

In [19]:
# loading the given video
video_path = "15sec_input_720p.mp4"
capture = cv2.VideoCapture(video_path)

In [20]:
# Check if video loaded correctly
if not capture.isOpened():
    print("Could not open video.")
else:
    print("Video loaded successfully!")

Video loaded successfully!


In [5]:
# checking the assigned numbers
model.names

{0: 'ball', 1: 'goalkeeper', 2: 'player', 3: 'referee'}

### Step 2
#### Checking that players are detected and writing the annotated frames to output_video.mp4

In [None]:
## Set up the output video writer (same frame rate and size as the input)
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
fps = int(capture.get(cv2.CAP_PROP_FPS))

width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter("./output_video.mp4",fourcc,fps,(width,height))

while True:
    ## capture.read() returns a success flag and the next frame
    ret,frame = capture.read()

    ## Break out of the while loop when all frames are processed
    if not ret:
        break

    ## Running the YOLO model detection on the frame
    results = model(frame)

    ## Extracting the detections from the results
    detections = results[0].boxes.data.cpu().numpy()

    ## looping over the detection to get the coordinates , score and class id
    for det in detections:
        x1,y1,x2,y2,score,class_id = det
        
        ## player class is 2
        if int(class_id)==2:
            ## Draw the bounding box around
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
            cv2.putText(frame, "Player", (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX,
                        0.6, (0, 255, 0), 2)
            
    out.write(frame)

## Release the video reader and writer once all frames are processed
capture.release()
out.release()
cv2.destroyAllWindows() 


0: 384x640 1 ball, 16 players, 2 referees, 951.6ms
Speed: 9.4ms preprocess, 951.6ms inference, 2.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 18 players, 2 referees, 935.1ms
Speed: 2.0ms preprocess, 935.1ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 16 players, 2 referees, 832.8ms
Speed: 1.9ms preprocess, 832.8ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 14 players, 2 referees, 829.9ms
Speed: 2.0ms preprocess, 829.9ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 14 players, 2 referees, 893.3ms
Speed: 2.3ms preprocess, 893.3ms inference, 1.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 16 players, 2 referees, 754.3ms
Speed: 1.6ms preprocess, 754.3ms inference, 1.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 15 players, 2 referees, 763.1ms
Speed: 1.8ms preprocess, 763.1ms inference, 0.9ms postprocess pe

### Step 3
### Initializing the player tracker using SORT

SORT (Simple Online and Realtime Tracking) is a fast object tracking algorithm. It uses two main techniques:
- Kalman Filter - to predict where each object will be in the next frame.
- Hungarian Algorithm - to match new detections with the predicted locations.
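
To make the prediction idea concrete, here is a minimal sketch of a constant-velocity Kalman filter for a single coordinate. It is a simplification: SORT's own filter tracks the box centre, scale and aspect ratio together. The class name `ConstantVelocityKalman1D` and the noise values are assumptions chosen for illustration.

```python
class ConstantVelocityKalman1D:
    """Tracks one coordinate with state [position, velocity]."""

    def __init__(self, position, process_var=0.01, measurement_var=1.0):
        self.p = float(position)   # estimated position
        self.v = 0.0               # estimated velocity (unknown at the start)
        # Covariance: position is roughly known, velocity is very uncertain
        self.P = [[1.0, 0.0], [0.0, 1000.0]]
        self.q = process_var       # process noise added at every prediction
        self.r = measurement_var   # measurement noise of the detector

    def predict(self, dt=1.0):
        """Move the state forward by dt frames: P <- F P F^T + Q."""
        self.p += self.v * dt
        (p00, p01), (p10, p11) = self.P
        self.P = [
            [p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q],
        ]
        return self.p

    def update(self, measured_position):
        """Correct the state with a new position measurement."""
        (p00, p01), (p10, p11) = self.P
        residual = measured_position - self.p
        s = p00 + self.r                 # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain for position and velocity
        self.p += k0 * residual
        self.v += k1 * residual
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]


# A player moving 2 pixels per frame, starting at x = 10 (made-up data)
kf = ConstantVelocityKalman1D(position=10.0)
for k in range(1, 20):
    kf.predict()
    kf.update(10.0 + 2.0 * k)

next_position = kf.predict()  # where the filter expects the player in frame 20
print(round(next_position, 2), round(kf.v, 2))
```

After a few frames the filter learns the velocity from position measurements alone, so it can place the expected box even in a frame where the detector misses the player.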

### How SORT Works

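1. **Detection:** In each frame, objects are detected (e.g., using YOLO).
2. **Prediction:** The Kalman Filter predicts the new positions of tracked objects.
3. **Association:** The Hungarian Algorithm matches each detection to a predicted position, using the overlap (IoU) between the boxes as the matching cost.
4. **Update:** If a match is found, the track's position is updated with the matched detection. Unmatched detections start new tracks with new IDs, which is why IDs may change on re-entry.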
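
The matching between predicted boxes and new detections can be sketched as follows. This is a simplified illustration: it pairs boxes greedily by highest IoU rather than solving the assignment optimally with the Hungarian Algorithm as SORT does. The function names and the `iou_threshold` of 0.3 are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predicted, detections, iou_threshold=0.3):
    """Pair tracks with detections, best overlap first.

    predicted:  {track_id: box} from the prediction step
    detections: list of boxes from the detector
    Returns ({track_id: detection_index}, [unmatched detection indices]).
    """
    pairs = sorted(
        ((iou(box, det), track_id, det_idx)
         for track_id, box in predicted.items()
         for det_idx, det in enumerate(detections)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    matches, used = {}, set()
    for score, track_id, det_idx in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same object
        if track_id in matches or det_idx in used:
            continue
        matches[track_id] = det_idx
        used.add(det_idx)
    unmatched = [i for i in range(len(detections)) if i not in used]
    return matches, unmatched


# Two existing tracks and three detections (made-up coordinates)
predicted = {1: [0, 0, 10, 10], 2: [20, 20, 30, 30]}
detections = [[21, 21, 31, 31], [1, 0, 11, 10], [50, 50, 60, 60]]
matches, unmatched = greedy_match(predicted, detections)
print(matches, unmatched)
```

The unmatched detection would start a new track with a new ID. Because the matching relies only on box overlap, a player who re-enters far from the last known position cannot be linked to the old track.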

In [None]:
from sort import Sort
import numpy as np
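
## Re-open the video, because the previous cell released the capture
capture = cv2.VideoCapture(video_path)

## Set up the output video writer (same frame rate and size as the input)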
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
fps = int(capture.get(cv2.CAP_PROP_FPS))

width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter("./tracker_video.mp4",fourcc,fps,(width,height))
tracker = Sort()

while True:
    ## capture.read() returns a success flag and the next frame
    ret,frame = capture.read()

    ## Break out of the while loop when all frames are processed
    if not ret:
        break

    ## Running the YOLO model detection on the frame
    results = model(frame)

    ## Extracting the detections from the results
    detections = results[0].boxes.data.cpu().numpy()

    player_detection = []

    ## looping over the detection to get the coordinates , score and class id
    for det in detections:
        x1,y1,x2,y2,score,class_id = det
        
        ## player class is 2
        if int(class_id)==2:
            ## append values into player_detection array
            player_detection.append([x1,y1,x2,y2,score])

    ## checking whether the array is empty or not
    if len(player_detection)>0:
        
        player_detection = np.array(player_detection)

        tracked_players = tracker.update(player_detection)
                
        for track in tracked_players:
            x1, y1, x2, y2, track_id = track
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (255, 0, 0), 2)
            cv2.putText(frame, f"Player {int(track_id)}", (int(x1), int(y1) - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 0), 2)
    
    ## With no player detections, still call update with an empty (0, 5) array so the tracker sees every frame
    else:
        tracker.update(np.empty((0,5)))
        
    out.write(frame)

    
capture.release()
out.release()


0: 384x640 1 ball, 16 players, 2 referees, 752.6ms
Speed: 2.1ms preprocess, 752.6ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 18 players, 2 referees, 681.1ms
Speed: 1.4ms preprocess, 681.1ms inference, 1.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 16 players, 2 referees, 678.4ms
Speed: 1.8ms preprocess, 678.4ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 14 players, 2 referees, 680.3ms
Speed: 1.6ms preprocess, 680.3ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 14 players, 2 referees, 685.4ms
Speed: 1.2ms preprocess, 685.4ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 16 players, 2 referees, 676.2ms
Speed: 1.3ms preprocess, 676.2ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 15 players, 2 referees, 728.4ms
Speed: 1.2ms preprocess, 728.4ms inference, 0.9ms postprocess pe

### Initializing the player tracker using Deep SORT

DeepSORT is a computer vision tracking algorithm that assigns a unique ID to each tracked object. It extends SORT by adding a deep-learning appearance descriptor to the matching step, which reduces identity switches and keeps IDs more stable when a player is occluded or re-enters the frame.
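
The role of the appearance descriptor can be illustrated with a small sketch. It assumes each player crop has already been turned into an embedding vector by some network (the three-dimensional vectors below are made up), and it compares embeddings with cosine distance. The names `reidentify` and `gallery` and the `max_distance` threshold are hypothetical and not taken from the DeepSORT library.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means identical direction, larger means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero embedding as matching nothing
    return 1.0 - dot / (norm_a * norm_b)


def reidentify(embedding, gallery, max_distance=0.2):
    """Return the ID whose stored embedding is closest, or None if none is close enough."""
    best_id, best_dist = None, max_distance
    for player_id, stored in gallery.items():
        dist = cosine_distance(embedding, stored)
        if dist < best_dist:
            best_id, best_dist = player_id, dist
    return best_id


# Stored appearance of two known players (made-up embeddings)
gallery = {7: [0.9, 0.1, 0.0], 12: [0.0, 0.2, 0.95]}

returning_player = [0.85, 0.15, 0.05]  # looks like player 7
new_player = [0.5, 0.5, 0.5]           # looks like neither stored player

print(reidentify(returning_player, gallery), reidentify(new_player, gallery))
```

Because the comparison depends on how a player looks rather than where the box is, a player who leaves and re-enters the frame can be matched back to the earlier ID as long as the stored track still exists.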

In [None]:
## import the Deepsort library
from deep_sort_realtime.deepsort_tracker import DeepSort
import numpy as np


## Re-open the video, because the previous cell released the capture
capture = cv2.VideoCapture(video_path)

## The fourcc code tells OpenCV which codec to use when compressing the output video
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
fps = int(capture.get(cv2.CAP_PROP_FPS))

## Get the width and height of the original video so the output video has the same dimensions
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))

out = cv2.VideoWriter("./deepsort_video.mp4",fourcc,fps,(width,height))

# Initialize Deep SORT tracker with a max_age of 30 frames (how long it keeps a lost object before deleting)

tracker = DeepSort(max_age=30)


while True:

    ## this returns the boolean and the frame 
    ret, frame = capture.read()

    ## ret is False once the video ends and no frames are left, so stop the loop
    if not ret:
        break

    ## give the frames one by one to model to detect the players
    results = model(frame)
    detections = results[0].boxes.data.cpu().numpy()

    ## initialize an array to store the coordinates of the box to track the players
    player_detection = []

    for det in detections:
        x1, y1, x2, y2, score, class_id = det
        
        ## player class is 2, as shown earlier by model.names
        if int(class_id) == 2:  # player class
            x, y, w, h = x1, y1, x2 - x1, y2 - y1
            player_detection.append(([x, y, w, h], float(score), "player"))

    # Update DeepSort with player detections
    tracks = tracker.update_tracks(player_detection, frame=frame)

    # Loop through the tracked objects returned by deep sort
    for track in tracks:
        if not track.is_confirmed():
            continue

        track_id = track.track_id # Unique ID assigned to the player
        ltrb = track.to_ltrb()  # Get bounding box in [x1, y1, x2, y2] format
        x1, y1, x2, y2 = map(int, ltrb)

        # Draw bounding box and player ID on the frame
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"Player {track_id}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 255, 0), 2)

    # Write the annotated frame to the output video
    out.write(frame)


## Release the resources once processing is done
capture.release()
out.release()


0: 384x640 1 ball, 16 players, 2 referees, 742.5ms
Speed: 8.8ms preprocess, 742.5ms inference, 12.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 18 players, 2 referees, 730.9ms
Speed: 2.4ms preprocess, 730.9ms inference, 3.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 16 players, 2 referees, 721.8ms
Speed: 2.6ms preprocess, 721.8ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 14 players, 2 referees, 799.2ms
Speed: 2.1ms preprocess, 799.2ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 14 players, 2 referees, 821.3ms
Speed: 4.6ms preprocess, 821.3ms inference, 2.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 ball, 16 players, 2 referees, 740.3ms
Speed: 2.1ms preprocess, 740.3ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 15 players, 2 referees, 745.3ms
Speed: 2.0ms preprocess, 745.3ms inference, 0.9ms postprocess p
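
The `max_age=30` setting passed to the Deep SORT tracker decides how many consecutive frames a track may go unmatched before it is dropped. A minimal sketch of that bookkeeping, using a hypothetical `TrackAge` class rather than the library's own track object:

```python
class TrackAge:
    """Counts consecutive frames in which a track found no matching detection."""

    def __init__(self, track_id, max_age=30):
        self.track_id = track_id
        self.max_age = max_age
        self.missed_frames = 0

    def mark_matched(self):
        # A matching detection was found, so the track is fresh again
        self.missed_frames = 0

    def mark_missed(self):
        # No detection matched this track in the current frame
        self.missed_frames += 1

    def is_expired(self):
        # The track is deleted once it has been missing for more than max_age frames
        return self.missed_frames > self.max_age


track = TrackAge(track_id=5, max_age=30)

for _ in range(30):
    track.mark_missed()
alive_after_30 = not track.is_expired()  # still kept, so the ID can be recovered

track.mark_missed()
expired_after_31 = track.is_expired()    # now dropped; a re-entry gets a new ID

print(alive_after_30, expired_after_31)
```

A larger `max_age` gives players more time to re-enter the frame and keep their ID, at the cost of keeping stale tracks around that may be matched to the wrong player.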