In this notebook I share one idea for merging tracking data with sideline helmet label information.

The main approach: if we can find four specific pairs of points `(cx, cy)` that match between the `tracking images` and the `sideline or endzone images`, we can compute a homography `H` for the perspective transformation.

In this notebook I'll use the field line numbers to find the homography `H` between the `tracking images` and the `sideline images`.
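
For readers new to homographies, here is a minimal NumPy sketch of what applying `H` to a point involves: lift the point to homogeneous coordinates, multiply by the 3x3 matrix, then divide by the last component. The helper name `apply_homography` and the example matrix are my own illustrative assumptions, not part of the competition code; `cv2.perspectiveTransform` performs the same computation.

```python
import numpy as np


def apply_homography(H, points):
    """Map an (N, 2) array of points through a 3x3 homography H."""
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    ones = np.ones((pts.shape[0], 1))
    homogeneous = np.hstack([pts, ones])                      # (N, 3): [x, y, 1]
    mapped = homogeneous @ np.asarray(H, dtype=np.float64).T  # (N, 3): [x', y', w]
    return mapped[:, :2] / mapped[:, 2:3]                     # divide by w


# Hypothetical matrix: scale by 0.1 and shift by (5, 2), no perspective term
H_example = np.array([[0.1, 0.0, 5.0],
                      [0.0, 0.1, 2.0],
                      [0.0, 0.0, 1.0]])
example_out = apply_homography(H_example, [[100.0, 200.0], [0.0, 0.0]])
print(example_out)  # rows are approximately (15, 22) and (5, 2)
```

A real camera-to-field homography also has non-zero entries in the last row, which is what produces the perspective effect.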

References:
- https://www.kaggle.com/robikscube/nfl-helmet-assignment-getting-started-guide
- https://www.kaggle.com/c/nfl-health-and-safety-helmet-assignment/discussion/264361#1467283
- https://www.kaggle.com/coldfir3/camera-tracking-matching-with-gradient-descent
- https://www.kaggle.com/go5kuramubon/merge-label-and-tracking-data

In [None]:
!pip install imageio-ffmpeg

In [None]:
import os
import cv2
import imageio
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm

# Prepare

In [None]:
## https://www.kaggle.com/go5kuramubon/merge-label-and-tracking-data

# Read in data files
BASE_DIR = '../input/nfl-health-and-safety-helmet-assignment'

# Labels and sample submission
labels = pd.read_csv(f'{BASE_DIR}/train_labels.csv')
ss = pd.read_csv(f'{BASE_DIR}/sample_submission.csv')

# Player tracking data
tr_tracking = pd.read_csv(f'{BASE_DIR}/train_player_tracking.csv')
te_tracking = pd.read_csv(f'{BASE_DIR}/test_player_tracking.csv')

# Baseline helmet detection labels
tr_helmets = pd.read_csv(f'{BASE_DIR}/train_baseline_helmets.csv')
te_helmets = pd.read_csv(f'{BASE_DIR}/test_baseline_helmets.csv')

# Extra image labels
img_labels = pd.read_csv(f'{BASE_DIR}/image_labels.csv')

In [None]:
##https://www.kaggle.com/robikscube/nfl-helmet-assignment-getting-started-guide

def add_track_features(tracks, fps=59.94, snap_frame=10):
    """
    Add column features helpful for syncing with video data.
    """
    tracks = tracks.copy()
    tracks["game_play"] = (
        tracks["gameKey"].astype("str")
        + "_"
        + tracks["playID"].astype("str").str.zfill(6)
    )
    tracks["time"] = pd.to_datetime(tracks["time"])
    snap_dict = (
        tracks.query('event == "ball_snap"')
        .groupby("game_play")["time"]
        .first()
        .to_dict()
    )
    tracks["snap"] = tracks["game_play"].map(snap_dict)
    tracks["isSnap"] = tracks["snap"] == tracks["time"]
    tracks["team"] = tracks["player"].str[0].replace("H", "Home").replace("V", "Away")
    # Seconds relative to the snap (negative before the snap)
    tracks["snap_offset"] = (tracks["time"] - tracks["snap"]).dt.total_seconds()
    # Estimated video frame
    tracks["est_frame"] = (
        ((tracks["snap_offset"] * fps) + snap_frame).round().astype("int")
    )
    return tracks


tr_tracking = add_track_features(tr_tracking)
te_tracking = add_track_features(te_tracking)
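
To make the `est_frame` formula concrete, here is a small worked sketch using the same defaults as `add_track_features` (`fps=59.94`, `snap_frame=10`). The helper name `estimate_frame` is hypothetical and only mirrors the arithmetic above for a single value.

```python
def estimate_frame(snap_offset_seconds, fps=59.94, snap_frame=10):
    """Estimated video frame for a tracking sample, given seconds from the snap."""
    return int(round(snap_offset_seconds * fps + snap_frame))


print(estimate_frame(0.0))   # 10: the snap itself is assumed to sit at frame 10
print(estimate_frame(0.5))   # 40: 0.5 * 59.94 + 10 = 39.97
print(estimate_frame(-0.1))  # 4:  -0.1 * 59.94 + 10 = 4.006
```

If tracking samples are 0.1 s apart, consecutive samples land about 6 video frames apart, which is why the merge below matches each label frame to the nearest `est_frame`.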


In [None]:
def merge_label_and_tracking(tracking_df, label_df):

    tracking_with_game_index = tracking_df.set_index(["gameKey", "playID", "player"])

    df_list = []

    for key, _label_df in tqdm(label_df.groupby(["gameKey", "playID", "view", "label"])):
        # Skip H00/V00: these labels mark sideline players
        if key[3] == "H00" or key[3] == "V00":
            continue

        tracking_data = tracking_with_game_index.loc[(key[0], key[1], key[3])]
        _label_df = _label_df.sort_values("frame")

        # merge with frame and est_frame
        merged_df = pd.merge_asof(
            _label_df,
            tracking_data,
            left_on="frame",
            right_on="est_frame",
            direction='nearest',
        )
        df_list.append(merged_df)

    all_merged_df = pd.concat(df_list)
    all_merged_df = all_merged_df.sort_values(["video_frame", "label"], ignore_index=True)
    
    return all_merged_df

In [None]:
merged_df = merge_label_and_tracking(tr_tracking, labels)

In [None]:
unique_gameKeys = merged_df.gameKey.unique()
check_frame = 1
homography_df = merged_df[
    (merged_df.gameKey == unique_gameKeys[0])
    & (merged_df.frame == check_frame)
    & (merged_df.view == 'Sideline')
].copy()
homography_df.head()

## Perspective Transform

If we know matched keypoints in the two images, we can find the homography `H` using `cv2.findHomography`.

The code below shows how to transform the sideline helmet boxes to the tracking data scale.

In [None]:
# Tracking positions with y flipped (53.33 yd is the field width)
tracking_coordinate = np.float32(list(zip(homography_df['x'], 53.33 - homography_df['y']))).reshape(-1, 1, 2)
# Helmet box centers in pixels; image y grows downward, so the center is top + height / 2
label_coordinate = np.float32(list(zip(homography_df['left'] + homography_df['width'] / 2,
                                       homography_df['top'] + homography_df['height'] / 2))).reshape(-1, 1, 2)

In [None]:
H, mask = cv2.findHomography(label_coordinate, tracking_coordinate)
transformed_coordinate = cv2.perspectiveTransform(label_coordinate, H)

In [None]:
print(H)

In [None]:
plt.figure(figsize=(12,10))

plt.scatter(transformed_coordinate[:, :, 0],transformed_coordinate[:, :, 1])

plt.scatter(homography_df['x'], 53.33-homography_df['y'])

plt.legend(['Transformed coordinate from Sideline helmet box','Ground truth tracking data'])

The important thing is that the transformed points don't match the tracking points exactly, because we don't have enough information to find a reliable homography this way.

If we can match specific point pairs between the tracking image and the sideline or endzone image, we might find a better homography.

I'll use the field line numbers to match the two images.
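
To put a number on the mismatch visible in the plot above, one option is the per-point distance between the transformed helmet centers and the tracking positions. This is only a sketch; the helper name `reprojection_error` is my own assumption and is not used elsewhere in the notebook.

```python
import numpy as np


def reprojection_error(transformed, target):
    """Mean and max Euclidean distance between two sets of matched 2D points."""
    t = np.asarray(transformed, dtype=np.float64).reshape(-1, 2)
    g = np.asarray(target, dtype=np.float64).reshape(-1, 2)
    distances = np.linalg.norm(t - g, axis=1)
    return distances.mean(), distances.max()


# Toy check: one exact match and one point that is 5 units off (3-4-5 triangle)
mean_err, max_err = reprojection_error([[0, 0], [3, 4]], [[0, 0], [0, 0]])
print(mean_err, max_err)  # 2.5 5.0

# With the notebook variables this would look like (distances are in yards):
# target = np.c_[homography_df['x'], 53.33 - homography_df['y']]
# reprojection_error(transformed_coordinate, target)
```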

## Video to Frame

In [None]:
video_name = homography_df.video.unique()
video_path = f"{BASE_DIR}/train/{video_name[0]}"

vid = imageio.get_reader(video_path, 'ffmpeg')
img = vid.get_data(check_frame - 1)
plt.figure(figsize=(12, 10))
plt.imshow(img)

## Finding field line number points in the sideline image

In [None]:
line_numbers = [[110, 600],  ## Home Sideline 20
                [550, 630],  ## Home Sideline 30
                [990, 680],  ## Home Sideline 40
                [1150, 200], ## Visitor Sideline 40
                [770, 200]]  ## Visitor Sideline 30
for line_number in line_numbers:
    img = cv2.circle(img, (line_number[0],line_number[1]), radius=2, color=(0, 255, 255), thickness=10)

plt.figure(figsize=(12, 10))
plt.imshow(img)    


## Finding field line number points in tracking data
![images](https://drive.google.com/uc?export=view&id=1IdUQHo9G673ifp-mIrwiG_ep0q88H13N)

If we treat the tracking `x`, `y` values as point coordinates in this image, we can work out where the field line numbers sit in tracking space.
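
The mapping from a painted field number to tracking coordinates can be sketched as a small helper. The assumptions here are mine and follow the values used in this notebook: tracking `x` runs 0 to 120 yards with a 10-yard end zone at each end, the field is 53.3 yards wide, the numbers are taken to sit 10 yards in from each sideline, and `y` is flipped as in the plots above. The function name and its parameters are hypothetical.

```python
FIELD_WIDTH = 53.3    # yards, value used in this notebook
FIELD_LENGTH = 120.0  # yards, including both end zones
END_ZONE = 10.0       # yards
NUMBER_OFFSET = 10.0  # assumed distance of the numbers from the sideline


def field_number_to_tracking(yard_line, home_sideline, right_half=False):
    """Approximate (x, flipped y) of a painted yard-line number in tracking space."""
    x = END_ZONE + yard_line
    if right_half:
        # The same yard number is painted again on the other half of the field
        x = FIELD_LENGTH - x
    y = FIELD_WIDTH - NUMBER_OFFSET if home_sideline else NUMBER_OFFSET
    return [x, y]


points = [
    field_number_to_tracking(20, home_sideline=True),
    field_number_to_tracking(30, home_sideline=True),
    field_number_to_tracking(40, home_sideline=True),
    field_number_to_tracking(40, home_sideline=False),
    field_number_to_tracking(30, home_sideline=False),
]
print(points)
```

These five points correspond to the hand-written coordinates in the next cell; the `right_half` flag is only there because each yard number appears twice on the field.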

In [None]:
projection_numbers = [[30, 53.3-10], ## Home Sideline 20
                      [40, 53.3-10], ## Home Sideline 30
                      [50, 53.3-10], ## Home Sideline 40
                      [50, 10],      ## Visitor Sideline 40
                      [40, 10]]      ## Visitor Sideline 30

In [None]:
H, mask = cv2.findHomography(np.float32(line_numbers).reshape(5, 2), np.float32(projection_numbers).reshape(5, 2))

In [None]:
print(H)
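
As a rough illustration of what estimating `H` from point pairs involves, here is a Direct Linear Transform (DLT) sketch in NumPy. It is not OpenCV's implementation: it skips coordinate normalization and any outlier handling, and the function name `estimate_homography_dlt` is hypothetical. The two point lists are copied from the cells above.

```python
import numpy as np


def estimate_homography_dlt(src_points, dst_points):
    """Fit a 3x3 homography to >= 4 point pairs with a plain (unnormalized) DLT."""
    src = np.asarray(src_points, dtype=np.float64).reshape(-1, 2)
    dst = np.asarray(dst_points, dtype=np.float64).reshape(-1, 2)
    if len(src) < 4 or len(src) != len(dst):
        raise ValueError("need at least 4 matching point pairs")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each pair contributes two linear equations in the 9 entries of H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The right singular vector with the smallest singular value minimizes |A h|
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the arbitrary scale so that H[2, 2] == 1


# Pixel positions of the field numbers and their assumed tracking positions
line_numbers = [[110, 600], [550, 630], [990, 680], [1150, 200], [770, 200]]
projection_numbers = [[30, 53.3 - 10], [40, 53.3 - 10], [50, 53.3 - 10],
                      [50, 10], [40, 10]]
H_dlt = estimate_homography_dlt(line_numbers, projection_numbers)
print(H_dlt)
```

With five pairs the system is overdetermined, so the result is a least-squares style fit rather than an exact solution, and it may differ slightly from what `cv2.findHomography` returns.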

In [None]:
transformed_coordinate =  cv2.perspectiveTransform(label_coordinate, H)

In [None]:
plt.figure(figsize=(12,10))

plt.scatter(transformed_coordinate[:, :, 0],transformed_coordinate[:, :, 1])

plt.scatter(homography_df['x'], 53.33-homography_df['y'])

plt.legend(['Transformed coordinate from Sideline helmet box','Ground truth tracking data'])

## Next steps
- Match labels using the homography
- Build a field number detection model?
- Merge with MOT models such as DeepSORT or FairMOT