# Spatio-Temporal IoU Comparison

In this notebook, we use LaeoNet to track faces in selected shots from the Friends dataset and compare the resulting tracks with the ground-truth tracks using spatio-temporal IoU.

## Data Imports

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# -p creates parent directories and makes the cell safe to re-run
! mkdir -p data/frames data/shots

## Generating shots

In [None]:
episode_no = 1

In [None]:
FRAME_PATH = f'drive/MyDrive/Friends/frames/episode{str(episode_no).zfill(2)}.tar.gz'
SHOTS_PATH = f'drive/MyDrive/Friends/shots/shots.tar.gz'
TRACK_PATH = f'drive/MyDrive/Friends/tracks-features/Friends.pk'
VIDEO_PATH = f'drive/MyDrive/Friends_Extra/episode{str(episode_no).zfill(2)}.mp4'

In [None]:
! tar xzf $FRAME_PATH -C data/frames
! tar xzf $SHOTS_PATH -C data

In [None]:
# Each line of the shot file holds a start frame and an end frame separated by a space
with open(f'data/shots/season3/episode{str(episode_no).zfill(2)}_shots.txt', 'r') as f:
  shots = [tuple(int(x) for x in l.split()[:2]) for l in f if l.strip()]

In [None]:
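from moviepy.editor import VideoFileClip

def generate_shots(shots, video, outpath, framerate):
  '''Given a list of (start, end) shot frame intervals and an mp4,
  write one clip per shot to outpath as shot<i>.mp4'''
  clip = VideoFileClip(video)
  for i, (start, end) in enumerate(shots):
    # Frame numbers are treated as 1-based and inclusive,
    # so the clip runs from (start-1)/framerate to end/framerate seconds
    new_clip = clip.subclip((start-1)/framerate, end/framerate)
    new_clip.write_videofile(f'{outpath}shot{i}.mp4', audio = True)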

Imageio: 'ffmpeg-linux64-v3.3.1' was not found on your computer; downloading it now.
Try 1. Download from https://github.com/imageio/imageio-binaries/raw/master/ffmpeg/ffmpeg-linux64-v3.3.1 (43.8 MB)

In [None]:
# Output mp4 shots to data/shots/
generate_shots(shots, VIDEO_PATH, 'data/shots/',23.98)

[MoviePy] >>>> Building video data/shots/shot0.mp4
[MoviePy] Writing audio in shot0TEMP_MPY_wvf_snd.mp3


100%|██████████| 126/126 [00:00<00:00, 444.35it/s]


[MoviePy] Done.
[MoviePy] Writing video data/shots/shot0.mp4


100%|██████████| 137/137 [00:06<00:00, 22.55it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: data/shots/shot0.mp4 

[MoviePy] >>>> Building video data/shots/shot1.mp4
[MoviePy] Writing audio in shot1TEMP_MPY_wvf_snd.mp3


100%|██████████| 38/38 [00:00<00:00, 406.30it/s]

[MoviePy] Done.
[MoviePy] Writing video data/shots/shot1.mp4



100%|██████████| 41/41 [00:00<00:00, 73.91it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: data/shots/shot1.mp4 

[MoviePy] >>>> Building video data/shots/shot2.mp4
[MoviePy] Writing audio in shot2TEMP_MPY_wvf_snd.mp3


100%|██████████| 326/326 [00:00<00:00, 640.48it/s]


[MoviePy] Done.
[MoviePy] Writing video data/shots/shot2.mp4


100%|██████████| 354/354 [00:18<00:00, 18.86it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: data/shots/shot2.mp4 

[MoviePy] >>>> Building video data/shots/shot3.mp4
[MoviePy] Writing audio in shot3TEMP_MPY_wvf_snd.mp3


100%|██████████| 39/39 [00:00<00:00, 368.05it/s]

[MoviePy] Done.
[MoviePy] Writing video data/shots/shot3.mp4



100%|██████████| 42/42 [00:00<00:00, 75.95it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: data/shots/shot3.mp4 

[MoviePy] >>>> Building video data/shots/shot4.mp4
[MoviePy] Writing audio in shot4TEMP_MPY_wvf_snd.mp3


100%|██████████| 226/226 [00:00<00:00, 572.12it/s]


[MoviePy] Done.
[MoviePy] Writing video data/shots/shot4.mp4


100%|██████████| 245/245 [00:12<00:00, 18.89it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: data/shots/shot4.mp4 
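The clip boundaries passed to `subclip` above come from converting frame numbers to seconds. A minimal sketch of that conversion follows, assuming the shot file uses 1-based, inclusive frame numbers (which is what the `(start-1)/framerate` expression suggests). The helper name `shot_to_seconds` and the example values are made up for illustration.

```python
def shot_to_seconds(start_frame, end_frame, framerate):
    """Convert a 1-based, inclusive frame interval to (t_start, t_end) in seconds."""
    if start_frame < 1 or end_frame < start_frame:
        raise ValueError("expected 1 <= start_frame <= end_frame")
    # Frame k is assumed to cover the time span [(k - 1) / fps, k / fps)
    return (start_frame - 1) / framerate, end_frame / framerate


# Illustrative values only: frames 25..48 at 24 fps cover exactly one second
t_start, t_end = shot_to_seconds(25, 48, 24.0)
n_frames = round((t_end - t_start) * 24.0)
print(t_start, t_end, n_frames)
```

With a non-integer frame rate such as 23.98, the boundaries are only approximate, so the written clip can differ from the shot by a frame at either end.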



## Downloading LaeoNet

In [None]:
!wget https://github.com/AVAuco/laeonetplus/archive/refs/heads/main.zip -O main.zip
!unzip main.zip
# The archive unpacks into laeonetplus-main/ (see the unzip output below), so no symlink is needed

--2021-12-11 18:06:15--  https://github.com/AVAuco/laeonetplus/archive/refs/heads/main.zip
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/AVAuco/laeonetplus/zip/refs/heads/main [following]
--2021-12-11 18:06:16--  https://codeload.github.com/AVAuco/laeonetplus/zip/refs/heads/main
Resolving codeload.github.com (codeload.github.com)... 192.30.255.121
Connecting to codeload.github.com (codeload.github.com)|192.30.255.121|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘main.zip’

main.zip                [           <=>      ] 106.06M  16.3MB/s    in 7.2s    

2021-12-11 18:06:23 (14.7 MB/s) - ‘main.zip’ saved [111209738]

Archive:  main.zip
e7fb7a977a84d63cf524e59e9716ffa7fd42cb72
   creating: laeonetplus-main/
  inflating: laeonetplus-main/.gitignore  
  inflating

In [None]:
# Move to the correct directory
!ls
%cd laeonetplus-main
!pwd

data  drive  laeonetplus-main  main.zip  sample_data
/content/laeonetplus-main
/content/laeonetplus-main


## Loading LaeoNet Model

In [None]:
"""
Demo code for testing a trained model on an input video.
Adapted to Google Colab
 
Reference:
MJ. Marin-Jimenez, V. Kalogeiton, P. Medina-Suarez, A. Zisserman
LAEO-Net++: revisiting people Looking At Each Other in videos
IEEE TPAMI, 2021
 
(c) MJMJ/2021
"""
 
import os, sys
import numpy as np
import cv2
 
from os.path import expanduser
import os.path as osp
import pickle
 
homedir = "/content/laeonetplus-main/"
 
mainsdir = osp.join(homedir, "mains")
 
# Add custom directories with source code
sys.path.insert(0, os.path.join(mainsdir,"../tracking")) # CHANGE ME
sys.path.insert(0, os.path.join(mainsdir,"../utils")) # CHANGE ME
sys.path.insert(0, os.path.join(mainsdir,"../datasets")) # CHANGE ME
 
sys.path.insert(0, os.path.join(homedir,"utils")) # CHANGE ME
sys.path.insert(0, os.path.join(homedir,"datasets")) # CHANGE ME
sys.path.insert(0, os.path.join(homedir,"tracking")) # CHANGE ME
 
 
gpu_rate = 0.30
theSEED = 0
 
 
# for reproducibility
np.random.seed(theSEED)
 
from mj_tracksManager import TracksManager
from ln_avagoogleImages import mj_getImagePairSeqFromTracks, mj_getFrameBBsPairFromTracks
from ln_laeoImage import mj_padImageTrack
from ln_tracking_heads import process_video
 
from tensorflow.keras.models import load_model

/content/laeonetplus-main/tracking


In [None]:
# ====================================================================================
 
modeldir = os.path.join(homedir, "models", "bestAVA")
modelfile = os.path.join(modeldir, "model-hmaps-trava_pyv36.hdf5")
 
outdirbase = os.path.join(homedir, "results")
 
case_wanted = "val"
inputs = "1010"
  
# Load model
model = load_model(modelfile, compile=False)
model.summary()
 
# Load mean map (mean head is not used for LAEO-Net++)
meanfile = os.path.join(homedir, "models", "meanmaps10.npy")
mean_map_ = np.load(meanfile)
mean_map5 = mean_map_[5*64:6*64,]

NameError: ignored

## Tracking

In [None]:
# Some parameters
verbose = 1
save_to_disk = True 
   
lTracksInShots = []
 
# Prepare data
# =================================
# The working directory is now laeonetplus-main, so a relative "data/shots/" no longer
# resolves; point at the shots generated earlier with an absolute path
videospath = "/content/data/shots/"
videonames = [f"shot{i}" for i in range(5)]
for videoname in videonames:
  videopath = os.path.join(videospath, videoname+".mp4")

  framesdir_for_detection = "/tmp/"+videoname+"_frames"

  outdir= os.path.join(outdirbase, videoname+"_laeo") 
  
  # Given video, detect heads and generate tracks
  # ===============================
  tracks_live = process_video(videopath, verbose=verbose, framesdir=framesdir_for_detection)
  tm2 = TracksManager(filepath="", data=tracks_live)
  
  lTracksInShots.append(tm2)

Processing video data/shots/shot0.mp4
Detections not found for video data/shots/shot0.mp4, generating...
Downloading model...
Model downloaded to /content/laeonetplus-main/tracking/utilstr/../data/models/detector/ssd512-hollywood-trainval-bs_16-lr_1e-05-scale_pascal-epoch-187-py3.6.h5.
Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.map_fn(fn, elems, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.map_fn(fn, elems))
Instructions for updating:
Use fn_output_signature instead
Could not open input video file data/shots/shot0.mp4
Reading video file data/shots/shot0.mp4
Detections saved to "/content/laeonetplus-main/tracking/utilstr/../data/results/dets/shot0_processed_th0.2.pkl/"
Linking backwards
Linking forwards
Finished processing tracks for video data/shots/shot0.mp4, now saving..
Processing video data/shots/shot1.mp4
Detections not found for video data/shots/shot1.mp4, generating...


In [None]:
# One list of tracks per processed shot; each track is an array whose rows are
# the frame index followed by the bounding box for that frame
detections = []

for s in range(len(lTracksInShots)):
  t = lTracksInShots[s]
  shot_detections = []

  for trix in range(0, t.ntracks):
    start, end = int(t.start(trix)), int(t.end(trix))
    bbx = t.getTrackIntervalBBs(trix, start, end)

    frames = np.arange(start, end + 1, dtype = 'int').reshape((-1, 1))
    shot_detections.append(np.concatenate((frames, bbx), axis = 1))

  detections.append(shot_detections)
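The cell above stores each track as an array with one row per frame: the frame index followed by four box values. The sketch below builds such an array from made-up boxes and checks the shape first. It assumes that the boxes come as one row of four values per frame in the interval. `make_tube` is a hypothetical helper, not part of LaeoNet.

```python
import numpy as np


def make_tube(start, end, boxes):
    """Stack frame indices start..end (inclusive) in front of an (N, 4) box array."""
    boxes = np.asarray(boxes, dtype=float)
    frames = np.arange(start, end + 1, dtype=float).reshape(-1, 1)
    # One box per frame is assumed; fail early if the interval and boxes disagree
    if boxes.shape != (len(frames), 4):
        raise ValueError(f"expected {len(frames)} boxes of 4 values, got shape {boxes.shape}")
    return np.hstack((frames, boxes))


# Made-up example: the same box on frames 3, 4 and 5
tube = make_tube(3, 5, [[0, 0, 10, 10]] * 3)
print(tube.shape)
```

Checking the shape up front gives a clearer error than the one `np.concatenate` would raise on a mismatch.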


## Extracting Ground Truth Bounding Boxes

In [None]:
# go back to original directory
%cd /content

/content


In [None]:
with open(TRACK_PATH, 'rb') as f:
  tracks = pickle.load(f)

KeyboardInterrupt: ignored

In [None]:
ground_truth = [[] for i in range(len(shots))]

# Group the ground-truth face boxes by shot; the first frame of a track decides its shot
tracks_res = [{} for _ in range(len(shots))]
episode_key = f'episode{str(episode_no).zfill(2)}'
for trackid, vals in tracks[episode_key]['face'].items():

    shot_id = -1
    for i in range(len(shots)):
      if shots[i][0] <= vals[0,0] <= shots[i][1]:
        shot_id = i
        tracks_res[shot_id][trackid] = []

    if shot_id >= 0:
      start_frame = shots[shot_id][0]
      for frame, bbx1, bbx2, bbx3, bbx4 in vals:
        # Store frame indices relative to the first frame of the shot
        tracks_res[shot_id][trackid].append(np.array([int(frame) - start_frame, bbx1, bbx2, bbx3, bbx4]))
  
# Split each track wherever the frame index jumps by more than one,
# so that every ground-truth track covers consecutive frames only
for s in range(len(shots)):
  for items in tracks_res[s].values():
    to_add = [items[0]]
    for i in range(1, len(items)):
      if items[i][0] - items[i - 1][0] > 1:
        ground_truth[s].append(to_add)
        to_add = []
      to_add.append(items[i])
    ground_truth[s].append(to_add)
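The second loop above cuts a track into pieces whenever consecutive rows are more than one frame apart. The same idea can be sketched with `np.diff` and `np.split`, assuming the rows are sorted by frame index in the first column. `split_contiguous` is a hypothetical helper and the rows are made up.

```python
import numpy as np


def split_contiguous(rows):
    """Split rows (first column = sorted frame index) into runs of consecutive frames."""
    rows = np.asarray(rows, dtype=float)
    if len(rows) == 0:
        return []
    # Positions where the frame index jumps by more than one start a new segment
    breaks = np.where(np.diff(rows[:, 0]) > 1)[0] + 1
    return np.split(rows, breaks)


# Made-up track with a gap between frame 1 and frame 4
rows = [[0, 5, 5, 20, 20],
        [1, 5, 5, 20, 20],
        [4, 6, 6, 21, 21],
        [5, 6, 6, 21, 21],
        [6, 6, 6, 21, 21]]
segments = split_contiguous(rows)
print([seg[:, 0].tolist() for seg in segments])
```

Each returned segment is already a NumPy array, so no separate conversion step is needed before computing IoUs.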

In [None]:
! ls

data  drive  laeonetplus-main  main.zip  sample_data


## Computing the Spatial and Spatio-Temporal IoUs

In [None]:
! wget https://raw.githubusercontent.com/vkalogeiton/caffe/act-detector/act-detector-scripts/ACT_utils.py -O ACT_utils.py

--2021-12-11 18:25:07--  https://raw.githubusercontent.com/vkalogeiton/caffe/act-detector/act-detector-scripts/ACT_utils.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5791 (5.7K) [text/plain]
Saving to: ‘ACT_utils.py’


2021-12-11 18:25:07 (45.4 MB/s) - ‘ACT_utils.py’ saved [5791/5791]



In [None]:
ls

ACT_utils.py  data/  drive/  laeonetplus-main/  main.zip  sample_data/


In [None]:
from ACT_utils import iou3dt
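For readers unfamiliar with the measure, here is a rough, self-contained sketch of a spatio-temporal IoU between two tracks. It is not necessarily identical to `iou3dt` in `ACT_utils.py`. It assumes each track has one row per consecutive frame in the form `[frame, x1, y1, x2, y2]`, and that boxes have positive area. The names `box_iou` and `st_iou` are made up for this illustration.

```python
import numpy as np


def box_iou(a, b):
    """Row-wise IoU of two (N, 4) arrays of [x1, y1, x2, y2] boxes."""
    iw = np.maximum(0.0, np.minimum(a[:, 2], b[:, 2]) - np.maximum(a[:, 0], b[:, 0]))
    ih = np.maximum(0.0, np.minimum(a[:, 3], b[:, 3]) - np.maximum(a[:, 1], b[:, 1]))
    inter = iw * ih
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter)


def st_iou(t1, t2, spatial_only=False):
    """Mean box IoU over the shared frames, scaled by the temporal IoU."""
    t1, t2 = np.asarray(t1, dtype=float), np.asarray(t2, dtype=float)
    tmin = max(t1[0, 0], t2[0, 0])
    tmax = min(t1[-1, 0], t2[-1, 0])
    if tmax < tmin:
        return 0.0  # the tracks never share a frame
    # Keep only the frames both tracks cover (tracks are assumed gap-free)
    m1 = (t1[:, 0] >= tmin) & (t1[:, 0] <= tmax)
    m2 = (t2[:, 0] >= tmin) & (t2[:, 0] <= tmax)
    spatial = float(box_iou(t1[m1, 1:5], t2[m2, 1:5]).mean())
    if spatial_only:
        return spatial
    t_inter = tmax - tmin + 1
    t_union = max(t1[-1, 0], t2[-1, 0]) - min(t1[0, 0], t2[0, 0]) + 1
    return spatial * t_inter / t_union


# Made-up tracks: a covers frames 0..3, b covers frames 2..5 with a half-height box
a = [[f, 0, 0, 10, 10] for f in range(0, 4)]
b = [[f, 0, 0, 10, 5] for f in range(2, 6)]
print(st_iou(a, b, spatial_only=True), st_iou(a, b))
```

In this example the boxes overlap by one half on the two shared frames, and the tracks share 2 of 6 frames, so the spatio-temporal score is one half times one third.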

In [None]:
# Make every track a NumPy array of rows (frame index followed by the box values).
# The per-shot containers stay plain lists, because tracks differ in length and
# a ragged list cannot be turned into a regular array.
for i in range(len(ground_truth)):
  ground_truth[i] = [np.array(track) for track in ground_truth[i]]

for i in range(len(detections)):
  detections[i] = [np.array(track) for track in detections[i]]

In [None]:
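# Only shots that have both ground truth and detections can be compared
n_shots = min(len(ground_truth), len(detections))

# One (ground-truth tracks x detected tracks) score matrix per shot
similarity_temporal_spatial = [np.zeros((len(ground_truth[s]), len(detections[s]))) for s in range(n_shots)]
similarity_spatial = [np.zeros((len(ground_truth[s]), len(detections[s]))) for s in range(n_shots)]

for s in range(n_shots):
  for i in range(len(ground_truth[s])):
    for j in range(len(detections[s])):
      similarity_temporal_spatial[s][i][j] = iou3dt(ground_truth[s][i], detections[s][j])
      # Passing True as the third argument requests the spatial-only score
      similarity_spatial[s][i][j] = iou3dt(ground_truth[s][i], detections[s][j], True)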


In [None]:
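from numpy import unravel_index
import copy

def match(arr):
  '''Greedily pair ground-truth tracks (rows) with detected tracks (columns),
  taking the highest remaining score first.
  Returns, for each shot, a list of (row, column, score) tuples.'''
  similarity = copy.deepcopy(arr)
  matched = [[] for _ in range(len(similarity))]
  for s in range(len(similarity)):
    while (np.count_nonzero(similarity[s])) > 0:
      max_score = np.amax(similarity[s])
      i, j = unravel_index(similarity[s].argmax(), similarity[s].shape)
      matched[s].append((i, j, max_score))
      # Clear the matched detection so it cannot be picked again; the row is kept,
      # so one ground-truth track can still be paired with several detections
      similarity[s][:,j] = 0
  return matched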
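To see what the greedy matching does on a single score matrix, the sketch below runs it on made-up numbers. It also includes an optional one-to-one variant that clears the matched row as well as the column. That variant is an assumption about what may be wanted, not something the notebook does. `greedy_match` is a hypothetical name.

```python
import numpy as np


def greedy_match(sim, one_to_one=False):
    """Repeatedly take the highest remaining score from a (rows x columns) matrix."""
    sim = np.array(sim, dtype=float)  # work on a copy
    pairs = []
    while sim.size and sim.max() > 0:
        i, j = np.unravel_index(sim.argmax(), sim.shape)
        pairs.append((int(i), int(j), float(sim[i, j])))
        sim[:, j] = 0  # the column (detection) is used up
        if one_to_one:
            sim[i, :] = 0  # optionally use up the row (ground truth) too
    return pairs


# Made-up scores: 2 ground-truth tracks (rows) against 3 detections (columns)
scores = [[0.9, 0.8, 0.0],
          [0.2, 0.1, 0.0]]
print(greedy_match(scores))
print(greedy_match(scores, one_to_one=True))
```

With column-only clearing, both detections 0 and 1 go to ground-truth track 0. The one-to-one variant gives detection 1 to track 1 instead, at a much lower score.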

In [None]:
len(match(similarity_spatial)[0])

0

In [None]:
match(similarity_temporal_spatial)[0]

[]