<a href="https://colab.research.google.com/github/rahiakela/computer-vision-research-and-practice/blob/main/machine-learning-with-video-data/01_loading_video_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Loading video data

This tutorial demonstrates how to load and preprocess [AVI](https://en.wikipedia.org/wiki/Audio_Video_Interleave) video data using the [UCF101 human action dataset](https://www.tensorflow.org/datasets/catalog/ucf101). Once you have preprocessed the data, it can be used for such tasks as video classification/recognition, captioning or clustering. The original dataset contains realistic action videos collected from YouTube with 101 categories, including playing cello, brushing teeth, and applying eye makeup. You will learn how to:

* Load the data from a zip file.

* Read sequences of frames out of the video files.

* Visualize the video data.

* Wrap the frame generator in a [`tf.data.Dataset`](https://www.tensorflow.org/guide/data).

This video loading and preprocessing tutorial is the first part in a series of TensorFlow video tutorials. Here are the other three tutorials:

- [Build a 3D CNN model for video classification](https://www.tensorflow.org/tutorials/video/video_classification): Note that this tutorial uses a (2+1)D CNN that decomposes the spatial and temporal aspects of 3D data; if you are using volumetric data such as an MRI scan, consider using a 3D CNN instead of a (2+1)D CNN.
- [MoViNet for streaming action recognition](https://www.tensorflow.org/hub/tutorials/movinet): Get familiar with the MoViNet models that are available on TF Hub.
- [Transfer learning for video classification with MoViNet](https://www.tensorflow.org/tutorials/video/transfer_learning_with_movinet): This tutorial explains how to use a pre-trained video classification model trained on a different dataset with the UCF-101 dataset.

## Setup

Begin by installing and importing some necessary libraries, including:
[remotezip](https://github.com/gtsystem/python-remotezip) to inspect the contents of a ZIP file, [tqdm](https://github.com/tqdm/tqdm) to use a progress bar, [OpenCV](https://opencv.org/) to process video files, and [`tensorflow_docs`](https://github.com/tensorflow/docs/tree/master/tools/tensorflow_docs) for embedding data in a Jupyter notebook.

In [None]:
# The way this tutorial uses the `TimeDistributed` layer requires TF>=2.10
!pip install -U "tensorflow>=2.10.0"

In [None]:
!pip install remotezip tqdm opencv-python
!pip install -q git+https://github.com/tensorflow/docs

In [3]:
import tqdm
import random
import pathlib
import itertools
import collections

import os
import cv2
import numpy as np
import remotezip as rz

import tensorflow as tf

# Some modules to display an animation using imageio.
import imageio
from IPython import display
from urllib import request
from tensorflow_docs.vis import embed

## UCF101 dataset

The [UCF101 dataset](https://www.tensorflow.org/datasets/catalog/ucf101) contains 101 categories of different actions in video, primarily used in action recognition. You will use a subset of these categories in this demo.

In [4]:
URL = "https://storage.googleapis.com/thumos14_files/UCF101_videos.zip"

The URL above points to a zip file containing the UCF101 dataset.

Let's create a function that uses the `remotezip` library to examine the contents of the zip file in that URL:

In [5]:
def list_files_from_zip_url(zip_url):
  """
  List the files stored in the zip file at the given URL.
  Args:
    zip_url: A URL from which the files can be extracted.
  Returns:
    A flat list with the name of every entry in the zip file.
  """
  files = []
  # Named `zip_file` so that the built-in `zip` is not shadowed.
  with rz.RemoteZip(zip_url) as zip_file:
    for zip_info in zip_file.infolist():
      files.append(zip_info.filename)
  return files
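
To see what `infolist()` returns without touching the network, the same listing logic can be tried on a small in-memory archive built with the standard-library `zipfile` module. This is only a sketch: the archive contents and the helper name `list_files_from_zip_bytes` are made up for illustration, and it assumes that `RemoteZip` offers the same `infolist()` interface as `zipfile.ZipFile`.

```python
import io
import zipfile

def list_files_from_zip_bytes(zip_bytes):
  """List entry names of a zip archive held in memory (hypothetical helper)."""
  with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    return [info.filename for info in archive.infolist()]

# Build a tiny stand-in archive; these entries are illustrative, not real data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
  archive.writestr("UCF101/", "")  # a directory entry with no filename
  archive.writestr("UCF101/v_Archery_g01_c01.avi", b"fake video bytes")
  archive.writestr("UCF101/v_Archery_g01_c02.avi", b"fake video bytes")

names = list_files_from_zip_bytes(buffer.getvalue())
# Directory entries show up in the listing too, so filter on the extension.
video_names = [name for name in names if name.endswith(".avi")]
print(video_names)
```

The listing includes the bare `UCF101/` directory entry, which is why the extension filter matters before any filename parsing.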

In [6]:
files = list_files_from_zip_url(URL)
files = [f for f in files if f.endswith(".avi")]
files[:10]

['UCF101/v_ApplyEyeMakeup_g01_c01.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c02.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c03.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c04.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c05.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c06.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c01.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c02.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c03.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c04.avi']

Begin with a few videos and a limited number of classes for training. After running the above code block, notice that the class name is included in the filename of each video.

Let's define the `get_class` function that retrieves the class name from a filename. Then, create a function called `get_files_per_class` which converts the list of all files (`files` above) into a dictionary listing the files for each class:

In [7]:
def get_class(fname):
  """Retrieve the name of the class given a filename"""
  return fname.split("_")[-3]

In [8]:
def get_files_per_class(files):
  """Retrieve the files that belong to each class."""
  files_for_class = collections.defaultdict(list)
  for fname in files:
    class_name = get_class(fname)
    files_for_class[class_name].append(fname)
  return files_for_class
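
As a quick illustration of how the class name is recovered, the sketch below splits a few of the filenames listed earlier on underscores and groups them by class. The `sample` list is a hand-picked subset for demonstration. Indexing from the end (`[-3]`) assumes that every filename ends with a `_gXX_cYY.avi` suffix and that class names contain no underscores.

```python
import collections

def get_class(fname):
  """Retrieve the name of the class given a filename."""
  return fname.split("_")[-3]

# A hand-picked sample of filenames in the UCF101 naming scheme.
sample = [
    "UCF101/v_ApplyEyeMakeup_g01_c01.avi",
    "UCF101/v_Archery_g02_c03.avi",
    "UCF101/v_ApplyEyeMakeup_g01_c02.avi",
]

# The pieces are: path prefix, class name, group id, clip id with extension.
pieces = sample[0].split("_")
print(pieces)  # ['UCF101/v', 'ApplyEyeMakeup', 'g01', 'c01.avi']

# Group the filenames by the class name parsed from each one.
grouped = collections.defaultdict(list)
for fname in sample:
  grouped[get_class(fname)].append(fname)

for class_name, class_files in grouped.items():
  print(class_name, len(class_files))
```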

Once you have the list of files per class, you can choose how many classes you would like to use and how many videos you would like per class in order to create your dataset. 

In [9]:
NUM_CLASSES = 10
FILES_PER_CLASS = 50

In [25]:
files_for_class = get_files_per_class(files)
classes = list(files_for_class.keys())

print(f"Classes: {classes}")
print(f"Num videos for class[0]: {len(files_for_class[classes[0]])}")

Classes: ['ApplyEyeMakeup', 'ApplyLipstick', 'Archery', 'BabyCrawling', 'BalanceBeam', 'BandMarching', 'BaseballPitch', 'BasketballDunk', 'Basketball', 'BenchPress', 'Biking', 'Billiards', 'BlowDryHair', 'BlowingCandles', 'BodyWeightSquats', 'Bowling', 'BoxingPunchingBag', 'BoxingSpeedBag', 'BreastStroke', 'BrushingTeeth', 'CleanAndJerk', 'CliffDiving', 'CricketBowling', 'CricketShot', 'CuttingInKitchen', 'Diving', 'Drumming', 'Fencing', 'FieldHockeyPenalty', 'FloorGymnastics', 'FrisbeeCatch', 'FrontCrawl', 'GolfSwing', 'Haircut', 'Hammering', 'HammerThrow', 'HandstandPushups', 'HandstandWalking', 'HeadMassage', 'HighJump', 'HorseRace', 'HorseRiding', 'HulaHoop', 'IceDancing', 'JavelinThrow', 'JugglingBalls', 'JumpingJack', 'JumpRope', 'Kayaking', 'Knitting', 'LongJump', 'Lunges', 'MilitaryParade', 'Mixing', 'MoppingFloor', 'Nunchucks', 'ParallelBars', 'PizzaTossing', 'PlayingCello', 'PlayingDaf', 'PlayingDhol', 'PlayingFlute', 'PlayingGuitar', 'PlayingPiano', 'PlayingSitar', 'Play

In [11]:
videos = list(files_for_class.values())
print(f"Few videos: {videos[0][:10]}")

Few videos: ['UCF101/v_ApplyEyeMakeup_g01_c01.avi', 'UCF101/v_ApplyEyeMakeup_g01_c02.avi', 'UCF101/v_ApplyEyeMakeup_g01_c03.avi', 'UCF101/v_ApplyEyeMakeup_g01_c04.avi', 'UCF101/v_ApplyEyeMakeup_g01_c05.avi', 'UCF101/v_ApplyEyeMakeup_g01_c06.avi', 'UCF101/v_ApplyEyeMakeup_g02_c01.avi', 'UCF101/v_ApplyEyeMakeup_g02_c02.avi', 'UCF101/v_ApplyEyeMakeup_g02_c03.avi', 'UCF101/v_ApplyEyeMakeup_g02_c04.avi']


Now, let's create a new function called `select_subset_of_classes` that selects a subset of the classes present within the dataset and a particular number of files per class:

In [12]:
def select_subset_of_classes(files_for_class, classes, files_per_class):
  """Create a dictionary with the class name and a subset of the files in that class."""
  files_subset = dict()
  for class_name in classes:
    class_files = files_for_class[class_name]
    files_subset[class_name] = class_files[:files_per_class]
  return files_subset
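
One detail worth knowing is that the slice `class_files[:files_per_class]` does not fail when a class has fewer files than requested; it simply returns what is there. The sketch below shows this on a toy dictionary (the class contents are invented for illustration) and adds a simple check for under-filled classes.

```python
def select_subset_of_classes(files_for_class, classes, files_per_class):
  """Create a dictionary with the class name and a subset of the files in that class."""
  files_subset = dict()
  for class_name in classes:
    files_subset[class_name] = files_for_class[class_name][:files_per_class]
  return files_subset

# Toy data: "Biking" has fewer files than the requested two per class.
toy_files_for_class = {
    "Archery": ["a1.avi", "a2.avi", "a3.avi"],
    "Biking": ["b1.avi"],
    "Bowling": ["c1.avi", "c2.avi"],
}

subset = select_subset_of_classes(toy_files_for_class, ["Archery", "Biking"], 2)
print(subset)

# Flag classes that came back with fewer files than requested.
under_filled = [name for name, class_files in subset.items() if len(class_files) < 2]
print(under_filled)
```

A check like `under_filled` can help catch class imbalance early when you raise `FILES_PER_CLASS`.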

In [13]:
files_subset = select_subset_of_classes(files_for_class, classes[:NUM_CLASSES], FILES_PER_CLASS)
list(files_subset.keys())

['ApplyEyeMakeup',
 'ApplyLipstick',
 'Archery',
 'BabyCrawling',
 'BalanceBeam',
 'BandMarching',
 'BaseballPitch',
 'BasketballDunk',
 'Basketball',
 'BenchPress']

In [16]:
list(files_subset.values())[0][:10]

['UCF101/v_ApplyEyeMakeup_g01_c01.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c02.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c03.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c04.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c05.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c06.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c01.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c02.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c03.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c04.avi']

In [18]:
list(files_subset.values())[2][:10]

['UCF101/v_Archery_g01_c01.avi',
 'UCF101/v_Archery_g01_c02.avi',
 'UCF101/v_Archery_g01_c03.avi',
 'UCF101/v_Archery_g01_c04.avi',
 'UCF101/v_Archery_g01_c05.avi',
 'UCF101/v_Archery_g01_c06.avi',
 'UCF101/v_Archery_g01_c07.avi',
 'UCF101/v_Archery_g02_c01.avi',
 'UCF101/v_Archery_g02_c02.avi',
 'UCF101/v_Archery_g02_c03.avi']

Next, define helper functions that split the videos into training, validation, and test sets. The videos are downloaded from the zip file at the URL and placed into their respective subdirectories.

In [19]:
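def download_from_zip(zip_url, to_dir, file_names):
  """Download the listed files from the zip URL into per-class folders under `to_dir`."""
  with rz.RemoteZip(zip_url) as zip_file:
    for fn in tqdm.tqdm(file_names):
      class_name = get_class(fn)
      # Extraction keeps the archive's internal path, so the file lands in
      # to_dir/class_name/<path inside the archive>.
      zip_file.extract(fn, str(to_dir / class_name))
      unzipped_file = to_dir / class_name / fn

      # Move the file up so that it sits directly inside its class folder.
      fn = pathlib.Path(fn).parts[-1]
      output_file = to_dir / class_name / fn
      unzipped_file.rename(output_file)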
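
The extract-then-rename step can be tried locally with the standard-library `zipfile` module, which is a reasonable stand-in here under the assumption that `RemoteZip.extract` behaves like `zipfile.ZipFile.extract`. The archive contents and directory names below are invented for illustration.

```python
import io
import pathlib
import tempfile
import zipfile

# A one-file stand-in archive that mimics the "UCF101/<video>" layout.
member = "UCF101/v_Archery_g01_c01.avi"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
  archive.writestr(member, b"fake video bytes")

with tempfile.TemporaryDirectory() as tmp:
  class_dir = pathlib.Path(tmp) / "train" / "Archery"
  with zipfile.ZipFile(buffer) as archive:
    # extract() recreates the archive path below class_dir.
    archive.extract(member, str(class_dir))

  extracted = class_dir / member                      # .../Archery/UCF101/v_...avi
  flattened = class_dir / pathlib.Path(member).name   # .../Archery/v_...avi
  extracted.rename(flattened)

  # The now-empty "UCF101" folder is left behind next to the video.
  contents = sorted(path.name for path in class_dir.iterdir())

print(contents)  # ['UCF101', 'v_Archery_g01_c01.avi']
```

This is why an empty `UCF101` folder appears inside each class directory after the download; it is harmless for patterns such as `train/*/*.avi`.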

The following function takes the first `count` files of each class for the current split and returns them together with the remaining files that have not yet been placed into a split. The remainder can then be passed back in to fill the next split.

In [20]:
def split_class_lists(files_for_class, count):
  """
  Returns the list of files belonging to a subset of data as well as the remainder of files that need to be downloaded.
  """
  split_files = []
  remainder = {}
  for cls in files_for_class:
    split_files.extend(files_for_class[cls][:count])
    remainder[cls] = files_for_class[cls][count:]
  return split_files, remainder 
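
The way successive calls carve up each class can be sketched on toy data. The helper `split_off` below is a hypothetical stand-in written for this example; it keeps everything after the first `count` files as the remainder, which is the property that keeps the training, validation, and test sets disjoint.

```python
def split_off(files_for_class, count):
  """Take the first `count` files per class; return them and what is left."""
  taken = []
  remainder = {}
  for cls, cls_files in files_for_class.items():
    taken.extend(cls_files[:count])
    remainder[cls] = cls_files[count:]  # files not yet assigned to a split
  return taken, remainder

# Toy data: two classes with five made-up files each.
toy = {
    "Archery": [f"a{i}.avi" for i in range(5)],
    "Biking": [f"b{i}.avi" for i in range(5)],
}

train, rest = split_off(toy, 3)
val, rest = split_off(rest, 1)
test, rest = split_off(rest, 1)

print(len(train), len(val), len(test))  # 6 2 2
# No file appears in more than one split.
print(set(train) & set(val), set(train) & set(test), set(val) & set(test))
```

If the remainder were sliced as `[:count]` instead, the validation and test sets would be drawn from the training files, which would leak data between splits.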

The following `download_ucf_101_subset` function allows you to download a subset of the UCF101 dataset and split it into the training, validation, and test sets. 

You can specify the number of classes that you would like to use. The `splits` argument takes a dictionary in which each key is the name of a subset (for example, "train") and each value is the number of videos you would like to have per class in that subset.

In [21]:
def download_ucf_101_subset(zip_url, num_classes, splits, download_dir):
  """Download a subset of the UCF101 dataset and split them into various parts, such as training, validation, and test."""
  files = list_files_from_zip_url(zip_url)
  # Keep only the video files; this drops directory entries that have no filename.
  # (Filtering builds a new list rather than removing items while iterating.)
  files = [f for f in files if f.endswith(".avi")]
    
  files_for_class = get_files_per_class(files)
  classes = list(files_for_class.keys())[:num_classes]
  for cls in classes:
    random.shuffle(files_for_class[cls])
  
  # Only use the number of classes you want in the dictionary
  files_for_class = {x: files_for_class[x] for x in classes}

  dirs = {}
  for split_name, split_count in splits.items():
    print(split_name, ":")
    split_dir = download_dir / split_name
    split_files, files_for_class = split_class_lists(files_for_class, split_count)
    download_from_zip(zip_url, split_dir, split_files)
    dirs[split_name] = split_dir
  return dirs

In [22]:
download_dir = pathlib.Path("./UCF101_subset/")
subset_paths = download_ucf_101_subset(URL,
                                       num_classes=NUM_CLASSES,
                                       splits={"train": 30, "val": 10, "test": 10},
                                       download_dir=download_dir)

train :


100%|██████████| 300/300 [00:35<00:00,  8.48it/s]


val :


100%|██████████| 100/100 [00:08<00:00, 11.42it/s]


test :


100%|██████████| 100/100 [00:07<00:00, 14.22it/s]


After downloading the data, you should now have a copy of a subset of the UCF101 dataset. 

Run the following code to print the total number of videos across all of your data subsets.

In [23]:
video_count_train = len(list(download_dir.glob("train/*/*.avi")))
video_count_val = len(list(download_dir.glob("val/*/*.avi")))
video_count_test = len(list(download_dir.glob("test/*/*.avi")))

video_total = video_count_train + video_count_val + video_count_test
print(f"Total videos: {video_total}")

Total videos: 500
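
The total can be cross-checked against the `splits` dictionary: each split should hold the number of classes multiplied by its per-class count. A minimal sketch of that arithmetic, reusing the values passed to `download_ucf_101_subset` above:

```python
NUM_CLASSES = 10
splits = {"train": 30, "val": 10, "test": 10}

# Videos per split = number of classes * videos per class in that split.
expected_per_split = {name: NUM_CLASSES * count for name, count in splits.items()}
expected_total = sum(expected_per_split.values())

print(expected_per_split)  # {'train': 300, 'val': 100, 'test': 100}
print(expected_total)      # 500
```

These figures match the progress bars above (300, 100, and 100 files) and the per-class total of 50 set by `FILES_PER_CLASS`.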


You can also preview the directory of data files now.

In [24]:
!find ./UCF101_subset

./UCF101_subset
./UCF101_subset/train
./UCF101_subset/train/BasketballDunk
./UCF101_subset/train/BasketballDunk/UCF101
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g01_c04.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g07_c01.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g19_c01.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g24_c04.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g23_c05.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g05_c05.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g01_c06.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g18_c05.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g18_c03.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g15_c03.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g22_c01.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g13_c02.avi
./UCF101_subset/train/BasketballDunk/v_BasketballDunk_g25_c01.avi
./UCF101_subset/train/B
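
As an alternative to scanning the raw `find` listing, a per-class count can be computed with `pathlib`. The helper `count_videos_per_class` is a hypothetical name introduced here, and the sketch runs against a throwaway directory with invented files; it assumes the `<split>/<class>/<video>.avi` layout shown above.

```python
import collections
import pathlib
import tempfile

def count_videos_per_class(split_dir):
  """Count the .avi files in each class folder of one split."""
  counts = collections.Counter()
  for video in pathlib.Path(split_dir).glob("*/*.avi"):
    counts[video.parent.name] += 1
  return dict(counts)

with tempfile.TemporaryDirectory() as tmp:
  # Build a small fake split: three Archery clips and two Biking clips.
  split_dir = pathlib.Path(tmp) / "train"
  for cls, num_videos in {"Archery": 3, "Biking": 2}.items():
    (split_dir / cls).mkdir(parents=True)
    for i in range(num_videos):
      (split_dir / cls / f"v_{cls}_g01_c0{i + 1}.avi").touch()
  counts = count_videos_per_class(split_dir)

print(sorted(counts.items()))  # [('Archery', 3), ('Biking', 2)]
```

On the real data, calling it with `download_dir / "train"` should report the same number of videos for every class.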

#### Create frames from each video file