
## Loading video data

This tutorial demonstrates how to load and preprocess [AVI](https://en.wikipedia.org/wiki/Audio_Video_Interleave) video data using the [UCF101 human action dataset](https://www.tensorflow.org/datasets/catalog/ucf101). Once you have preprocessed the data, it can be used for such tasks as video classification/recognition, captioning or clustering. The original dataset contains realistic action videos collected from YouTube with 101 categories, including playing cello, brushing teeth, and applying eye makeup. You will learn how to:

* Load the data from a zip file.

* Read sequences of frames out of the video files.

* Visualize the video data.

* Wrap the frame generator in a [`tf.data.Dataset`](https://www.tensorflow.org/guide/data).

This video loading and preprocessing tutorial is the first part in a series of TensorFlow video tutorials. Here are the other three tutorials:

- [Build a 3D CNN model for video classification](https://www.tensorflow.org/tutorials/video/video_classification): Note that this tutorial uses a (2+1)D CNN that decomposes the spatial and temporal aspects of 3D data; if you are using volumetric data such as an MRI scan, consider using a 3D CNN instead of a (2+1)D CNN.
- [MoViNet for streaming action recognition](https://www.tensorflow.org/hub/tutorials/movinet): Get familiar with the MoViNet models that are available on TF Hub.
- [Transfer learning for video classification with MoViNet](https://www.tensorflow.org/tutorials/video/transfer_learning_with_movinet): This tutorial explains how to use a pre-trained video classification model trained on a different dataset with the UCF-101 dataset.

## Setup

Begin by installing and importing some necessary libraries, including:
[remotezip](https://github.com/gtsystem/python-remotezip) to inspect the contents of a ZIP file, [tqdm](https://github.com/tqdm/tqdm) to use a progress bar, [OpenCV](https://opencv.org/) to process video files, and [`tensorflow_docs`](https://github.com/tensorflow/docs/tree/master/tools/tensorflow_docs) for embedding data in a Jupyter notebook.

In [None]:
# The way this tutorial uses the `TimeDistributed` layer requires TF>=2.10
!pip install -U "tensorflow>=2.10.0"

In [None]:
!pip install remotezip tqdm opencv-python imageio
!pip install -q git+https://github.com/tensorflow/docs

In [1]:
import tqdm
import random
import pathlib
import itertools
import collections

import os
import cv2
import numpy as np
import remotezip as rz

import tensorflow as tf

# Some modules to display an animation using imageio.
import imageio
from IPython import display
from urllib import request
from tensorflow_docs.vis import embed

## UCF101 dataset

The [UCF101 dataset](https://www.tensorflow.org/datasets/catalog/ucf101) contains 101 categories of different actions in video, primarily used in action recognition. You will use a subset of these categories in this demo.

In [2]:
URL = "https://storage.googleapis.com/thumos14_files/UCF101_videos.zip"

The URL above points to a zip file containing the UCF101 dataset.

Let's create a function that uses the `remotezip` library to examine the contents of the zip file at that URL:

In [3]:
def list_files_from_zip_url(zip_url):
  """
  List every file in the dataset zip archive at the given URL.

  Args:
    zip_url: A URL from which the files can be extracted.

  Returns:
    A flat list of the file paths stored in the archive.
  """
  files = []
  # Avoid naming the archive `zip`, which would shadow the Python built-in.
  with rz.RemoteZip(zip_url) as zip_file:
    for zip_info in zip_file.infolist():
      files.append(zip_info.filename)
  return files

In [4]:
files = list_files_from_zip_url(URL)
files = [f for f in files if f.endswith(".avi")]
files[:10]

['UCF101/v_ApplyEyeMakeup_g01_c01.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c02.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c03.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c04.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c05.avi',
 'UCF101/v_ApplyEyeMakeup_g01_c06.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c01.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c02.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c03.avi',
 'UCF101/v_ApplyEyeMakeup_g02_c04.avi']
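
The listing step relies only on `infolist()` and the `filename` attribute of each entry, an interface that the standard library's `zipfile.ZipFile` also provides. As a minimal offline sketch, the same filtering can be tried on a tiny in-memory archive. The archive contents and the helper name `list_avi_files` are made up for illustration and are not part of the dataset or of `remotezip`.

```python
import io
import zipfile

def list_avi_files(zip_file):
  """Return the names of the .avi entries in an open zip archive."""
  return [info.filename for info in zip_file.infolist()
          if info.filename.endswith(".avi")]

# Build a small in-memory archive (hypothetical contents, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
  archive.writestr("UCF101/v_ApplyEyeMakeup_g01_c01.avi", b"")
  archive.writestr("UCF101/v_Archery_g01_c01.avi", b"")
  archive.writestr("UCF101/readme.txt", b"")

# Reopen the archive for reading and keep only the video entries.
with zipfile.ZipFile(buffer) as archive:
  avi_files = list_avi_files(archive)

print(avi_files)
```

Only the two `.avi` entries survive the filter; the text file is dropped, just as non-video entries are dropped from the real listing.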

Begin with a few videos and a limited number of classes for training. After running the above code block, notice that the class name is included in the filename of each video.

Let's define the `get_class` function that retrieves the class name from a filename. Then, create a function called `get_files_per_class` which converts the list of all files (`files` above) into a dictionary listing the files for each class:

In [5]:
def get_class(fname):
  """Retrieve the name of the class given a filename"""
  return fname.split("_")[-3]
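
To see why index `-3` picks out the class, it helps to look at the pieces produced by `split("_")`. The snippet below is an illustrative walk-through on one of the filenames listed earlier; it redefines `get_class` so that it runs on its own. It assumes the `v_<Class>_g<group>_c<clip>.avi` naming pattern visible in that listing, with no underscores inside the class name.

```python
def get_class(fname):
  """Retrieve the name of the class given a filename."""
  return fname.split("_")[-3]

fname = "UCF101/v_ApplyEyeMakeup_g01_c01.avi"

# Splitting on "_" leaves the class name third from the end.
parts = fname.split("_")
print(parts)             # ['UCF101/v', 'ApplyEyeMakeup', 'g01', 'c01.avi']
print(get_class(fname))  # ApplyEyeMakeup
```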

In [6]:
def get_files_per_class(files):
  """Retrieve the files that belong to each class."""
  files_for_class = collections.defaultdict(list)
  for fname in files:
    class_name = get_class(fname)
    files_for_class[class_name].append(fname)
  return files_for_class
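
As a quick sanity check, the grouping can be tried on a handful of filenames before running it on the full listing. This is a small sketch that repeats both helper functions so it is self-contained; `sample_files` and `sample_groups` are hypothetical names used only for this illustration.

```python
import collections

def get_class(fname):
  """Retrieve the name of the class given a filename."""
  return fname.split("_")[-3]

def get_files_per_class(files):
  """Retrieve the files that belong to each class."""
  files_for_class = collections.defaultdict(list)
  for fname in files:
    files_for_class[get_class(fname)].append(fname)
  return files_for_class

# A tiny, hand-written sample in the dataset's naming pattern.
sample_files = [
  "UCF101/v_ApplyEyeMakeup_g01_c01.avi",
  "UCF101/v_ApplyEyeMakeup_g01_c02.avi",
  "UCF101/v_Archery_g01_c01.avi",
]

sample_groups = get_files_per_class(sample_files)
for class_name, class_files in sample_groups.items():
  print(class_name, len(class_files))
```

Using `collections.defaultdict(list)` means a class gets an empty list the first time it is seen, so no explicit key check is needed before appending.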

Once you have the list of files per class, you can choose how many classes you would like to use and how many videos you would like per class in order to create your dataset. 

In [7]:
NUM_CLASSES = 10
FILES_PER_CLASS = 50

In [10]:
files_for_class = get_files_per_class(files)
classes = list(files_for_class.keys())

print(f"Num classes: {len(classes)}")
print(f"Classes: {classes}")
print(f"Num videos for class[0]: {len(files_for_class[classes[0]])}")
print(f"Few videos for class[0]: {files_for_class[classes[0]][:3]}")

Num classes: 101
Classes: ['ApplyEyeMakeup', 'ApplyLipstick', 'Archery', 'BabyCrawling', 'BalanceBeam', 'BandMarching', 'BaseballPitch', 'BasketballDunk', 'Basketball', 'BenchPress', 'Biking', 'Billiards', 'BlowDryHair', 'BlowingCandles', 'BodyWeightSquats', 'Bowling', 'BoxingPunchingBag', 'BoxingSpeedBag', 'BreastStroke', 'BrushingTeeth', 'CleanAndJerk', 'CliffDiving', 'CricketBowling', 'CricketShot', 'CuttingInKitchen', 'Diving', 'Drumming', 'Fencing', 'FieldHockeyPenalty', 'FloorGymnastics', 'FrisbeeCatch', 'FrontCrawl', 'GolfSwing', 'Haircut', 'Hammering', 'HammerThrow', 'HandstandPushups', 'HandstandWalking', 'HeadMassage', 'HighJump', 'HorseRace', 'HorseRiding', 'HulaHoop', 'IceDancing', 'JavelinThrow', 'JugglingBalls', 'JumpingJack', 'JumpRope', 'Kayaking', 'Knitting', 'LongJump', 'Lunges', 'MilitaryParade', 'Mixing', 'MoppingFloor', 'Nunchucks', 'ParallelBars', 'PizzaTossing', 'PlayingCello', 'PlayingDaf', 'PlayingDhol', 'PlayingFlute', 'PlayingGuitar', 'PlayingPiano', 'PlayingSitar', 'Play

In [14]:
videos = list(files_for_class.values())
print(f"Few videos for classes: {videos[:10]}")

Few videos for classes: [['UCF101/v_ApplyEyeMakeup_g01_c01.avi', 'UCF101/v_ApplyEyeMakeup_g01_c02.avi', 'UCF101/v_ApplyEyeMakeup_g01_c03.avi', 'UCF101/v_ApplyEyeMakeup_g01_c04.avi', 'UCF101/v_ApplyEyeMakeup_g01_c05.avi', 'UCF101/v_ApplyEyeMakeup_g01_c06.avi', 'UCF101/v_ApplyEyeMakeup_g02_c01.avi', 'UCF101/v_ApplyEyeMakeup_g02_c02.avi', 'UCF101/v_ApplyEyeMakeup_g02_c03.avi', 'UCF101/v_ApplyEyeMakeup_g02_c04.avi', 'UCF101/v_ApplyEyeMakeup_g03_c01.avi', 'UCF101/v_ApplyEyeMakeup_g03_c02.avi', 'UCF101/v_ApplyEyeMakeup_g03_c03.avi', 'UCF101/v_ApplyEyeMakeup_g03_c04.avi', 'UCF101/v_ApplyEyeMakeup_g03_c05.avi', 'UCF101/v_ApplyEyeMakeup_g03_c06.avi', 'UCF101/v_ApplyEyeMakeup_g04_c01.avi', 'UCF101/v_ApplyEyeMakeup_g04_c02.avi', 'UCF101/v_ApplyEyeMakeup_g04_c03.avi', 'UCF101/v_ApplyEyeMakeup_g04_c04.avi', 'UCF101/v_ApplyEyeMakeup_g04_c05.avi', 'UCF101/v_ApplyEyeMakeup_g04_c06.avi', 'UCF101/v_ApplyEyeMakeup_g04_c07.avi', 'UCF101/v_ApplyEyeMakeup_g05_c01.avi', 'UCF101/v_ApplyEyeMakeup_g05_c02.avi',

Now, let's create a new function called `select_subset_of_classes` that selects a subset of the classes present within the dataset and a particular number of files per class: