# I3D (Inflated 3D ConvNet) for Video Classification

This notebook demonstrates how to use a pre-trained I3D model from TensorFlow Hub for video action recognition. I3D is a 3D convolutional neural network built by 'inflating' a 2D image classification architecture (Inception-v1): its 2D convolution and pooling kernels are expanded along the time dimension, which lets the network learn spatio-temporal features from video data.

We will:
1.  Load a pre-trained I3D model (trained on the Kinetics-400 dataset).
2.  Download a sample video.
3.  Preprocess the video to fit the model's input requirements.
4.  Run inference to classify the action in the video.
5.  Display the top predictions.

## 1. Setup

First, let's install the required libraries. In Google Colab you can run the cell below as-is. If you are running locally, make sure you have a compatible Python environment first.

In [None]:
# Install the required packages. Comment out the line below if they are already installed.
!pip install -q tensorflow tensorflow-hub opencv-python-headless imageio matplotlib

Now, let's import all the required modules.

In [None]:
import os
import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from urllib import request

from IPython.display import display, Image

## 2. Load Model and Labels

We'll load the I3D model pre-trained on the Kinetics-400 dataset from TensorFlow Hub. We also need the corresponding class labels to interpret the model's output.

In [None]:
# URL for the I3D model on TensorFlow Hub
I3D_MODEL_URL = "https://tfhub.dev/deepmind/i3d-kinetics-400/1"

# URL for the Kinetics-400 labels file
KINETICS_400_LABELS_URL = 'https://raw.githubusercontent.com/deepmind/kinetics-i3d/master/data/label_map.txt'
KINETICS_400_LABELS_PATH = 'kinetics_400_labels.txt'

# Download the labels file
if not os.path.exists(KINETICS_400_LABELS_PATH):
    print(f"Downloading {KINETICS_400_LABELS_URL} to {KINETICS_400_LABELS_PATH}")
    request.urlretrieve(KINETICS_400_LABELS_URL, KINETICS_400_LABELS_PATH)

# Load the labels
with open(KINETICS_400_LABELS_PATH) as f:
    kinetics_400_labels = [line.strip() for line in f.readlines()]

print(f"Loaded {len(kinetics_400_labels)} labels from Kinetics-400 dataset.")
print("Example labels:", kinetics_400_labels[:5])

# Load the I3D model
print("\nLoading I3D model from TensorFlow Hub...")
i3d = hub.load(I3D_MODEL_URL)
print("Model loaded.")
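The label file is read as one label per line, and the model's output index is used directly as an index into that list. As a minimal sketch of that mapping (the helper names `parse_labels` and `label_for_index` and the three-line demo text are illustrative assumptions, not part of the original notebook), the version below also skips blank lines so that a trailing newline cannot add an empty label:

```python
def parse_labels(text):
    """Return one label per non-empty line, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def label_for_index(labels, index):
    """Map a class index to its label, failing loudly on an out-of-range index."""
    if not 0 <= index < len(labels):
        raise IndexError(f"class index {index} is outside 0..{len(labels) - 1}")
    return labels[index]


# Toy stand-in for the downloaded label file (note the trailing blank line).
demo_text = "abseiling\nair drumming\nanswering questions\n\n"
demo_labels = parse_labels(demo_text)
print(demo_labels)
print(label_for_index(demo_labels, 1))
```

With the real file, checking `len(kinetics_400_labels) == 400` right after loading is a cheap way to catch a truncated or malformed download before running inference.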

## 3. Preprocessing Helper Function

The I3D model expects a video tensor of a specific shape and format. The input tensor should have shape `[batch_size, num_frames, height, width, 3]`, with RGB pixel values normalized to the range [0, 1] and frames resized to 224x224. We'll create a helper function that loads a video from a path (or URL) and preprocesses it accordingly.

In [None]:
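def load_video(path, max_frames=64, resize=(224, 224)):
    """Loads up to `max_frames` frames from a video and preprocesses them.

    Frames are converted to RGB, resized to `resize` (width, height), and
    scaled to [0, 1]. Returns an array of shape
    [1, num_frames, height, width, 3], or an empty array if no frame
    could be read.
    """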
    cap = cv2.VideoCapture(path)
    frames = []
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            # OpenCV reads in BGR, convert to RGB
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            # Resize frame
            frame = cv2.resize(frame, resize)
            # Normalize to [0, 1]
            frame = frame / 255.0
            frames.append(frame)
            
            if len(frames) == max_frames:
                break
    finally:
        cap.release()
    
    frames = np.array(frames)
    if len(frames) < 1:
        return np.array([])
        
    # Add a batch dimension
    video_tensor = np.expand_dims(frames, axis=0)
    return video_tensor
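`load_video` keeps the first `max_frames` frames, so a long video is represented only by its opening moments, and a short one yields fewer frames than requested. One possible alternative, sketched here under assumptions rather than taken from the original pipeline, is to sample frames evenly across the whole clip and to loop short clips until they reach the target length. The helper names `sample_frame_indices` and `fit_to_length` are hypothetical.

```python
import numpy as np


def sample_frame_indices(total_frames, num_frames):
    """Pick num_frames indices spread evenly over [0, total_frames - 1]."""
    if total_frames < 1 or num_frames < 1:
        raise ValueError("total_frames and num_frames must be positive")
    # linspace includes both endpoints; rounding maps each position to a valid index.
    return np.linspace(0, total_frames - 1, num=num_frames).round().astype(int)


def fit_to_length(frames, num_frames):
    """Return exactly num_frames frames: subsample long clips, loop short ones."""
    frames = np.asarray(frames)
    total = frames.shape[0]
    if total < 1:
        raise ValueError("need at least one frame")
    if total >= num_frames:
        return frames[sample_frame_indices(total, num_frames)]
    # Short clip: repeat it from the start until it is long enough, then trim.
    repeats = -(-num_frames // total)  # ceiling division
    return np.concatenate([frames] * repeats, axis=0)[:num_frames]


# Toy clips with tiny 2x2 frames; real frames would be 224x224x3.
long_clip = np.zeros((10, 2, 2, 3))
short_clip = np.arange(3).reshape(3, 1, 1, 1)
print(fit_to_length(long_clip, 4).shape)
print(fit_to_length(short_clip, 7)[:, 0, 0, 0])
```

Applying this would mean collecting all decoded frames first and calling `fit_to_length` before the batch dimension is added, which costs more memory than stopping at `max_frames`.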

## 4. Download Sample Video and Run Inference

Let's test our setup with a sample video. We'll download a GIF of someone playing guitar, which corresponds to the 'playing guitar' class in the Kinetics-400 dataset.

In [None]:
# A sample video URL (e.g., a person playing guitar)
SAMPLE_VIDEO_URL = "https://upload.wikimedia.org/wikipedia/commons/8/86/End_of_a_jam.gif"
SAMPLE_VIDEO_PATH = "sample.gif"

# Download the sample video
if not os.path.exists(SAMPLE_VIDEO_PATH):
    print(f"Downloading {SAMPLE_VIDEO_URL} to {SAMPLE_VIDEO_PATH}")
    request.urlretrieve(SAMPLE_VIDEO_URL, SAMPLE_VIDEO_PATH)

# Display the downloaded video
print("\nSample video:")
display(Image(SAMPLE_VIDEO_PATH, width=300))

# Load and preprocess the video
print("\nPreprocessing video...")
video_tensor = load_video(SAMPLE_VIDEO_PATH)
print(f"Video tensor shape: {video_tensor.shape}")

# Ensure the video tensor is not empty
if video_tensor.size == 0:
    print("Could not load video. Please check the path or URL.")
else:
    # The model expects a float32 tensor
    video_tensor = tf.convert_to_tensor(video_tensor, dtype=tf.float32)

    # Get the model's signature
    model_signature = i3d.signatures['default']

    # Run inference
    print("\nRunning inference...")
    logits = model_signature(video_tensor)['default']
    probabilities = tf.nn.softmax(logits)

    # Get the top 5 predictions
    top_k = 5
    top_predictions = tf.math.top_k(probabilities, k=top_k)
    top_indices = top_predictions.indices.numpy().flatten()
    top_probs = top_predictions.values.numpy().flatten()

    print(f"\nTop {top_k} predictions:")
    for i in range(top_k):
        label = kinetics_400_labels[top_indices[i]]
        prob = top_probs[i]
        print(f"{i+1}. {label}: {prob:.2%}")
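The cell above turns the model's raw scores (logits) into probabilities with a softmax and then keeps the five largest. To make those two steps concrete, here is a small NumPy sketch on made-up logits. The `softmax` and `top_k` functions are illustrative re-implementations for a single 1-D score vector, not the TensorFlow ops themselves.

```python
import numpy as np


def softmax(logits):
    """Convert a 1-D vector of logits into probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def top_k(probabilities, k):
    """Return (indices, values) of the k largest entries, largest first."""
    probabilities = np.asarray(probabilities)
    order = np.argsort(probabilities)[::-1][:k]
    return order, probabilities[order]


toy_logits = [2.0, 0.5, 3.0, -1.0]  # made-up scores for four classes
probs = softmax(toy_logits)
indices, values = top_k(probs, k=2)
print(indices, values)
```

Because softmax preserves the ordering of the logits, the top-ranked classes are the same whether you rank logits or probabilities. The softmax only matters for reporting scores as percentages.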

## 5. Try Another Example

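Let's try another video, this time of someone doing a front flip. Kinetics-400 may not have a label with exactly that name, so expect the top predictions to be related acrobatic classes such as 'somersaulting' or 'gymnastics tumbling'.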

In [None]:
# Another sample video URL (e.g., a person doing a front flip)
SAMPLE_VIDEO_URL_2 = "https://upload.wikimedia.org/wikipedia/commons/4/42/Front_flip.gif"
SAMPLE_VIDEO_PATH_2 = "sample_2.gif"

# Download the sample video
if not os.path.exists(SAMPLE_VIDEO_PATH_2):
    print(f"Downloading {SAMPLE_VIDEO_URL_2} to {SAMPLE_VIDEO_PATH_2}")
    request.urlretrieve(SAMPLE_VIDEO_URL_2, SAMPLE_VIDEO_PATH_2)

# Display the downloaded video
print("\nSample video 2:")
display(Image(SAMPLE_VIDEO_PATH_2, width=300))

# Load and preprocess the video
print("\nPreprocessing video...")
video_tensor_2 = load_video(SAMPLE_VIDEO_PATH_2)
print(f"Video tensor shape: {video_tensor_2.shape}")

if video_tensor_2.size == 0:
    print("Could not load video. Please check the path or URL.")
else:
    video_tensor_2 = tf.convert_to_tensor(video_tensor_2, dtype=tf.float32)

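    # Run inference. Look the signature up again so this cell does not depend
    # on the previous example having loaded its video successfully.
    print("\nRunning inference...")
    logits_2 = i3d.signatures['default'](video_tensor_2)['default']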
    probabilities_2 = tf.nn.softmax(logits_2)

    # Get the top 5 predictions
    top_k = 5
    top_predictions_2 = tf.math.top_k(probabilities_2, k=top_k)
    top_indices_2 = top_predictions_2.indices.numpy().flatten()
    top_probs_2 = top_predictions_2.values.numpy().flatten()

    print(f"\nTop {top_k} predictions:")
    for i in range(top_k):
        label = kinetics_400_labels[top_indices_2[i]]
        prob = top_probs_2[i]
        print(f"{i+1}. {label}: {prob:.2%}")
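The label file and both sample videos are fetched with the same "download only if the file is missing" pattern. A possible way to factor it out is sketched below. The helper name `download_if_missing`, its injectable `fetch` parameter, and the `.part` temporary suffix are assumptions made for this sketch. The demo uses a fake fetcher and a placeholder URL so it runs without network access.

```python
import os
import tempfile
from urllib import request


def download_if_missing(url, path, fetch=request.urlretrieve):
    """Download url to path unless path already exists, then return path."""
    if os.path.exists(path):
        return path
    # Fetch to a temporary name first so an interrupted transfer does not
    # leave a partial file that later runs would mistake for a complete one.
    partial = path + ".part"
    fetch(url, partial)
    os.replace(partial, path)
    return path


# Demo with a fake fetcher that records calls and writes a few bytes.
calls = []


def fake_fetch(url, filename):
    calls.append(url)
    with open(filename, "wb") as f:
        f.write(b"demo")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sample.gif")
    download_if_missing("https://example.invalid/sample.gif", target, fetch=fake_fetch)
    download_if_missing("https://example.invalid/sample.gif", target, fetch=fake_fetch)
    print(len(calls))  # 1: the second call is skipped because the file exists
```

In the notebook this would be called as, for example, `download_if_missing(SAMPLE_VIDEO_URL, SAMPLE_VIDEO_PATH)`, leaving the default `fetch` in place.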