<div class="alert alert-info">

#### **LSTM Experiment**

In this notebook we are going to train a simple LSTM classifier on the climbing dataset.

</div>

In [1]:
%load_ext autoreload
%autoreload 2

<div class="alert alert-info">

#### **1- Preliminary**

We first run a preliminary step to prepare the datasets. For more details, see the `experiments/preliminary.ipynb` notebook.

</div>

In [2]:
from experiments.helpers.preliminary import preliminary, FilteringMode, FilteringOperator

In [3]:
# FILTERING_MODE = FilteringMode(0)
# FILTERING_MODE = FilteringMode.NO_PERSONLESS
# FILTERING_MODE = FilteringMode.NO_PERSONLESS | FilteringMode.NO_NOTHING_CLASS
FILTERING_MODE = FilteringMode.NO_PERSONLESS | FilteringMode.NO_NOTHING_CLASS | FilteringMode.NO_STOPWATCH_CLASS
# FILTERING_MODE = FilteringMode.NO_PERSONLESS | FilteringMode.NO_STOPWATCH_CLASS | FilteringMode.NO_NOTHING_CLASS | FilteringMode.NO_MULTI_CLASS
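
The `|` operator used above suggests that `FilteringMode` is a flag-style enumeration. The sketch below is not the helper's actual code: it uses a hypothetical `DemoFilteringMode` built on Python's `enum.Flag`, and a hypothetical `should_drop` function, to illustrate how combined flags and an OR/AND operator could decide whether a segment is filtered out. The real semantics live in `experiments/helpers/preliminary` and may differ.

```python
from enum import Flag, auto


class DemoFilteringMode(Flag):
    # Hypothetical stand-in for FilteringMode; the member names mirror the notebook.
    NO_PERSONLESS = auto()
    NO_NOTHING_CLASS = auto()
    NO_STOPWATCH_CLASS = auto()


def should_drop(segment_flags, mode, use_or=True):
    """Return True when a segment matches the filtering mode.

    use_or=True: drop the segment if it matches any requested flag.
    use_or=False: drop it only if it matches all requested flags.
    """
    matched = segment_flags & mode
    if use_or:
        return bool(matched)
    return matched == mode


mode = DemoFilteringMode.NO_PERSONLESS | DemoFilteringMode.NO_NOTHING_CLASS
personless_segment = DemoFilteringMode.NO_PERSONLESS  # matches one of the two flags

print(should_drop(personless_segment, mode, use_or=True))   # True
print(should_drop(personless_segment, mode, use_or=False))  # False
```

Under this reading, the OR operator is the more aggressive choice, since matching a single criterion is enough to remove a segment.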

In [4]:
datasets, filtered_datasets, extractors = preliminary(
    filtering_mode=FILTERING_MODE,
    filtering_operator=FilteringOperator.OR
)

Using cache found in /Users/nadir/.cache/torch/hub/facebookresearch_pytorchvideo_main


[missing-keys]: <All keys matched successfully>




[INFO]: frames for "climb_1-climber_MoubeAdrian-bloc_1-angle_face" already exist. skipping extraction.
[INFO]: frames for "climb_1-climber_MoubeAdrian-bloc_1-angle_profile" already exist. skipping extraction.
[INFO]: frames for "climb_10-climber_DouglasSophia-bloc_1-angle_face" already exist. skipping extraction.
[INFO]: frames for "climb_10-climber_DouglasSophia-bloc_1-angle_profile" already exist. skipping extraction.
[INFO]: frames for "climb_11-climber_MoubeAdrian-bloc_2-angle_face" already exist. skipping extraction.
[INFO]: frames for "climb_11-climber_MoubeAdrian-bloc_2-angle_profile" already exist. skipping extraction.
[INFO]: frames for "climb_12-climber_MrideEsteban-bloc_2-angle_face" already exist. skipping extraction.
[INFO]: frames for "climb_12-climber_MrideEsteban-bloc_2-angle_profile" already exist. skipping extraction.
[INFO]: frames for "climb_13-climber_FonneLana-bloc_2-angle_face" already exist. skipping extraction.
[INFO]: frames for "climb_13-climber_FonneLana-blo

In [5]:
initial_size = len(datasets[0])
filtered_size = len(filtered_datasets[0])

reduction_percentage = 100 * (initial_size - filtered_size) / initial_size

print(f"[filtering]: {reduction_percentage:.2f}%")

[filtering]: 21.43%


<div class="alert alert-info">

#### **2- Data Adaptation**

The dataset is structured in terms of segments, so we need to group the segments by video and order them so that a whole video can be passed to the LSTM as one sequence.

</div>
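
The grouping step can be pictured with a small pure-Python sketch. It is only an illustration of the idea, not the code of `FullVideosFeaturesDataset`: the segment tuples, the video names, and the `group_by_video` helper are all hypothetical, and real samples carry feature tensors rather than bare labels.

```python
from collections import defaultdict

# Hypothetical segments: (video_id, segment_index, class_index), deliberately out of order.
segments = [
    ("video_b", 1, 2),
    ("video_a", 0, 0),
    ("video_b", 0, 1),
    ("video_a", 1, 2),
]


def group_by_video(segments):
    """Group segments per video and sort each group by segment index."""
    grouped = defaultdict(list)
    for video_id, segment_index, class_index in segments:
        grouped[video_id].append((segment_index, class_index))

    # Sorting the (segment_index, class_index) pairs restores temporal order.
    return {
        video_id: [class_index for _, class_index in sorted(items)]
        for video_id, items in sorted(grouped.items())
    }


videos = group_by_video(segments)
print(videos)  # {'video_a': [0, 2], 'video_b': [1, 2]}
```

Each value is then one ordered sequence per video, which is the shape of input a recurrent model expects.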

In [6]:
import torch

import numpy as np

from experiments.helpers.full_videos_features_dataset import FullVideosFeaturesDataset

In [7]:
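def transform(sample):
    # Each sample holds one video's per-segment features and annotations; video_id is unused here.
    features, annotations, video_id = sample

    # Stack the segments into a single tensor and keep only the first annotation column.
    return torch.stack(features), torch.tensor(np.array(annotations)[:, 0])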
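
To make the effect of this transform concrete, here is a toy run on made-up data. It assumes that each segment's annotation is a short sequence whose first entry is the class index; that layout is an assumption for illustration, and the feature size of 4 is arbitrary.

```python
import numpy as np
import torch

# Toy video with 3 segments, each with a 4-dimensional feature vector (made-up values).
features = [torch.zeros(4), torch.ones(4), torch.full((4,), 2.0)]

# Assumed layout: one annotation list per segment, class index in the first position.
annotations = [[1, 0], [3, 1], [3, 0]]

stacked = torch.stack(features)                      # (segments, feature_size)
labels = torch.tensor(np.array(annotations)[:, 0])   # first column only

print(stacked.shape)  # torch.Size([3, 4])
print(labels)         # tensor([1, 3, 3])
```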

In [8]:
videos_datasets = [
    FullVideosFeaturesDataset(
        dataset=dataset,
        transform=transform,
        verbose=True    
    ) for dataset in filtered_datasets
]

100%|██████████| 3220/3220 [00:00<00:00, 14353.92it/s]
100%|██████████| 3220/3220 [00:00<00:00, 4854.00it/s]
100%|██████████| 3220/3220 [00:00<00:00, 5074.11it/s]
100%|██████████| 3220/3220 [00:00<00:00, 6198.88it/s]
100%|██████████| 3220/3220 [00:00<00:00, 6884.51it/s]
100%|██████████| 3220/3220 [00:00<00:00, 6918.43it/s]
100%|██████████| 3220/3220 [00:00<00:00, 7882.30it/s]
100%|██████████| 3220/3220 [00:00<00:00, 7388.47it/s]
100%|██████████| 3220/3220 [00:00<00:00, 6590.41it/s]
100%|██████████| 3220/3220 [00:00<00:00, 4918.06it/s]
100%|██████████| 3220/3220 [00:00<00:00, 7422.00it/s]


<div class="alert alert-info">

#### **3- Model Definition**

Now we define the globally temporal-aware model and its training functions.

</div>

In [9]:
class GloballyTemporalAwareModel(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1, dropout=0.0):
        """
        Parameters:
        -----------
        input_size: The feature size for each time step.
        hidden_size: The number of hidden units in the LSTM.
        output_size: The number of classes.
        num_layers: The number of LSTM layers. Default is 1.
        dropout: The dropout probability. Default is 0.0.
        """
        super(GloballyTemporalAwareModel, self).__init__()
        
        self.lstm = torch.nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, dropout=dropout)
        self.fc = torch.nn.Linear(hidden_size, output_size)

    def forward(self, x):
        if x.dim() == 4:
            x = x.flatten(start_dim=2, end_dim=3)
            
        lstm_out, (final_hidden_states, final_cell_states) = self.lstm(x, None)
        
        output = self.fc(lstm_out)
        
        return output
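
Because the linear layer is applied to `lstm_out` rather than to the final hidden state, the model emits one prediction per time step. The following shape check reproduces the same two layers outside the class; the sizes are illustrative only.

```python
import torch

batch_size, sequence_length, input_size = 1, 16, 272  # illustrative sizes
hidden_size, number_of_classes = 128, 5

lstm = torch.nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=True)
fc = torch.nn.Linear(hidden_size, number_of_classes)

x = torch.rand(batch_size, sequence_length, input_size)
lstm_out, (final_hidden, final_cell) = lstm(x)
logits = fc(lstm_out)  # Linear acts on the last dimension, so every time step is classified

print(lstm_out.shape)      # torch.Size([1, 16, 128])
print(final_hidden.shape)  # torch.Size([1, 1, 128])
print(logits.shape)        # torch.Size([1, 16, 5])
```

With `batch_first=True` the per-step outputs are laid out as (batch, time, hidden), while the final hidden state keeps its (layers, batch, hidden) layout.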

In [10]:
clip = torch.rand(1, 16, 128, 128)

print(f"[clip.shape]: {clip.shape}")

# torch.Size([1, 152, 8, 34])

if clip.dim() == 4:
    clip = clip.flatten(start_dim=2, end_dim=3)
    
print(f"[clip.shape]: {clip.shape}")

[clip.shape]: torch.Size([1, 16, 128, 128])
[clip.shape]: torch.Size([1, 16, 16384])


In [None]:
def train_model_one_epoch(model, train_loader, optimizer, criterion, device):
    model.train()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    for features, annotations in train_loader:
        features, annotations = features.to(device), annotations.to(device)

        optimizer.zero_grad()
        
        # Forward pass
        output = model(features)

        # Compute loss
        loss = criterion(output.view(-1, output.size(-1)), annotations.view(-1))
        loss.backward()
        
        optimizer.step()
        
        running_loss += loss.item()

        # Compute accuracy
        _, predicted = torch.max(output, -1)
        correct_predictions += (predicted.view(-1) == annotations.view(-1)).sum().item()
        total_predictions += annotations.numel()

    avg_loss = running_loss / len(train_loader)
    accuracy = correct_predictions / total_predictions * 100
    return avg_loss, accuracy

def validate_model(model, val_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0

    with torch.no_grad():
        for features, annotations in val_loader:
            features, annotations = features.to(device), annotations.to(device)

            # Forward pass
            output = model(features)
            
            # Compute loss
            loss = criterion(output.view(-1, output.size(-1)), annotations.view(-1))
            running_loss += loss.item()

            # Compute accuracy
            _, predicted = torch.max(output, -1)
            correct_predictions += (predicted.view(-1) == annotations.view(-1)).sum().item()
            total_predictions += annotations.numel()

    avg_loss = running_loss / len(val_loader)
    accuracy = correct_predictions / total_predictions * 100
    return avg_loss, accuracy

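def train_model(model, training_loader, validation_loader):
    learning_rate = 0.001
    num_epochs = 32

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = torch.nn.CrossEntropyLoss()

    # Assumption: the "best" epoch is the one with the highest validation accuracy,
    # and the other best_* values are the metrics recorded at that epoch.
    history = []
    best_epoch = None
    best_training_accuracy = best_validation_accuracy = float("-inf")
    best_training_loss = best_validation_loss = float("inf")

    for epoch in range(num_epochs):
        training_loss, training_accuracy = train_model_one_epoch(model, training_loader, optimizer, criterion, device)
        validation_loss, validation_accuracy = validate_model(model, validation_loader, criterion, device)

        history.append({
            "training_loss": training_loss,
            "training_accuracy": training_accuracy,
            "validation_loss": validation_loss,
            "validation_accuracy": validation_accuracy
        })

        if validation_accuracy > best_validation_accuracy:
            best_epoch = epoch + 1
            best_training_accuracy, best_validation_accuracy = training_accuracy, validation_accuracy
            best_training_loss, best_validation_loss = training_loss, validation_loss
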
    return {
        "history": history,
        "best_training_accuracy": best_training_accuracy,
        "best_validation_accuracy": best_validation_accuracy,
        "best_epoch": best_epoch,
        "best_training_loss": best_training_loss,
        "best_validation_loss": best_validation_loss
    }
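
The `view` calls in the two loops above exist because `torch.nn.CrossEntropyLoss` expects logits of shape (N, C) and class-index targets of shape (N,), whereas the model returns (batch, time, classes). The standalone sketch below uses random tensors with illustrative sizes to show the flattening and the per-time-step accuracy computation.

```python
import torch

criterion = torch.nn.CrossEntropyLoss()

logits = torch.randn(1, 6, 5)          # (batch, time steps, classes)
targets = torch.randint(0, 5, (1, 6))  # (batch, time steps), class indices

flat_logits = logits.view(-1, logits.size(-1))  # (batch * time steps, classes)
flat_targets = targets.view(-1)                 # (batch * time steps,)
loss = criterion(flat_logits, flat_targets)     # scalar, averaged over all time steps

# Accuracy counts every time step as one prediction.
predicted = logits.argmax(dim=-1)
accuracy = (predicted == targets).float().mean().item() * 100

print(flat_logits.shape, flat_targets.shape)
print(f"loss={loss.item():.4f}, accuracy={accuracy:.2f}%")
```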

<div class="alert alert-info">

#### **4- Training**

We train one model per feature extractor, using k-fold cross-validation over the annotated videos.

</div>

In [12]:
class WrapperDataset(torch.utils.data.Dataset):
    def __init__(self, dataset, transform=None):
        self.dataset = dataset
        self.transform = transform
        
    def __getitem__(self, index):
        if self.transform:
            return self.transform(self.dataset[index])
        else:
            return self.dataset[index]
        
    def __len__(self):
        return len(self.dataset)

In [13]:
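# Note: this redefines `transform` from section 2; re-running the dataset
# construction cell after this one would pick up this version instead.
def transform(sample):
    features, annotations = sample

    # Convert class indices of shape (segments,) to one-hot floats of shape (segments, 5).
    annotations = torch.nn.functional.one_hot(annotations, num_classes=5).float()

    return features, annotations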
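
As a quick illustration of what the one-hot conversion produces, the sketch below encodes a few made-up class indices and compares the two target formats with `torch.nn.CrossEntropyLoss`, which accepts class probabilities as targets in PyTorch 1.10 and later. Whether the `Trainer` helper uses this loss is not shown in this notebook, so the comparison is only an assumption-free property of the loss itself.

```python
import torch

indices = torch.tensor([0, 3, 3, 1])  # made-up class indices for 4 segments
one_hot = torch.nn.functional.one_hot(indices, num_classes=5).float()

print(one_hot.shape)           # torch.Size([4, 5])
print(one_hot.argmax(dim=-1))  # tensor([0, 3, 3, 1])

logits = torch.randn(4, 5)
criterion = torch.nn.CrossEntropyLoss()

loss_from_indices = criterion(logits, indices)
loss_from_one_hot = criterion(logits, one_hot)  # probability-style targets

print(torch.allclose(loss_from_indices, loss_from_one_hot))  # True
```

Accuracy code that compares predictions against the targets needs `argmax` on the one-hot tensor first, since the two no longer share a shape.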

In [15]:
NUMBER_OF_FOLDS = 5
NUMBER_ANNOTATED_VIDEOS = 22

from utils import LabelEncoderFactory

from experiments.helpers.trainer import Trainer
from experiments.helpers.splits_generator import splits_generator
from experiments.helpers.videos_to_indices import videos_to_indices

hidden_size = 128
output_size = 5
num_layers = 1
dropout = 0.0

folds_histories: list[dict] = []

for fold_index, folds in enumerate(splits_generator(dataset_length=NUMBER_ANNOTATED_VIDEOS, k=NUMBER_OF_FOLDS)):
    histories = {}
    
    for dataset, extractor in zip(videos_datasets, extractors):
        training_videos_ids, validation_videos_ids = folds
    
        training_dataset = WrapperDataset(torch.utils.data.Subset(dataset, training_videos_ids), transform=transform)
        validation_dataset = WrapperDataset(torch.utils.data.Subset(dataset, validation_videos_ids), transform=transform)
        
        training_dataloader = torch.utils.data.DataLoader(training_dataset, batch_size=1, shuffle=True)
        validation_dataloader = torch.utils.data.DataLoader(validation_dataset, batch_size=1, shuffle=False)
    
        if dataset[0][0].dim() == 3:
            input_size = dataset[0][0].shape[1] * dataset[0][0].shape[2]
        else:
            input_size = dataset[0][0].shape[1]
        
        model = GloballyTemporalAwareModel(input_size, hidden_size, output_size, num_layers, dropout)    
            
        trainer = Trainer(model)
        
        statistics = trainer.train(training_dataloader, validation_dataloader, title=f"[training-{extractor.get_name()}-{fold_index + 1}/{NUMBER_OF_FOLDS}]")
        
        histories[extractor.get_name()] = statistics
        
    folds_histories.append(histories)

[training-yolo-1/5]: 100%|██████████| 32/32 [00:14<00:00,  2.20epoch/s, training-loss=127, training-accuracy=8.82, validation-loss=137, validation-accuracy=22.8, best-validation-accuracy=0, best-training-accuracy=13]     
[training-resnet-3d-1/5]: 100%|██████████| 32/32 [00:16<00:00,  1.91epoch/s, training-loss=125, training-accuracy=10.2, validation-loss=131, validation-accuracy=0, best-validation-accuracy=0, best-training-accuracy=25.3]     
[training-i3d-1/5]: 100%|██████████| 32/32 [00:16<00:00,  1.98epoch/s, training-loss=125, training-accuracy=18.1, validation-loss=134, validation-accuracy=23.8, best-validation-accuracy=141, best-training-accuracy=21.8] 
[training-clip-1/5]: 100%|██████████| 32/32 [00:18<00:00,  1.72epoch/s, training-loss=126, training-accuracy=18.6, validation-loss=136, validation-accuracy=17.8, best-validation-accuracy=14.2, best-training-accuracy=9.71]
[training-x3d_xs-1/5]: 100%|██████████| 32/32 [00:17<00:00,  1.79epoch/s, training-loss=124, training-accurac

KeyboardInterrupt: 
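
The folds in the loop above are built over video indices rather than segment indices, so all segments of a video end up on the same side of the split. The generator below is a hypothetical stand-in for `splits_generator`, written only to show one plausible way of producing such folds; the real helper may shuffle or partition differently.

```python
def demo_splits_generator(dataset_length, k):
    """Yield (training_indices, validation_indices) for k contiguous folds."""
    indices = list(range(dataset_length))
    base, remainder = divmod(dataset_length, k)
    start = 0

    for fold in range(k):
        # The first `remainder` folds get one extra item so that every index is used.
        size = base + (1 if fold < remainder else 0)
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


folds = list(demo_splits_generator(dataset_length=22, k=5))

for training, validation in folds:
    print(len(training), len(validation))
# 17 5
# 17 5
# 18 4
# 18 4
# 18 4
```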

In [None]:
# hidden_size = 128
# output_size = 5
# learning_rate = 0.001
# num_epochs = 32
# num_layers = 1
# dropout = 0.0

# for extractor, dataset in zip(extractors, videos_datasets):
#     if dataset[0][0].dim() == 3:
#         input_size = dataset[0][0].shape[1] * dataset[0][0].shape[2]
#     else:
#         input_size = dataset[0][0].shape[1]
        
#     print(f"[{extractor.get_name()}]:")
#     print(f"[input_size]: {input_size}")
    
#     model = GloballyTemporalAwareModel(input_size, hidden_size, output_size, num_layers, dropout)
    
#     # --- --- ---
    
#     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#     optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
#     criterion = torch.nn.CrossEntropyLoss()

#     # --- --- ---

#     training_size = int(0.7 * len(dataset))
#     validation_size = len(dataset) - training_size

#     training_videos, validation_videos = torch.utils.data.random_split(dataset, [training_size, validation_size])

#     training_dataloader = torch.utils.data.DataLoader(training_videos, batch_size=1, shuffle=True)
#     validation_dataloader = torch.utils.data.DataLoader(validation_videos, batch_size=1, shuffle=False)
        
#     # --- --- ---
    
#     best_val_acc = 0.0
#     best_model_state = None
    
#     for epoch in range(num_epochs):
#         print(f"Epoch {epoch + 1}/{num_epochs}")
        
#         # Train the model
#         train_loss, train_acc = train_model_one_epoch(model, training_dataloader, optimizer, criterion, device)
#         print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_acc:.2f}%")

#         # Validate the model
#         val_loss, val_acc = validate_model(model, validation_dataloader, criterion, device)
#         print(f"Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_acc:.2f}%")

#         # Track the best model
#         if val_acc > best_val_acc:
#             best_val_acc = val_acc
#             best_model_state = model.state_dict()

#     print(f"\nBest Validation Accuracy: {best_val_acc:.2f}%")

[yolo]:
[input_size]: 272
Epoch 1/32
Train Loss: 1.2702, Train Accuracy: 35.21%
Validation Loss: 1.0684, Validation Accuracy: 58.11%
Epoch 2/32
Train Loss: 1.0749, Train Accuracy: 45.46%
Validation Loss: 0.9967, Validation Accuracy: 52.44%
Epoch 3/32
Train Loss: 1.0437, Train Accuracy: 47.13%
Validation Loss: 0.9497, Validation Accuracy: 61.00%
Epoch 4/32
Train Loss: 0.9870, Train Accuracy: 54.22%
Validation Loss: 0.8873, Validation Accuracy: 59.50%
Epoch 5/32
Train Loss: 0.9400, Train Accuracy: 55.08%
Validation Loss: 0.9087, Validation Accuracy: 60.00%
Epoch 6/32
Train Loss: 0.9199, Train Accuracy: 55.94%
Validation Loss: 0.8623, Validation Accuracy: 63.58%
Epoch 7/32
Train Loss: 0.8628, Train Accuracy: 62.35%
Validation Loss: 0.8501, Validation Accuracy: 65.77%
Epoch 8/32
Train Loss: 0.8788, Train Accuracy: 57.83%
Validation Loss: 0.7844, Validation Accuracy: 67.26%
Epoch 9/32
Train Loss: 0.7928, Train Accuracy: 65.10%
Validation Loss: 0.8866, Validation Accuracy: 64.68%
Epoch 10/32
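
One detail matters when tracking the best weights in a loop like the one above: `model.state_dict()` returns tensors that share memory with the live parameters, so a stored reference keeps changing as training continues. The minimal sketch below, using a throwaway `torch.nn.Linear` model, contrasts a plain reference with a cloned snapshot.

```python
import torch

model = torch.nn.Linear(2, 1)

alias = model.state_dict()  # shares storage with the live parameters
snapshot = {key: value.detach().clone() for key, value in model.state_dict().items()}

# Simulate a later optimizer step that updates the weights in place.
with torch.no_grad():
    model.weight.add_(1.0)

print(torch.equal(alias["weight"], model.weight))     # True: the alias followed the update
print(torch.equal(snapshot["weight"], model.weight))  # False: the snapshot kept the old values
```

Cloning the tensors (or using `copy.deepcopy`) is what makes the saved state correspond to the best epoch rather than the last one.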

<div class="alert alert-info">

#### **5- Results**

Below we are going to display the training results for each model.

</div>

In [None]:
# Hyperparameters
input_size = 2048
hidden_size = 128
output_size = 5
learning_rate = 0.001
num_epochs = 32
num_layers = 4
dropout = 0.0

# Initialize model, optimizer, and loss function
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GloballyTemporalAwareModel(input_size, hidden_size, output_size, num_layers, dropout).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = torch.nn.CrossEntropyLoss()

# Training and validation loop with best model tracking
best_val_acc = 0.0
best_model_state = None

for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    
    # Train the model for one epoch
    train_loss, train_acc = train_model_one_epoch(model, train_loader, optimizer, criterion, device)
    print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_acc:.2f}%")

    # Validate the model
    val_loss, val_acc = validate_model(model, val_loader, criterion, device)
    print(f"Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_acc:.2f}%")

    # Track the best model
    if val_acc > best_val_acc:
        best_val_acc = val_acc
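        # state_dict() returns references to the live weights, so clone them to keep this epoch's values
        best_model_state = {key: value.detach().clone() for key, value in model.state_dict().items()}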

print(f"\nBest Validation Accuracy: {best_val_acc:.2f}%")

# Optionally save the best model
torch.save(best_model_state, "best_model.pth")

Epoch 1/32
Train Loss: 1.4450, Train Accuracy: 32.46%
Validation Loss: 1.4717, Validation Accuracy: 34.10%
Epoch 2/32
Train Loss: 1.4060, Train Accuracy: 34.87%
Validation Loss: 1.4194, Validation Accuracy: 34.10%
Epoch 3/32
Train Loss: 1.3418, Train Accuracy: 51.56%
Validation Loss: 1.3300, Validation Accuracy: 51.96%
Epoch 4/32
Train Loss: 1.1888, Train Accuracy: 57.46%
Validation Loss: 1.2516, Validation Accuracy: 51.19%
Epoch 5/32
Train Loss: 1.1000, Train Accuracy: 60.13%
Validation Loss: 1.1343, Validation Accuracy: 55.97%
Epoch 6/32
Train Loss: 1.0252, Train Accuracy: 62.43%
Validation Loss: 1.2163, Validation Accuracy: 53.27%
Epoch 7/32
Train Loss: 1.0032, Train Accuracy: 62.31%
Validation Loss: 1.0949, Validation Accuracy: 58.28%
Epoch 8/32
Train Loss: 0.9996, Train Accuracy: 61.50%
Validation Loss: 1.0743, Validation Accuracy: 58.97%
Epoch 9/32
Train Loss: 0.9073, Train Accuracy: 65.10%
Validation Loss: 1.0754, Validation Accuracy: 59.82%
Epoch 10/32
Train Loss: 0.8663, Train