# Lab CudaVision SS23 - MA-INF 4308 - Overview Notebook

*   Jon Breid


Directory Structure:

```
├── overview_notebook_data
│   ├── models
│   ├── tboard_logs
│   │   ├── angle_RMix
│   │   ├── angle_STMix
│   │   ├── euklid_RMix
│   │   └──  euklid_STMix
│   └──  Visualization
├── utils
│   ├── dataset_utils.py
│   ├── eval.py
│   ├── h3m_utils.py
│   ├── train.py
│   ├── utils.py
│   └──  visualize.py
├── model_architecture.py
└──  README.md
```


Notebooks downloaded from Google Colab do not include animated visuals. The mp4 files of the visualizations can therefore be downloaded from https://drive.google.com/drive/folders/1DkQOK1ne8Mg1eYfegKg-L_efz02Su77H?usp=drive_link
The model files are in the same Drive directory.

## Importing Required Modules

In [27]:
import torch
from torch.utils.data import DataLoader
import numpy as np


from utils.dataset_utils import Human36, AISData
from model_architecture import RectMotionMixerCNN, MotionMixerCNN
from utils.utils import model_params, smooth, save_model, load_model_stats, load_model, set_random_seed, mpjpe_error
from utils.train import Trainer
from utils.eval import eval_model
from utils.visualize import viz_figure, vis_one_prediction

## Train and validation Datasets and Dataloader

### Human3.6m Dataset

The Human3.6m data can be loaded either in exponential map format or as three-dimensional Euclidean coordinates. The number of seed frames is always 10. The number of prediction frames is 10 (or 25 in the test dataset for testing).

The initialization of the dataset can take a few minutes.


**The whole dataset is loaded into the RAM for faster training.**
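
To make the split into seed and prediction frames concrete, the sketch below cuts one motion sequence into (seed, target) pairs with a sliding window. It is plain Python and only illustrates the idea: `make_windows` and its `stride` parameter are hypothetical and not taken from `Human36`, whose actual sampling may differ.

```python
def make_windows(sequence, seed_length=10, prediction_length=10, stride=1):
    """Split a list of frames into (seed, target) pairs (illustrative helper)."""
    window = seed_length + prediction_length
    pairs = []
    # Slide a window of seed + prediction frames over the sequence.
    for start in range(0, len(sequence) - window + 1, stride):
        seed = sequence[start:start + seed_length]
        target = sequence[start + seed_length:start + window]
        pairs.append((seed, target))
    return pairs

# Toy sequence of 25 "frames"; each frame is represented by its index.
frames = list(range(25))
pairs = make_windows(frames, seed_length=10, prediction_length=10, stride=5)
seed, target = pairs[0]
print(len(pairs), seed[0], seed[-1], target[0], target[-1])
```

With a longer `prediction_length` (for example 25, as used for testing) each window needs more frames, so the same sequence yields fewer items.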

##### Loading Data in angular representation

In [8]:
path = '/content/drive/MyDrive/FinalProject/h3.6m' # path to the 'h3.6m' directory, in the form in which it was originally downloaded
prediction_len = 10

In [9]:
valid_dataset_a = Human36(path, d_mode = 'valid', data_format = 'exp_map', seed_length = 10, prediction_length = 10)
train_dataset_a = Human36(path, d_mode = 'train', data_format = 'exp_map', seed_length = 10, prediction_length = 10)
test_dataset_a = Human36(path, d_mode = 'test', data_format = 'exp_map', seed_length = 10, prediction_length = prediction_len)

print(f'The train dataset consists of {len(train_dataset_a)} items, the validation dataset of {len(valid_dataset_a)} items and the test_dataset of {len(test_dataset_a)}.')

The train dataset consists of 9183 items, the validation dataset of 2466 items and the test_dataset of 1443.


In [10]:
from torch.utils.data import DataLoader

train_loader_a = DataLoader(train_dataset_a, batch_size=64, shuffle=True)
valid_loader_a = DataLoader(valid_dataset_a, batch_size=64, shuffle=False)
test_loader_a = DataLoader(test_dataset_a, batch_size=64, shuffle=False, num_workers=0)
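
As a quick sanity check on the loaders above, the number of batches per epoch follows from the dataset sizes printed earlier and the batch size of 64. The sketch relies on `DataLoader` keeping the last, smaller batch (its default `drop_last=False`); `batch_layout` is a hypothetical helper written for this illustration.

```python
import math

def batch_layout(n_items, batch_size):
    """Return (number of batches, size of the last batch) when no batch is dropped."""
    if n_items == 0:
        return 0, 0
    n_batches = math.ceil(n_items / batch_size)
    last_batch = n_items - (n_batches - 1) * batch_size
    return n_batches, last_batch

# Dataset sizes as printed by the notebook for the exponential-map datasets.
for name, n_items in [("train", 9183), ("valid", 2466), ("test", 1443)]:
    print(name, batch_layout(n_items, 64))
```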

##### Loading Data in three-dimensional Euclidean space

In [11]:
path = '/content/drive/MyDrive/FinalProject/h3.6m'
prediction_len = 10

In [12]:
valid_dataset_e = Human36(path, d_mode = 'valid', data_format = 'coord_3d', seed_length = 10, prediction_length = 10)
train_dataset_e = Human36(path, d_mode = 'train', data_format = 'coord_3d', seed_length = 10, prediction_length = 10)
test_dataset_e = Human36(path, d_mode = 'test', data_format = 'coord_3d', seed_length = 10, prediction_length = prediction_len)



In [13]:
train_loader_e = DataLoader(train_dataset_e, batch_size=64, shuffle=True)
valid_loader_e = DataLoader(valid_dataset_e, batch_size=64, shuffle=False)
test_loader_e = DataLoader(test_dataset_e, batch_size=64, shuffle=False, num_workers=0)

##### Some visualizations for the Human3.6m Dataset.

In the following part a random item from the test dataset is visualized.

The output for the angular data and the Euclidean data is identical.

The videos of the sequence are saved in 'human36m_viz_euklid.mp4' and 'human36m_viz_angle.mp4'.

(The mp4 files are attached in a separate directory and not in this notebook, because Google Colab does not save them in notebooks.)
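
Both representations describe the same pose. In the exponential map format each joint rotation is a 3-vector whose direction is the rotation axis and whose length is the rotation angle. The sketch below converts one such vector into a rotation matrix with Rodrigues' formula. It is a minimal illustration in plain Python, not the conversion used in `h3m_utils.py`; turning the rotations into 3D joint positions additionally requires the skeleton's forward kinematics, which is not shown. `expmap_to_rotmat` and `rotate` are hypothetical names.

```python
import math

def expmap_to_rotmat(r):
    """Convert an exponential-map (axis-angle) vector into a 3x3 rotation matrix."""
    theta = math.sqrt(sum(v * v for v in r))  # rotation angle in radians
    if theta < 1e-8:
        # No rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (v / theta for v in r)  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    # Rodrigues' rotation formula written out element by element.
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]

def rotate(matrix, vector):
    """Apply a 3x3 rotation matrix to a 3D vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# A quarter turn about the z-axis maps the x-axis onto the y-axis.
quarter_turn = expmap_to_rotmat([0.0, 0.0, math.pi / 2])
print([round(v, 6) for v in rotate(quarter_turn, [1.0, 0.0, 0.0])])
```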

In [None]:
test_item = test_dataset_e[129] # pick an arbitrary item from the dataset

save_path = 'human36m_viz_euklid.mp4'

anim = viz_figure(test_item[0], test_item[1], pred_frames = None, save_path= save_path, angles = False)

In [None]:
test_item = test_dataset_a[129] # pick an arbitrary item from the dataset

save_path = 'human36m_viz_angle.mp4'

anim = viz_figure(test_item[0], test_item[1], pred_frames = None, save_path= save_path, angles = True)

#### Some example images from the Human3.6m dataset.




### AIS Data

The sequences for extra validation are always loaded as three-dimensional Euclidean coordinates.

In [16]:
prediction_len = 10

In [17]:
ais_data_path = '/content/drive/MyDrive/FinalProject/VisionLabSS23_3DPoses' # needs to be the directory of the AIS data
AIS_test_data = AISData(data_path = ais_data_path, seed_length = 10, prediction_length = prediction_len)
AIS_test_loader = DataLoader(AIS_test_data, batch_size=64, shuffle=False, num_workers=0)

print(f'The AIS test dataset consists of {len(AIS_test_data)} items.')

The AIS test dataset consists of 836 items.


#### Some visualizations for the extra data.

The visualization is saved in 'AIS_viz.mp4'

In [None]:
ais_test_item = AIS_test_data[145] # pick an arbitrary item from the dataset

save_path = 'AIS_viz.mp4'

anim = viz_figure(ais_test_item[0], ais_test_item[1], pred_frames = None, save_path= save_path, angles = False)

## Initialize Models and Training

In the following, the best models for the angular and the Euclidean data representation are initialized: for each representation a *Convolutional STMixer* and a *Convolutional Rectangular Mixer*.

### Convolutional STMixer

#### Angular data

In [19]:
angle_STMix = MotionMixerCNN(input_size_spatial = 51,
               input_size_temporal = 10,
               embedding_size = 128,
               n_block = 5,
               n_output = 10,
               n_temp_layers = 3,
               n_spatial_layers = 3, # pos 3
               temp_kernel_size = 9, # 4
               spatial_kernel_size = 51, # 5
               hidden_size = 32) # 6

print(f"The model has {model_params(angle_STMix)} parameter.")

The model has 342945 parameter.


#### Euclidean data format

In [20]:
euklid_STMix = MotionMixerCNN(input_size_spatial = 51,
               input_size_temporal = 10, # not in config_string
               embedding_size = 128, # pos 0
               n_block = 5, # pos 1
               n_output = 10, # not important
               n_temp_layers = 3, # pos 2
               n_spatial_layers = 3, # pos 3
               temp_kernel_size = 9, # 4
               spatial_kernel_size = 51, # 5
               hidden_size = 32) # 6

print(f"The model has {model_params(euklid_STMix)} parameter.")

The model has 342945 parameter.


### Convolutional RectangularMixer

#### Angular data

In [51]:
angle_RMix = RectMotionMixerCNN(input_size_spatial = 51,
                                input_size_temporal = 10,
                                embedding_size = 128,
                                n_block = 5,
                                n_output = 10,
                                n_layers = 3,
                                kernel_size = (9,15),
                                hidden_size = 32)

print(f"The model has {model_params(angle_RMix)} parameter.")

The model has 749340 parameter.


#### Euclidean data format

In [22]:
euklid_RMix= RectMotionMixerCNN(input_size_spatial = 51,
                                input_size_temporal = 10,
                                embedding_size = 128,
                                n_block = 5,
                                n_output = 10,
                                n_layers = 3,
                                kernel_size = (9,9),
                                hidden_size = 32)

print(f"The model has {model_params(euklid_RMix)} parameter.")

The model has 455580 parameter.
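
The gap between the two Rectangular Mixer parameter counts can be traced back to the kernel size alone. The sketch below counts the weights and biases of the three convolutions in one mixer block, using the channel layout (1 to 32, 32 to 32, 32 to 1, each with bias) that the printed structure of `angle_RMix` shows later in this notebook. That `euklid_RMix` uses the same layout is an assumption; `conv2d_params` and `cnn_block_params` are hypothetical helpers.

```python
def conv2d_params(c_in, c_out, kernel_size, bias=True):
    """Number of learnable parameters of a 2D convolution."""
    k_h, k_w = kernel_size
    return c_out * c_in * k_h * k_w + (c_out if bias else 0)

def cnn_block_params(kernel_size, hidden_size=32):
    """Parameters of three stacked convolutions: 1 -> hidden -> hidden -> 1."""
    channels = [(1, hidden_size), (hidden_size, hidden_size), (hidden_size, 1)]
    return sum(conv2d_params(c_in, c_out, kernel_size) for c_in, c_out in channels)

per_block_angle = cnn_block_params((9, 15))   # kernel of angle_RMix
per_block_euklid = cnn_block_params((9, 9))   # kernel of euklid_RMix
n_block = 5

difference = n_block * (per_block_angle - per_block_euklid)
print(per_block_angle, per_block_euklid, difference)
# The difference matches the gap between the two printed model sizes.
print(difference == 749340 - 455580)
```

Under these assumptions the larger (9, 15) kernel accounts for the entire difference between the two models.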


## Train Models

##### First, create a TensorBoard writer and a directory for the log files.
The trainer does not work without a writer.



In [23]:
config = 'angle_STMix' # or 'angle_RMix' or 'euklid_STMix' or 'euklid_RMix'

In [24]:
import os
import shutil
from torch.utils.tensorboard import SummaryWriter

data_path = "overview_notebook_data/tboard_logs"

TBOARD_LOGS = os.path.join(os.getcwd(), data_path, config)

# Remove logs from a previous run of this config; SummaryWriter recreates the directory.
shutil.rmtree(TBOARD_LOGS, ignore_errors=True)
writer = SummaryWriter(TBOARD_LOGS)

### Train with 3D Euclidean Data

Execute only one of the two cells below, depending on which model you want to train.

In [31]:
model_e = euklid_RMix

In [25]:
model_e = euklid_STMix

In [32]:
SAVE_PATH = '/content/overview_notebook_data/models/' + config

if not os.path.exists(SAVE_PATH):
  os.makedirs(SAVE_PATH)

trainer = Trainer(model = model_e,
                    NUM_EPOCHS  = 61,
                    train_loader = train_loader_e,
                    valid_loader = valid_loader_e,
                    EVAL_FREQ = 5,
                    SAVE_FREQ = 10,
                    criterion = mpjpe_error, # or torch.nn.MSELoss() for angular data
                    savepath = SAVE_PATH,
                    writer = writer,
                    lr = 1e-2,
                    gamma = 0.1,
                    step_size = 10,
                    loadpath = None,
                    input_format = 'coord_3d')

In [None]:
stats, model =  trainer.train() # start the training.
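
The values `lr = 1e-2`, `gamma = 0.1` and `step_size = 10` suggest a step-decay learning rate schedule. Assuming the `Trainer` forwards them to a scheduler in the style of `torch.optim.lr_scheduler.StepLR` (the notebook does not show this), the learning rate per epoch would be as sketched below. `step_lr` is a hypothetical helper that only reproduces the decay rule in plain Python.

```python
def step_lr(epoch, base_lr=1e-2, gamma=0.1, step_size=10):
    """Learning rate after `epoch` completed epochs under step decay."""
    return base_lr * gamma ** (epoch // step_size)

# The rate drops by a factor of 10 every 10 epochs.
for epoch in (0, 9, 10, 20, 30, 60):
    print(f"epoch {epoch:2d}: lr = {step_lr(epoch):.0e}")
```

Under this assumption the learning rate becomes very small after the first few decay steps, so most of the progress would be expected in the early epochs.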

### Train with Exponential Map Data

Execute only one of the two cells below, depending on which model you want to train.

In [34]:
model_a = angle_RMix

In [38]:
model_a = angle_STMix

In [39]:
SAVE_PATH = 'overview_notebook_data/models/' + config

if not os.path.exists(SAVE_PATH):
  os.makedirs(SAVE_PATH)

trainer = Trainer(model = model_a,
                    NUM_EPOCHS  = 61,
                    train_loader = train_loader_a,
                    valid_loader = valid_loader_a,
                    EVAL_FREQ = 5,
                    SAVE_FREQ = 10,
                    criterion = torch.nn.MSELoss(), # MSE loss for angular (exponential map) data
                    savepath = SAVE_PATH,
                    writer = writer,
                    lr = 1e-2,
                    gamma = 0.1,
                    step_size = 10,
                    loadpath = None)

In [None]:
stats, model = trainer.train()

### Train stats

Training stats are recorded with TensorBoard and can be viewed with the following commands.

%load_ext tensorboard #(loads the TensorBoard notebook extension)

%tensorboard --logdir PATH_TO_LOG/

In [None]:
%load_ext tensorboard
%tensorboard --logdir overview_notebook_data/tboard_logs # this has to be the directory containing the log files

All loss and metric curves on the validation dataset look as expected, except for the model using CRMixer layers, whose loss does not appear to converge.
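
Whether a noisy curve converges is easier to judge on a smoothed version, which is what the smoothing slider in the TensorBoard UI provides. The sketch below applies an exponential moving average to a made-up loss curve. It is only an illustration: `ema_smooth` is a hypothetical helper, not the `smooth` function from `utils/utils.py` (whose implementation is not shown here), and the loss values are invented.

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average; a higher `weight` smooths more strongly."""
    if not values:
        return []
    smoothed = []
    last = values[0]  # start from the first value to avoid a bias towards zero
    for value in values:
        last = weight * last + (1.0 - weight) * value
        smoothed.append(last)
    return smoothed

# Invented, strongly oscillating loss values with a slow downward trend.
noisy_loss = [1.0, 0.4, 0.9, 0.3, 0.8, 0.2, 0.7, 0.1]
print([round(v, 3) for v in ema_smooth(noisy_loss, weight=0.6)])
```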

### Load Model Weights

Load the weights for the best models for each variant.

In [42]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Load the best model using a *Convolutional STMixer* trained with exponential map data format.

In [49]:
path_to_models = 'overview_notebook_data/models/' # this needs to be whatever the path to the model files is (same in the cells below)

load_path = path_to_models + 'angle_STMix' + '/model.pth'

angle_STMix, stats = load_model_stats(angle_STMix, load_path)
angle_STMix.to(device)

MotionMixerCNN(
  (poseEmbedding): PoseEmbedding(
    (fc_W0): Conv1d(51, 128, kernel_size=(1,), stride=(1,))
  )
  (mixer): Sequential(
    (0): STMixerBlock(
      (spatial_mixer): SpatialMixCNN(
        (cnn_block): CNNMixer(
          (layers): Sequential(
            (0): Conv2d(1, 32, kernel_size=(1, 51), stride=(1, 1), padding=(0, 25))
            (1): GELU(approximate='none')
            (2): Dropout(p=0.1, inplace=False)
            (3): Conv2d(32, 32, kernel_size=(1, 51), stride=(1, 1), padding=(0, 25))
            (4): GELU(approximate='none')
            (5): Dropout(p=0.1, inplace=False)
            (6): Conv2d(32, 1, kernel_size=(1, 51), stride=(1, 1), padding=(0, 25))
            (7): GELU(approximate='none')
            (8): Dropout(p=0.1, inplace=False)
          )
        )
      )
      (SEblock): SEBlock(
        (squeeze): AdaptiveAvgPool1d(output_size=1)
        (excite): Sequential(
          (0): Linear(in_features=10, out_features=1, bias=False)
          (1): 

Load the best model using a *Convolutional Rectangular Mixer* trained with exponential map data.

In [53]:
path_to_models = 'overview_notebook_data/models/'

load_path = path_to_models + 'angle_RMix' + '/model.pth'
angle_RMix, stats = load_model_stats(angle_RMix, load_path)
angle_RMix.to(device)

RectMotionMixerCNN(
  (poseEmbedding): PoseEmbedding(
    (fc_W0): Conv1d(51, 128, kernel_size=(1,), stride=(1,))
  )
  (mixer): Sequential(
    (0): RectMixerCNN(
      (SEblock): SEBlock(
        (squeeze): AdaptiveAvgPool1d(output_size=1)
        (excite): Sequential(
          (0): Linear(in_features=10, out_features=1, bias=False)
          (1): ReLU()
          (2): Linear(in_features=1, out_features=10, bias=False)
          (3): Sigmoid()
        )
      )
      (cnn_block): CNNMixer(
        (layers): Sequential(
          (0): Conv2d(1, 32, kernel_size=(9, 15), stride=(1, 1), padding=(4, 7))
          (1): GELU(approximate='none')
          (2): Dropout(p=0.1, inplace=False)
          (3): Conv2d(32, 32, kernel_size=(9, 15), stride=(1, 1), padding=(4, 7))
          (4): GELU(approximate='none')
          (5): Dropout(p=0.1, inplace=False)
          (6): Conv2d(32, 1, kernel_size=(9, 15), stride=(1, 1), padding=(4, 7))
          (7): GELU(approximate='none')
          (8): Dro
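
The convolution layers printed above pair each odd kernel size with a padding of `kernel // 2`: (1, 51) with (0, 25) and (9, 15) with (4, 7). With stride 1 this keeps the feature map the same size, so the mixer blocks can be stacked without changing the tensor shape. The sketch checks this with the output-size formula documented for `torch.nn.Conv2d`; the input sizes 10, 51 and 128 are only illustrative, and `conv_out_size` is a hypothetical helper.

```python
def conv_out_size(size, kernel, padding, stride=1, dilation=1):
    """Output length of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# (kernel, padding) pairs as they appear in the printed models.
for kernel, padding in [(51, 25), (9, 4), (15, 7)]:
    for size in (10, 51, 128):
        # The output size equals the input size in every case.
        assert conv_out_size(size, kernel, padding) == size

# Without padding, the same kernel would shrink the feature map.
print(conv_out_size(51, kernel=9, padding=0))
```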

Load the best model using a *Convolutional STMixer* trained with 3D Euclidean data.

In [None]:
path_to_models = 'overview_notebook_data/models/'

load_path = path_to_models + 'euklid_STMix' + '/model.pth'
euklid_STMix, stats = load_model_stats(euklid_STMix, load_path)
euklid_STMix.to(device)

Load the best model using a *Convolutional Rectangular Mixer* trained with 3D Euclidean data.

In [None]:
path_to_models = 'overview_notebook_data/models/'

load_path = path_to_models + 'euklid_RMix' + '/model.pth'

euklid_RMix, stats = load_model_stats(euklid_RMix, load_path)
euklid_RMix.to(device)

## Evaluate Models

### Evaluate results with 10 and 25 prediction Frames

The 10-frame datasets initialized at the top of the notebook are reused here. The 25-frame test datasets are created in the next cell.

In [56]:
test_dataset_a_25 = Human36(path, d_mode = 'test', data_format = 'exp_map', seed_length = 10, prediction_length = 25)
test_loader_a_25 = DataLoader(test_dataset_a_25, batch_size=64, shuffle=False, num_workers=0)

test_dataset_e_25 = Human36(path, d_mode = 'test', data_format = 'coord_3d', seed_length = 10, prediction_length = 25)
test_loader_e_25 = DataLoader(test_dataset_e_25, batch_size=64, shuffle=False, num_workers=0)

AIS_test_data_25 = AISData(data_path = ais_data_path, seed_length = 10, prediction_length = 25)
AIS_test_loader_25 = DataLoader(AIS_test_data_25, batch_size=64, shuffle=False, num_workers=0)



#### Convolutional STMixer

###### Angular Data

In [57]:
_, geodesic, euler, pck_auc, position = eval_model(angle_STMix, test_loader_a, torch.nn.MSELoss(), device = device, input = 'angle', loss_only = False, output_len = 10)

print(f'The geodesic error is {round(geodesic, 3)}.')
print(f'The euler angle error is {round(euler, 3)}.')
print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')


The geodesic error is 0.11800000071525574.
The euler angle error is 0.6759999990463257.
The AUC PCA value is 0.637.
The MPJPE is 52.22999954223633.
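
For reading the position metrics: MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and true joint positions, and the value in `pck_auc` summarizes the percentage of correct keypoints (PCK) over several distance thresholds. The sketch below shows both on a single toy pose. It is a simplified illustration, not the code in `utils/eval.py`: the thresholds and the three-joint pose are made up, and the real evaluation also averages over frames and batches.

```python
import math

def joint_distances(pred, target):
    """Euclidean distance between each predicted and true joint."""
    return [math.dist(p, t) for p, t in zip(pred, target)]

def mpjpe(pred, target):
    """Mean per-joint position error of one pose."""
    distances = joint_distances(pred, target)
    return sum(distances) / len(distances)

def pck(pred, target, threshold):
    """Fraction of joints whose error is at most `threshold`."""
    distances = joint_distances(pred, target)
    return sum(d <= threshold for d in distances) / len(distances)

def pck_auc(pred, target, thresholds):
    """Mean PCK over the thresholds, a simple stand-in for the area under the PCK curve."""
    return sum(pck(pred, target, th) for th in thresholds) / len(thresholds)

# Toy pose with three joints; the errors are 5, 0 and 10 units.
target = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 10.0)]

print(mpjpe(pred, target))
print(pck(pred, target, threshold=5.0))
print(pck_auc(pred, target, thresholds=[1.0, 5.0, 10.0]))
```

A lower MPJPE and a higher PCK value both indicate better predictions, which is why the two move in opposite directions when the prediction horizon grows from 10 to 25 frames.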


In [58]:
_, geodesic, euler, pck_auc, position = eval_model(angle_STMix, test_loader_a_25, torch.nn.MSELoss(), device = device, input = 'angle', loss_only = False, output_len = 25)

print(f'The geodesic error is {round(geodesic, 3)}.')
print(f'The euler angle error is {round(euler, 3)}.')
print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The geodesic error is 0.17100000381469727.
The euler angle error is 0.9599999785423279.
The AUC PCA value is 0.55.
The MPJPE is 73.4000015258789.
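
The geodesic error compares joint rotations rather than positions. A common definition, assumed here because the implementation in `utils/eval.py` is not shown, is the angle of the relative rotation between the predicted and the true rotation matrix. The sketch computes it in plain Python; `rot_z` and `geodesic_distance` are hypothetical helpers.

```python
import math

def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def geodesic_distance(r1, r2):
    """Angle in radians of the relative rotation r1^T r2."""
    # trace(r1^T r2) equals the sum of the element-wise products of r1 and r2.
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)

# Two rotations about the same axis differ by the difference of their angles.
print(round(geodesic_distance(rot_z(0.2), rot_z(0.5)), 6))
```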


###### 3D Euclidean Data

In [59]:
_, _, _, pck_auc, position = eval_model(euklid_STMix, test_loader_e, mpjpe_error, device = device, input = 'coord_3d', loss_only = False, output_len = 10)

print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The AUC PCA value is 0.826.
The MPJPE is 24.059999465942383.


In [60]:
_, _, _, pck_auc, position = eval_model(euklid_STMix, test_loader_e_25, mpjpe_error, device = device, input = 'coord_3d', loss_only = False, output_len = 25)

print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The AUC PCA value is 0.699.
The MPJPE is 49.09000015258789.


###### Test with the AIS Data

In [61]:
_, _, _, pck_auc, position = eval_model(euklid_STMix, AIS_test_loader, mpjpe_error, device = device, input = 'coord_3d', loss_only = False, output_len = 10)

print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The AUC PCA value is 0.482.
The MPJPE is 81.77999877929688.


In [62]:
_, _, _, pck_auc, position = eval_model(euklid_STMix, AIS_test_loader_25, mpjpe_error, device = device, input = 'coord_3d', loss_only = False, output_len = 25)

print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The AUC PCA value is 0.363.
The MPJPE is 124.11000061035156.


#### Convolutional Rectangular Mixer

###### Angular Data

In [63]:
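# Note: this call passes euklid_STMix (trained on 3D coordinates), although this section covers the angular Rectangular Mixer.
# angle_RMix is presumably intended; the recorded output below belongs to the call as written.
_, geodesic, euler, pck_auc, position = eval_model(euklid_STMix, test_loader_a, torch.nn.MSELoss(), device = device, input = 'angle', loss_only = False, output_len = 10)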

print(f'The geodesic error is {round(geodesic, 3)}.')
print(f'The euler angle error is {round(euler, 3)}.')
print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The geodesic error is 1.7910000085830688.
The euler angle error is 9.479999542236328.
The AUC PCA value is 0.189.
The MPJPE is 497.4599914550781.


In [64]:
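# Note: this call passes euklid_STMix (trained on 3D coordinates), although this section covers the angular Rectangular Mixer.
# angle_RMix is presumably intended; the recorded output below belongs to the call as written.
_, geodesic, euler, pck_auc, position = eval_model(euklid_STMix, test_loader_a_25, torch.nn.MSELoss(), device = device, input = 'angle', loss_only = False, output_len = 25)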

print(f'The geodesic error is {round(geodesic, 3)}.')
print(f'The euler angle error is {round(euler, 3)}.')
print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The geodesic error is 1.8009999990463257.
The euler angle error is 9.661999702453613.
The AUC PCA value is 0.191.
The MPJPE is 485.2900085449219.


###### 3D Euclidean Data

In [65]:
_, _, _, pck_auc, position = eval_model(euklid_RMix, test_loader_e, mpjpe_error, device = device, input = 'coord_3d', loss_only = False, output_len = 10)

print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The AUC PCA value is 0.816.
The MPJPE is 25.309999465942383.


In [66]:
_, _, _, pck_auc, position = eval_model(euklid_RMix, test_loader_e_25, mpjpe_error, device = device, input = 'coord_3d', loss_only = False, output_len = 25)

print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The AUC PCA value is 0.701.
The MPJPE is 48.27000045776367.


###### Test with the AIS Data

In [67]:
_, _, _, pck_auc, position = eval_model(euklid_RMix, AIS_test_loader, mpjpe_error, device = device, input = 'coord_3d', loss_only = False, output_len = 10)

print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The AUC PCA value is 0.506.
The MPJPE is 77.87000274658203.


In [68]:
_, _, _, pck_auc, position = eval_model(euklid_RMix, AIS_test_loader_25, mpjpe_error, device = device, input = 'coord_3d', loss_only = False, output_len = 25)

print(f'The AUC PCA value is {round(pck_auc, 3)}.')
print(f'The MPJPE is {round(position, 2)}.')

The AUC PCA value is 0.41.
The MPJPE is 111.41000366210938.


## Visualize some predictions

The mp4 files of the visualizations can be found at https://drive.google.com/drive/folders/1DkQOK1ne8Mg1eYfegKg-L_efz02Su77H?usp=drive_link in the directory *Visualization/Visualization of predictions*.

### Visualize some results with the Human3.6m dataset.


#### With data in exponential map format

##### For 10 Frames

In [None]:
vis_one_prediction(angle_STMix, test_dataset_a, output_len=10, index = 910, save_path = 'angle_STMix_10frames.mp4')
vis_one_prediction(angle_RMix, test_dataset_a, output_len=10, index = 910, save_path = 'angle_RMix_10frames.mp4')

##### For 25 Frames

In [None]:
vis_one_prediction(angle_STMix, test_dataset_a_25, output_len=25, index = 300, save_path = 'angle_STMix_25frames.mp4')
vis_one_prediction(angle_RMix, test_dataset_a_25, output_len=25, index = 300, save_path = 'angle_RMix_25frames.mp4')

#### With data in 3D Euclidean space

##### For 10 Frames

In [None]:
vis_one_prediction(euklid_STMix, test_dataset_e, output_len=10, index = 910, save_path = 'euklid_STMix_10frames.mp4', angles = False)
vis_one_prediction(euklid_RMix, test_dataset_e, output_len=10, index = 910, save_path = 'euklid_RMix_10frames.mp4', angles = False)

##### For 25 Frames

In [None]:
vis_one_prediction(euklid_STMix, test_dataset_e_25, output_len=25, index = 300, save_path = 'euklid_STMix_25frames.mp4', angles = False)
vis_one_prediction(euklid_RMix, test_dataset_e_25, output_len=25, index = 300, save_path = 'euklid_RMix_25frames.mp4', angles = False)

### Visualize some results with the AIS dataset.


##### For 10 Frames

In [None]:
vis_one_prediction(euklid_STMix, AIS_test_data, output_len=10, index = 100, save_path = 'AIS_STMix_10frames.mp4', angles = False)
vis_one_prediction(euklid_RMix, AIS_test_data, output_len=10, index = 100, save_path = 'AIS_RMix_10frames.mp4', angles = False)

##### For 25 Frames

In [None]:
vis_one_prediction(euklid_STMix, AIS_test_data_25, output_len=25, index = 100, save_path = 'AIS_STMix_25frames.mp4', angles = False)
vis_one_prediction(euklid_RMix, AIS_test_data_25, output_len=25, index = 100, save_path = 'AIS_RMix_25frames.mp4', angles = False)