### Benchmark without TensorRT

To benchmark with TensorRT, go to `perf_benchmark_trt.ipynb`.

Benchmark only after you have generated the benchmark pickle file (see the `Generate benchmark pickle file` section below)!

What is a benchmark pickle file?
- It contains a test set generator and a loaded model.
- We noticed that the preprocessing steps of VideoPose3D consume a lot of memory, which makes the Nano rely heavily on swap.
- We suspect this creates an I/O bottleneck during inference and yields inaccurate benchmark FPS.
- Loading the already-built generator and model from a pickle file skips the preprocessing, which cuts memory use and hopefully reduces the I/O bottleneck during the benchmark.
- We admit that this is a sketchy workaround, and we plan to investigate the I/O bottleneck further.
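
As a rough illustration of the file layout described above, the sketch below pickles a `(generator, model)` tuple and loads it back. The list and dict are hypothetical stand-ins for the real test set generator and model, `round_trip` is a made-up helper name, and the file goes to a temporary directory rather than the real `.pkl` path. Note that pickle stores class instances by import path, so the modules defining the real generator and model classes must be importable when loading; that is presumably why the cells below switch into the `VideoPose3D` directory first.

```python
import os
import pickle
import tempfile


def round_trip(generator, model):
    """Dump a (generator, model) tuple to a temp file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'benchmark_demo.pkl')  # hypothetical file name
        with open(path, 'wb') as f:
            pickle.dump((generator, model), f)
        with open(path, 'rb') as f:
            return pickle.load(f)


# Hypothetical stand-ins: plain Python objects instead of the real classes.
generator_stub = [('cam0', 'batch0', 'batch_2d0')]
model_stub = {'architecture': '3,3,3,3,3', 'channels': 1024}

loaded_generator, loaded_model = round_trip(generator_stub, model_stub)
print(len(loaded_generator), loaded_model['channels'])
```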

Remember to keep the GPU warm!
- We observed that the FPS tends to be skewed on the first CUDA run after a kernel restart.
- Hence, remember to "warm up" your GPU by running something on CUDA after a kernel restart and before timing anything.
- This will give you a more accurate FPS benchmark result.
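
The warm-up advice above can be sketched as a small timing helper that runs a few untimed calls before starting the clock. This is a minimal sketch, not the code used in this notebook: `benchmark`, `warmup`, `iters`, and `sync` are hypothetical names. On a CUDA machine one would presumably pass a forward pass such as `lambda: model_pos(inputs_2d)` as `fn` and `torch.cuda.synchronize` as `sync`, since CUDA kernels are launched asynchronously. The workload here is pure Python so that the sketch runs without a GPU.

```python
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the average wall-clock seconds per call of `fn`.

    `warmup` untimed calls run first. `sync`, if given, is called before
    each clock reading so that pending asynchronous work is included.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    tic = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    toc = time.perf_counter()
    return (toc - tic) / iters


# Placeholder workload (assumption): stands in for a model forward pass.
calls = []
avg = benchmark(lambda: calls.append(sum(range(1000))), warmup=2, iters=5)
print('Calls made:', len(calls))  # 2 warm-up calls + 5 timed calls
print('Avg call time:', avg * 1e3, 'ms')
```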

In [1]:
import os
import os.path as osp
if osp.basename(os.getcwd()) != 'VideoPose3D': os.chdir('../VideoPose3D')
import torch
import pickle
from common.loss import mpjpe
import time

In [None]:
with open('../notebooks/benchmark_arc_33333_ch_1024.pkl', 'rb') as f:
    test_generator, model_pos = pickle.load(f)

In [8]:
# per video benchmark
# keep warm!
test_videos = 3
model_pos.eval()
if torch.cuda.is_available(): model_pos = model_pos.cuda()
epoch_loss_3d_valid = 0
num_videos = 0
N = 0

tic = time.time()

for cam, batch, batch_2d in test_generator.next_epoch():
    inputs_3d = torch.from_numpy(batch.astype('float32'))
    inputs_2d = torch.from_numpy(batch_2d.astype('float32'))
    if torch.cuda.is_available():
        inputs_3d = inputs_3d.cuda()
        inputs_2d = inputs_2d.cuda()
    inputs_traj = inputs_3d[:, :, :1].clone()
    inputs_3d[:, :, 0] = 0
    predicted_3d_pos = model_pos(inputs_2d)
    loss_3d_pos = mpjpe(predicted_3d_pos, inputs_3d)
    epoch_loss_3d_valid += inputs_3d.shape[0] * inputs_3d.shape[1] * loss_3d_pos.item()
    N += inputs_3d.shape[0] * inputs_3d.shape[1]

    num_videos += 1
    # note: `>` (rather than `>=`) stops after test_videos + 1 videos, i.e. 4 videos here
    if num_videos > test_videos: break

toc = time.time()

elapse = toc - tic
print('Total eval loss is:', 1000 * epoch_loss_3d_valid / N)
print('Total frames:', N)
print('Total elapse time:', elapse * 1e3, 'ms')
print('Avg frame time:', (elapse / N) * 1e3, 'ms')

torch.Size([1, 2598, 17, 2])
torch.Size([1, 2598, 17, 2])
torch.Size([1, 2598, 17, 2])
torch.Size([1, 2598, 17, 2])
Total eval loss is: 46.201025135815144
Total frames: 9424
Total elapse time: 2866.8224811553955 ms
Avg frame time: 0.30420442287302585 ms


In [9]:
# per receptive field benchmark
# keep warm!
test_videos = 3
# frames per input window for the 3,3,3,3,3 architecture (3^5 = 243)
receptive_field = 3*3*3*3*3
model_pos.eval()
if torch.cuda.is_available(): model_pos = model_pos.cuda()
epoch_loss_3d_valid = 0
num_videos = 0
N = 0

tic = time.time()

for cam, batch, batch_2d in test_generator.next_epoch():
    inputs_3d = torch.from_numpy(batch.astype('float32'))
    inputs_2d = torch.from_numpy(batch_2d.astype('float32'))
    num_frames = inputs_2d.shape[1]
    
    if torch.cuda.is_available():
        inputs_3d = inputs_3d.cuda()
        inputs_2d = inputs_2d.cuda()
    
    # slide a window of receptive_field 2D frames over the video, one forward pass per window
    for i in range(receptive_field, num_frames):
        target_inputs_3d = inputs_3d[:,i-receptive_field:i-receptive_field+1,:,:]
        chunk_inputs_2d = inputs_2d[:,i-receptive_field:i,:,:]
        chunk_inputs_traj = target_inputs_3d[:, :, :1].clone()
        target_inputs_3d[:, :, 0] = 0
        predicted_3d_pos = model_pos(chunk_inputs_2d)
        # loss computation is disabled in this benchmark, so the reported eval loss is always 0
        #loss_3d_pos = mpjpe(predicted_3d_pos, target_inputs_3d)
        loss_3d_pos = 0
        #epoch_loss_3d_valid += target_inputs_3d.shape[0] * target_inputs_3d.shape[1] * loss_3d_pos.item()
        epoch_loss_3d_valid = 0
    
    N += inputs_3d.shape[0] * inputs_3d.shape[1]
    
    num_videos += 1
    if num_videos > test_videos: break
    
toc = time.time()

elapse = toc - tic
print('Total eval loss is:', 1000 * epoch_loss_3d_valid / N)
print('Total frames:', N)
print('Total elapse time:', elapse * 1e3, 'ms')
print('Avg frame time:', (elapse / N) * 1e3, 'ms')

Total eval loss is: 0.0
Total frames: 9424
Total elapse time: 410775.92253685 ms
Avg frame time: 43.58827700942805 ms
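
To put the numbers above in context, the sketch below recomputes the receptive field from the architecture string and converts the two reported average frame times into FPS. It assumes the receptive field equals the product of the filter widths, as the `3*3*3*3*3` in the cell above suggests, and that the 2D input is padded by half the receptive field on each side, which would be consistent with 2598 input frames versus 9424 / 4 = 2356 counted frames per video. `receptive_field` and `fps` are hypothetical helper names.

```python
def receptive_field(architecture):
    """Product of the filter widths, e.g. '3,3,3,3,3' -> 243 (assumption)."""
    field = 1
    for width in architecture.split(','):
        field *= int(width)
    return field


def fps(avg_frame_time_ms):
    """Convert an average per-frame time in milliseconds to frames per second."""
    return 1000.0 / avg_frame_time_ms


rf = receptive_field('3,3,3,3,3')
pad = (rf - 1) // 2  # assumed padding on each side of the 2D input
print('Receptive field:', rf)               # 243
print('Padding per side:', pad)             # 121
print('Unpadded frames:', 2598 - 2 * pad)   # 2356
print('Per-video FPS:', round(fps(0.30420442287302585)))       # about 3287
print('Per-window FPS:', round(fps(43.58827700942805), 1))     # about 22.9
```

The second figure is the more relevant one for frame-by-frame inference, because each prediction there pays for a full 243-frame forward pass.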


### Generate benchmark pickle file

I modified the VideoPose3D repo slightly so that importing from `run` prompts you to paste the command-line arguments into stdin; the `argparse` module takes it from there. The dumped pickle file contains a test set generator and a loaded model, and it is saved to `PROJECT_ROOT/notebooks`.

**Remember to do a kernel restart after generating the pickle file!**
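
The actual change to the repo is not shown here, but the stdin prompt described above could look roughly like the following sketch. `build_parser` and `parse_pasted_args` are hypothetical names, the parser covers only the options used in the test command, and the defaults are placeholders. The option names follow the `Namespace` printed in the cell output below.

```python
import argparse
import shlex


def build_parser():
    # Hypothetical subset of the options; defaults are placeholders.
    parser = argparse.ArgumentParser()
    parser.add_argument('-k', '--keypoints', default='cpn_ft_h36m_dbb')
    parser.add_argument('-arc', '--architecture', default='3,3,3')
    parser.add_argument('-ch', '--channels', type=int, default=1024)
    parser.add_argument('-c', '--checkpoint', default='checkpoint')
    parser.add_argument('--evaluate', default='')
    parser.add_argument('--benchmark', action='store_true')
    return parser


def parse_pasted_args(line):
    """Split a pasted command line the way a shell would, then parse it."""
    return build_parser().parse_args(shlex.split(line))


# In a notebook the line would come from input('Command Line Arguments: ').
line = ('-k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -ch 1024 -c checkpoint '
        '--evaluate arc_33333_ch_1024_epoch_80.bin --benchmark')
args = parse_pasted_args(line)
print(args.architecture, args.channels, args.benchmark)
```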

In [7]:
import sys
import torch
import os
import os.path as osp
import pickle
if osp.basename(os.getcwd()) != 'VideoPose3D': os.chdir('../VideoPose3D')
from run import test_generator, model_pos
# when prompted, paste the command-line arguments shown below for a test run

Command Line Arguments: -k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -ch 1024 -c checkpoint --evaluate arc_33333_ch_1024_epoch_80.bin --benchmark
Namespace(actions='*', architecture='3,3,3,3,3', batch_size=1024, benchmark=True, bone_length_term=True, by_subject=False, causal=False, channels=1024, checkpoint='checkpoint', checkpoint_frequency=10, data_augmentation=True, dataset='h36m', dense=False, disable_optimizations=False, downsample=1, dropout=0.25, epochs=60, evaluate='arc_33333_ch_1024_epoch_80.bin', export_training_curves=False, keypoints='cpn_ft_h36m_dbb', learning_rate=0.001, linear_projection=False, lr_decay=0.95, no_eval=False, no_proj=False, render=False, resume='', stride=1, subjects_test='S9,S11', subjects_train='S1,S5,S6,S7,S8', subjects_unlabeled='', subset=1, test_time_augmentation=True, viz_action=None, viz_bitrate=3000, viz_camera=0, viz_downsample=1, viz_export=None, viz_limit=-1, viz_no_ground_truth=False, viz_output=None, viz_size=5, viz_skip=0, viz_subject=None, viz_video=N

-k cpn_ft_h36m_dbb -arc 3,3,3,3,3 -ch 1024 -c checkpoint --evaluate arc_33333_ch_1024_epoch_80.bin --benchmark

In [8]:
with open('../notebooks/benchmark_arc_33333_ch_1024.pkl', 'wb') as f:
    pickle.dump((test_generator, model_pos), f)

In [None]:
# !!! do a kernel restart after generating the pickle file !!!