Extracting several kinds of visual representations from videos.
The following frame-level (`*_features`) and video-level (`*_globals`) visual representations are supported:
- 2D-CNN (`cnn_features`, `cnn_globals`, `cnn_sem_globals`)
- 3D-CNN (`c3d_features`, `c3d_globals`, `i3d_features`, `i3d_globals`)
- ECO (`eco_features`, `eco_globals`, `eco_sem_features`, `eco_sem_globals`)
- TSM (`tsm_features`, `tsm_globals`, `tsm_sem_features`, `tsm_sem_globals`)
Note: `*_sem_*` representations are taken from the classification output (the class probability distribution) of the respective model.
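The naming convention above can be read mechanically: the prefix names the model, the suffix gives the level (`features` for frame-level, `globals` for video-level), and a `sem` token in between marks a classification-based representation. The sketch below illustrates that reading only. The `describe` helper and the `RepresentationInfo` type are hypothetical names introduced here and are not part of this package.

```python
from typing import NamedTuple

# Names copied from the list of supported representations.
SUPPORTED = [
    "cnn_features", "cnn_globals", "cnn_sem_globals",
    "c3d_features", "c3d_globals", "i3d_features", "i3d_globals",
    "eco_features", "eco_globals", "eco_sem_features", "eco_sem_globals",
    "tsm_features", "tsm_globals", "tsm_sem_features", "tsm_sem_globals",
]


class RepresentationInfo(NamedTuple):
    """Hypothetical record describing one representation name."""
    model: str      # e.g. "cnn", "c3d", "i3d", "eco", "tsm"
    level: str      # "frame" for *_features, "video" for *_globals
    semantic: bool  # True for *_sem_* names


def describe(name: str) -> RepresentationInfo:
    """Hypothetical helper: split a name according to the naming convention."""
    if name not in SUPPORTED:
        raise ValueError(f"unsupported representation: {name!r}")
    parts = name.split("_")
    model, suffix = parts[0], parts[-1]
    level = "frame" if suffix == "features" else "video"
    # A "sem" token between the model prefix and the level suffix
    # marks a classification-based representation.
    semantic = "sem" in parts[1:-1]
    return RepresentationInfo(model, level, semantic)


if __name__ == "__main__":
    for name in SUPPORTED:
        print(name, describe(name))
```

For instance, `describe("eco_sem_features")` reports the `eco` model, the frame level, and a semantic (classification-based) representation.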
This package has been tested for extracting visual representations from videos of the following video-caption datasets:
- MSVD
- M-VAD
- MSR-VTT
- TRECVID-2020
- TRECVID-2020-Test
- TGIF
- VATEX
- ActivityNet
- ActivityNet-Test
- ActivityNet-Fragments