TensorFlow Implementation of "Learnable Pooling Methods for Video Classification"

Learnable Pooling Methods for Video Classification

The repository is based on the starter code provided by Google AI. It contains code for training and evaluating models on the YouTube-8M dataset. A detailed table of contents and descriptions can be found in the original repository.

The repository contains models from team "Deep Topology". Our approach was accepted at the 2nd Workshop on YouTube-8M Large-Scale Video Understanding at ECCV. The presentation is available on the ECCV Workshop page.

Presentation: TBA
Paper: Link, arXiv


Prototypes 1, 2 and 3 correspond to Sections 3.1, 3.2 and 3.3 of the paper. Detailed instructions for training and evaluating the models can be found in the YT8M repository. The following are example training commands to reproduce the results.

Prototype 1 (Attention Enhanced NetVLAD)

python --train_data_pattern="<path to train .tfrecord>" --model=NetVladV1 --train_dir="<path for model checkpoints>" --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=80 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=512 --iterations=256 --learning_rate_decay=0.85
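For readers new to NetVLAD, here is a minimal NumPy sketch of plain NetVLAD pooling (after Arandjelović et al.). It is not the attention-enhanced variant used in Prototype 1, and it omits the projection to --netvlad_hidden_size. It assumes that --iterations sets how many frames are sampled per video (uniformly at random with replacement here) and that --netvlad_cluster_size is the number of clusters K. The helpers sample_random_frames and netvlad_pool are hypothetical names, not functions from this repository.

```python
import numpy as np


def sample_random_frames(video, num_frames, iterations, rng):
    """Hypothetical helper: draw `iterations` frames uniformly at random (with
    replacement) from the first `num_frames` valid frames. Assumed to mirror
    what --iterations controls in the training command."""
    idx = rng.integers(0, num_frames, size=iterations)
    return video[idx]


def netvlad_pool(x, centers, w, b):
    """Plain NetVLAD pooling, NOT the paper's attention-enhanced variant.

    x:       (N, D) frame features
    centers: (K, D) learnable cluster centres
    w, b:    (D, K) and (K,) soft-assignment parameters
    returns: (K * D,) globally L2-normalised descriptor
    """
    logits = x @ w + b                                      # (N, K)
    logits -= logits.max(axis=1, keepdims=True)             # stabilise the softmax
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                       # soft assignment per frame
    # V[k] = sum_i a[i, k] * (x[i] - centers[k]), without building an (N, K, D) tensor
    v = a.T @ x - a.sum(axis=0)[:, None] * centers          # (K, D)
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12   # intra-normalisation
    v = v.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12)                  # global L2 normalisation


# Toy usage on random data; K = 8 instead of the command's 256 to keep it small.
rng = np.random.default_rng(0)
D, K = 1024 + 128, 8              # rgb + audio sizes, concatenated for illustration only
video = rng.standard_normal((300, D))                       # 300 frames of fake features
frames = sample_random_frames(video, num_frames=300, iterations=256, rng=rng)
desc = netvlad_pool(frames,
                    centers=rng.standard_normal((K, D)),
                    w=0.01 * rng.standard_normal((D, K)),
                    b=np.zeros(K))
print(frames.shape, desc.shape)   # (256, 1152) (9216,)
```

Writing the aggregation as a.T @ x minus an assignment-weighted copy of the centres gives the same residual sum as looping over frames and clusters, while keeping memory at O(K·D).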

Prototype 2 (NetVLAD with Attention Based Cluster Similarities)

python --train_data_pattern="<path to train .tfrecord>" --model=NetVladV2 --train_dir="<path for model checkpoints>" --frame_features=True --feature_names="rgb,audio" --feature_sizes="1024,128" --batch_size=80 --base_learning_rate=0.0002 --netvlad_cluster_size=256 --netvlad_hidden_size=512 --iterations=256 --learning_rate_decay=0.85
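Both commands set --base_learning_rate=0.0002 and --learning_rate_decay=0.85. As a minimal sketch of how these could interact, assume the training script applies a staircase exponential decay once per fixed number of training examples. The 4,000,000-example decay interval and the single GPU tower are assumptions, not values stated in this README, and learning_rate is a hypothetical helper rather than repository code.

```python
import math


def learning_rate(global_step, base_learning_rate=0.0002, learning_rate_decay=0.85,
                  batch_size=80, decay_examples=4_000_000, num_towers=1):
    """Staircase exponential decay keyed on training examples seen.

    Assumed schedule: decay_examples and num_towers are illustrative
    assumptions, not flags shown in the commands above.
    """
    examples_seen = global_step * batch_size * num_towers
    return base_learning_rate * learning_rate_decay ** math.floor(examples_seen / decay_examples)


for step in (0, 49_999, 50_000, 100_000):
    print(step, learning_rate(step))
```

Under these assumptions, with --batch_size=80 the rate stays at 0.0002 for the first 50,000 steps and is then multiplied by 0.85 every further 50,000 steps.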

Prototype 3 (Regularized Function Approximation Approach)



  • 1.00 (31 August 2018)
    • Initial public release
  • 2.00 (30 September 2018)
    • Code cleaning
    • Model usage


If you find our approaches useful, please cite our paper.

  title={Learnable Pooling Methods for Video Classification},
  author={Kmiec, Sebastian and Bae, Juhan and An, Ruijian},
  journal={arXiv preprint arXiv:1810.00530},

Contributors (Alphabetical Order)
