Multiresolution CNN implementation for sports video classification on the Sports-1M dataset

nirmal-25/Multi-Res-CNN

Multiresolution CNNs for Video Classification

This repository implements a multiresolution CNN for video classification on the Sports-1M dataset, based on the architecture in [1]. The model uses two separate streams, ‘fovea’ and ‘context’, that learn features from two reduced-size inputs derived from each frame; the outputs of the two streams are concatenated later in the network. Working on smaller inputs speeds up training while avoiding the loss of important information.
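As a rough illustration of the two-stream idea, the sketch below splits a single frame into the two inputs. It assumes the scheme described in [1], where the context stream sees the whole frame at half resolution and the fovea stream sees the center region at the original resolution. The function name `split_streams` and the use of 2x2 average pooling for downsampling are assumptions for illustration and are not taken from this repository's code.

```python
import numpy as np

def split_streams(frame):
    """Split an (H, W, C) frame into context and fovea inputs.

    Assumes H and W are even. Both outputs have shape (H/2, W/2, C).
    """
    h, w, c = frame.shape
    # Context stream: the whole frame at half resolution.
    # 2x2 average pooling is an assumed choice of downsampling.
    context = frame.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    # Fovea stream: the center region at the original resolution.
    top, left = h // 4, w // 4
    fovea = frame[top:top + h // 2, left:left + w // 2]
    return context, fovea

# Example with a 170x170 RGB crop.
frame = np.arange(170 * 170 * 3, dtype=np.float32).reshape(170, 170, 3)
context, fovea = split_streams(frame)
print(context.shape, fovea.shape)
```

With a 170x170 crop, both streams receive 85x85 inputs, so each stream processes a quarter of the pixels of the full crop.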

Model architecture from [1]

  • Images are resized to 200x200
  • 170x170 crops are randomly sampled
  • Crops are flipped horizontally with probability 0.5
  • The mean is subtracted from each pixel
  • Optimization: mini-batch size = 32, momentum = 0.9, weight decay = 0.0005, learning rate = 0.001
  • Local Response Normalization layers are replaced by Batch Normalization layers
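The augmentation steps in the list above can be sketched roughly as follows. This is a minimal example, not the repository's actual pipeline: it assumes the frame has already been resized to 200x200, and the function name `augment`, the `mean` argument (a scalar or per-channel value computed elsewhere), and the `rng` argument are hypothetical names introduced for illustration.

```python
import numpy as np

def augment(frame, mean, crop=170, flip_prob=0.5, rng=None):
    """Random crop, random horizontal flip, then mean subtraction.

    frame: (H, W, C) array, assumed already resized to 200x200.
    mean:  scalar or per-channel mean (assumed to be precomputed).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = frame.shape
    # Pick the top-left corner of a random crop that fits in the frame.
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = frame[top:top + crop, left:left + crop].astype(np.float32)
    # Flip left-right with probability flip_prob.
    if rng.random() < flip_prob:
        patch = patch[:, ::-1]
    return patch - mean

# Example: a constant frame becomes all zeros after mean subtraction.
frame = np.full((200, 200, 3), 10, dtype=np.uint8)
out = augment(frame, mean=10.0, rng=np.random.default_rng(0))
print(out.shape)
```

Passing a seeded generator, as in the example, makes the crop position and flip decision reproducible, which can help when debugging the input pipeline.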

The sports video dataset can be downloaded from this link

The highest validation accuracy achieved with this implementation is 65%, which is comparable to the results reported in [1].

The sample video outputs can be seen here

Reference

[1] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-Scale Video Classification with Convolutional Neural Networks," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 1725-1732, doi: 10.1109/CVPR.2014.223.
