mohammed-elkomy/two-stream-action-recognition

Action Recognition [no longer maintained]

In this repo we study the problem of action recognition (recognizing actions in videos) on the well-known UCF101 dataset.

Here, I reimplemented the two-stream approach for action recognition using pre-trained Xception networks for both streams (see the references).

Live demo on Colab

Just clone the Live Demo Two-steam net.ipynb notebook to your Drive and run the cells on Google Colab. Something like the demo gif will be generated in video format.

Get started:

A full demo of the code in this repo can be found in the Action_Recognition_Walkthrough.ipynb notebook.

Please clone the Action_Recognition_Walkthrough.ipynb notebook to your Drive account and run it on Google Colab on a Python 3 GPU-enabled instance.

Environment and requirements:

This code requires Python 3.6 and the following packages:

TensorFlow 1.11.0 (GPU-enabled; the code uses the Keras bundled with TensorFlow)
imgaug 0.2.6
OpenCV 3.4.2.17
NumPy 1.14.1

All of these requirements are satisfied by a Python 3 GPU-enabled Colab instance. Just use one, and the Action_Recognition_Walkthrough.ipynb notebook will install the rest :)

Dataset:

I used the UCF101 dataset, originally found here.

The dataset is also available in processed form, published by feichtenhofer/twostreamfusion:

  • RGB images (a single zip file split into three parts)
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.001
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.002
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.003
  • Optical flow u/v frames (a single zip file split into three parts)
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.001
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.002
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_tvl1_flow.zip.003
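
Each archive has to be joined back into one zip file before it can be extracted. The following is a minimal sketch of that step. It assumes the parts are plain byte-wise splits that only need to be concatenated in order. The helper names `join_parts` and `join_and_extract` are hypothetical and are not part of this repo.

```python
import shutil
import zipfile
from pathlib import Path


def join_parts(part_paths, joined_path):
    """Concatenate the given parts, in order, into a single file."""
    with open(joined_path, "wb") as joined:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, joined)


def join_and_extract(archive_path, out_dir="."):
    """Join archive_path.001, .002, ... into archive_path and extract it.

    Assumption: the parts are raw byte splits of one zip file.
    """
    archive_path = Path(archive_path)
    # Sorting the zero-padded suffixes (.001, .002, ...) gives the right order.
    parts = sorted(archive_path.parent.glob(archive_path.name + ".[0-9][0-9][0-9]"))
    if not parts:
        raise FileNotFoundError("no parts found for %s" % archive_path)
    join_parts(parts, archive_path)
    with zipfile.ZipFile(str(archive_path)) as archive:
        archive.extractall(str(out_dir))
```

For example, after downloading the three RGB parts into the current directory, `join_and_extract("ucf101_jpegs_256.zip", "data")` would write the joined archive next to the parts and extract it into `data`.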

Code Features:

  • A variety of models is available, and you can switch between them easily.
  • Checkpoints are saved at regular intervals and synchronized to Google Drive using the Drive API, which means you can resume training from any Google Colab instance.
  • The code accesses the public models on my Drive, so you can resume and fine-tune them from different checkpoints. Every checkpoint is named EPOCH.BEST_TOP_1_ACC.CURRENT_TOP_1_ACC. For example, 300-0.84298-0.84166.zip in the folder heavy-mot-xception-adam-1e-05-imnet is a checkpoint where:
    • epoch = 300
    • the best top-1 accuracy was 0.84298 (obtained at a checkpoint before epoch 300)
    • the current top-1 accuracy is 0.84166
    • the experiment is heavy-mot-xception-adam-1e-05-imnet
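
The naming scheme can be read back programmatically, for instance to pick the checkpoint to resume from. This is a minimal sketch and not the repo's own code. It assumes the three fields are separated by hyphens, as in the example file name 300-0.84298-0.84166.zip. `CheckpointInfo` and `parse_checkpoint_name` are hypothetical names.

```python
from typing import NamedTuple


class CheckpointInfo(NamedTuple):
    epoch: int
    best_top1: float
    current_top1: float


def parse_checkpoint_name(filename):
    """Split a name like '300-0.84298-0.84166.zip' into its three fields."""
    # Drop the .zip extension if present.
    stem = filename[:-len(".zip")] if filename.endswith(".zip") else filename
    # Assumption: hyphen-separated fields, as in the example file name.
    epoch, best, current = stem.split("-")
    return CheckpointInfo(int(epoch), float(best), float(current))
```

For example, `parse_checkpoint_name("300-0.84298-0.84166.zip").epoch` gives 300.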

Models:

I used models pre-trained on ImageNet, provided by Keras Applications here.

The best results are obtained using the Xception architecture.

Network                    Top-1 Acc
------------------------   ---------
Spatial VGG19 stream       ~75%
Spatial ResNet50 stream    81.2%
Spatial Xception stream    86.04%
------------------------   ---------
Motion ResNet50 stream     ~75%
Motion Xception stream     84.4%
------------------------   ---------
Average fusion             91.25%
------------------------   ---------
Recurrent network fusion   91.7%
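
To illustrate the "Average fusion" row, here is a minimal NumPy sketch. It assumes that average fusion means averaging the per-class scores of the spatial and motion streams for each video and then taking the arg-max. The README does not spell this out, so treat it as an illustration and not as the repo's exact procedure. The function names are hypothetical.

```python
import numpy as np


def average_fusion(spatial_probs, motion_probs):
    """Average two (num_videos, num_classes) score arrays element-wise."""
    spatial_probs = np.asarray(spatial_probs, dtype=np.float64)
    motion_probs = np.asarray(motion_probs, dtype=np.float64)
    return (spatial_probs + motion_probs) / 2.0


def top1_accuracy(probs, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    predictions = np.argmax(np.asarray(probs), axis=1)
    return float(np.mean(predictions == np.asarray(labels)))


# Toy example: two videos, two classes, made-up scores.
spatial = [[0.6, 0.4], [0.7, 0.3]]
motion = [[0.1, 0.9], [0.8, 0.2]]
labels = [1, 0]
fused = average_fusion(spatial, motion)
```

In this toy example the spatial stream alone gets the first video wrong, while the fused scores classify both videos correctly.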

Pre-trained Models:

All the pre-trained models can be found here.

This is the same Drive folder that the code accesses when training and when resuming training from a checkpoint.

Reference Papers:

Nice implementations of two-stream approach:

Future directions:

Useful links: