This repository presents a highly adaptable, unified Action Recognition System.
The goal is to let users easily customize the system to suit their own applications.
The system can be fine-tuned to recognize as many actions as needed by providing just a few videos for each action. The resulting fine-tuned model can identify any custom action and also supports real-time action recognition with a webcam.
The code requires `python>=3.8`, as well as `pytorch>=1.7` and `torchvision>=0.8`. Please follow the instructions here to install both the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
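For reference, a typical install command looks like the following (illustrative only; pick the exact command for your OS and CUDA version from the official PyTorch instructions):

```bash
# Example only: CUDA 11.8 wheels for PyTorch and TorchVision.
# Use the selector at https://pytorch.org/get-started/locally/ to get the command for your setup.
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# CPU-only fallback (no CUDA acceleration):
# pip install torch torchvision
```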
```bash
git clone https://github.com/kaikaic1998/Unified_Action_Recognition_System.git
cd Unified_Action_Recognition_System
```
Step 1. Install libraries
```bash
pip install -r requirements.txt
```
Step 2. Install Cython_bbox
```bash
pip install -e git+https://github.com/samson-wang/cython_bbox.git#egg=cython-bbox
```
Step 3. Install lap
```bash
pip install lap
```
If the above is not successful, try the following:
```bash
git clone https://github.com/gatagat/lap.git
cd lap
python setup.py build
python setup.py install
cd ../
```
Download the YOLOv7 Pose pre-trained models and put them in the `pretrained/` folder.
A sample video is provided at `video/fall.mp4` for the demo.
The provided model is pre-trained for Human Fall Detection.
```bash
python demo.py
```
You can try fine-tuning the model to recognize your own custom actions.
Data Preparation
Supported media formats
image formats: bmp, jpg, jpeg, png, tif, tiff, dng, webp, mpo
video formats: mov, avi, mp4, mpg, mpeg, m4v, wmv, mkv
Put your videos in the `dataset/` folder, following the folder structure below:
```
├── dataset
│   ├── train
│   │   ├── class1
│   │   │   ├── video1.mp4
│   │   │   ├── video2.mp4
│   │   │   └── ...
│   │   ├── other classes
│   │   └── ...
│   ├── val
│   │   ├── class1
│   │   │   ├── video1.mp4
│   │   │   ├── video2.mp4
│   │   │   └── ...
│   │   ├── other classes
│   │   └── ...
```
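Before fine-tuning, you can sanity-check this layout with a short script like the one below (a minimal sketch, not part of this repo; the extension set mirrors the supported video formats listed above):

```python
# check_dataset.py -- minimal sanity check of the dataset/ layout (illustrative, not part of the repo)
from pathlib import Path

VIDEO_EXTS = {".mov", ".avi", ".mp4", ".mpg", ".mpeg", ".m4v", ".wmv", ".mkv"}

def list_classes(split_dir: Path):
    """Return {class_name: number_of_videos} for one split (train or val)."""
    classes = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        videos = [f for f in class_dir.iterdir() if f.suffix.lower() in VIDEO_EXTS]
        classes[class_dir.name] = len(videos)
    return classes

if __name__ == "__main__":
    root = Path("dataset")
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            print(f"missing folder: {split_dir}")
            continue
        counts = list_classes(split_dir)
        print(f"{split}: {len(counts)} classes -> {counts}")
```

The class folder names under `train` and `val` should match, since the number of class folders determines how many actions the fine-tuned model will recognize.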
Start fine-tuning
The newly fine-tuned model automatically identifies the number of actions to be recognized based on the number of class folders provided.
```bash
python train.py --save-model True
```
| Description | Model | Repo Link |
|---|---|---|
| Detection & Pose Estimation | YOLOv7 Pose | YOLOv7 |
| Tracking | BoT-SORT | BoT-SORT |
| Skeleton Action Recognition | STGCN++ | PYSKL |
Input media first passes through the detection and tracking layer, where the system detects and tracks every person in a frame and produces a set of keypoints for each of them.
Once each set is filled with 20 keypoints, it is fed into the Action Recognition layer, which predicts an action label for each set of keypoints, hence one label per tracked person. This operation is realized with the sliding-window method, enabling real-time action recognition.
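Conceptually, the per-person buffering and sliding window could look like the sketch below (a simplified illustration rather than the repo's actual code; it assumes the figure of 20 above is the number of per-frame keypoint sets buffered per tracked person, and `recognize_action` is a hypothetical stand-in for the STGCN++ model):

```python
# Sliding-window buffering of per-person keypoints (illustrative sketch, not the repo's actual code).
from collections import defaultdict, deque

WINDOW = 20  # assumed number of per-frame keypoint sets buffered before a prediction is made

# One fixed-length buffer per tracked person ID; the oldest frame is dropped automatically.
buffers = defaultdict(lambda: deque(maxlen=WINDOW))

def on_new_frame(tracked_poses, recognize_action):
    """tracked_poses: {track_id: keypoints detected for that person in this frame}.
    recognize_action: hypothetical callable mapping WINDOW keypoint sets to an action label."""
    labels = {}
    for track_id, keypoints in tracked_poses.items():
        buffers[track_id].append(keypoints)
        if len(buffers[track_id]) == WINDOW:
            # Window is full: predict one label for this tracked person.
            labels[track_id] = recognize_action(list(buffers[track_id]))
    return labels
```

Because each buffer is a fixed-length deque, every new frame slides the window forward by one, so a fresh prediction can be made per frame once the buffer is full, which is what enables real-time recognition.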
The most suitable learning rate was determined experimentally and is provided as the default value, so users can fine-tune the model without worrying about an underperforming training process.
During the experiments, different learning rate values were trialed across multiple training runs.
Below is one of the trials, showing several learning rate candidates; a learning rate of 0.01 was found to give the best outcome.
```
@article{aharon2022bot,
  title={BoT-SORT: Robust Associations Multi-Pedestrian Tracking},
  author={Aharon, Nir and Orfaig, Roy and Bobrovsky, Ben-Zion},
  journal={arXiv preprint arXiv:2206.14651},
  year={2022}
}

@article{wang2022yolov7,
  title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors},
  author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
  journal={arXiv preprint arXiv:2207.02696},
  year={2022}
}

@inproceedings{duan2022pyskl,
  title={Pyskl: Towards good practices for skeleton action recognition},
  author={Duan, Haodong and Wang, Jiaqi and Chen, Kai and Lin, Dahua},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={7351--7354},
  year={2022}
}
```