PyTorch implementation, code, and pretrained models of the paper:
IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition
Gibran Benitez-Garcia, Jesus Olivares-Mercado, Gabriel Sanchez-Perez, and Keiji Yanai
Accepted at ICPR 2020
This paper proposes the IPN Hand dataset, a new benchmark video dataset with sufficient size, variation, and real-world elements to train and evaluate deep neural networks for continuous Hand Gesture Recognition (HGR). With our dataset, the performance of three 3D-CNN models is evaluated on the tasks of isolated and continuous real-time HGR. Since IPN Hand contains only RGB videos, we analyze the possibility of increasing the recognition accuracy by adding multiple modalities derived from the RGB frames, i.e., optical flow and semantic segmentation, while keeping the real-time performance.
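Since the extra modalities are derived from the RGB frames themselves, they can be produced offline or on the fly. As a minimal illustration only (not the learned flow estimator used in the paper, which is linked under the acknowledgments below), dense optical flow between consecutive frames can be computed with OpenCV's classical Farnebäck method:

```python
import cv2

# Minimal sketch: derive a dense optical-flow modality from an RGB clip.
# OpenCV's Farneback method is a stand-in here; the paper relies on a
# learned flow estimator (see the acknowledgments below).
cap = cv2.VideoCapture("clip.avi")  # placeholder file name
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

flows = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Returns an HxWx2 field of (dx, dy) displacements per pixel
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, gray, None, pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    flows.append(flow)
    prev_gray = gray
cap.release()
```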
The subjects in the dataset were asked to record the gestures using their own PC, keeping the defined resolution and frame rate. Thus, only RGB videos were captured, and the distance between the camera and each subject varies. All videos were recorded at a resolution of 640x480 at 30 fps.
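A downloaded clip can be checked against this format by querying its properties with OpenCV (the file name below is a placeholder):

```python
import cv2

# Sanity-check a clip against the stated 640x480 @ 30 fps format.
cap = cv2.VideoCapture("clip.avi")  # placeholder file name
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()

assert (width, height) == (640, 480), f"unexpected size {width}x{height}"
assert round(fps) == 30, f"unexpected frame rate {fps}"
```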
Each subject continuously performed 21 gestures with three random breaks in a single video. We defined 13 gestures to control the pointer and actions focused on the interaction with touchless screens.
The description and statistics of each gesture are shown in the following table. Duration is measured in number of frames (30 frames = 1 s).
id | Label | Gesture | Instances | Mean duration in frames (std) |
---|---|---|---|---|
1 | D0X | Non-gesture | 1431 | 147 (133) |
2 | B0A | Pointing with one finger | 1010 | 219 (67) |
3 | B0B | Pointing with two fingers | 1007 | 224 (69) |
4 | G01 | Click with one finger | 200 | 56 (29) |
5 | G02 | Click with two fingers | 200 | 60 (43) |
6 | G03 | Throw up | 200 | 62 (25) |
7 | G04 | Throw down | 201 | 65 (28) |
8 | G05 | Throw left | 200 | 66 (27) |
9 | G06 | Throw right | 200 | 64 (28) |
10 | G07 | Open twice | 200 | 76 (31) |
11 | G08 | Double click with one finger | 200 | 68 (28) |
12 | G09 | Double click with two fingers | 200 | 70 (30) |
13 | G10 | Zoom in | 200 | 65 (29) |
14 | G11 | Zoom out | 200 | 64 (28) |
 | | All non-gestures | 1431 | 147 (133) |
 | | All gestures | 4218 | 140 (94) |
 | | Total | 5649 | 142 (105) |
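The instance counts and durations above can be recomputed from the temporal annotations. The sketch below assumes a simple CSV with video name, label, start frame, and end frame per row; the file name and column layout are assumptions for illustration, not the repository's actual annotation format:

```python
import csv
from collections import defaultdict
from statistics import mean, pstdev

# Recompute per-class instance counts and mean duration in frames.
# The file name and column order are assumptions for illustration.
durations = defaultdict(list)
with open("annotations.csv") as f:
    for video, label, start, end in csv.reader(f):
        durations[label].append(int(end) - int(start) + 1)

for label, frames in sorted(durations.items()):
    print(f"{label}: {len(frames)} instances, "
          f"mean {mean(frames):.0f} ({pstdev(frames):.0f}) frames, "
          f"~{mean(frames) / 30.0:.1f} s")  # 30 frames = 1 s
```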
Baseline results for isolated and continuous hand gesture recognition on the IPN Hand dataset can be found here.
Please install the following requirements.
- Python 3.5+
- PyTorch 1.0+
- TorchVision
- Pillow
- OpenCV
- ResNeXt-101 models
- ResNet-50 models
- HarDNet model (soon)
- Optical Flow model
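Before running the scripts, a quick check like the following (a sketch, not part of the repository) can confirm the installed versions meet the requirements above:

```python
import sys
import torch
import torchvision
import cv2
import PIL

# Verify the environment against the listed requirements.
assert sys.version_info >= (3, 5), "Python 3.5+ required"
assert int(torch.__version__.split(".")[0]) >= 1, "PyTorch 1.0+ required"
print("torch", torch.__version__, "| torchvision", torchvision.__version__)
print("opencv", cv2.__version__, "| pillow", PIL.__version__)
print("CUDA available:", torch.cuda.is_available())
```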
- Download the dataset from here
- Clone this repository:
```bash
$ git clone https://github.com/GibranBenitez/IPN-hand
```
- Store all pretrained models in `./report_ipn/`
- Change the path of the dataset in `./tests/run_offline_ipn_Clf.sh` and run:
```bash
$ bash run_offline_ipn_Clf.sh
```
- Change the path of the dataset in `./tests/run_online_ipnTest.sh` and run:
```bash
$ bash run_online_ipnTest.sh
```
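Once the pretrained models are in place, a checkpoint can be inspected roughly as follows; the checkpoint file name is a placeholder, and the model itself would be built from this repository's definitions (e.g., ResNeXt-101):

```python
import torch

# Sketch: inspect a pretrained checkpoint stored under ./report_ipn/.
# The file name is a placeholder for whichever model you downloaded.
checkpoint = torch.load("./report_ipn/model.pth", map_location="cpu")
print(checkpoint.keys())  # checkpoints typically carry a state_dict

# With a model instance built from this repository's code:
# model.load_state_dict(checkpoint["state_dict"])
```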
If you find the IPN Hand dataset useful for your research, please cite the paper:
```bibtex
@inproceedings{bega2020IPNhand,
  title={IPN Hand: A Video Dataset and Benchmark for Real-Time Continuous Hand Gesture Recognition},
  author={Benitez-Garcia, Gibran and Olivares-Mercado, Jesus and Sanchez-Perez, Gabriel and Yanai, Keiji},
  booktitle={25th International Conference on Pattern Recognition, {ICPR 2020}, Milan, Italy, Jan 10--15, 2021},
  pages={1--8},
  year={2021},
  organization={IEEE},
}
```
This project is inspired by many previous works, including:
- Real-Time Hand Gesture Detection and Classification Using Convolutional Neural Networks, Kopuklu et al., FG 2019 [code]
- Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?, Hara et al., CVPR 2018 [code]
- Optical Flow Estimation Using a Spatial Pyramid Network, Ranjan and Black, CVPR 2017 [code by Niklaus]
- HarDNet: A Low Memory Traffic Network, Chao et al., ICCV 2019 [code]
- Learning to Estimate 3D Hand Pose from Single RGB Images, Zimmermann and Brox, ICCV 2017 [dataset]