
Deep Neural Network Gesture Classifier

Static gesture classification using a Convolutional Neural Network, demonstrated on the Russian Sign Language

Publications on ResearchGate

Domain Background

Human-computer interaction interfaces are diverse in their implementation and scope, e.g. systems with console input-output, controllers with gesture control, brain-computer interfaces, and others. Systems using data input based on the recognition of custom gestures have gained wide popularity since 2010, after the release of Microsoft's contactless game controller Kinect. Gesture-based controllers are increasing their market share and becoming part of everyday life for different categories of users. For example, the car manufacturer Volkswagen introduced Golf R Touch Gesture Control to operate the car's multimedia system with gesture commands.

To translate gesture commands into control signals, a gesture classification mechanism is needed. Its input can be obtained from various devices: special gloves that report joint coordinates, as well as 2D and 3D video cameras. The glove-based approach has a significant drawback: the user has to wear a special device connected to the computer. The computer vision approach based on video cameras, in turn, is considered more natural and less expensive.

Problem Statement

This project demonstrates the core of a static gesture classification system for the Russian Sign Language, based on a computer vision approach using a convolutional neural network. The work is relevant and represents a starting point for researchers in the field of gesture recognition.

Datasets and Inputs

To solve the described problem with a CNN approach, a dataset is needed. I didn't find any open dataset in this field and decided to create my own.

The dataset for training, validating and testing the neural network consists of around 1000 images (1042 at the time of writing) with a resolution of 128x128 pixels. The images are divided into 10 classes, each of which corresponds to a strictly defined gesture.

The dataset (described in gestnames.csv) is part of the repository: hand-dataset.zip

Dataset stats:

[Image: dataset statistics]

Picture 1 describes the 10 classes, each of which corresponds to a strictly defined gesture:

  • Line 1 – Class Id.
  • Line 2 – Value / Letter.
  • Line 3 – Quantity in dataset.
  • Line 4 – Image with a gesture example.

Solution Statement

  • Step 1 – image preprocessing. In this step, I check preprocessing techniques to apply to an image before feeding it to the CNN.
  • Step 2 – normalization. Each pixel value must be in the range 0 to 1 (for Keras input); see the sketch after this list.
  • Step 3 – training/validation/test split of the dataset. Here I need to choose the proportions.
  • Step 4 – define the CNN architecture. In this step, I compare different CNN architectures and see which one performs better.
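
A minimal sketch of step 2, assuming the images are already loaded as a NumPy array (the variable name `images` is a hypothetical stand-in):

```python
import numpy as np

# Stand-in data: uint8 images with values in [0, 255].
images = np.random.randint(0, 256, size=(4, 128, 128, 1), dtype=np.uint8)

# Step 2: scale every pixel value into the range [0, 1] for Keras input.
images = images.astype(np.float32) / 255.0
```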

Data preprocessing

Data preprocessing includes 2 steps:

  1. Convert RGB to YCrCb color space
  2. Threshold color components to perform hand segmentation
```python
import numpy as np

# YCrCb thresholds used for skin segmentation
skin_ycrcb_mint = np.array((0, 133, 77))
skin_ycrcb_maxt = np.array((255, 173, 127))
```
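
A hedged sketch of both preprocessing steps with OpenCV, using the thresholds above (the input file name is hypothetical):

```python
import cv2

# Hypothetical input image; OpenCV loads it in BGR order.
img = cv2.imread('hand.jpg')

# Step 1: convert to the YCrCb color space.
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)

# Step 2: threshold the color components to obtain a binary hand mask.
mask = cv2.inRange(ycrcb, skin_ycrcb_mint, skin_ycrcb_maxt)
```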

Result after preprocessing:

[Image: hand segmentation result after preprocessing]

After preprocessing, we need to split the dataset into 3 sets (see the sketch after the list):

  • Training set: 666
  • Cross-validation set: 167
  • Test set: 209
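
A hedged sketch of such a split with scikit-learn; holding out 20% for test and then 20% of the remainder for validation reproduces roughly these proportions on 1042 images (`X` and `y` are the preprocessed images and integer class ids from the steps above):

```python
from sklearn.model_selection import train_test_split

# 1042 images -> about 209 test, then 167 validation and 666 training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42)
```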

Benchmark Model

In this project, I decided to use the classic benchmark model (architecture) – LeNet-5 – as the BASE for my CNN architecture. It shows impressive results in image classification tasks and has very good performance.

Neural Network Architecture

The LeNet-5 architecture was taken as a base. The final model is a convolutional neural network with the following layers and layer sizes:

[Images: model architecture diagram and layer sizes]

An important design choice is to apply Dropout, a simple way to prevent neural networks from overfitting.
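
A hedged Keras sketch of a LeNet-5-style network with Dropout; the filter counts, dense sizes and single-channel 128x128 input below are illustrative assumptions, not the exact model shown in the figures above:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(6, (5, 5), activation='relu', input_shape=(128, 128, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(16, (5, 5), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(120, activation='relu'),
    Dropout(0.5),                     # regularization against overfitting
    Dense(84, activation='relu'),
    Dense(10, activation='softmax'),  # one output per gesture class
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # integer class ids as labels
              metrics=['accuracy'])
```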

Evaluation Metrics

  • Training. Evaluated parameters – accuracy, loss.
  • Validation. Performed after every epoch of the CNN training process. Evaluated parameters – accuracy, loss.
  • Test. Performed after training the CNN, on data not presented during the training process. Evaluated parameter – accuracy. A sketch of the training and evaluation calls follows this list.
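
A hedged sketch of how these three evaluations could be wired up in Keras, reusing the model and the split from the sketches above (epochs and batch size are assumptions):

```python
# Training with per-epoch validation: accuracy and loss are reported for both sets.
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=30, batch_size=32)

# Final evaluation on data never seen during training.
test_loss, test_acc = model.evaluate(X_test, y_test)
print('Test loss:', test_loss, 'Test accuracy:', test_acc)
```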

Training results

The validation set helped determine whether the model was over- or underfitting:

[Image: training and validation accuracy/loss curves]

And, finally, test set results:

  • Test loss: 0.238594918445
  • Test accuracy: 0.913875598657

Experiment

Evaluation on data that has never been in the dataset:

[Image: out-of-dataset gesture examples]

The probability distribution for 10 examples:

[Image: predicted probability distributions for the 10 examples]
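
A hedged sketch of how such out-of-dataset images could be classified with the trained model; the helper function and file name are hypothetical, and it reuses the thresholds and model from the sketches above:

```python
import cv2
import numpy as np

def prepare(path):
    """Preprocess a single image the same way as the training data (hypothetical helper)."""
    img = cv2.imread(path)
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, skin_ycrcb_mint, skin_ycrcb_maxt)
    mask = cv2.resize(mask, (128, 128)).astype(np.float32) / 255.0
    return mask.reshape(1, 128, 128, 1)

probs = model.predict(prepare('new_gesture.jpg'))[0]  # per-class probabilities
print('Predicted class:', probs.argmax(), 'with probability', probs.max())
```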

Evaluation

  • Class 0 (А): True, Probability = 1
  • Class 1 (Б): True, Probability = 0.998
  • Class 2 (В): True, Probability = 0.998
  • Class 3 (Г): False, Probability = 0 (Detected Class 7 with Probability = 1)
  • Class 4 (Е): False, Probability = 0.021 (Detected Class 7 with Probability = 0.783)
  • Class 5 (И): True, Probability = 0.94
  • Class 6 (О): True, Probability = 1
  • Class 7 (П): True, Probability = 1
  • Class 8 (С): True, Probability = 1
  • Class 9 (Я): True, Probability = 0.997

  • 91.3% of images from the test set were correctly classified.
  • 80% of images from outside the dataset were correctly classified.

Possible points of improvement

  • Use much more data (consider crowdsourcing the data collection)
  • Use data augmentation techniques (a small sketch follows this list)
  • Use advanced preprocessing techniques (more experiments with color spaces, HOG features, etc.)
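
A hedged sketch of the data augmentation idea with the Keras ImageDataGenerator; the specific transforms and their ranges are assumptions:

```python
from keras.preprocessing.image import ImageDataGenerator

# Randomly rotate, shift and zoom training images to enlarge the effective dataset.
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)

# Train on augmented batches instead of the raw training set.
model.fit_generator(datagen.flow(X_train, y_train, batch_size=32),
                    steps_per_epoch=len(X_train) // 32,
                    validation_data=(X_val, y_val),
                    epochs=30)
```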
