# Gesture Recognition


### Machine Learning

Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed. The term emphasizes that the computer program (or machine) must do some work after it is given data: the learning step is made explicit. Although Machine Learning was first used to recognize patterns, researchers have since applied it to robotics (reinforcement learning, manipulation, motion planning, grasping), to genome data, and to predicting financial markets.

<img src="./images/ml-eng.png">

### Deep Learning

Fast forward to today, and what we’re seeing is a large interest in Deep Learning, a subset of Machine Learning. Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. It is a key technology behind driverless cars, enabling them to recognize a stop sign or to distinguish a pedestrian from a lamppost. The most popular kinds of Deep Learning models, as used in large-scale image recognition tasks, are known as Convolutional Neural Networks, or simply ConvNets.

<img src="./images/traditional-ml-deep-learning-2.png">

#### Convolutional Neural Network

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that takes in an input image, assigns importance (learnable weights and biases) to various aspects/objects in the image, and is able to differentiate one from the other. The pre-processing required by a ConvNet is much lower than that of other classification algorithms. Whereas in primitive methods filters are hand-engineered, with enough training, ConvNets are able to learn these filters/characteristics.
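To make "learnable filters" concrete, here is a minimal PyTorch sketch showing that a convolutional layer's filters are ordinary trainable parameters, which a single optimizer step changes. The layer sizes, the random input, and the placeholder loss are assumptions for illustration, not a real training setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 8 filters, each spanning the 3 input channels (RGB) over a 3x3 window
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
print(conv.weight.shape)          # torch.Size([8, 3, 3, 3])
print(conv.bias.shape)            # torch.Size([8])
print(conv.weight.requires_grad)  # True: the filter values are trainable

# One gradient-descent step on a placeholder objective (an assumption, not a real task loss)
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)
image = torch.rand(1, 3, 32, 32)  # a batch containing one random 32x32 RGB "image"
before = conv.weight.detach().clone()

optimizer.zero_grad()
loss = conv(image).mean()
loss.backward()
optimizer.step()

print(torch.equal(before, conv.weight.detach()))  # False: the filters were updated
```

A hand-engineered filter, by contrast, would be a fixed tensor that no optimizer touches.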

The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. A collection of such fields overlap to cover the entire visual area.
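The same idea carries over to ConvNets: each unit in a deeper layer "sees" a larger region of the input image. A minimal sketch using the standard recurrence for stacked layers follows. Each layer is described by an assumed kernel size and stride, and the receptive field grows by `(kernel_size - 1)` times the cumulative stride of the layers before it. The two example stacks are hypothetical.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels, along one axis) of one output unit
    after a stack of layers, each given as (kernel_size, stride)."""
    field = 1  # a single input pixel sees only itself
    jump = 1   # distance in input pixels between adjacent units of the current layer
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field

# Hypothetical stacks: three 3x3 convolutions (stride 1) vs. conv -> 2x2 max-pool -> conv
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```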

<img src="./images/Typical_cnn.png">

### How it works

#### Input image

In the figure, an RGB image has been separated into its three color planes: red, green, and blue. There are a number of such color spaces in which images exist: grayscale, RGB, HSV, CMYK, etc.

<img src="./images/input-img.png">

You can imagine how computationally intensive things would get once images reach dimensions such as 8K (7680×4320). The role of the ConvNet is to reduce the images into a form that is easier to process, without losing the features that are critical for getting a good prediction.
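A quick back-of-the-envelope calculation shows the scale of the problem. The sketch below counts the input values in one 8K RGB image and compares the weights that a single fully connected neuron would need with those of a small convolutional layer, whose filters are shared across all positions. The 16-filter, 3×3 layer is an assumed configuration for comparison.

```python
import torch.nn as nn

width, height, channels = 7680, 4320, 3
inputs = width * height * channels
print(f"{inputs:,}")  # 99,532,800 values in a single 8K RGB image

# A fully connected neuron needs one weight per input value, plus a bias
dense_params_per_neuron = inputs + 1
print(f"{dense_params_per_neuron:,}")  # 99,532,801

# A convolutional layer reuses the same small filters at every position,
# so its parameter count does not depend on the image resolution
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
conv_params = sum(p.numel() for p in conv.parameters())
print(conv_params)  # 448 = 16 * 3 * 3 * 3 weights + 16 biases
```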

#### Convolutional layer

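Think of convolution as applying a filter to our image. We slide a small matrix, usually called a kernel, over the image. At each position, we multiply the kernel element-wise with the patch of the image underneath it and sum the results. The output is a filtered version of the image.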

<img src="./images/Convolution_schematic.gif">
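The sliding-window operation can be written out directly. This minimal sketch implements a stride-1, no-padding convolution with explicit loops and checks it against PyTorch's `torch.nn.functional.conv2d`. The 5×5 binary image and the 3×3 kernel values are illustrative assumptions. Like most deep learning libraries, `conv2d` does not flip the kernel, so strictly speaking it computes a cross-correlation, and the loop version does the same.

```python
import torch
import torch.nn.functional as F

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the
    element-wise products at each position, as CNN layers do."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = torch.zeros(h - kh + 1, w - kw + 1)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

image = torch.tensor([[1., 1., 1., 0., 0.],
                      [0., 1., 1., 1., 0.],
                      [0., 0., 1., 1., 1.],
                      [0., 0., 1., 1., 0.],
                      [0., 1., 1., 0., 0.]])
kernel = torch.tensor([[1., 0., 1.],
                       [0., 1., 0.],
                       [1., 0., 1.]])

manual = conv2d_valid(image, kernel)
# F.conv2d expects (batch, channels, H, W) input and (out_channels, in_channels, kH, kW) weights
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual.tolist())                  # [[4.0, 3.0, 4.0], [2.0, 4.0, 3.0], [2.0, 3.0, 4.0]]
print(torch.allclose(manual, builtin))  # True
```

The 5×5 input shrinks to 3×3 because the 3×3 kernel fits in only three positions along each axis without padding.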

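The objective of the convolution operation is to extract features from the input image. The first convolutional layers typically capture low-level features such as edges, while deeper layers build on them to capture higher-level features.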

<img src="./images/convolution-layer.gif">
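To see how a single filter can pick out edges, the sketch below applies a hand-engineered vertical-edge (Sobel) kernel to a synthetic image that is dark on the left and bright on the right. Both the image and the choice of a Sobel kernel are assumptions for illustration. In a trained ConvNet, first-layer filters are learned rather than chosen, although they often end up responding to similar patterns.

```python
import torch
import torch.nn.functional as F

# Synthetic 4x6 grayscale image: dark (0) on the left, bright (1) on the right
image = torch.zeros(4, 6)
image[:, 3:] = 1.0

# Sobel kernel that responds to horizontal changes in intensity (vertical edges)
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])

# conv2d expects (batch, channels, H, W) input and (out_channels, in_channels, kH, kW) weights
edges = F.conv2d(image[None, None], sobel_x[None, None])[0, 0]
print(edges.tolist())  # [[0.0, 4.0, 4.0, 0.0], [0.0, 4.0, 4.0, 0.0]]
```

The response is non-zero only in the columns whose window straddles the boundary between dark and bright pixels.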

#### Pooling layer

Similar to the convolutional layer, the pooling layer reduces the spatial size of its input, which decreases the computational power required to process the data. Furthermore, it is useful for extracting dominant features that are robust to small changes in position and orientation, which helps keep the training of the model effective.

<img src="./images/pooling-layer.gif">
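Max pooling and average pooling are the two most common variants. This sketch pools a 4×4 feature map with 2×2 windows and stride 2, which keeps a quarter of the values. Max pooling keeps the strongest response in each window, and average pooling keeps the mean. The feature-map values are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Illustrative 4x4 feature map, reshaped to (batch, channels, H, W) as PyTorch expects
x = torch.tensor([[1.0, 3.0, 2.0, 4.0],
                  [5.0, 6.0, 1.0, 2.0],
                  [7.0, 2.0, 9.0, 1.0],
                  [3.0, 4.0, 0.0, 5.0]])[None, None]

# 2x2 windows with stride 2: each non-overlapping block becomes one value
max_pooled = F.max_pool2d(x, kernel_size=2, stride=2)
avg_pooled = F.avg_pool2d(x, kernel_size=2, stride=2)

print(max_pooled[0, 0].tolist())  # [[6.0, 4.0], [7.0, 9.0]]
print(avg_pooled[0, 0].tolist())  # [[3.75, 2.25], [4.0, 3.75]]
```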


The convolutional layer and the pooling layer together form the i-th layer of a Convolutional Neural Network. Depending on the complexity of the images, the number of such layers may be increased to capture low-level details even further, but at the cost of more computational power.

After going through the above process, the model has extracted the features of the image. Next, we flatten the final output and feed it to a regular (fully connected) neural network for classification.
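Putting the pieces together, here is a minimal sketch of such a network in PyTorch: two convolution + pooling blocks, followed by a flatten step and a fully connected classifier. The 64×64 RGB input size, the filter counts, and the five output classes (for example, five hand gestures) are all hypothetical choices for illustration. ReLU activations, which the text above does not discuss, are added as a common default between layers.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 5  # hypothetical number of gesture classes

model = nn.Sequential(
    # Block 1: convolution + pooling, 3x64x64 -> 16x64x64 -> 16x32x32
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    # Block 2: convolution + pooling, 16x32x32 -> 32x32x32 -> 32x16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    # Flatten the feature maps and classify with a regular fully connected layer
    nn.Flatten(),                          # 32 * 16 * 16 = 8192 features per image
    nn.Linear(32 * 16 * 16, NUM_CLASSES),
)

batch = torch.rand(4, 3, 64, 64)  # four random stand-in images
logits = model(batch)
print(logits.shape)  # torch.Size([4, 5]): one score per class for each image
```

Adding a third block would halve the spatial size again and shrink the input to the linear layer, at the cost of more convolution work.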


### PyTorch

PyTorch is a Python-based scientific computing package that can serve as a replacement for NumPy to use the power of GPUs.
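As a minimal sketch of the GPU side, the code below places a tensor on a CUDA GPU when PyTorch can see one and falls back to the CPU otherwise. The same tensor operations then run unchanged on either device. The matrix size is arbitrary.

```python
import torch

# Use a CUDA GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(1000, 1000, device=device)
y = x @ x  # the matrix product runs on whichever device holds the tensors
print(y.device)
```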

Let's construct a randomly initialized matrix. Run the snippet below.

```python
import torch

x = torch.rand(5, 3)
print(x)
```

Example output:

```
tensor([[9.8809e-02, 5.7240e-01, 9.6262e-05],
        [7.8903e-01, 7.3890e-01, 8.3572e-01],
        [1.6577e-01, 8.9676e-01, 4.5417e-01],
        [4.0741e-01, 6.9280e-01, 7.5464e-01],
        [6.6123e-01, 6.3295e-01, 3.9002e-01]])
```


PyTorch uses an imperative (eager) paradigm: each line of code that builds a graph defines a component of that graph, and we can perform computations on these components independently, even before the graph is completely built. This is called the “define-by-run” methodology.

<img src="./images/pytorch-variable.gif">
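A minimal sketch of define-by-run in practice: each operation executes as soon as its line runs, so intermediate results can be inspected immediately. Ordinary Python control flow also decides the shape of the graph that autograd later differentiates. The values are arbitrary.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

y = x * 3   # computed immediately; y already holds a value
print(y)    # tensor(6., grad_fn=<MulBackward0>)

# A plain Python `if` chooses which operation becomes part of the graph at run time
z = y ** 2 if y > 5 else y + 1

z.backward()   # differentiate through the graph that was actually built
print(x.grad)  # tensor(36.) since dz/dx = 2 * y * 3 = 36
```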

#### Tensors

Tensors are nothing but multidimensional arrays. Tensors in PyTorch are similar to NumPy’s ndarrays, with the addition that they can also be used on a GPU to accelerate computing.

```python
import torch

# define two one-element float tensors
a = torch.tensor([2.0])
b = torch.tensor([3.0])

print(a + b)
```

Output:

```
tensor([5.])
```
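Because tensors mirror NumPy arrays, converting between the two is cheap. In this sketch, `torch.from_numpy` wraps an existing CPU array without copying it, so changes to one are visible in the other, and `.numpy()` converts a CPU tensor back. The array contents are arbitrary.

```python
import numpy as np
import torch

arr = np.ones((2, 3), dtype=np.float32)
t = torch.from_numpy(arr)   # shares memory with `arr` (no copy)

arr[0, 0] = 5.0             # change the NumPy array in place...
print(t[0, 0].item())       # 5.0: ...and the tensor sees the change

doubled = (t * 2).numpy()   # a new tensor, converted back to a NumPy array
print(doubled.shape, doubled.dtype)  # (2, 3) float32
print(t.shape, t.dtype)     # torch.Size([2, 3]) torch.float32
```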


### Loading the Dataset