# MicroVision Classification Challenge

The goal of this task is to understand your approach to solving a hand-gesture classification problem using a dataset of depth images captured with a Kinect v2 camera (obtained from [Kaggle](https://www.kaggle.com/gti-upm/depthgestrecog) and redistributed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode)).

The database is composed of 11 different hand gestures performed by 6 different subjects. The goal of the task is to classify a given frame into one of the 11 gesture classes. The dataset contains both static gestures and dynamic (time-varying) gestures such as palm_lr and thumb_ud. For the purpose of this task, you may treat all gestures as static and time-independent.

The database is organized into folders as follows (a detailed description can be found below):

    /fist (fist hand-gesture)
        /fist/video_base_1 (base fist hand-gesture of subject 1)
            /fist/video_base_1/s01_g10_011_1.png,...,s01_g10_050_1.png,...
        /fist/video_base_2 (base fist hand-gesture of subject 2)
            /fist/video_base_2/s02_g10_011_1.png,...,s02_g10_050_1.png,...
        /fist/video_base_6 (base fist hand-gesture of subject 6)
            /fist/video_base_6/s06_g10_011_1.png,...,s06_g10_050_1.png,...
        /fist/video_moved_1_1 (moved up fist hand-gesture of subject 1)
            /fist/video_moved_1_1/s01_g10_011_2.png,...,s01_g10_050_2.png,...
        /fist/video_moved_1_2 (moved down fist hand-gesture of subject 1)
            /fist/video_moved_1_2/s01_g10_011_3.png,...,s01_g10_050_3.png,...
        /fist/video_moved_1_8 (moved up and left fist hand-gesture of subject 1)
            /fist/video_moved_1_8/s01_g10_011_9.png,...,s01_g10_050_9.png,...
        /fist/video_moved_6_1 (moved up fist hand-gesture of subject 6)
        /fist/video_moved_6_8 (moved up and left fist hand-gesture of subject 6)
    /grab
    /one_finger
    /palm (palm hand-gesture)
    /thumb_ud

Every root folder (fist, grab,...) contains the range images of one hand-gesture. The folder name is the identifier of the hand-gesture (for example fist, palm, thumb_ud,...).

Inside every root folder there are 54 folders: 6 of them hold the base hand gestures (/fist/video_base_1,...,/fist/video_base_6) and the other 48 hold the moved hand gestures used to increase the number of training samples (/fist/video_moved_1_1,...,/fist/video_moved_1_8,...,/fist/video_moved_6_1,...,/fist/video_moved_6_8). Inside every subfolder there is a set of range hand images that can be true/positive samples or false/negative samples. Every frame name has the same structure, sXX_gYY_ZZZ_M.png, where:

- XX is the subject identifier.
- YY is the gesture identifier.
- ZZZ is the frame number.
- M indicates whether the frame belongs to the base video (M = 1) or to a moved video (M = 2,...,9).

For example, the frame name 's02_g05_060_1' indicates that the frame belongs to the fifth gesture, performed by the second subject, that it is frame number 60, and that it belongs to the base video.
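The naming scheme can be unpacked with a small helper. The following is a minimal sketch, not part of the dataset tooling: `parse_frame_name` and `FrameInfo` are hypothetical names introduced here, and the pattern assumes two-digit subject and gesture fields as described above.

```python
import re
from typing import NamedTuple


class FrameInfo(NamedTuple):
    subject: int  # XX
    gesture: int  # YY
    frame: int    # ZZZ
    variant: int  # M (1 = base video, 2..9 = moved video)


# Matches sXX_gYY_ZZZ_M with an optional .png extension.
_FRAME_RE = re.compile(r"^s(\d{2})_g(\d{2})_(\d+)_(\d)(?:\.png)?$")


def parse_frame_name(name: str) -> FrameInfo:
    """Split a frame file name into its four numeric fields."""
    match = _FRAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected frame name: {name!r}")
    return FrameInfo(*(int(group) for group in match.groups()))


info = parse_frame_name("s02_g05_060_1.png")
print(info.subject, info.gesture, info.frame, info.variant)
```

Parsing the subject field as well as the gesture field is useful later, for example when holding out whole subjects for validation.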

Some of the code snippets below may have bugs and/or can be optimized to improve performance. Feel free to correct and improve the code by editing and adding to the appropriate cells.

If you do not have access to the computing resources required to train an ML-based network, please include the full network architecture you propose and fully functional code that will allow us to train the model on our end. You are free to use the deep learning framework of your choice, or an approach that is not based on deep learning.

The model must generalize well enough to achieve high accuracy on data that was not used during training. Accuracy will be evaluated on a hidden test set.

We do not expect you to spend more than 4 hours on this task, including training time. Feel free to mention in comments any further improvements you would implement in the future.

In [None]:
# Import any other libraries and frameworks you may be using

import numpy as np
import sys
import os
import matplotlib.pyplot as plt
import scipy

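### The cell below creates a Python dictionary that maps each gesture string to a class number, read from the first frame file name found in each gesture folder. Please make changes as necessary.

In [None]:
data_dir = "depthGestRecog/"

label_dict = {}

# Frames are named sXX_gYY_ZZZ_M.png, so a single file per gesture folder
# is enough to read the class number YY (characters 5 and 6 of the name).
for label in sorted(os.listdir(data_dir)):
    label_dir = os.path.join(data_dir, label)
    if not os.path.isdir(label_dir):
        continue  # skip stray files next to the gesture folders
    sub_dir = sorted(os.listdir(label_dir))[0]
    first_file = sorted(os.listdir(os.path.join(label_dir, sub_dir)))[0]
    label_dict[label] = int(first_file[5:7])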
    


### Starter code to load data has been provided. Edit as necessary to change dimensions and optimize

In [None]:
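x = []
y = []
for root, _, files in os.walk(data_dir):
    for name in sorted(files):
        if not name.lower().endswith(".png"):
            continue  # ignore anything that is not a frame image
        x.append(plt.imread(os.path.join(root, name)))
        y.append(int(name[5:7]))  # gesture id YY from sXX_gYY_ZZZ_M.png

# Note: stacking assumes every frame has the same dimensions.
x = np.asarray(x, dtype=np.float32)
y = np.asarray(y)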

### To Do: Convert labels to one-hot encoded vectors

In [None]:
# Please implement (from scratch) a function that converts the labels to one-hot encoded vectors
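
One possible from-scratch encoding, using only NumPy, is sketched below. The function name `one_hot` and the optional `class_ids` argument are assumptions made for illustration. Because the gesture identifiers parsed from file names are not guaranteed to start at zero, the sketch maps each label to a column through an explicit list of class identifiers.

```python
import numpy as np


def one_hot(labels, class_ids=None):
    """Return (encoded, class_ids); encoded has one row per label and one column per class."""
    labels = np.asarray(labels).ravel()
    if class_ids is None:
        class_ids = np.unique(labels)  # sorted distinct labels
    class_ids = np.asarray(class_ids)
    # Compare every label against every class id; a match becomes 1.0.
    encoded = (labels[:, None] == class_ids[None, :]).astype(np.float32)
    if not np.all(encoded.sum(axis=1) == 1.0):
        raise ValueError("class_ids must contain every label exactly once")
    return encoded, class_ids


y_demo = np.array([10, 1, 3, 10])  # illustrative labels, not real data
encoded, ids = one_hot(y_demo)
print(ids)
print(encoded)
```

Returning `class_ids` alongside the matrix keeps the column-to-gesture mapping available when predictions have to be converted back to gesture identifiers.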

### To Do: Split dataset into training and validation sets

In [None]:
# You may perform this by using your own functions or using an existing library module
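
Since the model is expected to generalize to unseen data, one option is to hold out every frame of one or more subjects rather than splitting frames at random. The sketch below assumes a `subjects` array holding the subject identifier of each frame (for example parsed from characters 1 and 2 of the file name); `split_by_subject` is a hypothetical helper, and the demo arrays are synthetic.

```python
import numpy as np


def split_by_subject(x, y, subjects, val_subjects):
    """Put every frame of the subjects in val_subjects into the validation set."""
    subjects = np.asarray(subjects)
    val_mask = np.isin(subjects, list(val_subjects))
    return (x[~val_mask], y[~val_mask]), (x[val_mask], y[val_mask])


# Tiny synthetic stand-in for the real arrays (shapes are illustrative only).
x_demo = np.arange(6 * 4, dtype=np.float32).reshape(6, 2, 2)
y_demo = np.array([1, 2, 1, 2, 1, 2])
subjects_demo = np.array([1, 1, 2, 2, 3, 3])

(train_x, train_y), (val_x, val_y) = split_by_subject(
    x_demo, y_demo, subjects_demo, val_subjects={3}
)
print(train_x.shape, val_x.shape)  # (4, 2, 2) (2, 2, 2)
```

A random frame-level split is simpler, but consecutive frames of the same video are nearly identical, so it can overstate validation accuracy.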

### To Do: Visualize a few entries from dataset (Use matplotlib/OpenCV/PIL libraries)

In [None]:
# 
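
A minimal matplotlib sketch for inspecting a few frames follows. `show_samples` is a hypothetical helper, and random noise stands in for real depth frames so the example runs without the dataset.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_samples(images, labels, n=4, seed=0):
    """Plot n randomly chosen frames in a row, each titled with its label."""
    rng = np.random.default_rng(seed)
    n = min(n, len(images))
    picks = rng.choice(len(images), size=n, replace=False)
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3), squeeze=False)
    for ax, idx in zip(axes[0], picks):
        ax.imshow(images[idx], cmap="gray")
        ax.set_title(f"label {labels[idx]}")
        ax.axis("off")
    return fig


# Random noise as a placeholder for real frames.
demo_images = np.random.default_rng(1).random((10, 32, 32)).astype(np.float32)
demo_labels = np.arange(10)
fig = show_samples(demo_images, demo_labels, n=4)
```

In a notebook the returned figure is rendered inline; in a script, call `plt.show()` afterwards.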

### To Do (Optional): Perform pre-processing/augmentation/fine-tuning if necessary

In [None]:
#
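
As one possible pre-processing step, the sketch below subsamples each frame by an integer stride, rescales it to the range [0, 1], and appends a channel axis. It assumes the frames are already stacked into a single-channel array of shape (n, height, width); `preprocess`, `stride`, and `eps` are names chosen for this example.

```python
import numpy as np


def preprocess(images, stride=2, eps=1e-6):
    """Subsample by an integer stride, scale each frame to [0, 1], add a channel axis."""
    images = np.asarray(images, dtype=np.float32)
    if images.ndim != 3:
        raise ValueError("expected an array of shape (n, height, width)")
    images = images[:, ::stride, ::stride]
    # Per-frame minimum and maximum, kept broadcastable against the frames.
    lo = images.min(axis=(1, 2), keepdims=True)
    hi = images.max(axis=(1, 2), keepdims=True)
    scaled = (images - lo) / (hi - lo + eps)  # eps avoids division by zero
    return scaled[..., np.newaxis]


demo = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)
out = preprocess(demo)
print(out.shape)  # (3, 4, 4, 1)
```

Stride subsampling is the crudest way to shrink an image; an interpolating resize from an imaging library would usually preserve more detail.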

### To Do: Define Network Architecture and Hyperparameters


In [1]:
#Define learning_rate, n_of_epochs, batch_size etc.


#Please mention in comments your rationale behind selecting architecture, loss function, optimizers and  
#other hyperparameters, with an overview of approach taken 


### To Do: Plot Training/Validation Losses, and Calculate Accuracy

In [2]:
# 1) Using matplotlib, plot training loss, validation loss, training accuracy and 
#    validation accuracy with respect to epochs

# 2) Define a function that takes in as input an array of 'n' images, and a list of 'n' corresponding 
#    class labels, and returns average classification accuracy.
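
The accuracy function in item 2 could take roughly the following shape. This is a sketch under stated assumptions: `predict_fn` is a hypothetical callable that wraps whatever model is trained and returns one score per class for each image, and `class_ids` maps score columns back to gesture identifiers. The dummy predictor exists only to make the example runnable.

```python
import numpy as np


def classification_accuracy(images, labels, predict_fn, class_ids):
    """Fraction of images whose highest-scoring class matches the label."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one image is required")
    scores = np.asarray(predict_fn(images))  # shape (n, n_classes)
    predicted = np.asarray(class_ids)[scores.argmax(axis=1)]
    return float(np.mean(predicted == labels))


def dummy_predict(batch):
    """Stand-in model: scores class 2 higher for brighter frames."""
    brightness = batch.reshape(len(batch), -1).mean(axis=1)
    return np.stack([1.0 - brightness, brightness], axis=1)


demo_images = np.array([np.zeros((2, 2)), np.ones((2, 2)),
                        np.zeros((2, 2)), np.ones((2, 2))])
demo_labels = [1, 2, 2, 2]
acc = classification_accuracy(demo_images, demo_labels, dummy_predict, class_ids=[1, 2])
print(acc)  # 0.75
```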