<a href="https://colab.research.google.com/github/ssahu912/caddy-gesture-identification/blob/main/caddy_gestures_course_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Jovian Commit Essentials
# Please retain and execute this cell without modifying the contents for `jovian.commit` to work
!pip install jovian --upgrade -q
import jovian
jovian.utils.colab.set_colab_file_id('1FdIwG5tk-mNrei-SVpC6SEbi9VAhL-pY')



In [None]:
!pip install jovian --upgrade --quiet

In [None]:
project_name='caddy-gestures-course-project'

# Exploring CADDY Underwater Gestures Dataset
### Human-Robot Interaction (HRI) for Diver and AUVs activities
This is an open access dataset distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
<p>
The dataset can be downloaded from the <a href="http://www.caddian.eu//assets/caddy-gestures-TMP/CADDY_gestures_complete_v2_release.zip">link</a>.
<p>
This project uses a dataset of hand gestures that divers use underwater to give instructions, recorded in 8 different scenarios. The scenarios are as follows:
<table><tr>
<td> <img src="http://www.caddian.eu//assets/caddy-gestures-TMP/biograd-A/true_positives/raw/biograd-A_00162_left.jpg" alt="Drawing" style="width: 250px;"/> </td>
<td> <img src="http://www.caddian.eu//assets/caddy-gestures-TMP/biograd-B/true_positives/raw/biograd-B_00032_right.jpg" alt="Drawing" style="width: 250px;"/> </td>
<td> <img src="http://www.caddian.eu//assets/caddy-gestures-TMP/biograd-C/true_positives/raw/biograd-C_00098_right.jpg" alt="Drawing" style="width: 250px;"/> </td>
<td> <img src="http://www.caddian.eu//assets/caddy-gestures-TMP/brodarski-A/true_positives/raw/brodarski-A_00018_right.jpg" alt="Drawing" style="width: 250px;"/> </td>
</tr>
<tr>
<td>BioGrad-A</td>
<td>BioGrad-B</td>
<td>BioGrad-C</td>
<td>Brodarski-A</td>
</tr>
<tr>
<td> <img src="http://www.caddian.eu//assets/caddy-gestures-TMP/brodarski-B/true_positives/raw/brodarski-B_00029_right.jpg" alt="Drawing" style="width: 250px;"/> </td>
<td> <img src="http://www.caddian.eu//assets/caddy-gestures-TMP/brodarski-C/true_positives/raw/brodarski-C_00006_left.jpg" alt="Drawing" style="width: 250px;"/> </td>
<td> <img src="http://www.caddian.eu//assets/caddy-gestures-TMP/brodarski-D/true_positives/raw/brodarski-D_00032_right.jpg" alt="Drawing" style="width: 250px;"/> </td>
<td> <img src="http://www.caddian.eu//assets/caddy-gestures-TMP/genova-A/true_positives/raw/genova-A_00032_right.jpg" alt="Drawing" style="width: 250px;"/> </td>
</tr>
<tr>
<td>Brodarski-B</td>
<td>Brodarski-C</td>
<td>Brodarski-D</td>
<td>Genova-A</td>
</tr>
</table>
This is a classification problem: given an image of a diver making a gesture, the model must identify the meaning of the gesture across multiple scenarios.


If you prefer to upload the data to Google Drive, use the code below to mount the drive so that the archive is available to the notebook.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
import torch
import torchvision
import zipfile
from torchvision.datasets.utils import download_url
from torch.utils.data import random_split

If you would rather download the dataset while running the notebook instead of uploading it to your drive, uncomment the cell below and execute it.

In [None]:
# # Download the dataset
# dataset_url = "http://www.caddian.eu//assets/caddy-gestures-TMP/CADDY_gestures_complete_v2_release.zip"
# download_url(dataset_url, '.')
# # Extract from downloaded archive
# with zipfile.ZipFile('./CADDY_gestures_complete_v2_release.zip', 'r') as zip_ref:
#     zip_ref.extractall('./data')

In [None]:
# Extract from drive archive
with zipfile.ZipFile('./drive/MyDrive/CADDY_gestures_complete_v2_release.zip', 'r') as zip_ref:
    zip_ref.extractall('./data')

## Data Analysis

Let's explore the data!

Looking at the folder structure, we see 8 folders for the 8 scenarios, each containing raw images of multiple gestures captured by the left and right stereo cameras.
<p>
The folder structure of the data is something like this:
<p>
data => biograd-A => true_positives => raw => image1, image2 ... imageN
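
To check this layout on disk, the sketch below counts the image files under each scenario's `true_positives/raw` folder. The helper name `count_raw_images`, the list of image extensions, and the extraction root are assumptions for illustration; adjust them to match where the archive was actually unpacked.

```python
import os
from collections import Counter

def count_raw_images(data_root):
    """Count image files under <scenario>/true_positives/raw for each scenario folder."""
    counts = Counter()
    for scenario in sorted(os.listdir(data_root)):
        raw_dir = os.path.join(data_root, scenario, "true_positives", "raw")
        if not os.path.isdir(raw_dir):
            continue  # skip CSV files and anything that is not a scenario folder
        counts[scenario] = sum(
            1 for name in os.listdir(raw_dir)
            if name.lower().endswith((".jpg", ".jpeg", ".png"))  # assumed extensions
        )
    return counts

# Assumed extraction root; change it if the archive was unpacked elsewhere.
data_root = "./data/CADDY_gestures_complete_v2_release"
if os.path.isdir(data_root):
    for scenario, n in count_raw_images(data_root).items():
        print(f"{scenario}: {n} raw images")
```

Each stereo pair contributes two files (left and right), so the per-scenario counts should be even if the layout matches the description.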

Let's take a look at what's inside the CSV files and how they are structured.
(We'll look only at the true positives for the scope of this project.)

In [None]:
# Importing required EDA tools
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
dataFromCSV = pd.read_csv('./data/CADDY_gestures_complete_v2_release/CADDY_gestures_all_true_positives_release_v2.csv', index_col='index')
dataFromCSV.head()

| index | scenario | stereo left | stereo right | label name | label id | roi left | roi right | synthetic | iqa_mdm_entropy | iqa_mdm_d | iqa_mdm_dcomp | distortion | param 1 | param 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | biograd-A | /biograd-A/true_positives/raw/biograd-A_00000_... | /biograd-A/true_positives/raw/biograd-A_00000_... | num_delimiter | 10 | [237,236,54,65] | [155,236,54,65] | 0 | 6.971026 | 0.957653 | 0.902 | NaN | NaN | NaN |
| 1 | biograd-A | /biograd-A/true_positives/blurred/dir_00/biogr... | /biograd-A/true_positives/blurred/dir_00/biogr... | num_delimiter | 10 | [237,236,54,65] | [155,236,54,65] | 1 | NaN | NaN | NaN | blur | 7.0 | NaN |
| 2 | biograd-A | /biograd-A/true_positives/blurred/dir_01/biogr... | /biograd-A/true_positives/blurred/dir_01/biogr... | num_delimiter | 10 | [237,236,54,65] | [155,236,54,65] | 1 | NaN | NaN | NaN | blur | 11.0 | NaN |
| 3 | biograd-A | /biograd-A/true_positives/blurred/dir_02/biogr... | /biograd-A/true_positives/blurred/dir_02/biogr... | num_delimiter | 10 | [237,236,54,65] | [155,236,54,65] | 1 | NaN | NaN | NaN | blur | 15.0 | NaN |
| 4 | biograd-A | /biograd-A/true_positives/noisy/dir_00/biograd... | /biograd-A/true_positives/noisy/dir_00/biograd... | num_delimiter | 10 | [237,236,54,65] | [155,236,54,65] | 1 | NaN | NaN | NaN | channel noise | 5.0 | NaN |


In [None]:
dataFromCSV.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 92390 entries, 0 to 92389
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   scenario         92390 non-null  object 
 1   stereo left      92390 non-null  object 
 2   stereo right     92390 non-null  object 
 3   label name       92390 non-null  object 
 4   label id         92390 non-null  int64  
 5   roi left         91790 non-null  object 
 6   roi right        91440 non-null  object 
 7   synthetic        92390 non-null  int64  
 8   iqa_mdm_entropy  9239 non-null   float64
 9   iqa_mdm_d        9239 non-null   float64
 10  iqa_mdm_dcomp    9239 non-null   float64
 11  distortion       83151 non-null  object 
 12  param 1          83151 non-null  object 
 13  param 2          18478 non-null  float64
dtypes: float64(4), int64(2), object(8)
memory usage: 10.6+ MB


For the scope of this project, where we are classifying gestures, we only need the label name of each image from the CSV.
<p>
Also worth noting: not all the images listed in the CSV are present in the dataset. Inside every scenario there are 2 folders, "true_positives" and "true_negatives", and each of them contains only one folder called "raw". This means that only the raw images are present, not the distorted ones.
<p>
Hence we'll be working with the raw set of images, around 18,400 in total, which is a decent amount of data.
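
One way to confirm which CSV entries actually exist on disk is sketched below. The helper `split_present_missing` is hypothetical, not part of the dataset tooling. It assumes the CSV paths are relative to the extracted release folder and start with a leading `/`, as in the preview above.

```python
import os

def split_present_missing(data_root, relative_paths):
    """Split CSV image paths into those found on disk and those that are not."""
    present, missing = [], []
    for rel in relative_paths:
        # CSV paths start with "/", which would make os.path.join discard data_root,
        # so the leading slash is stripped before joining.
        full = os.path.join(data_root, rel.lstrip("/"))
        (present if os.path.isfile(full) else missing).append(rel)
    return present, missing

# Usage with the notebook's DataFrame (root folder is an assumption):
# root = "./data/CADDY_gestures_complete_v2_release"
# present, missing = split_present_missing(root, dataFromCSV["stereo left"])
# print(len(present), "present,", len(missing), "missing")
```

If the observation about the folders holds, the missing list should consist of the distorted (`blurred`, `noisy`, and similar) paths.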

Let's identify how many different gestures are available in the dataset.

In [None]:
classes = dataFromCSV["label name"].unique()
classes

array(['num_delimiter', 'five', 'end_comm', 'start_comm', 'one', 'two',
       'three', 'four', 'up', 'down', 'backwards', 'mosaic', 'boat',
       'carry', 'here', 'photo'], dtype=object)

In [None]:
len(classes)

16

We have 16 unique classes, so the model has to categorize each image into one of 16 output classes.
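
A classifier usually expects integer targets in the range 0 to 15 for 16 classes. The sketch below builds such a mapping from the class names listed above. The alphabetical ordering and the names `class_to_idx` and `idx_to_class` are illustrative choices. The CSV also carries its own `label id` column, whose full range is not shown here, so this is one option rather than the dataset's official encoding.

```python
# Class names copied from the output of dataFromCSV["label name"].unique()
classes = ['num_delimiter', 'five', 'end_comm', 'start_comm', 'one', 'two',
           'three', 'four', 'up', 'down', 'backwards', 'mosaic', 'boat',
           'carry', 'here', 'photo']

# Sort so the mapping does not depend on the row order of the CSV.
class_to_idx = {name: idx for idx, name in enumerate(sorted(classes))}
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

print(len(class_to_idx))          # number of output classes
print(class_to_idx["backwards"])  # first class in alphabetical order
```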

### Data Cleaning 

As we saw earlier, the CSV contains a lot of missing and unusable data that we need to filter out. In other words, we need to extract the useful rows from the CSV and map them to the images.
<p>
So let's create another CSV that maps the raw image paths to their labels.
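
A minimal sketch of that mapping is shown below. It assumes that `synthetic == 0` marks the raw rows, as the preview of the CSV suggests. The function name `build_label_map`, the output column `image path`, and the output filename are hypothetical. The toy frame only mimics the relevant columns.

```python
import pandas as pd

def build_label_map(df):
    """Return one row per raw image: its relative path and its gesture label."""
    raw = df[df["synthetic"] == 0]  # assumption: 0 marks raw, 1 marks distorted
    # Stack the left and right stereo paths into a single column.
    long = raw.melt(
        id_vars=["label name"],
        value_vars=["stereo left", "stereo right"],
        value_name="image path",
    )
    return long[["image path", "label name"]].reset_index(drop=True)

# Toy rows with illustrative values only.
toy = pd.DataFrame({
    "stereo left": ["/s/raw/a_left.jpg", "/s/blurred/dir_00/a_left.jpg"],
    "stereo right": ["/s/raw/a_right.jpg", "/s/blurred/dir_00/a_right.jpg"],
    "label name": ["num_delimiter", "num_delimiter"],
    "synthetic": [0, 1],
})
label_map = build_label_map(toy)
print(label_map)

# To persist the mapping (hypothetical filename):
# build_label_map(dataFromCSV).to_csv("raw_image_labels.csv", index=False)
```

Treating the left and right images as separate samples doubles the number of rows relative to the raw stereo pairs.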


In [None]:
jovian.commit(project=project_name)

[jovian] Detected Colab notebook...
[jovian] Please enter your API key ( from https://jovian.ai/ ):
API KEY: ··········
[jovian] Uploading colab notebook to Jovian...
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ai/shubham912sahu/caddy-gestures-course-project


'https://jovian.ai/shubham912sahu/caddy-gestures-course-project'