<a href="https://colab.research.google.com/github/haniaraslan/ASL_Gesture_recognition/blob/main/ASL_Recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hand gesture recognition model customization guide

<table align="left" class="buttons">
  <td>
    <a href="https://colab.research.google.com/github/googlesamples/mediapipe/blob/main/examples/customization/gesture_recognizer.ipynb" target="_blank">
      <img src="https://developers.google.com/static/mediapipe/solutions/customization/colab-logo-32px_1920.png" alt="Colab logo"> Run in Colab
    </a>
  </td>

  <td>
    <a href="https://github.com/googlesamples/mediapipe/blob/main/examples/customization/gesture_recognizer.ipynb" target="_blank">
      <img src="https://developers.google.com/static/mediapipe/solutions/customization/github-logo-32px_1920.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
</table>

In [None]:
#@title License information
# Copyright 2023 The MediaPipe Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
#
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

The MediaPipe Model Maker package is a low-code solution for customizing on-device machine learning (ML) models.

This notebook shows the end-to-end process of customizing a gesture recognizer model to recognize American Sign Language (ASL) alphabet gestures, using the [ASL Alphabet](https://www.kaggle.com/datasets/grassknoted/asl-alphabet) dataset from Kaggle.

## Prerequisites

Install the MediaPipe Model Maker package.

In [None]:
!pip install --upgrade pip
!pip install mediapipe-model-maker

In [None]:
!pip install gTTS


Import the required libraries.

In [None]:
from google.colab import files
import os
import tensorflow as tf
assert tf.__version__.startswith('2')

from mediapipe_model_maker import gesture_recognizer

import matplotlib.pyplot as plt
from gtts import gTTS
%matplotlib inline

## Simple End-to-End Example

This end-to-end example uses Model Maker to customize a model for on-device gesture recognition.

### Get the dataset

Model Maker requires the gesture recognition dataset to follow this layout: `<dataset_path>/<label_name>/<img_name>.*`. In addition, one of the label names (`label_names`) must be `none`. The `none` label represents any gesture that isn't classified as one of the other gestures.
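
The layout rule above can be checked before loading the data. The following is a minimal sketch using only the standard library; `check_dataset_layout` is a hypothetical helper, not part of Model Maker, and the demo labels and placeholder files are made up.

```python
import os
import tempfile


def check_dataset_layout(dataset_path):
    """Return (labels, problems) for a <dataset_path>/<label_name>/<img_name>.* tree."""
    # Every visible subdirectory counts as one label; hidden folders such as
    # .ipynb_checkpoints are skipped.
    labels = sorted(
        name for name in os.listdir(dataset_path)
        if os.path.isdir(os.path.join(dataset_path, name)) and not name.startswith(".")
    )
    problems = []
    if "none" not in labels:
        problems.append("missing required 'none' label folder")
    for label in labels:
        if not os.listdir(os.path.join(dataset_path, label)):
            problems.append(f"label folder '{label}' is empty")
    return labels, problems


# Usage on a throwaway directory tree (made-up labels, empty placeholder files).
with tempfile.TemporaryDirectory() as root:
    for label in ("A", "B", "none"):
        os.makedirs(os.path.join(root, label))
        open(os.path.join(root, label, "img_0.jpg"), "w").close()
    labels, problems = check_dataset_layout(root)
    print(labels, problems)
```

An empty `problems` list only means the folder structure matches the expected pattern; it does not check that the files are readable images.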

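This example uses the ASL Alphabet dataset, which is downloaded from Kaggle with the Kaggle CLI. The commands below expect your Kaggle API token (`kaggle.json`) in the current working directory.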

In [None]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

!kaggle datasets download -d grassknoted/asl-alphabet

In [None]:
import zipfile
import os

with zipfile.ZipFile("/content/asl-alphabet.zip", 'r') as zip_ref:
    zip_ref.extractall("dataset-folder")


Verify the ASL alphabet dataset by printing the labels. There should be one label per gesture folder, and one of them must be `none`.

In [None]:
dataset_path="/content/dataset-folder/asl_alphabet_train/asl_alphabet_train"
print(dataset_path)
labels = []
for i in os.listdir(dataset_path):
  if os.path.isdir(os.path.join(dataset_path, i)) and i != ".ipynb_checkpoints":
    labels.append(i)
print(labels)

To better understand the dataset, plot a couple of example images for each gesture.

In [None]:
NUM_EXAMPLES = 5

for label in labels:
  label_dir = os.path.join(dataset_path, label)
  example_filenames = os.listdir(label_dir)[:NUM_EXAMPLES]
  fig, axs = plt.subplots(1, NUM_EXAMPLES, figsize=(10,2))
  for i in range(NUM_EXAMPLES):
    axs[i].imshow(plt.imread(os.path.join(label_dir, example_filenames[i])))
    axs[i].get_xaxis().set_visible(False)
    axs[i].get_yaxis().set_visible(False)
  fig.suptitle(f'Showing {NUM_EXAMPLES} examples for {label}')

plt.show()

### Run the example
The workflow consists of 4 steps which have been separated into their own code blocks.

**Load the dataset**

Load the dataset located at `dataset_path` by using the `Dataset.from_folder` method. When loading the dataset, run the pre-packaged hand detection model from MediaPipe Hands to detect the hand landmarks from the images. Any images without detected hands are omitted from the dataset. The resulting dataset will contain the extracted hand landmark positions from each image, rather than the images themselves.

The `HandDataPreprocessingParams` class contains two configurable options for the data loading process:
* `shuffle`: A boolean controlling whether to shuffle the dataset. Defaults to true.
* `min_detection_confidence`: A float between 0 and 1 controlling the confidence threshold for hand detection.
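
To make the effect of a confidence threshold concrete, here is a minimal sketch in plain Python. It is not MediaPipe's implementation: `keep_confident`, the file names, and the scores are hypothetical, and treating a score equal to the threshold as accepted is an assumption.

```python
def keep_confident(detections, min_detection_confidence):
    """Keep only (filename, score) pairs whose score reaches the threshold."""
    if not 0.0 <= min_detection_confidence <= 1.0:
        raise ValueError("min_detection_confidence must be between 0 and 1")
    return [
        (name, score)
        for name, score in detections
        if score >= min_detection_confidence  # assumption: ties are kept
    ]


# Made-up detection scores, for illustration only.
demo_detections = [("A1.jpg", 0.93), ("A2.jpg", 0.41), ("B1.jpg", 0.78)]
print(keep_confident(demo_detections, min_detection_confidence=0.5))
```

A higher threshold keeps fewer but cleaner samples; a lower one keeps more samples at the cost of admitting uncertain hand detections.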

Split the dataset: 80% for training, 10% for validation, and 10% for testing.

In [None]:
data = gesture_recognizer.Dataset.from_folder(
    dirname=dataset_path,
    hparams=gesture_recognizer.HandDataPreprocessingParams()
)
train_data, rest_data = data.split(0.8)
validation_data, test_data = rest_data.split(0.5)
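
The two-stage split above (0.8 first, then 0.5 of the remainder) yields the 80/10/10 proportions. The arithmetic can be sketched as follows; `split_sizes` is a hypothetical helper, and rounding down with `int()` is an assumption about how fractional sizes are handled, not a statement about Model Maker's internals.

```python
def split_sizes(total, first_fraction=0.8, second_fraction=0.5):
    """Return (train, validation, test) sizes for a two-stage split."""
    train = int(total * first_fraction)        # first split: train vs. rest
    rest = total - train
    validation = int(rest * second_fraction)   # second split: validation vs. test
    test = rest - validation
    return train, validation, test


print(split_sizes(1000))  # (800, 100, 100)
```

Because images without a detected hand are dropped while loading, `total` here is the number of samples that survived loading, not the number of image files on disk.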

**Train the model**

Train the custom gesture recognizer by calling the `create` method and passing in the training data, validation data, model options, and hyperparameters. For more information on model options and hyperparameters, see the [Hyperparameters](#hyperparameters) section below.

In [None]:
hparams = gesture_recognizer.HParams(export_dir="exported_model", epochs=10, batch_size=60)
options = gesture_recognizer.GestureRecognizerOptions(hparams=hparams)
model = gesture_recognizer.GestureRecognizer.create(
    train_data=train_data,
    validation_data=validation_data,
    options=options
)
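
To get a feel for what `epochs=10` and `batch_size=60` mean in terms of work done, the number of optimizer steps can be estimated with simple arithmetic. This is a rough sketch: `training_steps` is a hypothetical helper, the example count of 6000 is made up, and whether the last partial batch is dropped is an assumption exposed as a parameter.

```python
import math


def training_steps(num_examples, batch_size, epochs, drop_remainder=False):
    """Return (steps_per_epoch, total_steps) for a simple batched training loop."""
    if drop_remainder:
        steps_per_epoch = num_examples // batch_size
    else:
        steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Made-up training set size, with the batch size and epochs used above.
print(training_steps(num_examples=6000, batch_size=60, epochs=10))  # (100, 1000)
```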

**Evaluate the model performance**

After training the model, evaluate it on a test dataset and print the loss and accuracy metrics.

In [None]:
loss, acc = model.evaluate(test_data, batch_size=1)
print(f"Test loss:{loss}, Test accuracy:{acc}")

In [None]:
import math  # Used below to compute the image grid layout.

import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2

plt.rcParams.update({
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.spines.left': False,
    'axes.spines.bottom': False,
    'xtick.labelbottom': False,
    'xtick.bottom': False,
    'ytick.labelleft': False,
    'ytick.left': False,
    'xtick.labeltop': False,
    'xtick.top': False,
    'ytick.labelright': False,
    'ytick.right': False
})

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles


def display_one_image(image, title, subplot, titlesize=16):
    """Displays one image along with the predicted category name and score."""
    plt.subplot(*subplot)
    plt.imshow(image)
    if len(title) > 0:
        plt.title(title, fontsize=int(titlesize), color='black', fontdict={'verticalalignment':'center'}, pad=int(titlesize/1.5))
    return (subplot[0], subplot[1], subplot[2]+1)


def display_batch_of_images_with_gestures_and_hand_landmarks(images, results):
    """Displays a batch of images with the gesture category and its score along with the hand landmarks."""
    # Images and labels.
    images = [image.numpy_view() for image in images]
    gestures = [top_gesture for (top_gesture, _) in results]
    multi_hand_landmarks_list = [multi_hand_landmarks for (_, multi_hand_landmarks) in results]

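    # Nothing to draw if no gesture was recognized in any image.
    if not images:
        print("No images with recognized gestures to display.")
        return

    # Auto-squaring: this will drop data that does not fit into square or square-ish rectangle.
    rows = int(math.sqrt(len(images)))
    cols = len(images) // rows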

    # Size and spacing.
    FIGSIZE = 13.0
    SPACING = 0.1
    subplot=(rows,cols, 1)
    if rows < cols:
        plt.figure(figsize=(FIGSIZE,FIGSIZE/cols*rows))
    else:
        plt.figure(figsize=(FIGSIZE/rows*cols,FIGSIZE))

    # Display gestures and hand landmarks.
    for i, (image, gesture) in enumerate(zip(images[:rows*cols], gestures[:rows*cols])):
        title = f"{gesture.category_name} ({gesture.score:.2f})"
        dynamic_titlesize = FIGSIZE*SPACING/max(rows,cols) * 40 + 3
        annotated_image = image.copy()

        for hand_landmarks in multi_hand_landmarks_list[i]:
          hand_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
          hand_landmarks_proto.landmark.extend([
            landmark_pb2.NormalizedLandmark(x=landmark.x, y=landmark.y, z=landmark.z) for landmark in hand_landmarks
          ])

          mp_drawing.draw_landmarks(
            annotated_image,
            hand_landmarks_proto,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style())

        subplot = display_one_image(annotated_image, title, subplot, titlesize=dynamic_titlesize)

    # Layout.
    plt.tight_layout()
    plt.subplots_adjust(wspace=SPACING, hspace=SPACING)
    plt.show()

In [None]:
# Speak the detected letter or word using gTTS.
def speak_detection_gtts(letter):
  if letter:
    if isinstance(letter, str):  # Ensure it's a string before passing to gTTS
        filename = f"speech_{letter}.mp3"
        tts = gTTS(text=letter, lang='en')
        tts.save(filename)
        # Play the file that was just saved (quoted, since words may contain spaces).
        # Assumes the mpg321 player is installed.
        os.system(f'mpg321 "{filename}"')
    else:
        print(f"Invalid input to TTS: {letter}")
  else:
    print("No letter detected")

In [None]:
# STEP 1: Import the necessary modules.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import math
import matplotlib.pyplot as plt
%matplotlib inline


# STEP 2: Create a GestureRecognizer object.
# The .task file is written by model.export_model(), so run the export cell first.
base_options = python.BaseOptions(model_asset_path='exported_model/gesture_recognizer.task')
options = vision.GestureRecognizerOptions(base_options=base_options)

recognizer = vision.GestureRecognizer.create_from_options(options)

images = []
results = []
true_labels = []  # Ground truth labels
predicted_labels = []  # Model's predictions

for label in test_data.label_names:
  # Every label, including 'none', has its own folder under dataset_path.
  label_dir = os.path.join(dataset_path, label)
  example_filenames2 = os.listdir(label_dir)[:5]

  for image_file_name in example_filenames2:
    image_path = os.path.join(label_dir, image_file_name)
    print(image_path)
    # STEP 3: Load the input image.
    image = mp.Image.create_from_file(image_path)

    # STEP 4: Recognize gestures in the input image.
    recognition_result = recognizer.recognize(image)
    true_labels.append(label)

    if recognition_result.gestures:
      # STEP 5: Process the result. In this case, visualize it.
      images.append(image)
      top_gesture = recognition_result.gestures[0][0]
      predicted_labels.append(top_gesture.category_name)
      speak_detection_gtts(top_gesture.category_name)
      hand_landmarks = recognition_result.hand_landmarks
      results.append((top_gesture, hand_landmarks))
    else:
      predicted_labels.append('none')  # Assuming 'none' for no gesture detected


display_batch_of_images_with_gestures_and_hand_landmarks(images, results)
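
The loop above collects `true_labels` and `predicted_labels` in parallel, so they can be compared directly. The following is a minimal sketch of such a comparison in plain Python; `summarize_predictions` is a hypothetical helper, and the demo labels are made up rather than taken from a real run.

```python
from collections import Counter


def summarize_predictions(true_labels, predicted_labels):
    """Return (accuracy, confusions) for two parallel lists of label strings."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no predictions to summarize")
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(t == p for t, p in pairs)
    # Count each (true, predicted) pair that disagrees.
    confusions = Counter((t, p) for t, p in pairs if t != p)
    return correct / len(pairs), confusions


# Made-up labels, for illustration only.
true_demo = ["A", "A", "B", "B", "none"]
pred_demo = ["A", "B", "B", "B", "none"]
accuracy, confusions = summarize_predictions(true_demo, pred_demo)
print(f"accuracy={accuracy:.2f}")
for (true_label, predicted_label), count in confusions.most_common():
    print(f"{true_label} -> {predicted_label}: {count}")
```

With only five images per label, a number computed this way is a quick sanity check rather than a substitute for `model.evaluate` on the held-out test split.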

In [None]:
# Dataset test images spelling M-O-O-N, space, L-I-G-H-T.
hello = [
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/M_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/O_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/O_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/N_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/space_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/L_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/I_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/G_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/H_test.jpg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/T_test.jpg",
]

# Images from outside the dataset, uploaded to /content.
external_test = [
    "/content/S.jpg",
    "/content/A.jpg",
    "/content/L.jpg",
    "/content/E.jpeg",
    "/content/dataset-folder/asl_alphabet_test/asl_alphabet_test/space_test.jpg",
    "/content/N.jpg",
    "/content/E.jpeg",
    "/content/W.jpg",
]

# Uploaded images spelling a name.
name = [
    "/content/my name/H.jpg",
    "/content/my name/A.jpg",
    "/content/my name/N3.jpg",
    "/content/my name/I.jpg",
    "/content/my name/A.jpg",
]

In [None]:
images = []
results = []
predicted_labels = []  # Model's predictions
word = ""
for image_file_name in hello:
    # STEP 3: Load the input image.
    image = mp.Image.create_from_file(image_file_name)
    # STEP 4: Recognize gestures in the input image.
    recognition_result = recognizer.recognize(image)

    if recognition_result.gestures:
      # STEP 5: Process the result. Here, append the recognized letter to the word.
      images.append(image)
      top_gesture = recognition_result.gestures[0][0]
      predicted_labels.append(top_gesture.category_name)
      if top_gesture.category_name == "space":
        word += " "
      else:
        word += top_gesture.category_name
      hand_landmarks = recognition_result.hand_landmarks
      results.append((top_gesture, hand_landmarks))
    else:
      predicted_labels.append('none')  # Assuming 'none' for no gesture detected
print(word)
speak_detection_gtts(word.lower())
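
The word-building logic above only handles the `space` label. A slightly more general version can be sketched in plain Python as below. `labels_to_text` is a hypothetical helper, and the `del` and `nothing` label names are assumptions about how the dataset folders are named; adjust them to match the labels printed earlier.

```python
def labels_to_text(labels, space_label="space", delete_label="del",
                   skip_labels=("none", "nothing")):
    """Turn a sequence of predicted label names into a text string."""
    chars = []
    for label in labels:
        if label in skip_labels:
            continue                  # no gesture: contributes nothing
        if label == space_label:
            chars.append(" ")
        elif label == delete_label:
            if chars:
                chars.pop()           # remove the previous character, if any
        else:
            chars.append(label)
    return "".join(chars)


print(labels_to_text(["M", "O", "O", "N", "space", "L", "I", "G", "H", "T"]))  # MOON LIGHT
```

Passing `predicted_labels` from the cell above to this function would give the same text as `word` whenever only letters and `space` are predicted.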

**Export to TensorFlow Lite Model**

After creating the model, convert and export it to a TensorFlow Lite model format for later use in an on-device application. The export also includes model metadata, which includes the label file.

In [None]:
model.export_model()
!ls exported_model

In [None]:
files.download('exported_model/gesture_recognizer.task')

In [None]:
model.export_tflite("tflite_model")
!ls tflite_model
# Download from the directory that export_tflite just wrote to.
files.download('tflite_model/model.tflite')