<a href="https://colab.research.google.com/github/fernando-resende/libras-interpreter/blob/main/LibrasInterpreter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Libras Interpreter - Brazilian Sign Language

## About

A real-time interpreter for the Libras manual alphabet.

So far, only letters signed with static gestures are interpreted, so letters such as J, X, and Z are not recognized.

Reference for how to perform the gestures of the Libras manual alphabet: [http://www.spreadthesign.com/pt.br/alphabet/11/](http://www.spreadthesign.com/pt.br/alphabet/11/)

Illustration of the manual alphabet:

![Libras manual alphabet](https://s1.static.brasilescola.uol.com.br/img/2019/09/alfabeto.png)

**Source:** [https://brasilescola.uol.com.br/educacao/lingua-brasileira-sinais-libras.htm](https://brasilescola.uol.com.br/educacao/lingua-brasileira-sinais-libras.htm)

## Dependencies

### Installing libraries

The mediapipe and cvzone libraries must be installed.

In [3]:
!pip install cvzone
!pip install mediapipe

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting cvzone
  Downloading cvzone-1.5.6.tar.gz (12 kB)
Building wheels for collected packages: cvzone
  Building wheel for cvzone (setup.py) ... done
  Created wheel for cvzone: filename=cvzone-1.5.6-py3-none-any.whl size=18768 sha256=6ef9cc73232d95c520796138898279ba4f298e158a1c3c3e2b04c5b0feb926e8
  Stored in directory: /root/.cache/pip/wheels/c1/e8/e9/80f482161ba9f5dcf4832b76ac70540edd11a3136a58445c52
Successfully built cvzone
Installing collected packages: cvzone
Successfully installed cvzone-1.5.6
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting mediapipe
  Downloading mediapipe-0.8.11-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (31.5 MB)
     |████████████████████████████████| 31.5 MB 1.2 MB/s
Installing collected packages: mediapipe
Successfully installed mediapipe-0.8.11


### Importing libraries

Importing the dependencies needed to run the project.

In [4]:
from IPython.display import display, Javascript, Image
from google.colab.output import eval_js
from google.colab.patches import cv2_imshow
from base64 import b64decode, b64encode
import PIL
import io

import math
import shutil
import string
import time
import os
import cv2
import numpy as np
from cvzone.HandTrackingModule import HandDetector
from cvzone.ClassificationModule import Classifier
from enum import Enum

## Constants and variables

Defining the constants and variables used throughout the notebook.

This step also downloads and loads a previously trained model.

In [10]:
# Downloading model and labels
#!wget --no-check-certificate -r https://drive.google.com/file/d/1Gnqak-ibADDd5NcpmfqLSlEFo41hTCp0/view \
#    -O /tmp/keras_model.h5
#!wget --no-check-certificate -r https://drive.google.com/file/d/1KEPMVv7vE2zw4dIj7Ab4KW5zb1azNwj6/view \
#    -O /tmp/labels.txt

!gdown --id 1Gnqak-ibADDd5NcpmfqLSlEFo41hTCp0 #model
!gdown --id 1KEPMVv7vE2zw4dIj7Ab4KW5zb1azNwj6 #labels

Downloading...
From: https://drive.google.com/uc?id=1Gnqak-ibADDd5NcpmfqLSlEFo41hTCp0
To: /content/keras_model.h5
100% 2.46M/2.46M [00:00<00:00, 223MB/s]
Downloading...
From: https://drive.google.com/uc?id=1KEPMVv7vE2zw4dIj7Ab4KW5zb1azNwj6
To: /content/labels.txt
100% 105/105 [00:00<00:00, 274kB/s]


In [11]:
#Constants
OFFSET = 20
IMG_SIZE = 300
COLOR_MAIN = (0, 255, 0)
COLOR_CONTRAST = (255, 255, 255)

detector = HandDetector(maxHands=1, detectionCon=0.8, minTrackCon=0.8)

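# Local paths to the downloaded model and labels
model_link = '/content/keras_model.h5'
labels_link = '/content/labels.txt'

# Read the labels with a context manager so the file is closed afterwards
with open(labels_link, 'r') as labels_file:
    labels = labels_file.readlines()
classifier = Classifier(model_link, labels_link)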
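The prediction loop reads each label with `labels[index].split(" ")[1]` and shows it only when the confidence is at least 0.7. This implies that each line of `labels.txt` looks like `<index> <name>`, which is an assumption here because the file contents are not shown. A minimal sketch of that parsing and thresholding, using hypothetical helper names (`parse_labels`, `format_prediction`) and made-up sample lines:

```python
def parse_labels(lines):
    """Map class index -> label name from lines shaped like '0 A'."""
    names = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        index, name = line.split(" ", 1)
        names[int(index)] = name
    return names


def format_prediction(name, confidence, threshold=0.7):
    """Return 'A | 75.0%' style text when confident enough, otherwise '?'."""
    if confidence >= threshold:
        return f"{name} | {round(confidence * 100, 2)}%"
    return "?"


# Hypothetical file contents; the real labels.txt is assumed to look similar
sample_lines = ["0 A\n", "1 B\n", "2 C\n"]
names = parse_labels(sample_lines)
print(format_prediction(names[1], 0.75))
print(format_prediction(names[2], 0.41))
```

Keeping the threshold as a parameter makes it easy to experiment with stricter or looser cut-offs than the 0.7 used in the notebook.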



## Functions

Defining helper functions for the project.

In [12]:
def getTimeInMilli(period):
    return int(round(period * 1000))

def cv2PutTextWithShadow(img, text, org = (5,15), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.5, color=COLOR_MAIN, thickness=1):
    #cv2.rectangle(img, (0,0), (300,50), color=(250,250,250), thickness=cv2.FILLED) #Background test
    cv2.putText(img, text, org, fontFace=fontFace, fontScale=fontScale, color=(100,100,100), thickness=4) #Shadow
    cv2.putText(img, text, org, fontFace=fontFace, fontScale=fontScale, color=color, thickness=thickness) #Text

The functions in the following code block were taken from this reference: https://colab.research.google.com/drive/1cadc7M1_qmZD5ok5G2V-_mMIMmiBhINB#scrollTo=A578sDm_-BJ9

In [13]:
# Helpers
# Reference: https://colab.research.google.com/drive/1cadc7M1_qmZD5ok5G2V-_mMIMmiBhINB#scrollTo=A578sDm_-BJ9
# function to convert the JavaScript object into an OpenCV image
def js_to_image(js_reply):
  """
  Params:
          js_reply: JavaScript object containing image from webcam
  Returns:
          img: OpenCV BGR image
  """
  # decode base64 image
  image_bytes = b64decode(js_reply.split(',')[1])
  # convert bytes to numpy array
  jpg_as_np = np.frombuffer(image_bytes, dtype=np.uint8)
  # decode numpy array into OpenCV BGR image
  img = cv2.imdecode(jpg_as_np, flags=1)

  return img

# function to convert OpenCV Rectangle bounding box image into base64 byte string to be overlaid on video stream
def bbox_to_bytes(bbox_array):
  """
  Params:
          bbox_array: Numpy array (pixels) containing rectangle to overlay on video stream.
  Returns:
        bytes: Base64 image byte string
  """
  # convert array into PIL image
  bbox_PIL = PIL.Image.fromarray(bbox_array, 'RGBA')
  iobuf = io.BytesIO()
  # format bbox into png for return
  bbox_PIL.save(iobuf, format='png')
  # format return string
  bbox_bytes = 'data:image/png;base64,{}'.format((str(b64encode(iobuf.getvalue()), 'utf-8')))

  return bbox_bytes

# JavaScript to properly create our live video stream using our webcam as input
def video_stream():
  js = Javascript('''
    var video;
    var div = null;
    var stream;
    var captureCanvas;
    var imgElement;
    var labelElement;
    
    var pendingResolve = null;
    var shutdown = false;
    
    function removeDom() {
       stream.getVideoTracks()[0].stop();
       video.remove();
       div.remove();
       video = null;
       div = null;
       stream = null;
       imgElement = null;
       captureCanvas = null;
       labelElement = null;
    }
    
    function onAnimationFrame() {
      if (!shutdown) {
        window.requestAnimationFrame(onAnimationFrame);
      }
      if (pendingResolve) {
        var result = "";
        if (!shutdown) {
          captureCanvas.getContext('2d').drawImage(video, 0, 0, 640, 480);
          result = captureCanvas.toDataURL('image/jpeg', 0.8)
        }
        var lp = pendingResolve;
        pendingResolve = null;
        lp(result);
      }
    }
    
    async function createDom() {
      if (div !== null) {
        return stream;
      }

      div = document.createElement('div');
      div.style.border = '2px solid black';
      div.style.padding = '3px';
      div.style.width = '100%';
      div.style.maxWidth = '600px';
      document.body.appendChild(div);
      
      const modelOut = document.createElement('div');
      modelOut.innerHTML = "<span>Status:</span>";
      labelElement = document.createElement('span');
      labelElement.innerText = 'No data';
      labelElement.style.fontWeight = 'bold';
      modelOut.appendChild(labelElement);
      div.appendChild(modelOut);
           
      video = document.createElement('video');
      video.style.display = 'block';
      video.width = div.clientWidth - 6;
      video.setAttribute('playsinline', '');
      video.onclick = () => { shutdown = true; };
      stream = await navigator.mediaDevices.getUserMedia(
          {video: { facingMode: "environment"}});
      div.appendChild(video);

      imgElement = document.createElement('img');
      imgElement.style.position = 'absolute';
      imgElement.style.zIndex = 1;
      imgElement.onclick = () => { shutdown = true; };
      div.appendChild(imgElement);
      
      const instruction = document.createElement('div');
      instruction.innerHTML = 
          '<span style="color: red; font-weight: bold;">' +
          'When finished, click here or on the video to stop this demo</span>';
      div.appendChild(instruction);
      instruction.onclick = () => { shutdown = true; };
      
      video.srcObject = stream;
      await video.play();

      captureCanvas = document.createElement('canvas');
      captureCanvas.width = 640; //video.videoWidth;
      captureCanvas.height = 480; //video.videoHeight;
      window.requestAnimationFrame(onAnimationFrame);
      
      return stream;
    }
    async function stream_frame(label, imgData) {
      if (shutdown) {
        removeDom();
        shutdown = false;
        return '';
      }

      var preCreate = Date.now();
      stream = await createDom();
      
      var preShow = Date.now();
      if (label != "") {
        labelElement.innerHTML = label;
      }
            
      if (imgData != "") {
        var videoRect = video.getClientRects()[0];
        imgElement.style.top = videoRect.top + "px";
        imgElement.style.left = videoRect.left + "px";
        imgElement.style.width = videoRect.width + "px";
        imgElement.style.height = videoRect.height + "px";
        imgElement.src = imgData;
      }
      
      var preCapture = Date.now();
      var result = await new Promise(function(resolve, reject) {
        pendingResolve = resolve;
      });
      shutdown = false;
      
      return {'create': preShow - preCreate, 
              'show': preCapture - preShow, 
              'capture': Date.now() - preCapture,
              'img': result};
    }
    ''')

  display(js)
  
def video_frame(label, bbox):
  data = eval_js('stream_frame("{}", "{}")'.format(label, bbox))
  return data
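
`js_to_image` and `bbox_to_bytes` exchange frames with the browser as base64 data URLs: everything after the first comma is the encoded image. The stdlib-only sketch below illustrates that round trip. The helper names `to_data_url` and `from_data_url` and the sample bytes are hypothetical and are not part of the notebook.

```python
from base64 import b64decode, b64encode


def to_data_url(raw_bytes, mime="image/png"):
    """Wrap raw bytes in a 'data:<mime>;base64,<payload>' string."""
    payload = b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime};base64,{payload}"


def from_data_url(data_url):
    """Return the raw bytes stored after the first comma of a data URL."""
    _header, payload = data_url.split(",", 1)
    return b64decode(payload)


fake_image = b"\x89PNG placeholder bytes"  # stand-in for real image data
url = to_data_url(fake_image)
print(url[:22])                          # the data URL header
print(from_data_url(url) == fake_image)  # the round trip preserves the bytes
```

In the notebook, the decoded bytes are then passed to `cv2.imdecode`, and the encoded overlay is assigned to the `src` of the overlay image element.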

## Video detection

Capturing video and running real-time predictions based on the gestures performed.

Important: because images are captured with JavaScript and then converted for use in Python, some delay is expected. The video may appear to stutter slightly or to play at a low frame rate (fps).
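
One way to quantify that delay is to time each round trip on the Python side and convert the average into frames per second. The sketch below is only an illustration: `estimate_fps` and the sample durations are hypothetical, and in the notebook the durations would have to be collected by timing each iteration of the capture loop (for example with `time.perf_counter()`).

```python
def estimate_fps(frame_durations_s):
    """Average frames per second from per-frame durations given in seconds."""
    if not frame_durations_s:
        return 0.0
    mean_duration = sum(frame_durations_s) / len(frame_durations_s)
    return 1.0 / mean_duration if mean_duration > 0 else 0.0


# Hypothetical measurements: each loop iteration took a quarter of a second
durations = [0.25, 0.25, 0.25, 0.25]
print(f"{estimate_fps(durations):.1f} fps")
```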

Reference used as the basis for hand and sign detection: [https://www.youtube.com/watch?v=wa2ARoUUdU8](https://www.youtube.com/watch?v=wa2ARoUUdU8)

In [14]:
# start streaming video from webcam
video_stream()
# label for video
label_html = 'Capturing...'
# initialize bounding box to empty
bbox = ''

while True:
    js_reply = video_frame(label_html, bbox)
    if not js_reply:
        break

    # convert JS response to OpenCV Image
    img = js_to_image(js_reply["img"])

    # create transparent overlay for bounding box
    bbox_array = np.zeros([480,640,4], dtype=np.uint8)

    hands, img = detector.findHands(img)

    if hands:
        hand = hands[0]
        x, y, width, height = hand['bbox']
        imgBgBlack = np.zeros((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)
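        # Clamp the crop origin at 0 so a hand near the frame edge does not produce negative indices
        imgCropped = img[max(0, y - OFFSET):y + height, max(0, x - OFFSET):x + width + OFFSET]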
        aspectRatio = height/width

        # Calculations needed to center the hand image on a fixed-size square background
        baseCalc = (IMG_SIZE / height) if aspectRatio > 1 else (IMG_SIZE / width)
        aspectCalc = math.ceil((baseCalc * width) if aspectRatio > 1 else (baseCalc * height))
        gapStart = math.ceil((IMG_SIZE - aspectCalc) / 2)
        gapEnd = gapStart + aspectCalc

        try:
            if aspectRatio > 1:
                imgResized = cv2.resize(imgCropped, (aspectCalc, IMG_SIZE))
                imgBgBlack[:, gapStart:gapEnd] = imgResized

            else:
                imgResized = cv2.resize(imgCropped, (IMG_SIZE, aspectCalc))
                imgBgBlack[gapStart:gapEnd, :] = imgResized

            imgResizedShape = imgResized.shape

            #Predict libras char
            predictions, index = classifier.getPrediction(imgBgBlack, draw=False)
            confidence = predictions[index]
            prediction = labels[index].split(" ")[1].strip()
            prediction = f'{prediction} | {round(confidence * 100, 2)}%' if confidence >= 0.7 else '?'

            bbox_array = cv2.rectangle(bbox_array, (x, y - OFFSET - 30), (x + width + OFFSET + 2, y - OFFSET), color=COLOR_MAIN, thickness=cv2.FILLED)
            bbox_array = cv2.putText(bbox_array, prediction, (x, y - 25), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.7, color=COLOR_CONTRAST, thickness=2)
            bbox_array = cv2.rectangle(bbox_array, (x - OFFSET, y - OFFSET), (x + width + OFFSET, y + height + OFFSET), color=COLOR_MAIN, thickness=3)
                
            
        except Exception as e:
            print('### ERROR FOUND ###')
            print(f'Aspect calculated: {aspectCalc}')
            print(f'Error: {e}')
    
    #cv2_imshow(img)
    
    bbox_array[:,:,3] = (bbox_array.max(axis = 2) > 0 ).astype(int) * 255
    # convert overlay of bbox into bytes
    bbox_bytes = bbox_to_bytes(bbox_array)
    # update bbox so next frame gets new overlay
    bbox = bbox_bytes

cv2.destroyAllWindows()
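
The centering arithmetic in the loop above (`baseCalc`, `aspectCalc`, `gapStart`, `gapEnd`) can be checked in isolation, without a camera or OpenCV. The sketch below mirrors that arithmetic in a hypothetical helper, `letterbox_geometry`, which is not part of the notebook. It scales the longer side of the hand bounding box to the square size and centers the shorter side.

```python
import math


def letterbox_geometry(width, height, img_size=300):
    """Return (scaled_w, scaled_h, gap_start, gap_end) for centering a
    width x height crop on an img_size x img_size square."""
    if height > width:
        # Tall crop: fill the full height and scale the width proportionally
        scaled_h = img_size
        scaled_w = math.ceil(img_size / height * width)
        short_side = scaled_w
    else:
        # Wide or square crop: fill the full width and scale the height
        scaled_w = img_size
        scaled_h = math.ceil(img_size / width * height)
        short_side = scaled_h
    gap_start = math.ceil((img_size - short_side) / 2)
    gap_end = gap_start + short_side
    return scaled_w, scaled_h, gap_start, gap_end


print(letterbox_geometry(100, 200))  # tall hand bounding box
print(letterbox_geometry(200, 100))  # wide hand bounding box
```

The slice `gap_start:gap_end` always has exactly `short_side` elements, which is why the resized crop can be assigned into the black background without a shape mismatch.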


