<a href="https://colab.research.google.com/github/kjxlstad/gestures/blob/main/gestures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 1. Connecting the webcam to Google Colab using JavaScript (coffee)

In [29]:
#@title imports
import base64
import html
import io
import time

from IPython.display import display, Javascript
from google.colab.output import eval_js
import numpy as np
from PIL import Image
import cv2

def start_input():
  js = Javascript('''
    var video;
    var div = null;
    var stream;
    var captureCanvas;
    var imgElement;
    var labelElement;
    
    var pendingResolve = null;
    var shutdown = false;
    
    function removeDom() {
       stream.getVideoTracks()[0].stop();
       video.remove();
       div.remove();
       video = null;
       div = null;
       stream = null;
       imgElement = null;
       captureCanvas = null;
       labelElement = null;
    }
    
    function onAnimationFrame() {
      if (!shutdown) {
        window.requestAnimationFrame(onAnimationFrame);
      }
      if (pendingResolve) {
        var result = "";
        if (!shutdown) {
          captureCanvas.getContext('2d').drawImage(video, 0, 0, 512, 512);
          result = captureCanvas.toDataURL('image/jpeg', 0.8)
        }
        var lp = pendingResolve;
        pendingResolve = null;
        lp(result);
      }
    }
    
    async function createDom() {
      if (div !== null) {
        return stream;
      }
      div = document.createElement('div');
      div.style.border = '2px solid rgb(8, 76, 97)';
      div.style.padding = '3px';
      div.style.width = '100%';
      div.style.maxWidth = '600px';
      document.body.appendChild(div);
      
      const modelOut = document.createElement('div');
      modelOut.innerHTML = "<span>Status:</span>";
      labelElement = document.createElement('span');
      labelElement.innerText = 'No data';
      labelElement.style.fontWeight = 'bold';
      modelOut.appendChild(labelElement);
      div.appendChild(modelOut);
           
      video = document.createElement('video');
      video.style.display = 'block';
      video.width = div.clientWidth - 6;
      video.setAttribute('playsinline', '');
      video.onclick = () => { shutdown = true; };
      stream = await navigator.mediaDevices.getUserMedia(
          {video: { facingMode: "environment"}});
      div.appendChild(video);
      imgElement = document.createElement('img');
      imgElement.style.position = 'absolute';
      imgElement.style.zIndex = 1;
      imgElement.onclick = () => { shutdown = true; };
      div.appendChild(imgElement);
      
      const instruction = document.createElement('div');
      instruction.innerHTML = 
          '<span style="color: red; font-weight: bold;">' +
          'When finished, click here or on the video to stop this demo</span>';
      div.appendChild(instruction);
      instruction.onclick = () => { shutdown = true; };
      
      video.srcObject = stream;
      await video.play();
      captureCanvas = document.createElement('canvas');
      captureCanvas.width = 512; //video.videoWidth;
      captureCanvas.height = 512; //video.videoHeight;
      window.requestAnimationFrame(onAnimationFrame);
      
      return stream;
    }
    async function takePhoto(label, imgData) {
      if (shutdown) {
        removeDom();
        shutdown = false;
        return '';
      }
      var preCreate = Date.now();
      stream = await createDom();
      
      var preShow = Date.now();
      if (label != "") {
        labelElement.innerHTML = label;
      }
            
      if (imgData != "") {
        var videoRect = video.getClientRects()[0];
        imgElement.style.top = videoRect.top + "px";
        imgElement.style.left = videoRect.left + "px";
        imgElement.style.width = videoRect.width + "px";
        imgElement.style.height = videoRect.height + "px";
        imgElement.src = imgData;
      }
      
      var preCapture = Date.now();
      var result = await new Promise(function(resolve, reject) {
        pendingResolve = resolve;
      });
      shutdown = false;
      
      return {'create': preShow - preCreate, 
              'show': preCapture - preShow, 
              'capture': Date.now() - preCapture,
              'img': result};
    }
    ''')

  display(js)
  
def take_photo(label, img_data):
  data = eval_js('takePhoto("{}", "{}")'.format(label, img_data))
  return data

There are two functions we need to use: start_input and take_photo.
Running start_input loads the JavaScript into the cell output. The webcam itself opens on the first call to take_photo (make sure you have allowed your browser / Google Colab to access your camera); the script then shows the live video in the Colab output and copies each frame onto a canvas for capture.
We can adjust the size of the canvas as desired by changing captureCanvas.width and captureCanvas.height, together with the matching 512, 512 arguments of drawImage.
take_photo returns the JavaScript object, whose img entry holds the captured frame as a base64-encoded JPEG data URL to be processed in YOLO, as we will see in the next section.
The most important input to take_photo is img_data, the image we want to overlay on the webcam video. In our case, it is an image of the bounding boxes. It will be discussed in the last section.
This is how we open the webcam in Google Colab. To stop capturing, just click the red text at the bottom of the output.
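
One detail worth noting: take_photo builds the JavaScript call with str.format, so a label containing a double quote would end the string literal early. A minimal sketch of a safer variant follows, assuming a hypothetical helper named build_take_photo_call (not part of the notebook) whose result would be passed to eval_js in place of the formatted string.

```python
import json


def build_take_photo_call(label, img_data):
    """Build the JavaScript expression for eval_js (hypothetical helper)."""
    # json.dumps emits a double-quoted, fully escaped string literal that is
    # also a valid JavaScript string, so quotes or backslashes in the label
    # cannot terminate the literal early.
    return 'takePhoto({}, {})'.format(json.dumps(label), json.dumps(img_data))


call = build_take_photo_call('He said "hi"', '')
print(call)
```

The label is still assigned to innerHTML on the JavaScript side, so any HTML markup in it is interpreted as before; only the quoting of the call changes.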

### 2. Get object detection bounding boxes from webcam images using YOLO
First, we have to process the JavaScript output of the take_photo function, which we call js_reply, to get the image in an array format.
To do this, we decode the base64 payload into bytes and convert the resulting image into an array.

Code for a decoder, which turns the JavaScript webcam hook's response object into a 512 x 512 x RGB array, and an encoder, which turns a 512 x 512 x RGBA array into a JavaScript-compatible base64 string

In [2]:
def decode(js_reply):
    """
    input: 
          js_reply: JavaScript object, contain image from webcam
    output: 
          image_array: image array RGB size 512 x 512 from webcam
    """
    jpeg_bytes = base64.b64decode(js_reply['img'].split(',')[1])
    image_PIL = Image.open(io.BytesIO(jpeg_bytes))
    image_array = np.array(image_PIL)

    return image_array

In [3]:
def encode(overlay):
    """
    input: 
          overlay: image RGBA size 512 x 512 
                              containing bounding box and text from yolo prediction, 
                              channel A value = 255 if the pixel contains drawing properties (lines, text) 
                              else channel A value = 0
    output: 
          drawing_b64: string, encoded from overlay
    """

    drawing_PIL = Image.fromarray((overlay), 'RGBA')
    iobuf = io.BytesIO()
    drawing_PIL.save(iobuf, format='png')
    drawing_bytes = 'data:image/png;base64,{}'.format((str(base64.b64encode(iobuf.getvalue()), 'utf-8')))
    return drawing_bytes
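
The two directions can be checked without a webcam. The sketch below is only an illustration under stated assumptions: to_data_url and from_data_url are hypothetical stand-ins that mirror what encode and decode do, and the grey frame is synthetic test data rather than a real capture. It shows that the JPEG path yields a 512 x 512 x 3 array, while the PNG path used for the overlay is lossless and keeps the alpha channel intact.

```python
import base64
import io

import numpy as np
from PIL import Image


def to_data_url(array, fmt, mime):
    """Serialise an image array to a base64 data URL (hypothetical helper)."""
    buf = io.BytesIO()
    Image.fromarray(array).save(buf, format=fmt)
    payload = base64.b64encode(buf.getvalue()).decode('utf-8')
    return 'data:{};base64,{}'.format(mime, payload)


def from_data_url(data_url):
    """Inverse direction: strip the header, decode base64, load as array."""
    payload = data_url.split(',')[1]
    return np.array(Image.open(io.BytesIO(base64.b64decode(payload))))


# Synthetic stand-in for the JavaScript reply: a uniform grey JPEG frame.
frame = np.full((512, 512, 3), 128, dtype=np.uint8)
js_reply = {'img': to_data_url(frame, 'JPEG', 'image/jpeg')}
decoded = from_data_url(js_reply['img'])
print(decoded.shape, decoded.dtype)

# Overlay with one opaque line; everything else stays fully transparent.
overlay = np.zeros((512, 512, 4), dtype=np.uint8)
overlay[128, 128:385] = (141, 142, 142, 255)
restored = from_data_url(to_data_url(overlay, 'PNG', 'image/png'))
print(np.array_equal(restored, overlay))
```

JPEG is lossy, so only the shape and dtype of the decoded frame are guaranteed to match; PNG is used for the overlay because it preserves the exact alpha values.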

Here we compute the overlay

In [4]:
def get_overlay(frame): 
    """
    input: 
          frame: image array RGB size 512 x 512 from webcam
    output: 
          overlay: image RGBA size 512 x 512 only contain bounding box and text, 
                              channel A value = 255 if the pixel contains drawing properties (lines, text) 
                              else channel A value = 0
    """
    overlay = np.zeros([512,512,4], dtype=np.uint8)
    
    # define region of interest
    roi = frame[128:384, 128:384]
    cv2.rectangle(overlay, (128, 128), (384, 384), (141, 142, 142, 255), 0)
    

    return overlay
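
To preview what the browser shows when it stacks the overlay on top of the video, the same effect can be approximated in NumPy. This is a sketch under assumptions: composite is a hypothetical helper, the frame is synthetic, and the rectangle outline is drawn with slicing rather than cv2.rectangle so the example needs only NumPy. Because the alpha channel is either 0 or 255, compositing reduces to copying the opaque pixels.

```python
import numpy as np


def composite(frame, overlay):
    """Copy overlay pixels whose alpha is 255 onto an RGB frame."""
    opaque = overlay[..., 3] == 255
    out = frame.copy()
    out[opaque] = overlay[opaque, :3]
    return out


# Synthetic dark frame standing in for a webcam image.
frame = np.full((512, 512, 3), 50, dtype=np.uint8)

# One-pixel outline of the region of interest, corners at (128, 128)
# and (384, 384), both inclusive.
overlay = np.zeros((512, 512, 4), dtype=np.uint8)
color = (141, 142, 142, 255)
overlay[128, 128:385] = color
overlay[384, 128:385] = color
overlay[128:385, 128] = color
overlay[128:385, 384] = color

preview = composite(frame, overlay)
print(preview[128, 128], preview[256, 256])
```

Pixels inside and outside the outline keep the frame's values, which is why the overlay can be redrawn every iteration without hiding the video.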

In [28]:
import cv2
start_input()
label_html = 'Capturing...'
img_data = ''

while True:
  capture_start = time.time()
  js_reply = take_photo(label_html, img_data)
  capture_end = time.time()
  if not js_reply:
    break

  webcam_frame = decode(js_reply)
  overlay = get_overlay(webcam_frame)
  drawing_bytes = encode(overlay)
  img_data = drawing_bytes
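
The loop records capture_start and capture_end but does not use them yet. One possible use, sketched here with a hypothetical helper fps_label that is not part of the notebook, is to turn the measured round trip into the status text passed as label_html on the next iteration.

```python
def fps_label(capture_start, capture_end):
    """Format the capture round trip as a status string (hypothetical helper)."""
    elapsed = capture_end - capture_start
    if elapsed <= 0:
        # Guard against a zero or negative interval from the clock.
        return 'Capturing...'
    return 'Capturing... {:.1f} frames/s'.format(1.0 / elapsed)


print(fps_label(10.0, 10.25))
```

Inside the loop this would be assigned as label_html = fps_label(capture_start, capture_end) after each call to take_photo.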

