# Getting Images

Computers can be made to gain a high-level understanding of digital images and videos, and this is where Computer Vision comes into play.

Computer Vision, often abbreviated as CV, is defined as a field of study that seeks to develop techniques to help computers “see” and understand the content of digital images such as photographs and videos.

The problem of computer vision appears simple because it is trivially solved by people, even very young children. Nevertheless, it largely remains an unsolved problem, both because of our limited understanding of biological vision and because of the complexity of visual perception in a dynamic and nearly infinitely varying physical world.


### Computer Vision

We are awash in images.

Smartphones have cameras, and taking a photo or video and sharing it has never been easier, resulting in the incredible growth of modern social networks like Instagram.

YouTube might be the second largest search engine: hundreds of hours of video are uploaded every minute, and billions of videos are watched every day.

The internet is composed of text and images. It is relatively straightforward to index and search text, but in order to index and search images, algorithms need to know what the images contain. For the longest time, the content of images and video has remained opaque, best described using the meta descriptions provided by the person who uploaded them.

To get the most out of image data, we need computers to “see” an image and understand the content.

This is a trivial problem for any human being.

* A person can describe the content of a photograph they have seen once.
* A person can summarize a video that they have only seen once.
* A person can recognize a face that they have only seen once before.


Computer vision is a field of study focused on the problem of helping computers to see. At an abstract level, the goal of computer vision problems is to use the observed image data to infer something about the world.

<img src="./images/Computer-Vision.png">

### OpenCV

`OpenCV` is a library of programming functions mainly aimed at real-time computer vision. In addition to OpenCV, we will use the following libraries extensively. `Matplotlib` is an optional choice for displaying frames from video or images. We will show a couple of examples using it here. `NumPy` is used for all things "numbers and Python."


Let's go through a few basic CV concepts.

NOTE: `cv2.waitKey(0)` is called so that every time an image window pops up, the program waits until you press a key and then closes the window and ends.

So every time a window pops up, press any key, for example `ESC`. Do not close the window manually; if you do, you will have to stop and restart the kernel. (The two buttons are right next to Run.)

#### RGB Color 

In the RGB color model, red, green, and blue light are added together in various ways to reproduce a broad array of colors. The model is widely used for sensing, representation, and display of images in electronic systems, such as televisions and computers.

So a single RGB image pixel has three values, one each for R, G, and B, each ranging from 0 to 255 and giving the intensity of that color. If the image is converted to grayscale, each pixel has only one value, a brightness ranging from 0 (black) to 255 (white).


<img src="./images/rgb.png">
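
As a minimal sketch of additive mixing, the snippet below builds a tiny image in NumPy and adds a red and a green pixel together. The array and variable names are made up for illustration, and the channels are written in R, G, B order here (OpenCV itself uses B, G, R).

```python
import numpy as np

# A 1 x 3 image (height x width) with one pure red, one pure green and one pure blue pixel.
# dtype uint8 keeps every channel value in the 0-255 range.
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

print(rgb.shape)   # (1, 3, 3): height, width, channels
print(rgb[0, 0])   # the three values of the first (red) pixel

# Red light plus green light at full intensity gives yellow.
yellow = rgb[0, 0] + rgb[0, 1]
print(yellow)      # [255 255   0]
```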

#### Reading an image

To read the original image, simply call the `imread` function of the cv2 module, passing as input the path to the image, as a string.

Run the following code which will read the image and open it in OpenCV as a popup window.

In [None]:
import cv2

img = cv2.imread('./images/lenna.png')
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()

#### Converting an image to GrayScale

Converting an image to grayscale replaces the three color values of each pixel (Blue, Green, and Red, in the order OpenCV stores them) with a single brightness value between black and white, which makes processing tasks like finding edges, contours, etc. much easier.

For this, we need to call the `cvtColor` function, which converts an image from one color space to another.

As first input, this function receives the original image. As second input, it receives the color space conversion code. Since we want to convert our original image from the BGR color space to gray, we use the code `COLOR_BGR2GRAY`.

In [2]:
import cv2
  
img = cv2.imread('./images/lenna.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  
cv2.imshow('Gray image', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
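
To see what a color-to-gray conversion does to the numbers, here is a rough NumPy sketch that applies the weighted sum documented for OpenCV's RGB-to-gray conversion (0.299 R + 0.587 G + 0.114 B) to a synthetic image. It is an approximation for illustration, not OpenCV's own implementation, and the pixel values are made up.

```python
import numpy as np

# Synthetic 1 x 4 image in B, G, R channel order: blue, green, red and white pixels.
bgr = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]],
               dtype=np.uint8)

# Weights for the B, G and R channels, in that order.
weights = np.array([0.114, 0.587, 0.299])

# Weighted sum over the channel axis, rounded back to 8-bit intensities.
gray = np.rint(bgr.astype(np.float64).dot(weights)).astype(np.uint8)

print(gray.shape)  # (1, 4): a single value per pixel
print(gray)        # [[ 29 150  76 255]]
```

Green contributes most to the perceived brightness, which is why the pure green pixel ends up brighter than the pure red or blue one.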

#### Edge Detection

Edge detection is an image processing technique for finding the boundaries of objects within images. It works by detecting discontinuities in brightness. Edge detection is used for image segmentation and data extraction in computer vision.

Canny Edge Detection is one such method used to detect the edges in an image. It accepts a gray scale image as input and it uses a multistage algorithm.

You can perform this operation on an image using the `cv2.Canny()` function. In Python, its basic form is:

`edges = cv2.Canny(image, threshold1, threshold2)`

Parameters and return value:

* `image`: a NumPy array holding the 8-bit source (input) image.
* `threshold1`: a number giving the first threshold for the hysteresis procedure.
* `threshold2`: a number giving the second threshold for the hysteresis procedure.
* `edges`: the returned NumPy array holding the edge map, the same size as the input.


In [3]:
import cv2

img = cv2.imread('./images/lenna.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

edged = cv2.Canny(gray,30,200)
cv2.imshow('Canny Edges',edged)
cv2.waitKey(0)
cv2.destroyAllWindows()
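
The idea of "detecting discontinuities in brightness" can be tried without OpenCV. The sketch below is a much simpler technique than Canny (a plain neighbour difference with a fixed threshold), applied to a synthetic image; the image and the threshold of 100 are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic 5 x 8 grayscale image: dark left half (0), bright right half (200).
image = np.zeros((5, 8), dtype=np.uint8)
image[:, 4:] = 200

# Brightness change between horizontally neighbouring pixels.
# Cast to a signed type first so the subtraction cannot wrap around.
diff = np.abs(np.diff(image.astype(np.int16), axis=1))

# Mark a position as an edge when the jump exceeds the threshold.
edges = diff > 100

print(edges.astype(np.uint8))  # a single column of ones where dark meets bright
```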

#### Contour Detection

Contours can be explained simply as curves joining all the continuous points along a boundary that have the same color or intensity. Contours are a useful tool for shape analysis and for object detection and recognition.

In [4]:
import cv2

img = cv2.imread('./images/lenna.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

edged = cv2.Canny(gray,30,200)

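# findContours returns (image, contours, hierarchy) in OpenCV 3.x but
# (contours, hierarchy) in OpenCV 2.x and 4.x, so keep only the last two values.
# Older versions also modify the edged image in place.
contours, hierarchy = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]
cv2.imshow('Canny Edges after Contouring', edged)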

# drawing all found contours
cv2.drawContours(img, contours, -1, (0,255,0), 3)
cv2.imshow('Contours', img)

cv2.waitKey(0)
cv2.destroyAllWindows()
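
To build some intuition for "points along the boundary", the following NumPy sketch marks the boundary pixels of a filled square. It is not the algorithm `cv2.findContours` uses; it only illustrates which pixels lie on a shape's outline, and the shape itself is made up.

```python
import numpy as np

# Binary image: a filled 4 x 4 square on an 8 x 8 background.
shape_mask = np.zeros((8, 8), dtype=bool)
shape_mask[2:6, 2:6] = True

# Pad with background so that every pixel has four neighbours.
padded = np.pad(shape_mask, 1, mode="constant", constant_values=False)

# A pixel is interior when it and its up, down, left and right neighbours
# all belong to the shape.
interior = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
            & padded[1:-1, :-2] & padded[1:-1, 2:])

# Boundary pixels belong to the shape but are not interior.
boundary = shape_mask & ~interior

print(boundary.astype(np.uint8))
print(int(boundary.sum()))  # 12 outline pixels around a 2 x 2 interior
```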

### Numpy

The NumPy package is the workhorse of data analysis, machine learning, and scientific computing in the Python ecosystem. It vastly simplifies manipulating and crunching vectors and matrices. OpenCV's Python bindings rely on NumPy.


#### Creating Arrays
<img src="./images/create-numpy-array-1.png">
<img src="./images/create-numpy-array-ones-zeros-random.png">
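
A minimal sketch of common ways to create one-dimensional arrays; the variable names are only for illustration.

```python
import numpy as np

data = np.array([1, 2, 3])      # from a Python list
ones = np.ones(3)               # three 1.0 values
zeros = np.zeros(3)             # three 0.0 values
rand = np.random.random(3)      # three random values in [0, 1)

print(data)
print(ones)
print(zeros)
print(rand.shape)  # (3,)
```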

#### Array Arithmetic

<img src="./images/numpy-array-subtract-multiply-divide.png">
<img src="./images/numpy-array-broadcast.png">
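
Arithmetic on arrays works element by element, and a single number is "broadcast" across the whole array. A short sketch with made-up values:

```python
import numpy as np

data = np.array([1.0, 2.0])
ones = np.ones(2)

print(data - ones)   # [0. 1.]  element-wise subtraction
print(data * data)   # [1. 4.]  element-wise multiplication
print(data / data)   # [1. 1.]  element-wise division

# Broadcasting: the scalar is applied to every element.
scaled = data * 1.6
print(scaled)
```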

#### Array Indexing

<img src="./images/numpy-array-slice.png">
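
Indexing and slicing follow the same rules as Python lists, extended to several dimensions. The arrays below are made up for illustration; the two-dimensional slice is the same operation used later to crop an image.

```python
import numpy as np

data = np.array([1, 2, 3])
print(data[0])     # 1
print(data[0:2])   # [1 2]
print(data[1:])    # [2 3]

matrix = np.array([[1, 2], [3, 4], [5, 6]])
print(matrix[0, 1])     # 2: row 0, column 1
print(matrix[1:3])      # rows 1 and 2
print(matrix[0:2, 0])   # [1 3]: first column of rows 0 and 1
```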

#### Creating Matrices

<img src="./images/numpy-array-create-2d.png">
<img src="./images/numpy-matrix-ones-zeros-random.png">
<img src="./images/numpy-3d-array-creation.png">
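
The same creation functions accept a shape tuple to build matrices and higher-dimensional arrays. A minimal sketch, with shapes chosen only for illustration:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])    # 2 x 2 matrix from nested lists
ones = np.ones((3, 2))                 # 3 rows, 2 columns of 1.0
zeros = np.zeros((3, 2))               # 3 rows, 2 columns of 0.0
rand = np.random.random((3, 2))        # 3 x 2 random values in [0, 1)
cube = np.ones((4, 3, 2))              # three-dimensional array

print(matrix.shape)  # (2, 2)
print(ones.shape)    # (3, 2)
print(cube.shape)    # (4, 3, 2)
print(cube.ndim)     # 3
```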


### Image Representation

An image is a matrix of pixels of size (height x width).


#### Grayscale Image

If the image is black and white (a.k.a. grayscale), each pixel can be represented by a single number, commonly between 0 (black) and 255 (white).

<img src="./images/numpy-grayscale-image.png">


#### Color Image

If the image is colored, then each pixel is represented by three numbers - a value for each of Red, Green, and Blue. In that case we need a 3rd dimension (because each cell can only contain one number). So a colored image is represented by an ndarray of dimensions: (height x width x 3).

<img src="./images/numpy-color-image.png">
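
The two representations can be sketched with empty NumPy arrays. The sizes and the pixel position below are arbitrary, and the color pixel is written in OpenCV's B, G, R order.

```python
import numpy as np

# Grayscale: one 8-bit value per pixel, shape (height, width).
gray = np.zeros((4, 6), dtype=np.uint8)
gray[1, 2] = 255             # the pixel at row 1, column 2 becomes white

# Color: three 8-bit values per pixel, shape (height, width, 3).
color = np.zeros((4, 6, 3), dtype=np.uint8)
color[1, 2] = (0, 0, 255)    # the same pixel becomes pure red (B, G, R order)

print(gray.shape)    # (4, 6)
print(color.shape)   # (4, 6, 3)
print(color[1, 2])   # [  0   0 255]
```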

Remember the color Lenna image from before?

Let's print its shape to see how OpenCV stores this color image as a 3D (BGR) NumPy array by running the snippet below.

In [5]:
import numpy as np
import cv2

img = cv2.imread('./images/lenna.png')
print(img.shape)

(512, 512, 3)


### Getting Image from NAO

On the same note, once we get the image stream from NAO we will be able to process it and make sense of it.

We get images from NAO's top camera and convert them to NumPy arrays so that we can process them with OpenCV. This is done by subscribing to the `ALVideoDevice` proxy, which returns each raw image as a pixel buffer. The buffer can be converted to a NumPy array using the `np.asarray()` function and reshaped into a three-dimensional color image of (height x width x 3).

In [1]:
import cv2
import numpy as np
import time

from vision_definitions import kQVGA,kBGRColorSpace
from naoqi import ALProxy

NAO_IP="192.168.1.7" # <YOUR_NAO_IP> or nao.local


if __name__=="__main__":  # Should not run when imported

    camera_index = 0 # Top camera
   
    # Proxy for ALVideoDevice
    name = "nao_opencv"
    video = ALProxy("ALVideoDevice", NAO_IP, 9559)

    # Subscribe to video device on a specific camera
    # BGR for OpenCV
    name = video.subscribeCamera(name,camera_index,kQVGA,kBGRColorSpace,30)
    print "Subscribed to ", name

    try:
        frame = None
        # Keep Looping
        while True:
            # Get image
            img = video.getImageRemote(name)

            # Get image size attributes and pixel array buffer
            imageWidth = img[0]
            imageHeight = img[1]
            numChannels = img[2]
            imgBuffer = img[6]
         
            # Get OpenCV image (allocate on first pass)
            if frame is None:
                print 'Grabbed image: ',imageWidth,'x',imageHeight,' numChannels=',numChannels
                frame=np.asarray(bytearray(imgBuffer), dtype=np.uint8)
                frame=frame.reshape((imageHeight,imageWidth,3))
            else:
                frame.data=bytearray(imgBuffer)

            # Display the frame to our screen
            # NOTE : Do not run this code if you run your Python on the robot,
            # as NAO has no screen to show the window
            cv2.imshow("Frame", frame)
            
            # Get the key pressed in the image window
            key = cv2.waitKey(33)&0xFF
            if  key == ord('q') or key == 27:
                # Exit loop when 'q' or 'Esc' is pressed on the image window
                break
            
    finally: # As fallback we'll make sure to unsubscribe and close the window
        print "Unsubscribing ",name
        video.unsubscribe(name)
        cv2.destroyAllWindows()


Subscribed to  nao_opencv_1
Grabbed image:  320 x 240  numChannels= 3

Unsubscribing  nao_opencv_1

<img src="./images/nao-camera.png">
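
The buffer-to-array step from the script above can be tried without a robot. In this sketch a buffer of zero bytes stands in for the pixel buffer returned by `getImageRemote`; the stand-in buffer and the variable names are assumptions for illustration, with the 320 x 240 size taken from the output above.

```python
import numpy as np

# Stand-in for the raw pixel buffer: a flat byte sequence for a
# 320 x 240 image with 3 channels, all zeros here.
width, height, channels = 320, 240, 3
raw_buffer = bytes(bytearray(width * height * channels))

# Flat array of unsigned 8-bit values, one per byte in the buffer.
flat = np.asarray(bytearray(raw_buffer), dtype=np.uint8)

# Reshape to rows, then columns, then channels.
frame = flat.reshape((height, width, channels))

print(flat.shape)    # (230400,)
print(frame.shape)   # (240, 320, 3)
```

Note that the height comes first in the shape, even though image sizes are usually quoted as width x height.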

### Saving Images for Dataset

Now we know what NAO sees. Before we proceed to do gesture recognition on what NAO sees, we need to save some sample images on which we will train the AI model, so that it can recognize a gesture when it sees one.

We will be recognizing a left hand signal and a right hand signal as the gestures. To do this, we will save at least 10 training images for each of the left and right hands separately (more images lead to better recognition), plus a couple of images of each gesture to test the trained model.

For convenience, we will save the images for each gesture in their respective folders as follows.


| Images For | Gesture    | Folder Path        |
|------------|------------|--------------------|
| Training   | Left Hand  | /data/train/left/  |
| Training   | Right Hand | /data/train/right/ |
| Testing    | Left Hand  | /data/test/left/   |
| Testing    | Right Hand | /data/test/right/  |
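
Assuming the four folders have not been created yet, a small sketch like the one below could set them up. `project_root` is a hypothetical name, and a temporary folder stands in for the real project location here.

```python
import os
import tempfile

# Hypothetical project root; a temporary folder is used for this illustration.
project_root = tempfile.mkdtemp()

folders = ["data/train/left", "data/train/right",
           "data/test/left", "data/test/right"]

for folder in folders:
    path = os.path.join(project_root, folder)
    if not os.path.isdir(path):
        os.makedirs(path)   # also creates the intermediate "data" folders
    print(path)
```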

#### Getting Absolute Path

We will be saving the images under the `/ai4all/data/` folder of this project, in the folders listed above.
But every person might have this project in a different location on their device.
Hence, we need to get the absolute path of this project as below.

In [3]:
from os.path import dirname, abspath

# Inside the script use abspath('') to obtain the absolute path of this script
# Call os.path.dirname twice to get parent directory of this directory    

parent_directory = dirname(dirname(abspath('')))

print(parent_directory)

/home/rama/workspace/ai4all


#### Running Script to Save Images

Before we run the script below, which saves training and test images for the left and right hand gestures, we need to replace `NAO_IP` with our NAO's IP.

Also, based on which images you are trying to capture, replace `data_directory` with the corresponding folder path from the table above.

Now we can run the code below, which opens a popup showing the image frames from NAO's camera.

Position your left or right hand in the center of the camera view and press the `C` key to capture and save the image. Repeat this until you have saved as many images as you would like.

Press the `ESC` key when you are done to close the popup.

Repeat the same for train (left and right) and test (left and right).


In [7]:
import cv2
import numpy as np

from os.path import dirname, abspath
from vision_definitions import kQVGA,kBGRColorSpace
from naoqi import ALProxy


NAO_IP="192.168.1.7" # <YOUR_NAO_IP> or nao.local


if __name__=="__main__":  # Should not run when imported

    camera_index = 0 # Top camera
    image_count = 0 # We will use this to save image name as <image_count>.jpg
    
    
    parent_directory = dirname(dirname(abspath(''))) # Will give ../ai4all/ which is the parent folder
    
    data_directory = "/data/train/left/" # Replace with the folder path from the table above for the images you are saving
    
    image_prefix = parent_directory + data_directory
    image_suffix = ".jpg"

    # Proxy for ALVideoDevice
    name = "nao_opencv"
    video = ALProxy("ALVideoDevice", NAO_IP, 9559)

    # Subscribe to video device on a specific camera
    # BGR for OpenCV
    name = video.subscribeCamera(name,camera_index,kQVGA,kBGRColorSpace,30)
    print "Subscribed to ", name

    try:
        frame = None
        # Keep Looping
        while True:
            # Get image
            img = video.getImageRemote(name)

            # Get image attributes
            width = img[0]
            height = img[1]
            nchannels = img[2]
            imgbuffer = img[6]
            
            # Get OpenCV image (allocate on first pass)
            if frame is None:
                print 'Grabbed image: ',width,'x',height,' nchannels=',nchannels
                frame=np.asarray(bytearray(imgbuffer), dtype=np.uint8)
                frame=frame.reshape((height,width,3))
            else:
                frame.data=bytearray(imgbuffer)

            # Display the frame to our screen
            # NOTE : Do not run this code if you run your Python on the robot,
            # as NAO has no screen to show the window
            cv2.imshow("Frame", frame)

            # Get the key pressed in the image window
            key = cv2.waitKey(33)&0xFF
            if  key == ord('q') or key == 27:
                # Exit loop when 'q' or 'Esc' is pressed on the image window
                break
            elif key == ord('c'):
                # When 'c' key is pressed -> Capture/Save image
                
                # Let's crop the image frame so the focus is in center
                upper_left = (80, 40)     #Crop: top left point
                bottom_right = (230, 190) #Crop: bottom right point
                cropped_frame = frame[upper_left[1] : bottom_right[1], upper_left[0] : bottom_right[0]]
                
                # Converting cropped color image to Grayscale
                gray_frame = cv2.cvtColor(cropped_frame, cv2.COLOR_BGR2GRAY)
                
                # Let's resize the grayscale image to 28 x 28 (width x height) for convenience
                resized_frame = cv2.resize(gray_frame, dsize=(28, 28), interpolation=cv2.INTER_CUBIC)
                
                # Uncomment below line to see cropped and resized image frame
                # cv2.imshow("Resized Frame", resized_frame)
                
                # We will create a file path string as
                # <parent directory><data directory><image_count>.jpg
                image_path = image_prefix + str(image_count) + image_suffix

                # imwrite function of OpenCV saves the resized image in the above specified folder
                cv2.imwrite(image_path, resized_frame)

                # Increment the image count so that the next image is saved with the next number
                # Otherwise every image we save would overwrite the same file name
                image_count = image_count + 1

    finally: # As fallback we'll make sure to unsubscribe and close the window
        print "Unsubscribing ",name
        video.unsubscribe(name)
        cv2.destroyAllWindows()


Subscribed to  nao_opencv_1
Grabbed image:  320 x 240  nchannels= 3

Unsubscribing  nao_opencv_1

Saved Left Hand Signal Image

<img src="../../data/train/left/16.jpg">

Saved Right Hand Signal Image

<img src="../../data/train/right/2.jpg">
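
The crop step from the capture script can be checked on a synthetic frame. The corner points below are the ones used in the script; the blank frame is a stand-in for a real camera image.

```python
import numpy as np

# Blank stand-in for a 320 x 240 BGR camera frame: shape is (height, width, 3).
frame = np.zeros((240, 320, 3), dtype=np.uint8)

upper_left = (80, 40)       # (x, y) of the crop's top left corner
bottom_right = (230, 190)   # (x, y) of the crop's bottom right corner

# NumPy indexes rows (y) first and columns (x) second.
cropped = frame[upper_left[1]:bottom_right[1], upper_left[0]:bottom_right[0]]

print(cropped.shape)  # (150, 150, 3)
```

The crop is a 150 x 150 square near the middle of the frame, which is why the hand should be held in the center of the camera view before capturing.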


Now that we have captured the images successfully, we can proceed to recognize the gesture in an image.