
# Boosting classifier accuracy by grouping categories

In this tutorial, we will split the 1000 image categories that our model was trained to classify into three disjoint sets: *dogs*, *cats*, and *other* (anything that isn't a dog or a cat). We will demonstrate how a classifier with low accuracy on the original 1000-class problem can be accurate enough on the simpler 3-class problem. We will write a Python script that reads an image from the camera, barks when it sees a dog, and meows when it sees a cat.

[![screenshot](https://microsoft.github.io/ELL/tutorials/Boosting-classifier-accuracy-by-grouping-categories/thumbnail.png)](https://youtu.be/SOmV8tzg_DU)

#### Materials

* Laptop or desktop computer
* Raspberry Pi
* Headphones or speakers for your Raspberry Pi
* Raspberry Pi camera or USB webcam
* *optional* - Active cooling attachment (see our [tutorial on cooling your Pi](https://microsoft.github.io/ELL/tutorials/Active-cooling-your-Raspberry-Pi-3/))

#### Prerequisites

* Install [Jupyter](http://jupyter.readthedocs.io/en/latest/install.html) on your computer
* Follow the instructions for [setting up your Raspberry Pi](https://microsoft.github.io/ELL/tutorials/Setting-up-your-Raspberry-Pi).
* Complete the basic tutorial, [Getting started with image classification on Raspberry Pi](https://notebooks.azure.com/microsoft-ell/libraries/tutorials/html/Getting%20started%20with%20image%20classification%20on%20the%20Raspberry%20Pi%20%28Part%201%29.ipynb), to learn how to use an ELL model from the Gallery.

## Overview

The pre-trained models in the [ELL gallery](https://microsoft.github.io/ELL/gallery/) are trained to identify 1000 different image categories (see the category names [here](https://github.com/Microsoft/ELL-models/raw/master/models/ILSVRC2012/categories.txt)). Often, we are interested in only a subset of these categories and don't require the fine-grained categorization that the model was trained to provide. For example, we may want to classify images of dogs versus images of cats, whereas the model is actually trained to distinguish between 6 different varieties of cat and 106 different varieties of dog.

The dogs versus cats classification problem is easier than the original 1000-class problem, so a model that isn't very accurate on the original problem may be perfectly adequate on the simpler problem. Specifically, we will use a model that has an error rate of 64% on the 1000-class problem, but only 5% on the 3-class problem. We will build an application that grabs a frame from a camera, plays a barking sound when it recognizes one of the dog varieties, and plays a meow sound when it recognizes one of the cat varieties.
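To see why grouping lowers the error rate, note that a prediction counts as correct on the 3-class problem whenever the predicted and true categories fall in the same group, even if the fine-grained labels differ. The following is a minimal sketch of that bookkeeping. The index ranges mirror the slices used later in this tutorial, while `group_of`, `error_rate`, and the sample indices are hypothetical names and invented data, not measurements from the model.

```python
# Illustrative only: the ranges mirror the slices used later in this tutorial,
# and the sample indices below are invented, not real model outputs.
DOG_RANGE = range(151, 270)
CAT_RANGE = range(281, 294)

def group_of(index):
    """Map a 1000-class category index to one of three coarse groups."""
    if index in DOG_RANGE:
        return "dog"
    if index in CAT_RANGE:
        return "cat"
    return "other"

def error_rate(true_indices, predicted_indices, key=lambda i: i):
    """Fraction of predictions whose key differs from the true key."""
    wrong = sum(key(t) != key(p) for t, p in zip(true_indices, predicted_indices))
    return wrong / len(true_indices)

# Hypothetical ground truth and predictions, given as category indices
true_indices = [151, 200, 283, 285, 500]
predicted_indices = [160, 200, 281, 151, 501]

fine_error = error_rate(true_indices, predicted_indices)
coarse_error = error_rate(true_indices, predicted_indices, key=group_of)
print(fine_error, coarse_error)  # 0.8 0.2
```

In this toy data, four of the five fine-grained predictions are wrong, but three of those mistakes stay within the correct group, so only one counts as an error on the grouped problem.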

As a pre-step, we need to install `ell` in the Azure virtual machine.

In [1]:
!conda config --prepend channels conda-forge --prepend channels microsoft-ell
!conda install -y ell






## Step 1: Deploy a pre-trained model on a Raspberry Pi

Start by repeating the steps of the basic tutorial, [Getting Started with Image Classification on Raspberry Pi](https://notebooks.azure.com/microsoft-ell/libraries/tutorials/html/Getting%20started%20with%20image%20classification%20on%20the%20Raspberry%20Pi%20%28Part%201%29.ipynb). This time, specify the Gallery model by name, specifically one that is faster and less accurate. As before, download the model and compile it for the Raspberry Pi.

In [7]:
from ell.pretrained_model import PretrainedModel
import ell.platform

pretrained_model = PretrainedModel('d_I160x160x3NCMNCMNBMNBMNBMNBMNC1A')
pretrained_model.download('boosting', rename='model')
pretrained_model.compile(ell.platform.PI3)

compiling...
compiled model up to date


'boosting\\pi3\\CMakeLists.txt'

We read in all 1000 labels from the label file. All the dog and cat labels happen to be near the beginning of the list. To keep things manageable, we select them with two slices: indices 151 through 269 for the dogs and indices 281 through 293 for the cats.

In [3]:
with open('boosting/categories.txt', 'r') as f:
    categories = f.read().splitlines()
dogs = categories[151:270]
cats = categories[281:294]
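Python slices are half-open, so it is easy to be off by one at the boundaries. The sketch below checks which indices the two slices actually cover. It uses a stand-in list of placeholder names (an assumption for illustration) rather than the real `categories.txt`, so that it runs on its own.

```python
# Stand-in for the real label list: 1000 placeholder names. The tutorial
# itself reads the real names from categories.txt.
categories = ["category_%d" % i for i in range(1000)]

dogs = categories[151:270]  # covers indices 151..269 inclusive
cats = categories[281:294]  # covers indices 281..293 inclusive

print(dogs[0], dogs[-1])  # category_151 category_269
print(cats[0], cats[-1])  # category_281 category_293

# The two groups must not share any label
assert set(dogs).isdisjoint(cats)
```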

## Step 2: Write a script 

We will write a Python script that invokes the model on a Raspberry Pi, groups the categories as described above, and takes action if a dog or cat is recognized. **As with the previous tutorial, change the `ip` and `user` arguments to your Raspberry Pi's IP address and your user name before running the code in the cell below.**

In [19]:
%%rpi --user=pi --ip=157.54.152.78 --rpipath=/home/pi/compare --model=pretrained_model

import sys
import os
import numpy as np
import cv2
import time
import tutorialHelpers as helpers
import subprocess
if (os.name == "nt"):
    import winsound

# import the ELL model's Python module
import model

# Function to return an image from our camera using OpenCV
def get_image_from_camera(camera):
    if camera is not None:
        # Read a single frame and fail loudly if the capture device returns nothing
        ret, frame = camera.read()
        if (not ret):
            raise Exception('your capture device is not returning images')
        return frame
    return None

# Return an array of strings corresponding to the model's recognized categories or classes.
# The order of the strings in this file is expected to match the order of the
# model's output predictions.
def get_categories_from_file(fileName):
    labels = []
    with open(fileName) as f:
        labels = f.read().splitlines()
    return labels

# Returns True if an element of the comma separated label `a` is an element of the comma separated label `b`
def labels_match(a, b):
    x = [s.strip().lower() for s in a.split(',')]
    y = [s.strip().lower() for s in b.split(',')]
    for w in x:
        if (w in y):
            return True
    return False

# Returns True if the label is in the set of labels
def label_in_set(label, label_set):
    for x in label_set:
        if labels_match(label, x):
            return True
    return False

# Declare variables that define where to find the sound files we will play
script_path = os.path.dirname(os.path.abspath(__file__))
woof_sound = os.path.join(script_path, "woof.wav")
meow_sound = os.path.join(script_path, "meow.wav")

# Helper function to play a sound
def play(filename):
    if (os.name == "nt"):
        winsound.PlaySound(filename, winsound.SND_FILENAME | winsound.SND_ASYNC)
    else:
        command = ["aplay", filename]
        proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, bufsize=0, universal_newlines = True)
        proc.wait()

# Helper function to decide what action to take when we detect a group
def take_action(group):
    if group == "Dog":
        # A prediction in the dog category group was detected, play a `woof` sound
        play(woof_sound)
    elif group == "Cat":
        # A prediction in the cat category group was detected, play a `meow` sound
        play(meow_sound)

# Open the video camera. To use a different camera, change the camera index.
camera = cv2.VideoCapture(0)

# Read the category labels
categories = get_categories_from_file("categories.txt")
dogs = categories[151:270]
cats = categories[281:294]

# Get the model's input dimensions. We'll use this information later to resize images appropriately.
inputShape = model.get_default_input_shape()

# Create a vector to hold the model's output predictions
outputShape = model.get_default_output_shape()
predictions = model.FloatVector(outputShape.Size())

headerText = ""

# Get an image from the camera. If you'd like to use a different image, load the image from some other source.
image = get_image_from_camera(camera)

# Prepare the image to pass to the model. This helper:
# - crops and resizes the image maintaining proper aspect ratio
# - reorders the image channels if needed
# - returns the data as a ravelled numpy array of floats so it can be handed to the model
input_data = helpers.prepare_image_for_model(image, inputShape.columns, inputShape.rows)

# Get the predicted classes using the model's predict function on the image input data.
# The predictions are returned as a vector with the probability that the image
# contains the class represented by that index.
model.predict(input_data, predictions)

# Grab the value and index of the top prediction, which represent the most
# confident match and the class or category it belongs to.
topN = helpers.get_top_n_predictions(predictions, 1)

# See whether the prediction is in one of our groups
group = ""
caption = ""
label = ""
if len(topN) > 0:
    top = topN[0]
    label = categories[top[0]]
    if label_in_set(label, dogs):
        group = "Dog"
    elif label_in_set(label, cats):
        group = "Cat"

if group != "":
    # A group was detected, so take action
    top = topN[0]
    take_action(group)
    headerText = "(" + str(int(top[1]*100)) + "%) " + group
else:
    # No group was detected
    headerText = "? " + label

print('PREDICTION')
print(headerText, flush=True)

PREDICTION
? file, file cabinet, filing cabinet
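Each entry in the label file is a comma-separated list of synonyms, which is why the script compares labels synonym by synonym rather than as whole strings. The sketch below is a condensed restatement of the script's `labels_match` and `label_in_set` helpers, intended to behave the same way, applied to a few sample labels chosen for illustration. Note that a synonym must match in full: a substring is not enough.

```python
def labels_match(a, b):
    """True if any comma-separated synonym in `a` also appears in `b`."""
    x = [s.strip().lower() for s in a.split(',')]
    y = [s.strip().lower() for s in b.split(',')]
    return any(w in y for w in x)

def label_in_set(label, label_set):
    """True if `label` matches at least one label in `label_set`."""
    return any(labels_match(label, x) for x in label_set)

# Sample group, chosen for illustration
cats = ["tabby, tabby cat", "tiger cat", "Persian cat"]

print(labels_match("Tabby", "tabby, tabby cat"))  # True: case is ignored
print(labels_match("tiger", "tiger cat"))         # False: substrings do not match
print(label_in_set("persian cat", cats))          # True
print(label_in_set("file, file cabinet, filing cabinet", cats))  # False
```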
