*This tutorial needs to be run on your local laptop or desktop machine. First, make sure you are running Jupyter on your laptop or desktop. Then, click the Download button above and save the .ipynb file to your Jupyter startup folder. After the download completes, the notebook will show up in the Jupyter Dashboard. Open the downloaded notebook from the Dashboard and carry on with the tutorial steps in that copy of the notebook. (You can run this tutorial in Azure, but the final steps will fail with a Timeout Error because the Raspberry Pi device cannot be reached on the network.)*


# Boosting classifier accuracy by grouping categories

In this tutorial, we will split the 1000 image-categories, which our model was trained to classify, into three disjoint sets: *dogs*, *cats*, and *other* (anything that isn't a dog or a cat). We will demonstrate how a classifier with low accuracy on the original 1000-class problem can have a sufficiently high accuracy on the simpler 3-class problem. We will write a Python script that reads images from the camera, barks when it sees a dog, and meows when it sees a cat.

[![screenshot](https://microsoft.github.io/ELL/tutorials/Boosting-classifier-accuracy-by-grouping-categories/thumbnail.png)](https://youtu.be/SOmV8tzg_DU)

#### Materials

* Laptop or desktop computer
* Raspberry Pi
* Headphones or speakers for your Raspberry Pi
* Raspberry Pi camera or USB webcam
* *optional* - Active cooling attachment (see our [tutorial on cooling your Pi](https://microsoft.github.io/ELL/tutorials/Active-cooling-your-Raspberry-Pi-3/))

#### Prerequisites

* Install [Jupyter](http://jupyter.readthedocs.io/en/latest/install.html) on your computer
* Follow the instructions for [setting up your Raspberry Pi](https://microsoft.github.io/ELL/tutorials/Setting-up-your-Raspberry-Pi).
* Complete the basic tutorial, [Getting started with image classification on Raspberry Pi](https://notebooks.azure.com/microsoft-ell/libraries/tutorials/html/Getting%20started%20with%20image%20classification%20on%20the%20Raspberry%20Pi%20%28Part%201%29.ipynb), to learn how to use an ELL model from the Gallery.

## Overview

The pre-trained models in the [ELL gallery](https://microsoft.github.io/ELL/gallery/) are trained to identify 1000 different image categories (see the category names [here](https://github.com/Microsoft/ELL-models/raw/master/models/ILSVRC2012/categories.txt)). Often, we are only interested in a subset of these categories and we don't require the fine-grained categorization that the model was trained to provide. For example, we may want to classify images of dogs versus images of cats, whereas the model is actually trained to distinguish between 6 different varieties of cat and 106 different varieties of dog.

The dogs versus cats classification problem is easier than the original 1000-class problem, so a model that isn't very accurate on the original problem may be perfectly adequate on the simpler one. Specifically, we will use a model that has an error rate of 64% on the 1000-class problem, but only 5% on the 3-class problem. We will build an application that grabs a frame from a camera, plays a barking sound when it recognizes one of the dog varieties, and plays a meow sound when it recognizes one of the cat varieties.
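To see why grouping helps, consider a minimal sketch with made-up predictions. The labels and results below are hypothetical and are not measurements from the model. A prediction that confuses one dog breed for another counts as an error on the fine-grained problem, but it is correct on the grouped problem.

```python
# Hypothetical label sets; the tutorial itself uses the ILSVRC2012 category names.
DOGS = {'beagle', 'basset', 'whippet'}
CATS = {'tabby', 'Persian cat'}


def group_of(label):
    """Map a fine-grained label to 'dog', 'cat', or 'other'."""
    if label in DOGS:
        return 'dog'
    if label in CATS:
        return 'cat'
    return 'other'


# Made-up (true label, predicted label) pairs, for illustration only.
results = [
    ('beagle', 'basset'),       # wrong breed, right group
    ('whippet', 'whippet'),     # right breed, right group
    ('tabby', 'Persian cat'),   # wrong variety, right group
    ('teapot', 'coffeepot'),    # wrong object, but both are 'other'
    ('basset', 'tabby'),        # wrong group
]

# Count mistakes at the fine-grained level and at the grouped level.
fine_errors = sum(true != predicted for true, predicted in results)
grouped_errors = sum(group_of(true) != group_of(predicted)
                     for true, predicted in results)

print(f'fine-grained error rate: {fine_errors / len(results):.0%}')
print(f'grouped error rate:      {grouped_errors / len(results):.0%}')
```

In this toy data, four of the five predictions are wrong at the fine-grained level, but only one is wrong after grouping.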

As a preliminary step, we need to install `ell` in the Azure virtual machine.

In [1]:
!conda config --prepend channels conda-forge --prepend channels microsoft-ell
!conda install -y ell

Fetching package metadata ...............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /home/nbcommon/anaconda3_431:
#
ell                       0.0.1            py36h0a24ad1_0    microsoft-ell


## Step 1: Deploy a pre-trained model on a Raspberry Pi

Start by repeating the steps of the basic tutorial, [Getting Started with Image Classification on Raspberry Pi](https://notebooks.azure.com/microsoft-ell/libraries/tutorials/html/Getting%20started%20with%20image%20classification%20on%20the%20Raspberry%20Pi%20%28Part%201%29.ipynb). This time, specify the Gallery model by name, specifically one that is faster and less accurate. As before, download the model and compile it for the Raspberry Pi.

In [1]:
from ell.pretrained_model import PretrainedModel
import ell.platform

pretrained_model = PretrainedModel('d_I160x160x3NCMNCMNBMNBMNBMNBMNC1A')
pretrained_model.download('boosting', rename='model')
pretrained_model.compile(ell.platform.PI3)

compiling...
compiled model up to date


'boosting/pi3/CMakeLists.txt'

We read in all 1000 labels from the label file. All the pet labels happen to be at the beginning of the list, in scattered locations. To keep things manageable, we consider only the first 240 labels, which include all the dogs and cats.

In [14]:
# Read one label per line; the `with` block closes the file when done.
with open('boosting/categories.txt', 'r') as f:
    categories = [line.strip('\n') for line in f]
dog_categories = categories[151:270]
dog_categories

['Chihuahua',
 'Japanese spaniel',
 'Maltese dog, Maltese terrier, Maltese',
 'Pekinese, Pekingese, Peke',
 'Shih-Tzu',
 'Blenheim spaniel',
 'papillon',
 'toy terrier',
 'Rhodesian ridgeback',
 'Afghan hound, Afghan',
 'basset, basset hound',
 'beagle',
 'bloodhound, sleuthhound',
 'bluetick',
 'black-and-tan coonhound',
 'Walker hound, Walker foxhound',
 'English foxhound',
 'redbone',
 'borzoi, Russian wolfhound',
 'Irish wolfhound',
 'Italian greyhound',
 'whippet',
 'Ibizan hound, Ibizan Podenco',
 'Norwegian elkhound, elkhound',
 'otterhound, otter hound',
 'Saluki, gazelle hound',
 'Scottish deerhound, deerhound',
 'Weimaraner',
 'Staffordshire bullterrier, Staffordshire bull terrier',
 'American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier',
 'Bedlington terrier',
 'Border terrier',
 'Kerry blue terrier',
 'Irish terrier',
 'Norfolk terrier',
 'Norwich terrier',
 'Yorkshire terrier',
 'wire-haired fox terrier',
 'Lakeland terrier',
 

## Choosing which categories are dogs and which are cats

Since the dog and cat categories are scattered around the list, we have no choice but to choose them by hand. To make this job easier, ELL provides a handy user interface for choosing subsets of long lists, using Jupyter widgets. We'll create a checkbox for each category, so we can quickly scan the list and check the categories for dogs or cats. We'll save our checked selections to a file; otherwise, the selections would be cleared whenever we reload this notebook. (To make things extra easy, we download this file from the web, so the dogs are checked from the start.) To keep things tidy on the screen, we'll organize the checkboxes into columns, with a parameter to control how many appear per column.

In [10]:
from ell.util.choose_subset import choose_subset
import urllib.request
urllib.request.urlretrieve('https://microsoft.github.io/ELL/tutorials/Boosting-classifier-accuracy-by-grouping-categories/dogs.txt', 'boosting/dogs.txt')
dog_categories = choose_subset(pet_categories, 'boosting/dogs.txt', 30)

A Jupyter Widget

We similarly create checkboxes to choose the "cat"egories.

In [11]:
urllib.request.urlretrieve('https://microsoft.github.io/ELL/tutorials/Boosting-classifier-accuracy-by-grouping-categories/cats.txt', 'boosting/cats.txt')
cat_categories = choose_subset(pet_categories, 'boosting/cats.txt', 30)

A Jupyter Widget
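
The save-and-restore behavior described above can be sketched without widgets. The helper names `save_selection` and `load_selection`, and the one-label-per-line file format, are assumptions made for illustration. They are not taken from `choose_subset`, whose file format may differ.

```python
import os
import tempfile


def save_selection(selected, path):
    """Write the selected labels to path, one per line (assumed format)."""
    with open(path, 'w') as f:
        for label in sorted(selected):
            f.write(label + '\n')


def load_selection(path):
    """Return the set of labels stored in path, or an empty set if it is missing."""
    if not os.path.exists(path):
        return set()
    with open(path, 'r') as f:
        # Skip blank lines and drop the trailing newline from each label.
        return {line.rstrip('\n') for line in f if line.strip()}


# Round-trip a small hypothetical selection through a temporary file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'dogs.txt')
    save_selection({'beagle', 'whippet'}, path)
    restored = load_selection(path)

print(restored == {'beagle', 'whippet'})
```

Storing one label per line keeps labels that contain commas, such as `'basset, basset hound'`, intact.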

As a sanity check, let's make sure we didn't choose a category as both a dog and a cat. That is, the intersection of the set of dog categories and cat categories should be empty.

In [12]:
not(dog_categories & cat_categories)  # should be True

True
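
Python sets also offer `isdisjoint`, which expresses the same check directly, and set difference yields the third group, *other*. The following is a small sketch with hypothetical labels standing in for the full category list and the two chosen subsets.

```python
# Hypothetical stand-ins for the full label list and the two chosen subsets.
all_categories = {'beagle', 'whippet', 'tabby', 'teapot', 'umbrella'}
dogs = {'beagle', 'whippet'}
cats = {'tabby'}

# Equivalent to `not (dogs & cats)`: True when no label is in both sets.
assert dogs.isdisjoint(cats)

# Everything that is neither a dog nor a cat falls into the 'other' group.
other = all_categories - dogs - cats

# The three groups together cover every category exactly once.
assert dogs | cats | other == all_categories
assert len(dogs) + len(cats) + len(other) == len(all_categories)

print(sorted(other))
```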

## Step 2: Write a script 

We will write a Python script that invokes the model on a Raspberry Pi, groups the categories as described above, and takes action if a dog or cat is recognized. **As with the previous tutorial, change the `ip` and `user` arguments to your Raspberry Pi's IP address and your user name before running the code in the cell below.**

In [None]:
%%rpi --user=pi --ip=157.54.152.78 --rpipath=/home/pi/mymodel --model=pretrained_model

import cv2
import numpy as np
import urllib.request
import tutorialHelpers as helpers
import sys
sys.path.append('build')
import model

