# Using deep features to build an image classifier

# Fire up GraphLab Create

In [31]:
import graphlab

# Load a common image analysis dataset

We will use a popular benchmark dataset in computer vision called CIFAR-10.  

(We've reduced the data to just 4 categories = {'cat','bird','automobile','dog'}.)

This dataset is already split into a training set and test set.  

In [32]:
image_train = graphlab.SFrame('image_train_data/')
image_test = graphlab.SFrame('image_test_data/')

IOError: /Users/robert/Dropbox/Courses/MachineLearningCoursera/c1w6-Deep-Learning/deep features for image classification/image_train_data not found.: unspecified iostream_category error: unspecified iostream_category error

# Exploring the image data

In [30]:
graphlab.canvas.set_target('ipynb')

In [None]:
image_train['image'].show()

# Train a classifier on the raw image pixels

We start by training a classifier on just the raw pixels of the image.

In [None]:
raw_pixel_model = graphlab.logistic_classifier.create(image_train,target='label',
                                              features=['image_array'])

# Make a prediction with the simple model based on raw pixels

In [None]:
image_test[0:3]['image'].show()

In [None]:
image_test[0:3]['label']

In [None]:
raw_pixel_model.predict(image_test[0:3])

The model makes wrong predictions for all three images.

# Evaluating raw pixel model on test data

In [None]:
raw_pixel_model.evaluate(image_test)

This model performs poorly, reaching only about 46% accuracy.
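
To make the accuracy figure concrete: accuracy is the fraction of test examples whose predicted label matches the true label. With four categories, a classifier guessing uniformly at random would be right about 25% of the time if the categories were equally frequent, so 46% beats chance but is far from reliable. The sketch below computes accuracy in plain Python. The `accuracy` helper and the example labels are made up for illustration; they are not part of GraphLab Create or of the CIFAR-10 data.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / float(len(true_labels))

# Made-up labels for illustration only (not real model output).
example_true = ['cat', 'automobile', 'cat', 'dog', 'bird']
example_pred = ['bird', 'automobile', 'dog', 'dog', 'cat']
example_accuracy = accuracy(example_true, example_pred)
print(example_accuracy)
```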

# Can we improve the model using deep features?

We only have 2005 data points, which is too few to train a deep neural network effectively. Instead, we will use transfer learning: we take deep features from a network trained on the full ImageNet dataset and train a simple model on top of them using this small dataset.
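
The mechanics of transfer learning can be sketched on a toy problem: a feature extractor that is fixed in advance transforms each input, and only a simple model is fitted on the small dataset. Everything in this sketch is an assumption made for illustration. The `pretrained_features` function is a hand-written stand-in for a frozen pre-trained network, the ring-shaped data is invented, and a nearest-centroid classifier is swapped in for the logistic classifier used in this notebook to keep the example short.

```python
import math

def pretrained_features(point):
    """Stand-in for a frozen, pre-trained network: a fixed mapping that is
    NOT fitted on the small dataset. Here it appends the squared radius."""
    x, y = point
    return [x, y, x * x + y * y]

def fit_centroids(features, labels):
    """Train the simple model: one mean feature vector per class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, vec):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))

def ring(radius, n, offset=0.0):
    """Return n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n + offset),
             radius * math.sin(2 * math.pi * i / n + offset)) for i in range(n)]

# Small invented training set: an inner ring and an outer ring.
train_points = ring(0.5, 8) + ring(2.0, 8)
train_labels = ['inner'] * 8 + ['outer'] * 8
# Test points lie on the same rings but at different angles.
test_points = ring(0.5, 5, offset=0.3) + ring(2.0, 5, offset=0.3)
test_labels = ['inner'] * 5 + ['outer'] * 5

# Only the simple model is trained; the feature extractor stays fixed.
train_features = [pretrained_features(p) for p in train_points]
centroids = fit_centroids(train_features, train_labels)
predictions = [predict(centroids, pretrained_features(p)) for p in test_points]
toy_accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / float(len(test_labels))
print(toy_accuracy)
```

In the raw (x, y) coordinates both rings are centered on the origin, so their class means nearly coincide. The fixed squared-radius feature is what lets such a simple model tell them apart, which is the role the deep features play for the images here.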

In [4]:
len(image_train)

NameError: name 'image_train' is not defined

## Computing deep features for our images

The two lines below allow us to compute deep features.  This computation takes a little while, so we have already computed them and saved the results as a column in the data you loaded. 

(Note that if you would like to compute such deep features yourself and have a GPU on your machine, you should use the GPU-enabled version of GraphLab Create, which will be significantly faster for this task.)

In [5]:
#deep_learning_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')
#image_train['deep_features'] = deep_learning_model.extract_features(image_train)

As we can see, the column deep_features already contains the pre-computed deep features for this data. 

In [6]:
image_train.head()

NameError: name 'image_train' is not defined

# Given the deep features, let's train a classifier

In [7]:
deep_features_model = graphlab.logistic_classifier.create(image_train,
                                                         features=['deep_features'],
                                                         target='label')

NameError: name 'image_train' is not defined

# Apply the deep features model to first few images of test set

In [8]:
image_test[0:3]['image'].show()

NameError: name 'image_test' is not defined

In [9]:
deep_features_model.predict(image_test[0:3])

NameError: name 'deep_features_model' is not defined

The classifier with deep features gets all of these images right!

# Compute test_data accuracy of deep_features_model

As we can see, deep features provide us with significantly better accuracy (about 78%).

In [10]:
deep_features_model.evaluate(image_test)

NameError: name 'deep_features_model' is not defined

# Quiz 1
Computing summary statistics of the data: Sketch summaries are techniques for computing summary statistics of data very quickly. In GraphLab Create, SFrames and SArrays include a method, .sketch_summary(), which computes such summary statistics. Using the training data, compute the sketch summary of the ‘label’ column and interpret the results. What’s the least common category in the training data? Save this result to answer the quiz at the end.
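
Finding the least common category amounts to a frequency count over the ‘label’ column. The plain-Python sketch below shows the idea with `collections.Counter`. The label list is invented for illustration and does not reflect the real category counts, and the helper name `least_common_label` is hypothetical.

```python
from collections import Counter

def least_common_label(labels):
    """Return (label, count) for the category that appears least often."""
    counts = Counter(labels)
    # Pick the entry with the smallest count.
    return min(counts.items(), key=lambda item: item[1])

# Invented labels for illustration only; not the real training data.
example_labels = ['cat', 'dog', 'cat', 'bird', 'automobile',
                  'dog', 'cat', 'automobile', 'dog']
print(least_common_label(example_labels))
```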

In [11]:
image_train['label'].sketch_summary()

NameError: name 'image_train' is not defined

# Quiz 2
Creating category-specific image retrieval models: In most retrieval tasks, the data we have is unlabeled, thus we call these unsupervised learning problems. However, we have labels in this image dataset, and will use these to create one model for each of the 4 image categories, {‘dog’,‘cat’,‘automobile’,‘bird’}. To start, follow these steps:

- Split the SFrame with the training data into 4 different SFrames. Each of these will contain data for 1 of the 4 categories above. Hint: if you use a logical filter to select the rows where the ‘label’ column equals ‘dog’, you can create an SFrame with only the data for images labeled ‘dog’.
- Similarly to the image retrieval notebook you downloaded, you are going to create a nearest neighbor model using the 'deep_features' as the features, but this time create one such model for each category, using the corresponding subset of the training data. You can call the model with the ‘dog’ data the dog_model, the one with the ‘cat’ data the cat_model, and so on.

You now have a nearest neighbors model that can find the nearest ‘dog’ to any image you give it, the dog_model; one that can find the nearest ‘cat’, the cat_model; and so on.

Using these models, answer the following questions. The cat image used in these questions is the first image in the test data. You can access this image, similarly to what we did in the IPython notebooks above, with this command:

image_test[0:1]

What is the nearest ‘cat’ labeled image in the training data to the cat image above (the first image in the test data)? Save this result.

Hint: When you query your nearest neighbors model, it will return an SFrame that looks something like this:

| query_label | reference_label | distance | rank |
|---|---|---|---|
| 0 | 34 | 42.9886641167 | 1 |
| 0 | 45 | 43.8444904098 | 2 |
| 0 | 251 | 44.2634660468 | 3 |
| 0 | 141 | 44.377719559 | 4 |

To understand each column in this table, see the GraphLab Create nearest neighbors documentation. For this question, the ‘reference_label’ column will be important, since it provides the index of the nearest neighbors in the dataset used to train the model. (In this case, the subset of the training data labeled ‘cat’.)

What is the nearest ‘dog’ labeled image in the training data to the cat image above (the first image in the test data)? Save this result.
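
The steps above can be sketched in plain Python to show what each per-category model does: filter the rows by label, then rank that subset by distance to the query's feature vector. This is a brute-force illustration under stated assumptions, not GraphLab Create's implementation. The rows, the two-element 'deep_features' vectors, the helper names, and the choice of Euclidean distance are all made up for the example. As in the hint table, reference_label is the row's position within the filtered subset, not within the full training data.

```python
import math

def filter_by_label(rows, label):
    """Keep only the rows whose 'label' equals the given category."""
    return [row for row in rows if row['label'] == label]

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def query_nearest(reference_rows, query_features, k=1):
    """Brute-force k-nearest-neighbor query over one category's rows.

    Returns dicts with reference_label (index into reference_rows),
    distance, and rank (1 = closest).
    """
    scored = sorted(
        (euclidean(row['deep_features'], query_features), index)
        for index, row in enumerate(reference_rows)
    )
    return [
        {'reference_label': index, 'distance': distance, 'rank': rank}
        for rank, (distance, index) in enumerate(scored[:k], start=1)
    ]

# Invented rows; real deep features have many more dimensions.
toy_train = [
    {'label': 'cat', 'deep_features': [0.0, 1.0]},
    {'label': 'dog', 'deep_features': [3.0, 4.0]},
    {'label': 'cat', 'deep_features': [1.0, 1.0]},
    {'label': 'dog', 'deep_features': [1.0, 2.0]},
]
toy_query = [1.0, 1.5]  # hypothetical features of the query image

cat_rows = filter_by_label(toy_train, 'cat')
dog_rows = filter_by_label(toy_train, 'dog')
nearest_cat = query_nearest(cat_rows, toy_query, k=1)
nearest_dog = query_nearest(dog_rows, toy_query, k=1)
print(nearest_cat)
print(nearest_dog)
```

Because each model only sees its own category's rows, the same query can return a nearest ‘cat’ and a nearest ‘dog’, and the two distances can then be compared directly.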

In [12]:
dog_label = image_train[image_train['label'] == 'dog']

NameError: name 'image_train' is not defined

In [13]:
dog_label.head()

NameError: name 'dog_label' is not defined

In [14]:
cat_label = image_train[image_train['label'] == 'cat']

NameError: name 'image_train' is not defined

In [15]:
automobile_label = image_train[image_train['label'] == 'automobile']

NameError: name 'image_train' is not defined

In [16]:
bird_label = image_train[image_train['label'] == 'bird']

NameError: name 'image_train' is not defined

In [17]:
dog_model = graphlab.nearest_neighbors.create(dog_label,features=['deep_features'])

NameError: name 'dog_label' is not defined

In [18]:
cat_model = graphlab.nearest_neighbors.create(cat_label,features=['deep_features'])

NameError: name 'cat_label' is not defined

In [19]:
automobile_model = graphlab.nearest_neighbors.create(automobile_label,features=['deep_features'])

NameError: name 'automobile_label' is not defined

In [20]:
bird_model = graphlab.nearest_neighbors.create(bird_label,features=['deep_features'])

NameError: name 'bird_label' is not defined

In [21]:
bird_model.show()

NameError: name 'bird_model' is not defined

In [22]:
# Nearest 'dog' training image to the first test image
dog_nearest = dog_model.query(image_test[0:1], k=1)

NameError: name 'dog_model' is not defined

In [23]:
dog_nearest

NameError: name 'dog_nearest' is not defined

In [24]:
# Nearest 'cat' training image to the first test image
cat_nearest = cat_model.query(image_test[0:1], k=1)

NameError: name 'cat_model' is not defined

In [25]:
cat_nearest

NameError: name 'cat_nearest' is not defined

In [26]:
# Nearest 'automobile' training image to the first test image
automobile_nearest = automobile_model.query(image_test[0:1], k=1)

NameError: name 'automobile_model' is not defined

In [27]:
automobile_nearest

NameError: name 'automobile_nearest' is not defined

In [28]:
# Nearest 'bird' training image to the first test image
bird_nearest = bird_model.query(image_test[0:1], k=1)

NameError: name 'bird_model' is not defined

In [29]:
bird_nearest

NameError: name 'bird_nearest' is not defined