# Using Convolutional Neural Networks

Welcome to the first week of the first deep learning certificate! We're going to use convolutional neural networks (CNNs) to allow our computer to see - something that is only possible thanks to deep learning.

In [None]:
%matplotlib inline

In [None]:
path = "data/dogs/"
#path = "data/dogs/sample/"

In [None]:
from __future__ import division,print_function

import os, json
from glob import glob
import numpy as np
import pandas as pd
import re
from keras.utils.data_utils import get_file
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt

We have created a file most imaginatively called 'utils.py' to store any little convenience functions we'll want to use. We will discuss these as we use them.

In [None]:
import utils; reload(utils)
from utils import plots

Our first step is simply to use a model that has been fully created for us, which can recognise a wide variety (1,000 categories) of images. We will use 'VGG', which won the 2014 Imagenet competition, and is a very simple model to create and understand. The VGG Imagenet team created both a larger, slower, slightly more accurate model (*VGG 19*) and a smaller, faster model (*VGG 16*). We will be using VGG 16 since the much slower performance of VGG 19 is generally not worth the very minor improvement in accuracy.

We have created a python class, *Vgg16*, which makes using the VGG 16 model very straightforward. 

## The punchline: state of the art custom model in 7 lines of code

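Here's everything you need to do to get >97% accuracy on the Dogs vs Cats dataset. We won't analyze how it works behind the scenes yet, since at this stage we're just going to focus on the minimum necessary to actually do useful work. Note that the cells below point `path` at a 120-class dog breeds dataset, so that accuracy figure does not apply directly to their output.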

In [None]:
# Import our class, and instantiate
import vgg16; reload(vgg16)
from vgg16 import Vgg16

batch_size=64
vgg = Vgg16()

In [5]:
batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=3)

Found 9235 images belonging to 120 classes.
Found 987 images belonging to 120 classes.
['affenpinscher', 'afghan_hound', 'african_hunting_dog', 'airedale', 'american_staffordshire_terrier', 'appenzeller', 'australian_terrier', 'basenji', 'basset', 'beagle', 'bedlington_terrier', 'bernese_mountain_dog', 'black-and-tan_coonhound', 'blenheim_spaniel', 'bloodhound', 'bluetick', 'border_collie', 'border_terrier', 'borzoi', 'boston_bull', 'bouvier_des_flandres', 'boxer', 'brabancon_griffon', 'briard', 'brittany_spaniel', 'bull_mastiff', 'cairn', 'cardigan', 'chesapeake_bay_retriever', 'chihuahua', 'chow', 'clumber', 'cocker_spaniel', 'collie', 'curly-coated_retriever', 'dandie_dinmont', 'dhole', 'dingo', 'doberman', 'english_foxhound', 'english_setter', 'english_springer', 'entlebucher', 'eskimo_dog', 'flat-coated_retriever', 'french_bulldog', 'german_shepherd', 'german_short-haired_pointer', 'giant_schnauzer', 'golden_retriever', 'gordon_setter', 'great_dane', 'great_pyrenees', 'greater_swi
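
The "Found ... images belonging to 120 classes" lines suggest that each class lives in its own subdirectory of `train` and `valid`. Assuming `get_batches` follows that one-folder-per-class convention (this is an assumption, since `vgg16.py` is not shown here), the sketch below builds a tiny, hypothetical directory tree and counts the images per class. The helper `count_images_per_class` and the breed folders are illustrative only and are not part of `utils.py` or `vgg16.py`.

```python
import os
import shutil
import tempfile


def count_images_per_class(root):
    """Map each class subdirectory of `root` to its number of .jpg files."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if os.path.isdir(class_dir):
            jpgs = [f for f in os.listdir(class_dir) if f.lower().endswith('.jpg')]
            counts[entry] = len(jpgs)
    return counts


# Build a throwaway tree: <tmp>/train/<breed>/<n>.jpg (hypothetical data).
root = tempfile.mkdtemp()
for breed, n_images in [('beagle', 2), ('boxer', 1)]:
    breed_dir = os.path.join(root, 'train', breed)
    os.makedirs(breed_dir)
    for i in range(n_images):
        # Empty placeholder files are enough for counting.
        with open(os.path.join(breed_dir, '%d.jpg' % i), 'w'):
            pass

counts = count_images_per_class(os.path.join(root, 'train'))
print(counts)

# Remove the temporary tree again.
shutil.rmtree(root)
```

Running the same helper on `path+'train'` is a quick way to check that the folder layout matches the class count reported above.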

In [6]:
#Testing
batch_size=64
test_batches, predictions = vgg.test(path+'test', batch_size = batch_size * 2)

# Strip the directory and the .jpg extension from each test file name to get its id
ids = [[re.search(r'.+/(.+)\.jpg', x).group(1)] for x in test_batches.filenames]
subm = np.hstack((ids, predictions))
class_str = ','.join(['id'] + vgg.classes)
N = len(vgg.classes)
format_str = ','.join(['%s'] + ['%s']*N)
submission_file_name = 'dogs_races.csv'
np.savetxt(submission_file_name, subm, fmt=format_str, header=class_str, comments='')

Found 10357 images belonging to 1 classes.
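
The submission cell above pairs each test image id with one predicted probability per class. As a rough illustration of the resulting file layout, here is a minimal sketch that uses only the standard library. The file names, the `unknown/` subfolder, the three class names, and the probabilities are all made up for the example, and `image_id` and `submission_lines` are hypothetical helpers rather than part of the course code.

```python
import re

# Hypothetical stand-ins for test_batches.filenames, vgg.classes and predictions.
filenames = ['unknown/000621fb.jpg', 'unknown/00102ee9.jpg']
classes = ['affenpinscher', 'afghan_hound', 'airedale']
predictions = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]


def image_id(filename):
    """Return the file name without its directory or .jpg extension."""
    return re.search(r'.+/(.+)\.jpg', filename).group(1)


def submission_lines(filenames, classes, predictions):
    """Build the CSV lines: a header row, then one row per test image."""
    lines = [','.join(['id'] + list(classes))]
    for name, probs in zip(filenames, predictions):
        lines.append(','.join([image_id(name)] + [str(p) for p in probs]))
    return lines


lines = submission_lines(filenames, classes, predictions)
print('\n'.join(lines))
```

The header row lists `id` followed by every class name, which mirrors what `class_str` does in the cell above.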


In [7]:
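# Save the fine-tuned weights. The %d placeholder was never filled in, so we
# substitute the number of epochs run above (3); adjust if you train for longer.
results_filename = path + 'weights-ft%d.h5' % 3
vgg.model.save_weights(results_filename)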

In [4]:
from IPython.display import FileLink
submission_file_name = 'dogs_races.csv'
FileLink(submission_file_name)