# Using Convolutional Neural Networks

Welcome to the first week of the first deep learning certificate! We're going to use convolutional neural networks (CNNs) to allow our computer to see - something that is only possible thanks to deep learning.

## Introduction to this week's task: 'Dogs vs Cats'

We're going to try to create a model to enter the [Dogs vs Cats](https://www.kaggle.com/c/dogs-vs-cats) competition at Kaggle. There are 25,000 labelled dog and cat photos available for training, and 12,500 in the test set that we have to try to label for this competition. According to the Kaggle website, when this competition was launched (end of 2013): *"**State of the art**: The current literature suggests machine classifiers can score above 80% accuracy on this task"*. So if we can beat 80%, then we will be at the cutting edge as of 2013!

## Basic setup

There isn't too much to do to get started - just a few simple configuration steps.

This shows plots in the web page itself - we always want to use this when working in a Jupyter notebook:

In [2]:
%matplotlib inline

Define path to data: (It's a good idea to put it in a subdirectory of your notebooks folder, and then exclude that directory from git control by adding it to .gitignore.)

In [40]:
path = "data/dogscats-new/"
#path = "data/dogscats-new/sample/"
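Before training, it can help to confirm that the data directory is laid out as expected. The sketch below assumes a layout with `train`, `valid` and `test` folders under the data path, each holding one subfolder per class; that layout is inferred from the paths used later in this notebook rather than stated anywhere. `count_images` is a hypothetical helper written for illustration and is not part of `utils.py`.

```python
import os

def count_images(root, splits=("train", "valid", "test"),
                 exts=(".jpg", ".jpeg", ".png")):
    """Return {split: {class_dir: n_images}} for the splits found under root."""
    summary = {}
    for split in splits:
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue  # a missing split is simply left out of the summary
        counts = {}
        for cls in sorted(os.listdir(split_dir)):
            cls_dir = os.path.join(split_dir, cls)
            if os.path.isdir(cls_dir):
                # Count only files whose extension looks like an image
                counts[cls] = sum(1 for f in os.listdir(cls_dir)
                                  if f.lower().endswith(exts))
        summary[split] = counts
    return summary

# Hypothetical usage with the data path defined above;
# prints an empty dict if the directory does not exist yet.
print(count_images("data/dogscats-new/"))
```

Pointing the same call at the `sample/` subdirectory is a quick way to check that a smaller copy of the data has the same structure.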

A few basic libraries that we'll need for the initial exercises:

In [4]:
from __future__ import division,print_function

import os, json
from glob import glob
import numpy as np
import pandas as pd
from keras.utils.data_utils import get_file
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt

Using Theano backend.
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)


We have created a file most imaginatively called 'utils.py' to store any little convenience functions we'll want to use. We will discuss these as we use them.

In [5]:
import utils; reload(utils)
from utils import plots

Our first step is simply to use a model that has been fully created for us, which can recognise a wide variety (1,000 categories) of images. We will use 'VGG', which won the 2014 Imagenet competition, and is a very simple model to create and understand. The VGG Imagenet team created both a larger, slower, slightly more accurate model (*VGG 19*) and a smaller, faster model (*VGG 16*). We will be using VGG 16 since the much slower performance of VGG 19 is generally not worth the very minor improvement in accuracy.

We have created a Python class, *Vgg16*, which makes using the VGG 16 model very straightforward.
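To give a rough sense of the size difference between the two VGG variants, here is a minimal sketch that counts trainable parameters. The layer widths are taken from the published VGG configurations, not from this notebook, and the sketch assumes 224x224 RGB input, 3x3 convolutions, and 2x2 max pooling after each block. `count_params` is a hypothetical helper, unrelated to the *Vgg16* class.

```python
# Convolutional filter counts per block, from the published VGG configurations
# (an assumption brought in for illustration, not defined in this notebook).
VGG16_BLOCKS = [[64, 64], [128, 128], [256] * 3, [512] * 3, [512] * 3]
VGG19_BLOCKS = [[64, 64], [128, 128], [256] * 4, [512] * 4, [512] * 4]

def count_params(blocks, in_channels=3, image_size=224,
                 dense=(4096, 4096, 1000)):
    """Count weights and biases for a VGG-style network."""
    total = 0
    channels, size = in_channels, image_size
    for block in blocks:
        for filters in block:
            total += 3 * 3 * channels * filters + filters  # 3x3 kernel + biases
            channels = filters
        size //= 2  # each block ends with 2x2 max pooling
    features = channels * size * size  # flattened input to the first dense layer
    for units in dense:
        total += features * units + units
        features = units
    return total

print("VGG 16: {:,}".format(count_params(VGG16_BLOCKS)))  # VGG 16: 138,357,544
print("VGG 19: {:,}".format(count_params(VGG19_BLOCKS)))  # VGG 19: 143,667,240
```

Parameter count is only a proxy for model size; it does not by itself measure the speed difference described above.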

## The punchline: state of the art custom model in 7 lines of code

Here's everything you need to do to get >97% accuracy on the Dogs vs Cats dataset - we won't analyze how it works behind the scenes yet, since at this stage we're just going to focus on the minimum necessary to actually do useful work.

In [1]:
# Import our class, and instantiate
import vgg16; reload(vgg16)
from vgg16 import Vgg16

batch_size=64
vgg = Vgg16()

FILES_PATH = 'http://files.fast.ai/models/'; CLASS_FILE='imagenet_class_index.json'

#fpath = get_file('vgg16.h5', FILES_PATH+'vgg16.h5', cache_subdir='models')
#vgg.model.load_weights(fpath)

batches = vgg.get_batches(path+'train', batch_size=batch_size)
val_batches = vgg.get_batches(path+'valid', batch_size=batch_size*2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=3)

Using Theano backend.
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.


NameError: name 'path' is not defined

In [44]:
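# Testing: predict on the unlabelled test set and write a submission file
import re

batch_size=4
test_batches, predictions = vgg.test(path+'test', batch_size = batch_size * 2)

# Column 1 is used as the 'dog' probability; clip away extreme values
isdog = np.clip(predictions[:, 1], 0.05, 0.95)
# Pull the numeric id out of each test filename
ids = [int(re.search(r'.*?(\d+)\.jpg', f).group(1)) for f in test_batches.filenames]
subm = np.stack([ids, isdog], axis=1)
np.savetxt('dogsvscats.csv', subm, fmt='%d,%.5f', header='id,label', comments='')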

Found 12500 images belonging to 1 classes.
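The two less obvious steps in the testing cell are the id extraction and the clipping. The sketch below reproduces both on made-up filenames and probabilities, using only the standard library. It assumes the submission is scored with log loss, which the text above does not state; under that assumption, clipping limits the penalty for a confidently wrong prediction. `extract_id`, `clip` and `log_loss` are hypothetical helpers for illustration.

```python
import math
import re

def extract_id(filename):
    """Pull the numeric id out of a name such as 'unknown/123.jpg'."""
    match = re.search(r'(\d+)\.jpg$', filename)
    if match is None:
        raise ValueError("no numeric id in %r" % filename)
    return int(match.group(1))

def clip(p, low=0.05, high=0.95):
    """Clamp a probability into [low, high], like np.clip on a scalar."""
    return max(low, min(high, p))

def log_loss(y_true, p):
    """Binary cross-entropy for one prediction p that the label is 1."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# Made-up examples, not real competition data
filenames = ['unknown/1.jpg', 'unknown/10.jpg', 'unknown/2.jpg']
probs = [0.999, 0.5, 0.001]

rows = [(extract_id(f), clip(p)) for f, p in zip(filenames, probs)]
print(rows)  # [(1, 0.95), (10, 0.5), (2, 0.05)]

# Penalty for predicting 0.001 when the true label is 1, before and after clipping
print(log_loss(1, 0.001), log_loss(1, clip(0.001)))
```

In this example the unclipped penalty is about 6.9, while the clipped one is about 3.0. The trade-off is that correct, confident predictions also give up a little score.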
