# Lesson 1

## 1. Objectives

The main objective of this notebook is to replicate the results from the first lesson of the fast.ai course (https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson1.ipynb).

I decided to do the following:
1. Create a VGG16 model based on the Vgg16 class. It should be able to distinguish between cats and dogs.
2. Take part in the Dogs vs Cats Kaggle competition.
3. Create a VGG16 model that works on some other dataset.

The data should be downloaded from http://files.fast.ai/data/dogscats.zip and extracted into the data directory.

## 2. Setting up

In [1]:
%matplotlib inline
from __future__ import division,print_function

import os, json 
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt

In [2]:
import utils; reload(utils)
from utils import plots

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
Using Theano backend.


In [3]:
import vgg16; reload(vgg16)
from vgg16 import Vgg16

## 3. VGG16 - cats and dogs

In [4]:
# path = os.path.join("..","data","dogscats")
path = os.path.join("..","data","dogscats","sample")

In [5]:
batch_size = 64

In [6]:
network = Vgg16()

In [None]:
train_batches = network.get_batches(os.path.join(path, "train"))
validation_batches = network.get_batches(os.path.join(path, "valid"))
network.finetune(train_batches)

In [None]:
network.fit(train_batches, validation_batches)

In [None]:
batches = network.get_batches(os.path.join(path,"train"), batch_size=4)

In [None]:
imgs,labels = next(batches)
plots(imgs, titles=labels)

## 4. Cats and dogs - Kaggle competition 

In [4]:
path = os.path.join("..", "dogscats_data")

### 4.1 Setting up the directory structure

To download the data, follow the instructions at http://wiki.fast.ai/index.php/Kaggle_CLI and put the data into the dogscats_data directory.

#### Imports + helper function:

In [170]:
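import errno
# Note: this rebinds glob from the function imported earlier to the module.
import glob
import shutil

def safe_mkdir(path):
    """
    Creates a directory (and any missing parents) unless it already exists.
    """
    try:
        os.makedirs(path)
    except OSError as exc:  # Python >2.5
        # Ignore only "already exists"; re-raise e.g. permission errors.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise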
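
As a quick check of the idea behind the helper, the sketch below creates the same nested directory twice inside a temporary folder. It defines its own variant, safe_mkdir_strict (a hypothetical name), which ignores only the "already exists" error; that stricter behaviour and the directory names are my assumptions for illustration.

```python
import errno
import os
import shutil
import tempfile


def safe_mkdir_strict(path):
    """Create path (and missing parents); do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Only swallow "directory already exists"; re-raise anything else.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def demo_safe_mkdir():
    """Create the same nested directory twice and report whether it exists."""
    root = tempfile.mkdtemp()
    try:
        target = os.path.join(root, "train_grouped", "cats")
        safe_mkdir_strict(target)
        safe_mkdir_strict(target)  # second call is a no-op
        return os.path.isdir(target)
    finally:
        shutil.rmtree(root)


print(demo_safe_mkdir())
```

Calling the helper repeatedly is what makes it safe to re-run the directory setup cells.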

#### Creating training and validation set.

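Three out of every four images of each class go to the training set and the fourth goes to the validation set, which gives a 75/25 split. The test data is reorganized in the next step so that the get_batches method can read it easily.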

In [24]:
cats_train_path = os.path.join(path, "train_grouped", "cats")
cats_valid_path = os.path.join(path, "valid_grouped", "cats")
safe_mkdir(cats_train_path)
safe_mkdir(cats_valid_path)


dogs_train_path = os.path.join(path, "train_grouped", "dogs")
dogs_valid_path = os.path.join(path, "valid_grouped", "dogs")
safe_mkdir(dogs_train_path)
safe_mkdir(dogs_valid_path)

filenames = glob.glob(os.path.join(path, "train", "*"))

cats_counter = 0
dogs_counter = 0
for filename in filenames:
    # os.path.basename works regardless of the path separator.
    name = os.path.basename(filename)
    if "cat" in name:
        # Three images go to training, then one to validation.
        if cats_counter < 3:
            shutil.copy(filename, os.path.join(cats_train_path, name))
            cats_counter += 1
        else:
            shutil.copy(filename, os.path.join(cats_valid_path, name))
            cats_counter = 0
    elif "dog" in name:
        if dogs_counter < 3:
            shutil.copy(filename, os.path.join(dogs_train_path, name))
            dogs_counter += 1
        else:
            shutil.copy(filename, os.path.join(dogs_valid_path, name))
            dogs_counter = 0
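
The counter logic above can be sketched in isolation to confirm the split ratio. This is a minimal illustration on made-up file names (following the cat.<n>.jpg pattern); split_three_to_one is a hypothetical helper, not part of the notebook.

```python
def split_three_to_one(names):
    """Send three items to training, then one to validation, repeatedly."""
    train, valid = [], []
    counter = 0
    for name in names:
        if counter < 3:
            train.append(name)
            counter += 1
        else:
            valid.append(name)
            counter = 0
    return train, valid


# Hypothetical file names, used only to exercise the counter logic.
cat_names = ["cat.{}.jpg".format(i) for i in range(12)]
train, valid = split_three_to_one(cat_names)
print(len(train), len(valid))
print(valid)
```

With 12 names this yields 9 training and 3 validation files, the same 75/25 ratio as the 18750 and 6250 images that get_batches reports in section 4.2.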

#### Creating test set.

In [44]:
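# get_batches expects one subdirectory per class, so every test image
# is moved into its own directory named after the image id.
filenames = glob.glob(os.path.join(path, "test", "*"))
for filename in filenames:
    if os.path.isdir(filename):
        continue  # already moved on a previous run
    name = os.path.basename(filename)
    file_id = name.split('.')[0]
    test_path = os.path.join(path, "test", file_id)
    safe_mkdir(test_path)
    shutil.move(filename, os.path.join(test_path, name))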
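
The effect of this step can be sketched on a temporary directory. The files below are empty placeholders with made-up names, and one_folder_per_file is a hypothetical helper. Each file ends up in a folder named after its id, and because folder names are strings they sort lexicographically, so 10 comes before 2. That matters later, when the sorted directory listing is used to recover the image ids.

```python
import os
import shutil
import tempfile


def one_folder_per_file(directory):
    """Move every file in directory into a subfolder named after its id."""
    for name in sorted(os.listdir(directory)):
        source = os.path.join(directory, name)
        if os.path.isdir(source):
            continue  # skip folders that already exist
        file_id = name.split(".")[0]
        target_dir = os.path.join(directory, file_id)
        os.makedirs(target_dir)
        shutil.move(source, os.path.join(target_dir, name))
    return sorted(os.listdir(directory))


def demo_layout():
    """Build a tiny fake test directory and return the sorted folder names."""
    root = tempfile.mkdtemp()
    try:
        for name in ["1.jpg", "2.jpg", "10.jpg"]:
            open(os.path.join(root, name), "w").close()
        return one_folder_per_file(root)
    finally:
        shutil.rmtree(root)


print(demo_layout())
```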

### 4.2 Creating and training the network

In [5]:
batch_size = 64
network = Vgg16()

In [26]:
train_batches = network.get_batches(os.path.join(path, "train_grouped"), batch_size=batch_size)
validation_batches = network.get_batches(os.path.join(path, "valid_grouped"),batch_size=batch_size)
network.finetune(train_batches)

Found 18750 images belonging to 2 classes.
Found 6250 images belonging to 2 classes.


In [27]:
number_of_epochs = 3
model_name = "model_2"

In [28]:
for i in range(number_of_epochs):
    network.fit(train_batches, validation_batches)
    network.model.save_weights(model_name + "_" + str(i) + ".h5")

Epoch 1/1
Epoch 1/1
Epoch 1/1


### 4.2b Alternatively, load an existing model

In [6]:
network.model.load_weights("model_1_2.h5")

### 4.3 Make predictions on the test set

In [148]:
final_classes = []
final_probabilities = []
final_ids = []
total_number_of_cases = 12500
batch_size = 64

# get_batches is expected to read the test directories in sorted order, so the
# sorted directory names give the image id that belongs to each prediction.
all_image_names = sorted(os.listdir(os.path.join(path, "test")))

test_batch = network.get_batches(os.path.join(path, "test"), shuffle=False, batch_size=batch_size)
# Round up so that the last, partial batch is included exactly once.
number_of_batches = int(np.ceil(total_number_of_cases / batch_size))
for i in range(number_of_batches):
    imgs, labels = next(test_batch)
    prediction = network.predict(imgs)
    # Use batch_size rather than a hard-coded 64 so the ids stay aligned.
    image_ids = all_image_names[i*batch_size:(i+1)*batch_size]
    image_ids = [int(image_id) for image_id in image_ids]
    probabilities = prediction[0]
    classes = prediction[1]
    final_classes += list(classes)
    final_probabilities += list(probabilities)
    final_ids += image_ids
    if i % 30 == 0:
        print("progress", i*batch_size / total_number_of_cases)

results = np.transpose(np.vstack([final_ids, final_classes, final_probabilities]))
# prediction[0] holds the probability of the predicted class. For images classified
# as cats (class 0), replace it with 1 - p so that column 2 is always the dog probability.
results[results[:,1]==0, 2] = 1 - results[results[:,1]==0, 2]


Found 12500 images belonging to 12500 classes.
0.0
0.1536
0.3072
0.4608
0.6144
0.768
0.9216
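
Two details of the loop above are easy to verify in isolation: how many batches are needed to cover all test images, and how the confidence of the predicted class is turned into a probability of "dog". The sketch below uses plain Python instead of NumPy, and both function names are hypothetical. It assumes, as the notebook's comment suggests, that class 0 is cat and class 1 is dog.

```python
import math


def number_of_batches(total, batch_size):
    """Batches needed to see every item once; the last one may be partial."""
    return int(math.ceil(total / float(batch_size)))


def dog_probability(predicted_class, confidence):
    """Turn the confidence of the predicted class into P(dog).

    Assumes class 0 is cat and class 1 is dog.
    """
    if predicted_class == 0:
        return 1.0 - confidence
    return confidence


batches = number_of_batches(12500, 64)
last_batch = 12500 - (batches - 1) * 64
print(batches, last_batch)

print(dog_probability(1, 0.75))
print(dog_probability(0, 0.75))
```

For 12500 images and a batch size of 64 this gives 196 batches, the last of which contains only 20 images.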


In [149]:
np.savetxt(model_name + "_results.csv", results, delimiter=",")

### 4.3b Alternatively, load existing results

In [36]:
results = np.genfromtxt("results.csv",delimiter=",")

### 4.4 Check if the results make sense

In [None]:
test_batch = network.get_batches(os.path.join(path, "test"), shuffle=False, batch_size=4)
imgs, labels = next(test_batch)
plots(imgs)
results[:4]

Found 12500 images belonging to 12500 classes.


array([[ 1.,  1.,  1.],
       [ 2.,  1.,  1.],
       [ 3.,  1.,  1.],
       [ 4.,  1.,  1.]])

### 4.5 Write results to the submission file

In [167]:
# Sort the rows by image id.
results = results[results[:,0].argsort()]
image_ids = results[:,0]
# Clip the dog probabilities to the range [0.02, 0.98].
predictions = np.clip(results[:, 2],0.02,0.98)
with open('submission.csv', 'w') as the_file:
    the_file.write('id,label\n')
    for image_id, prediction in zip(image_ids, predictions):
        the_file.write(str(int(image_id)))
        the_file.write(",")
        the_file.write(str(prediction))
        the_file.write("\n")
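
The clipping to [0.02, 0.98] deserves a short illustration. Assuming the submission is scored with log loss (my assumption; the notebook does not state the metric), a confident but wrong prediction is penalized very heavily, and clipping caps that penalty. The sketch below computes the per-image loss in plain Python; the helper names and the example probability are made up for illustration.

```python
import math


def clip(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def log_loss_single(label, p_dog):
    """Log loss for one image; label is 1 for dog and 0 for cat."""
    return -(label * math.log(p_dog) + (1 - label) * math.log(1.0 - p_dog))


# A cat (label 0) that the model calls a dog with near-total confidence.
raw = 0.999999
clipped = clip(raw, 0.02, 0.98)

print(round(log_loss_single(0, raw), 3))      # wrong, unclipped
print(round(log_loss_single(0, clipped), 3))  # wrong, clipped
print(round(log_loss_single(1, clipped), 3))  # correct, clipped
```

Under that assumption, the wrong prediction costs roughly 13.8 unclipped but only about 3.9 after clipping, while a correct clipped prediction costs about 0.02. The trade-off is a small fixed cost on every correct answer in exchange for a bounded loss on the mistakes.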


In [168]:
from IPython.display import FileLink
FileLink('submission.csv')