# Lesson 1

## 1. Objectives

The main objective of this notebook is to replicate the results from the first lesson of the fast.ai course (https://github.com/fastai/courses/blob/master/deeplearning1/nbs/lesson1.ipynb).

I decided to do the following:
1. Create a VGG16 model based on the `Vgg16` class that can distinguish between cats and dogs.
2. Take part in the Dogs vs. Cats Kaggle competition.
3. Create a VGG16 model that works on another dataset.

Data should be downloaded from http://files.fast.ai/data/dogscats.zip and extracted into the `data` directory, so that it ends up in `../data/dogscats` (the path used below).
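The download step can be scripted with the standard library. This is a minimal sketch, assuming the URL above is still reachable and that the archive unpacks into a top-level `dogscats` folder (the layout the path in section 2 expects); `fetch_dogscats` is a hypothetical helper, not part of the course code.

```python
import os
import zipfile

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2, which this notebook uses

DOGSCATS_URL = "http://files.fast.ai/data/dogscats.zip"


def fetch_dogscats(data_dir=os.path.join("..", "data"), url=DOGSCATS_URL):
    """Download dogscats.zip into data_dir (skipped if already there) and extract it.

    Assumption: the archive contains a top-level "dogscats" folder.
    """
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    zip_path = os.path.join(data_dir, "dogscats.zip")
    if not os.path.exists(zip_path):
        urlretrieve(url, zip_path)  # needs network access; the URL may have moved
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
    return os.path.join(data_dir, "dogscats")
```

Skipping the download when the zip already exists makes the cell safe to re-run without fetching the archive again.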

## 2. Setting up

In [2]:
%matplotlib inline
from __future__ import division,print_function

import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt

In [3]:
import utils; reload(utils)
from utils import plots

Using TensorFlow backend.


In [4]:
import vgg16; reload(vgg16)
from vgg16 import Vgg16

In [None]:
# path = os.path.join("..","data","dogscats")
path = os.path.join("..","data","dogscats","sample")

In [None]:
batch_size = 64

## 3. VGG16 - cats and dogs

In [None]:
network = Vgg16()

In [None]:
train_batches = network.get_batches(os.path.join(path, "train"), batch_size=batch_size)
validation_batches = network.get_batches(os.path.join(path, "valid"), batch_size=batch_size)
network.finetune(train_batches)

In [None]:
network.fit(train_batches, validation_batches)

In [None]:
batches = network.get_batches(os.path.join(path,"train"), batch_size=4)

In [None]:
imgs,labels = next(batches)
plots(imgs, titles=labels)

## 4. Kaggle competition 

In [5]:
path = os.path.join("..","data","dogscats_kaggle", "sample")
# path = os.path.join("..","data","dogscats_kaggle")

To download the data, follow the instructions at http://wiki.fast.ai/index.php/Kaggle_CLI and put the data into the `dogscats_kaggle` directory (the code below reads the `train` and `test` subdirectories of `path`).

In [None]:
import glob  # rebinds `glob` to the module, shadowing the function imported in section 2
import shutil

In [None]:
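import errno

def safe_mkdir(path):
    """
    Creates a directory (including missing parents) unless it already exists.
    """
    try:
        os.makedirs(path)
    except OSError as exc:
        # Only ignore "already exists"; re-raise real failures such as permission errors
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise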

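Creating the training and validation sets.

Also, putting the test data into a structure that the `get_batches` method can read: `get_batches` wraps Keras' `flow_from_directory`, which expects one subdirectory per class.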

In [None]:
cats_train_path = os.path.join(path, "train_grouped", "cats")
cats_valid_path = os.path.join(path, "valid_grouped", "cats")
safe_mkdir(cats_train_path)
safe_mkdir(cats_valid_path)

dogs_train_path = os.path.join(path, "train_grouped", "dogs")
dogs_valid_path = os.path.join(path, "valid_grouped", "dogs")
safe_mkdir(dogs_train_path)
safe_mkdir(dogs_valid_path)

filenames = glob.glob(os.path.join(path, "train", "*"))

# Of every five images of a class, the first four are copied to training and
# the fifth to validation (an 80/20 split).
cats_counter = 0
dogs_counter = 0
for filename in filenames:
    name = os.path.basename(filename)  # portable, unlike filename.split("/")
    # Kaggle training images are named cat.<n>.jpg and dog.<n>.jpg
    if name.startswith("cat."):
        if cats_counter < 4:
            shutil.copy(filename, os.path.join(cats_train_path, name))
            cats_counter += 1
        else:
            shutil.copy(filename, os.path.join(cats_valid_path, name))
            cats_counter = 0
    elif name.startswith("dog."):
        if dogs_counter < 4:
            shutil.copy(filename, os.path.join(dogs_train_path, name))
            dogs_counter += 1
        else:
            shutil.copy(filename, os.path.join(dogs_valid_path, name))
            dogs_counter = 0
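The counter logic above can be factored into a pure function, which makes the resulting split easy to check without touching the file system. This is a sketch; `split_every_nth` is a hypothetical name, and it assumes Kaggle-style names where the class is the part before the first dot.

```python
def split_every_nth(names, n=5):
    """Send every n-th file of each class to validation and the rest to training.

    With n=5 this reproduces the cell above: per class, four files go to
    training and the fifth to validation, then the count starts over.
    """
    train, valid = [], []
    counters = {}
    for name in names:
        label = name.split(".")[0]  # "cat.123.jpg" -> "cat"
        count = counters.get(label, 0)
        if count < n - 1:
            train.append(name)
            counters[label] = count + 1
        else:
            valid.append(name)
            counters[label] = 0
    return train, valid
```

For ten cats and ten dogs this yields 16 training and 4 validation files. Because `glob` returns files in arbitrary order, the split is deterministic for a given directory listing but not random; shuffling `names` first would give a random split.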



In [None]:
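# Move every test image into its own subdirectory (test/<id>/<id>.jpg) so that
# get_batches can find it; each image becomes its own "class".
filenames = glob.glob(os.path.join(path, "test", "*"))
for filename in filenames:
    if not os.path.isfile(filename):
        continue  # already moved on a previous run
    name = os.path.basename(filename)
    file_id = os.path.splitext(name)[0]  # "123.jpg" -> "123"
    test_path = os.path.join(path, "test", file_id)
    safe_mkdir(test_path)
    shutil.move(filename, os.path.join(test_path, name))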
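Because each test image becomes its own class, the image id has to be recovered from the class index. To our understanding, Keras' `flow_from_directory` (Keras 1.x) numbers classes by sorting the subdirectory names as strings when no explicit class list is given. The hypothetical helpers below mimic that ordering and invert it, which shows why "class index + 1" is not the image id.

```python
import os


def keras_style_class_indices(directory):
    """Hypothetical helper: number subdirectories 0, 1, 2, ... in sorted
    (string) order, as flow_from_directory does by default."""
    classes = sorted(d for d in os.listdir(directory)
                     if os.path.isdir(os.path.join(directory, d)))
    return dict(zip(classes, range(len(classes))))


def index_to_class(class_indices):
    """Invert {class name: index} so a one-hot label maps back to its directory name."""
    return {index: name for name, index in class_indices.items()}
```

With directories `1`, `2` and `10`, string sorting gives `1`, `10`, `2`, so index 1 belongs to image 10, not image 2. Adding 1 to the index only recovers the id while there are fewer than ten test images; with 12,500 of them most ids come out wrong, so the mapping should go through `class_indices` instead.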

Creating and training the network

In [6]:
network = Vgg16()

In [7]:
train_batches = network.get_batches(os.path.join(path, "train_grouped"), batch_size=batch_size)
validation_batches = network.get_batches(os.path.join(path, "valid_grouped"), batch_size=batch_size)
network.finetune(train_batches)

Found 16 images belonging to 2 classes.
Found 8 images belonging to 2 classes.


In [8]:
network.fit(train_batches, validation_batches)

Epoch 1/1


Predicting on the test set
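The submission needs P(dog) for each image. The sketch below shows the conversion from the (highest probability, predicted index) pair used in the next cell, plus optional clipping. It assumes the submission goes to the Dogs vs. Cats Redux edition, which is scored with log loss; the clipping bounds 0.02/0.98 are an arbitrary illustrative choice, and all three function names are hypothetical.

```python
import math


def dog_probability(max_prob, predicted_index, dog_index=1):
    """Convert the highest class probability and its index into P(dog).

    With two softmax outputs summing to 1, a "cats" prediction with
    probability p means P(dog) = 1 - p.
    """
    return max_prob if predicted_index == dog_index else 1.0 - max_prob


def clip_probability(p, low=0.02, high=0.98):
    """Keep a probability away from 0 and 1 (bounds are an assumption)."""
    return min(max(p, low), high)


def binary_log_loss(y_true, p):
    """Log loss for one image: y_true is 1 for dog, 0 for cat."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))
```

A confident wrong answer, such as P(dog) = 0.9999 for a cat, costs about 9.21 under log loss, while clipping caps the same mistake at -log(0.02), roughly 3.91. The price is a slightly higher loss on confident correct answers.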

In [16]:
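test_batch = network.get_batches(os.path.join(path, "test"), shuffle=False, batch_size=1)

# Each test image is its own "class", and class indices follow string sort order
# ("1", "10", "100", ...), so map the one-hot label back to the directory name
# instead of assuming that index + 1 is the image id.
index_to_id = {index: image_id for image_id, index in test_batch.class_indices.items()}
n_test = len(test_batch.filenames)

# 'w' (not 'a') so that re-running the cell does not append a second header and duplicate rows
with open('submission.csv', 'w') as the_file:
    the_file.write('id,label\n')

    for i in range(1, n_test + 1):
        img, label = next(test_batch)
        # Vgg16.predict returns (highest probabilities, predicted class indices, class names)
        probs, idxs, _ = network.predict(img)
        # Kaggle expects P(dog); classes are sorted, so index 0 is "cats" and 1 is "dogs"
        if idxs[0] == 0:
            probability = 1 - probs[0]
        else:
            probability = probs[0]
        image_id = index_to_id[np.argmax(label[0])]
        the_file.write("{},{}\n".format(image_id, probability))
        if i % 100 == 0:
            print(i)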

Found 12500 images belonging to 12500 classes.
100
200


KeyboardInterrupt: 