
## Work through the first lesson of Practical Deep Learning for Coders
A slight update that reverts from the current fastai version to the version compatible with the coursework. The notebook is forked from [William Horton's Fast AI Lesson 1 Notebook](https://www.kaggle.com/hortonhearsafoo/fast-ai-lesson-1).

In [None]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
%%capture
!pip install fastai==0.7.0
!pip install torchtext==0.2.3

In [None]:
# This file contains all the main external libs we'll use
from fastai.imports import *

In [None]:
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

`PATH` is the path to your data - if you use the recommended setup approaches from the lesson, you won't need to change this. `sz` is the size that the images will be resized to in order to ensure that the training runs quickly. We'll be talking about this parameter a lot during the course. Leave it at `224` for now.

In [None]:
PATH = "../input/"
TMP_PATH = "/tmp/tmp"
MODEL_PATH = "/tmp/model/"
sz=224

Check that an NVIDIA GPU is set up and available to PyTorch, and that cuDNN is enabled.

In [None]:
torch.cuda.is_available()

In [None]:
torch.backends.cudnn.enabled

## First look at cat pictures
* First reorganize the cat pictures from the Kaggle dataset
    * The Kaggle Cats/Dogs set is organized differently than in the fastai course: the labels are embedded in the file names.
* Take a look at the initial cat pictures
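
Because the labels live in the file names, the label array can be derived with a simple string test. Here is a minimal sketch, assuming file names of the form `cat.0.jpg` and `dog.0.jpg`; the example names are made up for illustration, and `label_from_fname` is a hypothetical helper, not part of fastai:

```python
import os

def label_from_fname(fname):
    """Return 0 for a cat image and 1 for a dog image, based on the file name."""
    base = os.path.basename(fname)  # drop any directory prefix such as 'train/'
    return 0 if base.startswith('cat') else 1

# Hypothetical file names, for illustration only
example_fnames = ['train/cat.0.jpg', 'train/dog.0.jpg', 'train/cat.1.jpg']
example_labels = [label_from_fname(f) for f in example_fnames]
print(example_labels)
```

Testing only the base name avoids accidental matches on a directory that happens to contain the string `cat`.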

In [None]:
os.listdir(PATH)

In [None]:
fnames = np.array([f'train/{f}' for f in sorted(os.listdir(f'{PATH}train'))])
labels = np.array([(0 if 'cat' in f else 1) for f in fnames])  # 0 = cat, 1 = dog

In [None]:
img = plt.imread(f'{PATH}{fnames[3]}')
plt.imshow(img);

In [None]:
img.shape

In [None]:
img[:4,:4]

## Create a NN model in 3 lines
* Utilize a pretrained model (resnet34)
* The first run takes longer because the pretrained model needs to be downloaded

In [None]:
arch = resnet34  # set the model architecture
# format the data using the fastai ImageClassifierData class
data = ImageClassifierData.from_names_and_array(
    path = PATH,
    fnames = fnames,  # array of all training image file names
    y = labels,  # labels taken from the file names in an earlier cell (0 = cat, 1 = dog)
    classes = ['cats', 'dogs'],  # class names, ordered to match the labels
    test_name = 'test',  # test directory
    tfms = tfms_from_model(arch, sz)
)
learn = ConvLearner.pretrained(arch, data, precompute=True, tmp_name=TMP_PATH, models_name=MODEL_PATH)
learn.fit(0.01, 2)  # learning rate of 0.01 for 2 epochs

In [None]:
learn.fit(0.01, 2)

## Our first model: quick start

We're going to use a <b>pre-trained</b> model, that is, a model created by someone else to solve a different problem. Instead of building a model from scratch to solve a similar problem, we'll use a model trained on ImageNet (1.2 million images and 1000 classes) as a starting point. The model is a Convolutional Neural Network (CNN), a type of Neural Network that builds state-of-the-art models for computer vision. We'll be learning all about CNNs during this course.

We will be using the <b>resnet34</b> model. resnet34 is a version of the model that won the 2015 ImageNet competition. Here is more info on [resnet models](https://github.com/KaimingHe/deep-residual-networks). We'll be studying them in depth later, but for now we'll focus on using them effectively.

Here's how to train and evaluate a *dogs vs cats* model in 3 lines of code, and under 20 seconds:

In [None]:
# Uncomment the line below if you need to reset your precomputed activations
# (they are stored under TMP_PATH, which is passed as tmp_name to the learner)
# shutil.rmtree(TMP_PATH, ignore_errors=True)

In [None]:
arch=resnet34
data = ImageClassifierData.from_names_and_array(
    path=PATH, 
    fnames=fnames, 
    y=labels, 
    classes=['cats', 'dogs'],  # ordered to match the labels (0 = cat, 1 = dog)
    test_name='test', 
    tfms=tfms_from_model(arch, sz)
)
learn = ConvLearner.pretrained(arch, data, precompute=True, tmp_name=TMP_PATH, models_name=MODEL_PATH)
learn.fit(0.01, 2)

## Evaluate the model that was created in the three lines above
* Take a look at some of the predictions
* Examine some of the attributes of the ImageClassifierData
* See what kinds of images the model is uncertain about

In [None]:
data.classes

In [None]:
data.val_y

In [None]:
log_preds = learn.predict()
log_preds.shape

In [None]:
log_preds[:10]  # take a look at the first ten log-predictions
# first column is the log-probability of label 0 (cat), second of label 1 (dog)

In [None]:
preds = np.argmax(log_preds, axis=1)  # which column is higher? (0 = cat, 1 = dog)
probs = np.exp(log_preds[:,1])  # probability of label 1 (dog)
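
The conversion from log-predictions to class predictions and probabilities can be checked on a tiny made-up array. This is only a sketch: the numbers are invented, and the `_demo` names are hypothetical so they don't clash with the notebook's own variables.

```python
import numpy as np

# Made-up log-probabilities for three images; columns follow the label order (0, 1)
log_preds_demo = np.log(np.array([[0.9, 0.1],
                                  [0.2, 0.8],
                                  [0.5, 0.5]]))

preds_demo = np.argmax(log_preds_demo, axis=1)  # index of the larger column in each row
probs_demo = np.exp(log_preds_demo[:, 1])       # probability assigned to label 1

print(preds_demo)
print(probs_demo)
```

Note that `np.argmax` returns the first index on a tie, so a perfectly undecided row is reported as label 0.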

In [None]:
def rand_by_mask(mask):
    idxs = np.where(mask)[0]
    # sample at most 4 indices, and never more than the mask actually selects
    return np.random.choice(idxs, min(len(idxs), 4), replace = False)

def rand_by_correct(is_correct):
    return rand_by_mask((preds== data.val_y)==is_correct)

def plots(ims, figsize = (12,6), rows = 1, titles = None):
    f= plt.figure(figsize = figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize = 16)
        plt.imshow(ims[i])

def load_img_id(ds, idx): return np.array(PIL.Image.open(PATH+ds.fnames[idx]))

def plot_val_with_titles(idxs, title):
    imgs = [load_img_id(data.val_ds, x) for x in idxs]
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(imgs, rows=1, titles = title_probs, figsize = (16,8)) if len(imgs)>0 else print("Not Found")

In [None]:
plot_val_with_titles(rand_by_correct(True), "Correctly Classified")

In [None]:
plot_val_with_titles(rand_by_correct(False), "Incorrectly Classified")

In [None]:
def most_by_mask(mask, mult):
    idxs = np.where(mask)[0]
    return idxs[np.argsort(mult * probs[idxs])[:4]]

def most_by_correct(y, is_correct):
    mult = -1 if (y==1)==is_correct else 1
    return most_by_mask(((preds == data.val_y)==is_correct) & (data.val_y == y), mult)

In [None]:
plot_val_with_titles(most_by_correct(0, True), "Most Cat Like Cats")

In [None]:
plot_val_with_titles(most_by_correct(1, True), "Most Dog Like Dogs")

In [None]:
plot_val_with_titles(most_by_correct(0, False), "Least Cat Like Cats")

In [None]:
plot_val_with_titles(most_by_correct(1, False), "Least Dog Like Dogs")

In [None]:
most_uncertain = np.argsort(np.abs(probs -0.5))[:4]
plot_val_with_titles(most_uncertain, "Most uncertain preds")
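
The "most uncertain" selection sorts images by how close their predicted probability is to 0.5. Here is a small sketch with made-up probabilities (the `_demo` names are hypothetical):

```python
import numpy as np

# Made-up probabilities of label 1 for five images
probs_demo = np.array([0.98, 0.52, 0.10, 0.45, 0.70])

# Distance from 0.5 measures confidence; the smallest distances are the least certain
most_uncertain_demo = np.argsort(np.abs(probs_demo - 0.5))[:2]
print(most_uncertain_demo)
```

The images at indices 1 and 3 (probabilities 0.52 and 0.45) are the two closest to a coin flip.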

## How to Choose a Learning Rate
As stated in the class, the learning rate is generally the most important hyperparameter for a neural network, particularly within the fastai framework.

In [None]:
learn = ConvLearner.pretrained(arch, data, precompute = True, tmp_name=TMP_PATH, models_name=MODEL_PATH)

In [None]:
lrf = learn.lr_find()

In [None]:
learn.sched.plot_lr()

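In practice, the learning rate is often the only number that needs adjusting to get good results, because fastai handles most of the remaining hyperparameter choices internally.

The learning rate finder is based on the paper [Cyclical Learning Rates for Training Neural Networks](https://arxiv.org/abs/1506.01186).

Rule of thumb: find the lowest point in the loss versus learning rate plot and dial back by a factor of ten.
- For instance, in the plot below, take the low point at 1e-1 and use 1e-2 as the initial learning rate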
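
The rule of thumb can be written down as a few lines of plain Python. This is a sketch under stated assumptions: `suggest_lr` is a hypothetical helper (fastai does not provide it), and the learning rates and losses are invented to resemble a learning rate finder curve.

```python
def suggest_lr(lrs, losses, factor=10.0):
    """Return the learning rate at the loss minimum, divided by `factor`."""
    best = min(range(len(losses)), key=lambda i: losses[i])  # index of the lowest loss
    return lrs[best] / factor

# Made-up values shaped like a learning rate finder curve
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [0.70, 0.65, 0.40, 0.20, 0.12, 0.90]
print(suggest_lr(lrs, losses))
```

Real curves are noisy, so reading the value off the plot by eye is usually more reliable than taking a literal minimum.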


In [None]:
learn.sched.plot()

## Data Augmentation

In [None]:
tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom = 1.1)

`transforms_side_on` makes only small changes, such as slightly rotating the image and flipping it horizontally.
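
To make the flip part concrete, here is a minimal NumPy sketch of a random horizontal flip. It is an illustration only, not fastai's implementation; `random_horizontal_flip` and the tiny stand-in image are hypothetical.

```python
import numpy as np

def random_horizontal_flip(img, rng, p=0.5):
    """Flip an H x W x C image left-to-right with probability p."""
    return img[:, ::-1] if rng.random() < p else img

rng = np.random.default_rng(0)
img_demo = np.arange(2 * 3 * 1).reshape(2, 3, 1)        # tiny stand-in "image"
flipped = random_horizontal_flip(img_demo, rng, p=1.0)  # p=1.0 forces the flip
print(flipped[:, :, 0])
```

A horizontal flip suits photos taken from the side, such as cats and dogs, because the mirrored image is still a plausible photo.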

Next, reformat the ImageClassifierData using the transforms.

In [None]:
def get_augs():
    data = ImageClassifierData.from_names_and_array(
        path = PATH,
        fnames = fnames, #Directory of all image file names
        y = labels, #labels taken from filenames in previous cell
        classes = ['cats', 'dogs'], #class names, ordered to match the labels (0 = cat, 1 = dog)
        test_name = 'test', #test directory
        tfms = tfms,
        bs=2,
        num_workers=1
    )
    x,_ = next(iter(data.aug_dl))
    return data.trn_ds.denorm(x)[1]

In [None]:
ims = np.stack([get_augs() for i in range(6)])

In [None]:
plots(ims, rows=2)

In [None]:
data = ImageClassifierData.from_names_and_array(
    path = PATH,
    fnames = fnames, #Directory of all image file names
    y = labels, #labels taken from filenames in previous cell
    classes = ['cats', 'dogs'], #class names, ordered to match the labels (0 = cat, 1 = dog)
    test_name = 'test', #test directory
    tfms = tfms,
) #This reformats the data with the transforms
learn = ConvLearner.pretrained(arch, data, precompute=True, tmp_name = TMP_PATH, models_name = MODEL_PATH)

Creating the classifier with `precompute=True` adds a linear NN layer on top of the pretrained resnet34 network and caches the activations that resnet34 produces for each image. Since we're only training that linear layer on the cached activations, and not recomputing anything inside the resnet34 model, data augmentation doesn't help yet: the augmented images never pass through the network, so they don't change any of the activations.
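
The effect of caching can be shown with a toy example. This is a sketch only: a random matrix followed by a ReLU stands in for the frozen pretrained layers, added noise stands in for augmentation, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(4, 3))  # stand-in for the frozen pretrained layers

def frozen_features(x):
    """Activations of the frozen part of the network (linear map followed by ReLU)."""
    return np.maximum(x @ W_frozen, 0)

x = rng.normal(size=(5, 4))    # five stand-in "images"
cached = frozen_features(x)    # computed once and reused when activations are precomputed

x_aug = x + 0.1 * rng.normal(size=x.shape)  # stand-in for augmented versions of the images
fresh = frozen_features(x_aug)              # what the final layer sees without precomputing

# Training from `cached` never sees x_aug, so augmentation cannot affect it
print(np.allclose(cached, fresh))
```

Only when the activations are recomputed for each batch do the augmented inputs reach the layer being trained.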

In [None]:
learn.fit(1e-2, 1) #Retrain the model for a single epoch

Set Precompute to False to utilize data augmentation.

In [None]:
learn.precompute=False 

In [None]:
learn.fit(1e-2, 3, cycle_len=1) #takes some time to run

Accuracy doesn't improve much, but the amount of overfitting is reduced.

Additionally, there's the new `cycle_len` parameter. This uses **Stochastic Gradient Descent with Restarts** to adjust the learning rate as we go through the iterations
*     Reducing the learning rate as training progresses is called *learning rate annealing*
*     A good function for learning rate annealing is the cosine function. This allows for more refinement when getting close to the ideal solution
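
Cosine annealing with restarts can be sketched in a few lines. This is a simplified illustration, not fastai's exact schedule; `cosine_annealed_lr` and its parameters are hypothetical names.

```python
import math

def cosine_annealed_lr(lr_max, iteration, iters_per_cycle):
    """Learning rate that restarts at lr_max each cycle and decays along half a cosine."""
    pos = (iteration % iters_per_cycle) / iters_per_cycle  # progress in the current cycle, in [0, 1)
    return lr_max * 0.5 * (1 + math.cos(math.pi * pos))

# Two cycles of four iterations each
schedule = [cosine_annealed_lr(1e-2, i, 4) for i in range(8)]
print(schedule)
```

The rate falls slowly at first, quickly in the middle of the cycle, and slowly again near the end, then jumps back to `lr_max` at the restart.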

In [None]:
learn.sched.plot_lr()

## Differential Learning Rate

In [None]:
learn.unfreeze()
lr = np.array([1e-4, 1e-3, 1e-2])

In [None]:
learn.fit(lr, 3, cycle_len = 1, cycle_mult = 2) #Re-fit the model with differential learning rates. The 3 cycles last 1, 2 and 4 epochs (7 epochs in total), so this takes some time to run

In [None]:
learn.sched.plot_lr()

In [None]:
learn.save('224_all')

In [None]:
learn.load('224_all')

In [None]:
log_preds,y = learn.TTA()
probs = np.mean(np.exp(log_preds),0)

In [None]:
accuracy_np(probs, y)
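
The averaging step above can be checked on a small made-up array. This sketch assumes the log-predictions have shape (augmentations, images, classes), which is what averaging over axis 0 implies, and it computes accuracy as the fraction of argmax predictions that match the targets. The `_tta` and `_demo` names are hypothetical.

```python
import numpy as np

# Made-up log-probabilities with shape (n_augmentations, n_images, n_classes)
log_preds_tta = np.log(np.array([
    [[0.6, 0.4], [0.3, 0.7]],
    [[0.8, 0.2], [0.6, 0.4]],
]))
y_demo = np.array([0, 1])

probs_tta = np.mean(np.exp(log_preds_tta), axis=0)     # average probabilities over augmentations
acc = (np.argmax(probs_tta, axis=1) == y_demo).mean()  # fraction of images predicted correctly
print(probs_tta, acc)
```

The second image is misclassified by one of the two augmented copies, but the averaged probabilities still favor the correct label.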

## Review the easy steps to train a world class image classifier (Within the FastAI framework)
1. Enable data augmentation, and precompute=True
2. Use lr_find() to find the highest learning rate where loss is still clearly improving
3. Train last layer from precomputed activations for 1-2 epochs
4. Train last layer with data augmentation (i.e. precompute = False) for 2-3 epochs with cycle_len = 1
5. Unfreeze all layers
6. Set earlier layers to 3x-10x lower learning rate than next higher layer
7. Use lr_find() again
8. Train full network with cycle_mult=2 until overfitting
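
Steps 6 and 8 involve a little arithmetic that is easy to sketch. Both helpers below are hypothetical and written only for illustration. The first derives per-group learning rates from the rate of the last layer group, and the second counts the total epochs when each cycle is `cycle_mult` times longer than the previous one.

```python
def differential_lrs(top_lr, n_groups=3, ratio=10.0):
    """Learning rates from earliest to latest layer group; each earlier group is `ratio` times lower."""
    return [top_lr / ratio ** (n_groups - 1 - g) for g in range(n_groups)]

def epochs_for_cycles(n_cycles, cycle_len=1, cycle_mult=1):
    """Total epochs when each cycle is cycle_mult times longer than the one before."""
    return sum(cycle_len * cycle_mult ** i for i in range(n_cycles))

print(differential_lrs(1e-2))                           # roughly [1e-4, 1e-3, 1e-2]
print(epochs_for_cycles(3, cycle_len=1, cycle_mult=2))  # 1 + 2 + 4 = 7
```

With a ratio of 10 and a top rate of 1e-2, this reproduces the `[1e-4, 1e-3, 1e-2]` array used earlier in the notebook, up to floating point rounding.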

## Analyze Results

In [None]:
preds = np.argmax(probs, axis=1)
probs = probs[:,1]

In [None]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y, preds)

In [None]:
plot_confusion_matrix(cm, data.classes)
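
For reference, a confusion matrix simply counts how often each true label was predicted as each label. Here is a NumPy-only sketch with made-up labels, using rows for true labels and columns for predicted labels; `confusion_counts` is a hypothetical helper, not the scikit-learn function used above.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes=2):
    """Count predictions per (true label, predicted label) pair."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # row = true label, column = predicted label
    return cm

# Made-up labels: 0 = cat, 1 = dog
cm_demo = confusion_counts([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(cm_demo)
```

The diagonal holds the correct predictions, and the off-diagonal cells show which class is being confused with which.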

In [None]:
plot_val_with_titles(most_by_correct(0, False), "Most incorrect cats")

In [None]:
plot_val_with_titles(most_by_correct(1, False), "Most incorrect dogs")

## Predict test results and submit 

In [None]:
test_pred  = learn.predict(is_test=True)

In [None]:
test_pred

In [None]:
pred = (np.argmax(test_pred, axis =1))

In [None]:
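# predictions follow the order of data.test_ds.fnames, which os.listdir is not guaranteed to match
test_ids = [os.path.basename(f) for f in data.test_ds.fnames]
submission = pd.DataFrame({'id': test_ids, 'label': pred})
submission.to_csv('submission.csv', index=False)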