## Kaggle Competition for Google Quickdraw https://www.kaggle.com/c/quickdraw-doodle-recognition

"Quick, Draw!" was released as an experimental game to educate the public in a playful way about how AI works. The game prompts users to draw an image depicting a certain category, such as “banana,” “table,” etc. The game generated more than 1B drawings, of which a subset was publicly released as the basis for this competition’s training set. That subset contains 50M drawings encompassing 340 label categories.

Sounds fun, right? Here's the challenge: since the training data comes from the game itself, drawings can be incomplete or may not match the label. You’ll need to build a recognizer that can effectively learn from this noisy data and perform well on a manually-labeled test set from a different distribution.

Your task is to build a better classifier for the existing Quick, Draw! dataset. By advancing models on this dataset, Kagglers can improve pattern recognition solutions more broadly. This will have an immediate impact on handwriting recognition and its robust applications in areas including OCR (Optical Character Recognition), ASR (Automatic Speech Recognition) & NLP (Natural Language Processing).

In [1]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import ast
import os.path

In [3]:
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # expose only GPU 1 to this process; must run before CUDA is initialized

Here we import the libraries we need. We'll learn about what each does during the course.

In [4]:
# This file contains all the main external libs we'll use
from fastai.imports import *

In [5]:
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

`PATH` is the path to your data - if you use the recommended setup approaches from the lesson, you won't need to change this. `sz` is the size that the images will be resized to in order to ensure that the training runs quickly. We'll be talking about this parameter a lot during the course. Leave it at `224` for now.

In [6]:
PATH = "data2/"
sz=224

It's important that you have a working NVidia GPU set up. The programming framework used behind the scenes to work with NVidia GPUs is called CUDA. Therefore, you need to ensure the following line returns `True` before you proceed. If you have problems with this, please check the FAQ and ask for help on [the forums](http://forums.fast.ai).

In [7]:
torch.cuda.is_available()

True

In addition, NVidia provides special accelerated functions for deep learning in a package called CuDNN. Although not strictly necessary, it will improve training performance significantly, and is included by default in all supported fastai configurations. Therefore, if the following does not return `True`, you may want to look into why.

In [8]:
torch.backends.cudnn.enabled

True

In [9]:
# os.makedirs('data/dogscats/models', exist_ok=True)

# !ln -s /datasets/fast.ai/dogscats/train {PATH}
# !ln -s /datasets/fast.ai/dogscats/test {PATH}
# !ln -s /datasets/fast.ai/dogscats/valid {PATH}

# os.makedirs('/cache/tmp', exist_ok=True)
# !ln -fs /cache/tmp {PATH}


## First look at Data 

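The training images are stored in one sub-directory per category, so we first build a CSV file that maps each image file to its label.

## Divide the data into training and validation sets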

In [11]:
from os import listdir

In [12]:
from helpers import *

In [13]:
TRAIN_IMG_PATH = 'data2/simplifiedTrainImages2k/'

In [14]:
TRAIN_IMGS="simplifiedTrainImages2k"

In [15]:
TMP_PATH = 'data/quickdraw/tmp/'
MODEL_PATH = 'data/quickdraw/model/'

In [16]:
# Each sub-directory of TRAIN_IMG_PATH holds the images of one category
allDirs = [iDir for iDir in listdir(TRAIN_IMG_PATH)
           if os.path.isdir(os.path.join(TRAIN_IMG_PATH, iDir))]

In [17]:
#print(allDirs)

In [18]:
labels_file = "data2/train2kLabels.txt"

In [19]:
train_dir = TRAIN_IMG_PATH
print(train_dir)
search_terms = allDirs

# Write a "file,label" CSV with at most 200 PNG images per category.
# Filter for .png *before* slicing, so non-image files don't use up the 200 slots.
with open(labels_file, "w") as f:
    f.write("file,label\n")
    for search_term_dir in search_terms:
        path = os.path.join(train_dir, search_term_dir)
        pngs = [fn for fn in os.listdir(path) if fn.endswith(".png")]
        for fn in pngs[:200]:
            f.write(search_term_dir + "/" + fn + "," + search_term_dir + "\n")


data2/simplifiedTrainImages2k/


## Our first model: quick start

We're going to use a <b>pre-trained</b> model, that is, a model created by someone else to solve a different problem. Instead of building a model from scratch to solve a similar problem, we'll use a model trained on ImageNet (1.2 million images and 1000 classes) as a starting point. The model is a Convolutional Neural Network (CNN), a type of Neural Network that builds state-of-the-art models for computer vision. We'll be learning all about CNNs during this course.

We will be using the <b>resnet34</b> model. resnet34 is a version of the model that won the 2015 ImageNet competition. Here is more info on [resnet models](https://github.com/KaimingHe/deep-residual-networks). We'll be studying them in depth later, but for now we'll focus on using them effectively.

In the lesson's *dogs vs cats* example, this is where a model is trained and evaluated in 3 lines of code, in under 20 seconds. Here, we first optionally reset any precomputed activations:

In [20]:
# Uncomment the below if you need to reset your precomputed activations
# shutil.rmtree(f'{PATH}tmp', ignore_errors=True)

The *learning rate* determines how quickly or how slowly you want to update the *weights* (or *parameters*). Learning rate is one of the most difficult parameters to set, because it significantly affects model performance.
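For reference, this is the standard plain SGD update (ignoring momentum): the learning rate $\eta$ scales each step that the weights $w$ take against the gradient of the loss $L$ on a minibatch.

$$
w_{t+1} = w_t - \eta \, \nabla_w L(w_t)
$$

A larger $\eta$ takes bigger steps (faster progress, but the loss can diverge); a smaller $\eta$ takes smaller, safer steps that need more iterations.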

The method `learn.lr_find()` helps you find an optimal learning rate. It uses the technique developed in the 2015 paper [Cyclical Learning Rates for Training Neural Networks](http://arxiv.org/abs/1506.01186), where we simply keep increasing the learning rate from a very small value, until the loss stops decreasing. We can plot the learning rate across batches to see what this looks like.
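To make the idea concrete, here is a minimal, self-contained sketch of an LR range test on a toy linear-regression problem in NumPy. It is our own illustration, not fastai's `lr_find()` implementation: the function name `lr_range_test`, the geometric schedule from `1e-5` to `10`, the moving-average smoothing and the "stop at 4x the best loss" rule are all assumptions chosen for the example.

```python
import numpy as np

def lr_range_test(X, y, lr_start=1e-5, lr_end=10.0, num_iters=100,
                  batch_size=32, beta=0.9, seed=0):
    """Sketch of an LR range test on a toy linear-regression model.

    The learning rate grows geometrically from lr_start to lr_end, one SGD step
    per minibatch; we stop once the smoothed loss exceeds 4x the best smoothed
    loss seen so far, or the raw loss becomes non-finite.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    mult = (lr_end / lr_start) ** (1.0 / (num_iters - 1))
    lr, avg, best = lr_start, 0.0, np.inf
    lrs, losses = [], []
    for i in range(num_iters):
        idx = rng.integers(0, len(X), size=batch_size)
        xb, yb = X[idx], y[idx]
        err = xb @ w - yb
        loss = np.mean(err ** 2)                  # MSE on this minibatch
        if not np.isfinite(loss):
            break
        avg = beta * avg + (1 - beta) * loss
        smoothed = avg / (1 - beta ** (i + 1))    # bias-corrected moving average
        if smoothed > 4 * best:                   # loss has clearly blown up
            break
        best = min(best, smoothed)
        lrs.append(lr)
        losses.append(smoothed)
        w -= lr * 2 * xb.T @ err / batch_size     # SGD step on the MSE loss
        lr *= mult                                # increase the LR for the next batch
    return np.array(lrs), np.array(losses)

# Usage on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)
lrs, losses = lr_range_test(X, y)
```

Plotting `losses` against `lrs` on a log-scaled x-axis gives the same kind of curve as `learn.sched.plot()`; you would pick a learning rate where the loss is still clearly falling, before the minimum.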

We first create a new learner, since we want to know how to set the learning rate for a new (untrained) model.

In [21]:
TEST_PATH = "allTestImagesSimplified"

In [22]:
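with open(labels_file) as lf:
    n = sum(1 for _ in lf) - 1          # number of labelled images (rows minus the CSV header)
val_idxs = get_cv_idxs(n,val_pct=0.2)   # hold out a random 20% of the rows for validation
arch=resnet34
def get_data(sz,bs=64):
    # resize to sz, with side-on augmentation and up to 10% random zoom
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    return ImageClassifierData.from_csv(PATH,TRAIN_IMGS,
                                        labels_file,
                                        tfms=tfms,
                                        bs=bs,
                                        suffix='',
                                        val_idxs=val_idxs,
                                        test_name=TEST_PATH)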
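`get_cv_idxs` returns a random subset of row indices to hold out for validation. The sketch below re-implements that idea in NumPy so it can be run on its own; the name `random_val_idxs` and the seed are our assumptions (it is not claimed to reproduce fastai's exact split), and `340 * 200` assumes 200 images for each of the 340 categories.

```python
import numpy as np

def random_val_idxs(n, val_pct=0.2, seed=42):
    """Sketch: pick a random val_pct fraction of row indices as the validation set."""
    rng = np.random.RandomState(seed)   # seeded, so the split is reproducible
    n_val = int(val_pct * n)
    return rng.permutation(n)[:n_val]

n_rows = 340 * 200
val = random_val_idxs(n_rows)
train = np.setdiff1d(np.arange(n_rows), val)   # every row not held out
```

Fixing the seed matters here: `get_data` may be called again (for example at a different `sz`), and the validation rows should stay the same each time.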

In [23]:
data = get_data(sz,bs=256)

In [24]:
learn = ConvLearner.pretrained(arch, data, tmp_name=TMP_PATH, models_name=MODEL_PATH,precompute=False)

Our `learn` object contains an attribute `sched` that holds our learning rate scheduler and has some convenient plotting functions, such as `learn.sched.plot_lr()`, which plots the learning rate against the iteration number.

Note that in that plot, each *iteration* is one minibatch step of SGD. In one epoch there are (num_train_samples/batch_size) iterations of SGD.
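As a worked example for this notebook, the arithmetic below assumes 340 categories with 200 images each (the `files[:200]` limit when building the labels file) and the 20% validation split. The resulting 13,600 validation images are consistent with the final accuracy in the training log, 0.7086029411764706 = 9637/13600.

```python
import math

n_categories = 340              # label categories in the competition
imgs_per_category = 200         # files[:200] when building the labels file
n_images = n_categories * imgs_per_category
n_valid = int(0.2 * n_images)   # val_pct=0.2
n_train = n_images - n_valid
bs = 256
iters_per_epoch = math.ceil(n_train / bs)  # assumes the last, partial batch is kept
```

That gives 54,400 training images and 213 iterations per epoch at `bs=256` (212 if the data loader dropped the final partial batch).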

We can plot the loss versus the learning rate to see where our loss stops decreasing:

learn.sched.plot()

The loss is still clearly improving at lr=1e-1 (0.1), so that's what we use. Note that the optimal learning rate can change as we train the model, so you may want to re-run this function from time to time.

## Improving our model

### Data augmentation

If you try training for more epochs, you'll notice that we start to *overfit*, which means that our model is learning to recognize the specific images in the training set, rather than generalizing such that we also get good results on the validation set. One way to fix this is to effectively create more data, through *data augmentation*. This refers to randomly changing the images in ways that shouldn't impact their interpretation, such as horizontal flipping, zooming, and rotating.

We can do this by passing `aug_tfms` (*augmentation transforms*) to `tfms_from_model`, with a list of functions to apply that randomly change the image however we wish. For photos that are largely taken from the side (e.g. most photos of dogs and cats, as opposed to photos taken from the top down, such as satellite imagery) we can use the pre-defined list of functions `transforms_side_on`. We can also specify random zooming of images up to specified scale by adding the `max_zoom` parameter.
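The NumPy sketch below illustrates two of these random changes: a horizontal flip with probability 0.5 and a random central zoom of up to `max_zoom`, resized back to the original shape. It is a hypothetical helper (`random_side_on_augment`), not fastai's implementation; fastai's `transforms_side_on` also applies small rotations and lighting changes, which are omitted here, and the nearest-neighbour resize is a simplification.

```python
import numpy as np

def random_side_on_augment(img, max_zoom=1.1, rng=None):
    """Hypothetical sketch: random horizontal flip plus random zoom up to max_zoom.

    img is an (H, W) or (H, W, C) array; the output has the same shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = img[:, ::-1] if rng.random() < 0.5 else img   # horizontal flip with p=0.5
    h, w = out.shape[:2]
    zoom = rng.uniform(1.0, max_zoom)
    ch, cw = int(round(h / zoom)), int(round(w / zoom))  # size of the central crop
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = out[top:top + ch, left:left + cw]
    # nearest-neighbour resize of the crop back to (h, w)
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]

rng = np.random.default_rng(0)
img = np.arange(224 * 224).reshape(224, 224)
augmented = random_side_on_augment(img, rng=rng)
```

Because each call draws new random numbers, every epoch sees a slightly different version of each drawing, which is what makes augmentation behave like extra data.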

In [26]:
learn.precompute=False

In [27]:
learn.unfreeze()

Note that the other layers have *already* been trained to recognize ImageNet photos (whereas our final layers were randomly initialized), so we want to be careful not to destroy the carefully tuned weights that are already there.

Generally speaking, the earlier layers (as we've seen) have more general-purpose features. Therefore we would expect them to need less fine-tuning for new datasets. For this reason we will use different learning rates for different layers: the first few layers will be at 1e-3, the middle layers at 1e-2, and our FC layers we'll leave at 1e-1 as before. We refer to this as *differential learning rates*, although there's no standard name for this technique in the literature that we're aware of.
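In fastai, passing an array of three learning rates to `learn.fit` assigns one rate to each of the model's three layer groups. The NumPy sketch below shows the underlying idea with plain SGD on made-up parameters: the arrays in `groups` are hypothetical stand-ins for the early, middle and head layer groups, and `sgd_step` is our own helper, not fastai code. In PyTorch itself, the same effect comes from giving the optimizer one parameter group per layer group, each with its own `lr`.

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.normal(size=(4, 4)) for _ in range(3)]  # hypothetical early, middle, head weights
grads = [np.ones_like(w) for w in groups]             # pretend every group sees the same gradient
lr = np.array([1e-3, 1e-2, 1e-1])                     # one learning rate per layer group

def sgd_step(groups, grads, lrs):
    """Plain SGD where each layer group is updated with its own learning rate."""
    return [w - group_lr * g for w, g, group_lr in zip(groups, grads, lrs)]

new_groups = sgd_step(groups, grads, lr)
step_sizes = [np.abs(new - old).max() for new, old in zip(new_groups, groups)]
```

With identical gradients, the early layers move 100x less than the head, so the pretrained features change slowly while the new classifier adapts quickly.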

In [28]:
lr=np.array([1e-3,1e-2,1e-1])

In [29]:
learn.fit(lr,  4, cycle_len=1, cycle_mult=2)


epoch      trn_loss   val_loss   accuracy                   
    0      2.712612   2.083984   0.529044  
    1      2.082174   1.605725   0.608235                   
    2      1.660177   1.437743   0.653529                   
    3      1.733766   1.497985   0.633456                   
    4      1.465648   1.342587   0.669265                   
    5      1.251007   1.264702   0.690515                   
    6      1.138705   1.250694   0.695368                   
    7      1.406409   1.529337   0.631176                   
    8      1.280752   1.371937   0.66375                    
    9      1.149892   1.321249   0.675368                   
    10     1.024797   1.263046   0.691618                   
    11     0.892343   1.247277   0.702574                    
    12     0.789344   1.246163   0.704853                    
    13     0.717299   1.236509   0.710441                    
    14     0.689813   1.237269   0.708603                    



[array([1.23727]), 0.7086029411764706]

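Another trick we've used here is the `cycle_mult` parameter, which multiplies the length of each cycle by that factor after every restart. With `cycle_len=1`, `cycle_mult=2` and 4 cycles, the cycles last 1, 2, 4 and 8 epochs, for 1 + 2 + 4 + 8 = 15 epochs in total, matching the training log above. The jumps in training and validation loss at epochs 3 and 7 are where a new cycle restarts at the full learning rate. The commented-out cells below refit with smaller learning rates and plot the schedule with `learn.sched.plot_lr()`: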
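The schedule can be sketched in plain Python as cosine annealing with warm restarts (SGDR). This is our own simplified version, not fastai's scheduler: each cycle here anneals from `lr_max` all the way towards 0, `sgdr_schedule` is a hypothetical helper name, and `iters_per_epoch=213` assumes 54,400 training images at `bs=256`.

```python
import math

def sgdr_schedule(n_cycle, cycle_len, cycle_mult, iters_per_epoch, lr_max):
    """Sketch of cosine annealing with warm restarts.

    Cycle k lasts cycle_len * cycle_mult**k epochs; within a cycle the learning
    rate follows half a cosine from lr_max down towards 0, then restarts.
    """
    lrs, cycle_epochs = [], []
    for k in range(n_cycle):
        epochs = cycle_len * cycle_mult ** k
        cycle_epochs.append(epochs)
        n = epochs * iters_per_epoch                   # iterations in this cycle
        for i in range(n):
            lrs.append(lr_max / 2 * (1 + math.cos(math.pi * i / n)))
    return lrs, cycle_epochs

# Mirrors learn.fit(lr, 4, cycle_len=1, cycle_mult=2) for the final layer group (lr=0.1)
lrs, cycle_epochs = sgdr_schedule(n_cycle=4, cycle_len=1, cycle_mult=2,
                                  iters_per_epoch=213, lr_max=0.1)
```

`cycle_epochs` comes out as `[1, 2, 4, 8]`, and the learning rate returns to 0.1 at the first iteration of epochs 0, 1, 3 and 7. The longer later cycles give the model more time at low learning rates, which is where the best validation losses in the log appear.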

In [30]:
#lr=np.array([1e-4,1e-3,1e-2])

In [31]:
#learn.fit(lr, 2, cycle_len=1, cycle_mult=2)

In [32]:
#learn.sched.plot_lr()

Note that what's being plotted above is the learning rate of the *final layers*. The learning rates of the earlier layers are fixed at the same multiples of the final layer rates as we initially requested (i.e. the first layers have 100x smaller and the middle layers 10x smaller learning rates, since we set `lr=np.array([1e-4,1e-3,1e-2])`).

In [33]:
learn.save('224_all_200_cyclic_v1')

In [34]:
learn.load('224_all_200_cyclic_v1')

There is something else we can do with data augmentation: use it at *inference* time (also known as *test* time). Not surprisingly, this is known as *test time augmentation*, or just *TTA*.

TTA simply makes predictions not just on the images in your validation set, but also makes predictions on a number of randomly augmented versions of them too (by default, it uses the original image along with 4 randomly augmented versions). It then takes the average prediction from these images, and uses that. To use TTA on the validation set, we can use the learner's `TTA()` method.
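The averaging step can be sketched in NumPy on toy numbers: exponentiate the log-predictions, average the probabilities over the TTA versions (axis 0), then take the 3 most probable classes per image. `tta_average` and `top_k` are hypothetical helpers. Note that in fastai 0.7, `learn.TTA()` returns a `(log_preds, targs)` pair (the commented validation cell below unpacks it as `log_preds,y = learn.TTA()`), so the log-predictions should be unpacked before averaging over axis 0.

```python
import numpy as np

def tta_average(log_preds):
    """Average class probabilities over TTA versions.

    log_preds: (n_versions, n_images, n_classes) array of log-probabilities,
    e.g. the original image plus 4 augmented copies.
    Returns probabilities of shape (n_images, n_classes).
    """
    return np.mean(np.exp(log_preds), axis=0)

def top_k(probs, k=3):
    """Indices of the k most probable classes per image, most probable first."""
    return np.argsort(-probs, axis=1)[:, :k]

# Toy example: 5 versions of 2 images over 4 classes; the 5th version disagrees.
base = np.array([[0.15, 0.60, 0.20, 0.05],
                 [0.50, 0.10, 0.15, 0.25]])
log_preds = np.log(np.stack([base] * 4 + [base[:, ::-1]]))
probs = tta_average(log_preds)
top3 = top_k(probs)
```

Averaging probabilities (rather than taking the argmax of each version) lets a confident majority outweigh one odd augmented copy, and each averaged row still sums to 1.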

In [37]:
#log_preds,y = learn.TTA()
#probs = np.mean(np.exp(log_preds),0)

In [None]:
#accuracy_np(probs, y)

In [38]:
log_preds_test = learn.TTA(is_test=True)

                                              

In [51]:
log_preds_test_mean = np.mean(log_preds_test,0)

In [48]:
#log_preds_test2 = learn.predict(is_test=True)

In [52]:
print(log_preds_test_mean)

#probs_test=np.exp(log_preds_test)

#print(probs_test.shape)

[[[ -9.16651  -6.95624 -10.2286  ...  -8.25294  -6.14713  -7.73004]
  [ -9.14502  -7.95072  -8.65218 ...  -6.69108  -8.61476  -8.25965]
  [ -8.4553   -6.60737  -7.68432 ...  -5.47277  -8.24699  -7.40684]
  ...
  [-10.9798  -11.46378 -11.8281  ... -12.39786  -9.4729  -10.99495]
  [ -8.01676  -8.24194  -6.46498 ...  -7.68303  -8.4523   -7.93002]
  [ -7.00658  -7.48649  -7.73623 ...  -8.90704  -7.06623  -8.58311]]

 [[ -9.33689  -7.02735 -10.42415 ...  -8.90454  -6.29876  -7.94663]
  [ -8.17896  -7.37478  -7.6134  ...  -6.08325  -8.38057  -7.89706]
  [ -7.88029  -6.35378  -7.97466 ...  -5.4484   -8.62132  -7.10884]
  ...
  [-10.62456 -11.02429 -11.3657  ... -12.19413  -9.1532  -11.00989]
  [ -7.8396   -8.37054  -6.02667 ...  -7.5818   -8.09399  -8.12833]
  [ -6.44351  -6.90163  -7.52472 ...  -8.22441  -7.15462  -7.91979]]

 [[ -8.21342  -6.46033  -9.39077 ...  -7.24766  -5.29425  -7.22015]
  [ -8.63576  -6.85628  -7.97683 ...  -6.35627  -7.53703  -7.93232]
  [ -7.98657  -6.31079  -7.30575

In [53]:
probs_test=np.exp(log_preds_test_mean)
probs_test

array([[[0.0001 , 0.00095, 0.00004, ..., 0.00026, 0.00214, 0.00044],
        [0.00011, 0.00035, 0.00017, ..., 0.00124, 0.00018, 0.00026],
        [0.00021, 0.00135, 0.00046, ..., 0.0042 , 0.00026, 0.00061],
        ...,
        [0.00002, 0.00001, 0.00001, ..., 0.     , 0.00008, 0.00002],
        [0.00033, 0.00026, 0.00156, ..., 0.00046, 0.00021, 0.00036],
        [0.00091, 0.00056, 0.00044, ..., 0.00014, 0.00085, 0.00019]],

       [[0.00009, 0.00089, 0.00003, ..., 0.00014, 0.00184, 0.00035],
        [0.00028, 0.00063, 0.00049, ..., 0.00228, 0.00023, 0.00037],
        [0.00038, 0.00174, 0.00034, ..., 0.0043 , 0.00018, 0.00082],
        ...,
        [0.00002, 0.00002, 0.00001, ..., 0.00001, 0.00011, 0.00002],
        [0.00039, 0.00023, 0.00241, ..., 0.00051, 0.00031, 0.0003 ],
        [0.00159, 0.00101, 0.00054, ..., 0.00027, 0.00078, 0.00036]],

       [[0.00027, 0.00156, 0.00008, ..., 0.00071, 0.00502, 0.00073],
        [0.00018, 0.00105, 0.00034, ..., 0.00174, 0.00053, 0.00036],
    

### Getting Test output 

In [56]:
print(probs_test[0])

[[0.0001  0.00095 0.00004 ... 0.00026 0.00214 0.00044]
 [0.00011 0.00035 0.00017 ... 0.00124 0.00018 0.00026]
 [0.00021 0.00135 0.00046 ... 0.0042  0.00026 0.00061]
 ...
 [0.00002 0.00001 0.00001 ... 0.      0.00008 0.00002]
 [0.00033 0.00026 0.00156 ... 0.00046 0.00021 0.00036]
 [0.00091 0.00056 0.00044 ... 0.00014 0.00085 0.00019]]


In [57]:
ss = np.argsort(-probs_test)

In [None]:
#probs_test.shape

In [59]:
ss[0,:,:3]

array([[ 99, 138, 253],
       [237, 284,  57],
       [147,  39, 153],
       ...,
       [238, 257,  83],
       [ 42, 174,  50],
       [ 70, 310, 219]])

In [60]:
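# learn.TTA returns (log_preds, targs): average the probabilities of the 5 TTA versions, then keep the top 3 classes
prob_test_top_3 = np.argsort(-np.exp(log_preds_test[0]).mean(0))[:, :3]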

In [61]:
prob_test_top_3.shape

(112199, 3)

In [None]:
#data.classes

In [62]:
f= open("submission_QuickDraw_224_all_200_cyclic_tta_v1.csv","w+")
f.write("key_id,word\n")

12

In [None]:
#data.test_ds.fnames

In [63]:
key_ids = [ x.split('/')[1].split('.png')[0] for x in data.test_ds.fnames] 

In [64]:
key_ids[0]

'9483104542098626'

In [65]:
# One row per test drawing: key_id, then the 3 most probable classes separated by spaces.
# Spaces inside a class name become underscores so the space-separated word list stays unambiguous.
for i in range(prob_test_top_3.shape[0]):
    words = [data.classes[c].replace(" ", "_") for c in prob_test_top_3[i]]
    f.write(key_ids[i] + "," + " ".join(words) + "\n")
f.close()
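Since pandas is already imported at the top of the notebook, the same `key_id,word` file can also be built from a DataFrame. A minimal sketch: `make_submission` is a hypothetical helper, and the toy inputs stand in for `key_ids`, `prob_test_top_3` and `data.classes`.

```python
import numpy as np
import pandas as pd

def make_submission(key_ids, top_idx, classes):
    """Return a key_id,word DataFrame with the predicted class names per drawing."""
    # spaces inside class names become underscores so the word list stays space-separated
    words = [" ".join(classes[c].replace(" ", "_") for c in row) for row in top_idx]
    return pd.DataFrame({"key_id": key_ids, "word": words})

# Hypothetical toy inputs
classes = ["apple", "The Eiffel Tower", "banana", "cat"]
top3 = np.array([[1, 0, 3], [2, 3, 0]])
sub = make_submission(["9483104542098626", "123"], top3, classes)
csv_text = sub.to_csv(index=False)   # pass a filename instead to write the file
```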

In [66]:
from IPython.display import FileLink, FileLinks


In [67]:
FileLink('submission_QuickDraw_224_all_200_cyclic_tta_v1.csv')

In [None]:
#labels_probs_test_top_3