<a href="https://colab.research.google.com/github/nainatejani/Doodle-Recognition/blob/master/Doodle_Recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Doodle Recognition

My project is called Doodle Recognition. I use the [`Quick! Draw! Dataset`](https://colab.research.google.com/drive/106A67Drzr8NyUo9nqFi_s0MS4N293dxu#scrollTo=4yeCbrsLIqmU&line=3&uniqifier=1), the world's largest doodling dataset, which contains over 50 million drawings, each belonging to one of 345 categories.

**Problem:** Recognize hand-drawn doodles from a collection of 50 million drawings and place each of them in one of the 345 image categories, which include airplanes, books, and body parts, among others. Analyse the accuracy of the model compared to human performance, and create a model that uses transfer learning from the doodles to recognize real objects.

**The Importance of the Problem:** Doodle recognition has important applications in computer vision and pattern recognition because doodles are high-noise data.


**Simplification of the Problem:**

While ideally I would train my classification model on a large amount of data, doing so is not computationally feasible. For this reason, I restrict myself to a much smaller data sample in this project. Moreover, I simplify the problem further by considering only 2 categories: ambulance and angel. In my labels, 1 denotes an ambulance and 0 denotes an angel. Given a doodle that is either an ambulance or an angel, my classifier predicts which one it is more likely to be.

**Goals For The Project:** 

1. Classify doodles using the supervised ML algorithm called K-Nearest Neighbors. I analyze the accuracy and compare it with chance performance.
2. Use a simple CNN to classify the doodles.
3. Use a pre-trained convolutional neural network to classify the doodles and report on the accuracy of the model.
4. Fine-tune the pre-trained model on the doodle dataset and report on the accuracy of the model.


In [0]:
# making necessary imports
import cv2
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision
import torchvision.datasets as datasets
from torch.autograd import Variable
import matplotlib.pyplot as plt
# from PIL import Image
import numpy as np
import os


# Preparing the Dataset

The `Quick! Draw! Dataset` can be downloaded in many formats, e.g., ndjson and numpy, as well as raw versions. Since I am using PyTorch and numpy, I found it best to use the numpy files.

A note on the structure of the numpy files: each numpy file contains all the doodles of one category. The file holds a single list-like array in which each doodle is a 1D array. As is evident, this calls for a fair amount of dataset preparation, since there is no individual file for each datapoint. I prepare the dataset using the following steps:
1. Create a folder in Google Drive containing 2 files for the 2 categories I am considering for my project.
2. Using a Python script, create `doodles_train_data` and `doodles_test_data` as well as `doodles_train_labels` and `doodles_test_labels`.
3. Use a class `DoodleDataset`, inherited from the PyTorch `Dataset` class, to define `train_load` and `test_load`.

The following loads data from the folder "logistic dataset" from the drive. It loads the first 10000 doodles for each category as the train data and loads the doodles from index 15000 to 17000 for each category as the test data.

In [0]:
# After some initial testing of the data, I found that each datapoint is a 1D array of size 784.
data_dir = "/content/drive/My Drive/logistic dataset"
train_parts, test_parts = [], []
# Sort the filenames so that the load order (and therefore the labelling below) is deterministic.
for filename in sorted(os.listdir(data_dir)):
  print("Loading file " + str(filename))
  data = np.load(os.path.join(data_dir, filename))
  train_parts.append(data[:10000])        # first 10000 doodles -> train data
  test_parts.append(data[15000:17000])    # doodles 15000 to 16999 -> test data

# Stack the per-category slices once instead of growing the arrays row by row,
# which avoids both the slow repeated np.vstack and the dummy first row.
doodles_train_data = np.vstack(train_parts).astype(np.float64)
doodles_test_data = np.vstack(test_parts).astype(np.float64)

print(doodles_train_data.shape)
print(doodles_test_data.shape)
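
The slicing used in the loading step can be tried on a stand-in array. This is only a sketch: `category` below is synthetic random data (a hypothetical substitute for one category's numpy file), and the only thing taken from the notebook is the layout of one flattened 28 x 28 doodle per row.

```python
import numpy as np

# Hypothetical stand-in for one category file: 17,000 doodles, each a flat
# vector of 784 = 28 * 28 pixel values (assumed here to lie in 0..255).
rng = np.random.default_rng(0)
category = rng.integers(0, 256, size=(17000, 784), dtype=np.uint8)

train_part = category[:10000]        # first 10,000 doodles -> train data
test_part = category[15000:17000]    # doodles 15,000..16,999 -> test data

# A single doodle can be viewed as an image by reshaping it to 28 x 28.
first_image = train_part[0].reshape(28, 28)

print(train_part.shape, test_part.shape, first_image.shape)
```

Stacking the slices of both categories then gives 20,000 training rows and 4,000 test rows, matching the shapes used in the rest of the notebook.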


In [0]:
# The labels depend on the order in which the classes were loaded. I label the ambulance as 1
# (since I loaded the ambulance dataset first) and the angel as 0.
doodles_train_labels = np.concatenate((np.ones((10000,)), np.zeros((10000,))))
doodles_test_labels = np.concatenate((np.ones((2000,)), np.zeros((2000,))))

# K-Nearest Neighbor Implementation


In [0]:
# I simplify the names for the KNN implementation.
train_set_x, train_set_y, test_set_x, test_set_y = doodles_train_data, doodles_train_labels, doodles_test_data, doodles_test_labels


In [0]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(train_set_x)

train_set_x = scaler.transform(train_set_x)
test_set_x = scaler.transform(test_set_x)

In [0]:
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=6)
classifier.fit(train_set_x, train_set_y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=6, p=2,
                     weights='uniform')

In [0]:
y_pred = classifier.predict(test_set_x)

In [0]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(test_set_y, y_pred))
print(classification_report(test_set_y, y_pred))

The following are the results I received from implementing KNN.

[[1913   87]
 [  67 1933]]

              precision    recall  f1-score   support

         0.0       0.97      0.96      0.96      2000
         1.0       0.96      0.97      0.96      2000

    accuracy                           0.96      4000
   macro avg       0.96      0.96      0.96      4000
weighted avg       0.96      0.96      0.96      4000
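
To connect the confusion matrix to the report and to chance performance, the figures can be recomputed by hand. This is a small sketch that assumes scikit-learn's convention of rows as true labels and columns as predicted labels, with label 0 (angel) first.

```python
# Confusion matrix reported above: rows are true labels (0 = angel,
# 1 = ambulance), columns are predicted labels.
tn, fp = 1913, 87
fn, tp = 67, 1933

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

precision_1 = tp / (tp + fp)   # share of predicted ambulances that are ambulances
recall_1 = tp / (tp + fn)      # share of true ambulances that were found
precision_0 = tn / (tn + fn)   # share of predicted angels that are angels
recall_0 = tn / (tn + fp)      # share of true angels that were found

# Both classes have 2000 test doodles, so guessing uniformly at random (or
# always predicting one class) is expected to reach 50% accuracy.
chance = 0.5

print(f"accuracy = {accuracy:.4f}, chance = {chance:.2f}")
print(f"class 0: precision = {precision_0:.2f}, recall = {recall_0:.2f}")
print(f"class 1: precision = {precision_1:.2f}, recall = {recall_1:.2f}")
```

The recomputed values round to the figures in the report, and the accuracy of about 96% is well above the 50% chance level for this balanced two-class test set.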


Reference: [K-Nearest Neighbors Algorithm in Python and Scikit-Learn (Stack Abuse)](https://stackabuse.com/k-nearest-neighbors-algorithm-in-python-and-scikit-learn/)





In [0]:
# Note that this takes a long time to run and is optional. I ran this code and recorded the error
# values in the following cell, so this cell can be skipped.
error = []

# Calculating the test error for K values from 1 to 39 (range(1, 40) excludes 40)
for i in range(1, 40):
    print(i)
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(train_set_x, train_set_y)
    pred_i = knn.predict(test_set_x)
    error.append(np.mean(pred_i != test_set_y))

In [0]:
# Recorded error values; the 19 entries are plotted below against K = 1..19.
error = [0.0465, 0.04225, 0.0385, 0.03575, 0.0385, 0.035, 0.035, 0.036, 0.035, 0.0365, 0.0355, 0.0345, 0.0375, 0.03725, 0.0375, 0.0385, 0.03825, 0.0385, 0.03875]

In [0]:
plt.figure(figsize=(12, 6))
plt.plot(range(1, 20), error, color='red', linestyle='dashed', marker='o',
         markerfacecolor='blue', markersize=10)
plt.title('Error Rate vs. K Value')
plt.xlabel('K Value')
plt.ylabel('Mean Error')
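
Besides reading the best K off the plot, it can be picked programmatically. The sketch below reuses the recorded error values and assumes, as the plot does, that they correspond to K = 1..19.

```python
# Error values recorded in the notebook, assumed to correspond to K = 1..19.
error = [0.0465, 0.04225, 0.0385, 0.03575, 0.0385, 0.035, 0.035, 0.036,
         0.035, 0.0365, 0.0355, 0.0345, 0.0375, 0.03725, 0.0375, 0.0385,
         0.03825, 0.0385, 0.03875]

k_values = range(1, len(error) + 1)

# Pair every K with its error and keep the pair with the smallest error.
best_k, best_error = min(zip(k_values, error), key=lambda pair: pair[1])

print(best_k, best_error)
```

Because these errors were measured on the test set, choosing K this way gives an optimistic estimate; a separate validation split would be needed for an unbiased comparison.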

# Data Preprocessing

In [0]:
# Transformation for the images.
# Note: DoodleDataset below stores a transform but does not apply it in __getitem__.
transform_ori = transforms.Compose([
                                    transforms.ToTensor(),                 #convert the image to a Tensor
                                    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])  #normalize the image

In [0]:
class DoodleDataset(torch.utils.data.Dataset):
    """Doodle dataset backed by in-memory numpy arrays."""

    def __init__(self, doodles, labels, transform=None):
        """
        Args:
            doodles (np.ndarray): Array of doodles, one doodle per entry.
            labels (np.ndarray): Array of labels, one label per doodle.
            transform (callable, optional): Optional transform. It is stored
                but not applied in __getitem__.
        """
        self.doodles = doodles
        self.labels = labels
        self.transform = transform


    def __len__(self):
        return len(self.doodles)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        sample = [self.doodles[idx], self.labels[idx]]
        # {'doodle': self.doodles[idx], 'label': self.labels[idx]}

        return sample

# Reshape the flat 784-vectors into (N, 1, 28, 28) images before building the datasets,
# so that the datasets (and the batches drawn from them) hold image-shaped arrays.
doodles_train_data = doodles_train_data.reshape((20000,1,28,28))
doodles_test_data = doodles_test_data.reshape((4000,1,28,28))
train_dataset = DoodleDataset(doodles_train_data, doodles_train_labels, transform_ori)
test_dataset = DoodleDataset(doodles_test_data, doodles_test_labels, transform_ori)
print(doodles_test_data.shape)
print(doodles_train_data.shape)


(4000, 1, 28, 28)
(20000, 1, 28, 28)


In [0]:
batch_size = 32
train_load = torch.utils.data.DataLoader(dataset = train_dataset, 
                                         batch_size = batch_size,
                                         shuffle = True)      #Shuffle so that each batch of 32 mixes ambulance & angel doodles

test_load = torch.utils.data.DataLoader(dataset = test_dataset, 
                                         batch_size = batch_size,
                                         shuffle = False)
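
As a quick sanity check on the loaders, the number of batches per epoch can be worked out by hand. The sketch assumes the `DataLoader` default of keeping a smaller final batch (`drop_last=False`), in which case the batch count is the dataset size divided by the batch size, rounded up.

```python
import math

batch_size = 32
n_train, n_test = 20000, 4000

# Number of batches per pass over each dataset.
train_batches = math.ceil(n_train / batch_size)
test_batches = math.ceil(n_test / batch_size)

# Size of the final batch in each loader.
last_train_batch = n_train - (train_batches - 1) * batch_size
last_test_batch = n_test - (test_batches - 1) * batch_size

print(train_batches, last_train_batch)
print(test_batches, last_test_batch)
```

Both dataset sizes are exact multiples of 32, so every batch is full and counting samples as "number of batches times batch size" is exact for this data.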

In [0]:
# Visualizing the data
# Pass in any index to view the doodle. 
print(doodles_train_data.shape)
image = doodles_train_data[150].reshape((28,28))
plt.imshow(image, cmap = 'gray')

# Basic CNN Model

The following code on basic CNN model is inspired by [this article.](https://nextjournal.com/gkoehler/pytorch-mnist)

In [0]:
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

In the next cell, I define my CNN model using the built-in `nn` module in PyTorch.

In [0]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=5)
        self.conv3 = nn.Conv2d(32,64, kernel_size=5)
        self.fc1 = nn.Linear(3*3*64, 256)
        self.fc2 = nn.Linear(256, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x.float()))
        #x = F.dropout(x, p=0.5, training=self.training)
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = F.dropout(x, p=0.5, training=self.training)
        x = F.relu(F.max_pool2d(self.conv3(x),2))
        x = F.dropout(x, p=0.5, training=self.training)
        x = x.view(-1,3*3*64 )
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
 
cnn = CNN()
print(cnn)

it = iter(train_load)
X_batch, y_batch = next(it)
print(cnn.forward(X_batch).shape)
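
The input size `3*3*64` of `fc1` follows from the layer sizes above. A short sketch of the arithmetic, assuming stride 1 and no padding for the convolutions (the `nn.Conv2d` defaults) and a pooling stride equal to the pooling window (the `F.max_pool2d` default); the helper names `conv_out` and `pool_out` are made up for this illustration:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial output size of max pooling with stride equal to the kernel."""
    return size // kernel


size = 28                     # doodles are 28 x 28
size = conv_out(size, 5)      # conv1: 28 -> 24
size = conv_out(size, 5)      # conv2: 24 -> 20
size = pool_out(size, 2)      # max_pool2d: 20 -> 10
size = conv_out(size, 5)      # conv3: 10 -> 6
size = pool_out(size, 2)      # max_pool2d: 6 -> 3

flat_features = size * size * 64   # conv3 has 64 output channels
print(size, flat_features)
```

The final feature map is 3 x 3 with 64 channels, which gives the 576 inputs expected by `fc1`.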

In the next cell, I define the training loop (`fit`) for my model.

In [0]:
BATCH_SIZE = 32
def fit(model, train_loader):
    optimizer = torch.optim.Adam(model.parameters())  # defaults: lr=0.001, betas=(0.9, 0.999)
    error = nn.CrossEntropyLoss()
    EPOCHS = 5
    model.train()
    for epoch in range(EPOCHS):
        correct = 0
        for batch_idx, (X_batch, y_batch) in enumerate(train_loader):
            X_batch = X_batch.float()
            y_batch = y_batch.long()
            optimizer.zero_grad()
            output = model(X_batch)
            loss = error(output, y_batch)
            loss.backward()
            optimizer.step()

            # Running count of correct predictions in this epoch
            predicted = torch.max(output.data, 1)[1]
            correct += (predicted == y_batch).sum().item()
            if batch_idx % 50 == 0:
              # Accuracy so far in this epoch, over all samples seen up to this batch
              print('Epoch : {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}\t Accuracy:{:.3f}%'.format(
                  epoch, batch_idx*len(X_batch), len(train_loader.dataset), 100.*batch_idx / len(train_loader), loss.item(), 100. * correct / (BATCH_SIZE*(batch_idx+1))))
                

In [0]:
fit(cnn,train_load)
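
One detail worth checking: the model's `forward` already ends in `F.log_softmax`, while `nn.CrossEntropyLoss` applies a log-softmax of its own before the negative log-likelihood. The sketch below re-implements both steps in NumPy on made-up scores (the `logits` and `targets` values are purely illustrative) to show that applying log-softmax twice leaves the loss unchanged.

```python
import numpy as np


def log_softmax(x):
    """Row-wise log-softmax, computed stably by subtracting the row maximum."""
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_entropy(scores, targets):
    """Mean negative log-likelihood of log_softmax(scores) at the targets."""
    log_probs = log_softmax(scores)
    return -log_probs[np.arange(len(targets)), targets].mean()


# Illustrative two-class scores for three samples.
logits = np.array([[2.0, -1.0], [0.3, 0.8], [-1.5, 0.5]])
targets = np.array([0, 1, 1])

loss_from_logits = cross_entropy(logits, targets)
loss_from_log_probs = cross_entropy(log_softmax(logits), targets)

print(loss_from_logits, loss_from_log_probs)
```

The two losses agree because the output of a log-softmax already has a log-sum-exp of zero, so a second log-softmax changes nothing. The extra `F.log_softmax` in the model is therefore redundant for training rather than harmful.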

I evaluate the model using the following code.

In [0]:
def evaluate(model):
    model.eval()  # switch off dropout for evaluation
    correct = 0
    with torch.no_grad():  # no gradients are needed at test time
        for test_imgs, test_labels in test_load:
            output = model(test_imgs.float())
            predicted = torch.max(output,1)[1]
            correct += (predicted == test_labels).sum().item()
    # Multiply by 100 inside format(); multiplying the formatted string would repeat it 100 times.
    print("Test accuracy:{:.3f}% ".format(100. * correct / len(test_load.dataset)))

In [0]:
evaluate(cnn)

Test set accuracy is 97%.

# Using Pre-trained CNN and fine-tuning it

The following code on a pre-trained CNN using ResNet is inspired by a PyTorch [tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html) on transfer learning.

In [0]:
doodles_train = doodles_train_data
doodles_test = doodles_test_data

In [0]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
import torchvision
from torchvision import datasets, models, transforms
import torch.optim as optim
from torch.optim import lr_scheduler
import time
import copy

In the next 2 cells, I convert all my data into doodles of shape (3, 28, 28), since the ResNet18 model only accepts 3-channel inputs. The doodle is placed in the first channel and the other two channels are filled with zeros.

In [0]:
# Pad every (1, 28, 28) doodle with two all-zero channels to get shape (3, 28, 28).
# The doodle itself stays in channel 0. Concatenating once along the channel axis
# replaces the slow per-image np.vstack loop.
train_imgs = doodles_train_data.reshape((-1, 1, 28, 28))
doodles_train_resize = np.concatenate(
    [train_imgs, np.zeros_like(train_imgs), np.zeros_like(train_imgs)], axis=1)
print(doodles_train_resize.shape)


In [0]:
# Same channel padding for the test doodles: channel 0 holds the doodle,
# channels 1 and 2 are all zeros.
test_imgs = doodles_test_data.reshape((-1, 1, 28, 28))
doodles_test_resize = np.concatenate(
    [test_imgs, np.zeros_like(test_imgs), np.zeros_like(test_imgs)], axis=1)
print(doodles_test_resize.shape)
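
The channel padding can be illustrated on a tiny synthetic batch. The sketch also shows one common alternative, copying the grayscale image into all three channels; that alternative is an assumption for comparison and is not what this notebook does.

```python
import numpy as np

# Small synthetic batch standing in for the doodles: 4 images of shape (1, 28, 28).
rng = np.random.default_rng(0)
gray = rng.random((4, 1, 28, 28))

# Zero-padding as in the cells above: the doodle occupies channel 0,
# channels 1 and 2 are all zeros.
zeros = np.zeros_like(gray)
padded = np.concatenate([gray, zeros, zeros], axis=1)

# Alternative (not used in this notebook): repeat the grayscale image
# across all three channels.
repeated = np.repeat(gray, 3, axis=1)

print(padded.shape, repeated.shape)
```

Both variants produce arrays of shape (N, 3, 28, 28) that a 3-channel network can consume; they differ only in what the second and third channels contain.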

I define my new Dataset objects.

In [0]:
train_dataset_resize = DoodleDataset(doodles_train_resize, doodles_train_labels)
test_dataset_resize = DoodleDataset(doodles_test_resize, doodles_test_labels)

In the following chunk, I define the dataloaders that will be used by my model.

In [0]:

# Named doodle_datasets so that the imported torchvision `datasets` module is not shadowed.
doodle_datasets = {'train':train_dataset_resize, 'val':test_dataset_resize}
dataloaders = {x: torch.utils.data.DataLoader(doodle_datasets[x], batch_size=32,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(doodle_datasets[x]) for x in ['train', 'val']}

The following code trains the model and also runs it on my test data to determine train as well as test accuracy.

In [0]:
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs.float())
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels.long())

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

I define the model in the following chunk.

In [0]:
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model_ft.fc = nn.Linear(num_ftrs, 2)

model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
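
To make the decay schedule concrete, the learning rate per epoch can be computed by hand. This is a sketch of step decay with the settings above (initial rate 0.001, `step_size=7`, `gamma=0.1`); the helper `step_lr` is a made-up name, not part of PyTorch, and it assumes the scheduler is stepped once per training epoch, as `train_model` does.

```python
def step_lr(initial_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


# Print the rate at the first and last epoch of each decay stage.
for epoch in (0, 6, 7, 13, 14, 20, 21, 24):
    print(epoch, step_lr(0.001, epoch))
```

With `num_epochs=5` the first decay at epoch 7 is never reached, so the whole run uses a learning rate of 0.001. A 25-epoch run passes through three decays, at epochs 7, 14 and 21.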

In the next cell, I run my model and get my results. I have copied the results below for reference.

In [0]:
# Training and evaluation on the test data both happen in this one call.
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=5)

Epoch 0/4
----------
train Loss: 0.0381 Acc: 0.9858
val Loss: 0.0491 Acc: 0.9790

Epoch 1/4
----------
train Loss: 0.0243 Acc: 0.9900
val Loss: 0.0535 Acc: 0.9800

Epoch 2/4
----------
train Loss: 0.0197 Acc: 0.9926
val Loss: 0.0515 Acc: 0.9788

Epoch 3/4
----------
train Loss: 0.0144 Acc: 0.9942
val Loss: 0.0683 Acc: 0.9768

Epoch 4/4
----------
train Loss: 0.0138 Acc: 0.9946
val Loss: 0.0588 Acc: 0.9803

Training complete in 1m 57s
Best val Acc: 0.980250

# Using Pre-Trained CNN without fine-tuning

In [0]:
model_conv = torchvision.models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# Parameters of newly constructed modules have requires_grad=True by default
num_ftrs = model_conv.fc.in_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)
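
With the backbone frozen, only the new final layer is trained. Its size is easy to work out; the sketch assumes `num_ftrs` is 512, the input width of the final layer in torchvision's ResNet-18, and `linear_params` is a made-up helper name.

```python
def linear_params(in_features, out_features):
    """Number of weights plus biases in a fully connected layer."""
    return in_features * out_features + out_features


num_ftrs = 512                              # assumed value of model_conv.fc.in_features
trainable = linear_params(num_ftrs, 2)      # weights (512 * 2) plus biases (2)

print(trainable)
```

On the actual model, the same count can be obtained with `sum(p.numel() for p in model_conv.parameters() if p.requires_grad)`. Only about a thousand parameters are being fitted here, compared with the whole network in the fine-tuning run.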

In [0]:
model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)

Epoch 0/24
----------
train Loss: 0.3882 Acc: 0.8272
val Loss: 0.3354 Acc: 0.8580

Epoch 1/24
----------
train Loss: 0.3561 Acc: 0.8457
val Loss: 0.3587 Acc: 0.8465

Epoch 2/24
----------
train Loss: 0.3508 Acc: 0.8502
val Loss: 0.3121 Acc: 0.8700

Epoch 3/24
----------
train Loss: 0.3424 Acc: 0.8526
val Loss: 0.3223 Acc: 0.8742

Epoch 4/24
----------
train Loss: 0.3471 Acc: 0.8527
val Loss: 0.3347 Acc: 0.8650

Epoch 5/24
----------
train Loss: 0.3461 Acc: 0.8529
val Loss: 0.3377 Acc: 0.8563

Epoch 6/24
----------
train Loss: 0.3408 Acc: 0.8563
val Loss: 0.3262 Acc: 0.8708

Epoch 7/24
----------
train Loss: 0.3300 Acc: 0.8609
val Loss: 0.3215 Acc: 0.8685

Epoch 8/24
----------
train Loss: 0.3184 Acc: 0.8655
val Loss: 0.3070 Acc: 0.8788

Epoch 9/24
----------
train Loss: 0.3198 Acc: 0.8663
val Loss: 0.3141 Acc: 0.8705

Epoch 10/24
----------
train Loss: 0.3197 Acc: 0.8651
val Loss: 0.3252 Acc: 0.8702

Epoch 11/24
----------
train Loss: 0.3238 Acc: 0.8634
val Loss: 0.3147 Acc: 0.8730

Epoch 12/24
----------
train Loss: 0.3204 Acc: 0.8659
val Loss: 0.3168 Acc: 0.8708

Epoch 13/24
----------
train Loss: 0.3188 Acc: 0.8620
val Loss: 0.3085 Acc: 0.8732

Epoch 14/24
----------
train Loss: 0.3188 Acc: 0.8651
val Loss: 0.3150 Acc: 0.8678

Epoch 15/24
----------
train Loss: 0.3152 Acc: 0.8659
val Loss: 0.3141 Acc: 0.8668

Epoch 16/24
----------
train Loss: 0.3188 Acc: 0.8649
val Loss: 0.3190 Acc: 0.8710

Epoch 17/24
----------
train Loss: 0.3201 Acc: 0.8649
val Loss: 0.3140 Acc: 0.8720

Epoch 18/24
----------
train Loss: 0.3163 Acc: 0.8631
val Loss: 0.3168 Acc: 0.8715

Epoch 19/24
----------
train Loss: 0.3163 Acc: 0.8663
val Loss: 0.3060 Acc: 0.8775

Epoch 20/24
----------
train Loss: 0.3186 Acc: 0.8676
val Loss: 0.3068 Acc: 0.8765

Epoch 21/24
----------
train Loss: 0.3138 Acc: 0.8665
val Loss: 0.3089 Acc: 0.8700

Epoch 22/24
----------
train Loss: 0.3173 Acc: 0.8653
val Loss: 0.3083 Acc: 0.8690

Epoch 23/24
----------
train Loss: 0.3142 Acc: 0.8659
val Loss: 0.3160 Acc: 0.8702

Epoch 24/24
----------
train Loss: 0.3116 Acc: 0.8692
val Loss: 0.3136 Acc: 0.8748

Training complete in 3m 20s
Best val Acc: 0.878750