# **Image Classification using Hierarchical Layers**
Marcus Karr, Teja Kalavakolanu, Nagasai Chandra, Roxanne Miller, Justin Morgan


---





**Attention**: We refactored much of the code to make it cleaner and to fix issues with misaligned cells, redefined functions, and mutated data. The original, more notebook-style workspace version with pictures is [here](https://drive.google.com/open?id=19iPc5Gq0hg3yVuc2ItWfiTftXflM-eaN). Be aware that it is not currently set up for all cells to run in sequence.


**Overview:**
This project aims to improve image classification by using layers that encode hierarchical information. Traditional convolutional neural networks are trained as flat N-way classifiers, which ignores the hierarchical structure of the categories. The new architecture takes the output of ResNet18 and feeds it into two separate linear layers. The output of one of these layers passes through subsequent layers and is used to predict the parent class. The output of the other is combined with the parent class prediction in order to predict the child class.

The goal of the project is to train the neural network on a variety of parent and child classes so that it can easily learn new child classes. To add a new child class, we would retrain only a few layers on a minimal number of images. Ideally, we could take a few pictures of someone's personal item and retrain our network in a relatively short time to recognize that specific item. We believe that this architecture can achieve higher accuracy on child classes than current architectures do. We also expect to speed up training and to reach high accuracy from a small number of training images, that is, with good sample efficiency.
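
One way to picture "retraining only a few layers" for a new child class is to widen the final child layer by one output while keeping the weights already learned for the existing classes. The sketch below is an assumption about how this could be done, not the project's implementation; `extend_linear` is a hypothetical helper, and the 640-to-37 shape is taken from the final child layer defined in this notebook.

```python
import torch
import torch.nn as nn

def extend_linear(old: nn.Linear, n_new: int = 1) -> nn.Linear:
    """Return a copy of `old` with `n_new` extra output units (hypothetical helper)."""
    new = nn.Linear(old.in_features, old.out_features + n_new)
    with torch.no_grad():
        # Keep the rows learned for the existing classes;
        # the rows for the new classes keep their random initialization.
        new.weight[:old.out_features] = old.weight
        new.bias[:old.out_features] = old.bias
    return new

old_head = nn.Linear(640, 37)       # same shape as the final child layer
new_head = extend_linear(old_head)  # 38 outputs: 37 known breeds + 1 new class

x = torch.randn(4, 640)
# Scores for the original 37 classes are unchanged by the extension.
same = torch.allclose(old_head(x), new_head(x)[:, :37], atol=1e-6)
print(new_head.out_features, same)
```

Only `new_head` would then need to be passed to the optimizer when fine-tuning on the handful of new images.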

The architecture can be seen using the links in the "Changes in design" section.

**Technologies, Tools, Languages, Environment:** 
We use PyTorch to build the proposed neural network architecture. PyTorch is a machine learning library that makes it easy to implement a variety of neural network architectures. In addition, we use the FastAI library to obtain data. FastAI is built on PyTorch and abstracts away some of its more technical details. So far, we have primarily used it to get the images of cats and dogs that this prototype model was trained on. In the future we may use it to find additional datasets when expanding our model to recognize more than just dogs and cats. We run the model in the Google Colab environment. The code is written in Python, the language PyTorch is built on, and also uses Python libraries such as matplotlib and numpy. An NVIDIA CUDA GPU is used to train the model.

**User interaction**
Run all the cells in the Colab notebook in order to execute the code and fetch the results. If downloading the ResNet18 weights fails, download them manually and upload them to Colab to continue. The final demo might be a web app where you can enter a subclass of an object along with its label and run the model. The model should be able to learn the new subclass from fewer images, thereby achieving sample efficiency as well as higher accuracy. Training the model requires a GPU, but a CPU is sufficient for making predictions. The minimum RAM required is 2 GB.
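
For CPU-only prediction, weights saved on a GPU machine can be mapped to the CPU when they are loaded. The following is a minimal sketch under stated assumptions: the `nn.Linear` model is a stand-in for the real network, and `weights.pth` is a hypothetical file name.

```python
import os
import tempfile
import torch
import torch.nn as nn

# Stand-in for the real network; the notebook's model would be used instead.
model = nn.Linear(8, 2)

# Save a state dict, then reload it with every tensor mapped to the CPU.
path = os.path.join(tempfile.mkdtemp(), 'weights.pth')  # hypothetical file name
torch.save(model.state_dict(), path)
state_dict = torch.load(path, map_location='cpu')

cpu_model = nn.Linear(8, 2)
cpu_model.load_state_dict(state_dict)
cpu_model.eval()                  # inference mode for batchnorm / dropout
with torch.no_grad():             # no gradients are needed for prediction
    pred = cpu_model(torch.randn(3, 8)).argmax(dim=1)
print(pred.shape)
```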

**Changes in design**

Our old system architecture can be seen here:
https://drive.google.com/open?id=1XiwC_Q1bpNBWvgcXEqAOed1ArGzv_g41

Our new system architecture can be seen here:
https://drive.google.com/open?id=1rRdKpSmX46py3v1-CTqjtOrDXQcA25hv

The major difference between our initial and current designs is that in the initial design the layers were not the right shapes to be combined for predicting the subclasses. We added extra layers to make the shapes compatible without losing information from preceding layers.
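
The shape constraint can be checked with a few lines of PyTorch. This is only an illustrative sketch: the layer sizes follow the ones used in this notebook (512 backbone features, 2 parent classes, 37 child classes), and the random tensor stands in for real ResNet18 features.

```python
import torch
import torch.nn as nn

batch = 4
features = torch.randn(batch, 512)   # stand-in for pooled ResNet18 features

fc = nn.Linear(512, 512)             # child branch
fcp = nn.Linear(512, 2)              # parent prediction
fcp2 = nn.Linear(2, 128)             # expands the parent prediction
fc2 = nn.Linear(640, 37)             # final child prediction

child = fc(features)                              # (4, 512)
parent_logits = fcp(features)                     # (4, 2)
parent = fcp2(parent_logits)                      # (4, 128)
both = torch.cat((child, parent), dim=1)          # (4, 640) = 512 + 128
child_logits = fc2(both)                          # (4, 37)
print(both.shape, child_logits.shape)
```

Concatenation along `dim=1` only requires the batch dimensions to match, which is why the extra 2-to-128 layer can be sized freely as long as the final layer accepts the combined width.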

# Hierarchical classifier first steps

Let's get a CNN up and running. We want to load a pretrained model as the *backbone*, specify a custom classifier as the *head*, and fine-tune the classifier on our data. We use the following libraries in this project: numpy, matplotlib, torch, and torchvision.

In [0]:
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch import tensor
from torchvision import datasets, models

np.random.seed(2) # for reproducibility

## Get the data

Load the images and labels. The data comes from one of the datasets bundled with fastai (`URLs.PETS`). It contains images of 37 breeds of dogs and cats.

In [0]:
def load_data():
    from fastai.vision import untar_data, URLs, get_image_files
    from PIL import Image
    path = untar_data(URLs.PETS)
    path_img = path/'images'
    filenames = get_image_files(path_img)
    images = [Image.open(filename).convert('RGB') for filename in filenames]

    def get_labels(filenames):
        import re
        pat = r'([^/]+)_\d+\.jpg$'  # escape the dot so it matches a literal '.'
        pat = re.compile(pat)
        return [pat.search(str(name)).group(1) for name in filenames]

    return images, get_labels(filenames)
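
To see what the label pattern extracts, it can be tried on a couple of paths. The paths below are hypothetical examples written in the style of the dataset's file names, not files read from disk.

```python
import re

pat = re.compile(r'([^/]+)_\d+\.jpg$')

# Hypothetical paths in the style of the dataset's file names.
paths = ['images/Abyssinian_12.jpg', 'images/german_shorthaired_105.jpg']
labels = [pat.search(p).group(1) for p in paths]
print(labels)  # ['Abyssinian', 'german_shorthaired']
```

The group captures everything between the last `/` and the trailing `_<number>.jpg`, so breed names that themselves contain underscores are kept whole.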


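We wrote a custom splitter because `sklearn`'s version was using a lot of RAM for no apparent reason. Note that it splits by position without shuffling, so the validation set is simply the last `valid_pct` of the data.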

In [0]:
def train_valid_split(x, y, valid_pct=0.33):
    assert len(x) == len(y), 'len(x) != len(y)'
    cutoff = int((1-valid_pct) * len(x))
    x_train, y_train = x[:cutoff], y[:cutoff]
    x_valid, y_valid = x[cutoff:], y[cutoff:]
    return x_train, y_train, x_valid, y_valid
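
If the input order turns out not to be well mixed (for example, files grouped by breed), a positional split would leave some classes out of one of the sets. A possible variant that permutes the indices first is sketched below; `shuffled_train_valid_split` and its `seed` parameter are hypothetical and not part of the original notebook.

```python
import random

def shuffled_train_valid_split(x, y, valid_pct=0.33, seed=2):
    """Hypothetical variant of the splitter that permutes the indices first."""
    assert len(x) == len(y), 'len(x) != len(y)'
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)            # reproducible permutation
    cutoff = int((1 - valid_pct) * len(x))
    train_idx, valid_idx = idx[:cutoff], idx[cutoff:]
    x_train, y_train = [x[i] for i in train_idx], [y[i] for i in train_idx]
    x_valid, y_valid = [x[i] for i in valid_idx], [y[i] for i in valid_idx]
    return x_train, y_train, x_valid, y_valid

x = list(range(100))
y = [i % 2 for i in x]
x_train, y_train, x_valid, y_valid = shuffled_train_valid_split(x, y)
print(len(x_train), len(x_valid))
```

Because only a list of integer indices is shuffled, the memory overhead stays small even when `x` holds full images.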

Now we process the images and transform them into tensors. The train and validation sets will be treated slightly differently.

In [0]:
def transform_images(x_train, x_valid):
    from torchvision import transforms

    def transform_train_images(x_train):
        mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
        transform = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize(mean, std)
        ])
        return torch.stack([transform(image) for image in x_train]).cuda()

    def transform_valid_images(x_valid):
        mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
        transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean, std)
        ])
        return torch.stack([transform(image) for image in x_valid]).cuda()

    return transform_train_images(x_train), transform_valid_images(x_valid)
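
The `Normalize` step subtracts a per-channel mean and divides by a per-channel standard deviation. To display a transformed image with matplotlib, that step has to be undone first. A minimal sketch follows, using the same mean and std values as above; `normalize` and `denormalize` are illustrative helper names, and the random tensor stands in for a real image.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(img):
    # img: (3, H, W) tensor with values in [0, 1]
    return (img - mean) / std

def denormalize(img):
    # Inverse of normalize, e.g. before plt.imshow(img.permute(1, 2, 0))
    return img * std + mean

img = torch.rand(3, 224, 224)       # stand-in for a real image tensor
restored = denormalize(normalize(img))
print(torch.allclose(img, restored, atol=1e-5))
```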

Now convert class names to ints.

In [0]:
def names_to_ints(y_train, y_valid, class_names):
    class_dict = dict((class_name, idx) for idx, class_name in enumerate(class_names))

    def _names_to_ints(class_dict, labels):
        tuples = []
        for label in labels:
            breed_int = class_dict[label]
            kind_int = 0 if label[0].isupper() else 1 # 0 for cats, 1 for dogs
            tuples.append((breed_int, kind_int))
        return torch.tensor(tuples).cuda()
    
    return _names_to_ints(class_dict, y_train), _names_to_ints(class_dict, y_valid)
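
The encoding relies on the dataset's naming convention, noted in the code comment above: labels starting with an uppercase letter are cats and the rest are dogs. The pure-Python sketch below shows the resulting `(breed_int, kind_int)` pairs on a small, hypothetical subset of class names, without the tensor conversion.

```python
# Hypothetical subset of breed names, following the dataset's capitalization.
class_names = sorted({'Bengal', 'beagle', 'Abyssinian', 'pug'})
class_dict = {name: idx for idx, name in enumerate(class_names)}

def encode(label):
    kind_int = 0 if label[0].isupper() else 1   # 0 for cats, 1 for dogs
    return class_dict[label], kind_int

pairs = [encode(label) for label in ['pug', 'Bengal', 'beagle']]
print(class_names)  # ['Abyssinian', 'Bengal', 'beagle', 'pug']
print(pairs)        # [(3, 1), (1, 0), (2, 1)]
```

Since uppercase letters sort before lowercase ones, sorting the class names also places all cat breeds before all dog breeds.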

`DataLoader` manages batches and does the shuffling for us. Since we don't backpropagate on the validation set, we don't need to store gradients for it. That leaves roughly twice the memory available, so we can double the validation batch size and speed up each epoch.

In [0]:
def get_dataloaders(x_train, y_train, x_valid, y_valid, batchsize=64):
    from torch.utils.data import TensorDataset, DataLoader
    bs = batchsize
    train_ds = TensorDataset(x_train, y_train)
    valid_ds = TensorDataset(x_valid, y_valid)
    train_dl = DataLoader(train_ds, bs, shuffle=True, drop_last=True, pin_memory=False)
    valid_dl = DataLoader(valid_ds, bs*2, shuffle=False, pin_memory=False)
    return train_dl, valid_dl
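
The effect of `drop_last` and of the doubled validation batch size can be seen on a small synthetic dataset. This is only a sketch: the 200 random rows are placeholders for the image tensors.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

x = torch.randn(200, 3)                      # placeholder data
y = torch.zeros(200, dtype=torch.long)
ds = TensorDataset(x, y)

train_dl = DataLoader(ds, 64, shuffle=True, drop_last=True)   # drops the partial batch
valid_dl = DataLoader(ds, 128, shuffle=False)                 # keeps the partial batch

print(len(train_dl), len(valid_dl))             # 3 2
print([xb.shape[0] for xb, _ in train_dl])      # [64, 64, 64]
print([xb.shape[0] for xb, _ in valid_dl])      # [128, 72]
```

With `drop_last=True` every training batch has exactly 64 rows, which matters for any code that assumes a fixed batch size.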

In [0]:
class DataBunch:
    def __init__(self, train_dl, valid_dl, class_names):
        self.train_dl = train_dl
        self.valid_dl = valid_dl
        self.class_names = class_names

def get_data():
    images, labels = load_data()
    class_names = sorted(set(labels))
    x_train, y_train, x_valid, y_valid = train_valid_split(images, labels)
    x_train, x_valid = transform_images(x_train, x_valid)
    y_train, y_valid = names_to_ints(y_train, y_valid, class_names)
    train_dl, valid_dl = get_dataloaders(x_train, y_train, x_valid, y_valid)
    return DataBunch(train_dl, valid_dl, class_names)

data = get_data()

### Load and run the model

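This training loop is adapted from FastAI's Lesson 8 notebook. It is very simple and lacks regularization and visualization tools. The training loss sums the cross-entropy of the child and parent predictions and adds a KL-divergence term between the child prediction and a target distribution selected by the parent label.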

In [0]:
def accuracy(out, yb): return (torch.argmax(out, dim=1)==yb).float().mean()

def fit(epochs, model, loss_func, opt, train_dl, valid_dl): 
    for epoch in range(epochs):
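        # Handle batchnorm / dropout here
        model.train()
        for images, labels in train_dl:
            # Tensor.to() returns a new tensor, so the result must be assigned
            images = images.to(device)
            child_labels = labels[:, 0].to(device)
            parent_labels = labels[:, 1].to(device)
            child_pred, parent_pred = model(images)

            # dogs64 / cats64 are not defined in this notebook; presumably they
            # are (64, 37) target distributions over breeds for dogs and cats.
            dist = torch.t(torch.stack(37*[parent_labels]))
            dist = torch.where(dist > 0, dogs64, cats64)

            # kl_div expects log-probabilities as input, so convert the logits
            loss = (loss_func(child_pred, child_labels) +
                    loss_func(parent_pred, parent_labels) +
                    F.kl_div(F.log_softmax(child_pred, dim=1), dist,
                             reduction='batchmean'))
            loss.backward()
            opt.step()
            opt.zero_grad()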

        model.eval()
        with torch.no_grad():
            tot_loss, child_acc, parent_acc = 0.,0.,0.
            for images, labels in valid_dl:
                child_labels = labels[:, 0]
                parent_labels = labels[:, 1]
                
                images = images.to(device)
                child_labels = child_labels.to(device)
                parent_labels = parent_labels.to(device)
                child_pred, parent_pred = model(images)

                loss = (loss_func(child_pred, child_labels) +
                        loss_func(parent_pred, parent_labels))

                tot_loss   += loss
                child_acc  += accuracy(child_pred, child_labels)
                parent_acc += accuracy(parent_pred, parent_labels)
        nv = len(valid_dl)
        print(epoch, tot_loss/nv, child_acc/nv, parent_acc/nv)
    return tot_loss/nv, child_acc/nv, parent_acc/nv
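
The training loop refers to `dogs64` and `cats64`, which are not defined in this notebook. One plausible reading, and it is only an assumption, is that they hold a target distribution over child classes for each parent: uniform over dog breeds for dogs and uniform over cat breeds for cats. The sketch below builds such targets for a small hypothetical class list and computes the KL term from log-probabilities; `dog_prior` and `cat_prior` are illustrative names.

```python
import torch
import torch.nn.functional as F

# Hypothetical class list; uppercase initials mark cat breeds, as in names_to_ints.
class_names = ['Abyssinian', 'Bengal', 'beagle', 'boxer', 'pug']
is_dog = torch.tensor([not name[0].isupper() for name in class_names])

# Uniform distribution over the breeds that belong to each parent class.
dog_prior = is_dog.float() / is_dog.sum()
cat_prior = (~is_dog).float() / (~is_dog).sum()

batch = 4
parent_labels = torch.tensor([1, 0, 1, 0])          # 1 = dog, 0 = cat
# One target row per sample, chosen by its parent label: (batch, n_classes).
dist = torch.where(parent_labels[:, None] > 0, dog_prior, cat_prior)

child_pred = torch.randn(batch, len(class_names))   # stand-in for model logits
# F.kl_div expects log-probabilities as input and probabilities as target.
kl = F.kl_div(F.log_softmax(child_pred, dim=1), dist, reduction='batchmean')
print(dist.shape, kl.item())
```

Broadcasting the priors against `parent_labels[:, None]` avoids hard-coding the batch size of 64, so the same code would also work on a final partial batch.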

### Defining the model

Let's start by modifying torchvision's ResNet18 so that it outputs two predictions instead of one: child and parent.

In [0]:
# Source:
# https://pytorch.org/docs/stable/_modules/torchvision/models/resnet.html#resnet18
from torchvision.models.resnet import ResNet, BasicBlock
#from torch.hub import load_state_dict_from_url

class HierResNet(ResNet):  
    def __init__(self, block=BasicBlock, layers=[2,2,2,2], **kwargs):
        super().__init__(block, layers, **kwargs)
        # these lines no longer work because SSL certificate issues
        #URL = 'https://download.pytorch.org/models/resnet18-5c106cde.pth'
        #state_dict = load_state_dict_from_url(URL, progress=True)
        PATH = 'resnet18-5c106cde.pth'
        self.load_state_dict(torch.load(PATH))
        
        # freeze the CNN
        for param in self.parameters():
            param.requires_grad = False
        
        num_ftrs = self.fc.in_features
        
        # child layers
        self.fc = nn.Linear(num_ftrs, 512) # child branch
        self.fc2 = nn.Linear(640, 37)  # 512 child + 128 parent features in, 37 breeds out
        
        # parent layers
        self.fcp = nn.Linear(num_ftrs, 2) # parent class
        self.fcp2 = nn.Linear(2, 128)

    def _forward_impl(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)

        # child branch
        child = self.fc(x)               # 512 outs
        
        # parent branch
        parent_res = self.fcp(x)         # 2 outs
        parent = self.fcp2(parent_res)   # 128 outs
        
        # merge branches
        both = torch.cat((child, parent), 1)
        child_res = self.fc2(both)       # 37 outs

        return child_res, parent_res

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = HierResNet()

# load onto GPU
model = model.to(device)

# define loss
loss_func = F.cross_entropy

# only update weights in fully connected layers
params = []
layers_to_update = [model.fc, model.fc2, model.fcp, model.fcp2]
for layer in layers_to_update:
    params.extend(list(layer.parameters()))
optimizer = optim.Adam(params)
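
Because the new layers are created after the backbone is frozen, they are the only parameters with `requires_grad=True`. The sketch below checks this on a tiny stand-in model (not the real ResNet18) and shows an alternative way to collect the trainable parameters by filtering on `requires_grad`.

```python
import torch.nn as nn
import torch.optim as optim

# Tiny stand-in for a pretrained backbone plus a new head (not the real ResNet18).
backbone = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False       # freeze the pretrained part

head = nn.Linear(8, 3)                # created after freezing, so it stays trainable
model = nn.Sequential(backbone, head)

trainable = [p for p in model.parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(n_trainable, n_total)           # 27 163

optimizer = optim.Adam(trainable)     # only the head's weights will be updated
```

Filtering on `requires_grad` gives the same parameter set as listing the head layers by hand, and it does not need updating when a layer is added.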

In [0]:
loss,child_acc,parent_acc = fit(5, model, loss_func, optimizer, data.train_dl, data.valid_dl)

In [0]:
for images, labels in data.train_dl:
    child_labels = labels[:, 0]
    parent_labels = labels[:, 1]

    print(child_labels)
    print(parent_labels)

    images = images.to(device)
    child_labels = child_labels.to(device)
    parent_labels = parent_labels.to(device)
    child_pred, parent_pred = model(images)
    print(child_pred[0])
    print(parent_pred[0])
    break

In [0]:
print(child_pred[1])
print(parent_pred[1])

In [0]:
for images, labels in data.train_dl:
    child_labels = labels[:, 0].to(device)
    parent_labels = labels[:, 1].to(device)
    images = images.to(device)
    child_pred, parent_pred = model(images)

    # dogs64 / cats64 are not defined in this notebook
    dist = torch.t(torch.stack(37*[parent_labels]))
    dist = torch.where(dist > 0, dogs64, cats64)

    # kl_div expects log-probabilities as input; keep per-element values for inspection
    loss = F.kl_div(F.log_softmax(child_pred, dim=1), dist, reduction='none')
    print(loss)
    break

In [0]:
child_labels[0]

In [0]:
child_pred[0]

In [0]:
loss[0]

In [0]:
child_pred[0].softmax(0)

In [0]:
child_pred[0]

In [0]:
child_pred[0].log_softmax(0)
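
The last three cells compare the raw logits with their `softmax` and `log_softmax`. The relationship between them can be checked on a small hand-written vector; the values are arbitrary and used only for illustration.

```python
import torch

logits = torch.tensor([2.0, 0.5, -1.0])   # arbitrary example values
probs = logits.softmax(0)
log_probs = logits.log_softmax(0)

print(probs.sum())                                    # sums to 1 up to rounding
print(torch.allclose(log_probs, probs.log()))         # True
print(torch.equal(probs.argmax(), logits.argmax()))   # True
```

`log_softmax` is the form that `F.kl_div` expects as its input, while `F.cross_entropy` takes the raw logits and applies `log_softmax` internally.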