<a href="https://colab.research.google.com/github/erickmu1/Image-Segmentation/blob/E/ImageSegmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# APS360: Image Segmentation #

### **Team 5** ###
- Bonnie He
- Erick Mejia Uzeda
- Hannah Lee

## Project Description ##

TODO

# Imports + Global Variables #

Here we import all required libraries and define any useful variables.

In [8]:
# Pytorch
import torch
import torch.nn as nn
import torch.optim as optim

# Math
import numpy as np
import matplotlib.pyplot as plt

# Data storage/loading
import os
import pickle
from google.colab import drive

# Global Variables
DRIVE_ROOT = '/content/drive'
COLAB_ROOT = 'My Drive/Colab Notebooks'

ROOT = os.path.join(*[ DRIVE_ROOT, COLAB_ROOT, 'Image-Segmentation' ])

BACKGROUND_ID = 0 # TODO: actually determine a unique ID for 'background'

In [None]:
# Link to Google Drive
drive.mount(DRIVE_ROOT)

Mounted at /content/drive


# Data Loading #

In [None]:
# Code for loading data

# Data Pre-processing #

## Pre-computing Features from other ML models ##

We may have parts of the full ML pipeline implemented and others not. To speed up the training process, it is beneficial to precompute features resulting from a model and save them.

**Note:** Features are saved in such a format that once re-loaded, they can be passed directly into a `DataLoader`.

In [None]:
# Compute and Save Features from 'model'
def save_features(model, data_loader, file_name, dir=ROOT, use_cuda=False):
  features = []

  for inputs, _ in data_loader:  # labels are not needed to compute features
    # Enable CUDA (assumes 'model' has already been moved to the GPU)
    if use_cuda and torch.cuda.is_available():
      inputs = inputs.cuda()
    
    # Compute features
    with torch.no_grad():
      output = model(inputs)

    # Cache resulting features (one entry per sample, moved back to the CPU)
    features.extend(output.cpu())
  
  # Save computed features using pickle
  save_path = os.path.join(dir, file_name + '.pickle')

  with open(save_path, 'wb') as f:
    pickle.dump(features, f)

In [None]:
# Load features
def load_features(file_name, dir=ROOT):
  file_path = os.path.join(dir, file_name + '.pickle')

  # Load features using pickle
  with open(file_path, 'rb') as f:
    features = pickle.load(f)
  
  return features
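
As a rough illustration of the note above, the following sketch pickles a list of per-sample tensors, reloads it, and passes it straight into a `DataLoader`. The toy tensors, the file name `toy_features.pickle`, and the temporary directory are assumptions made for the example; the notebook itself would save under `ROOT`.

```python
import os
import pickle
import tempfile

import torch
from torch.utils.data import DataLoader

# Hypothetical stand-in for the per-sample features collected by save_features
features = [torch.randn(4) for _ in range(10)]

# Save to a temporary directory (assumption: the notebook would use ROOT instead)
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, 'toy_features.pickle')
with open(file_path, 'wb') as f:
  pickle.dump(features, f)

# Reload the features
with open(file_path, 'rb') as f:
  reloaded = pickle.load(f)

# A plain list of tensors supports indexing and len(), so it works as a dataset
loader = DataLoader(reloaded, batch_size=4)
batch_shapes = [tuple(batch.shape) for batch in loader]
print(batch_shapes)
```

Note that features stored this way carry no labels, so each batch contains only the feature tensors.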

## Grouping Background Segments ##

Our dataset has many labels for different categories. Since our goal is to segment non-background items that are *distinct*, we will pre-process the raw segmentation maps to group relevant labels/categories that could be considered as *background*.

The following labels are grouped together as *background*:
- `floor`
- `wall`
- `ceiling`
- `window`

In [None]:
# Group all 'background' segments into one category
def group_background(seg_maps):
  # ASSUME: seg_maps is a np.array with dimensions (num_samples x [image_dims])
  # NOTE: this function modifies seg_maps itself!

  relevant_labels = [ 'floor', 'wall', 'ceiling', 'window' ]
  relevant_ids = [ 0, 1, 2, 3 ]  # TODO: get *actual* numerical IDs

  # Retrieve indices (pixels) of relevant labels
  indices = np.isin(seg_maps, relevant_ids)
  
  # Associate such indices (pixels) as 'background'
  seg_maps[indices] = BACKGROUND_ID
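
To make the grouping step concrete, here is a small sketch of the same `np.isin` masking applied to a toy array. The numerical IDs are placeholders (the real ones are still a TODO above), and the values `5` and `7` are hypothetical foreground objects.

```python
import numpy as np

BACKGROUND_ID = 0            # placeholder, as in the notebook
relevant_ids = [0, 1, 2, 3]  # placeholder IDs for floor, wall, ceiling, window

# One toy 2x3 segmentation map; 5 and 7 stand for foreground objects
seg_maps = np.array([[[1, 2, 5],
                      [3, 7, 7]]])

# Boolean mask of every pixel whose label is one of the background labels
mask = np.isin(seg_maps, relevant_ids)

# Collapse those pixels into a single background ID (modifies seg_maps in place)
seg_maps[mask] = BACKGROUND_ID
print(seg_maps)
```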

## Dealing with Label Permutation and Number of Segments ##

Our model does not care which label is associated with a segment, so we refer to each segment by its segment ID. The particular value of a segment ID is also irrelevant: what determines whether our model is working is whether it can properly distinguish two **distinct** segments. Thus, given an output segmentation map, the same map with its segment IDs permuted is equally valid. Moreover, our model should (in theory) be invariant to more than permutation: any value can be assigned to a segment, as long as distinct segments have distinct IDs.
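
One way to make this notion of equivalence concrete is to check whether two ID maps induce the same partition of the pixels. The sketch below is an illustration only; the helper name `same_partition` is hypothetical and both maps are assumed to have the same shape.

```python
import numpy as np

def same_partition(seg_a, seg_b):
  """Return True if two ID maps describe the same segments up to relabeling."""
  # Pair up the two IDs seen at every pixel
  pairs = np.stack([seg_a.ravel(), seg_b.ravel()], axis=1)
  num_pairs = len(np.unique(pairs, axis=0))

  # The relabeling is one-to-one exactly when each ID in seg_a pairs with a
  # single ID in seg_b and vice versa
  return num_pairs == len(np.unique(seg_a)) == len(np.unique(seg_b))

a = np.array([[0, 0, 1], [2, 2, 1]])
b = np.array([[5, 5, 3], [9, 9, 3]])  # same segments, different IDs
c = np.array([[0, 0, 1], [1, 1, 1]])  # two segments merged into one

print(same_partition(a, b), same_partition(a, c))
```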

Next, the number of segments generally depends on the number of objects in the scene, so it is not known a priori for new samples. This suggests the use of a recurrent architecture, but once again the order in which the segments are generated is not important. In general, the model must account for a variable number of possible segments!

Ideas:
- Use a permutation invariant loss function: https://openreview.net/pdf?id=rJxpuoCqtQ
  - A major issue is that the number of segments produced for an image is not fixed, whereas this paper assumes the number of features `F` is known.
  - Should we not upper bound the outputs and round each pixel value to the nearest integer so as to define its *segment ID*? How sensitive to small weight variations would our model be? How dependent on the segment ID values themselves will the model be?
- Should we use a Recurrent Architecture?
  - That is: for each image
    1. extract the "largest" label
    2. remove all pixels from input that correspond to the segment meant to be removed
    3. feed in new image and repeat from step 1.
  - Note that we will need to either pre-process many images during training (which will make it slow) or cache multiple variants of an image which remove one segment at a time (can grow large if image has many segments)
    - Also need to pre-process and "order" segments by how "big" they are (need to compute pixel area of each segment and order them before making "inputs" that have the relevant segments removed)
  - We could possibly apply a *Recurrent Pipeline* to our baseline model (k-means) too!
- Say we generate all segments from the image in one-pass of the model (simple autoencoder architecture)
  - How do we deal with segment ID permutations (and even just different IDs in general, as long as the correct distinct segments have distinct IDs)?
  - How do we encode each segment ID?
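
The pre-processing needed for the recurrent idea above (order segments by pixel area, then remove one segment at a time) could be sketched as follows. This is only an illustration: the helper name `peel_segments`, the channels-last image layout, and the choice to zero out removed pixels are all assumptions.

```python
import numpy as np

def peel_segments(image, seg_map, background_id=0):
  """Return a list of (input_image, target_mask) pairs, largest segment first."""
  # Pixel area of every non-background segment
  ids, counts = np.unique(seg_map, return_counts=True)
  keep = ids != background_id
  ordered = ids[keep][np.argsort(-counts[keep], kind='stable')]

  current = image.copy()
  steps = []
  for seg_id in ordered:
    mask = seg_map == seg_id
    steps.append((current.copy(), mask))
    current[mask] = 0  # assumption: removed pixels are zeroed out

  return steps

image = np.ones((2, 3, 4))  # toy RGBD image, shape (H, W, 4)
seg_map = np.array([[1, 1, 2],
                    [0, 1, 2]])
steps = peel_segments(image, seg_map)
print(len(steps), [int(mask.sum()) for _, mask in steps])
```

Caching every intermediate input, as done here, is what makes the storage cost grow with the number of segments per image.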

**KEY IDEA:** What if we followed the convention that the largest segment (other than the background) must have the largest (smallest?) ID? Could this additional rule be learned by the model?
- If yes, then we resolved the issue of segment ID permutation
  - What happens for segments of comparable size? (to think about later)
- Could we then also apply *integer thresholding* (or some other thresholding) to identify each distinct segment? This would avoid the need to use a recurrent architecture
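
A minimal sketch of such a size-based convention is given below. It assumes the *smallest* IDs go to the largest segments, maps the background to `0`, and breaks ties by the original ID order; the helper name `relabel_by_size` is hypothetical and none of these choices are settled.

```python
import numpy as np

def relabel_by_size(seg_map, background_id=0):
  """Relabel so that the largest non-background segment gets ID 1, the next 2, ..."""
  ids, counts = np.unique(seg_map, return_counts=True)
  keep = ids != background_id
  ids, counts = ids[keep], counts[keep]

  # Largest first; the stable sort keeps the original ID order for equal sizes
  order = np.argsort(-counts, kind='stable')

  out = np.zeros_like(seg_map)  # background is mapped to 0
  for new_id, old_id in enumerate(ids[order], start=1):
    out[seg_map == old_id] = new_id
  return out

seg_map = np.array([[7, 7, 7],
                    [3, 0, 0],
                    [3, 9, 0]])
canonical = relabel_by_size(seg_map)
print(canonical)
```

With this convention, any two ground-truth maps that differ only by a permutation of segment IDs are mapped to the same target, except when segments of equal size are labeled in a different order.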

In [None]:
# Code (if any) for pre-processing dataset to account for segment ID permutation

## Data Augmentation (Optional) ##

See if we can apply homographies on RGBD + ground truth segmentation map and if we can add noise to the RGBD images.
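
For the noise part, one possible starting point is additive Gaussian noise with separate strengths for colour and depth. The sketch below assumes a channels-last `(H, W, 4)` array with RGB values in `[0, 1]`; the helper name `add_gaussian_noise` and the default standard deviations are hypothetical.

```python
import numpy as np

def add_gaussian_noise(rgbd, rgb_std=0.02, depth_std=0.01, seed=None):
  """Return a noisy copy of an (H, W, 4) RGBD image; the input is left untouched."""
  rng = np.random.default_rng(seed)
  noisy = rgbd.astype(np.float64)  # astype returns a copy

  # Independent noise for the colour channels and the depth channel
  noisy[..., :3] += rng.normal(0.0, rgb_std, size=noisy[..., :3].shape)
  noisy[..., 3] += rng.normal(0.0, depth_std, size=noisy[..., 3].shape)

  # Assumption: RGB is normalized to [0, 1], so clip it back into range
  noisy[..., :3] = np.clip(noisy[..., :3], 0.0, 1.0)
  return noisy

rgbd = np.full((2, 3, 4), 0.5)
noisy = add_gaussian_noise(rgbd, seed=0)
print(noisy.shape)
```

Because noise does not move pixels, the ground truth segmentation map can be reused unchanged, unlike with a homography.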

In [None]:
# Code for homographic transformation

In [None]:
# Code for adding noise to {RGB, Depth}

# Baseline model: k-means Algorithm #

In [None]:
# Code for k-means algorithm

# Training Pipeline #

In [None]:
# Code for the training pipeline

## Compute Accuracy ##

Computes the accuracy of a model given a `DataLoader`. Note that the notion of accuracy is not straightforward for tasks other than classification and thus must be handled with care as we attempt to quantify how well our models perform!
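
As one illustration of why care is needed, a per-pixel comparison of segment IDs would penalize a correct segmentation that merely uses different IDs. A label-agnostic alternative is to score each ground truth segment by its best intersection-over-union (IoU) with any predicted segment. The sketch below is only one possible choice, not a metric this project has settled on, and the helper name `mean_best_iou` is hypothetical.

```python
import numpy as np

def mean_best_iou(pred, target):
  """Average, over ground truth segments, of the best IoU with a predicted segment."""
  scores = []
  for t_id in np.unique(target):
    t_mask = target == t_id
    best = 0.0

    # Only predicted segments that overlap this ground truth segment can score
    for p_id in np.unique(pred[t_mask]):
      p_mask = pred == p_id
      inter = np.logical_and(t_mask, p_mask).sum()
      union = np.logical_or(t_mask, p_mask).sum()
      best = max(best, inter / union)

    scores.append(best)
  return float(np.mean(scores))

target = np.array([[0, 0, 1], [2, 2, 1]])
relabeled = np.array([[5, 5, 3], [9, 9, 3]])  # same segments, different IDs
merged = np.full((2, 3), 4)                   # everything in one segment

print(mean_best_iou(relabeled, target), mean_best_iou(merged, target))
```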

In [11]:
def get_classifier_accuracy(model, data_loader, use_cuda=False):
  correct, total = 0, 0

  for inputs, targets in data_loader:
    # Enable CUDA
    if use_cuda and torch.cuda.is_available():
      inputs = inputs.cuda()
      targets = targets.cuda()
    
    # Compute prediction (no gradients are needed for evaluation)
    with torch.no_grad():
      outputs = model(inputs)
    preds = outputs.max(1, keepdim=True)[1]

    # Tally
    correct += preds.eq(targets.view_as(preds)).sum().item()
    total += inputs.size(0)
  
  return float(correct) / total

## Train Models ##

The training pipeline is largely the same across feedforward classifier models. Functions for training will wrap any extra functionality needed to accommodate model requirements.

In [None]:
# TODO: add model defined hyperparameters
def model_path(model, batch_size, learning_rate, epoch, root=''):
  path = 'model_{0}_bs{1}_lr{2}_epoch{3}'.format(model.name,
                                                 batch_size,
                                                 learning_rate,
                                                 epoch)
  return os.path.join(root, path)

In [12]:
def train_classifier(model, train_loader, val_loader, use_cuda=False, 
                     learning_rate=1e-3, num_epochs=20):
  torch.manual_seed(1000)

  # Optimization settings
  criterion = nn.CrossEntropyLoss()
  optimizer = optim.Adam(model.parameters(), lr=learning_rate)

  # Cache loss/accuracy
  train_loss, val_loss = [], []
  train_acc, val_acc = [], []

  print('Training Start!')

  for epoch in range(num_epochs):
    ### TRAIN model ###
    running_loss = 0

    for inputs, targets in train_loader:
      # Enable CUDA
      if use_cuda and torch.cuda.is_available():
        inputs = inputs.cuda()
        targets = targets.cuda()
      
      # Forward pass
      outputs = model(inputs)
      loss = criterion(outputs, targets)

      # Backward pass
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

      # Update running loss
      running_loss += loss.item() * inputs.size(0)
    
    train_loss.append(running_loss / len(train_loader.dataset))

    ### Compute VALIDATION loss ###
    running_loss = 0

    for inputs, targets in val_loader:
      # Enable CUDA
      if use_cuda and torch.cuda.is_available():
        inputs = inputs.cuda()
        targets = targets.cuda()
      
      # Forward pass
      with torch.no_grad():
        outputs = model(inputs)
      loss = criterion(outputs, targets)

      # Update running loss
      running_loss += loss.item() * inputs.size(0)
    
    val_loss.append(running_loss / len(val_loader.dataset))

    # Compute accuracy
    train_acc.append(get_classifier_accuracy(model, train_loader, use_cuda))
    val_acc.append(get_classifier_accuracy(model, val_loader, use_cuda))

    print('Epoch {} (accuracy): {} train, {} val'.format(epoch, train_acc[-1], val_acc[-1]))

    # Checkpoint model
    # Q: do we want to save checkpoints to our Drive?
    path = model_path(model, train_loader.batch_size, learning_rate, epoch)
    torch.save(model.state_dict(), path)

  ### Plot Training Curves ###
  fig, axs = plt.subplots(2)

  # Loss
  axs[0].plot(train_loss)
  axs[0].plot(val_loss)

  axs[0].legend(('Train', 'Validation'))
  axs[0].set_title('Train / Validation Loss')
  axs[0].set(xlabel='Epochs', ylabel='Loss')

  # Accuracy
  axs[1].plot(train_acc)
  axs[1].plot(val_acc)

  axs[1].legend(('Train', 'Validation'))
  axs[1].set_title('Train / Validation Accuracy')
  axs[1].set(xlabel='Epochs', ylabel='Accuracy')

  plt.tight_layout()

# Model Implementation #

In [None]:
# Code for implementing the proposed model