# Assignment 4 - Part2

In this assignment you will train a semantic segmentation module, and then come up with your own segmentation model.

You can refer to https://github.com/CSAILVision/semantic-segmentation-pytorch for more code.


# Setup Code

First, download the miniplaces images folder (as we did for the last assignment). Zip it, upload it to the assignment folder, and unzip it below.
(Unzipping takes about 5 minutes.)

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
import os

# TODO: Fill in the Google Drive path where you uploaded the assignment
# Example: If you create a 188 folder and put all the files under Assignment1 folder, then '188/Assignment1'
# GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = '188/Assignment1'
GOOGLE_DRIVE_PATH_AFTER_MYDRIVE = '188/Assignment4'
GOOGLE_DRIVE_PATH = os.path.join('drive', 'My Drive', GOOGLE_DRIVE_PATH_AFTER_MYDRIVE)
print(os.listdir(GOOGLE_DRIVE_PATH))

In [None]:
import sys
sys.path.append(GOOGLE_DRIVE_PATH)

Now we are going to untar the annotations and images archives. Don't worry! This time the files are much smaller.

In [None]:
!tar -xvf "/content/drive/My Drive/188/Assignment4/annotations.tar.xz" -C "/content/drive/My Drive/188/Assignment4/"

In [None]:
!tar -xvf "/content/drive/My Drive/188/Assignment4/images.tar.xz" -C "/content/drive/My Drive/188/Assignment4/"

### Train ResNet + UPerNet
First, fill in the code in model.py, then train the semantic segmentation module.
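
The training cell below uses `nn.NLLLoss(ignore_index=-1)`, which expects log-probabilities rather than raw scores, so the decoder output used for training presumably needs a log-softmax over the class dimension. The following is a minimal NumPy sketch of that computation, not the PyTorch implementation; the helper names `log_softmax` and `nll_loss` are made up for illustration, and returning 0.0 when every pixel is ignored is a choice of this sketch.

```python
import numpy as np

def log_softmax(logits, axis=0):
    """Numerically stable log-softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def nll_loss(log_probs, target, ignore_index=-1):
    """Mean negative log-likelihood over pixels whose label is not ignored.

    log_probs: (C, H, W) log-probabilities; target: (H, W) integer labels.
    """
    valid = target != ignore_index
    if not valid.any():
        return 0.0  # sketch-only choice for the all-ignored case
    rows, cols = np.nonzero(valid)
    # Pick the log-probability of the true class at every labeled pixel.
    picked = log_probs[target[rows, cols], rows, cols]
    return float(-picked.mean())

# Toy example: 3 classes on a 2x2 image, one pixel unlabeled (-1).
logits = np.zeros((3, 2, 2))
target = np.array([[0, 1], [2, -1]])
loss = nll_loss(log_softmax(logits), target)
print(loss)
```

With uniform scores over three classes, every labeled pixel contributes log(3), and the unlabeled pixel does not affect the average.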

In [None]:
import os
import time
import random

from model import *
from dataset import *
from train import *
import torch
import torch.nn as nn
from tqdm import tqdm

In [None]:
net_encoder = Resnet().cuda()
net_decoder = UPerNet().cuda()

crit = nn.NLLLoss(ignore_index=-1)

dataset_train = ADEDataset(GOOGLE_DRIVE_PATH, 'training')
dataset_val = ADEDataset(GOOGLE_DRIVE_PATH, 'validation')

train_dataloader = torch.utils.data.DataLoader(
    dataset_train,
    batch_size=8,
    shuffle=True,
    num_workers=2)

val_dataloader = torch.utils.data.DataLoader(
    dataset_val,
    batch_size=8,
    shuffle=False,  # validation metrics do not depend on sample order
    num_workers=2)

optimizer_encoder = torch.optim.SGD(
    group_weight(net_encoder),
    lr=0.02,
    momentum=0.9,
    weight_decay=1e-4)

optimizer_decoder = torch.optim.SGD(
    group_weight(net_decoder),
    lr=0.02,
    momentum=0.9,
    weight_decay=1e-4)

for epoch in range(10):  # choose a smaller number if you are short on GPU time
    loss, acc = train_seg(train_dataloader, crit, net_encoder, net_decoder, optimizer_encoder, optimizer_decoder)

    print("Epoch %d, training acc %f, training loss %f" % (epoch, acc, loss))

    val_loss, val_acc = val_seg(val_dataloader, crit, net_encoder, net_decoder)
    print("Epoch %d, validation acc %f, validation loss %f" % (epoch, val_acc, val_loss))

I can get 64% validation accuracy. What about you?
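
The accuracy reported here is presumably pixel accuracy, but `train_seg` and `val_seg` live in train.py and are not shown, so the following is only a sketch of one common definition: the fraction of labeled pixels whose predicted class matches the ground truth, with pixels labeled -1 excluded. The function name `pixel_accuracy` is hypothetical.

```python
import numpy as np

def pixel_accuracy(pred, label, ignore_index=-1):
    """Fraction of non-ignored pixels where pred equals label."""
    valid = label != ignore_index
    n_valid = int(valid.sum())
    if n_valid == 0:
        return 0.0  # sketch-only choice when nothing is labeled
    correct = int(((pred == label) & valid).sum())
    return correct / n_valid

# Toy 2x2 example: three labeled pixels, two of them predicted correctly.
pred = np.array([[0, 1], [2, 2]])
label = np.array([[0, 1], [1, -1]])
acc = pixel_accuracy(pred, label)
print(acc)
```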

### Visualization

In [None]:
# System libs
import os, csv, torch, numpy, scipy.io, PIL.Image, torchvision.transforms
import numpy as np  # colorEncode below uses the `np` alias

def colorEncode(labelmap, colors, mode='RGB'):
    # Map each class index in labelmap (H x W) to its color, giving an H x W x 3 uint8 image.
    labelmap = labelmap.astype('int')
    labelmap_rgb = np.zeros((labelmap.shape[0], labelmap.shape[1], 3),
                            dtype=np.uint8)
    for label in numpy.unique(labelmap):
        if label < 0:
            continue
        labelmap_rgb += (labelmap == label)[:, :, np.newaxis] * \
            np.tile(colors[label],
                    (labelmap.shape[0], labelmap.shape[1], 1))

    if mode == 'BGR':
        return labelmap_rgb[:, :, ::-1]
    else:
        return labelmap_rgb
    
colors = scipy.io.loadmat('color150.mat')['colors']
names = {}
with open('objectInfo150.txt') as f:
    lines = f.readlines()

    for line in lines[1:]:
        row = line.strip().split()
        names[int(row[0])] = row[4]

def visualize_result(img, pred, index=None):
    # filter prediction class if requested
    if index is not None:
        pred = pred.copy()
        pred[pred != index] = -1
        print(f'{names[index+1]}:')
        
    # colorize prediction
    pred_color = colorEncode(pred, colors).astype(numpy.uint8)

    # place the image and the colorized prediction side by side and display them
    im_vis = numpy.concatenate((img, pred_color), axis=1)
    display(PIL.Image.fromarray(im_vis))

In [None]:
pil_to_tensor = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(
        mean=[0.485, 0.456, 0.406], # These are RGB mean+std values
        std=[0.229, 0.224, 0.225])  # across a large photo dataset.
])
pil_image = PIL.Image.open('ADE_val_00001519.jpg').convert('RGB')
img_original = numpy.array(pil_image)
img_original = imresize(img_original, (128, 128))
img_original = PIL.Image.fromarray(img_original)
img_data = pil_to_tensor(img_original)

singleton_batch = img_data[None].cuda()
output_size = img_data.shape[1:]

In [None]:
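net_decoder.use_softmax = True
# Inference only: use eval mode and skip building the autograd graph.
net_encoder.eval()
net_decoder.eval()
with torch.no_grad():
    pred = net_decoder(net_encoder(singleton_batch), segSize=output_size)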


In [None]:
_, pred = torch.max(pred, dim=1)
pred = pred.cpu()[0].numpy()

visualize_result(img_original, pred)
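
`visualize_result` also accepts an `index` argument that keeps a single class and hides the rest. To decide which classes are worth inspecting, one option is to rank the predicted classes by pixel count. The sketch below does this on a toy array; `top_classes` is a hypothetical helper, and it assumes the prediction holds non-negative integer class indices, as the argmax output above does.

```python
import numpy as np

def top_classes(pred, k=3):
    """Return up to k (class_index, pixel_count) pairs, most frequent first."""
    counts = np.bincount(pred.ravel())  # requires non-negative integers
    order = np.argsort(counts)[::-1]    # class indices by descending count
    return [(int(c), int(counts[c])) for c in order[:k] if counts[c] > 0]

# Toy prediction map: class 2 covers 3 pixels, class 5 covers 2, class 0 covers 1.
toy_pred = np.array([[2, 2, 5], [2, 5, 0]])
ranked = top_classes(toy_pred, k=2)
print(ranked)
```

In the notebook, each class index `c` returned for the real `pred` could then be passed along as `visualize_result(img_original, pred, index=c)`.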

## Build your own model

You can replace the encoder and decoder with any model you find at https://github.com/CSAILVision/semantic-segmentation-pytorch.

Train the model.

Compare the results.
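
Pixel accuracy alone can hide poor performance on small classes, so when comparing models it may help to also report mean intersection-over-union (mIoU). This is a minimal sketch under stated assumptions, not part of the assignment code: `mean_iou` is a hypothetical helper, labels equal to -1 are ignored, and classes that appear in neither the prediction nor the ground truth are left out of the average.

```python
import numpy as np

def mean_iou(pred, label, num_classes, ignore_index=-1):
    """Average per-class IoU over classes present in pred or label."""
    valid = label != ignore_index
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        label_c = (label == c) & valid
        union = int((pred_c | label_c).sum())
        if union == 0:
            continue  # class absent from both maps, skip it
        inter = int((pred_c & label_c).sum())
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy 2x2 example with two classes and one ignored pixel.
pred = np.array([[0, 0], [1, 1]])
label = np.array([[0, 1], [1, -1]])
miou = mean_iou(pred, label, num_classes=2)
print(miou)
```

Computing this for both the ResNet + UPerNet baseline and your own model on the same validation images gives a second number to compare alongside accuracy.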