# Mitigating Class Imbalance in Aerial Image Segmentation

In [None]:
%%html
<style>
.output_wrapper, .output {
    height:auto !important;
    max-height:9000px;
}
.output_scroll {
    box-shadow:none !important;
    -webkit-box-shadow:none !important;
}
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}
.p-Widget.jp-RenderedImage.jp-mod-trusted.jp-OutputArea-output { 
    display: table-cell;
    text-align: center; 
    vertical-align: middle;
}
</style>

In [None]:
%matplotlib inline
import os
import sys
import re
from tqdm import tqdm
import numpy as np
import sklearn.metrics
import scipy.ndimage
import imgviz
from sankey import sankey
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import torch
import torch.nn.functional as F
import torchvision.transforms as transforms
import warnings

warnings.filterwarnings('ignore')
cwd = os.getcwd()
pwd = os.path.dirname(cwd)  # parent directory of the notebook folder
sys.path.append(pwd)
from model import *
from logger import Logger, ModelLogger
from dataloader import DatasetTrainClean, DatasetClean, DatasetVal

## Introduction

Processing aerial images requires recognizing the contents of outdoor scenes, which include both stuff (amorphous regions such as road and sky) and things (countable objects such as people and cars). In this project, we look into a ubiquitous issue in aerial image semantic segmentation: **class imbalance**.

As the name suggests, aerial images are shot from the sky, so things such as humans and cars appear much smaller than in everyday photos and are therefore hard to identify. Such an extremely imbalanced distribution makes neural networks difficult to train: the model tends to overlook parts of things, or entire things, and label them as the surrounding stuff, producing false negatives.
This phenomenon is harmful in many real-world applications of semantic segmentation. For example, a drone that relies on misclassified output may ignore humans or obstacles in its way and cause bodily injury or property damage. Hence, mitigating class imbalance is an urgent and important task for aerial image segmentation.

<!-- Several techniques have been proposed to address the imbalance issue, which can be mainly divided into two categories. Re-sampling methods modifies the sampling process in the training stage to achieve a less biased instance distribution that are more suitable for the neural networks to learn. Another category is the cost-sensitive approach, which aims to fit the cost of classifying unevenly for classes of different frequencies.  -->
Given the nature of the aerial dataset, we address the imbalance problem mainly with loss functions designed to differentiate the cost of learning each class. We choose five losses that cover a broad range of designs and are commonly used in classification and segmentation tasks: Cross Entropy, Focal Loss, Dice Loss, IoU Loss, and Tversky Loss. Cross Entropy and Focal Loss are distribution-based losses, while Dice Loss, IoU Loss, and Tversky Loss are region-based losses. Cross Entropy is the baseline, as it does not account for class imbalance at all.
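
To make the two families concrete, here is a minimal NumPy sketch of one loss from each group plus the baseline. It is an illustration under stated assumptions, not the training code of this project: the function names, the default `gamma`, and the convention that `alpha` weights false negatives and `beta` weights false positives are all assumptions. With `alpha = beta = 0.5` the Tversky loss reduces to Dice loss, and with `alpha = beta = 1` it reduces to IoU loss.

```python
import numpy as np

def cross_entropy(prob, onehot, eps=1e-7):
    """Mean pixel-wise cross entropy; prob and onehot have shape (pixels, classes)."""
    p_true = np.sum(prob * onehot, axis=1)  # predicted probability of the true class
    return float(np.mean(-np.log(p_true + eps)))

def focal_loss(prob, onehot, gamma=2.0, eps=1e-7):
    """Cross entropy scaled by (1 - p_true)^gamma, which down-weights easy pixels."""
    p_true = np.sum(prob * onehot, axis=1)
    return float(np.mean(-((1.0 - p_true) ** gamma) * np.log(p_true + eps)))

def tversky_loss(prob, onehot, alpha=0.5, beta=0.5, eps=1e-7):
    """1 - mean Tversky index over classes.

    Assumed convention: alpha weights false negatives, beta weights false positives.
    """
    tp = np.sum(prob * onehot, axis=0)          # soft true positives per class
    fn = np.sum((1.0 - prob) * onehot, axis=0)  # soft false negatives per class
    fp = np.sum(prob * (1.0 - onehot), axis=0)  # soft false positives per class
    index = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return float(1.0 - np.mean(index))

# Toy example: 4 pixels, 2 classes, the last pixel belongs to the rare class 1
prob = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]])
onehot = np.eye(2)[[0, 0, 0, 1]]

ce = cross_entropy(prob, onehot)
fl = focal_loss(prob, onehot)
dice = tversky_loss(prob, onehot, alpha=0.5, beta=0.5)  # Dice special case
iou = tversky_loss(prob, onehot, alpha=1.0, beta=1.0)   # IoU special case
print(ce, fl, dice, iou)
```

The distribution-based losses average over pixels, so frequent classes dominate the sum, whereas the region-based losses average a per-class overlap score, so each class contributes equally regardless of its pixel count.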

In [None]:
issave = True
data_path = '../data/'
trn_dataset = DatasetTrainClean(data_path)
val_dataset = DatasetVal(data_path)
cln_dataset = DatasetClean(data_path)
trn_dataloader = torch.utils.data.DataLoader(trn_dataset, batch_size=1, shuffle=False, num_workers=1)
val_dataloader = torch.utils.data.DataLoader(val_dataset, batch_size=2, shuffle=False, num_workers=1)
cln_dataloader = torch.utils.data.DataLoader(cln_dataset, batch_size=1, shuffle=False, num_workers=1)
cln_dataloader2 = torch.utils.data.DataLoader(cln_dataset, batch_size=2, shuffle=False, num_workers=1)
labs = ['bckgrnd', 'person', 'bike', 'car', 'drone', 'boat', 'animal', 'obstacle', 'constrn', 'plant', 'road', 'sky']
cmap = np.array([
        (  0,   0,   0),  #  Background
        (255, 127,  14),  #  Person
        (  0, 128,   0),  #  Bike
        (152,  78, 163),  #  Car
        (128,   0,   0),  #  Drone
        (  0,   0, 128),  #  Boat
        (192,   0, 128),  #  Animal
        (192,   0,   0),  #  Obstacle
        (192, 128,   0),  #  Construction
        (  0,  64,   0),  #  Plant
        (128, 128,   0),  #  Road
        (  0, 128, 128)   #  Sky
    ])

## Dataset Analysis

In this project, we use the AeroScapes dataset published in 2018. The dataset comprises more than 3,000 images shot by a drone at a relatively low altitude. It contains 11 classes, which mainly belong to 2 categories: stuff such as vegetation, roads, and sky; and things such as persons, bikes, and cars. The code in this notebook additionally uses a background label, for 12 labels in total.

### Example Images

First, we show a set of example images from the dataset. From left to right, they are the raw image, the label map, and the labels overlaid on the image. The examples show that thing instances are relatively small in most images and account for only a small fraction of the total area, which explains why the pixel distribution is imbalanced.

In [None]:
def plot_rgb(img, label_img):
    img_base = imgviz.color.rgb2gray(img)
    labelviz = imgviz.label2rgb(
            label=label_img, image=img_base, 
            label_names=labs, colormap=cmap, font_size=25, loc="centroid")
    return labelviz

text_kwargs = dict(ha='center', va='center', fontsize=6, color='w')
for i, idx in enumerate([0, 1100]):
    img, label_img = trn_dataset[idx]
    img = img.astype('uint8')
    labelmap = imgviz.label2rgb(label=label_img, colormap=cmap)
    labelviz = plot_rgb(img, label_img)

    plt.figure(dpi=400)
    plt.subplot(1, 3, 1)
    plt.imshow(img)
    plt.text(1200, 50, 'Raw', **text_kwargs)
    plt.axis("off")
    plt.subplot(1, 3, 2)
    plt.imshow(labelmap)
    plt.text(1200, 50, 'Label', **text_kwargs)
    plt.axis("off")
    plt.subplot(1, 3, 3)
    plt.imshow(labelviz)
    plt.text(1200, 50, 'Overlay', **text_kwargs)
    plt.axis("off")
    plt.tight_layout()

### Dataset Distribution

To quantify the imbalance of the dataset, we plot the relative frequency of each class both by pixel and by image occurrence. We make an important observation: **the class imbalance is largely driven by the frequency gap between stuff and things**. The 5 stuff classes make up the vast majority of pixels, while every thing class has a pixel ratio under 0.5%.
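
The two statistics can be sketched on toy label maps as follows. This is a simplified illustration with a hypothetical helper `class_ratios`; it uses `np.bincount` instead of the one-hot encoding used in the notebook cell, and the tiny 4x4 maps are made up.

```python
import numpy as np

def class_ratios(label_maps, num_classes):
    """Return (pixel_ratio, image_ratio) per class for a list of 2-D integer label maps."""
    pixel_count = np.zeros(num_classes, dtype=np.float64)
    image_count = np.zeros(num_classes, dtype=np.int64)
    for lab in label_maps:
        counts = np.bincount(lab.ravel(), minlength=num_classes)
        pixel_count += counts       # how many pixels each class covers
        image_count += counts > 0   # whether the class appears in this image at all
    return pixel_count / pixel_count.sum(), image_count / len(label_maps)

# Toy data: class 0 is stuff, class 1 is a small thing present in one of two images
map_a = np.zeros((4, 4), dtype=np.int64)
map_a[1, 2] = 1
map_b = np.zeros((4, 4), dtype=np.int64)

pixel_ratio, image_ratio = class_ratios([map_a, map_b], num_classes=2)
print(pixel_ratio, image_ratio)
```

A class can appear in a large share of images and still cover very few pixels, which is why the two ratios are reported side by side.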

In [None]:
frqn_ins = np.zeros((12), dtype=int)        # Frequency by instances
frqn_pxl = np.zeros((12), dtype=np.float64) # Frequency by pixels
frqn_pos = np.zeros((720, 1280, 12))        # Frequency in position
frqv_ins = np.zeros((12), dtype=int)
frqv_pxl = np.zeros((12), dtype=np.float64)
frqv_pos = np.zeros((720, 1280, 12)) 

if issave:
    mat = np.load('../saved_models/frqn.npz')
    frqn_ins, frqn_pxl, frqn_pos = mat['arr_0'], mat['arr_1'], mat['arr_2']
    mat = np.load('../saved_models/frqv.npz')
    frqv_ins, frqv_pxl, frqv_pos = mat['arr_0'], mat['arr_1'], mat['arr_2']
else:
    for batch_i, (x, y) in enumerate(tqdm(trn_dataloader)):
        y = torch.Tensor(y).long()
        frq = F.one_hot(y, num_classes=12)
        frq = torch.sum(frq, axis=0)
        for c in range(12):
            s = torch.sum(frq[:,:,c]).numpy()
            if s > 0:
                frqn_ins[c] += 1
                frqn_pxl[c] += s
        frqn_pos += frq.numpy()
    np.savez('../saved_models/frqn.npz', frqn_ins, frqn_pxl, frqn_pos)
    for batch_i, (x, y) in enumerate(tqdm(cln_dataloader)):
        y = torch.Tensor(y).long()
        frq = F.one_hot(y, num_classes=12)
        frq = torch.sum(frq, axis=0)
        for c in range(12):
            s = torch.sum(frq[:,:,c]).numpy()
            if s > 0:
                frqv_ins[c] += 1
                frqv_pxl[c] += s
        frqv_pos += frq.numpy()
    np.savez('../saved_models/frqv.npz', frqv_ins, frqv_pxl, frqv_pos)

rtn_ins = frqn_ins / len(trn_dataset)       # Ratio by instances on train
rtn_pxl = frqn_pxl / np.sum(frqn_pxl)       # Ratio by pixels on train
rtv_ins = frqv_ins / len(cln_dataset)       # Ratio by instances on validation
rtv_pxl = frqv_pxl / np.sum(frqv_pxl)       # Ratio by pixels on validation
rt_pxl = (frqn_pxl + frqv_pxl) / np.sum(frqn_pxl + frqv_pxl)
frq_pos = frqn_pos + frqv_pos

In [None]:
fig, ax_pxl = plt.subplots(figsize=(9, 4))
cm = plt.cm.tab20(np.arange(0, 20))
ax_ins = ax_pxl.twinx()

width = 0.2
x = np.array([np.where(np.argsort(rt_pxl)[::-1] == i) for i in range(12)]).reshape(-1)
p1 = ax_pxl.bar(x-width,     rtn_pxl, width=width, color=cm[0], align='center', label='train pixel')
p2 = ax_ins.bar(x,           rtn_ins, width=width, color=cm[1], align='center', label='train image')
p3 = ax_pxl.bar(x+(width*1), rtv_pxl, width=width, color=cm[8], align='center', label='val pixel')
p4 = ax_ins.bar(x+(width*2), rtv_ins, width=width, color=cm[9], align='center', label='val image')

lst = [p1,p2,p3,p4]
ax_pxl.legend(handles=lst, ncol=2, loc=0,
            columnspacing=0.9, handlelength=1.4, handletextpad=0.5,)
ax_pxl.set_xticks(x, labs, fontsize=10)
ax_pxl.set_xlim([-0.5, 11.8])
ax_pxl.set(ylim=[0, 0.5], ylabel='Ratio by Pixel', xlabel='Class')
ax_ins.set(ylim=[0, 1.05], ylabel='Ratio by Image')
fig.tight_layout()

### Average Image 

Beyond the frequency analysis, we further investigate the statistics of classes in image space. To characterize the spatial distribution of things in the image, we compute the average image per thing class and show it as a heatmap. Compared with average images from other tasks such as road segmentation, which show distinct patterns, it is hard to discover any specific structure in the figure. We conclude **that aerial images lack a spatial prior for thing classes**, which is related to the diverse viewpoints and scenes captured by UAVs.
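
The core of the average image is a per-pixel occurrence frequency. A minimal sketch with a hypothetical helper `position_frequency` and made-up 3x3 label maps is shown below; the notebook cell additionally blends the per-class frequencies with the class colors.

```python
import numpy as np

def position_frequency(label_maps, class_id):
    """Fraction of images in which each pixel position is labeled class_id."""
    masks = np.stack([lab == class_id for lab in label_maps])  # (images, H, W) booleans
    return masks.mean(axis=0)

# Toy data: class 1 always covers the center pixel and sometimes the top-right pixel
map_a = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
map_b = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 0]])

heat = position_frequency([map_a, map_b], class_id=1)
print(heat)
```

A strong spatial prior would show up as a few positions with values close to 1; a flat, diffuse map means position carries little information about the class.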

In [None]:
def avgimg_class(lst):
    nclass = len(lst)
    frq_thg = frq_pos[:, :, lst]
    rtn_thg = frq_thg / (len(trn_dataset) + len(cln_dataset))
    rtn_max = np.sum(rtn_thg, axis=2).max()
    rtn_thg = rtn_thg / rtn_max
    img_thg = np.zeros((720, 1280, 3), dtype=np.float64) 
    for c in range(nclass):
        img_thg += rtn_thg[:,:,c][:, :, None] * cmap[lst[c]]  
    lab_thg = np.ones((720, 1280), dtype=int)
    lab_thg[0, :nclass] = lst
    return img_thg, lab_thg, rtn_max

img_thg, lab_thg, rtn_max = avgimg_class(np.arange(1, 8))
labelviz = imgviz.label2rgb(
            label=lab_thg, image=img_thg.astype('uint8'), alpha=0.0,
            label_names=labs, colormap=cmap, font_size=30, loc="rb")

bmap = mpl.cm.copper
norm = mpl.colors.Normalize(vmin=0, vmax=rtn_max)
fig, ax = plt.subplots(dpi=200)
img = plt.imshow(labelviz)
fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap=bmap), 
             ax=ax, fraction=0.046*720/1280, pad=0.04, label='Ratio')
ax.axis("off")
fig.tight_layout()

## Training Analysis

We then evaluate the five loss functions on the AeroScapes dataset with three popular convolutional segmentation models: UNet, DeepLabV3, and DeepLabV3+. We train the models with each of the five losses and evaluate performance by mean Intersection over Union (mIoU). Notably, all classes contribute equally to the mIoU, so minority and majority classes are equally important regardless of their pixel counts.
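
The equal weighting of classes can be seen in a small worked example. The sketch below computes per-class IoU from a confusion matrix (rows are true labels, columns are predictions); the helper name and the toy matrix are assumptions, and the project's own evaluation code may aggregate differently, for example per image.

```python
import numpy as np

def iou_per_class(conf_mat):
    """IoU = TP / (TP + FN + FP) for each class of a square confusion matrix."""
    conf_mat = np.asarray(conf_mat, dtype=np.float64)
    tp = np.diag(conf_mat)
    fn = conf_mat.sum(axis=1) - tp  # true pixels of the class that were missed
    fp = conf_mat.sum(axis=0) - tp  # pixels wrongly predicted as the class
    denom = tp + fn + fp
    # Classes that never occur and are never predicted get NaN and are skipped in the mean
    return np.where(denom > 0, tp / np.maximum(denom, 1.0), np.nan)

# Toy matrix: class 0 is a frequent stuff class, class 1 is a rare thing class
conf = np.array([[90, 10],
                 [5, 5]])

ious = iou_per_class(conf)
miou = float(np.nanmean(ious))
pixel_acc = float(np.trace(conf) / conf.sum())
print(ious, miou, pixel_acc)
```

In this toy case the pixel accuracy stays high because the frequent class dominates, while the mIoU is pulled down by the poorly segmented rare class.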

### Training Curves

Since model training is time-consuming, we directly load the trained model weights for evaluation. We use DeepLabV3+ trained with Tversky loss as the representative. First, we plot the training curves over epochs. The model reaches a highest mIoU of 67%.
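
The curves are read from a text log with regular expressions. The sketch below applies the same two patterns used in this notebook to a made-up log excerpt; the exact layout of the real `log.txt` is an assumption, and the numbers are invented for illustration only.

```python
import re

# Made-up log text for illustration only; the real log layout is assumed
sample_log = (
    "epoch 1 trn loss:0.9120, trn acc:0.4100\n"
    "epoch 1 val loss:0.8800, val iou:0.3500\n"
    "epoch 2 trn loss:0.7010, trn acc:0.5200\n"
    "epoch 2 val loss:0.7300, val iou:0.4800\n"
    "epoch 3 trn loss:0.6500, trn acc:0.5500\n"
    "epoch 3 val loss:0.7600, val iou:0.4600\n"
)

def parse_log(text):
    """Extract train loss, validation loss, and validation IoU per epoch."""
    trn_loss = [float(m.group(1))
                for m in re.finditer(r' trn loss:(\d+\.\d+), trn acc:', text)]
    val_pairs = [(float(m.group(1)), float(m.group(2)))
                 for m in re.finditer(r' val loss:(\d+\.\d+), val iou:(\d+\.\d+)', text)]
    val_loss = [loss for loss, _ in val_pairs]
    val_iou = [iou for _, iou in val_pairs]
    return trn_loss, val_loss, val_iou

trn_loss, val_loss, val_iou = parse_log(sample_log)
best_idx = val_iou.index(max(val_iou))  # 0-based position in the list
best_epoch = best_idx + 1               # epochs are counted from 1
print(best_epoch, val_iou[best_idx], val_loss[best_idx])
```

Keeping the 0-based list index separate from the 1-based epoch number avoids off-by-one errors when marking the best epoch on a plot.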

In [None]:
def plot_dual(los_trn, los_val, acc_val):
    fig, ax_acc = plt.subplots(figsize=(8, 5))
    cm = plt.cm.tab20(np.arange(0, 20))

    cl_acc = cm[6]
    l2 = ax_acc.plot(np.arange(1, 31), acc_val, '-', color=cl_acc, label='Val IoU')
    ax_acc.set_xlabel('Train Epoch')
    ax_acc.set_xlim([1, 30])
    ax_acc.set_ylabel('Val IoU (%)', color=cl_acc)
    ax_acc.set_ylim([0, 75])
    ax_acc.tick_params(axis='y', labelcolor=cl_acc)
    ax_acc.grid(True, ls='--')

    ax_los = ax_acc.twinx()
    cl_los = cm[0]
    l3 = ax_los.plot(np.arange(1, 31), los_trn, '--', color=cl_los, label='Train Loss')
    l4 = ax_los.plot(np.arange(1, 31), los_val, '-', color=cl_los, label='Val Loss')
    ax_los.set_ylabel('Loss', color=cl_los)
    ax_los.set_ylim([0, max(los_trn + los_val)])
    ax_los.tick_params(axis='y', labelcolor=cl_los)
    ax_los.grid(False)
    
    acc_b = max(acc_val)
    idx_b = acc_val.index(acc_b)  # 0-based index of the best epoch
    epoch_b = idx_b + 1           # curves are plotted against epochs 1..30
    loss_b = los_val[idx_b]
    ax_acc.plot([epoch_b, epoch_b], [-5, 105], ':', color=cm[14])
    ax_acc.plot([epoch_b], [acc_b], 'o', color=cl_acc)
    ax_los.plot([epoch_b], [loss_b], 'o', color=cl_los)
    textstr = '\n'.join((
        r'Epoch: %d' % (epoch_b, ),
        r'IoU: %.2f' % (acc_b, ),
        r'Loss: %.2f' % (loss_b, )))
    props = dict(boxstyle='square', ec=[0.8]*3, facecolor='white')
    # position of the best epoch in axes coordinates (x-axis spans epochs 1..30)
    x_frac = (epoch_b - 1) / 29
    if x_frac < 0.8:
        x_txt = x_frac + 0.03
    else:
        x_txt = x_frac - 0.18
    ax_acc.text(x_txt, 0.58, textstr, transform=ax_acc.transAxes,
        verticalalignment='center', bbox=props, fontsize=12)
    
    lst = l2 + l3 + l4
    lbst = [l.get_label() for l in lst]
    ax_acc.legend(lst, lbst, ncol=1, 
                  columnspacing=0.9, handlelength=1.4, handletextpad=0.5,
                  fontsize=12, loc=4)

    fig.tight_layout()
    plt.show()
    return fig

def plot_dual_call(dir_save, dir_flag):
    file_log = os.path.join(dir_save, dir_flag, 'log.txt')
    with open(file_log, 'r') as f:
        s = f.read()

    it_tra = re.finditer(
        r' trn loss:(\d+\.\d+), trn acc:', s)
    loss_tra = []
    for i in it_tra:
        loss_tra.append(float(i.group(1)))

    it_val = re.finditer(
        r' val loss:(\d+\.\d+), val iou:(\d+\.\d+)', s)
    loss_val, acc_val = [], []
    for i in it_val:
        loss_val.append(float(i.group(1)))
        acc_val.append(100 * float(i.group(2)))
    
    fig = plot_dual(loss_tra[:30], loss_val[:30], acc_val[:30])
    return fig

In [None]:
dir_save = '../saved_models/'
dir_flag = 'deeplabv3plus/tversky_0.70_0.30_1.00_1118'
fig = plot_dual_call(dir_save, dir_flag)

### Class-wise IoU 

To further investigate how the loss guides the model to learn each class, we track the class-wise IoU during training. The plot indicates that the model has difficulty recognizing minority classes, especially things with great variety such as animal and obstacle, while some minority classes with distinct characteristics, such as car and boat, are easier to learn.
In general, Tversky loss puts more effort into learning minority classes, and its segmentation performance is less entangled with the imbalanced class sizes. It converges faster on minority labels during training and achieves better accuracy on the drone and animal classes. Hence Tversky loss is effective at balancing learning among classes and achieves the best mIoU.

In [None]:
def plot_class(iou_lst):
    iou_mat = np.array(iou_lst).T
    fig, ax = plt.subplots(figsize=(8, 5))

    img = plt.imshow(iou_mat, interpolation='nearest', 
                    cmap=plt.cm.coolwarm_r, vmin=0.0, vmax=1.0)
    # set labels
    n_cls, n_epoch = iou_mat.shape
    fig.colorbar(img, ax=ax, fraction=0.046*n_cls/n_epoch, pad=0.04)
    labs = ['bckgrnd', 'person', 'bike', 'car', 'drone', 'boat', 'animal', 'obstacle', 'constrn', 'plant', 'road', 'sky']
    ax.set(yticks=np.arange(n_cls), xticks=np.arange(4, 31, 5),
            yticklabels=labs, xticklabels=np.arange(5, 31, 5),
            ylabel='Val IoU by Class', xlabel='Train Epoch')

    fig.tight_layout()
    plt.show()
    return fig

def plot_class_call(dir_flag, sav_flag):
    file_log = os.path.join('../saved_models/', dir_flag, 'log.txt')
    with open(file_log, 'r') as f:
        s = f.read()

    it_epoch = re.finditer(
        r'iou class:' + ','.join([r'(\d+\.\d+)' for _ in range(12)]), s)
    iou_lst = []
    for e in it_epoch:
        iou_lst.append([float(e.group(c)) for c in range(1, 13)])

    fig = plot_class(iou_lst)
    return fig

In [None]:
fig = plot_class_call('deeplabv3plus/tversky_0.70_0.30_1.00_1118', 'iou')

## Imbalance Analysis

We then analyze how segmentation performs across the imbalanced classes when trained with different loss functions. In this section we mainly look at the statistics of false negatives and false positives.

### Confusion Matrix

First, we plot the confusion matrix of the model trained with Tversky loss. Its diagonal holds the fraction of correctly classified pixels per class, which is in line with the previous plot.
The false negative effect of imbalanced learning is clearly visible: the model tends to label minority thing pixels as the more frequent stuff classes, since those are more likely under the data distribution. Hence the entries in the stuff columns above the diagonal are relatively high.

Our conclusion is that **Tversky loss is good at mitigating false negative predictions**. It achieves better performance on things, especially those with the lowest IoUs, including bike, animal, and obstacle. It is also effective in reducing false positive background pixels.
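
False negatives and false positives can be read off a confusion matrix by normalizing along rows and columns respectively. The following sketch uses a hypothetical helper `error_rates` and a made-up 3-class matrix (two stuff classes and one thing class); it is not taken from the project's evaluation code.

```python
import numpy as np

def error_rates(conf_mat):
    """Per-class false negative rate (by row) and false discovery rate (by column)."""
    conf_mat = np.asarray(conf_mat, dtype=np.float64)
    tp = np.diag(conf_mat)
    true_total = conf_mat.sum(axis=1)  # pixels that truly belong to each class
    pred_total = conf_mat.sum(axis=0)  # pixels predicted as each class
    # Classes with no true (or no predicted) pixels yield NaN instead of a division error
    recall = np.divide(tp, true_total, out=np.full_like(tp, np.nan), where=true_total > 0)
    precision = np.divide(tp, pred_total, out=np.full_like(tp, np.nan), where=pred_total > 0)
    return 1.0 - recall, 1.0 - precision

# Toy matrix (rows: true, columns: predicted); class 2 is a rare thing class
conf = np.array([[80, 0, 0],
                 [0, 60, 0],
                 [6, 4, 10]])

fnr, fdr = error_rates(conf)
print(fnr, fdr)
```

Here half of the thing pixels are absorbed by the two stuff classes, so the thing class has a high false negative rate, while the stuff classes pick up the corresponding false positives.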

In [None]:
mname = 'deeplabv3plus'
flag_run = "tversky_0.70_0.30_1.00_1118"
logger = Logger(save_path='../saved_models/', prj_name=mname, flag_run=flag_run)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def restore_model(mname, logger):
    assert logger.path_existed, f"Path {logger.dir_save} not found"
    model_logger = ModelLogger(logger, state_only=True)
    model_logger.metric_name = 'iou'
    model = get_model(mname)
    model = model_logger.load_model('best', model=model).to(device)
    model.eval()
    return model

model = restore_model(mname, logger)
if issave:
    cfms = np.load(logger.path_join('conf_mat.npy'))
else:
    cfms = np.zeros((12, 12), dtype=np.float64)
    # use the batch-size-2 loader so labels line up with the batch-size-2 predictions
    for batch_i, (xy_val, xy_cln) in enumerate(zip(val_dataloader, cln_dataloader2)):
        with torch.no_grad():
            xs, _ = xy_val
            pred_mask = model(xs.to(device))
        pred_mask = torch.softmax(pred_mask, dim=1)
        y_pred = torch.argmax(pred_mask, dim=1)
        y_pred = y_pred.cpu()
        # transfer to label map
        h, w = y_pred.shape[1:]
        y_pred = transforms.functional.crop(y_pred, 8, 0, h-16, w)
        # get true label
        _, y_true = xy_cln

        y_pred = y_pred.reshape(-1).numpy()
        y_true = y_true.reshape(-1).numpy()
        cfm = sklearn.metrics.confusion_matrix(y_true, y_pred, labels=range(12))
        cfms += cfm
    np.save(logger.path_join('conf_mat.npy'), cfms)

def plot_confmat(cfm_rn):
    fig, ax = plt.subplots(figsize=(8, 5))

    img = plt.imshow(cfm_rn, interpolation='nearest', 
                    cmap=plt.cm.Blues, vmin=0.0, vmax=1.0)
    # set labels
    n_cls = cfm_rn.shape[0]
    fig.colorbar(img, ax=ax, fraction=0.046, pad=0.04)
    labs = ['bckgrnd', 'person', 'bike', 'car', 'drone', 'boat', 'animal', 'obstacle', 'constrn', 'plant', 'road', 'sky']
    ax.set(yticks=np.arange(n_cls), yticklabels=labs,
            ylabel='True Label', xlabel='Predicted Label')
    ax.set_xticks(np.arange(n_cls), labs, rotation='vertical')

    fig.tight_layout()
    plt.show()
    return fig

In [None]:
cfm_rn = cfms / cfms.sum(axis=1, keepdims=True)
fig = plot_confmat(cfm_rn)

### Sankey Plot

The confusion matrix above is normalized per class, so it hides the absolute number of misclassified pixels. Hence, we draw a Sankey plot to study how pixels in the aerial images are classified by the segmentation model. It shows the absolute number of pixels of each class that are misclassified into every other class.
It can be observed that classes with more complex patterns, such as obstacle, construction, and plant, are more likely to be misclassified, while labels with less distinct patterns, including road and sky, tend to receive more wrong predictions.
The advantage of Tversky loss is that it **reduces the number of pixels that are things in the ground truth but are misclassified**.
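
One way to flatten the off-diagonal entries of a confusion matrix into (true, predicted, weight) flows for a Sankey diagram is sketched below. The helper name, the 3-class matrix, and the choice to drop zero-weight flows are assumptions made for illustration.

```python
import numpy as np

def misclassification_flows(conf_mat, names):
    """List (true_name, predicted_name, pixel_count) for every nonzero off-diagonal entry."""
    conf_mat = np.asarray(conf_mat)
    flows = []
    for r, true_name in enumerate(names):
        for c, pred_name in enumerate(names):
            if r != c and conf_mat[r, c] > 0:
                # Appending keeps each weight aligned with its label pair
                flows.append((true_name, pred_name, float(conf_mat[r, c])))
    return flows

names = ['road', 'plant', 'person']
conf = np.array([[50, 3, 0],
                 [2, 40, 0],
                 [4, 1, 5]])

flows = misclassification_flows(conf, names)
total_errors = sum(weight for _, _, weight in flows)
print(flows, total_errors)
```

Building the source, target, and weight entries in the same loop iteration guarantees that the three sequences stay aligned, which is easy to get wrong when the weights are written into a preallocated array by computed index.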

In [None]:
row, col, cfm_err = [], [], []
for r in range(12):
    for c in range(12):
        if r != c:
            # append so that each weight stays aligned with its (true, predicted) pair
            row.append(labs[r])
            col.append(labs[c])
            cfm_err.append(cfms[r, c])
cfm_err = np.array(cfm_err)
cdict = {labs[i]: np.append(cmap[i] / 255, 1.) for i in range(12)}
cdict['bckgrnd'] = np.array((128/255, 128/255, 128/255, 1.))
labbak  = ['person', 'bike', 'car', 'drone', 'boat', 'animal', 'obstacle', 'constrn', 'plant', 'road', 'sky', 'bckgrnd']
fig = sankey(left=row, right=col, leftWeight=cfm_err, colorDict=cdict, aspect=10,
             leftLabels=list(reversed(labbak)), rightLabels=list(reversed(labbak)), fontsize=8)

## Case Study

### Good Cases

Finally, we conduct case studies. In the good cases, region-based losses are better at delineating class boundaries and at identifying small things in the image. **Tversky loss is particularly good at picking out small things**, such as the obstacle-person-bike triplet.

In [None]:
def plot_case(idx):
    # get image
    xs, ys = [], []
    for i in idx:
        x, y = val_dataset[i]
        xs.append(x)
        ys.append(y)
    xs = torch.stack(xs)
    ys = torch.stack(ys)
    # get prediction
    with torch.no_grad():
        pred_mask = model(xs.to(device))
    pred_mask = torch.softmax(pred_mask, dim=1)
    pred = torch.argmax(pred_mask, dim=1)
    pred = pred.cpu()
    # transfer to label map
    h, w = pred.shape[1:]
    pred = transforms.functional.crop(pred, 8, 0, h-16, w)

    for batch_i, i in enumerate(idx):
        img, label_img = cln_dataset[i]
        img = img.astype('uint8')
        labelviz = plot_rgb(img, label_img)
        label_pred = pred[batch_i].numpy()
        predviz = plot_rgb(img, label_pred)

        plt.figure(dpi=400)
        plt.subplot(1, 2, 1)
        plt.imshow(labelviz)
        plt.text(1200, 50, 'True', **text_kwargs)
        plt.axis("off")
        plt.subplot(1, 2, 2)
        plt.imshow(predviz)
        plt.text(1200, 50, 'Predict', **text_kwargs)
        plt.axis("off")
        plt.tight_layout()

In [None]:
plot_case([300, 400, ])

### Bad Cases

There are also cases where the loss functions fail, mostly images with complicated semantics or poor ground truth labeling. For example, in the first example, the model cannot distinguish the borders between road, plant, and background. It also generates false positive classes due to the noisy patterns in the image.
In the second example, the scene is too complex, and the model fails to identify the entire structure of the animal and obstacle objects in the image. Additionally, it produces some false positives that expand plant pixels into the background. We suspect this is caused by the green color of the water in the background of the picture.

In [None]:
plot_case([100, 200, ])

## Discussion

In conclusion, we summarize the strengths and limitations of our approach:

* To the best of our knowledge, this is the first work to introduce various loss functions specifically to address the class imbalance problem in aerial image datasets. Compared to solutions such as ensemble learning, our approach is simple and effective and does not require a specialized model structure. It can also be easily generalized and integrated with other methods.

* We perform a comprehensive analysis of the dataset, the loss functions, and the experimental results, and we explain why Tversky loss achieves the best mIoU.

* In the future, more carefully designed loss functions can be explored, and architectural improvements can be introduced, to further improve performance on imbalanced datasets.