Objective3:
# DeepLabv3+ Model
DeepLabv3+ utilizes an encoder-decoder structure to perform image segmentation. The encoder extracts shallow and high-level semantic information from the image, while the decoder combines low-level and high-level features to improve the accuracy of segmentation boundaries and classify the semantic information of different pixels [Chen et al., (2018)](https://link.springer.com/content/pdf/10.1007/978-3-030-01234-2_49.pdf?pdf=inline%20link).

This project is based on the improved classic DeepLabv3+ network model proposed by [Chen et al., (2023)](https://link.springer.com/content/pdf/10.1007/s40747-023-01304-z.pdf).

## Architecture of improved DeepLabv3+ with MobileNetv2 backbone

**`A. Encoder`**
 1. `Backbone` : lightweight network `MobileNetv2` [Sandler et al., (2019)](https://arxiv.org/pdf/1801.04381v4) in place of Xception.
 2. `ASPP` : `Hybrid Dilated Convolution` (HDC) module to alleviate the gridding effect. In addition, a `Strip Pooling Module` is used instead of spatial mean pooling to improve the local segmentation effect.
 3. `Normalization-based Attention Module` (NAM): This lightweight attention mechanism is also applied to the stacked compressed high-level feature maps to help improve the segmentation accuracy of the image.
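
The gridding effect mentioned in item 2 can be illustrated without a deep-learning library. The sketch below tracks, in one dimension, which input offsets a stack of 3-tap dilated convolutions can reach. The dilation rates `(2, 2, 2)` and `(1, 2, 5)` are illustrative assumptions (the latter is a commonly used HDC pattern), not values taken from the referenced paper, and `reachable_offsets` and `coverage` are hypothetical helpers.

```python
def reachable_offsets(rates):
    """Offsets (relative to the output pixel) seen by stacked 3-tap dilated convs."""
    offsets = {0}
    for r in rates:
        # A 3-tap kernel with dilation r samples positions -r, 0 and +r.
        offsets = {o + d for o in offsets for d in (-r, 0, r)}
    return sorted(offsets)


def coverage(rates):
    """Fraction of the receptive-field span that is actually sampled."""
    offsets = reachable_offsets(rates)
    span = offsets[-1] - offsets[0] + 1
    return len(offsets) / span


same_rate = reachable_offsets((2, 2, 2))  # only even offsets: holes in the grid
hdc_rate = reachable_offsets((1, 2, 5))   # every offset from -8 to 8

print(same_rate)
print(coverage((2, 2, 2)), coverage((1, 2, 5)))
```

With a repeated rate, only every second input position contributes to the output, which is the checkerboard pattern HDC is meant to avoid; the mixed rates fill the whole span.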

**`B. Decoder`**
1. `NAM`: The seventh layer feature with `NAM` attention [Liu et al., (2021)](https://arxiv.org/pdf/2111.12419) is upsampled to the same size as the fourth layer feature after fusion and channel adjustment.
2. `ResNet50`: This module is added to obtain richer low-level target feature information.
3. `Concatenate`: The **deep features** and **shallow features** are concatenated as in the original model.
4. `Upsampling`: After a 3 × 3 convolution and 4× `upsampling`, the image is restored to its original size.
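
The decoder steps above can be followed as a simple shape trace. This is a minimal sketch under stated assumptions: the strides (4 for the shallow feature, 16 for the deep feature), the channel counts (48 and 256, as in the original DeepLabv3+), the 512-pixel input, and the number of classes are illustrative defaults, and `decoder_shapes` is a hypothetical helper rather than part of the model code.

```python
def decoder_shapes(input_size=512, low_stride=4, high_stride=16,
                   low_channels=48, high_channels=256, num_classes=2):
    """Trace (channels, height, width) through the decoder steps listed above."""
    low = (low_channels, input_size // low_stride, input_size // low_stride)
    high = (high_channels, input_size // high_stride, input_size // high_stride)

    # Step 1: upsample the deep feature map to the shallow map's spatial size.
    scale = high_stride // low_stride
    high_up = (high[0], high[1] * scale, high[2] * scale)

    # Step 3: concatenate deep and shallow features along the channel axis.
    fused = (low[0] + high_up[0], low[1], low[2])

    # Step 4: a 3x3 convolution (padding 1) keeps the spatial size; the final
    # upsampling by `low_stride` restores the input resolution.
    logits = (num_classes, fused[1] * low_stride, fused[2] * low_stride)
    return {"low": low, "high": high, "high_up": high_up,
            "fused": fused, "logits": logits}


for name, shape in decoder_shapes().items():
    print(name, shape)
```

Under these assumptions the final 4× upsampling is exactly what is needed to return from the stride-4 feature map to the input resolution.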


 [Architecture Image]()

# 1. Data Preparation

In [42]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [13]:
%%capture
# Install required packages (%%capture must be the first line of the cell)
!pip install rasterio


In [15]:
# Import required packages
import os
from pathlib import Path

from datetime import datetime, timedelta
import tqdm # Adds a smart progress meter to any iterable or file operation

import math
import random
import pandas as pd
import numpy as np


import cv2
import rasterio
#  defines a rectangular area within the raster using four properties
# xoff, yoff, width, height
from rasterio.windows import Window


import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import _LRScheduler
from torch.utils.tensorboard import SummaryWriter

import torchvision.models as pretrainmodels

import logging
import pickle
import itertools


from IPython.core.debugger import set_trace # Insert a breakpoint into the code
from IPython.display import Image

import matplotlib.pyplot as plt

## Utility Functions

In [16]:
class InputError(Exception):
    '''
    Exception raised for errors in the input
    '''
    def __init__(self, message):
        '''
        Params:
            message (str): explanation of the error
        '''
        self.message = message
    def __str__(self):
        '''
        Define message to return when error is raised
        '''
        if self.message:
            return 'InputError, {} '.format(self.message)
        else:
            return 'InputError'
# =============================================================================
def load_data(data_path, usage="train", window=None, norm_stats_type=None,
              is_label=False):
    '''
    Read geographic data into numpy array
    Params:
        data_path : str
            Path of data to load
        usage : str
            Usage of the data: "train", "validate", or "predict"
        window : tuple
            The view onto a rectangular subset of the data, in the format of
            (column offsets, row offsets, width in pixel, height in pixel)
        norm_stats_type : str
            How the normalization statistics are calculated: "local_per_tile",
            "local_per_band", or "global_per_band". Required for images.
        is_label : bool
            Whether the file is a single-channel label (segmentation mask);
            labels are read as-is, without normalization or windowing
    Returns:
        ndarray
    '''
    # Open the data file using the 'rasterio' library
    with rasterio.open(data_path, "r") as src:
        # Check if the data is a label (segmentation mask)
        if is_label:
            if src.count != 1:  # Ensure the label has a single channel
                raise InputError("Label shape not applicable: "
                                 "expected 1 channel")
            img = src.read(1)  # Read the single channel of the label data

        else:
        # Store the value representing 'no data' in the image
            nodata = src.nodata
            # Verify normalization type is valid
            assert norm_stats_type in ["local_per_tile", "local_per_band",
                                      "global_per_band"]

            if norm_stats_type == "local_per_tile":
              # Apply per-tile normalization
                img = mmnorm1(src.read(), nodata=nodata)
            elif norm_stats_type == "local_per_band":
              # Per-band normalization, clipping values
                img = mmnorm2(src.read(), nodata=nodata, clip_val=1.5)
            elif norm_stats_type == "global_per_band":
              # Global per-band normalization, clipping values
                img = mmnorm3(src.read(), nodata=nodata, clip_val=1.5)

            # For 'train' or 'validate' subsets
            if usage in ['train', 'validate']:
              # Extract a specific window from the image
                img = img[:, max(0, window[1]): window[1] + window[3],
                          max(0, window[0]): window[0] + window[2]]

    return img  # Return the processed image or label data
# ==============================================================================

def get_stacked_img(img_paths, usage, norm_stats_type="local_per_tile",
                    window=None):
    '''
    Read one or more images and stack their bands into a single numpy array
    in (H,W,C) format
    Params:
        img_paths : list
            List of paths for images
        usage : str
            Usage of the image: "train", "validate", or "predict"
        norm_stats_type : str
            How the normalization statistics are calculated.
        window : tuple
            The view onto a rectangular subset of the data, in the
            format of (column offsets, row offsets, width in pixel, height in
            pixel)
    Returns:
        ndarray
    '''

    if len(img_paths) > 1:  # If there are multiple image paths:
      img_ls = [load_data(m, usage, window, norm_stats_type) for m in img_paths]
      # Load data for each image path, potentially applying normalization
      img = np.concatenate(img_ls, axis=0).transpose(1, 2, 0)
      # Combine the loaded data into a single array and rearrange dimensions
    else:  # If there's only a single image path:
      # Load data for the single image path and rearrange dimensions
      img = load_data(img_paths[0], usage, \
                      window, norm_stats_type).transpose(1, 2, 0)

    # For 'train' or 'validate' subsets:
    if usage in ["train", "validate"]:
      # Extract window parameters
      col_off, row_off, col_target, row_target = window
      row, col, c = img.shape  # Get image dimensions

      # Check if image is smaller than the target window
      if row < row_target or col < col_target:
          row_off = abs(row_off) if row_off < 0 else 0  # Adjust offsets
          col_off = abs(col_off) if col_off < 0 else 0

          # Create a larger blank canvas
          canvas = np.zeros((row_target, col_target, c))
          # Place image onto canvas
          canvas[row_off: row_off + row, col_off : col_off + col, :] = img
          return canvas  # Return the canvas with the padded image

      else:
          return img  # The image fits the window, so return it directly

    elif usage == "predict":  # For prediction purposes:
      return img  # Return the image as is

    else:
      # Invalid 'usage' value
      raise ValueError("usage must be 'train', 'validate', or 'predict'")

# ==============================================================================
def get_buffered_window(src_path, dst_path, buffer):
    '''
    Get bounding box representing subset of source image that overlaps with
    buffered destination image, in format of (column offsets, row offsets,
    width, height)

    Params:
        src_path : str
            Path of source image to get subset bounding box
        dst_path : str
            Path of destination image as a reference to define the
            bounding box. Size of the bounding box is
            (destination width + buffer * 2, destination height + buffer * 2)
        buffer : int
            Buffer distance of bounding box edges to destination image
            measured by pixel numbers

    Returns:
        tuple in form of (column offsets, row offsets, width, height)
    '''

    with rasterio.open(src_path, "r") as src:
        gt_src = src.transform

    with rasterio.open(dst_path, "r") as dst:
        gt_dst = dst.transform
        w_dst = dst.width
        h_dst = dst.height

    col_off = round((gt_dst[2] - gt_src[2]) / gt_src[0]) - buffer
    row_off = round((gt_dst[5] - gt_src[5]) / gt_src[4]) - buffer
    width = w_dst + buffer * 2
    height = h_dst + buffer * 2

    return col_off, row_off, width, height

# ==============================================================================

def get_meta_from_bounds(file, buffer):
    '''
    Get metadata of unbuffered region in given file
    Params:
        file (str): File name of an image chip
        buffer (int): Buffer distance measured by pixel numbers
    Returns:
        dictionary
    '''

    with rasterio.open(file, "r") as src:

        meta = src.meta
        dst_width = src.width - 2 * buffer
        dst_height = src.height - 2 * buffer

        window = Window(buffer, buffer, dst_width, dst_height)
        win_transform = src.window_transform(window)

    meta.update({
        'width': dst_width,
        'height': dst_height,
        'transform': win_transform,
        'count': 1,
        'nodata': -128,
        'dtype': 'int8'
    })

    return meta


# ==============================================================================
def display_hist(img):
    '''
    Display data distribution of input image in a histogram
    Params:
        img (ndarray): Image in form of (H,W,C) to display data distribution
    '''

    # Zero is treated as the NoData value here, matching the masking below
    img = mmnorm1(img, nodata=0)
    im = np.where(img == 0, np.nan, img)

    # Create the figure before plotting so the histogram is drawn on it
    plt.figure(figsize=(20, 20))
    plt.hist(img.ravel(), 500, [np.nanmin(im), img.max()])
    plt.show()

# ==============================================================================
def mmnorm1(img, nodata):
    '''
    Data normalization with min/max method
    Params:
        img (ndarray): The targeted image for normalization
        nodata (int or float): Value representing NoData; excluded from the
            min/max statistics
    Returns:
        ndarray
    '''

    img_tmp = np.where(img == nodata, np.nan, img)
    img_max = np.nanmax(img_tmp)
    img_min = np.nanmin(img_tmp)
    normalized = (img - img_min) / (img_max - img_min)
    normalized = np.clip(normalized, 0, 1)

    return normalized

# ------------------------------------------------------------------------------
def mmnorm2(img, nodata, clip_val=None):
    r"""
    Normalize the input image pixels to the [0, 1] range based on the
    minimum and maximum statistics of each band per tile.
    Arguments:
            img : numpy array
                Stacked image bands with a dimension of (C,H,W).
            nodata : int or float
                Value reserved to represent NoData in the image chip.
            clip_val : float
                Percentile of the distribution to cut off from each tail.
    Returns:
            img : numpy array
                Normalized image stack of size (C,H,W).
    Note 1: If clip then min, max are calculated from the clipped image.
    """

    # Exclude NoData and zero pixels when generating statistics.
    nan_corr_img = np.where(img == nodata, np.nan, img)
    nan_corr_img = np.where(nan_corr_img == 0, np.nan, nan_corr_img)

    if clip_val is not None and clip_val > 0:
        left_tail_clip = np.nanpercentile(nan_corr_img, clip_val)
        right_tail_clip = np.nanpercentile(nan_corr_img, 100 - clip_val)

        left_clipped_img = np.where(img < left_tail_clip, left_tail_clip, img)
        clipped_img = np.where(left_clipped_img > right_tail_clip,
                               right_tail_clip, left_clipped_img)

        normalized_bands = []
        for i in range(img.shape[0]):
            band_min = np.nanmin(clipped_img[i, :, :])
            band_max = np.nanmax(clipped_img[i, :, :])
            normalized_band = (clipped_img[i, :, :] - band_min) /\
                (band_max - band_min)
            normalized_bands.append(np.expand_dims(normalized_band, 0))
        normal_img = np.concatenate(normalized_bands, 0)

    elif clip_val == 0 or clip_val is None:
        normalized_bands = []
        for i in range(img.shape[0]):
            band_min = np.nanmin(nan_corr_img[i, :, :])
            band_max = np.nanmax(nan_corr_img[i, :, :])
            normalized_band = (nan_corr_img[i, :, :] - band_min) /\
                (band_max - band_min)
            normalized_bands.append(np.expand_dims(normalized_band, 0))
        normal_img = np.concatenate(normalized_bands, 0)

    else:
        raise ValueError("clip must be a non-negative decimal.")

    normal_img = np.clip(normal_img, 0, 1)
    return normal_img

# ------------------------------------------------------------------------------
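def mmnorm3(img, nodata, clip_val=None):
    r"""
    Normalize each band to the [0, 1] range using fixed (global) per-band
    minimum and maximum statistics. The hard-coded statistics cover four bands.
    Arguments:
            img : numpy array
                Stacked image bands with a dimension of (C,H,W).
            nodata : int or float
                Value reserved to represent NoData in the image chip.
            clip_val : float
                Percentile cut from each tail of the distribution before scaling.
    Returns:
            img : numpy array
                Normalized image stack of size (C,H,W).
    """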
    hardcoded_stats = {
        "mins": np.array([331.0, 581.0, 560.0, 1696.0]),
        "maxs": np.array([1403.0, 1638.0, 2076.0, 3652.0])
    }

    num_bands = img.shape[0]
    mins = hardcoded_stats["mins"]
    maxs = hardcoded_stats["maxs"]

    if clip_val:
        normalized_bands = []
        for i in range(num_bands):
            nan_corr_img = np.where(img[i, :, :] == nodata, np.nan,
                                    img[i, :, :])
            nan_corr_img = np.where(nan_corr_img == 0, np.nan, nan_corr_img)
            left_tail_clip = np.nanpercentile(nan_corr_img, clip_val)
            right_tail_clip = np.nanpercentile(nan_corr_img, 100 - clip_val)
            left_clipped_band = np.where(img[i, :, :] < left_tail_clip,
                                         left_tail_clip, img[i, :, :])
            clipped_band = np.where(left_clipped_band > right_tail_clip,
                                    right_tail_clip, left_clipped_band)
            normalized_band = (clipped_band - mins[i]) / (maxs[i] - mins[i])
            normalized_bands.append(np.expand_dims(normalized_band, 0))
        img = np.concatenate(normalized_bands, 0)

    else:
        # Cast to float so integer input is not truncated on assignment
        img = img.astype(np.float32)
        for i in range(num_bands):
            img[i, :, :] = (img[i, :, :] - mins[i]) / (maxs[i] - mins[i])

    img = np.clip(img, 0, 1)
    return img

# ==============================================================================
def get_chips(img, dsize, buffer):
    '''
    Generate small chips from the input image together with the index of each
    chip. The index marks the location of the upper-left pixel of a chip.
    Params:
        img (ndarray): Image in format of (H,W,C) to be cropped; in this case
            it is the concatenated image of growing season and off season
        dsize (int): Cropped chip size
        buffer (int): Number of overlapping pixels when extracting image chips
    Returns:
        list of cropped chips and corresponding coordinates
    '''

    h, w, _ = img.shape
    x_ls = range(0, h - 2 * buffer, dsize - 2 * buffer)
    y_ls = range(0, w - 2 * buffer, dsize - 2 * buffer)

    index = list(itertools.product(x_ls, y_ls))

    img_ls = []
    for i in range(len(index)):
        x, y = index[i]
        img_ls.append(img[x:x + dsize, y:y + dsize, :])

    return img_ls, index


# ==============================================================================
def display(img, label, mask):

    '''
    Display composites and their labels
    Params:
        img (torch.tensor): Image in format of (C,H,W)
        label (torch.tensor): Label in format of (H,W)
        mask (torch.tensor): Mask in format of (H,W)
    '''

    gsimg = (comp432_dis(img, "GS") * 255).permute(1, 2, 0).int()
    osimg = (comp432_dis(img, "OS") * 255).permute(1, 2, 0).int()


    _, figs = plt.subplots(1, 4, figsize=(20, 20))

    label = label.cpu()

    figs[0].imshow(gsimg)
    figs[1].imshow(osimg)
    figs[2].imshow(label)
    figs[3].imshow(mask)

    plt.show()

# ==============================================================================
# color composite
def comp432_dis(img, season):
    '''
    Generate false color composites
    Params:
        img (torch.tensor): Image in format of (C,H,W)
        season (str): Season of the composite to generate, either "GS" or "OS"
    '''

    viewsize = img.shape[1:]

    if season == "GS":

        b4 = mmnorm1(img[3, :, :].cpu().view(1, *viewsize), 0)
        b3 = mmnorm1(img[2, :, :].cpu().view(1, *viewsize), 0)
        b2 = mmnorm1(img[1, :, :].cpu().view(1, *viewsize), 0)

    elif season == "OS":
        b4 = mmnorm1(img[7, :, :].cpu().view(1, *viewsize), 0)
        b3 = mmnorm1(img[6, :, :].cpu().view(1, *viewsize), 0)
        b2 = mmnorm1(img[5, :, :].cpu().view(1, *viewsize), 0)

    else:
        raise ValueError("Bad season value")

    img = torch.cat([b4, b3, b2], 0)

    return img

# ==============================================================================
def make_reproducible(seed=42, cudnn=True):
    """Make all the randomization processes start from a shared seed"""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.random.manual_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    if cudnn:
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
# ==============================================================================

def pickle_dataset(dataset, file_path):
    with open(file_path, "wb") as fp:
        pickle.dump(dataset, fp)

# ------------------------------------------------------------------------------
def load_pickle(file_path):
    return pd.read_pickle(file_path)

# ==============================================================================
def progress_reporter(msg, verbose, logger=None):
    """Helps control print statements and log writes
    Parameters
    ----------
    msg : str
      Message to write out
    verbose : bool
      Prints or not to console
    logger : logging.logger
      logger (defaults to none)

    Returns:
    --------
        Message to console and or log
    """

    if verbose:
        print(msg)

    if logger:
        logger.info(msg)

# ==============================================================================
def setup_logger(log_dir, log_name, use_date=False):
    """Create logger
    """
    if use_date:
        dt = datetime.now().strftime("%d%m%Y_%H%M")
        log = "{}/{}_{}.log".format(log_dir, log_name, dt)
    else:
        log = "{}/{}.log".format(log_dir, log_name)

    for handler in logging.root.handlers[:]:
        logging.root.removeHandler(handler)
    log_format = (
        f"%(asctime)s::%(levelname)s::%(name)s::%(filename)s::"
        f"%(lineno)d::%(message)s"
    )
    logging.basicConfig(filename=log, filemode='w',
                        level=logging.INFO, format=log_format)

    return logging.getLogger()

# Data Preparation
1. Find a suitable `image dataset` to which the `improved DeepLabv3+ model` can be applied for image segmentation.
  - For this task, we used the image dataset from `S. Khallaghi, (2024) ch. 2`.
2. Prepare `labels (pixel-wise annotations)` that are compatible with the selected image dataset.
  - For this task, we filtered the [all-class catalog](/content/gdrive/MyDrive/adleo/project_data/label_catalog_allclasses.csv) using the methods and functions in this [notebook](https://github.com/paudelsushil/labelcombinations/blob/main/Make_Labels_ADLEO_Final.ipynb) and prepared the final [filtered catalog](/content/gdrive/MyDrive/adleo/project_data/label-catalog-filtered.csv), from which we derived our pixel-wise annotations as [label images](/content/gdrive/MyDrive/adleo/project_data/labels).


In [17]:
# Check for GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using {device} device")

Using cpu device


## Defining the Dataset for training, validating, and testing the model

In [18]:
# Define the datasets source and workingFolder source
src_dir = "/content/gdrive/MyDrive/adleo/project_data"

WorkingFolder = "/content/gdrive/MyDrive/adleo/project_data"


# Define the paths for images, labels, and catalog
# Image path
img_paths = list(Path(os.path.join(src_dir, "images")).glob("*.tif"))

# Label path
lbl_paths = list(Path(os.path.join(src_dir, "labels")).glob("*.tif"))

# Label catalog
cataloge = pd.read_csv(os.path.join(src_dir, "label-catalog-filtered.csv"))

# Check if all paths are valid lists
if not all(isinstance(path_list, list) for path_list in (img_paths, lbl_paths)):
    raise ValueError("Both image_paths and label_paths must be lists.")

# Prints valid number of images and labels
print("No. of images:",len(img_paths), "\n",
      "No. of labels:", len(lbl_paths),"\n",
      "No. of rows in cataloge:", len(cataloge))





No. of images: 33873 
 No. of labels: 33756 
 No. of rows in cataloge: 33746


## Pre-process Datasets:
1. `Resize images` to a standard size suitable for the model.

2. `Normalize pixel values` (e.g., scale to range 0-1 or subtract mean).

3. `Image Augmentation`  (e.g., random flipping, cropping).
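
The normalization and augmentation steps can be illustrated on a tiny example. The sketch below uses plain Python lists so the arithmetic is easy to follow, whereas the notebook itself works on NumPy arrays; `min_max_scale`, `hflip`, and `augment_pair` are hypothetical helpers, and mapping a constant band to zeros is an assumption made here to avoid division by zero.

```python
def min_max_scale(band):
    """Scale a 2-D band (list of rows) to [0, 1]; constant bands map to 0."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in band]
    return [[(v - lo) / (hi - lo) for v in row] for row in band]


def hflip(grid):
    """Mirror a 2-D grid left to right."""
    return [row[::-1] for row in grid]


def augment_pair(band, label):
    """Apply the same horizontal flip to an image band and its label."""
    return hflip(band), hflip(label)


band = [[10, 20], [30, 50]]
label = [[0, 1], [0, 0]]

scaled = min_max_scale(band)
flipped_band, flipped_label = augment_pair(scaled, label)
print(scaled)         # [[0.0, 0.25], [0.5, 1.0]]
print(flipped_label)  # [[1, 0], [0, 0]]
```

The key point for segmentation is that every geometric transform is applied identically to the image and its label, so each pixel keeps its annotation.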


### Image Normalization

In [19]:
def min_max_normalize_image(image, dtype=np.float32):
    """
    Normalize each band of an image to the range [0, 1].

    image (numpy array) : Image patch in (C, H, W) format.
    dtype (numpy datatype) : Data type of the normalized image; default is
    "np.float32".
    """

    # Calculate the minimum and maximum values for each band
    min_values = np.nanmin(image, axis=(1, 2))[:, np.newaxis, np.newaxis]
    max_values = np.nanmax(image, axis=(1, 2))[:, np.newaxis, np.newaxis]

    # Normalize the image data to the range [0, 1]
    normalized_img = (image - min_values) / (max_values - min_values)

    # Return the normalized image data in the requested data type
    return normalized_img.astype(dtype)

### Image Augmentation

In [20]:
def flip_image_and_label(image, label, flip_type):
    """
    Applies horizontal or vertical flip augmentation to an image patch and label

    Args:
        image (numpy array) : The input image patch as a numpy array.
        label (numpy array) : The corresponding label as a numpy array.
        flip_type (string) : Based on the direction of flip. Can be either
            'hflip' or 'vflip'.

    Returns:
        A tuple containing the flipped image patch and label as numpy arrays.
    """
    if flip_type == 'hflip':
        # Apply horizontal flip augmentation to the image patch
        flipped_image = cv2.flip(image, 1)

        # Apply horizontal flip augmentation to the label
        flipped_label = cv2.flip(label, 1)

    elif flip_type == 'vflip':
        # Apply vertical flip augmentation to the image patch
        flipped_image = cv2.flip(image, 0)

        # Apply vertical flip augmentation to the label
        flipped_label = cv2.flip(label, 0)

    else:
        raise ValueError("flip_type must be 'hflip' or 'vflip'.")

    # Return the flipped image patch and label as a tuple
    return flipped_image.copy(), flipped_label.copy()


def rotate_image_and_label(image, label, angle):
    """
    Applies rotation augmentation to an image patch and label.

    Args:
        image (numpy array) : The input image patch as a numpy array.
        label (numpy array) : The corresponding label as a numpy array.
        angle (list of floats) : If the list has exactly two elements, they are
            treated as the lower and upper bounds for the rotation angle
            (in degrees) respectively. If it has more than two elements,
            one value is chosen randomly as the rotation angle.

    Returns:
        A tuple containing the rotated image patch and label as numpy arrays.
    """
    if isinstance(angle, tuple) or isinstance(angle, list):
        if len(angle) == 2:
            rotation_degree = random.uniform(angle[0], angle[1])
        elif len(angle) > 2:
            rotation_degree = random.choice(angle)
        else:
            raise ValueError("Parameter angle needs at least two elements.")
    else:
        raise ValueError(
            "Rotation bound param for augmentation must be a tuple or list."
        )

    # Define the center of the image patch as (x, y), as OpenCV expects
    height, width = label.shape[:2]
    center = (width / 2.0, height / 2.0)

    # Define the rotation matrix
    rotation_matrix = cv2.getRotationMatrix2D(center, rotation_degree, 1.0)

    # Apply rotation augmentation to the image patch; dsize is (width, height)
    rotated_image = cv2.warpAffine(image, rotation_matrix, (width, height),
                                   flags=cv2.INTER_LINEAR)

    # Apply rotation augmentation to the label
    rotated_label = cv2.warpAffine(label, rotation_matrix, (width, height),
                                   flags=cv2.INTER_NEAREST)

    # Return the rotated image patch and label as a tuple
    return rotated_image.copy(), np.rint(rotated_label.copy())

## Get center index of each smaller chip
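
Before the full implementation, the grid arithmetic behind the patch centers can be sketched along a single axis. This is a simplified illustration that ignores the mask and label checks; `center_grid` is a hypothetical helper, and the extents, patch size, and overlap are made-up values.

```python
def center_grid(extent_min, extent_max, patch_size, overlap):
    """Patch-center coordinates along one axis of the valid-data extent."""
    half_size = patch_size // 2
    step_size = patch_size - 2 * overlap
    if step_size <= 0:
        raise ValueError("overlap must be smaller than half the patch size")
    # Centers start half a patch inside the extent so patches stay in bounds.
    start = extent_min + half_size
    stop = extent_max - half_size
    return list(range(start, stop + 1, step_size))


rows = center_grid(0, 99, patch_size=32, overlap=4)
cols = center_grid(0, 63, patch_size=32, overlap=4)

# Same loop order as the full function: columns outer, rows inner.
centers = [(i, j) for j in cols for i in rows]
print(rows, cols, len(centers))
```

Neighbouring patches share `2 * overlap` pixels, because the step between centers is the patch size minus twice the overlap.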

In [21]:
def patch_center_index(cropping_ref, patch_size, overlap, usage,
                       positive_class_threshold=None, verbose=True):
    """
    Generate index to divide the scene into small chips.
    Each index marks the location of corresponding chip center.
    Arguments:
        cropping_ref (list) : Reference raster layers, to be used to generate
            the index. In our case, it is study area binary mask and label mask.
        patch_size (int) : Size of each clipped patches.
        overlap (int) : amount of overlap between the extracted chips.
        usage (str) : One of 'train', 'validation', or 'inference'. Chipping
            strategy is different for different usage.
        positive_class_threshold (float) : A real value as a threshold for the
            proportion of positive class to the total area of the chip. Used to
            decide if the chip should be considered as a positive chip in the
            sampling process.
        verbose (bool) : If set to True prints on screen the detailed list of
            center coordinates of the sampled chips.
    Returns:
        proportional_patch_index : A list of index recording the center of
        patches to extract from the input
    """

    assert usage in ["train", "validation", "inference"]

    if usage == "inference":
        mask = cropping_ref
    else:
        mask, label = cropping_ref

    half_size = patch_size // 2
    step_size = patch_size - 2 * overlap

    proportional_patch_index = []
    non_proportional_patch_index = []
    neg_patch_index = []

    # Get the index of all the non-zero elements in the mask.
    x = np.argwhere(mask)

    # First col of x shows the row indices (height) of the mask layer
    # (iterate over the y axis or latitude).
    x_min = min(x[:, 0]) + half_size
    x_max = max(x[:, 0]) - half_size
    # Second col of x shows the column indices (width) of the mask layer
    # (iterate over the x axis or longitude).
    y_min = min(x[:, 1]) + half_size
    y_max = max(x[:, 1]) - half_size

    # Generate index for the center of each patch considering the proportion of
    # each category falling into each patch.
    for j in range(y_min, y_max + 1, step_size):

        for i in range(x_min, x_max + 1, step_size):

            # Split the mask and label layers into patches based on the index of
            # the center of the patch
            mask_ref = mask[i - half_size: i + half_size,
                            j - half_size: j + half_size]
            if usage != "inference":
                label_ref = label[i - half_size: i + half_size,
                                  j - half_size: j + half_size]

            if (usage == "train") and mask_ref.all():

                if label_ref.any() != 0:
                    pond_ratio = np.sum(label_ref == 1) / label_ref.size
                    if pond_ratio >= positive_class_threshold:
                        proportional_patch_index.append([i, j])
                else:
                    neg_patch_index.append([i, j])

            if (usage == "validation") and (label_ref.any() != 0) \
                and mask_ref.all():
                non_proportional_patch_index.append([i, j])

            if (usage == "inference") and (mask_ref.any() != 0):
                non_proportional_patch_index.append([i, j])

    if usage == "train":

        # Never request more negative samples than are available
        num_negative_samples = min(
            math.ceil(0.2 * len(proportional_patch_index)), 15,
            len(neg_patch_index)
        )
        neg_samples = random.sample(neg_patch_index, num_negative_samples)

        proportional_patch_index.extend(neg_samples)

    # For test set use the indices generated from mask without
    # considering the class proportions.
    if usage in ["validation", "inference"]:
        proportional_patch_index = non_proportional_patch_index

    if verbose:
        print("Number of patches:", len(proportional_patch_index))
        print("Patched from:\n{}".format(proportional_patch_index))

    return proportional_patch_index

## Active dataset loading pipeline


In [24]:
class datasetloader(Dataset):
    def __init__(self, src_dir, usage, dataset_name=None,
                 apply_normalization=False, transform=None, csv_name=None,
                 patch_size=None, overlap=None, catalog_index=None):
        r"""
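        src_dir (str or path): Root of resource directory.
        usage (str): One of 'train', 'validation', or 'inference'.
        dataset_name (str): Name of the training/validation dataset containing
                              structured folders for image, label
        apply_normalization (bool): Whether to normalize the image chips.
        transform (list): Each element is string name of the transformation to
            be used.
        csv_name (str): Name of the catalog CSV file; required for inference.
        patch_size (int): Size of each extracted patch.
        overlap (int): Amount of overlap between the extracted patches.
        catalog_index (int): Row of the catalog to use for inference.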
        """
        self.src_dir = src_dir
        self.dataset_name = dataset_name
        self.csv_name = csv_name
        self.apply_normalization = apply_normalization
        self.transform = transform
        self.patch_size = patch_size
        self.overlap = overlap

        self.usage = usage
        assert self.usage in ["train", "validation", "inference"], \
            "Usage is not recognized."

        if self.usage in ["train", "validation"]:
            assert self.dataset_name is not None
            img_dir = Path(src_dir) / self.dataset_name / self.usage / "bands"
            img_fnames = [Path(dirpath) / f
                          for (dirpath, dirnames, filenames) in os.walk(img_dir)
                          for f in filenames if f.endswith(".tif")]
            img_fnames.sort()

            lbl_dir = Path(src_dir) / self.dataset_name / self.usage / "labels"
            lbl_fnames = [Path(dirpath) / f
                          for (dirpath, dirnames, filenames) in os.walk(lbl_dir)
                          for f in filenames if f.endswith(".tif")]
            lbl_fnames.sort()

            self.img_chips = []
            self.lbl_chips = []

            for img_path, lbl_path in tqdm.tqdm(zip(img_fnames, lbl_fnames),
                                                total=len(img_fnames)):
                img_chip = load_data(
                    img_path, is_label=False,
                    apply_normalization=self.apply_normalization
                )
                img_chip = img_chip.transpose((1, 2, 0))

                lbl_chip = load_data(lbl_path, is_label=True)

                self.img_chips.append(img_chip)
                self.lbl_chips.append(lbl_chip)

            print('--------------{} patches cropped--------------'\
                  .format(len(self.img_chips)))

        # This part handles prediction dataset
        else:
            assert self.csv_name is not None

            # Read the catalog CSV that lists the scenes to predict on.
            catalog = pd.read_csv(os.path.join(self.src_dir, self.csv_name))

            # Select the catalog row given by `catalog_index`.
            self.catalog = catalog.iloc[catalog_index]

            self.tile = (self.catalog["wrs_path"], self.catalog["wrs_row"])

            img_path_ls = [self.catalog["img_dir"]]
            mask_path_ls = [self.catalog["mask_dir"]]

            self.meta = get_meta_from_bounds(Path(src_dir) / img_path_ls[0])

            half_size = self.patch_size // 2

            self.img_chips = []
            self.coor = []

            for img_path, mask_path in zip(img_path_ls, mask_path_ls):

                # Load the image with the `load_data` utility and reorder
                # it from CHW to HWC.
                img = load_data(os.path.join(self.src_dir, img_path),
                                is_label=False,
                                apply_normalization=self.apply_normalization)
                img = np.transpose(img, (1, 2, 0))

                # Load the mask the same way; it serves as the cropping
                # reference.
                mask = load_data(os.path.join(self.src_dir, mask_path),
                                 is_label=True)

                crop_ref = mask

                index = patch_center_index(crop_ref, self.patch_size,
                                           self.overlap, self.usage)

                for i in range(len(index)):
                    x = index[i][0]
                    y = index[i][1]

                    self.img_chips.append(img[x - half_size: x + half_size,
                                              y - half_size: y + half_size, :])
                    self.coor.append([x, y])



            print('--------------{} patches cropped--------------'\
                  .format(len(self.img_chips)))


    def __getitem__(self, index):

        if self.usage in ["train", "validation"]:
            image_chip = self.img_chips[index]
            label_chip = self.lbl_chips[index]

            if self.usage == "train" and self.transform:
                trans_flip_ls = [m for m in self.transform if "flip" in m]
                # Flip with probability 0.5 when at least one flip is listed.
                if random.randint(0, 1) and len(trans_flip_ls) > 0:
                    trans_flip = random.sample(trans_flip_ls, 1)[0]
                    image_chip, label_chip = flip_image_and_label(
                        image_chip, label_chip, trans_flip
                    )

                if random.randint(0, 1) and "rotate" in self.transform:
                    # Assign back to the names used below so the rotated
                    # chips are not discarded.
                    image_chip, label_chip = rotate_image_and_label(
                        image_chip, label_chip, angle=[0,90]
                    )

            # Convert numpy arrays to torch tensors.
            # The model expects CHW images, so transpose the HWC chips.
            image_tensor = torch.from_numpy(image_chip.transpose((2, 0, 1)))\
                .float()
            label_tensor = torch.from_numpy(np.ascontiguousarray(label_chip))\
                .long()

            return image_tensor, label_tensor
        else:
            coor = self.coor[index]
            img_chip = self.img_chips[index]
            image_tensor = torch.from_numpy(img_chip.transpose((2, 0, 1)))\
                .float()

            return image_tensor, coor


    def __len__(self):
        return len(self.img_chips)
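
The inference branch of the dataset crops a square window of `patch_size` pixels around each center returned by `patch_center_index`. The slicing arithmetic can be sketched as follows, assuming a hypothetical helper `patch_bounds` that is not part of the notebook:

```python
def patch_bounds(center, patch_size):
    """Return (row_start, row_stop, col_start, col_stop) for a square patch.

    Hypothetical helper: mirrors the slicing
    img[x - half: x + half, y - half: y + half] used in the dataset class.
    """
    half = patch_size // 2
    x, y = center
    return x - half, x + half, y - half, y + half


bounds = patch_bounds((128, 300), 256)
height = bounds[1] - bounds[0]
width = bounds[3] - bounds[2]
print(bounds, height, width)
```

With an odd `patch_size` the window is one pixel smaller than requested, because the half size is floored.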

# Model Building
DeepLabv3+, based on [Chen et al., 2024](https://link.springer.com/content/pdf/10.1007/s40747-023-01304-z.pdf)


## Basic Convolutional Neural Blocks

In [25]:
class Conv3x3_bn_relu(nn.Module):
    def __init__(self, inch, outch, padding = 0, stride =1, dilation = 1, groups = 1, relu = True):
        super(Conv3x3_bn_relu, self).__init__()
        self.applyRelu = relu

        self.conv = nn.Sequential(nn.Conv2d(inch, outch, 3, \
                                            padding = padding, stride = stride,\
                                            dilation = dilation,
                                            groups = groups),
                                  nn.BatchNorm2d(outch))
        if self.applyRelu:
            self.relu = nn.ReLU(True)

    def forward(self, x):
        out = self.conv(x)
        if self.applyRelu:
            out = self.relu(out)
        return out

class Conv1x1_bn_relu(nn.Module):
    def __init__(self, inch, outch, stride = 1, padding = 0, dilation = 1,\
                 groups = 1, relu = True):
        super(Conv1x1_bn_relu, self).__init__()
        self.applyRelu = relu
        self.conv = nn.Sequential(nn.Conv2d(inch, outch, 1, stride = stride,\
                                            padding = padding, \
                                            dilation = dilation,
                                            groups = groups),
                                  nn.BatchNorm2d(outch))

        if self.applyRelu:
            self.relu = nn.ReLU(True)
    def forward(self, x):
        x = self.conv(x.clone())
        if self.applyRelu:
            x = self.relu(x)
        return x


# Two consecutive convolutions, each with batch normalization and ReLU activation
class doubleConv(nn.Module):
    def __init__(self, inch, outch):
        super(doubleConv, self).__init__()
        self.conv1 = Conv3x3_bn_relu(inch, outch, padding = 1)
        self.conv2 = Conv3x3_bn_relu(outch, outch, padding = 1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x


# skip module
# basic unit of resnet
class basicBlock(nn.Module):
    expansion = 1
    def __init__(self, inch, outch, dilation = 1, firstStride = 1, **kwargs):
        super(basicBlock, self).__init__()
        self.firstBlock = (inch != outch)
        transch = outch

        if self.firstBlock:
            self.conv0 = Conv1x1_bn_relu(inch, outch, stride = firstStride, \
                                         relu = False)


        # 1st 3x3 Conv
        self.conv1 = Conv3x3_bn_relu(inch, transch, stride =  firstStride,\
                                     padding = dilation,
                                     dilation = dilation)
        # 2nd 3x3 Conv
        self.conv2 = Conv3x3_bn_relu(transch, outch, padding = dilation,\
                                     dilation = dilation,
                                        relu = False)
        self.relu = nn.ReLU(True)

    def forward(self,x):
        res = self.conv1(x)
        res = self.conv2(res)
        if self.firstBlock:
            x = self.conv0(x)

        out = self.relu(res + x)
        return out


class bottleNeck(nn.Module):
    expansion = 4
    def __init__(self, inch, outch, dilation = 1, firstStride = 1, groups = 1,\
                 base_width = 64):
        super(bottleNeck, self).__init__()

        self.firstBlock = (inch != outch)
        transch = int(outch / (self.expansion * groups * base_width / 64))

        # downsample in first 1x1 convolution
        if self.firstBlock:
            self.conv0 = Conv1x1_bn_relu(inch, outch, stride=firstStride,\
                                         relu=False)

        # 1x1 conv
        self.conv1 = Conv1x1_bn_relu(inch, transch, stride=firstStride)
        # 3x3 conv
        self.conv2 = Conv3x3_bn_relu(transch, transch, padding = dilation, \
                                     dilation = dilation, groups = groups)
        # 1x1 conv
        self.conv3 = Conv1x1_bn_relu(transch, outch, relu=False)
        self.relu = nn.ReLU()

    def forward(self, x):
        res = self.conv1(x)
        res = self.conv2(res)
        res = self.conv3(res)

        if self.firstBlock:
            x = self.conv0(x)

        out = self.relu(res + x)
        return out


class ConvBlock(nn.Module):
    """This module creates a user-defined number of conv+BN+ReLU layers.
    Args:
        in_channels (int): number of input features.
        out_channels (int): number of output features.
        num_conv_layers (int): Number of conv+BN+ReLU layers in the block.
        drop_rate (float): dropout rate at the end of the block.
    """

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1,
                 padding=1, dilation=1, num_conv_layers=2, drop_rate=0):
        super(ConvBlock, self).__init__()

        layers = [nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size,
                            stride=stride, padding=padding, dilation=dilation,\
                            bias=False),
                  nn.BatchNorm2d(out_channels),
                  nn.ReLU(inplace=True), ]

        # Append the remaining conv+BN+ReLU layers. Each iteration builds
        # new modules so the layers do not share weights.
        for _ in range(num_conv_layers - 1):
            layers += [nn.Conv2d(out_channels, out_channels,
                                 kernel_size=kernel_size,
                                 stride=stride, padding=padding,
                                 dilation=dilation, bias=False),
                       nn.BatchNorm2d(out_channels),
                       nn.ReLU(inplace=True)]

        if drop_rate > 0 and num_conv_layers > 1:
            layers += [nn.Dropout(drop_rate)]

        self.block = nn.Sequential(*layers)

    def forward(self, inputs):
        outputs = self.block(inputs)
        return outputs


class SeLayer(nn.Module):

    def __init__(self, in_channels, reduction):
        super(SeLayer, self).__init__()

        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(in_channels, in_channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction, in_channels, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)


class ErrCorrBlock(nn.Module):
    def __init__(self, in_channels, out_channels, reduction=16):
        super(ErrCorrBlock, self).__init__()

        self.conv0 = nn.Sequential(nn.Conv2d(in_channels, \
                                             out_channels, kernel_size=1, \
                                             padding=0, bias=False),
                                   nn.BatchNorm2d(out_channels))

        middle_ch = in_channels // reduction

        self.triple_conv = nn.Sequential(

            nn.Conv2d(in_channels, middle_ch, kernel_size=1, padding=0, \
                      stride=1, bias=False),
            nn.BatchNorm2d(middle_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(middle_ch, middle_ch, kernel_size=3, padding=1, \
                      stride=1, bias=False),
            nn.BatchNorm2d(middle_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(middle_ch, out_channels, kernel_size=1, padding=0,\
                      stride=1, bias=False),
            nn.BatchNorm2d(out_channels)
        )

        self.relu = nn.ReLU()
        self.se = SeLayer(out_channels, reduction)

    def forward(self, x):
        residual = self.conv0(x)

        out = self.triple_conv(x)
        #out = self.se(out)

        out = self.relu(out + residual)

        return out
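
When a block stacks several identical layers, as `ConvBlock` does for `num_conv_layers > 1`, each layer needs its own instance to hold separate weights. The pure-Python sketch below uses a hypothetical `Layer` stand-in (not a PyTorch module) to show how multiplying a list differs from building it with a comprehension:

```python
class Layer:
    """Stand-in for a layer that owns its own parameters (hypothetical)."""

    def __init__(self):
        self.weight = [0.0]


# Multiplying a list repeats references to the same object.
shared = [Layer()] * 3
# A comprehension constructs a fresh object on every iteration.
independent = [Layer() for _ in range(3)]

# Update only the first entry of each list.
shared[0].weight[0] = 1.0
independent[0].weight[0] = 1.0

shared_weights = [layer.weight[0] for layer in shared]
independent_weights = [layer.weight[0] for layer in independent]
print(shared_weights)       # [1.0, 1.0, 1.0]
print(independent_weights)  # [1.0, 0.0, 0.0]
```

The same object-identity rule applies to `nn.Module` instances placed in a Python list.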

## Model Backbone
**Supported backbones:**

- `densenet` family: `densenet121`, `densenet161`, `densenet169`, `densenet201`
- `efficientnet` family: `efficientnet_b0`, `efficientnet_b1`, `efficientnet_b2`, `efficientnet_b3`, `efficientnet_b4`, `efficientnet_b5`, `efficientnet_b6`, `efficientnet_b7`, `efficientnet_v2_l`, `efficientnet_v2_m`, `efficientnet_v2_s`
- `resnext` family: `resnext101_32x8d`, `resnext101_64x4d`, `resnext50_32x4d`

## Load pre-trained models for the backbone

In [26]:
# # Load pre-trained models from ResNext Family
# # resnext50_32x4d
# resnext50_32x4d = pretrainmodels.resnext50_32x4d(pretrained=True)
# # resnext101_32x8d
# resnext101_32x8d = pretrainmodels.resnext101_32x8d(pretrained=True)
# # resnext101_64x4d
# resnext101_64x4d = pretrainmodels.resnext101_64x4d(pretrained=True)
# #-------------------------------------------------------------------------------

# # Load pre-trained models from efficientnet Family
# # efficientnet_b0
# efficientnet_b0 = pretrainmodels.efficientnet_b0(pretrained=True)
# # efficientnet_b1
# efficientnet_b1 = pretrainmodels.efficientnet_b1(pretrained=True)
# # efficientnet_b2
# efficientnet_b2 = pretrainmodels.efficientnet_b2(pretrained=True)
# # efficientnet_b3
# efficientnet_b3 = pretrainmodels.efficientnet_b3(pretrained=True)
# # efficientnet_b4
# efficientnet_b4 = pretrainmodels.efficientnet_b4(pretrained=True)
# # efficientnet_b5
# efficientnet_b5 = pretrainmodels.efficientnet_b5(pretrained=True)
# # efficientnet_b6
# efficientnet_b6 = pretrainmodels.efficientnet_b6(pretrained=True)
# # efficientnet_b7
# efficientnet_b7 = pretrainmodels.efficientnet_b7(pretrained=True)
# # efficientnet_v2_m
# efficientnet_v2_m = pretrainmodels.efficientnet_v2_m(pretrained=True)
# # efficientnet_v2_s
# efficientnet_v2_s = pretrainmodels.efficientnet_v2_s(pretrained=True)
# #-----------------------------------------------------------------------------

# # Load pre-train models from densenet family
# # densenet121
# densenet121 = pretrainmodels.densenet121(pretrained=True)
# # densenet161
# densenet161 = pretrainmodels.densenet161(pretrained=True)
# # densenet169
# densenet169 = pretrainmodels.densenet169(pretrained=True)
# # densenet201
# densenet201 = pretrainmodels.densenet201(pretrained=True)
# #-----------------------------------------------------------------------------


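# Load the pre-trained MobileNetv2 model. The `weights` keyword matches the
# call in the `Backbone` class; `pretrained=True` is deprecated in recent
# torchvision releases.
mobilenet_v2 = pretrainmodels.mobilenet_v2(weights="IMAGENET1K_V1")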

# # Move the model to GPU if available
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# mobilenet_v2.to(device)
# resnext50_32x4d.to(device)
# resnext101_32x8d.to(device)
# resnext101_64x4d.to(device)

# densenet121.to(device)
# densenet161.to(device)
# densenet169.to(device)
# densenet201.to(device)

# efficientnet_b0.to(device)
# efficientnet_b1.to(device)
# efficientnet_b2.to(device)
# efficientnet_b3.to(device)
# efficientnet_b4.to(device)
# efficientnet_b5.to(device)
# efficientnet_b6.to(device)
# efficientnet_b7.to(device)
# efficientnet_v2_m.to(device)
# efficientnet_v2_s.to(device)



Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /root/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
100%|██████████| 13.6M/13.6M [00:00<00:00, 36.4MB/s]


## Backbone Class

In [27]:
class Backbone(nn.Module):
    def __init__(self, backbone_name, num_classes, num_input_channels,
                 weights=None, weight_handler="copy"):
        """
        A PyTorch module for creating a custom model using an optionally
        pre-trained torchvision model as a backbone. The model will have
        modified input and output layers, depending on the parameters given.

        Args:
            backbone_name : str
                The name of the pre-trained torchvision model to use as a
                backbone.
            num_classes : int
                The number of output classes.
            num_input_channels : int
                The number of input channels.
            weights : str, optional
                The weights to initialize the backbone with. If not given, the
                backbone will be initialized randomly.
            weight_handler : str, optional
                Specifies how to handle the weights for extra input channels if
                num_input_channels > 3. Available options are: 'copy': copy
                weights from the already initialized channels. 'random':
                initialize weights randomly. Defaults to 'copy'.
        Supported backbones:
            densenet family:
                densenet121, densenet161, densenet169, densenet201
            efficientnet family:
                efficientnet_b0, efficientnet_b1, efficientnet_b2,
                efficientnet_b3, efficientnet_b4, efficientnet_b5,
                efficientnet_b6, efficientnet_b7,
                efficientnet_v2_l, efficientnet_v2_m, efficientnet_v2_s,
            resnext family:
                resnext101_32x8d, resnext101_64x4d, resnext50_32x4d
        """
        super(Backbone, self).__init__()

        assert weight_handler in ["copy", "random"], \
            "Unrecognized 'weight_handler'."

        # Functions to get the first and last layer names based on model type
        def get_first_layer(backbone_name):
            if 'efficientnet' in backbone_name:
                return 'features[0][0]'
            elif 'densenet' in backbone_name:
                return 'features[0]'
            elif 'resnext' in backbone_name:
                return 'conv1'
            else:
                raise ValueError('Unrecognized backbone architecture.')

        def get_last_layer(backbone_name):
            if 'efficientnet' in backbone_name:
                return 'classifier[1]'
            elif 'densenet' in backbone_name:
                return 'classifier'
            elif 'resnext' in backbone_name:
                return 'fc'
            else:
                raise ValueError('Unrecognized model type')

        # Load the backbone model
        self.backbone = getattr(pretrainmodels, backbone_name)(weights=weights)

        # Modify the first convolution layer
        original_conv = eval('self.backbone.' + get_first_layer(backbone_name))
        # `bias` must be a bool, so test whether the original layer has one.
        new_conv = nn.Conv2d(num_input_channels, original_conv.out_channels,
                             kernel_size=original_conv.kernel_size,
                             stride=original_conv.stride,
                             padding=original_conv.padding,
                             bias=original_conv.bias is not None)

        # weight handling
        if weights is not None:
            new_conv.weight.data[:,:3,:,:] = original_conv.weight.data
            if num_input_channels > 3:
                if weight_handler == "copy":
                    new_conv.weight.data[:,3:,:,:] = original_conv.weight.data[
                        :,:num_input_channels-3,:,:
                    ]
                else:
                    nn.init.kaiming_normal_(new_conv.weight.data[:,3:,:,:])

        # Replace the first convolution layer
        exec('self.backbone.' + get_first_layer(backbone_name) + '= new_conv')

        # Modify the classifier layer
        original_fc = eval('self.backbone.' + get_last_layer(backbone_name))
        new_fc = nn.Linear(original_fc.in_features, num_classes)

        # Replace the classifier layer
        exec('self.backbone.' + get_last_layer(backbone_name) + '= new_fc')

    def forward(self, x):
        x = self.backbone(x)
        return x
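
With `weight_handler="copy"`, the `Backbone` class keeps the pretrained weights for the first three input channels and fills each extra channel from the leading pretrained channels. A minimal sketch of that channel mapping, assuming a hypothetical helper `source_channels` and a three-channel pretrained first layer:

```python
def source_channels(num_input_channels, pretrained_channels=3):
    """Map each new input channel to the pretrained channel it copies from.

    Hypothetical helper illustrating the 'copy' weight handler: the first
    three channels keep their own weights and the extra channels reuse the
    leading pretrained channels in order.
    """
    extra = num_input_channels - pretrained_channels
    if extra < 0 or extra > pretrained_channels:
        # The slice assignment in `Backbone` only lines up for 3 to 6 channels.
        raise ValueError("expected between 3 and 6 input channels")
    return list(range(pretrained_channels)) + list(range(extra))


mapping = source_channels(6)
print(mapping)  # [0, 1, 2, 0, 1, 2]
```

For the six-band input used in this notebook (`in_channels = 6`), bands 4 to 6 start from the same filters as bands 1 to 3.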

## ASPP Structure

In [28]:
# ASPP structure
class ASPP(nn.Module):
    def __init__(self, inch, rates, stagech):
        '''
        Build the ASPP module introduced in DeepLabv3
        (https://arxiv.org/pdf/1706.05587.pdf). It concatenates the outputs
        of the parallel atrous convolution branches with the image-level
        (global average pooled) features. See the DeepLabv3 paper for
        details.

        Args:
            inch -- (int) Depth of the input tensor
            rates -- (list) Rates of the parallel atrous convolutions; a
                      rate of 1 selects the 1x1 convolution branch
            stagech -- (int) Depth of the output tensor of each parallel
                        atrous convolution

        Returns:
            A tensor after a 1x1 convolution of the concatenated ASPP features
        '''
        super(ASPP, self).__init__()

        # create stages
        self.rates = rates
        self.inch = inch
        self.stagech = stagech

        self.stages = self.makeStages()
        # global feature
        self.globe = nn.Sequential(nn.AdaptiveAvgPool2d((1,1)), \
                                   Conv1x1_bn_relu(inch, stagech, relu = False))
        # self.conv1x1 = Conv1x1_bn_relu(inch, stagech, relu = False)
        # self.conv = Conv3x3_bn_relu(inch * 2, inch, padding = 1)
        self.conv = Conv1x1_bn_relu(stagech*(len(rates) + 1), stagech,\
                                    relu = False)

    def makeStages(self):
        outch = self.stagech
        inch = self.inch
        stages = []
        for rate in self.rates:
            if rate == 1:
                stage = Conv1x1_bn_relu(inch, outch, relu = False)

            else:
                stage = Conv3x3_bn_relu(inch, outch, padding =rate, \
                                        dilation = rate, relu = False)

            stages.append(stage)
        return nn.ModuleList(stages)

    def forward(self, x):
        x_size = x.size()
        # x1 = [F.interpolate(stage(x), size=x_size[-2:], mode="bilinear",\
        #  align_corners=True) for stage in self.stages]
        x0 = [stage(x) for stage in self.stages]

        # global feature
        x1 = self.globe(x)
        x1 = F.interpolate(x1, size = x_size[-2:], mode = "bilinear", \
                           align_corners = True)

        x = torch.cat(x0 + [x1], 1)
        x = self.conv(x)

        return x
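
The ASPP branches can be concatenated along the channel axis only if they all keep the same spatial size. The sketch below checks this with the standard convolution output-size formula; the helper name `conv2d_out_size` and the 32-pixel feature map are assumptions made for illustration:

```python
def conv2d_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2-D convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


feature_size = 32  # hypothetical height/width of the encoder output
sizes = []
for rate in [1, 6, 12, 18]:
    if rate == 1:
        # 1x1 convolution branch
        sizes.append(conv2d_out_size(feature_size, kernel=1))
    else:
        # 3x3 atrous branch with padding equal to the dilation rate
        sizes.append(conv2d_out_size(feature_size, kernel=3,
                                     padding=rate, dilation=rate))
print(sizes)  # [32, 32, 32, 32]
```

Setting `padding` equal to `dilation` for a 3x3 kernel cancels the enlarged receptive field, so every branch returns a map of the input size.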

## Model (DeepLabv3+)

In [29]:
ASPPInchByBackbone = {
    "resnet": 2048,
    "Xception": 2048
}

quaterOutchByBackbone = {
    "resnet": 256,
    "Xception": 128
}

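class deeplabv3plus2(nn.Module):
    # `backbone` must be a constructor (for example a resnet or Xception
    # builder) that accepts `(inch, outStride=...)`, exposes `__name__`, and
    # whose forward pass returns (quarter-resolution features, high-level
    # features). Its name without trailing digits must be a key of the two
    # lookup tables above. The default `mobilenet_v2` object loaded earlier
    # is a model instance rather than such a constructor, so pass a
    # compatible constructor explicitly.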
    def __init__(self, inch, classNum, backbone = mobilenet_v2, \
                 outStride = 16, rates = [1, 6, 12, 18]):
        super(deeplabv3plus2, self).__init__()

        # backbone
        self.backbone = backbone(inch, outStride = outStride)

        # ASPP
        ASPPinch = ASPPInchByBackbone[backbone.__name__.rstrip('0123456789')]
        ASPPoutch = ASPPinch // 8
        self.ASPP = ASPP(ASPPinch, rates=rates, stagech=ASPPoutch)

        # decoder
        quaterOutch = quaterOutchByBackbone[backbone.__name__.rstrip('0123456789')]
        self.conv0 = Conv1x1_bn_relu(quaterOutch, ASPPoutch) # 1/4 of origin
        self.up1 = nn.ConvTranspose2d(256, 256, 6, stride=4, padding=1)
        self.last_conv = nn.Sequential(Conv3x3_bn_relu(ASPPoutch*2, 256, 1),
                                       Conv3x3_bn_relu(256, 256, 1),
                                       nn.Conv2d(256, classNum, 1))
        self.up2 = nn.ConvTranspose2d(classNum, classNum, 6, stride=4, padding=1)


    def forward(self, x):

        x0, x = self.backbone(x)
        x = self.ASPP(x)

        # decoder
        x0 = self.conv0(x0)
        x = self.up1(x)
        x = torch.cat([x, x0], 1)
        x = self.last_conv(x)
        x = self.up2(x)

        return x
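
The decoder upsamples twice by a factor of four with `nn.ConvTranspose2d(..., 6, stride=4, padding=1)`. The standard transposed-convolution size formula shows why these settings give exactly 4x; the helper name and the 512-pixel input are assumptions for illustration:

```python
def conv_transpose2d_out_size(size, kernel, stride=1, padding=0,
                              output_padding=0, dilation=1):
    """Spatial output size of a 2-D transposed convolution along one axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


input_size = 512                  # hypothetical image height/width
aspp_size = input_size // 16      # encoder output at outStride = 16

# up1: ASPP output -> 1/4 of the input resolution
quarter = conv_transpose2d_out_size(aspp_size, kernel=6, stride=4, padding=1)
# up2: 1/4 resolution -> full resolution
full = conv_transpose2d_out_size(quarter, kernel=6, stride=4, padding=1)
print(aspp_size, quarter, full)  # 32 128 512
```

After `up1` the ASPP features match the quarter-resolution low-level features, which is what `torch.cat` in `forward` needs.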

# Model Fitting (Training and Validating)

## Model Training

In [30]:
def train(trainData, model, optimizer, criterion, device, train_loss=[]):
    """
        Train the model for one epoch using the provided training dataset.
        Params:
            trainData (DataLoader object) -- Batches of image chips from
                PyTorch custom dataset (AquacultureData).
            model -- Choice of segmentation model.
            optimizer -- Chosen optimization algorithm to update model
                parameters.
            criterion (str) -- Expression that evaluates to the loss function
                applied to the training samples.
            device (torch.device) -- Device on which the batches are placed.
            train_loss (list, optional) -- List to which the mean loss of the
                epoch is appended.
    """

    model.train()

    # Mini batch iteration
    train_epoch_loss = 0
    train_batches = len(trainData)

    for img_chips, labels in trainData:

        img = img_chips.to(device)
        label = labels.to(device)

        optimizer.zero_grad()

        pred = model(img)

        loss = eval(criterion)(pred, label)
        train_epoch_loss += loss.item()

        loss.backward()
        optimizer.step()

    train_loss.append(train_epoch_loss / train_batches)
    print('Training loss: {:.4f}'.format(train_epoch_loss / train_batches))
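
`train` appends the mean epoch loss to `train_loss`, whose default value is a list created once, when the function is defined. The pure-Python sketch below uses a hypothetical `record_loss` function to show why passing an explicit list, as `epochIterater` does, keeps separate runs from sharing one history:

```python
def record_loss(epoch_loss, history=[]):
    """Append to a default list, as `train` does with `train_loss=[]`."""
    history.append(epoch_loss)
    return history


first = record_loss(0.9)
second = record_loss(0.7)                 # reuses the same default list
explicit = record_loss(0.5, history=[])   # caller-owned list starts empty

print(second)           # [0.9, 0.7]
print(explicit)         # [0.5]
print(first is second)  # True
```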

## Model Validation

In [31]:
def validate(valData, model, criterion, device, val_loss=None):
    """
        Evaluate the model on separate Landsat scenes.
        Params:
            valData (DataLoader object) -- Batches of image chips from PyTorch
                custom dataset (AquacultureData).
            model -- Choice of segmentation model.
            criterion (str) -- Expression that evaluates to the loss function
                applied to the validation samples.
            device (torch.device) -- Device on which the batches are placed.
            val_loss (list, optional) -- List to which the mean loss of the
                epoch is appended.
    """

    model.eval()

    # mini batch iteration
    eval_epoch_loss = 0

    # Gradients are not needed for validation.
    with torch.no_grad():
        for img_chips, labels in valData:

            img = img_chips.to(device)
            label = labels.to(device)

            pred = model(img)

            loss = eval(criterion)(pred, label)
            eval_epoch_loss += loss.item()

    print('validation loss: {}'.format(eval_epoch_loss / len(valData)))

    if val_loss is not None:
        val_loss.append(float(eval_epoch_loss / len(valData)))

### Epoch Iterator for model training and validation process

In [32]:
def epochIterater(trainData, valData, model, criterion, WorkingFolder,
                  initial_lr, num_epochs):
    """
    Epoch iteration for training and validation.

    Arguments:
    trainData (dataloader object): Batch grouped data to train the model.
    valData (dataloader object): Batch grouped data to evaluate the model.
    model (pytorch.nn.module object): initialized model.
    criterion (str): Expression that evaluates to the loss function.
    WorkingFolder (str): Directory where the TensorBoard logs are written.
    initial_lr (float): The initial learning rate.
    num_epochs (int): User-defined number of epochs to run the model.

    """

    train_loss = []
    val_loss = []

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    if device.type == "cuda":
        print('----------GPU available----------')
        model = model.to(device)
    else:
        print('----------No GPU available, using CPU instead----------')
        model = model

    writer = SummaryWriter(WorkingFolder)
    optimizer = optim.Adam(model.parameters(),
                           lr=initial_lr,
                           betas=(0.9, 0.999),
                           eps=1e-08,
                           weight_decay=5e-4,
                           amsgrad=False)

    scheduler = optim.lr_scheduler.StepLR(optimizer,
                                          step_size=3,
                                          gamma=0.90)

    # Record the start once so the reported duration covers all epochs.
    start_time = datetime.now()

    for t in range(num_epochs):
        print("Epoch [{}/{}]".format(t + 1, num_epochs))

        train(trainData, model, optimizer, criterion, device,
              train_loss=train_loss)
        validate(valData, model, criterion, device, val_loss=val_loss)

        scheduler.step()
        print("LR: {}".format(scheduler.get_last_lr()))

        writer.add_scalars("Loss",
                           {"train loss": train_loss[t],
                            "validation loss": val_loss[t]},
                           t + 1)

    writer.close()

    duration_in_sec = int((datetime.now() - start_time).total_seconds())
    duration_format = str(timedelta(seconds=duration_in_sec))
    print("--------------- Training finished in {} ---------------"\
          .format(duration_format))
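
The `StepLR` scheduler above multiplies the learning rate by `gamma` every `step_size` epochs. A closed-form sketch of the resulting schedule, using a hypothetical helper `step_lr` and the notebook's settings (`initial_lr = 0.15`, `step_size = 3`, `gamma = 0.90`, 10 epochs):

```python
def step_lr(initial_lr, epoch, step_size=3, gamma=0.90):
    """Learning rate in effect during the zero-based epoch `epoch`."""
    return initial_lr * gamma ** (epoch // step_size)


schedule = [round(step_lr(0.15, epoch), 6) for epoch in range(10)]
for epoch, lr in enumerate(schedule, start=1):
    print("epoch {:2d}: lr = {}".format(epoch, lr))
```

The rate stays at 0.15 for the first three epochs, then drops to 0.135, 0.1215 and finally 0.10935 in the tenth epoch.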

## Model Evaluation and Accuracy Metrics


In [33]:
class Evaluator(object):
    def __init__(self, num_class):
        self.num_class = num_class
        self.confusion_matrix = np.zeros((self.num_class,)*2)

    def Pixel_Accuracy(self):
        Acc = np.diag(self.confusion_matrix).sum() / self.confusion_matrix.sum()
        return Acc

    def Pixel_Accuracy_Class(self):
        Acc = np.diag(self.confusion_matrix) / self.confusion_matrix.sum(axis=1)
        Acc = np.nanmean(Acc)
        return Acc

    def Mean_Intersection_over_Union(self):
        MIoU = np.diag(self.confusion_matrix) / (
                    np.sum(self.confusion_matrix, axis=1) +
                    np.sum(self.confusion_matrix, axis=0) -
                    np.diag(self.confusion_matrix))
        MIoU = np.nanmean(MIoU)
        return MIoU

    def Frequency_Weighted_Intersection_over_Union(self):
        freq = np.sum(self.confusion_matrix, axis=1) /\
            np.sum(self.confusion_matrix)
        iu = np.diag(self.confusion_matrix) / (
                    np.sum(self.confusion_matrix, axis=1) +
                    np.sum(self.confusion_matrix, axis=0) -
                    np.diag(self.confusion_matrix)
                )

        FWIoU = (freq[freq > 0] * iu[freq > 0]).sum()
        return FWIoU

    def _generate_matrix(self, gt_image, pre_image):
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask]
        count = np.bincount(label, minlength=self.num_class**2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix

    def add_batch(self, gt_image, pre_image):
        assert gt_image.shape == pre_image.shape
        self.confusion_matrix += self._generate_matrix(gt_image, pre_image)

    def reset(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)

##==============================================================================

def do_accuracy_evaluation(model, dataloader, num_classes):
    evaluator = Evaluator(num_classes)

    model.eval()
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    with torch.no_grad():
        for data in dataloader:
            images, labels = data
            images = images.to(device)
            labels = labels.to(device)

            outputs = model(images)
            _, preds = torch.max(outputs.data, 1)

            # add batch to evaluator
            evaluator.add_batch(labels.cpu().numpy(), preds.cpu().numpy())

    # calculate evaluation metrics
    pixel_accuracy = evaluator.Pixel_Accuracy()
    mean_accuracy = evaluator.Pixel_Accuracy_Class()
    mean_IoU = evaluator.Mean_Intersection_over_Union()
    frequency_weighted_IoU = evaluator\
        .Frequency_Weighted_Intersection_over_Union()

    return pixel_accuracy, mean_accuracy, mean_IoU, frequency_weighted_IoU
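
The `Evaluator` metrics all derive from one confusion matrix whose rows are ground-truth classes and whose columns are predicted classes. The worked example below recomputes pixel accuracy and mean IoU for six pixels with plain Python lists; the function names are hypothetical and the notebook itself uses NumPy:

```python
def confusion_matrix(gt, pred, num_class):
    """Rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_class for _ in range(num_class)]
    for g, p in zip(gt, pred):
        if 0 <= g < num_class:  # skip labels outside the valid range
            matrix[g][p] += 1
    return matrix


def mean_iou(matrix):
    """Average of TP / (GT total + predicted total - TP) over classes."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        gt_total = sum(matrix[c])
        pred_total = sum(row[c] for row in matrix)
        union = gt_total + pred_total - tp
        if union > 0:  # skip classes absent from both gt and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious)


gt = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 0]
cm = confusion_matrix(gt, pred, num_class=2)
pixel_accuracy = (cm[0][0] + cm[1][1]) / len(gt)
miou = mean_iou(cm)
print(cm)                        # [[2, 1], [1, 2]]
print(round(pixel_accuracy, 4))  # 0.6667
print(miou)                      # 0.5
```

Four of six pixels are correct, yet each class has an IoU of 2 / 4, so mean IoU is lower than pixel accuracy.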

# Running Model through the pipeline

## Defining model parameters

In [35]:
src_dir = "/content/gdrive/MyDrive/adleo/assignment5/A5_resources"
dataset_name = "Global"
transform = ["hflip", "vflip", "rotate"]

n_classes = 2
in_channels = 6
filter_config = (32, 64, 128, 256, 512, 1024)
dropout_rate = 0.1

criterion = "BalancedTverskyFocalCELoss()"
WorkingFolder = "/content/gdrive/MyDrive/adleo/adleo_project_test"
initial_lr = 0.15
epochs = 10

## Create a `train_dataset` object
Use the `datasetloader` class to create the `train_dataset` object.



In [37]:
train_dataset = datasetloader(src_dir,
                                usage = "train",
                                dataset_name = dataset_name,
                                apply_normalization = True,
                                transform = transform)

  0%|          | 0/1188 [00:00<?, ?it/s]


TypeError: load_data() got an unexpected keyword argument 'apply_normalization'

Create a PyTorch `DataLoader` called `train_loader` that reads from `train_dataset`, shuffles the samples, and groups them into batches of tensors. Each batch is moved to the GPU, if available, inside the training loop.

In [38]:
train_loader = DataLoader(train_dataset,
                          batch_size = 16,
                          shuffle = True)

NameError: name 'train_dataset' is not defined
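
To show what such a loader yields, here is a minimal sketch that swaps in a random `TensorDataset` for `train_dataset`. The tensor shapes, the `demo_*` names and the sample count are assumptions chosen only for illustration; the 6 channels mirror `in_channels` above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 40 chips with 6 bands and 8 x 8 pixels each
images = torch.randn(40, 6, 8, 8)
labels = torch.randint(0, 2, (40, 8, 8))
demo_dataset = TensorDataset(images, labels)

demo_loader = DataLoader(demo_dataset, batch_size=16, shuffle=True)
demo_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

batch_sizes = []
for batch_images, batch_labels in demo_loader:
    # The loader returns CPU tensors; move each batch to the device here
    batch_images = batch_images.to(demo_device)
    batch_labels = batch_labels.to(demo_device)
    batch_sizes.append(batch_images.shape[0])

print(batch_sizes)
```

With 40 samples and `batch_size = 16` the last batch is smaller than the others unless `drop_last=True` is passed to `DataLoader`.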

## Create a Validation dataset
Create a `validation_dataset` object using the `datasetloader` class.

In [None]:
validation_dataset = datasetloader(src_dir,
                                   usage="validation",
                                   dataset_name=dataset_name,
                                   apply_normalization=False)

val_loader = DataLoader(validation_dataset,
                        batch_size = 1,
                        shuffle = False)

## Initialize the model

In [None]:
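# Instantiate the network. Note that this rebinds the name
# "deeplabv3plus2_model" from the model definition to the instance.
deeplabv3plus2_model = deeplabv3plus2_model()

# The following cells refer to the network as "model"
model = deeplabv3plus2_model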

## Fit the `deeplabv3plus2_model` using `epochIterater`

In [None]:
epochIterater(train_loader,
              val_loader,
              model,
              criterion,
              WorkingFolder,
              initial_lr,
              epochs)

## Save the model parameters

In [None]:
torch.save(model.state_dict(),
           os.path.join(Path(WorkingFolder), "trained_unet_final_state.pth"))
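
A saved `state_dict` holds only the weights, so reloading it requires building the same architecture first. The sketch below illustrates the round trip with a tiny hypothetical stand-in network (`demo_model`) and a temporary directory; it is not the notebook's DeepLabv3+ model or its working folder.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the trained network: 6 input bands, 2 classes
demo_model = nn.Sequential(nn.Conv2d(6, 2, kernel_size=1))

with tempfile.TemporaryDirectory() as tmp_dir:
    state_path = os.path.join(tmp_dir, "demo_final_state.pth")
    torch.save(demo_model.state_dict(), state_path)

    # Rebuild the same architecture, then load the saved weights into it
    restored = nn.Sequential(nn.Conv2d(6, 2, kernel_size=1))
    restored.load_state_dict(torch.load(state_path, map_location="cpu"))

restored.eval()
x = torch.randn(1, 6, 4, 4)
with torch.no_grad():
    same_output = torch.allclose(demo_model(x), restored(x))
print(same_output)
```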

## Evaluate model performance

In [None]:
model = deeplabv3plus2_model
# The number of classes must match the one the model was set up with
pixel_acc, mean_acc, mean_iou, fw_iou = do_accuracy_evaluation(
    model, val_loader, num_classes=n_classes)
print("Pixel Accuracy:", pixel_acc)
print("Mean Accuracy:", mean_acc)
print("Mean IoU:", mean_iou)
print("Frequency Weighted IoU:", fw_iou)

# Prediction using DeepLabv3+2 Model

In [39]:
def do_prediction(testData, model, overlap, device, save_dir):
    """
    Use a trained model to predict on unseen data.
    Arguments:
            testData (tuple) -- (data loader, metadata, tile), where the data
                                loader yields batches of image chips and
                                their indices from the PyTorch custom dataset.
            model (torch.nn.Module) -- trained model.
            overlap (int) -- amount of overlap between prediction chips, in
                             pixels. Must be greater than 0, because chips
                             are trimmed with "overlap:-overlap".
            device (str) -- Either "cpu" or "cuda".
            save_dir (str) -- Directory to save the prediction output.
    """

    # Create directories to save the predicted output
    save_dir_hard = Path(save_dir) / "HardScore"
    save_dir_soft = Path(save_dir) / "SoftScore"

    os.makedirs(save_dir_hard, exist_ok=True)
    os.makedirs(save_dir_soft, exist_ok=True)

    # Start inference on test data
    print("--------------------- Start Inference(Test) ---------------------")
    start = datetime.now()

    # Unpack the data loader, metadata, and tile information
    testData, meta, tile = testData

    # Define the output file names and metadata for the hard and soft scores
    name_prob = "prob_c{}_r{}".format(tile[0], tile[1])
    name_crisp = "crisp_c{}_r{}.rst".format(tile[0], tile[1])

    meta_hard = meta.copy()
    meta_hard.update({
        "dtype": "uint8",
        "count": 1,
    })

    meta_soft = meta.copy()
    meta_soft.update({
        "dtype": "float32",
        "count": 1,
    })

    model = model.to(device)

    # Put the model in evaluation mode
    model.eval()

    # Canvas (called "h_canvas") with the height, width and datatype from
    # "meta_hard" to hold the hard score values, initialized to zeros
    h_canvas = np.zeros((1, meta_hard["height"], meta_hard["width"]),
                        dtype=meta_hard["dtype"])

    canvas_score_ls = []


    # Loop over batches of image chips and indices.
    for img_chips, index_batch in testData:
        img = img_chips.to(device)  # size: B x in_channels x H x W

        # Forward pass through the model, without tracking gradients, to get
        # the predictions
        with torch.no_grad():
            pred = model(img)

        # Normalize the model output into class probabilities using softmax
        pred_prob = F.softmax(pred, dim=1)

        # Get the dimensions of the prediction
        batch, n_class, height, width = pred_prob.size()

        # Calculate the score width and score height based on the overlap
        # parameter
        score_width = (width // 2) - overlap
        score_height = (height // 2) - overlap

        # Loop over the batch and assign the predicted scores to the canvas
        for i in range(batch):

            # creating a new tuple index containing the coordinates, which makes
            # it easier to index into the "h_canvas" and arrays in the
            # "canvas_score_ls" later on in the code.
            index = (index_batch[0][i], index_batch[1][i])

            # Get the hard scores by taking the argmax of the prediction
            prediction_hard = pred_prob.max(dim=1)[1][
                :, overlap:-overlap, overlap:-overlap
            ].cpu().numpy()[i, :, :]

            # add a leading band dimension to the "prediction_hard" array and
            # convert its data type.
            prediction_hard = np.expand_dims(prediction_hard, axis=0)\
                .astype(meta_hard["dtype"])

            # Assign the "prediction_hard" values to the slice of "h_canvas"
            # centered on the current chip index, which updates the pixels of
            # the output raster covered by the current chip in the batch.
            h_canvas[
                :, index[0] - score_width : index[0] + score_width,
                index[1] - score_height : index[1] + score_height
            ] = prediction_hard


            for n in range(1, n_class):
                # Extract probability map for class n from predicted
                # probabilities tensor
                prediction_soft = pred_prob[:, n, :, :]\
                    .data[i][overlap:-overlap, overlap:-overlap]\
                    .cpu().numpy() * 100
                # Add an extra dimension to the probability map to match the
                # expected shape
                prediction_soft = np.expand_dims(prediction_soft, axis=0)\
                    .astype(meta_soft["dtype"])

                # Keep the canvas for class n at list index n so that the
                # write loop can look it up by class. Index 0 (background)
                # stays None.
                while len(canvas_score_ls) <= n:
                    canvas_score_ls.append(None)

                if canvas_score_ls[n] is None:
                    # Create a new canvas for class n and initialize it with
                    # zeros
                    canvas_score_ls[n] = np.zeros(
                        (1, meta_soft['height'], meta_soft['width']),
                        dtype=meta_soft['dtype']
                    )

                # Update the canvas for class n with the new probability map
                canvas_score_ls[n][
                    :, index[0] - score_width : index[0] + score_width,
                    index[1] - score_height : index[1] + score_height
                ] = prediction_soft

    # Write the hard classification results to an output raster.
    with rasterio.open(os.path.join(save_dir_hard, name_crisp),
                       "w", **meta_hard) as rstr:
        rstr.write(h_canvas)

    # Loop through each class (excluding the background class) and create a
    # new raster file for each class.
    for n in range(1, n_class):
        name_prob_updated = f"{name_prob}_Cat_{n}.tif"
        with rasterio.open(os.path.join(save_dir_soft, name_prob_updated),
                           "w", **meta_soft) as rstr:
            rstr.write(canvas_score_ls[n])

    # total_seconds() also counts whole days, which ".seconds" would drop
    duration_in_sec = int((datetime.now() - start).total_seconds())
    duration_format = str(timedelta(seconds=duration_in_sec))
    print("---------------- Inference finished in {} ----------------"
          .format(duration_format))
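
The stitching logic in `do_prediction` can be hard to follow inside the full function, so here is a minimal NumPy-only sketch of the same idea: turn logits into probabilities, trim `overlap` pixels from every chip edge, and write the remaining core onto a canvas centered on the chip index. The helpers `softmax` and `place_chip`, the chip size, the canvas size and the chip centers are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits, axis):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def place_chip(canvas, chip, center, overlap):
    """Write the core of `chip` (edges trimmed by `overlap`) onto `canvas`."""
    height, width = chip.shape
    half_h = height // 2 - overlap
    half_w = width // 2 - overlap
    # "height - overlap" (instead of "-overlap") also works for overlap == 0
    core = chip[overlap:height - overlap, overlap:width - overlap]
    row, col = center
    canvas[row - half_h:row + half_h, col - half_w:col + half_w] = core


rng = np.random.default_rng(0)
overlap, chip_size = 2, 8
hard_canvas = np.zeros((8, 12), dtype="uint8")
soft_canvas = np.zeros((8, 12), dtype="float32")

# Two neighbouring chips whose trimmed cores tile the canvas side by side
for center in [(4, 4), (4, 8)]:
    logits = rng.normal(size=(2, chip_size, chip_size))  # classes x H x W
    prob = softmax(logits, axis=0)
    place_chip(hard_canvas, prob.argmax(axis=0), center, overlap)  # hard score
    place_chip(soft_canvas, prob[1] * 100, center, overlap)        # soft score

print(hard_canvas)
print(soft_canvas.round(1))
```

Only the central part of each chip reaches the canvas, which is why a border of `overlap` pixels around the stitched area stays at zero in this toy example.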

## Define the Initial Parameters for the prediction

In [None]:
src_dir = "/content/gdrive/MyDrive/adleo/assignment5/A5_resources"
dataset_name = "Fine_tune_dataset"
transform = ["hflip", "vflip", "rotate"]

n_classes = 2
in_channels = 6
filter_config = (32, 64, 128, 256, 512, 1024)
dropout_rate = 0.15

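criterion = "BalancedTverskyFocalCELoss()"
WorkingFolder = "/content/gdrive/MyDrive/adleo/assignment5"
initial_lr = 0.01
# Create the optimizer after "initial_lr" so that it uses the value set here,
# and pass it the parameters of the DeepLabv3+ model defined above
optimizer = optim.Adam(model.parameters(), lr=initial_lr)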
epochs = 10

## Load the train_dataset

In [None]:
train_dataset = datasetloader(src_dir,
                                usage="train",
                                dataset_name=dataset_name,
                                apply_normalization=False,
                                transform=transform)

train_loader = DataLoader(train_dataset,
                          batch_size = 16,
                          shuffle = True)

## Load validation_dataset

In [None]:
validation_dataset = datasetloader(src_dir,
                                   usage="validation",
                                   dataset_name=dataset_name,
                                   apply_normalization=False)

val_loader = DataLoader(validation_dataset,
                        batch_size = 1,
                        shuffle = False)

In [None]:
def load_data_pred(usage, csv_name, patch_size, overlap, catalog_index):
    pred_dataset = datasetloader(src_dir,
                                 usage=usage,
                                 apply_normalization=False,
                                 csv_name=csv_name,
                                 patch_size=patch_size,
                                 overlap=overlap,
                                 catalog_index=catalog_index)

    data_loader = DataLoader(pred_dataset, batch_size=1, shuffle=False)
    meta = pred_dataset.meta
    tile = pred_dataset.tile

    return data_loader, meta, tile

## Make the prediction

In [None]:
# "csv_name", "patch_size", "overlap", "device" and "save_dir" must be defined
# before running this cell.
tile_count = len(pd.read_csv(os.path.join(src_dir, csv_name)))
for i in range(tile_count):
    pred_data = load_data_pred("inference", csv_name, patch_size, overlap, i)
    do_prediction(pred_data, model, overlap, device, save_dir)

## Plot the predicted image

In [None]:
# Define the image location, relative to WorkingFolder
img_src = "predictions/SoftScore/prob_c11_r60_Cat_1.tif"
rast_img_file = os.path.join(WorkingFolder, img_src)
# Load the first band of the image
with rasterio.open(rast_img_file) as src:
    pred_img1 = src.read(1)

plt.imshow(pred_img1, cmap="tab10")
plt.title("Predicted by DeepLabv3+")
plt.colorbar()
plt.show()

# References


Chen LC et al., (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), 801–818.