# Semantic Segmentation Workshop

## Agenda

1. Environment Setup
2. Overview of Semantic Segmentation
3. Labeling
4. Data
5. Modeling
6. Running on AML
7. Configuration Options
8. MLflow Parameter, Metrics, and Artifact Tracking
9. Cleanup
10. Q&A

## Environment Setup

The following environments have been tested for this workshop:

- Linux (Ubuntu 18.04)
- Azure Cloud Shell

NOTE: Running the setup script will replace the Azure ML CLI v1 extension with the public preview v2 extension.

### Linux Setup

Run the `setup.sh` script:

```bash
bash setup.sh
```

If you run into permission issues, you may have to run the script as the root user:

```bash
sudo bash setup.sh
```

### Azure Cloud Shell

If you cannot run the bash script on your environment, you can use the Azure Cloud Shell available in the Azure Portal.

![Azure Cloud Shell](assets/AzureCloudShell.png)

Copy and paste the contents of `setup.sh` into the terminal, and your environment should be set up.

## Labeling

### Types of Segmentation

![Semantic vs. Instance Segmentation](https://i.stack.imgur.com/MEB9F.png)

For semantic segmentation, we are only interested in labeling each pixel as one of the $C$ available classes. Instance segmentation additionally treats separate objects of the same class as distinct entities.

The highlighted labeled region is referred to as the "segmentation mask".

### Label Formats

The following are some examples of different label formats you may encounter for the segmentation domain.

1. Full Image Array (as an Image file .png, .jpg or in a tabular format .csv, .npy)
   1. RGB / Class Label Based
2. Compressed Format (such as Run-Length Encoding (RLE))
3. Polygon Vertices (list of $x$, $y$ coordinates $[x_1, y_1, x_2, y_2, ..., x_n, y_n]$)
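
To make the compressed format concrete, here is a minimal sketch of decoding a run-length encoded mask. RLE conventions differ between datasets, so the layout used here (0-indexed `(start, length)` pairs over the row-major flattened mask) and the `rle_decode` helper are assumptions for illustration only.

```python
import numpy as np


def rle_decode(runs, height, width):
    """Decode (start, length) runs into a binary (H, W) mask.

    Assumption: starts are 0-indexed positions in the row-major
    flattened mask. Real datasets may use 1-indexed or column-major runs.
    """
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in runs:
        flat[start:start + length] = 1
    return flat.reshape(height, width)


# Two runs: positions 1-2 and positions 6-8 of a 3 x 4 mask
mask = rle_decode([(1, 2), (6, 3)], height=3, width=4)
print(mask)
```

Whatever the source format, the goal is the same: end up with a full `(H, W)` array that can be used as a mask.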

### Labels for Modeling

The standard form we would like to transform our data into is the following:

- Input: Images with PyTorch shape convention of $(Ch, H, W) = \text{(Channels, Height, Width)}$
- Ground Truth: Masks with shape $(H, W) = \text{(Height, Width)}$.
  - Each entry should be in the range $[0, C]$ where $C = \text{Number of Classes}$
  - The entry $0$ is reserved for the "background" class (the absence of a class we are interested in)
- Model Output: Predicted masks with shape $(C + 1, H, W) = \text{(Classes, Height, Width)}$
  - The $+1$ in $C + 1$ is due to the background class
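
The shape conventions above can be checked with a small sketch. It uses random NumPy arrays as stand-ins for a real image, mask, and model output; the sizes and variable names are arbitrary choices for illustration, not values from the workshop codebase.

```python
import numpy as np

num_classes = 3          # C, not counting the background class
height, width = 4, 5

rng = np.random.default_rng(0)

# Input image: (Ch, H, W)
image = rng.random((3, height, width), dtype=np.float32)

# Ground truth mask: (H, W) with entries in [0, C]; 0 is background
mask = rng.integers(0, num_classes + 1, size=(height, width))

# Model output: one score map per class including background, (C + 1, H, W)
logits = rng.standard_normal((num_classes + 1, height, width))

# Taking the argmax over the class axis gives a predicted mask of shape (H, W)
predicted_mask = logits.argmax(axis=0)

print(image.shape, mask.shape, logits.shape, predicted_mask.shape)
```

The predicted mask has the same shape and value range as the ground truth, which is what lets us compare the two pixel by pixel.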


   


In [None]:
%load_ext autoreload
%autoreload 2

## Data

We will be using data from http://dronedataset.icg.tugraz.at to demonstrate the semantic segmentation codebase.

The masks are provided as RGB images. To utilize the masks as the ground truth, we will need to convert each pixel into a class label instead. 
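
As a rough idea of what that conversion involves, here is a minimal NumPy sketch. The `rgb_mask_to_labels` helper and the three-color palette are hypothetical and are not taken from the workshop codebase or from `class_dict.csv`.

```python
import numpy as np


def rgb_mask_to_labels(rgb_mask, class_colors):
    """Map an (H, W, 3) RGB mask to an (H, W) array of class indices.

    class_colors is a list of (r, g, b) tuples; the list position is the
    class label. Pixels matching no color are left as 0 (background).
    """
    labels = np.zeros(rgb_mask.shape[:2], dtype=np.int64)
    for class_index, color in enumerate(class_colors):
        # True where all three channels equal this class color
        matches = np.all(rgb_mask == np.array(color, dtype=rgb_mask.dtype), axis=-1)
        labels[matches] = class_index
    return labels


# Hypothetical palette: index 0 is the background color
class_colors = [(0, 0, 0), (128, 64, 128), (0, 102, 0)]
rgb_mask = np.array(
    [[[0, 0, 0], [128, 64, 128]],
     [[0, 102, 0], [128, 64, 128]]],
    dtype=np.uint8,
)
labels = rgb_mask_to_labels(rgb_mask, class_colors)
print(labels)
```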

In [None]:
from os.path import join
from src.datasets.mask_labels import MaskLabelsDataset

dataset = MaskLabelsDataset(
    join("workshop_data", "train_labels/labels.csv"), 
    mask_format="rgb", 
    class_dict_path=join("workshop_data", "class_dict.csv")
)
image, mask = dataset[0]

In [None]:
# (Height, Width, Ch)
# This will be later rearranged to (Ch, Height, Width) per PyTorch convention
image.shape, mask.shape

In [None]:
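import matplotlib.pyplot as plt
import numpy as np


def display_segmentation_item(image: np.ndarray, mask: np.ndarray):
    """Show the image, the mask, and the mask overlaid on the image."""
    fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(18, 12))
    axs[0][0].imshow(image)
    axs[0][1].imshow(mask)
    # Overlay: draw the image first, then the semi-transparent mask on top
    axs[1][0].imshow(image)
    axs[1][0].imshow(mask, alpha=0.35)
    
display_segmentation_item(image, mask)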

In [None]:
from config.augmentation import preprocessing, augmentation

In [None]:
# Preprocessing
# Deterministic transformations applied to every image before modeling
# Can be run once up front as a batch processing step
results = preprocessing(image=image, mask=mask)
preprocessed_image, preprocessed_mask = results["image"], results["mask"]
display_segmentation_item(preprocessed_image, preprocessed_mask)

In [None]:
# Augmentation
# Stochastic transformations that are re-sampled for each batch
# We are using albumentations for this
results = augmentation(image=preprocessed_image, mask=preprocessed_mask)
augmented_image, augmented_mask = results["image"], results["mask"]
display_segmentation_item(augmented_image, augmented_mask)
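
One detail worth spelling out for segmentation: any geometric augmentation has to be applied to the image and its mask together, otherwise the labels no longer line up with the pixels. The sketch below illustrates the idea with a plain NumPy horizontal flip. It is not how albumentations implements it, and `random_horizontal_flip` is a hypothetical helper.

```python
import numpy as np


def random_horizontal_flip(image, mask, rng, p=0.5):
    """Flip an (H, W, Ch) image and its (H, W) mask together with probability p."""
    if rng.random() < p:
        # Reverse the width axis of both arrays using the same random draw
        return image[:, ::-1].copy(), mask[:, ::-1].copy()
    return image, mask


rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 3).reshape(2, 3, 3)   # (H=2, W=3, Ch=3)
mask = np.array([[0, 1, 2], [2, 1, 0]])         # (H=2, W=3)

# p=1.0 forces the flip so the effect is visible
flipped_image, flipped_mask = random_horizontal_flip(image, mask, rng, p=1.0)
print(flipped_mask)
```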

## Metrics

Common metrics in the segmentation space are:

- Intersection over Union (IoU)
  - Intuition: How much does my prediction overlap with the ground truth?
- Average Precision and Mean Average Precision (AP, mAP)
- Standard Classification Metrics
  - Precision
  - Recall
  - F1-Score
  
### IoU

![IoU](https://pyimagesearch.com/wp-content/uploads/2016/09/iou_examples.png)

Properties to think about:
- Either the ground truth or prediction could encapsulate the other, even with low IoU scores
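
A minimal sketch of IoU for a pair of binary masks shows this property. The `iou` helper is written for illustration only; the example prediction fully contains the ground truth and still scores poorly.

```python
import numpy as np


def iou(prediction, target):
    """Intersection over Union of two boolean masks of the same shape."""
    prediction = np.asarray(prediction, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(prediction, target).sum()
    if union == 0:
        return float("nan")  # IoU is undefined when both masks are empty
    intersection = np.logical_and(prediction, target).sum()
    return float(intersection / union)


# Ground truth: a 2 x 2 square (4 pixels) inside a 10 x 10 image
target = np.zeros((10, 10), dtype=bool)
target[4:6, 4:6] = True

# Prediction: every pixel, so it fully encapsulates the ground truth
prediction = np.ones((10, 10), dtype=bool)

score = iou(prediction, target)
print(score)  # intersection = 4, union = 100
```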

### Average Precision

![Average Precision](https://kharshit.github.io/img/map_bboxes.png)

![Average Precision](https://kharshit.github.io/img/map_gt.png)

Typically you will see a metric like $AP@0.5$, which means Average Precision computed with an IoU threshold of 0.5.

The IoU threshold plays the same role as the decision threshold in image classification: a prediction counts as a true positive only when its IoU with the ground truth meets the threshold.
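
To see how the IoU threshold feeds into the metric, here is a simplified sketch of a non-interpolated Average Precision. It assumes each prediction has already been matched to a distinct ground truth object and comes with a confidence score and an IoU. Benchmarks such as PASCAL VOC and COCO use interpolated variants, so treat `average_precision` as an illustration rather than a reference implementation.

```python
def average_precision(scored_ious, num_ground_truth, iou_threshold=0.5):
    """Non-interpolated AP from (confidence, IoU) pairs.

    Assumption: each IoU is measured against a distinct ground truth object,
    so no two predictions claim the same object.
    """
    # Rank predictions from most to least confident
    ranked = sorted(scored_ious, key=lambda item: item[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, iou_value) in enumerate(ranked, start=1):
        if iou_value >= iou_threshold:
            true_positives += 1
            # Precision among the top `rank` predictions
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth if num_ground_truth else 0.0


predictions = [(0.9, 0.82), (0.8, 0.47), (0.7, 0.55), (0.3, 0.10)]
ap_50 = average_precision(predictions, num_ground_truth=3, iou_threshold=0.5)
print(ap_50)
```

Raising `iou_threshold` turns borderline matches into false positives, which is why $AP@0.75$ is never higher than $AP@0.5$ for the same predictions.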

### Standard Classification Metrics

Because segmentation models classify at the pixel level, the standard metrics of precision, recall, and F1-score can be computed per class over pixels.

- Precision - Of the pixels predicted as a given class, what percentage actually belong to that class?
- Recall - Of the pixels that actually belong to a given class, what percentage were predicted as that class?
- F1-Score - The harmonic mean of precision and recall
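
A minimal per-class, per-pixel sketch of these metrics is shown below. The `pixel_precision_recall_f1` helper is written for illustration, and returning 0.0 when a denominator is empty is one convention among several.

```python
import numpy as np


def pixel_precision_recall_f1(prediction, target, class_index):
    """Per-pixel precision, recall, and F1 for one class of two (H, W) masks."""
    predicted = prediction == class_index
    actual = target == class_index
    true_positives = np.logical_and(predicted, actual).sum()
    precision = true_positives / predicted.sum() if predicted.sum() else 0.0
    recall = true_positives / actual.sum() if actual.sum() else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)


target = np.array([[0, 1, 1],
                   [0, 1, 0]])
prediction = np.array([[0, 1, 0],
                       [1, 1, 0]])

# Class 1: 3 predicted pixels, 3 actual pixels, 2 of them overlap
precision, recall, f1 = pixel_precision_recall_f1(prediction, target, class_index=1)
print(precision, recall, f1)
```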


## Modeling

Unlike the more familiar image classification scenario, in the object detection and segmentation domain our predictions have the shape of an image.

A convolutional layer is therefore a natural choice for the final prediction layer, and several architectures are fully convolutional networks.
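
To illustrate why a convolutional final layer fits, here is a minimal NumPy sketch of a 1x1 convolution used as a prediction head: it maps a feature map of shape `(F, H, W)` to per-class scores of shape `(C + 1, H, W)` without changing the spatial layout. The sizes and random weights are arbitrary stand-ins, not values from the workshop models.

```python
import numpy as np

rng = np.random.default_rng(0)

num_features, height, width = 8, 4, 5
num_outputs = 4  # C + 1 with C = 3 classes plus background

features = rng.standard_normal((num_features, height, width))   # (F, H, W)
weights = rng.standard_normal((num_outputs, num_features))      # 1x1 kernel: (C + 1, F)
bias = np.zeros(num_outputs)

# A 1x1 convolution is a per-pixel linear map over the channel axis
logits = np.einsum("cf,fhw->chw", weights, features) + bias[:, None, None]

print(logits.shape)  # one score map per class, same height and width as the features
```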

We leverage two popular models available from `torchvision`
- FCN-ResNet50
- DeepLabV3



### DeepLabV3 Architecture

![DeepLabV3 Architecture](https://production-media.paperswithcode.com/models/Screen_Shot_2021-02-21_at_10.34.37_AM_qcoqzIU.png)

#### Atrous Convolution

![Atrous Convolution](https://miro.medium.com/max/395/1*SVkgHoFoiMZkjy54zM_SUw.gif)

Atrous (dilated) convolution is integrated as part of good ol' `Conv2d` in PyTorch, parametrized by `dilation`. With `dilation=1` there is no additional spacing between kernel elements, and it is just the convolution you would expect.
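
To see what the dilation does without any framework, here is a minimal 1D sketch in NumPy. `dilated_conv1d` is a hypothetical helper that computes a valid-mode cross-correlation (which is what deep learning libraries call convolution) with the kernel taps spaced `dilation` elements apart.

```python
import numpy as np


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with kernel taps `dilation` apart."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input elements covered by one kernel placement
    span = dilation * (len(kernel) - 1) + 1
    output_length = len(signal) - span + 1
    return np.array([
        np.dot(signal[i:i + span:dilation], kernel)
        for i in range(output_length)
    ])


signal = np.arange(8)   # 0, 1, ..., 7
kernel = [1, 1, 1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # taps at i, i+1, i+2
atrous = dilated_conv1d(signal, kernel, dilation=2)   # taps at i, i+2, i+4
print(dense)
print(atrous)
```

With the same three weights, the dilated kernel covers five input elements instead of three, which is how atrous convolution enlarges the receptive field without adding parameters.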

[All kinds of Convolution Animations](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md)

In [None]:
from src.models.fcn_resnet50 import FCNResNet50

model = FCNResNet50(24, pretrained=True, is_feature_extracting=False)
model

In [None]:
import torch
from src.models.deeplabv3 import DeepLabV3

model = DeepLabV3(24, pretrained=True, is_feature_extracting=False)
model

## Running on AML

To run the training pipeline we utilize the Azure ML CLI to execute `preprocess.py` and `train.py` via `train_pipeline.yml`.

```bash
az ml job create --file train_pipeline.yml
```

`preprocess.py` is specific to the Semantic Drone Dataset we are using for this workshop.

`train.py` is a general training script that expects a labels file in a supported format.

## Configuration

Once the AML Training Pipeline has been properly connected to a data source of interest, there are two files that may require configuration:

1. Augmentation Configuration
   1. Located at `config/augmentation.py`
2. Experimentation Parameters (Model, Loss, Hyperparameters)
   1. Located at `train_pipeline.yml`

## MLflow Parameter Tracking

## Experimentation Time

As a lightweight exercise, let's try running some other experiments.

### Exercise 1

Modify some of the modeling parameters in `train_pipeline.yml`

### Exercise 2

Modify the augmentation pipeline. Add an augmentation that is missing here and that you believe would be valuable.

[Albumentations Augmentations Docs](https://albumentations.ai/docs/api_reference/augmentations/transforms/)





## Thank you everyone for participating in the dry run!

### Any Questions?

## Cleanup

To delete all the resources created today, we can just delete the resource group.

You can do so by running the cleanup script:

```bash
bash cleanup.sh
```