<a href="https://colab.research.google.com/github/harshjoshi23/Object_segmentation_TF/blob/main/Segmentation_model_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Imports and setup

In [None]:
# !pip install segmentation-models-pytorch

import segmentation_models_pytorch as smp
from segmentation_models_pytorch.encoders import get_preprocessing_fn


In [None]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [None]:
# Installation options (uncomment one):
# !pip install segmentation-models-pytorch                                 # PyPI release
# !pip install git+https://github.com/qubvel/segmentation_models.pytorch  # latest source from GitHub

1. **Create a segmentation model with SMP**:

In [None]:
# model = smp.Unet(                   # the architecture is U-Net
#     encoder_name="resnet34",        # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
#     encoder_weights="imagenet",     # use `imagenet` pre-trained weights for encoder initialization
#     in_channels=1,                  # model input channels (1 for gray-scale images, 3 for RGB, etc.)
#     classes=3,                      # model output channels (number of classes in your dataset)
# )

In [None]:
model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet", in_channels=1, classes=3)

2. **Configure data preprocessing**:
All encoders come with pretrained weights.

Preparing your data the same way as during weight pre-training may give better results (a higher metric score and faster convergence). <br>
It is **not necessary** if you train the whole model rather than only the decoder.

In [None]:
preprocess_input = get_preprocessing_fn('resnet34', pretrained='imagenet')  # use the same encoder name as the model
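For intuition, ImageNet-style preprocessing usually amounts to scaling pixels to [0, 1] and then normalizing each channel with the ImageNet mean and standard deviation. The sketch below reproduces that arithmetic in NumPy. `normalize_imagenet` is a hypothetical helper written for illustration, not SMP's own function, and the statistics are the commonly used ImageNet values, assumed here rather than read from the library.

```python
import numpy as np

# Commonly used ImageNet channel statistics (assumed values, RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize_imagenet(image):
    """Hypothetical helper: normalize an (H, W, 3) uint8 RGB image."""
    scaled = image.astype(np.float32) / 255.0      # scale pixels to [0, 1]
    return (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalization

# Random stand-in for a real RGB image.
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
normalized = normalize_imagenet(image)
print(normalized.shape)  # (64, 64, 3)
```

Note that these statistics describe 3-channel RGB input. For a single-channel model, as created above with `in_channels=1`, you would likely need statistics computed on your own data instead.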

Now we can train the model with any framework!



The reference notes below are adapted from the SMP GitHub README. <br>

## Architectures
- **Unet**
- **Unet++**
- **MAnet**
- **Linknet**
- **FPN**
- **PSPNet**
- **PAN**
- **DeepLabV3**
- **DeepLabV3+**

## Encoders
The following is a list of encoder families supported in SMP. Pick a family, then choose a specific encoder and its pre-trained weights (the `encoder_name` and `encoder_weights` parameters).

- **ResNet**
- **ResNeXt**
- **ResNeSt**
- **Res2Ne(X)t**
- **RegNet(x/y)**
- **GERNet**
- **SE-Net**
- **SK-ResNe(X)t**
- **DenseNet**
- **Inception**
- **EfficientNet**
- **MobileNet**
- **DPN**
- **VGG**
- **Mix Vision Transformer**
- **MobileOne**

*Note: the `ssl` and `swsl` weights come from semi-supervised and weakly-supervised learning on ImageNet.*

### Timm Encoders

- PyTorch Image Models (a.k.a. timm) provides many pretrained models and an interface that allows using them as encoders in SMP. However, not all models are supported:
  - Not all transformer models have `features_only` functionality implemented, which is required for encoders.
  - Some models have inappropriate strides.
- **Total number of supported encoders: 549**

## Models API
- `model.encoder` - pretrained backbone to extract features of different spatial resolution
- `model.decoder` - depends on model's architecture (Unet/Linknet/PSPNet/FPN)
- `model.segmentation_head` - last block to produce the required number of mask channels (includes optional upsampling and activation)
- `model.classification_head` - optional block which creates classification head on top of the encoder
- `model.forward(x)` - sequentially pass `x` through model's encoder, decoder, and segmentation head (and classification head if specified)

## Input Channels
The `in_channels` parameter lets you create models that process tensors with an arbitrary number of channels. If you use pretrained weights from ImageNet:
- Weights of the first convolution will be reused.
- For the 1-channel case, the new weight is the sum of the first convolution layer's weights over its input channels.
- For other cases, channels will be populated with weights like `new_weight[:, i] = pretrained_weight[:, i % 3]` and then scaled with `new_weight * 3 / new_in_channels`.
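The rules above can be sketched in NumPy as follows. `adapt_first_conv` is a hypothetical helper that mirrors the described arithmetic on a random stand-in for the pretrained weights; it is an illustration, not the library's actual code.

```python
import numpy as np

def adapt_first_conv(pretrained_weight, new_in_channels):
    """Hypothetical helper: adapt (out, 3, kH, kW) conv weights to new_in_channels."""
    out_ch, old_in, k_h, k_w = pretrained_weight.shape
    if new_in_channels == 1:
        # 1-channel case: sum the weights over the input-channel axis.
        return pretrained_weight.sum(axis=1, keepdims=True)
    new_weight = np.empty((out_ch, new_in_channels, k_h, k_w), dtype=pretrained_weight.dtype)
    for i in range(new_in_channels):
        # Cycle through the pretrained RGB channels.
        new_weight[:, i] = pretrained_weight[:, i % old_in]
    # Rescale so the overall activation magnitude stays comparable.
    return new_weight * old_in / new_in_channels

pretrained = np.random.randn(64, 3, 7, 7)  # stand-in for a ResNet first conv
gray = adapt_first_conv(pretrained, 1)
five = adapt_first_conv(pretrained, 5)
print(gray.shape)  # (64, 1, 7, 7)
print(five.shape)  # (64, 5, 7, 7)
```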





In [None]:
model = smp.FPN('resnet34', in_channels=1)
mask = model(torch.ones([1, 1, 64, 64]))

## Auxiliary Classification Output
All models support `aux_params`, which defaults to `None`. <br>If `aux_params` is `None`, no auxiliary classification output is created. Otherwise, the model produces both a mask and a label output of shape `(N, C)`, where `N` is the batch size and `C` is the number of labels. The classification head consists of `GlobalPooling -> Dropout (optional) -> Linear -> Activation (optional)` layers, configurable by `aux_params` as follows:


In [None]:
# aux_params=dict(
#     pooling='avg',             # one of 'avg', 'max'
#     dropout=0.5,               # dropout ratio, default is None
#     activation='sigmoid',      # activation function, default is None
#     classes=4,                 # define number of output labels
# )

In [None]:
aux_params = dict(pooling='avg', dropout=0.5, activation='sigmoid', classes=4)

In [None]:
model = smp.Unet('resnet34', classes=4, aux_params=aux_params)
# mask, label = model(x)  # raises NameError here because x is not defined yet

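In the next cell, `x` is defined as a random tensor with a batch size of 1, 3 channels (e.g., an RGB image), and a size of 256x256 pixels.

Adjust the dimensions of `x` to match the input requirements of your model (for U-Net with the default depth of 5, height and width should be divisible by 32).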


In [None]:
import torch
import segmentation_models_pytorch as smp

# Define auxiliary parameters
aux_params = dict(pooling='avg', dropout=0.5, activation='sigmoid', classes=4)


# Creating the model
model = smp.Unet('resnet34', classes=4, aux_params=aux_params)

# Define a placeholder input tensor 'x' with random values
x = torch.randn(1, 3, 256, 256)  # (batch size, channels, height, width)

# Get the output
mask, label = model(x)

print(mask.shape)  # expected: (1, num_classes, height, width)
print(label.shape)  # expected: (1, num_aux_classes)


torch.Size([1, 4, 256, 256])
torch.Size([1, 4])


### Meaning of `mask` and `label`:

- **Mask**:
  - The mask is the output of the segmentation head of the model.
  - It represents the segmented output, which is typically a pixel-wise classification where each pixel is assigned a class label.
  - For example, in a medical image segmentation task, the mask might label different tissues or organs.

- **Label**:
  - The label is the output of the auxiliary classification head, if `aux_params` is provided.
  - It gives a global classification for the entire input image, which could be used for tasks like identifying the presence of a certain condition in a medical image.
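To turn a mask of per-class scores into one class label per pixel, a common approach is to take the argmax over the class dimension. The sketch below assumes the classes are mutually exclusive and uses a random NumPy array as a stand-in for the model's mask output.

```python
import numpy as np

# Stand-in for the model's mask output: (batch, classes, height, width).
mask = np.random.randn(1, 4, 256, 256)

# Pick the highest-scoring class for every pixel.
class_map = mask.argmax(axis=1)

print(class_map.shape)  # (1, 256, 256)
```

Each entry of `class_map` is an integer class index between 0 and 3. For multi-label tasks, where a pixel can belong to several classes, thresholding each channel separately would be the more usual choice.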

### What the Model is Doing:
- The model performs image segmentation, producing a mask that segments the input image into different classes.
- Additionally, if auxiliary parameters are provided, it also performs a global classification, producing a label.
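As a rough picture of how a classification head of the form `GlobalPooling -> Linear -> Activation` maps encoder features to a label of shape `(N, C)`, here is a toy NumPy version. The feature shape, the random weights, and the function name `classification_head` are assumptions made for illustration, and dropout is omitted because it is inactive at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

def classification_head(features, weight, bias):
    """Toy head: global average pooling -> linear -> sigmoid."""
    pooled = features.mean(axis=(2, 3))      # (N, channels)
    logits = pooled @ weight + bias          # (N, num_labels)
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid, values in (0, 1)

features = rng.standard_normal((2, 512, 8, 8))  # assumed deepest encoder features
weight = rng.standard_normal((512, 4)) * 0.01   # random stand-in for learned weights
bias = np.zeros(4)

label = classification_head(features, weight, bias)
print(label.shape)  # (2, 4)
```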



**The `encoder_depth` parameter:** <br>
`encoder_depth` specifies the number of downsampling stages in the encoder, so a smaller depth gives a lighter model.
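As a rough illustration of what the encoder depth changes, the sketch below assumes each encoder stage halves the spatial resolution and lists the feature-map size after every stage. `feature_sizes` is a hypothetical helper, not part of SMP.

```python
def feature_sizes(input_size, encoder_depth):
    """Spatial size after each downsampling stage, assuming each stage halves it."""
    return [input_size // (2 ** stage) for stage in range(1, encoder_depth + 1)]

print(feature_sizes(256, 5))  # [128, 64, 32, 16, 8]
print(feature_sizes(256, 4))  # [128, 64, 32, 16]
```

With a depth of 4, the deepest (and typically widest) stage is dropped, which is why the model gets lighter.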

**Code with the default `encoder_depth` (5):**

In [None]:
# Define auxiliary parameters
aux_params = dict(
    pooling='avg',             # one of 'avg', 'max'
    dropout=0.5,               # dropout ratio, default is None
    activation='sigmoid',      # activation function, default is None
    classes=4,                 # define number of output labels
)

# Create the model with default encoder depth
model = smp.Unet('resnet34', classes=4, aux_params=aux_params)

# Define input tensor 'x' with random values
x = torch.randn(1, 3, 256, 256)  # example input (batch size, channels, height, width)

# Get the output
mask, label = model(x)

# Print the shapes of the outputs
print(mask.shape)  # Should be (1, num_classes, height, width)
print(label.shape)  # Should be (1, num_aux_classes)

torch.Size([1, 4, 256, 256])
torch.Size([1, 4])


**Adjusting encoder_depth with proper decoder_channels:**

In [None]:
# Define auxiliary parameters
aux_params = dict(
    pooling='avg',             # one of 'avg', 'max'
    dropout=0.5,               # dropout ratio, default is None
    activation='sigmoid',      # activation function, default is None
    classes=4,                 # define number of output labels
)

# Define the correct decoder_channels for encoder_depth=4
decoder_channels = [256, 128, 64, 32]

# Create the model with custom encoder depth and decoder channels
model = smp.Unet('resnet34', encoder_depth=4, decoder_channels=decoder_channels, classes=4, aux_params=aux_params)

# Define input tensor 'x' with random values
x = torch.randn(1, 3, 256, 256)  # example input (batch size, channels, height, width)

# Get the output
mask, label = model(x)

# Print the shapes of the outputs
print(mask.shape)  # Should be (1, num_classes, height, width)
print(label.shape)  # Should be (1, num_aux_classes)

torch.Size([1, 4, 256, 256])
torch.Size([1, 4])


**What each part of the `encoder_depth=4` example does:**

- **Auxiliary Parameters (`aux_params`)**:
  - Configures the auxiliary classification output.

- **`encoder_depth=4`**:
  - Uses 4 downsampling stages in the encoder instead of the default 5.

- **`decoder_channels=[256, 128, 64, 32]`**:
  - Provides one channel count per decoder block. The list length must equal `encoder_depth`, so 4 values are given here.

- **Input Tensor `x`**:
  - A random tensor simulating an input image.

- **Output (`mask`, `label`)**:
  - The segmentation mask and classification label, respectively.
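One simple way to keep `decoder_channels` in sync with `encoder_depth` is to truncate a default list to the chosen depth. The sketch below is a hypothetical helper (`decoder_channels_for` is not an SMP function), and it assumes the U-Net default of `(256, 128, 64, 32, 16)` for a depth of 5.

```python
DEFAULT_DECODER_CHANNELS = (256, 128, 64, 32, 16)  # assumed U-Net default for depth 5

def decoder_channels_for(encoder_depth, channels=DEFAULT_DECODER_CHANNELS):
    """Hypothetical helper: keep one channel count per encoder stage."""
    if not 1 <= encoder_depth <= len(channels):
        raise ValueError(f"encoder_depth must be between 1 and {len(channels)}")
    return list(channels[:encoder_depth])

print(decoder_channels_for(4))  # [256, 128, 64, 32]
print(decoder_channels_for(3))  # [256, 128, 64]
```

The result can then be passed as the `decoder_channels` argument alongside the matching `encoder_depth`.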