## Introduction

The first step in conducting object-based image analysis (OBIA) is to create some segments, i.e. to partition the set of image pixels into spatially continuous, disjoint and homogeneous regions. Once segments are available, they can be classified and regrouped (i.e. merged/split) in an iterative manner to derive objects. Compared to the traditional pixel-level analyses, this modelling of reality in an object-based manner offers several advantages for downstream tasks:  
* avoidance of salt-and-pepper effects in classifications due to spatial contiguity
* facilitation of tasks such as object counting
* possibility to leverage various feature descriptors for object characterization
* reduction of the number of observations, which speeds up subsequent processing

Segmentation methods can be divided into several groups. In this notebook, we will explore some representatives of those groups. For further information, please find the references below.

* [Blaschke, T., Burnett, C., & Pekkarinen, A. (2004).](https://doi.org/10.1007/978-1-4020-2560-0_12) Image Segmentation Methods for Object-based Analysis and Classification  
* [Hossain, M. D., & Chen, D. (2019).](https://linkinghub.elsevier.com/retrieve/pii/S0924271619300425) Segmentation for Object-Based Image Analysis (OBIA): A review of algorithms and challenges from remote sensing perspective  
* [Kotaridis, I., & Lazaridou, M. (2021).](https://linkinghub.elsevier.com/retrieve/pii/S0924271621000265) Remote sensing image segmentation advances: A meta-analysis

## Setup

In [None]:
# to run this on google colab
if 'google.colab' in str(get_ipython()):
    import os
    repo_dir = "obia_tutorials"
    marker_file = os.path.join(repo_dir, ".setup_done")    
    if not os.path.exists(marker_file):
        !git clone https://github.com/fkroeber/obia_tutorials.git
        !pip install -r obia_tutorials/requirements.txt
        with open(marker_file, 'w') as f:
            f.write("Setup completed")
    if not os.getcwd().endswith(repo_dir):
        os.chdir(repo_dir)

In [None]:
# imports
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import skimage
import xarray as xr

from matplotlib import rcParams
from skimage import util
from skimage.segmentation import mark_boundaries
from tqdm import tqdm

# set general figsize
rcParams['figure.figsize'] = (7.5, 7.5)

## Exemplary data

For demonstration purposes, we will use a 4-band R,G,B,NIR image. We will also create two single-band greyscale images out of that. The first one is just the average reflectance across the R,G,B bands. The second one represents the Normalized Difference Vegetation Index (NDVI).
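
For reference, a common definition of the NDVI contrasts the near-infrared and red reflectances (the symbols $\rho_{NIR}$ and $\rho_{R}$ are our own notation here and are not taken from the data set):
\begin{equation}
\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{R}}{\rho_{NIR} + \rho_{R}}
\end{equation}
Values fall in the range $[-1, 1]$, with dense green vegetation typically yielding high positive values.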

<span style="color:orange">Task 1: Write the code for the NDVI calculation. The band order is R,G,B,NIR.</span>

In [None]:
# read image
img_path = "sample_data/ortho_subset_I.tif"
ds = xr.open_dataset(img_path)

# extract arrays
arr = np.transpose(np.array(ds.to_array()[0]), (1,2,0))
rgb = arr.astype(np.uint8)[...,[0,1,2]]
grey = arr.mean(axis=2)

# your code goes here
# ndvi = ...

# visualise rgb, average reflectance & ndvi
fig, axs = plt.subplots(ncols=3, figsize=(15,5), constrained_layout=True)
axs[0].imshow(rgb)
axs[1].imshow(grey, cmap="gray")

# your code goes here
# axs[2].imshow(...)

axs[0].set_title("original rgb")
axs[1].set_title("average reflectance")
axs[2].set_title("ndvi")

for ax in axs:
    ax.set_axis_off();

## Methods

### A. Threshold-based segmentation

#### A.1 Manual, supervised thresholding

Let's start with a very basic method to create regions: the thresholding approach. By picking one or multiple thresholds, one can discriminate regions based on a pixel-level evaluation of intensities. Let's try this method to differentiate between vegetated and unvegetated regions using an NDVI threshold.

In [None]:
# define a set of ndvi thresholds
thres = [0, 0.1, 0.2, 0.3]

# define boundary plotting settings
contour_kwargs = dict(color=(1,0,0), outline_color=(1,0,0), mode="thick")

# plot the figure
fig, axs = plt.subplots(
    ncols=len(thres), 
    nrows=2, 
    figsize=(5*len(thres),10), 
    constrained_layout=True
)

for i, t in enumerate(thres):
    mask = ndvi > t
    axs[0,i].imshow(mask, cmap="gray", interpolation="nearest")
    axs[0,i].set_title(f"ndvi > {t}")
    axs[0,i].axis("off")
    axs[1,i].imshow(mark_boundaries(rgb, mask, **contour_kwargs))
    axs[1,i].axis("off")

<span style="color:orange">Task 2: Take a look at the results. Which NDVI threshold would you choose to differentiate between vegetated and unvegetated regions?</span>

<i>Answer goes here</i>

#### A.2 Automated, unsupervised thresholding

Picking the right threshold is not trivial. To this end, there is a suite of unsupervised methods to automate the search for a suitable threshold. To try all of the methods that are implemented in scikit-image, one can simply call `skimage.filters.try_all_threshold()`. This will return a figure comparing the binarised images obtained with the thresholds of the respective unsupervised approaches. Most of these methods operate on the histogram of the grey-level intensities to set the threshold in an unsupervised manner.

In [None]:
skimage.filters.try_all_threshold(ndvi, figsize=(5,10));

<span style="color:orange">Task 3: Does the convenience of automatically setting values in unsupervised mode also come at a certain cost? Think about the general disadvantages compared to setting thresholds manually.</span>

<i>Answer goes here</i>


To understand the operating mechanism of the unsupervised methods more closely, we will focus on two popular ones in the following - Otsu and Li thresholding.

##### A.2.1 Otsu thresholding

Let's take a look at the results of Otsu thresholding first. The Otsu threshold separates the pixels into two classes, foreground and background. It is chosen as the value that minimizes the weighted sum of the intra-class variances of the two classes, where the intra-class variance measures the spread of the data within each class. For two classes, this is equivalent to maximizing the inter-class (between-class) variance.

The between-class variance $\sigma_b^2(t)$ is given by:
\begin{equation}
\sigma_b^2(t) = \omega_1(t) \omega_2(t) \left[ \mu_1(t) - \mu_2(t) \right]^2
\end{equation}
where...  
$\omega_1(t)$ is the probability of class 1 (the class of all pixels with intensity levels 0 to $t$),   
$\omega_2(t)$ is the probability of class 2 (the class of all pixels with intensity levels $t+1$ to $L-1$),   
$\mu_1(t)$ is the mean intensity of class 1,  
$\mu_2(t)$ is the mean intensity of class 2.
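
To make the formula more tangible, here is a minimal brute-force sketch that evaluates $\sigma_b^2(t)$ for every candidate split of a histogram and picks the maximum. The function name `otsu_threshold_sketch`, the number of bins and the synthetic bimodal sample are our own assumptions for illustration. In practice you would rely on `skimage.filters.threshold_otsu`.

```python
import numpy as np

def otsu_threshold_sketch(values, nbins=256):
    """Brute-force Otsu: return the bin centre maximising the between-class variance."""
    hist, bin_edges = np.histogram(np.ravel(values), bins=nbins)
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    p = hist / hist.sum()  # probability of each intensity level

    # class probabilities for every candidate split (class 1 = bins 0..t)
    omega_1 = np.cumsum(p)[:-1]
    omega_2 = 1.0 - omega_1

    # class means for every candidate split
    cum_mean = np.cumsum(p * centers)
    mu_1 = cum_mean[:-1] / omega_1
    mu_2 = (cum_mean[-1] - cum_mean[:-1]) / omega_2

    # between-class variance as given in the equation above
    sigma_b2 = omega_1 * omega_2 * (mu_1 - mu_2) ** 2
    return centers[np.argmax(sigma_b2)]

# synthetic, clearly bimodal sample (assumption, not the notebook data)
rng = np.random.default_rng(0)
toy_values = np.concatenate([
    rng.normal(0.1, 0.05, 5000),
    rng.normal(0.6, 0.05, 5000),
])
toy_threshold = otsu_threshold_sketch(toy_values)
print(f"sketch threshold: {toy_threshold:.2f}")
```

For this toy sample the threshold falls into the gap between the two modes, which is the behaviour one would expect from the criterion.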

<span style="color:orange">Task 4: Plot the results of the Otsu thresholding. Also add a histogram and visualise the Otsu threshold in there.</span>

In [None]:
# get Otsu threshold 
thres = skimage.filters.threshold_otsu(ndvi)

# plot results 
fig, axs = plt.subplots(
    ncols=3, 
    nrows=1, 
    figsize=(15, 5), 
    constrained_layout=True
)

# code goes here
# axs[0] -> histogram
# axs[1] -> binary version
# axs[2] -> rgb + contours

Given the predominance of the contrast between the spectral properties of vegetated and unvegetated areas in the given scene, the threshold derived by Otsu's method aligns with the interval of suitable thresholds determined in the manual approach above.

However, looking at the histogram of NDVI values for the given case, the distribution of intensities appears to be trimodal rather than bimodal. Luckily, the Otsu method can also be used for multilevel thresholding. In this case, threshold values are chosen to maximize the total sum of pairwise variances between the thresholded graylevel classes.

In [None]:
# get multiple Otsu thresholds 
thres = skimage.filters.threshold_multiotsu(ndvi, classes=3)

# plot results 
fig, axs = plt.subplots(
    ncols=3, 
    nrows=1, 
    figsize=(15, 5), 
    constrained_layout=True
)
axs[0].hist(ndvi.ravel(), bins=100, color="grey")
axs[0].set_title("ndvi histogram")
for t,c in zip(thres, ["red", "orange"]):
    axs[0].axvline(t, color=c, linestyle="--")
    axs[1].imshow(ndvi > t, cmap="gray", alpha=0.5, interpolation="nearest")
axs[1].set_title(f"ndvi thresholds: {', '.join([str(round(x, 2)) for x in thres])}")
axs[1].axis("off")
for t,c in zip(thres, [(1,0,0), (1,0.5,0)]):
    axs[2].imshow(mark_boundaries(rgb, ndvi > t, color=c, outline_color=c, mode="thick"), alpha=0.5)
axs[2].axis("off");

With the exception of the water body, the multilevel Otsu method seems to be capable of differentiating between the three classes "non-vegetated", "meadow" and "trees".

##### A.2.2 Li thresholding

Let's compare this result to that of another unsupervised thresholding approach, Li thresholding. It is based on the notion of cross-entropy minimisation. So let's have a brief recap on the measure of cross-entropy at this point.

>Cross-entropy is a measure from the field of information theory to calculate the difference between two probability distributions. Cross-entropy is effectively calculating the total entropy between the distributions. It can be understood as the number of bits required to represent or transmit an average event from one distribution to another distribution. The cross entropy between two probability distributions $p$ and $q$ over a given set is defined as:

\begin{equation}
H(p, q) = - \sum_{i} p(i) \log q(i)
\end{equation}

> where $p(i)$ is the true probability distribution (reference distribution), and $q(i)$ is the estimated probability distribution. This calculation is for discrete probability distributions, although a similar calculation can be used for continuous probability distributions using the integral across the events instead of the sum.
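
As a small worked example of the definition above, the sketch below computes $H(p, q)$ for two hand-picked discrete distributions. The helper name `cross_entropy`, the example distributions and the choice of base-2 logarithms (so that results read as bits) are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits for two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # terms with p(i) = 0 contribute nothing to the sum
    return -np.sum(p[support] * np.log2(q[support]))

# hypothetical reference distribution p and uniform estimate q
p = np.array([0.5, 0.25, 0.25])
q = np.array([1 / 3, 1 / 3, 1 / 3])

entropy_p = cross_entropy(p, p)  # H(p, p) is the entropy of p itself
h_pq = cross_entropy(p, q)
print(f"H(p, p) = {entropy_p:.3f} bits, H(p, q) = {h_pq:.3f} bits")
```

The cross-entropy is never smaller than the entropy of the reference distribution, so a poorer estimate $q$ shows up as a larger value of $H(p, q)$.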

Li thresholding aims at minimizing the cross entropy between the image and its segmented version. There is an iterative algorithm that quickly finds this optimum threshold by using the slope of the cross-entropy. While Otsu's method minimizes the mean square distance between the image and its segmented version, Li's method minimizes the cross entropy. Li and Lee argued that the cross entropy is the preferred measure for positive and additive distributions.
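
The iterative search can be pictured with the following rough sketch. It uses the fixed-point update commonly attributed to Li and Tam (1998), which is our understanding of the method rather than a copy of any library code. The function name, the starting guess (the global mean), the tolerance and the synthetic data are all assumptions, and the sketch only works for strictly positive intensities. `skimage.filters.threshold_li` remains the implementation to use in practice.

```python
import numpy as np

def li_threshold_sketch(values, tol=1e-3, max_iter=100):
    """Iterate t <- (mu_back - mu_fore) / (log(mu_back) - log(mu_fore))."""
    values = np.asarray(values, dtype=float).ravel()
    t = values.mean()  # starting guess (assumption)
    for _ in range(max_iter):
        mu_fore = values[values > t].mean()   # mean of pixels above the threshold
        mu_back = values[values <= t].mean()  # mean of pixels at or below it
        t_new = (mu_back - mu_fore) / (np.log(mu_back) - np.log(mu_fore))
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

# synthetic, strictly positive bimodal sample (assumption, not the notebook data)
rng = np.random.default_rng(0)
toy_values = np.clip(
    np.concatenate([rng.normal(50, 10, 5000), rng.normal(150, 10, 5000)]),
    1,
    None,
)
toy_li = li_threshold_sketch(toy_values)
print(f"sketch threshold: {toy_li:.1f}")
```

Note that NDVI values can be negative, so an implementation for real data needs to handle non-positive intensities, for example by shifting the value range first.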

<span style="color:orange">Task 5: Calculate and plot the results of the Li thresholding (analogous to the single-threshold Otsu thresholding from above). Describe your results.</span>

In [None]:
# code goes here

#### A.3 Postprocessing & Result exploration

A strong deficit of all these threshold-based methods is that they work on a pixel-level and do not consider any spatial context such as the intensity levels of a pixel's neighborhood. Thus, they don't satisfy the spatial contiguity requirement and - in a strict sense - shouldn't be considered segmentation methods.

One option to impose some constraints on spatial consistency for these threshold-based methods is to apply filter-based or morphological postprocessing to reduce noisy artifacts. Below you can see the effect of applying three different kinds of such operators.

1. median filter
2. morphological closing
3. area closing  

To briefly explain the basis of each of them:

1. The median filter is a non-linear filter that returns the local median within a given footprint (usually a square neighborhood such as a 3x3 kernel).
 
2. Morphological methods originate from the domain of mathematical morphology where images are scanned with a structuring element (i.e. a neighborhood, window, pattern), and some sort of non-linear comparison is carried out between the structuring element and the local image content. Two basic morphological operations are dilations and erosions. Dilations set a pixel at (i,j) to the maximum over all pixels in the neighborhood centered at (i,j). This effectively enlarges bright regions and shrinks dark regions. Erosions set a pixel at (i,j) to the minimum over all pixels in the neighborhood centered at (i,j). Correspondingly, erosions shrink bright regions and enlarge dark regions. The morphological closing of an image is defined as a dilation followed by an erosion. It removes small dark spots (i.e. “pepper”) and connects nearby bright areas.

3. Area closings are similar to morphological closings. However, they do not use a fixed structuring element but rely on a deformable one. The area of this deformable footprint corresponds to a defined area threshold, such that the operator effectively removes all dark holes with an areal extent smaller than this threshold. Technically, this operator is based on a hierarchical representation of the image, the so-called max-tree.
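
To see dilation, erosion and closing at work on a tiny example, consider the NumPy-only sketch below. The helper functions `dilate`, `erode` and `closing_sketch`, the edge padding and the toy image are our own assumptions for illustration. The notebook itself uses the corresponding skimage operators.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dilate(img, size=3):
    """Set each pixel to the maximum within a size x size neighborhood."""
    padded = np.pad(img, size // 2, mode="edge")
    return sliding_window_view(padded, (size, size)).max(axis=(2, 3))

def erode(img, size=3):
    """Set each pixel to the minimum within a size x size neighborhood."""
    padded = np.pad(img, size // 2, mode="edge")
    return sliding_window_view(padded, (size, size)).min(axis=(2, 3))

def closing_sketch(img, size=3):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, size), size)

# toy binary image: bright background, one dark "pepper" pixel, one large dark block
toy = np.ones((9, 9), dtype=np.uint8)
toy[2, 2] = 0      # isolated dark pixel
toy[5:, 5:] = 0    # 4 x 4 dark block

closed = closing_sketch(toy)
print(closed)
```

The isolated dark pixel disappears, whereas the dark block, being larger than the 3 x 3 neighborhood, survives unchanged.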

To demonstrate the effect of these operators, we will re-use the segmentation result provided by the Otsu method. For reasons of comparability, we will use the same structuring elements for all methods below. Specifically, we will use a kernel of size 10 x 10 pixels, which amounts to an area of 1m<sup>2</sup> for the given scene.

In [None]:
# get initial thresholding result
thres = skimage.filters.threshold_otsu(ndvi)
binary = ndvi > thres

# apply different filters using the same structuring element
se = skimage.morphology.square(10)
seg_I = skimage.filters.median(binary, se)
seg_II = skimage.morphology.binary_closing(binary, se)
seg_III = skimage.morphology.area_closing(binary, area_threshold=se.size)

Let's take a first look at the results.

In [None]:
# compile final segmentation results 
seg_results = {
    "initial result": binary,
    "median filter": seg_I,
    "morphological closing": seg_II,
    "area closing": seg_III
}

# plot results
fig, axs = plt.subplots(
    ncols=4, 
    nrows=2, 
    figsize=(20,10), 
    constrained_layout=True
)

for i, (name, res) in enumerate(seg_results.items()):
    axs[0,i].imshow(res, cmap="gray", interpolation="nearest")
    axs[0,i].set_title(name)
    axs[1,i].imshow(mark_boundaries(rgb, res, **contour_kwargs))

for ax in axs.flat:
    ax.axis("off");

Note that it is always advisable to explore your results not just by plotting the contours of the segments but also by plotting the segments themselves. Using the average rgb value for each segment can be an adequate representation. However, in cases of many segments, this calculation can be time-consuming, which is why an arbitrary colormap may be a possible substitute in these situations. Calculating the number of segments may further support the interpretation of the results.

<span style="color:orange">Task 6a: Calculate the number of segments for each result.</span>  
<span style="color:orange">Task 6b: Try to add a visualization of the segments (not just their contours) below.</span>

In [None]:
# code goes here

# define function to create a random colormap
# note: this is just a placeholder and does not really return a random colormap
def get_random_cmap(num_colors, base_cmap="Spectral"):
    return plt.get_cmap(base_cmap)

<span style="color:orange">Task 7: Describe and interpret the results from above (possibly supplemented by your own visualisations & figures from Task 6a & 6b). How does the postprocessing affect the final results and are there differences between the individual methods?</span>  

<i>Answer goes here</i>


Finally, it remains important to remember that the workflow of thresholding followed by postprocessing will always remain a two-step procedure, in which spatial context is only taken into account in the latter part. This inevitably limits the results that can be obtained in the end. From the examples above, this is evident when looking at the ill-formed region combining water, beach and meadow. Since this region represents a large connected component in the initial thresholding result, it remains a single region in the post-processed results as well. For most practical applications, having such mixed landcover types within one region is clearly suboptimal.

### B. Edge-based segmentation

Moving to the core set of segmentation methods, a distinction is commonly made between edge-based and region-based segmentation methods. The former identify edges based on variations of image intensities, followed by postprocessing such as edge linkage to derive closed boundaries whose interiors define the segments. For the edge detection itself, there are some traditional gradient-based image filters available such as Roberts, Prewitt and Sobel.
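
To get a feeling for what these filters compute, the sketch below applies the textbook kernels to a toy image with a single vertical step edge. The helper names and the toy image are assumptions for illustration, and the kernels are left unnormalised, so the magnitudes are not directly comparable to the output of the skimage filters.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def correlate_valid(img, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    windows = sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def gradient_magnitude(img, kernel_a, kernel_b):
    """Combine the responses of two directional kernels."""
    return np.hypot(correlate_valid(img, kernel_a), correlate_valid(img, kernel_b))

# textbook kernels (unnormalised)
roberts = (np.array([[1, 0], [0, -1]]), np.array([[0, 1], [-1, 0]]))
prewitt_h = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]])
sobel_h = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])

# toy image: dark left half, bright right half
step = np.zeros((6, 6))
step[:, 3:] = 1

roberts_mag = gradient_magnitude(step, *roberts)
prewitt_mag = gradient_magnitude(step, prewitt_h, prewitt_h.T)
sobel_mag = gradient_magnitude(step, sobel_h, sobel_h.T)

print("roberts:", roberts_mag[0])
print("prewitt:", prewitt_mag[0])
print("sobel:  ", sobel_mag[0])
```

On this toy edge, the 2x2 Roberts kernels respond in a single column, while the 3x3 Prewitt and Sobel kernels spread the response over two columns, since they average over a larger neighborhood.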

In [None]:
# apply different edge filters
gradients = {
    "roberts": skimage.filters.roberts(grey),
    "prewitt": skimage.filters.prewitt(grey),
    "sobel": skimage.filters.sobel(grey)
}

fig, axs = plt.subplots(
    ncols=4, 
    nrows=1, 
    figsize=(20,5), 
    constrained_layout=True
)

axs[0].imshow(grey, cmap="gray")
axs[0].set_title("greyscale image")

for i, (name, res) in enumerate(gradients.items()):
    v_max = np.quantile(res, 0.98)
    axs[i+1].imshow(res, cmap="gray", vmax=v_max, interpolation="nearest")
    axs[i+1].set_title(name)

for ax in axs.flat:
    ax.axis("off")

As you can see from the example above, the results are not too different, although Prewitt and Sobel usually obtain a less noisy result. This is a consequence of them operating on a 3x3 kernel instead of a 2x2 neighborhood that is used by Roberts. The next step is to binarise the gradient information by thresholding it. With this step, a set of discrete edges is derived. Below, we will test some quantiles of the gradients as thresholds.     

In [None]:
# threshold gradients to derive edges
edges = {}
for k,v in gradients.items():
    edges[k] = {}
    for thresh in [0.5, 0.7, 0.9]:
        edges[k][thresh] = v > np.quantile(v, thresh)

fig, axs = plt.subplots(
    ncols=3, 
    nrows=3, 
    figsize=(15,15), 
    constrained_layout=True
)

for i, (name, res) in enumerate(edges.items()):
    for j, (thresh, edge) in enumerate(res.items()):
        axs[j,i].imshow(edge, cmap="gray", interpolation="nearest")
        axs[j,i].set_title(f"{name} ({thresh:.0%}-quantile)")
        axs[j,i].axis("off")

To obtain the segments from these binary edge masks, we will use the label method below. This method labels connected regions of an integer array. Two pixels are considered connected when they have the same value and are neighbors (in a 1- or 2-connected sense, referring to the maximum number of orthogonal hops).
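
The difference between the two connectivity settings is easiest to see on a toy array. The sketch below is an illustration on made-up data, assuming the default behaviour of `skimage.measure.label` in which pixels with value 0 are treated as background.

```python
import numpy as np
from skimage.measure import label

# toy array: three foreground pixels that only touch diagonally
toy = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

labels_c1 = label(toy, connectivity=1)  # only orthogonal neighbors count
labels_c2 = label(toy, connectivity=2)  # diagonal neighbors count as well

print("1-connected regions:", labels_c1.max())
print("2-connected regions:", labels_c2.max())
```

With 1-connectivity each diagonal pixel forms its own region, whereas 2-connectivity merges them into a single one.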

<span style="color:orange">Task 8: Find a way to avoid that edges themselves will be recognized as their own regions when using the label function as shown below.</span>  

In [None]:
fig, axs = plt.subplots(
    ncols=3, 
    nrows=3, 
    figsize=(15,15), 
    constrained_layout=True
)

for i, (name, res) in enumerate(edges.items()):
    for j, (thresh, edge) in enumerate(res.items()):
        labels = skimage.measure.label(edge.astype("int"), background=-1, connectivity=2)
        # code goes here
        # labels = <something based on the initial labels>
        n_segments = len(np.unique(labels))
        axs[j,i].set_title(f"{name} ({thresh:.0%}-quantile), n segments: {n_segments}")
        axs[j,i].imshow(labels, cmap=get_random_cmap(len(np.unique(labels))), interpolation="nearest")
        axs[j,i].axis("off");

<span style="color:orange">Task 9: Describe the segmentation results. Which problems can you identify and why are they occurring?</span>     

<i>Answer goes here</i>

Edge detection is only the core of edge-based methods and does not directly lead to well-defined boundaries without gaps. Therefore, usually further postprocessing steps such as adaptive thresholding, morphological thinning, edge point linking and edge point trimming are performed to refine the initial edge detection results.

An edge detector that includes such processing steps in an initial, rudimentary form is the Canny detector, which followed the already established methods of Roberts, Prewitt and Sobel in 1986. It consists of a pipeline of Gaussian filtering, gradient magnitude and angle calculation, non-maximum suppression (NMS), double thresholding and edge hysteresis tracking.

<span style="color:orange">Task 10: How are the different processing steps of Canny contributing to the aim of deriving well-defined boundaries? Do some background search on NMS and hysteresis tracking.</span>

<i>Answer goes here</i>

In [None]:
fig, axs = plt.subplots(
    ncols=4, 
    nrows=2, 
    figsize=(20,10), 
    constrained_layout=True
)

axs[0,0].imshow(grey, cmap="gray")
axs[0,0].set_title("greyscale image")

for i, sigma in enumerate([0.5, 3, 10]):
    canny_edge = skimage.feature.canny(grey, sigma=sigma)
    axs[0,i+1].imshow(canny_edge, cmap="gray", interpolation="nearest")
    axs[0,i+1].set_title(f"sigma = {sigma}")
    labels = skimage.measure.label(canny_edge.astype("int"), background=-1, connectivity=2)
    labels = skimage.morphology.closing((labels * (-canny_edge.astype("int") + 1)))
    n_segments = len(np.unique(labels))
    axs[1,i+1].set_title(f"n segments: {n_segments}")
    axs[1,i+1].imshow(labels, cmap=get_random_cmap(len(np.unique(labels))), interpolation="nearest")

for ax in axs.flat:
    ax.axis("off");

Contrary to expectations, the result tends to be worse than before. The gradient information is binarised directly into edges, but with a lot of noise. In this case, this is partly due to the high resolution of the VHR input orthophoto used. Even if Canny usually delivers more useful results in other cases, it is clear that this is not sufficient for satisfactory segmentation results. As further post-processing methods (e.g. more complex edge linkage) are not natively supported by skimage, further methods from the core area of edge-based methods can unfortunately not be demonstrated here.

However, the already calculated gradients offer a good transition to a widely used segmentation method that combines this part of the edge-based approaches with the basic region-based segmentation methods - the so-called watershed segmentation.  

### C. Region-based segmentation 

Region-based methods are based on a homogeneity criterion assuming that pixels within a segment have similar intensity values. The corresponding suite of methods can be subdivided into:
* bottom-up region growing algorithms, which start from seeds and successively add pixels to regions
* top-down region splitting algorithms, which perform iterative splitting based on an initial undersegmentation

Top-down segmentation methods (such as Quadtree, for example) are not implemented in skimage. However, we can take a look at the watershed segmentation as a representative of the bottom-up segmentation approaches. It interprets an image with its intensity values as a topographical surface and floods the surface starting from its local minima or from seed points that are specified otherwise. As the flooding process continues, the rising water from different minima will eventually meet, forming watershed lines that separate the different catchment basins representing the segments. Since pixels with high gradient values are likely to represent object boundaries and thus watershed lines, gradient images are usually taken as the input constituting the topographical surface. As the algorithm incorporates elements of region-growing, it can effectively be seen as a hybrid method between edge-based and region-based methods.
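
The flooding idea can be illustrated on a tiny synthetic surface with two valleys separated by a ridge. The surface and the two hand-placed seed points are assumptions for illustration only.

```python
import numpy as np
from skimage.segmentation import watershed

# toy "topography": two valleys (value 1) separated by a ridge (value 5)
profile = np.array([3, 1, 3, 5, 3, 1, 3])
surface = np.tile(profile, (3, 1))

# one hand-placed seed in each valley (assumption)
markers = np.zeros_like(surface)
markers[1, 1] = 1
markers[1, 5] = 2

toy_labels = watershed(surface, markers)
print(toy_labels)
```

Each valley is flooded from its own seed, so the columns left of the ridge end up in segment 1 and those right of it in segment 2. The ridge column itself is assigned to one of the two neighboring basins.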

Let's take a look at the results of the watershed segmentation, using the gradient images from above. If you don't specify any seeds explicitly, skimage will take the seed information from the minima calculated on the gradient image.

In [None]:
# get gradient image
grad = gradients["sobel"]

# get seed points, equivalent to default used in watershed below
local_minima = skimage.morphology.local_minima(grad)
seeds = skimage.measure.label(local_minima)

# apply watershed segmentation
seg_water = skimage.segmentation.watershed(grad, seeds, connectivity=2)
n_segments = len(np.unique(seg_water))

fig, axs = plt.subplots(
    ncols=4, 
    nrows=1, 
    figsize=(20,5), 
    constrained_layout=True
)

axs[0].imshow(rgb)
axs[0].set_title("original image")
axs[1].imshow(grad, cmap="gray")
axs[1].set_title("gradient image")
axs[2].imshow(seeds > 0, cmap="gray", interpolation="nearest")
axs[2].set_title("seed points")
axs[3].imshow(seg_water, cmap=get_random_cmap(n_segments), interpolation="nearest")
axs[3].set_title(f"segmentation result, n segments: {n_segments}")

for ax in axs.flat:
    ax.axis("off");

What you're observing above is a common issue with watershed segmentation: oversegmentation. Thousands of local minima with the given noisy gradient image as an input result in a corresponding amount of segments. In the given case, the average segment only comprises about 11 pixels. There are several possibilities to mitigate this oversegmentation:

* the application of a smoothing filter as a preprocessing step prior to the gradient calculation
* a different/explicit choice of image markers as enforced seeds 
* a clustering of segments (region-merging) to be applied after the initial watershed segmentation

<span style="color:orange">Task 11: Try to implement one or several of the described options above to reduce the degree of oversegmentation.</span>

In [None]:
# code goes here

### D. Other segmentation approaches

<span style="color:orange">Task 12: Pick one of the algorithms described below and apply it to a use case of your choice (e.g. aim to segment the cars or trees in the image). Read the documentation of the algorithm to understand its working mechanism and compare the results to the ones obtained by thresholding and watershed segmentation.</span>


#### D.1 Cluster-based segmentation

Cluster-based methods provided by skimage encompass Simple Linear Iterative Clustering ([SLIC](https://scikit-image.org/docs/dev/api/skimage.segmentation.html#skimage.segmentation.slic)) for superpixel generation as well as [quickshift](https://scikit-image.org/docs/dev/api/skimage.segmentation.html#skimage.segmentation.quickshift) clustering. You can also leverage the [KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) method from the sklearn package to construct a similar clustering model considering the RGB value + (x,y) pixel positions.
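
A rough sketch of the KMeans idea on a synthetic two-colour image could look as follows. The toy image, the variable names and the hypothetical `spatial_weight` factor (which balances colour against position) are assumptions for illustration and would need tuning for real imagery.

```python
import numpy as np
from sklearn.cluster import KMeans

# synthetic image: reddish left half, bluish right half (assumption)
h, w = 20, 20
toy_rgb = np.zeros((h, w, 3), dtype=float)
toy_rgb[:, : w // 2] = (200, 30, 30)
toy_rgb[:, w // 2 :] = (30, 30, 200)

# feature matrix: one row per pixel with (R, G, B, x, y)
yy, xx = np.mgrid[0:h, 0:w]
spatial_weight = 1.0  # hypothetical factor balancing colour vs. position
features = np.column_stack([
    toy_rgb.reshape(-1, 3),
    spatial_weight * xx.ravel(),
    spatial_weight * yy.ravel(),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
toy_segments = kmeans.fit_predict(features).reshape(h, w)
print(np.unique(toy_segments[:, : w // 2]), np.unique(toy_segments[:, w // 2 :]))
```

Keep in mind that plain KMeans does not enforce spatial contiguity. Adding the pixel coordinates as features only encourages compact clusters, so a connected-component labelling step may still be needed afterwards.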

#### D.2 Graph-based segmentation

The image is treated as a graph where each pixel is a node, and edges connect neighboring pixels with weights representing the similarity (e.g., based on the color). Among the pre-implemented graph-based methods, skimage offers the [Felzenszwalb](https://scikit-image.org/docs/dev/api/skimage.segmentation.html#skimage.segmentation.felzenszwalb) algorithm. Other graph-based processing options are possible via Region Adjacency Graphs ([RAG](https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_rag_mean_color.html)). Also, the [random walker](https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_random_walker_segmentation.html#random-walker-segmentation) segmentation should be considered a graph-based segmentation.

#### D.3 Active contour segmentation

Active contour segmentation, also known as snakes, is a technique used in image processing to delineate object boundaries. It involves initializing a curve (contour) around the object of interest and iteratively adjusting this curve to minimize an energy functional. The energy functional typically includes terms that attract the contour towards edges (high image gradients) and terms that impose smoothness constraints. The process continues in an iterative manner until convergence. Within skimage, both the original [active contour](https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.active_contour) model and the more recent [Chan-Vese](https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.chan_vese) algorithm are implemented.

#### D.4 Neural-network based segmentation

Neural network (NN)-based segmentation is a large field of its own. It is important to understand that the term "segmentation", as used so far, is used in a more nuanced manner in the deep learning domain. In this domain's terminology, semantic segmentation, instance segmentation and panoptic segmentation all fall under the umbrella of segmentation approaches. Semantic segmentation, however, actually denotes pixel-level labelling (also known as pixel-level classification in the remote sensing domain). Instance and panoptic segmentation, on the other hand, are specifically designed to delineate spatially contiguous regions in a given scene.

A common characteristic of these NN-based models is that they essentially represent lengthy chains of non-linear functions, which consider image intensity information along with spatial context information to arrive at the segmentation result. Most of the NN models leverage a huge set of annotated training images to learn how to segment images in a supervised, data-driven manner. At the inference stage, however, the model may then be applied in an unsupervised manner without the need to tweak any parameters.

One approach that gained a lot of attention soon after its release in 2023 is Segment Anything (SAM). This model allows instance segmentation to be performed based on no or only a few markers (zero-/few-shot learning). The model itself builds on a vision transformer - a particular type of neural network architecture - which has been trained with an enormous set of input images across several domains (e.g. medical imaging, natural images, etc.). This broad basis of training images gives rise to the notion of SAM as a so-called "foundation model", which is expected to generalise well, even to imagery from previously unseen domains. Remote sensing imagery actually represents one type of imagery that SAM wasn't originally tailored for.

<i>Due to the model's complexity, library dependencies and GPU requirements, the SAM model application is not demonstrated within this notebook. However, if you want to try it out, the easiest way to get started is to use the [segment-geospatial](https://samgeo.gishub.org/installation/) package. Give it a try and explore its functionalities for the given orthophoto. Note: It is recommended to have a GPU with at least 8 GB of GPU memory to run this model.</i>

## Conclusion & Further thoughts

The current notebook presented a variety of segmentation algorithms structured by their methodological domain (i.e. threshold-based, watershed-based, graph-based, etc.). This will (hopefully) help in understanding the basics of segmentation methods and their differences. From a practical application perspective, the following considerations are often relevant:
* What are the characteristics of the segmentation results of the respective algorithm? 
* What parameterization options does the algorithm offer (e.g. number of regions, shape constraints, etc.)?
* Which input can the algorithm process (e.g. single vs. multi-channel images, original image vs. pre-processed version)?
* How performant and scalable is the algorithm?

In addition to these considerations, the significance of the input data (Which input bands? Which indices?) should always be taken into account as the basis for segmentation. Finally, it should be remembered that OBIA is more than just segmentation. There is no perfect initial segmentation. Rather, the idea is to have an iterative cycle of segmentation & classification in order to arrive at final results.