SLIC algorithm

The SLIC algorithm is essentially a {\it k-means} algorithm that partitions a color image into a desired number of regular, compact superpixels with low computational overhead. More precisely, it clusters pixels locally in the combined 5D space formed by the color values ($(L, a, b)$ in the CIELAB color space) and the image plane coordinates ($(x, y)$ pixel positions). The CIELAB color space is chosen because it is perceptually uniform for small color distances.

The SLIC algorithm takes as input a desired number of approximately equally-sized superpixels $K$ and begins by sampling $K$ regularly spaced cluster centers $C_k = [L_k, a_k, b_k, x_k, y_k]^T$ at regular grid intervals $S$ (depending on $K$ and the number $N$ of pixels in the image: $S = \sqrt{N/K}$). In practice, for roughly equally sized superpixels there would be a superpixel center at every grid interval $S$. Since the spatial extent of any superpixel is approximately $S^2$ (the approximate area of a superpixel), it is assumed that pixels that are associated with this cluster center lie within a $2S \times 2S$ area around the superpixel center on the $xy$ plane. This becomes the search area for the pixels nearest to each cluster center.
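
The grid initialization described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name \texttt{init\_centers}, the rounding of $S$ to an integer, and the half-step offset of the first grid point are assumptions made for the example, and the input is assumed to be an image already converted to CIELAB.

```python
import numpy as np

def init_centers(lab, K):
    """Sample roughly K cluster centers [L, a, b, x, y] on a regular grid.

    `lab` is an (h, w, 3) array assumed to hold CIELAB values.
    The grid step is S = sqrt(N / K), rounded to an integer (an assumption
    of this sketch), so the actual number of centers may differ from K.
    """
    h, w = lab.shape[:2]
    S = max(int(round(np.sqrt(h * w / K))), 1)

    # Grid positions, offset by half a step so centers sit inside the cells.
    ys = np.arange(S // 2, h, S)
    xs = np.arange(S // 2, w, S)
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    gy, gx = gy.ravel(), gx.ravel()

    # One row per center: color at the grid position, then x and y.
    centers = np.column_stack([lab[gy, gx], gx, gy]).astype(float)
    return centers, S
```

For a $100 \times 100$ image and $K = 100$, this gives $S = 10$ and a $10 \times 10$ grid of centers.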

Instead of directly using the Euclidean distance in the 5D space, a `perceptual closeness' measure $D_s$ that accounts for the superpixel size is introduced to control the compactness of superpixels:
\begin{equation}\label{eq:slic-closeness}
D_s ({\mathbf x}_i,C_k) = d_{lab}({\mathbf x}_i,C_k) + \frac{m}{S} \cdot d_{xy} ({\mathbf x}_i,C_k),
\end{equation}
defined as the weighted sum of the color distance $d_{lab}$ and the image plane distance $d_{xy}$ (normalized by the grid interval~$S$) between a pixel ${\mathbf x}_i=[L_i, a_i, b_i, x_i, y_i]^T$ and a cluster center $C_k$:
\begin{eqnarray}\label{eq:distances}
d_{lab} ({\mathbf x}_i,C_k) & = & \sqrt{(L_k - L_i)^2 + (a_k - a_i)^2 + (b_k - b_i)^2} \\
d_{xy} ({\mathbf x}_i,C_k) & = & \sqrt{(x_k - x_i)^2 + (y_k - y_i)^2} .
\end{eqnarray}
Here, the parameter $m$ controls the compactness of the superpixels: the greater the value of $m$, the more spatial proximity is emphasized and the more compact the clusters.
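
As an illustration, the measure $D_s$ can be evaluated for a single pixel and cluster center as in the sketch below. The function name \texttt{slic\_distance} and the representation of pixels and centers as 5-element sequences $[L, a, b, x, y]$ are assumptions of this example.

```python
import numpy as np

def slic_distance(pixel, center, m, S):
    """Closeness D_s between a pixel and a cluster center.

    Both `pixel` and `center` are 5-element sequences [L, a, b, x, y].
    `m` is the compactness weight and `S` the grid interval.
    """
    pixel = np.asarray(pixel, dtype=float)
    center = np.asarray(center, dtype=float)
    d_lab = np.linalg.norm(pixel[:3] - center[:3])  # color distance
    d_xy = np.linalg.norm(pixel[3:] - center[3:])   # image plane distance
    return d_lab + (m / S) * d_xy
```

For instance, with $m = 10$ and $S = 10$, a pixel at color distance $5$ and plane distance $5$ from a center has $D_s = 5 + 1 \cdot 5 = 10$; doubling $m$ raises it to $15$, so spatially distant pixels are penalized more strongly.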

This distance is used to cluster together perceptually close pixels in an iterative manner.
Namely, each pixel ${\mathbf x}_i$ is associated with the nearest cluster center $C_k$ ({\it w.r.t.} the distance $D_s$) among those whose search area contains this pixel. After all the pixels have been associated with their nearest cluster center, new centers are computed as the barycenters of all the pixels belonging to the updated clusters. This process is iterated until convergence to produce approximately equally-sized superpixels.
The distance $D_s$, however, while easy to compute, does not take into account the image values along the path between two pixels and thus ignores connectivity: a pixel belonging to an object distinct from that of the cluster center but with similar spectral values can still reach a low $D_s$ and be considered `perceptually close', even when it is separated from the center by intermediate distinct structures. A post-processing of the output clusters is therefore necessary to enforce connectivity.
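
The assignment and update steps can be sketched as below. This is a simplified illustration under stated assumptions: the function name \texttt{slic\_iterate}, the iteration cap \texttt{n\_iter}, and the stopping tolerance \texttt{tol} are hypothetical choices for the example, the search window is clipped to the image borders, and the connectivity post-processing is deliberately left out.

```python
import numpy as np

def slic_iterate(lab, centers, S, m, n_iter=10, tol=1e-3):
    """Alternate local assignment and center update (no connectivity step).

    `lab` is an (h, w, 3) CIELAB image, `centers` a (K, 5) array of
    [L, a, b, x, y] rows. Returns the label map and the updated centers.
    """
    h, w = lab.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    centers = np.asarray(centers, dtype=float).copy()
    labels = np.full((h, w), -1, dtype=int)

    for _ in range(n_iter):
        best = np.full((h, w), np.inf)  # smallest D_s seen so far per pixel
        for k, (L, a, b, cx, cy) in enumerate(centers):
            # 2S x 2S search window around center k, clipped to the image.
            y0, y1 = max(int(cy - S), 0), min(int(cy + S) + 1, h)
            x0, x1 = max(int(cx - S), 0), min(int(cx + S) + 1, w)
            patch = lab[y0:y1, x0:x1]
            d_lab = np.sqrt(((patch - [L, a, b]) ** 2).sum(axis=-1))
            d_xy = np.hypot(xx[y0:y1, x0:x1] - cx, yy[y0:y1, x0:x1] - cy)
            D = d_lab + (m / S) * d_xy
            # Keep center k wherever it is strictly closer than the current best.
            closer = D < best[y0:y1, x0:x1]
            best[y0:y1, x0:x1][closer] = D[closer]
            labels[y0:y1, x0:x1][closer] = k

        # Update step: each center moves to the barycenter of its pixels.
        new_centers = centers.copy()
        for k in range(len(centers)):
            mask = labels == k
            if mask.any():
                new_centers[k, :3] = lab[mask].mean(axis=0)
                new_centers[k, 3] = xx[mask].mean()
                new_centers[k, 4] = yy[mask].mean()
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:  # centers no longer move: treat as converged
            break
    return labels, centers
```

Because each center only examines its own window, the cost per iteration does not grow with the number of superpixels. The returned label map may still contain disconnected fragments, which is what the post-processing step is meant to remove.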
