### 7.1 Points and patches

- There are two main approaches to finding feature points and their correspondences. 
    - The first is to find features in one image that can be accurately tracked using a local search technique, such as correlation or least squares
    - The second is to independently detect features in all the images under consideration and then match features based on their local appearance

- We split the keypoint detection and matching pipeline into four separate stages

    - During the feature detection (extraction) stage (Section 7.1.1), each image is searched for locations that are likely to match well in other images.
        - Förstner–Harris, DoG (difference of Gaussian)
        - Adaptive non-maximal suppression (ANMS)
        - Scale invariance
            - Extracting features at several scales (e.g., in an image pyramid) and matching them at the same level is suitable when the images being matched do not undergo large scale changes
        - Rotational invariance and orientation estimation
        - Affine invariance
            - Maximally stable extremal region (MSER)

    - In the feature description stage (Section 7.1.2), each region around detected keypoint locations is converted into a more compact and stable (invariant) descriptor that can be matched against other descriptors. Among the descriptors compared, GLOH performed best, followed closely by SIFT.
        - Bias and gain normalization (MOPS)
        - Scale invariant feature transform (SIFT)
        - PCA-SIFT
        - RootSIFT
        - Gradient location-orientation histogram (GLOH)
        - Since 2015 or so, most of the new feature descriptors are constructed using deep learning techniques:
            - Some of these (LIFT, TFeat, HPatches, L2-Net, HardNet, Geodesc, SOSNet, Key.Net) operate on patches, much like the classical SIFT approach. They hence require an initial local feature detector to determine the center of the patch and use a predetermined patch size when constructing the input to the network
            - Approaches such as DELF, SuperPoint, D2-Net, ContextDesc, R2D2, ASLFeat and CAPS use the entire image as the input to the descriptor computation

    - The feature matching stage (Sections 7.1.3 and 7.1.4) efficiently searches for likely matching candidates in other images.
        - Matching strategy and error rates
        - Efficient matching
        - Feature match verification and densification

    - The feature tracking stage (Section 7.1.5) is an alternative to the third stage that only searches a small neighborhood around each detected feature and is therefore more suitable for video processing
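
The matching stage above can be illustrated with a minimal sketch. It assumes descriptors are plain NumPy arrays and uses brute-force nearest-neighbor search with a distance ratio test; the function name `match_ratio_test` and the `ratio=0.8` threshold are illustrative choices, not something prescribed by these notes.

```python
import numpy as np

def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_b[j] is the nearest neighbor of desc_a[i]
    and is clearly closer than the second-nearest neighbor.

    Assumes desc_b holds at least two descriptors.
    """
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    matches = []
    for i, row in enumerate(dist):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Keep the match only if the best distance is well below the second best.
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches

# Synthetic example: the first 10 descriptors of image B reappear, slightly
# perturbed, as the descriptors of image A.
rng = np.random.default_rng(0)
desc_b = rng.normal(size=(50, 128))
desc_a = desc_b[:10] + 0.01 * rng.normal(size=(10, 128))
matches = match_ratio_test(desc_a, desc_b)
```

Brute-force search is quadratic in the number of features; the "efficient matching" item refers to indexing structures that avoid this cost, which are not shown here.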

### 7.2 Edges and contours

#### 7.2.1 Edge detection

- Qualitatively, edges occur at boundaries between regions of different color, intensity, or texture. Under such conditions, a reasonable approach is to define an edge as a location of rapid intensity or color variation. Edges occur at locations of steep slopes, or equivalently, in regions of closely packed contour lines

- A mathematical way to define the slope and direction of a surface is through its gradient

- Unfortunately, taking image derivatives accentuates high frequencies and hence amplifies noise, as the proportion of noise to signal is larger at high frequencies. It is therefore prudent to smooth the image with a low-pass filter prior to computing the gradient
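
A minimal sketch of the smooth-then-differentiate idea, assuming a grayscale image stored as a 2D NumPy array. The helper names (`gaussian_kernel`, `smooth`, `gradient_magnitude_orientation`), the 3-sigma kernel radius, and the edge-replicating padding are assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at about 3 sigma and normalized to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def smooth(image, sigma):
    """Separable Gaussian low-pass filter with edge-replicating padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    # Filter along rows, then along columns.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def gradient_magnitude_orientation(image, sigma=1.0):
    """Smooth first, then take finite-difference derivatives."""
    smoothed = smooth(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)  # derivatives along rows (y) and columns (x)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# Vertical step edge between columns 15 and 16.
image = np.zeros((32, 32))
image[:, 16:] = 1.0
mag, theta = gradient_magnitude_orientation(image, sigma=1.5)
```

The gradient magnitude peaks at the step, and the orientation there points along the x axis, perpendicular to the edge.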

#### 7.2.2 Contour detection
#### 7.2.3 Application: Edge editing and enhancement

### 7.3 Contour tracking
- Curves corresponding to object boundaries are common, especially in the natural environment. We describe some approaches to locating such boundary curves in images.

    - The first, originally called snakes by its inventors (Kass, Witkin, and Terzopoulos 1988), is an energy-minimizing, two-dimensional spline curve that evolves (moves) towards image features such as strong edges

    - The second, intelligent scissors, allows the user to sketch in real time a curve that clings to object boundaries.

    - Finally, level set techniques evolve the curve as the zero set of a characteristic function, which allows them to easily change topology and incorporate region-based statistics.

#### 7.3.1 Snakes and scissors
- Dynamic snakes and CONDENSATION
    - In many applications of active contours, the object of interest is being tracked from frame to frame as it deforms and evolves. In this case, it makes sense to use estimates from the previous frame to predict and constrain the new estimates.

    - One way to do this is to use Kalman filtering, which results in a formulation called Kalman snakes
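
The predict-then-correct idea can be sketched with a standard linear Kalman filter applied to a single contour control point. This is a deliberate simplification: it assumes a constant-velocity motion model and treats each point independently, whereas a Kalman snake filters the state of the whole contour. The matrices `F`, `H`, `Q`, `R` and their values are illustrative assumptions.

```python
import numpy as np

# State: (x, y, vx, vy) of one control point; measurement: observed (x, y).
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # constant-velocity dynamics
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only the position is observed
Q = 0.01 * np.eye(4)                        # process noise (assumed)
R = 1.0 * np.eye(2)                         # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict + update cycle given the new measurement z."""
    # Predict from the previous frame's estimate.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Correct the prediction with the current frame's measurement.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

# Track a point moving by (2, 1) pixels per frame from noisy observations.
rng = np.random.default_rng(1)
x, P = np.zeros(4), 10.0 * np.eye(4)
for t in range(1, 31):
    true_pos = np.array([2.0 * t, 1.0 * t])
    z = true_pos + rng.normal(scale=1.0, size=2)
    x, P = kalman_step(x, P, z)
```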

#### 7.3.3 Application: Contour tracking and rotoscoping

### 7.4 Lines and vanishing points
- While edges and general curves are suitable for describing the contours of natural objects, the human-made world is full of straight lines. Detecting and matching these lines can be useful in a variety of applications, including architectural modeling, pose estimation in urban environments, and the analysis of printed document layouts.

#### 7.4.1 Successive approximation

- Describing a curve as a series of 2D locations provides a general representation suitable for matching and further processing. In many applications, it is preferable to approximate such a curve with a simpler representation, e.g., as a piecewise-linear polyline or as a B-spline curve.
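
One common way to obtain a piecewise-linear approximation is recursive subdivision at the point farthest from the current chord (the Ramer / Douglas-Peucker scheme). The sketch below assumes that scheme; the function name `simplify_polyline` and the tolerance parameter `tol` are choices made for this example.

```python
import numpy as np

def simplify_polyline(points, tol):
    """Approximate an ordered list of 2D points by a polyline whose vertices
    are a subset of the input, with maximum deviation of at most tol."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    start, end = points[0], points[-1]
    chord = end - start
    norm = np.linalg.norm(chord)
    rel = points[1:-1] - start
    if norm == 0:
        # Closed or degenerate span: fall back to distance from the start point.
        dists = np.linalg.norm(rel, axis=1)
    else:
        # Perpendicular distance to the chord via the 2D cross product.
        dists = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / norm
    k = int(np.argmax(dists)) + 1  # index of the farthest interior point
    if dists[k - 1] <= tol:
        return np.array([start, end])
    # Split at the farthest point and simplify both halves.
    left = simplify_polyline(points[:k + 1], tol)
    right = simplify_polyline(points[k:], tol)
    return np.vstack([left[:-1], right])

# An L-shaped curve sampled at unit spacing collapses to its three corners.
curve = [(x, 0) for x in range(11)] + [(10, y) for y in range(1, 11)]
simplified = simplify_polyline(curve, tol=0.5)
```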

#### 7.4.2 Hough transforms
- While curve approximation with polylines can often lead to successful line extraction, lines in the real world are sometimes broken up into disconnected components or made up of many collinear line segments. In many cases, it is desirable to group such collinear segments into extended lines. At a further processing stage, we can then group such lines into collections with common vanishing points

- Alternatives to the Hough transform:
    - RANSAC-based line detection:
        - An advantage of RANSAC is that no accumulator array is needed, so the algorithm can be more space efficient and potentially less sensitive to the choice of bin size
        - The disadvantage is that many more hypotheses may need to be generated and tested than those obtained by finding peaks in the accumulator array
    
    - Bottom-up grouping
        - The resulting bottom-up algorithm is quite fast, does a good job of distinguishing line segments from texture, and is widely used in practice
        - Recently, deep neural network algorithms have been developed to simultaneously extract line segments and their junctions
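
The RANSAC-based alternative listed above can be sketched as follows, assuming edge points are given as an array of 2D coordinates. The function name `ransac_line`, the iteration count, and the inlier tolerance are illustrative assumptions. Note that no accumulator array is involved; instead, many two-point hypotheses are generated and scored.

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_tol=0.5, rng=None):
    """Fit one line to 2D points by repeatedly sampling two-point hypotheses.

    Returns ((normal, offset), inlier_mask), where the line is the set of
    points p satisfying normal @ p == offset.
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    best_line = None
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        d = points[j] - points[i]
        norm = np.linalg.norm(d)
        if norm == 0:
            continue  # coincident points define no line
        normal = np.array([-d[1], d[0]]) / norm
        offset = normal @ points[i]
        # Score the hypothesis by counting points close to the line.
        inliers = np.abs(points @ normal - offset) < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_line, best_inliers = (normal, offset), inliers
    return best_line, best_inliers

# 40 noisy points on y = 0.5 x + 1, followed by 20 uniformly scattered outliers.
data_rng = np.random.default_rng(0)
xs = np.linspace(0, 20, 40)
on_line = np.column_stack([xs, 0.5 * xs + 1 + data_rng.normal(scale=0.05, size=40)])
outliers = np.column_stack([data_rng.uniform(0, 20, 20), data_rng.uniform(-10, 20, 20)])
points = np.vstack([on_line, outliers])

(normal, offset), inliers = ransac_line(points, rng=np.random.default_rng(1))
slope = -normal[0] / normal[1]
```

In practice the winning hypothesis would usually be refined with a least-squares fit to its inliers; that step is omitted here.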

#### 7.4.3 Vanishing points
- Lines that are parallel in 3D share a common vanishing point in the image. Finding the vanishing points common to such line sets can help refine their position in the image and, in certain cases, help determine the intrinsic and extrinsic orientation of the camera
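
As a minimal sketch of estimating a single vanishing point, assume the line segments have already been grouped into one set believed to be parallel in 3D. Each segment gives a homogeneous line $l = p \times q$, and the vanishing point $v$ is taken as the unit vector minimizing $\sum_i (l_i \cdot v)^2$. This algebraic least-squares criterion and the helper names are assumptions for illustration; more robust weightings exist.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (cross product of the points)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(segments):
    """Least-squares intersection of the lines supporting the given segments.

    Returns a homogeneous 3-vector; divide by the last component to get pixel
    coordinates, unless it is close to zero (vanishing point at infinity).
    """
    lines = np.array([line_through(p, q) for p, q in segments])
    # Scale each line so that l @ x is the point-to-line distance in pixels.
    lines /= np.linalg.norm(lines[:, :2], axis=1, keepdims=True)
    # The right singular vector with the smallest singular value minimizes
    # the sum of squared residuals (l_i @ v)^2 subject to |v| = 1.
    _, _, vt = np.linalg.svd(lines)
    return vt[-1]

# Three segments that all point towards the image location (100, 50).
target = np.array([100.0, 50.0])
starts = [np.array([0.0, 0.0]), np.array([0.0, 100.0]), np.array([0.0, -40.0])]
segments = [(s, s + 0.3 * (target - s)) for s in starts]

v = vanishing_point(segments)
vp = v[:2] / v[2]
```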

### 7.5 Segmentation

- The main difference between clustering and segmentation is that the former usually ignores pixel layout and neighborhoods, while the latter relies heavily on spatial cues and constraints.

#### 7.5.1 Graph-based segmentation

#### 7.5.2 Mean shift
- Mean-shift and mode finding techniques, such as k-means and mixtures of Gaussians, model the feature vectors associated with each pixel (e.g., color and position) as samples from an unknown probability density function and then try to find clusters (modes) in this distribution.

- The k-means and mixtures of Gaussians techniques use a parametric model of the density function to find these clusters

- Mean shift, on the other hand, smooths the distribution and finds its peaks as well as the regions of feature space that correspond to each peak.

- The difference between mean shift and bilateral filtering is that in mean shift, the spatial coordinates of each pixel are adjusted along with its color values, so that the pixel migrates more quickly towards other pixels with similar colors, and can therefore later be used for clustering and segmentation.
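
A minimal mean-shift sketch, assuming each pixel has been turned into a feature vector (for example, joint position and color) and that a Gaussian kernel with a single bandwidth is used for all dimensions. The function names, the fixed iteration count, and the greedy mode-merging step are assumptions made for this example.

```python
import numpy as np

def mean_shift_modes(samples, bandwidth, n_iters=50):
    """Move every sample uphill on the kernel density estimate.

    Each estimate is repeatedly replaced by the kernel-weighted mean of the
    original samples around it, so it drifts towards a nearby density peak.
    """
    samples = np.asarray(samples, dtype=float)
    shifted = samples.copy()
    for _ in range(n_iters):
        d2 = ((shifted[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        shifted = (w @ samples) / w.sum(axis=1, keepdims=True)
    return shifted

def group_modes(modes, merge_dist):
    """Assign the same label to samples whose converged positions coincide."""
    labels = np.full(len(modes), -1, dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_dist:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Two well-separated blobs of 2D feature vectors.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(30, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(30, 2))
samples = np.vstack([blob_a, blob_b])

modes = mean_shift_modes(samples, bandwidth=1.0)
labels, centers = group_modes(modes, merge_dist=0.5)
```

Unlike k-means, the number of clusters is not specified in advance; it falls out of the bandwidth, which controls how strongly the density is smoothed.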
