# Chapter 7 - Point Feature Description & Matching (FAST, SIFT, SURF)

In the last chapter, we saw how to detect corners using the Harris and Shi-Tomasi corner detectors. We also used a very simple descriptor to match the corner features of two different images.

However, this simple matching scheme was quite poor, since it fails as soon as the image is slightly scaled, rotated or otherwise transformed. We therefore need a better descriptor than just the patch of pixels around each corner.

## 1. Overcome scale changes

We'll first look into the problem of scaling. When two images show the same object but were taken from different distances, our descriptor matcher fails miserably: the two patches show the same structure but contain very different pixel values. This is easy to see in the following example of the same image at different scales, described by patches of constant size:

![Same corner at different scaled images](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/scale_change.png)
*Figure 1: Same corner at different scaled images. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

To handle such scale changes, we could simply compare patches at S different scale levels, but this would be computationally expensive: with N features per image and S scale levels, comparing all patches at all scales requires **(NS)<sup>2</sup>** comparisons.
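To put a number on this, assume N = 1000 features per image and S = 10 scale levels (illustrative values, not measurements):

```math
(N S)^2 = (1000 \cdot 10)^2 = 10^{8} \qquad \text{vs.} \qquad N^2 = 1000^2 = 10^{6}
```

That is S<sup>2</sup> = 100 times more comparisons than matching at a single, already known scale.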

A better solution is to define a function that indicates how strongly a feature responds at each scale level. For each image independently, we select the patch size at which this function is maximal and compute the descriptor there. The selected patches in both images then cover the same image content, just at different resolutions, so the resulting descriptors are roughly the same even though the images were taken at different scales.

The approach is rather simple: evaluate this function for a range of patch sizes, find its local maximum and take the descriptor of the patch at that size. Doing this independently for both images leaves us only with descriptors at the optimal scale of each feature. We then normalize all patches to a uniform size so they can be compared directly, using an appropriate interpolation method such as bilinear interpolation.
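The normalization step can be sketched as follows, assuming square grayscale patches stored as NumPy arrays; `resize_bilinear` is a hypothetical helper written for this illustration, not a library function. It resamples a patch of any size (at least 2x2) to a fixed output size so that patches selected at different scales become directly comparable.

```python
import numpy as np

def resize_bilinear(patch: np.ndarray, out_size: int) -> np.ndarray:
    """Resample a 2D patch to out_size x out_size with bilinear interpolation.

    The corner pixels of the input map onto the corner pixels of the output.
    """
    p = patch.astype(float)
    h, w = p.shape  # assumes h, w >= 2
    ys = np.linspace(0, h - 1, out_size)  # sample positions along input rows
    xs = np.linspace(0, w - 1, out_size)  # sample positions along input columns
    # Top-left neighbour of each sample, clipped so that y0 + 1 and x0 + 1 stay valid
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    dy = (ys - y0)[:, None]  # fractional offsets, shaped for broadcasting
    dx = (xs - x0)[None, :]
    top = (1 - dx) * p[np.ix_(y0, x0)] + dx * p[np.ix_(y0, x0 + 1)]
    bottom = (1 - dx) * p[np.ix_(y0 + 1, x0)] + dx * p[np.ix_(y0 + 1, x0 + 1)]
    return (1 - dy) * top + dy * bottom

# A patch selected at a small scale and one selected at a large scale
small = np.arange(9 * 9, dtype=float).reshape(9, 9)
large = np.arange(25 * 25, dtype=float).reshape(25, 25)
a, b = resize_bilinear(small, 16), resize_bilinear(large, 16)
print(a.shape, b.shape)  # (16, 16) (16, 16)
```

Bilinear interpolation reproduces linear intensity ramps exactly, which makes the toy patches above easy to check; real patches are smoothed slightly by the resampling.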

![Automatic Scale Selection](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/automatic_scale_selection.png)
*Figure 2: Automatic Scale Selection. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

We still have the problem that, to find out whether the current patch size is ideal, we would have to evaluate the cornerness function for every patch size. To avoid this, we can instead convolve the image with *Laplacian of Gaussian* (LoG) kernels of different sizes. The LoG response is high where the region contains a clear, sharp intensity discontinuity whose size matches the kernel, which is ideal for a corner. We can therefore approximate the optimal patch size by finding the LoG kernel size at which the response at the central pixel is maximal.
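A minimal sketch of this scale search, assuming a grayscale NumPy image and the scale-normalized LoG σ<sup>2</sup>∇<sup>2</sup>G (the σ<sup>2</sup> factor keeps responses comparable across sizes). The function names are hypothetical. The kernel is only evaluated at the feature pixel (u, v), which is all the scale search needs. For a Gaussian blob of standard deviation s0, the normalized LoG response at its centre peaks at σ = s0, so the toy example should select a σ at or next to 4.

```python
import numpy as np

def scale_normalised_log(sigma: float, radius: int) -> np.ndarray:
    """sigma^2 times the Laplacian of a 2D Gaussian on a (2r+1) x (2r+1) grid."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    g = np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return sigma**2 * (r2 - 2 * sigma**2) / sigma**4 * g

def characteristic_scale(image, u, v, sigmas):
    """Return the sigma whose LoG response at pixel (u, v) is largest in magnitude."""
    responses = []
    for s in sigmas:
        r = int(np.ceil(4 * s))  # kernel support of +-4 sigma
        patch = image[v - r:v + r + 1, u - r:u + r + 1]
        # The kernel is symmetric, so this dot product equals the convolution at (u, v)
        responses.append(abs(np.sum(scale_normalised_log(s, r) * patch)))
    best = int(np.argmax(responses))
    return sigmas[best], responses

size, s0 = 129, 4.0
ax = np.arange(size) - size // 2
X, Y = np.meshgrid(ax, ax)
blob = np.exp(-(X**2 + Y**2) / (2 * s0**2))  # bright Gaussian blob, std 4, at the centre
sigmas = np.arange(1.0, 8.5, 0.5)
best, responses = characteristic_scale(blob, size // 2, size // 2, sigmas)
print(best)
```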

![Laplacian of Gaussian](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/laplacian_of_gaussian.png)
*Figure 3: Laplacian of Gaussian. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

## 2. Overcome Rotation changes

Now that we know how to overcome scale changes between images, we can address rotation changes. A simple but effective way to make a patch rotation invariant is to find the direction of the most dominant gradient (using the eigenvector of the Harris matrix, which we compute anyway) and de-rotate the patch such that this gradient points exactly upwards. We warp the patch by applying the corresponding rotation and interpolating the image at the patch's pixel coordinates using bicubic interpolation.
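The de-rotation can be sketched as follows. This is an illustration under stated assumptions, not the chapter's implementation: the helper names are hypothetical, bilinear instead of bicubic interpolation is used to keep it short, and the eigenvector only gives the dominant direction up to sign, so the sketch aligns the gradient with the patch's vertical axis (pointing up or down). Resolving the sign needs extra information, for example the peak of an orientation histogram.

```python
import numpy as np

def dominant_gradient_angle(patch):
    """Angle (radians) of the Harris-matrix eigenvector with the largest eigenvalue."""
    gy, gx = np.gradient(patch.astype(float))
    M = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    vx, vy = vecs[:, -1]         # dominant direction, known only up to sign
    return np.arctan2(vy, vx)

def bilinear_sample(img, xs, ys):
    """Interpolate img at float coordinates (xs, ys); coordinates are clipped to the image."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1 - 1e-9)
    ys = np.clip(ys, 0, h - 1 - 1e-9)
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    dx, dy = xs - x0, ys - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def derotated_patch(img, u, v, half, angle):
    """Sample a (2*half+1)^2 patch around (u, v) whose +y axis points along `angle`."""
    ax = np.arange(-half, half + 1, dtype=float)
    px, py = np.meshgrid(ax, ax)
    c, s = np.cos(angle), np.sin(angle)
    # Rotation (det = +1) mapping the patch axis (0, 1) onto (cos a, sin a)
    xs = u + px * s + py * c
    ys = v - px * c + py * s
    return bilinear_sample(img.astype(float), xs, ys)

X, Y = np.meshgrid(np.arange(64.0), np.arange(64.0))
theta = np.deg2rad(30)
img = np.cos(theta) * X + np.sin(theta) * Y  # intensity ramp with its gradient at 30 degrees
angle = dominant_gradient_angle(img[24:41, 24:41])
patch = derotated_patch(img, 32, 32, 8, angle)
gy_out, gx_out = np.gradient(patch)  # after de-rotation: no horizontal gradient left
```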

![Roto-Translation warping](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/roto_translation_warping.png)
*Figure 4: Roto-Translation warping. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

## 3. Overcome affine Warping
Next, let's consider affine warping. It can be handled similarly to rotation changes, by looking at the direction and magnitude of gradients. We saw in the last chapter that the eigenvectors of a patch's Harris matrix point in the directions of the most and least dominant gradients, while the corresponding eigenvalues indicate the gradient strength. Together they describe an ellipse. By warping the patch such that both eigenvalues become equal, i.e. stretching along the direction of the smaller eigenvalue until it matches the larger one, the ellipse turns into a circle. Every patch is thus warped into the same canonical form, which allows us to match them later on.
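One way to compute such a warp, sketched under assumptions: under a coordinate change x' = A x, gradients transform as A<sup>-T</sup>∇I, so the Harris matrix becomes M' = A<sup>-T</sup> M A<sup>-1</sup>. Choosing A = M<sup>1/2</sup> / √λ<sub>max</sub> turns M' into λ<sub>max</sub>·I, i.e. both eigenvalues equal the larger one. The helper names are hypothetical, and the check below only verifies the algebra on M (it ignores that the summation window also changes when the patch is actually resampled with A).

```python
import numpy as np

def harris_matrix(patch):
    """Second-moment (Harris) matrix summed over the whole patch."""
    gy, gx = np.gradient(patch.astype(float))
    return np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                     [np.sum(gx * gy), np.sum(gy * gy)]])

def affine_normaliser(M):
    """A such that warping x' = A x turns M into lambda_max * I (ellipse -> circle)."""
    lam, V = np.linalg.eigh(M)                # ascending eigenvalues, orthonormal V
    sqrt_M = V @ np.diag(np.sqrt(lam)) @ V.T  # symmetric square root M^(1/2)
    return sqrt_M / np.sqrt(lam[-1])          # keep the larger eigenvalue unchanged

def warped_harris_matrix(M, A):
    """Gradients transform as A^-T g under x' = A x, so M' = A^-T M A^-1."""
    A_inv = np.linalg.inv(A)
    return A_inv.T @ M @ A_inv

X, Y = np.meshgrid(np.arange(21.0), np.arange(21.0))
patch = np.sin(0.5 * X) + 0.3 * np.sin(0.4 * Y + 0.2 * X)  # anisotropic texture
M = harris_matrix(patch)
A = affine_normaliser(M)
print(np.linalg.eigvalsh(M))                          # two different eigenvalues: an ellipse
print(np.linalg.eigvalsh(warped_harris_matrix(M, A)))  # both equal lambda_max: a circle
```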

![Affine Warping](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/affine_warping.png)
*Figure 5: Affine Warping. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

### Histogram of Oriented Gradient (HoG) Descriptor
The simple descriptors we have seen so far are based on taking the raw patch around each detected feature. To overcome scale, rotation and affine changes, we first have to scale, rotate and warp that patch. This is not only computationally expensive but also still quite fragile, since small changes in the patch can lead to a significantly lower matching score.

A better descriptor is the so-called **Histogram of Oriented Gradients**, or **HoG** for short. It does not need to warp the patch, since HoG is barely affected by viewpoint changes of up to about 50 degrees.

To extract the HoG descriptor of a patch, we first multiply the patch by a Gaussian kernel, which weights pixels by their distance to the centre and thereby makes the effective window circular rather than square. Then we compute the gradient vector at each pixel and build a histogram of gradient orientations, where each pixel votes with its gradient magnitude. This histogram serves as our HoG descriptor. To make the descriptor rotation invariant, we simply shift the histogram circularly so that the most dominant direction comes first.
To compare two patches, we simply compare the similarity of their histograms. Since small affine deformations change the gradient orientations only slightly, the descriptor is robust without the patch being actively warped.
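The steps above can be sketched with NumPy as follows. The choices here are assumptions for illustration: 8 orientation bins, a Gaussian window with σ equal to half the patch width, L2 normalization and an SSD histogram distance; `hog_descriptor` is a hypothetical name. The example rotates a random patch by exactly 90 degrees, a multiple of the 45-degree bin width, so the circular shift should undo the rotation and the distance should come out close to 0. Arbitrary rotations only give approximate invariance because votes move between neighbouring bins.

```python
import numpy as np

def hog_descriptor(patch, n_bins=8):
    """Gaussian-weighted orientation histogram, circularly shifted and L2-normalised."""
    p = patch.astype(float)
    size = p.shape[0]                        # assumes a square patch
    ax = np.arange(size) - (size - 1) / 2.0  # pixel offsets from the patch centre
    x, y = np.meshgrid(ax, ax)
    window = np.exp(-(x**2 + y**2) / (2 * (size / 2.0) ** 2))  # circular Gaussian weighting
    gy, gx = np.gradient(p)
    weight = np.hypot(gx, gy) * window       # each pixel votes with its gradient magnitude
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bins = np.floor(angle / (2 * np.pi / n_bins)).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=weight.ravel(), minlength=n_bins)
    hist = np.roll(hist, -int(np.argmax(hist)))  # dominant orientation moves to bin 0
    return hist / np.linalg.norm(hist)

def histogram_distance(h1, h2):
    """Sum of squared differences between two histograms."""
    return float(np.sum((h1 - h2) ** 2))

rng = np.random.default_rng(0)
patch = rng.random((21, 21))
d_original = hog_descriptor(patch)
d_rotated = hog_descriptor(np.rot90(patch))  # the same patch rotated by 90 degrees
print(histogram_distance(d_original, d_rotated))
```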

![Histogram of Oriented Gradient](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/histogram_of_oriented_gradient.png)
*Figure 6: Histogram of Oriented Gradient. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

## Scale Invariant Feature Transform (SIFT) Descriptor

As the name suggests, SIFT is designed to be scale invariant: the descriptor is computed on a patch whose size is given by the feature's detected scale. To construct the SIFT descriptor, we perform a few simple steps. First, we multiply the patch by a Gaussian filter to make it circular, exactly as we did for HoG. Then we divide the patch into 4x4 sub-patches, resulting in 16 so-called *cells*. For each cell, we compute an 8-bin HoG over its pixels. We concatenate all histograms into a single 1D vector with 4x4x8 = 128 values. This vector then acts as our descriptor and can be matched using regular SSD.
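A simplified sketch of this construction, assuming a 16x16 grayscale patch (the size used in Lowe's original paper) that has already been cut out at the detected scale. The Gaussian window uses σ equal to half the patch width, as in Lowe's paper. To stay short, the sketch leaves out steps of the full method such as rotating the patch to its dominant orientation and trilinear interpolation of votes between bins. `sift_like_descriptor` and `ssd` are hypothetical names.

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_bins=8):
    """Concatenate 8-bin orientation histograms of 4x4 cells into one 128-D vector."""
    p = patch.astype(float)
    size = p.shape[0]       # assumes a square patch, e.g. 16x16
    cell = size // n_cells  # 16 // 4 = 4 pixels per cell side
    ax = np.arange(size) - (size - 1) / 2.0
    x, y = np.meshgrid(ax, ax)
    window = np.exp(-(x**2 + y**2) / (2 * (size / 2.0) ** 2))  # sigma = half the window width
    gy, gx = np.gradient(p)
    weight = np.hypot(gx, gy) * window
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bins = np.floor(angle / (2 * np.pi / n_bins)).astype(int) % n_bins
    histograms = []
    for i in range(n_cells):      # cell row
        for j in range(n_cells):  # cell column
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            histograms.append(np.bincount(bins[rows, cols].ravel(),
                                          weights=weight[rows, cols].ravel(),
                                          minlength=n_bins))
    return np.concatenate(histograms)  # 4 * 4 * 8 = 128 values

def ssd(a, b):
    """Sum of squared differences, used to match descriptors."""
    return float(np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
p1, p2 = rng.random((16, 16)), rng.random((16, 16))
d1, d2 = sift_like_descriptor(p1), sift_like_descriptor(p2)
print(d1.shape, ssd(d1, d1), ssd(d1, d2) > 0)
```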

![Scale Invariant Feature Transform](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/scale_invariant_feature_transform.png)
*Figure 7: Scale Invariant Feature Transform, HoG symbolized as Polar Histogram. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

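The descriptor vector **v** can be normalized to unit length to make it invariant to **affine** illumination changes I' = αI + β: the gradients, and therefore the histograms, are unaffected by the additive offset β, and the normalization cancels the multiplicative gain α. The normalization uses the following formula: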

![SIFT Normalization for Affine illumination invariance](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/sift_affine_illumination_changes.png)
*Figure 7: SIFT Normalization for Affine illumination invariance. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*
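The effect of the normalization can be checked numerically. In this sketch, a vector of raw gradient magnitudes stands in for the 128-element SIFT vector (an assumption that keeps the example short; the histogram bins are built from the same gradients). The hypothetical helpers show that after dividing by the vector's length, an illumination change with α = 2.5 > 0 and β = 40 no longer changes the descriptor.

```python
import numpy as np

def gradient_magnitudes(patch):
    """Gradient magnitude at every pixel, flattened into one vector."""
    gy, gx = np.gradient(patch.astype(float))
    return np.hypot(gx, gy).ravel()

def normalise(v):
    """Scale v to unit length (L2 norm = 1)."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(1)
I = rng.random((16, 16))
brighter = 2.5 * I + 40.0  # affine illumination change: gain 2.5, offset 40
v1 = normalise(gradient_magnitudes(I))
v2 = normalise(gradient_magnitudes(brighter))
print(np.allclose(v1, v2))  # True
```

The gain must be positive: a negative α would flip every gradient direction and therefore permute the orientation bins.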

SIFT is an extremely powerful descriptor that can handle large viewpoint changes, out-of-plane rotations and significant changes in illumination, but it is also computationally expensive and runs at no more than about 10 FPS on an Intel i7 processor.

## SIFT Detector
Besides being a good descriptor, SIFT also comes with a scale invariant feature detector. It uses Differences of Gaussians (DoG) at different sizes, which closely approximate the Laplacian of Gaussian, to find the best scale for a given feature. Why does this matter? When we use Differences of Gaussians instead of Laplacians of Gaussians, each Gaussian-blurred image is computed once and reused for two DoG images. To calculate the DoG at four different scales, we just calculate five Gaussians and take their sequential differences.
As soon as the filter gets too big, we reset it to its initial size and shrink the image by half instead. We call this step "creating a new octave". This saves a lot of computation, since blurring the half-size image with the smaller filter gives essentially the same result as blurring the full image with a filter twice as large, but needs far fewer operations. The same can be done in the opposite direction: instead of making the filter smaller and smaller, we double the image size and reset the filter again. Together, this creates a scale-space pyramid.
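The reason DoG can replace LoG is the heat-equation identity ∂G/∂σ = σ∇<sup>2</sup>G, which gives G(kσ) − G(σ) ≈ (k − 1)σ<sup>2</sup>∇<sup>2</sup>G: a difference of two Gaussians is, up to the constant factor (k − 1), the scale-normalized LoG. The sketch below compares both kernels numerically for the assumed values σ = 3 and k = 1.05. The correlation should be close to 1, and the norm ratio approaches 1 as k approaches 1.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Sampled, unit-integral 2D Gaussian on a (2r+1) x (2r+1) grid."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def log_kernel(sigma, radius):
    """Sampled Laplacian of the 2D Gaussian."""
    ax = np.arange(-radius, radius + 1, dtype=float)
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) / sigma**4 * gaussian_kernel(sigma, radius)

sigma, k = 3.0, 1.05
radius = int(np.ceil(4 * k * sigma))
dog = gaussian_kernel(k * sigma, radius) - gaussian_kernel(sigma, radius)
approx = (k - 1) * sigma**2 * log_kernel(sigma, radius)
corr = np.corrcoef(dog.ravel(), approx.ravel())[0, 1]
ratio = np.linalg.norm(dog) / np.linalg.norm(approx)
print(round(corr, 4), round(ratio, 3))
```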

To be more precise, we create the pyramid as follows: We convolve the initial image at original size with Gaussians G(k<sup>i</sup>σ) to produce blurred images separated by a constant factor k in scale space. The initial Gaussian G(σ) uses σ = 1.6, and k = 2<sup>1/s</sup>, where s is the number of intervals per octave. When k<sup>i</sup> = 2, we downsample the image by a factor of 2 to create a new octave and repeat all the previous steps. Finally, from all the created Gaussian images, we subtract adjacent images to create the **Differences of Gaussians**. For each DoG, we then compare every pixel to its 9 + 8 + 9 = 26 neighbours: 8 in the same DoG image and 9 in each of the DoG images one scale above and below. Only when the pixel is an extremum (a maximum or a minimum) of this whole region have we identified a SIFT feature.
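A compact sketch of these steps in NumPy, written for illustration and not taken from the chapter's code; all function names are hypothetical. Assumptions: the input is a grayscale image with values in [0, 1] that is treated as unblurred, the Gaussian kernel is truncated at 3σ with edge padding, each new octave starts from the image blurred with 2σ, subsampled by 2 and then blurred only incrementally, and a contrast threshold of 0.03 (the value Lowe uses, although he applies it after sub-pixel refinement) suppresses flat regions. Sub-pixel refinement, edge-response rejection and the initial image doubling are left out.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (kernel truncated at 3 sigma)."""
    r = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-r, r + 1, dtype=float)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(float), r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def dog_octave(base, sigma0=1.6, s=3, base_sigma=0.0):
    """s + 3 images blurred to sigma0 * k^i, and their s + 2 sequential differences."""
    k = 2.0 ** (1.0 / s)
    gaussians = []
    for i in range(s + 3):
        # Blur only by the amount missing on top of the blur `base` already has
        extra = np.sqrt(max((sigma0 * k**i) ** 2 - base_sigma**2, 0.0))
        gaussians.append(gaussian_blur(base, extra) if extra > 1e-6 else base.astype(float))
    dogs = np.stack([g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])])
    return gaussians, dogs

def octave_extrema(dogs, threshold):
    """(x, y, level) where a DoG pixel is strictly above or below all 26 neighbours."""
    found = []
    n, h, w = dogs.shape
    for l in range(1, n - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[l, y, x]
                if abs(v) < threshold:
                    continue
                cube = dogs[l - 1:l + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                others = np.delete(cube, 13)  # index 13 is the centre of the 3x3x3 cube
                if v > others.max() or v < others.min():
                    found.append((x, y, l))
    return found

def sift_like_detector(img, n_octaves=3, sigma0=1.6, s=3, threshold=0.03):
    """Return keypoints as (u, v, sigma) in original image coordinates."""
    keypoints = []
    k = 2.0 ** (1.0 / s)
    base, base_sigma = img.astype(float), 0.0
    for o in range(n_octaves):
        gaussians, dogs = dog_octave(base, sigma0, s, base_sigma)
        scale = 2.0 ** o
        for x, y, l in octave_extrema(dogs, threshold):
            keypoints.append((x * scale, y * scale, sigma0 * k**l * scale))
        # k^s = 2: the image blurred with 2 * sigma0, halved, has blur sigma0 again
        base, base_sigma = gaussians[s][::2, ::2], sigma0
    return keypoints

X, Y = np.meshgrid(np.arange(64.0), np.arange(64.0))
img = np.exp(-((X - 32) ** 2 + (Y - 32) ** 2) / (2 * 2.85**2))  # one bright blob
keypoints = sift_like_detector(img)
print([kp for kp in keypoints if abs(kp[0] - 32) <= 1 and abs(kp[1] - 32) <= 1])
```

A bright blob produces a DoG minimum at its centre, which is why the sketch accepts minima as well as maxima.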

![SIFT Detector with 5 octaves and 4 DoG per octave](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/sift_detector_octaves.png)
*Figure 8: SIFT Detector with 5 octaves and 4 DoG per octave. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

Note that a SIFT feature is only detected at a position (u, v) if none of the 9 + 8 + 9 = 26 pixels in its scale-space neighbourhood (the current DoG image plus the DoG images one scale above and below) has a higher value (or, for a minimum, a lower value).
We can finally mark all the features found by the SIFT detector with a circle, where the radius of the circle indicates the DoG size at which the scale-space extremum occurs at that position.

![SIFT Space Scale maximum images and the resulting detected features](https://github.com/joelbarmettlerUZH/PyVisualOdometry/raw/master/img/chapter_7/sift_space_scale_detector.png)
*Figure 9: SIFT Space Scale maximum images and the resulting detected features. [source](http://rpg.ifi.uzh.ch/docs/teaching/2019/06_feature_detection_2.pdf)*

This method makes the SIFT detector highly repeatable, even under large viewpoint changes and significant scaling. As empirical research has shown, SIFT works best with 3 scale levels per octave.
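As a worked example with s = 3 and the initial σ = 1.6, the blur levels within one octave are:

```math
k = 2^{1/3} \approx 1.26, \qquad \sigma_i = 1.6\,k^{i}: \quad 1.6,\; 2.02,\; 2.54,\; 3.2,\; 4.03,\; 5.08 \quad (i = 0, \dots, 5)
```

In Lowe's original layout, these s + 3 = 6 Gaussian images give 5 DoG images, and extrema are searched in the middle s = 3 of them, so that every searched level has a neighbour above and below; σ<sub>3</sub> = 3.2 = 2 · 1.6 is where the next octave starts. Figure 8 draws 4 DoGs per octave, so the exact count per octave may differ between presentations.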

When we combine the SIFT detector with the SIFT descriptor (the combination that works best), we get as output a **descriptor** represented by a 4x4x8 = 128 element vector, the pixel **location** (u, v) of the feature, the **scale** at which the feature is most dominant, and the **orientation** of the feature (the most dominant angle in the histogram).
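One way to bundle these four outputs is a small data class. `SiftFeature` and its field names are hypothetical choices for this sketch. For comparison, OpenCV (4.4 and later) returns the same information from `cv2.SIFT_create().detectAndCompute(image, None)`, split into a list of `cv2.KeyPoint` objects (`pt`, `size`, `angle`, the angle in degrees) and an N x 128 descriptor array.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class SiftFeature:
    """Hypothetical container for the four outputs of one SIFT feature."""
    descriptor: np.ndarray  # 4 x 4 cells x 8 bins = 128 values
    u: float                # horizontal pixel coordinate of the keypoint
    v: float                # vertical pixel coordinate of the keypoint
    scale: float            # sigma of the DoG level where the extremum was found
    orientation: float      # dominant gradient angle in radians

    def __post_init__(self):
        # Flatten e.g. a (4, 4, 8) array; reshape raises ValueError for other sizes
        self.descriptor = np.asarray(self.descriptor, dtype=float).reshape(128)

feature = SiftFeature(np.zeros((4, 4, 8)), u=120.5, v=64.0, scale=3.2, orientation=0.7)
print(feature.descriptor.shape)  # (128,)
```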