# Features:

- In computer vision and image processing, a feature is a unique pattern or a piece of key information which is relevant for solving the computational task related to a certain application.
- Features consist of keypoints and their descriptors:
  - Keypoints are the “stand out” points in an image. Ideally, the same keypoints are found whether the image is rotated, shrunk, or enlarged.
  - A descriptor is the description of a keypoint. It can be thought of as a vector, i.e. a series of numbers.
- The choice of features in a particular computer vision task may be highly dependent on the specific problem at hand. 

## Feature Detection:

- A feature detector is an algorithm which takes an image and outputs **locations (i.e. pixel coordinates and possibly numbers describing the size or scale of the feature)** of significant areas in your image. 
- An example of this is a **corner detector**, which outputs the locations of corners in your image but does not tell you any other information about the features detected.
- Feature detection may also provide complementary attributes, such as the **edge orientation** and **gradient magnitude** in edge detection and the polarity and the strength of the blob in blob detection.

![image](feature-detection.png)
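The complementary attributes mentioned above (gradient magnitude and orientation) can be sketched with NumPy alone. This is a minimal illustration, not a full edge detector: the helper name `gradient_attributes` and the toy image are assumptions made for this example, and the derivatives come from simple finite differences via `np.gradient`.

```python
import numpy as np

def gradient_attributes(image):
    """Return per-pixel gradient magnitude and gradient direction (radians)."""
    image = np.asarray(image, dtype=float)
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)
    # Direction of the gradient; the edge itself runs perpendicular to it.
    orientation = np.arctan2(gy, gx)
    return magnitude, orientation

# Toy image (hypothetical): dark left half, bright right half, one vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

magnitude, orientation = gradient_attributes(image)
# Candidate edge locations: pixels where the gradient magnitude peaks.
rows, cols = np.nonzero(magnitude == magnitude.max())
```

Here the "detector" output is just the pixel coordinates in `rows` and `cols`; the magnitude and orientation arrays are the extra attributes attached to those locations.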

## Feature Extraction:

- Once features have been detected, a local image patch around the feature can be extracted. 
- This extraction may involve quite considerable amounts of image processing. 
- The result is known as a feature descriptor or feature vector. 
- Among the approaches used for feature description are local histograms (SIFT is an example of a local histogram descriptor).

## Feature Descriptor:

- After feature detection and extraction, each image is abstracted by several local patches. Feature representation methods deal with how to represent **the patches** as **numerical vectors**. These vectors are called **feature descriptors**.
- The term is also used for the algorithm itself: a feature descriptor takes an image and outputs feature descriptors or **feature vectors**.
- Feature descriptors encode interesting information into **a sequence of numbers** and act as a sort of **numerical "fingerprint"** that can be used to differentiate one feature from another. 
- Ideally this information would be **invariant under photometric and geometric transformations (variations in intensity (brightness), rotation, scale and affine etc.)**, so we can find the feature again even if the image is transformed in some way. 
- An example would be **SIFT**, which encodes information about the image gradients in the local neighbourhood into the numbers of the feature vector.
- Other examples you can read about are **HOG** and **SURF**.
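To make the "numerical fingerprint" idea concrete, here is a deliberately simplified sketch of a gradient-orientation histogram for a single patch. It is not SIFT, HOG, or SURF; it only borrows the idea of summarizing local gradients in a histogram. The function name `orientation_histogram`, the bin count, and the random patch are assumptions made for this example.

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Describe a patch by a magnitude-weighted histogram of gradient directions."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    # Map directions from (-pi, pi] to [0, 2*pi] so the bins cover a full turn.
    orientation = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(
        orientation, bins=n_bins, range=(0.0, 2 * np.pi), weights=magnitude
    )
    # Normalizing to unit length removes the effect of contrast scaling.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# Hypothetical 16 x 16 patch with integer intensities.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(16, 16)).astype(float)

descriptor = orientation_histogram(patch)
# Same patch with doubled contrast and a brightness offset.
descriptor_brighter = orientation_histogram(2.0 * patch + 10.0)
```

Because gradients ignore a constant offset and the histogram is normalized, `descriptor` and `descriptor_brighter` should agree, which is a small instance of the photometric invariance described above. This sketch is not invariant to rotation or scale.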

## Types of Image Features:
Features may be specific structures in the image such as points, edges or objects:
- Edges
- Corners/Interest/Key points
- Blobs/Regions of interest points
- Ridges: From a practical viewpoint, a ridge can be thought of as a one-dimensional curve that represents an axis of symmetry, and in addition has an attribute of local ridge width associated with each ridge point.

![image](feature-class.png)

# Filtering:

A digital filter is a system that performs mathematical operations on a sampled, discrete-time signal to **reduce** or **enhance** certain aspects of that signal. Filtering forms a new image whose pixel values are transformed from original pixel values.

**Goals:**
 - Extract useful information, i.e. the features (reduce details; focus on edges, corners, blobs, etc.).
 - Transform images into another domain where image properties can be modified or enhanced (sharpening, blurring, super-resolution, in-painting, de-noising, etc.).
 
## Moving Average:

- Also known as blur (with a box filter).
- Replaces each pixel with the average of its neighbours to achieve a smoothing effect (i.e. reduction of sharp features).
- Shift-invariant and linear.

$\displaystyle
h=\frac{1}{9} \begin{bmatrix}
                    1 & 1 & 1\\
                    1 & 1 & 1\\
                    1 & 1 & 1\\
              \end{bmatrix}$

$\displaystyle g[n,m]=\frac{1}{9}\sum_{k=-1}^{1}\sum_{l=-1}^{1}f[n-k,m-l]$
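The 3 x 3 moving average can be sketched directly from the double sum. The function name `box_filter_3x3` and the output name `g` are assumptions for this example, and the sketch only evaluates positions where the whole window fits inside the image, so the output is 2 pixels smaller in each dimension.

```python
import numpy as np

def box_filter_3x3(f):
    """3 x 3 moving average at every position where the window fits (no padding)."""
    f = np.asarray(f, dtype=float)
    rows, cols = f.shape
    g = np.zeros((rows - 2, cols - 2))
    # Add up the nine shifted copies of the image, one per window offset.
    for k in range(3):
        for l in range(3):
            g += f[k:k + rows - 2, l:l + cols - 2]
    return g / 9.0

# A single bright pixel is spread evenly over its 3 x 3 neighbourhood.
f = np.zeros((5, 5))
f[2, 2] = 9.0
g = box_filter_3x3(f)
```

In this toy case every valid window contains the bright pixel, so each output value is 9 / 9 = 1: the sharp spike has been smoothed into a flat region.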

# Convolution:

2D convolution of a function f with filter h is as follows:

$f[n,m]*h[n,m]=\displaystyle\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}f[k,l]h[n-k,m-l]$
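For finite images the infinite sums reduce to the overlap between image and filter. The following is a minimal, unoptimized sketch that evaluates the sum only where the filter fits entirely inside the image; the function name `conv2d_valid` is an assumption for this example. Note the explicit flip of `h`, which is what the index pattern $h[n-k,m-l]$ amounts to.

```python
import numpy as np

def conv2d_valid(f, h):
    """Direct 2D convolution of image f with filter h, no padding, stride 1."""
    f = np.asarray(f, dtype=float)
    h = np.asarray(h, dtype=float)
    h_rows, h_cols = h.shape
    out_rows = f.shape[0] - h_rows + 1
    out_cols = f.shape[1] - h_cols + 1
    # Convolution indexes the filter as h[n-k, m-l], i.e. flipped along both axes.
    h_flipped = h[::-1, ::-1]
    out = np.empty((out_rows, out_cols))
    for n in range(out_rows):
        for m in range(out_cols):
            window = f[n:n + h_rows, m:m + h_cols]
            out[n, m] = np.sum(window * h_flipped)
    return out

f = np.arange(16.0).reshape(4, 4)
h = np.array([[1.0, 0.0],
              [0.0, -1.0]])
result = conv2d_valid(f, h)  # 3 x 3 output for a 4 x 4 image and 2 x 2 filter
```

The nested loops are chosen for readability; a practical implementation would normally use a vectorized or FFT-based routine instead.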

## Padding (Border Effects):
- Zero-padding: Set all pixels outside the source image to 0; this darkens the edges of the filtered image.

![image](zero.png)

- Edge value replication (clamp): Repeat edge pixels indefinitely; this propagates border values outward into the padded region.

![image](clamp.png)

- Mirror extension (reflection): Reflect pixels across the image edge; this preserves colors near the borders.

![image](mirror.png)
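The three padding strategies map onto modes of NumPy's `np.pad`. The sketch below uses a one-dimensional row so the effect is easy to read; the variable names are assumptions for this example. NumPy offers two mirror variants: `"reflect"` mirrors about the edge pixel without repeating it, while `"symmetric"` repeats the edge pixel.

```python
import numpy as np

row = np.array([1, 2, 3])

# Zero-padding: everything outside the source is 0.
zero_padded = np.pad(row, 2, mode="constant", constant_values=0)

# Edge value replication (clamp): repeat the border pixel.
clamped = np.pad(row, 2, mode="edge")

# Mirror extension: reflect about the border pixel (border not repeated).
mirrored = np.pad(row, 2, mode="reflect")

# Mirror extension, alternative convention: border pixel is repeated.
mirrored_with_edge = np.pad(row, 2, mode="symmetric")

print(zero_padded)         # [0 0 1 2 3 0 0]
print(clamped)             # [1 1 1 2 3 3 3]
print(mirrored)            # [3 2 1 2 3 2 1]
print(mirrored_with_edge)  # [2 1 1 2 3 3 2]
```

The same calls work on 2D images, where a pad width of 2 adds two pixels on every side.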

## Calculations:

- The amount by which the filter shifts is the **stride**. In this discussion, stride will be assumed as 1.
- Zero **padding** pads the input volume with zeros around the border.
- What happens when you apply three 5 x 5 x 3 filters $(F=5)$ to a 32 x 32 x 3 $(N_{row}=N_{col}=32)$ input volume?
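  - Condition at the boundary: if the filter's first row (or column) sits at index $n$ (counting from 1), the filter fits inside the image only while $n+(F-1)\leq N_{row} (\textit{or, }N_{col})$. The number of valid positions, $N-F+1$, is how many times the convolution can be applied along a row (or column) or, equivalently, the size of the output row or column.
  - As a result, the output volume would be 28 x 28 x 3. The spatial dimensions decrease by $F-1=4$, and the output depth of 3 comes from the three filters (one output channel per filter), not from the three input channels.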
  - Let’s say we want to apply the same conv layer but we want the output volume to remain 32 x 32 x 3. To do this, we can apply a zero padding of size 2 (pad both sides, total padding size is 4) to that layer. If we think about a zero padding of two, then this would result in a 36 x 36 x 3 input volume.
  
### General case for an image of size $(N, N)$ and a filter of size $(F, F)$:


  - Output of the convolution will be of size $(N-F+1, N-F+1)$.


  - Padding required **for one side** to keep the size same as before: $P=\displaystyle\frac{F-1}{2}$ (by $N+2P-F+1=N$). This is a whole number only when $F$ is odd.
  

### General case for an image of size $(N_1, N_2)$ and a filter of size $(F_1, F_2)$:


  - Output of the convolution will be of size $(N_1-F_1+1, N_2-F_2+1)$.
  
  
  - Padding required **for one side** to keep the size same as before: $P_1=\displaystyle\frac{F_1-1}{2}$ and $P_2=\displaystyle\frac{F_2-1}{2}$ (by $N_1+2P_1-F_1+1=N_1$ and $N_2+2P_2-F_2+1=N_2$), where $P_1$ and $P_2$ are used for rows and columns respectively. These are whole numbers only when $F_1$ and $F_2$ are odd.
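The size formulas above can be checked with a few lines of arithmetic. This is a small sketch assuming stride 1 throughout, as in the discussion; the helper names `conv_output_size` and `same_padding` are made up for this example.

```python
def conv_output_size(n, f, padding=0):
    """Output length along one dimension for input n, filter f, stride 1."""
    return n + 2 * padding - f + 1

def same_padding(f):
    """Per-side padding that keeps the output the same size as the input."""
    if f % 2 == 0:
        raise ValueError("(F - 1) / 2 is a whole number only for odd filter sizes")
    return (f - 1) // 2

# The 32 x 32 input with 5 x 5 filters from the example above.
unpadded = conv_output_size(32, 5)               # 28
pad = same_padding(5)                            # 2
padded = conv_output_size(32, 5, padding=pad)    # 32

# Rectangular case: apply the same formula to each dimension separately.
out_shape = (conv_output_size(64, 3), conv_output_size(48, 7))  # (62, 42)
```

For the rectangular case the row and column dimensions are independent, so calling the one-dimensional helper twice is sufficient.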

# (Cross) Correlation:

2D correlation of two 2D functions f and g is as follows, where $f^*$ denotes the complex conjugate of $f$ (for real-valued images, $f^*=f$):

$f[n,m]\star g[n,m]=\displaystyle\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}f^*[k,l]g[n+k,m+l]$

# Notes:

- Correlation is equivalent to convolution without the flip.
- Convolution = correlation if the kernel used is symmetric with respect to both axes.
- Convolution is used for filtering and linear manipulations on the image.
- Correlation is used for similarity measurement and pattern matching.
- Both are shift-invariant.
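The first two notes can be checked numerically. The sketch below assumes real-valued inputs (so the complex conjugate has no effect) and evaluates only positions where the template fits inside the image; the function names are assumptions for this example.

```python
import numpy as np

def correlate2d_valid(template, image):
    """Slide the template over the image without flipping it (real-valued inputs)."""
    template = np.asarray(template, dtype=float)
    image = np.asarray(image, dtype=float)
    t_rows, t_cols = template.shape
    out = np.empty((image.shape[0] - t_rows + 1, image.shape[1] - t_cols + 1))
    for n in range(out.shape[0]):
        for m in range(out.shape[1]):
            out[n, m] = np.sum(template * image[n:n + t_rows, m:m + t_cols])
    return out

def convolve2d_valid(image, kernel):
    """Convolution expressed as correlation with the kernel flipped along both axes."""
    kernel = np.asarray(kernel, dtype=float)
    return correlate2d_valid(kernel[::-1, ::-1], image)

image = np.arange(25.0).reshape(5, 5)

# Symmetric about both axes: flipping changes nothing.
symmetric = np.array([[1.0, 2.0, 1.0],
                      [2.0, 4.0, 2.0],
                      [1.0, 2.0, 1.0]])
# Antisymmetric left-to-right: flipping negates the kernel.
asymmetric = np.array([[1.0, 0.0, -1.0],
                       [2.0, 0.0, -2.0],
                       [1.0, 0.0, -1.0]])

same = np.allclose(convolve2d_valid(image, symmetric),
                   correlate2d_valid(symmetric, image))
differ = not np.allclose(convolve2d_valid(image, asymmetric),
                         correlate2d_valid(asymmetric, image))
```

With the symmetric kernel both operations give identical results, while the asymmetric kernel produces results of opposite sign, which is why the distinction matters for kernels such as gradient filters.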
