## Manhattan Norm and Distance

The Manhattan norm (also known as the L1 norm or taxicab norm) measures the length of a vector, and the Manhattan distance (also known as city block distance) measures how far apart two points in a vector space are.

### Intuition

The Manhattan norm of a vector is the sum of the absolute values of its components. It gets its name from the grid layout of streets in Manhattan, which resembles a coordinate system. If you were to travel from one point to another in Manhattan, you would have to move along the grid lines (streets), so the distance you would travel (the Manhattan distance) is the sum of the horizontal and vertical distances.

### Computation


The Manhattan norm of a vector `x` in n-dimensional real or complex space is computed as:

||x||1 = |x1| + |x2| + ... + |xn|


The Manhattan distance between two points `x` and `y` is the Manhattan norm of the difference between the points:

d(x, y) = ||x - y||1
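As a rough illustration, the two formulas above can be sketched in plain Python. The function names `manhattan_norm` and `manhattan_distance` are chosen here for the example and do not come from any particular library.

```python
def manhattan_norm(x):
    """Sum of the absolute values of the components of x."""
    return sum(abs(xi) for xi in x)


def manhattan_distance(x, y):
    """Manhattan norm of the component-wise difference x - y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return manhattan_norm([xi - yi for xi, yi in zip(x, y)])


print(manhattan_norm([3, -4]))             # 7
print(manhattan_distance([1, 2], [4, 6]))  # |1 - 4| + |2 - 6| = 7
```

Because `abs` also works on Python complex numbers, the same sketch covers vectors with complex components.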



## Euclidean Norm and Distance

The Euclidean norm (also known as the L2 norm or 2-norm) measures the length of a vector, and the Euclidean distance measures the straight-line separation between two points in a vector space.

### Intuition

The Euclidean norm of a vector is the length of the straight line from the origin to the point the vector represents. It's the straight-line distance, or "as the crow flies" distance.

The Euclidean distance between two points is the length of the straight line between them. It's like the distance measured with a ruler between two points on a map.

### Computation

The Euclidean norm of a vector `x` in n-dimensional real or complex space is computed as:

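||x||2 = sqrt(|x1|^2 + |x2|^2 + ... + |xn|^2)

For real components the absolute values can be dropped, since `|xi|^2 = xi^2`; for complex components they are required.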


The Euclidean distance between two points `x` and `y` is the Euclidean norm of the difference between the points:

d(x, y) = ||x - y||2
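A minimal sketch of the Euclidean norm and distance, using only the standard library. The names `euclidean_norm` and `euclidean_distance` are illustrative. The code squares `abs(xi)` rather than `xi` so that complex components are handled as well as real ones.

```python
import math


def euclidean_norm(x):
    """Square root of the sum of squared absolute values of the components."""
    return math.sqrt(sum(abs(xi) ** 2 for xi in x))


def euclidean_distance(x, y):
    """Euclidean norm of the component-wise difference x - y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return euclidean_norm([xi - yi for xi, yi in zip(x, y)])


print(euclidean_norm([3, 4]))              # 5.0
print(euclidean_distance([1, 2], [4, 6]))  # sqrt(3^2 + 4^2) = 5.0
```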



## Chebyshev Distance

Chebyshev distance (also known as the maximum metric or L∞ distance) is a metric defined on a vector space where the distance between two vectors is the greatest absolute difference between them along any coordinate dimension.

### Intuition

Imagine you're moving on a grid of squares and can step to any of the 8 neighbouring squares: horizontally, vertically, or diagonally, like a king in chess. The Chebyshev distance between two squares is the minimum number of such moves needed to get from one to the other, because a diagonal step covers one unit along both axes at once. It's named after Pafnuty Chebyshev, a Russian mathematician.

### Computation

The Chebyshev distance between two points `x` and `y` in n-dimensional space is computed as:

d(x, y) = max(|x1 - y1|, |x2 - y2|, ..., |xn - yn|)
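The formula above translates almost directly into Python. This is a small sketch with an illustrative function name, `chebyshev_distance`.

```python
def chebyshev_distance(x, y):
    """Largest absolute coordinate difference between x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return max(abs(xi - yi) for xi, yi in zip(x, y))


print(chebyshev_distance([1, 2], [4, 6]))  # max(3, 4) = 4
print(chebyshev_distance([0, 0], [3, 5]))  # 5, the number of king moves on a chessboard
```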


## Minkowski Distance

Minkowski distance is a metric in a normed vector space which can be considered as a generalization of both Euclidean distance and Manhattan distance.

### Intuition

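The Minkowski distance is a single family of distances controlled by an order `p`. With a small `p`, every coordinate difference contributes to the total; as `p` grows, the largest single coordinate difference increasingly dominates. It's named after the German mathematician Hermann Minkowski.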

### Computation

The Minkowski distance of order `p` between two points `x` and `y` in n-dimensional real space is defined as:

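d(x, y) = (sum(|xi - yi|^p for i = 1 to n))^(1/p)


When `p=1`, the Minkowski distance equals the Manhattan distance. When `p=2`, it equals the Euclidean distance. As `p` tends to infinity, it converges to the Chebyshev distance. The triangle inequality holds only for `p >= 1`, so smaller orders do not give a metric.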
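The sketch below, with the illustrative name `minkowski_distance`, shows how one function covers the special cases. Rejecting `p < 1` is a choice made for this example, since the formula only yields a metric for `p` of at least 1.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between x and y (p >= 1)."""
    if p < 1:
        raise ValueError("p must be at least 1 for the result to be a metric")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)


x, y = [1, 2], [4, 6]
print(minkowski_distance(x, y, 1))   # same value as the Manhattan distance, 7
print(minkowski_distance(x, y, 2))   # same value as the Euclidean distance, 5
print(minkowski_distance(x, y, 50))  # slightly above 4, the largest coordinate difference
```

For large `p` the result approaches the largest absolute coordinate difference, which is why the last call lands just above 4.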


## Hamming Distance

Hamming distance is a metric used to measure the difference between two strings of equal length. It's named after Richard Hamming, an American mathematician and computer scientist.

### Intuition

The Hamming distance between two strings is the number of positions at which the corresponding symbols are different. It measures the minimum number of substitutions required to change one string into the other.

### Computation

The Hamming distance between two strings `s` and `t` of equal length is defined as:

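d(s, t) = sum(s[i] != t[i] for i = 1 to n)

where `n` is the common length of the strings and each comparison `s[i] != t[i]` counts as 1 when the symbols differ and 0 when they match.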
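A short sketch of the count, assuming the inputs are Python strings (any two equal-length sequences would work the same way). The name `hamming_distance` is illustrative.

```python
def hamming_distance(s, t):
    """Number of positions at which s and t hold different symbols."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires strings of equal length")
    # Each comparison is True (1) or False (0), so summing counts the mismatches.
    return sum(a != b for a, b in zip(s, t))


print(hamming_distance("karolin", "kathrin"))  # 3
print(hamming_distance("1011101", "1001001"))  # 2
```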



## Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It's often used to compare documents in text analysis.

### Intuition

The cosine similarity is the cosine of the angle between the two vectors, so it reflects orientation rather than magnitude. It ranges from -1 to 1: vectors pointing in the same direction (angle of 0 degrees) have similarity 1, orthogonal vectors (angle of 90 degrees) have similarity 0, and vectors pointing in opposite directions (angle of 180 degrees) have similarity -1.

### Computation

The cosine similarity between two vectors `a` and `b` is defined as:

cosine_similarity(a, b) = dot_product(a, b) / (||a||2 * ||b||2)


where `dot_product(a, b)` is the dot product of the vectors `a` and `b`, and `||a||2` and `||b||2` are the Euclidean lengths (L2 norm) of the vectors.
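The definition can be sketched for real-valued vectors as follows. The name `cosine_similarity` mirrors the formula above; raising an error for zero vectors is a choice made for this example, since the similarity is undefined when either norm is zero.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))   # orthogonal vectors: 0.0
print(cosine_similarity([1, 2], [2, 4]))   # same direction: approximately 1
print(cosine_similarity([1, 0], [-1, 0]))  # opposite directions: -1.0
```

Note that `[1, 2]` and `[2, 4]` differ in length but not in direction, so their similarity is (up to rounding) 1.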


## Jaccard Index

The Jaccard Index, also known as the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets.

### Intuition

The Jaccard Index measures similarity between finite sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets. It ranges from 0, when the sets share no elements, to 1, when the sets are identical.

### Computation

The Jaccard Index between two sets `A` and `B` is defined as:

J(A, B) = |A ∩ B| / |A ∪ B|


where `|A ∩ B|` is the size of the intersection of `A` and `B`, and `|A ∪ B|` is the size of the union of `A` and `B`.
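Python's built-in `set` type makes the definition easy to sketch. The name `jaccard_index` is illustrative, and raising an error when both sets are empty is a choice made for this example, since the ratio would be 0/0 in that case.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        raise ValueError("Jaccard index is undefined when both sets are empty")
    return len(a & b) / len(union)


print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 2 shared out of 4 total: 0.5
print(jaccard_index({"a", "b"}, {"c"}))     # no shared elements: 0.0
```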


## Mahalanobis Distance

The Mahalanobis distance is a measure of the distance between a point and a distribution. It's named after Prasanta Chandra Mahalanobis, an Indian scientist and statistician.

### Intuition

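Unlike Euclidean distance, Mahalanobis distance takes into account the correlations of the data set and is scale-invariant. It measures distance relative to the centroid, a reference point that is the mean of all the input points, and it rescales each direction by how much the data varies along it. When the covariance matrix is the identity matrix, it reduces to the Euclidean distance from the centroid.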

### Computation

The Mahalanobis distance of a multivariate vector `x` from a group of values with mean `μ` and covariance matrix `S` is defined as:

D(x) = sqrt((x - μ)^T * S^-1 * (x - μ))


where `^T` denotes the transpose, and `S^-1` is the inverse of the covariance matrix.
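To keep the example free of external dependencies, the sketch below is restricted to two dimensions, where the inverse of the covariance matrix can be written out explicitly. The name `mahalanobis_2d` and the 2-D restriction are assumptions made for this illustration; for higher dimensions a linear algebra library would normally handle the inverse.

```python
import math


def mahalanobis_2d(x, mu, S):
    """Mahalanobis distance of the 2-D point x from mean mu with 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular and cannot be inverted")
    # Explicit inverse of a 2x2 matrix.
    inv = [[d / det, -b / det],
           [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T * S^-1 * (x - mu).
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


# With the identity covariance the result matches the Euclidean distance.
print(mahalanobis_2d([3, 4], [0, 0], [[1, 0], [0, 1]]))  # 5.0
# A variance of 4 along the first axis means 2 units there is 1 standard deviation.
print(mahalanobis_2d([2, 0], [0, 0], [[4, 0], [0, 1]]))  # 1.0
```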


## Gram-Schmidt Process

The Gram-Schmidt process is a method for orthonormalizing a set of vectors in an inner product space, most commonly the Euclidean space R^n. It's named after Jørgen Pedersen Gram and Erhard Schmidt, two mathematicians who independently published this method.

### Intuition

The Gram-Schmidt process takes a finite, linearly independent set of vectors and generates an orthogonal or orthonormal (if the vectors are normalized) set of vectors that spans the same subspace as the original set.

### Computation

The Gram-Schmidt process is computed as follows:

1. Start with a non-zero vector `v1`, normalize it to get the first vector `u1` in the orthonormal basis.
2. For each subsequent vector `vi`, subtract the projection of `vi` onto all the previously computed vectors `u1, ..., ui-1`, and normalize the result to get `ui`.

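With `u1, ..., ui-1` already orthonormal, each step first removes the projections and then normalizes:

wi = vi - sum((vi * uj) uj for j = 1 to i-1)

ui = wi / ||wi||2

where `*` denotes the dot product, `(vi * uj) uj` is the vector `uj` scaled by that dot product, and `||wi||2` is the Euclidean norm of `wi`.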
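The two steps of the process can be sketched in plain Python for real-valued vectors. The names `dot` and `gram_schmidt` and the tolerance `tol` are assumptions made for this example. The code follows the classical form of the procedure, projecting the original vector onto each earlier basis vector; it is meant as an illustration rather than a numerically robust implementation.

```python
import math


def dot(a, b):
    """Dot product of two real vectors of equal length."""
    return sum(ai * bi for ai, bi in zip(a, b))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis spanning the same subspace as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract the projection of v onto every previously computed u.
        for u in basis:
            coeff = dot(v, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm < tol:
            raise ValueError("input vectors are not linearly independent")
        # Normalize the remainder to unit length.
        basis.append([wi / norm for wi in w])
    return basis


u1, u2 = gram_schmidt([[3, 1], [2, 2]])
print(dot(u1, u1), dot(u2, u2))  # both approximately 1 (unit length)
print(dot(u1, u2))               # approximately 0 (orthogonal)
```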
