## From the article: https://dataaspirant.com/five-most-popular-similarity-measures-implementation-in-python/

### Similarity:

* A similarity measure quantifies how alike two data objects are. It is usually expressed through a distance: a small distance indicates a high degree of similarity, and a large distance indicates a low degree of similarity.

* Special care should be taken when calculating distance across **dimensions/features** that are unrelated. The relative values of each element must be normalized, or one feature could end up dominating the distance calculation.

* Generally, **similarity is measured in the range 0 to 1, written [0, 1]**. In machine learning, a value in this range is called the similarity score.
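
The normalization point above can be illustrated with a small sketch. It assumes min-max scaling, which is only one of several possible schemes; the helper `min_max_normalize` and the sample records are hypothetical and not part of the original article.

```python
from math import sqrt

def euclidean(x, y):
    # Plain Euclidean distance between two equal-length sequences
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def min_max_normalize(rows):
    """Rescale every column of rows to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi != lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical records: (age in years, income in dollars)
people = [[25, 50000], [60, 50500], [27, 52000], [40, 90000]]

# Raw distances from record 0: the income column dominates
raw_01 = euclidean(people[0], people[1])
raw_02 = euclidean(people[0], people[2])

# Distances after each feature is rescaled to [0, 1]
scaled = min_max_normalize(people)
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])

print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

On the raw values, record 1 (35 years older, similar income) looks closer to record 0 than record 2 does, purely because income is measured on a much larger scale. After scaling, the ordering flips and record 2 is the nearer neighbor.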

### 1. Euclidean Distance:

* The Euclidean distance between two points is the length of the straight line segment connecting them. It follows directly from the Pythagorean theorem.
* When data is dense or continuous, it is a common default proximity measure.

In [4]:
from math import sqrt

def euclidean_distance(x, y):
    # Square root of the sum of squared coordinate differences
    return sqrt(sum(pow(a - b, 2) for a, b in zip(x, y)))

euclidean_distance([0,3,4,5],[7,6,3,-1])

9.746794344808963
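
As a cross-check on the result above, the same value can be obtained from the standard library. This is a minimal sketch that assumes Python 3.8 or later, where `math.dist` is available; the variable names are illustrative.

```python
import math

p = [0, 3, 4, 5]
q = [7, 6, 3, -1]

# Manual computation: square root of the sum of squared differences
manual = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Standard-library equivalent (Python 3.8+)
builtin = math.dist(p, q)

print(manual, builtin)
```

Both values should agree up to floating-point rounding, so `math.dist` is a reasonable replacement for the hand-written function when the Python version allows it.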

### 2. Manhattan Distance:

* Manhattan distance is a metric in which the distance between two points is the sum of the **absolute differences** of their Cartesian coordinates. Put simply, in two dimensions it is the absolute difference between the x-coordinates plus the absolute difference between the y-coordinates.
* Manhattan distance = |x1 - x2| + |y1 - y2|

In [5]:
def manhattan_distance(x, y):
    # Sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

manhattan_distance([10,20,10],[10,20,20])

10

### 3. Minkowski Distance:

* The Minkowski distance is a generalization of the Euclidean and Manhattan distances.

$$d^{MKD}_{i,j} = \left( \sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right|^{\lambda} \right)^{1/\lambda}$$

* In the equation, d^MKD is the Minkowski distance between data records i and j, k is the index of a variable, n is the total number of variables y, and λ is the order of the Minkowski metric. Although it is defined for any λ > 0, it is rarely used for values other than 1, 2, and ∞.




* Different names for the Minkowski distance or Minkowski metric arise from the order:

1. λ = 1 is the Manhattan distance. Synonyms are L1-Norm, Taxicab, or City-Block distance. For two vectors of ranked ordinal variables, the Manhattan distance is sometimes called the footrule distance.
2. λ = 2 is the Euclidean distance. Synonyms are L2-Norm or Ruler distance. For two vectors of ranked ordinal variables, the Euclidean distance is sometimes called the Spearman distance.
3. λ = ∞ is the Chebyshev distance. Synonyms are Lmax-Norm or Chessboard distance.

In [7]:
from decimal import Decimal

def nth_root(value, n_root):
    # Raise value to the power 1/n_root and round to 3 decimal places
    root_value = 1 / float(n_root)
    return round(Decimal(value) ** Decimal(root_value), 3)

def minkowski_distance(x, y, p_value):
    # p_value is the order λ of the Minkowski metric
    return nth_root(sum(pow(abs(a - b), p_value) for a, b in zip(x, y)), p_value)

minkowski_distance([0,3,4,5],[7,6,3,-1],3)

Decimal('8.373')
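
The special cases of the order listed earlier in this section can be checked numerically. The sketch below uses plain floats instead of `Decimal` and introduces two illustrative helpers, `minkowski` and `chebyshev`, that are not part of the original article.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p (float arithmetic, no rounding)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def chebyshev(x, y):
    """Largest coordinate difference: the limit of minkowski() as p grows."""
    return max(abs(a - b) for a, b in zip(x, y))

x = [0, 3, 4, 5]
y = [7, 6, 3, -1]

d1 = minkowski(x, y, 1)     # order 1: Manhattan distance, 7 + 3 + 1 + 6
d2 = minkowski(x, y, 2)     # order 2: Euclidean distance, sqrt(95)
d50 = minkowski(x, y, 50)   # large order: already close to the Chebyshev value
d_inf = chebyshev(x, y)     # order ∞: largest absolute difference

print(d1, d2, d50, d_inf)
```

As the order grows, the largest coordinate difference increasingly dominates the sum, which is why the value for a large order approaches the Chebyshev distance.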

### 4. Cosine Similarity:

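$$\cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$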

* The cosine similarity metric is the normalized dot product of two vectors. Computing it amounts to finding the cosine of the angle between the two objects. The cosine of 0° is 1, and it is less than 1 for any other angle.

* It is thus a judgment of orientation and not magnitude. Two vectors with the same orientation have a cosine similarity of 1, and two vectors at 90° have a similarity of 0. Two diametrically opposed vectors have a similarity of -1, independent of their magnitude.

* Cosine similarity is used especially in positive space, where the outcome is neatly bounded in [0, 1]. One reason for its popularity is that it is very efficient to evaluate, especially for sparse vectors.

In [9]:
from math import sqrt

def square_rooted(x):
    # Euclidean norm of x, rounded to 3 decimal places
    return round(sqrt(sum(a * a for a in x)), 3)

def cosine_similarity(x, y):
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = square_rooted(x) * square_rooted(y)
    return round(numerator / float(denominator), 3)

cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15])

0.972
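
The orientation-only behavior described in this section can be checked with a few hand-picked vectors. This is a minimal sketch: the helper `cosine` is an illustrative variant without intermediate rounding, and the test vectors are made up for the example.

```python
from math import sqrt

def cosine(x, y):
    """Cosine of the angle between x and y (no intermediate rounding)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

v = [1, 2, 3]

same_direction = cosine(v, [10, 20, 30])   # scaled copy of v
orthogonal = cosine([1, 0], [0, 1])        # vectors at 90 degrees
opposite = cosine(v, [-1, -2, -3])         # v pointing the other way

print(same_direction, orthogonal, opposite)
```

The three results should be approximately 1, exactly 0, and approximately -1, regardless of how long the vectors are. Note that this sketch raises `ZeroDivisionError` for an all-zero vector, for which the angle is undefined.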