### Euclidean Distance
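Euclidean distance is the most commonly used distance measure. It is usually a good default proximity measure when the data is dense or continuous. The distance between two points is the length of the straight line joining them, which follows from the Pythagorean theorem: the square root of the sum of the squared differences between corresponding coordinates.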

In [5]:
from math import sqrt

def euclidean_distance(x, y):
    # Square root of the sum of squared coordinate differences.
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print("euclidean proximity measure for rattlesnake and boa constrictor: " + str(euclidean_distance([1,1,1,1,0], [0,1,0,1,0])))
print("euclidean proximity measure for rattlesnake and dart frog: " + str(euclidean_distance([1,1,1,1,0], [1,0,1,0,4])))

euclidean proximity measure for rattlesnake and boa constrictor: 1.4142135623730951
euclidean proximity measure for rattlesnake and dart frog: 4.242640687119285
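
As a cross-check on the hand-written function, the standard library offers `math.dist`, which computes the same Euclidean distance between two points of equal dimension. This is a minimal sketch that assumes Python 3.8 or newer (the version is an assumption; the notebook does not state one), and the variable names are introduced here only for illustration.

```python
import math

# The same feature vectors used in the cell above.
rattlesnake = [1, 1, 1, 1, 0]
boa_constrictor = [0, 1, 0, 1, 0]
dart_frog = [1, 0, 1, 0, 4]

# math.dist(p, q) returns the Euclidean distance between p and q.
print(math.dist(rattlesnake, boa_constrictor))  # square root of 2
print(math.dist(rattlesnake, dart_frog))        # square root of 18
```

Both values should agree with `euclidean_distance` up to floating-point rounding.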


### Manhattan Distance
Manhattan distance is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates. In two dimensions, it is |x1 - x2| + |y1 - y2|: the total distance travelled when moving only along the axes.

In [9]:
def manhattan_distance(x, y):
    # Sum of absolute coordinate differences; no math import is needed.
    return sum(abs(a - b) for a, b in zip(x, y))

print("manhattan distance measure for rattlesnake and boa constrictor: " + str(manhattan_distance([1,1,1,1,0], [0,1,0,1,0])))
print("manhattan distance measure for rattlesnake and dart frog: " + str(manhattan_distance([1,1,1,1,0], [1,0,1,0,4])))

manhattan distance measure for rattlesnake and boa constrictor: 2
manhattan distance measure for rattlesnake and dart frog: 6


### Minkowski Distance
Minkowski distance is a generalization of the Euclidean and Manhattan distances, parameterized by an order p. It is a metric for p >= 1.

Special cases of Minkowski distance:
* p = 1 is the Manhattan distance. Synonyms are L1 norm, taxicab, and city-block distance.
* p = 2 is the Euclidean distance. Synonyms are L2 norm and ruler distance.
* p = Infinity (the limit as p grows) is the Chebyshev distance. Synonyms are Lmax norm and chessboard distance.
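
The special cases above all come from one definition. Written out, with the symbols n (number of coordinates) and x_i, y_i (the i-th coordinates of each point) chosen here for illustration:

$$
D_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p},
\qquad
\lim_{p \to \infty} D_p(x, y) = \max_{1 \le i \le n} |x_i - y_i|
$$

Setting p = 1 removes the root and leaves the sum of absolute differences, and setting p = 2 gives the square root of the sum of squared differences.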

In [11]:
def minkowski_distance(x, y, p):
    # p-th root of the sum of absolute coordinate differences raised to the power p.
    return pow(sum(pow(abs(a - b), p) for a, b in zip(x, y)), 1 / float(p))

# minkowski distance with p=1 should be equal to manhattan distance
print("minkowski distance measure with p=1 for rattlesnake and boa constrictor: " + str(minkowski_distance([1,1,1,1,0], [0,1,0,1,0], 1)))
print("minkowski distance measure with p=1 for rattlesnake and dart frog: " + str(minkowski_distance([1,1,1,1,0], [1,0,1,0,4], 1)))

# minkowski distance with p=2 should be equal to euclidean distance
print("minkowski distance measure with p=2 for rattlesnake and boa constrictor: " + str(minkowski_distance([1,1,1,1,0], [0,1,0,1,0], 2)))
print("minkowski distance measure with p=2 for rattlesnake and dart frog: " + str(minkowski_distance([1,1,1,1,0], [1,0,1,0,4], 2)))

minkowski distance measure with p=1 for rattlesnake and boa constrictor: 2.0
minkowski distance measure with p=1 for rattlesnake and dart frog: 6.0
minkowski distance measure with p=2 for rattlesnake and boa constrictor: 1.4142135623730951
minkowski distance measure with p=2 for rattlesnake and dart frog: 4.242640687119285
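
The p = Infinity case is a limit, so it is easier to compute directly than through the p-th root formula. The sketch below is one possible way to do that: `chebyshev_distance` is a name introduced here, and the Minkowski helper is repeated only so the block runs on its own. It also compares the result against a large finite p (50, an arbitrary choice) to illustrate the convergence.

```python
def minkowski_distance(x, y, p):
    # Same formula as the cell above, repeated so this block is self-contained.
    return pow(sum(pow(abs(a - b), p) for a, b in zip(x, y)), 1 / float(p))

def chebyshev_distance(x, y):
    # Largest absolute coordinate difference: the p -> infinity limit.
    return max(abs(a - b) for a, b in zip(x, y))

rattlesnake = [1, 1, 1, 1, 0]
boa_constrictor = [0, 1, 0, 1, 0]
dart_frog = [1, 0, 1, 0, 4]

print(chebyshev_distance(rattlesnake, boa_constrictor))  # 1
print(chebyshev_distance(rattlesnake, dart_frog))        # 4

# With a large p, the Minkowski distance is already very close to the maximum.
print(minkowski_distance(rattlesnake, dart_frog, 50))
```

The largest single difference dominates the sum as p grows, which is why the dart frog result approaches 4.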


### Cosine Similarity
Cosine similarity is the normalized dot product of two vectors, which equals the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle, so vectors pointing in the same direction score 1 regardless of their lengths. It is thus a judgment of orientation and not magnitude.

Cosine similarity is very efficient to evaluate, especially for sparse vectors, because only the dimensions where both vectors are non-zero contribute to the dot product.
It is calculated by dividing the dot product of the two vectors by the product of their lengths.

* Cosine similarity: http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/
* Dot product: https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/dot-cross-products/v/dot-and-cross-product-comparison-intuition
* Length of vectors: https://chortle.ccsu.edu/VectorLessons/vch04/vch04_8.html


In [19]:
from math import sqrt

def vector_length(v):
    # Euclidean (L2) length of the vector.
    return sqrt(sum(elt * elt for elt in v))

def dot_product(x, y):
    return sum(a * b for a, b in zip(x, y))

def cosine_similarity(x, y):
    # Dot product divided by the product of the two lengths.
    return dot_product(x, y) / (vector_length(x) * vector_length(y))

print("cosine similarity measure for rattlesnake and boa constrictor: " + str(cosine_similarity([1,1,1,1,0], [0,1,0,1,0])))
print("cosine similarity measure for rattlesnake and dart frog: " + str(cosine_similarity([1,1,1,1,0], [1,0,1,0,4])))

cosine similarity measure for rattlesnake and boa constrictor: 0.7071067811865475
cosine similarity measure for rattlesnake and dart frog: 0.23570226039551587
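
To illustrate the point that cosine similarity judges orientation rather than magnitude, the sketch below scales one vector by a constant and checks that the similarity does not change. The scale factor of 10 and the variable names are arbitrary choices made for this example, and the function is restated compactly so the block runs on its own.

```python
from math import isclose, sqrt

def cosine_similarity(x, y):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(a * b for a, b in zip(x, y))
    length_x = sqrt(sum(a * a for a in x))
    length_y = sqrt(sum(b * b for b in y))
    return dot / (length_x * length_y)

rattlesnake = [1, 1, 1, 1, 0]
boa_constrictor = [0, 1, 0, 1, 0]

# Same direction as rattlesnake, ten times the length.
scaled = [10 * value for value in rattlesnake]

# A vector compared with a scaled copy of itself has similarity 1.
print(isclose(cosine_similarity(rattlesnake, scaled), 1.0))  # True

# Scaling one vector leaves its similarity to any other vector unchanged.
print(isclose(cosine_similarity(rattlesnake, boa_constrictor),
              cosine_similarity(scaled, boa_constrictor)))  # True
```

By contrast, the Euclidean distance between `rattlesnake` and `scaled` is large, since it depends on magnitude.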


### Jaccard Similarity
Jaccard similarity measures the similarity between finite sample sets and is defined as the cardinality of the intersection of sets divided by the cardinality of the union of the sample sets.

In [20]:
def jaccard_similarity(x, y):
    # Set intersection uses &, set union uses |.
    intersection_cardinality = len(set(x) & set(y))
    union_cardinality = len(set(x) | set(y))

    return intersection_cardinality / union_cardinality

print("jaccard similarity measure for rattlesnake and boa constrictor: " + str(jaccard_similarity([1,1,1,1,0], [0,1,0,1,0])))
print("jaccard similarity measure for rattlesnake and dart frog: " + str(jaccard_similarity([1,1,1,1,0], [1,0,1,0,4])))

jaccard similarity measure for rattlesnake and boa constrictor: 1.0
jaccard similarity measure for rattlesnake and dart frog: 0.6666666666666666
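
Note that `set(x)` keeps only the distinct values in a list, so `[1,1,1,1,0]` and `[0,1,0,1,0]` both become `{0, 1}`. If each position is instead read as a feature that is present when its value is non-zero (an assumption made here, not something the notebook states), the sets being compared are the positions of the present features. A minimal sketch under that assumption, with `jaccard_similarity_binary` as a hypothetical helper name:

```python
def jaccard_similarity_binary(x, y):
    # Hypothetical helper: each position is treated as a feature and any
    # non-zero value counts as "feature present" (an assumption).
    present_x = {i for i, value in enumerate(x) if value}
    present_y = {i for i, value in enumerate(y) if value}
    union = present_x | present_y
    if not union:
        return 1.0  # convention chosen here when neither vector has any feature
    return len(present_x & present_y) / len(union)

rattlesnake = [1, 1, 1, 1, 0]
boa_constrictor = [0, 1, 0, 1, 0]
dart_frog = [1, 0, 1, 0, 4]

print(jaccard_similarity_binary(rattlesnake, boa_constrictor))  # 0.5
print(jaccard_similarity_binary(rattlesnake, dart_frog))        # 0.4
```

Under this reading the result always lies between 0 and 1, and the two snakes no longer appear identical.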
