## Continuous distance measures

1. Euclidean distance $$d(x^{[a]}, x^{[b]}) = \sqrt{\sum_{j=1}^{m}   \left(x_j^{[a]}-x_j^{[b]}\right)^2    }$$


2. Manhattan distance $$d(x^{[a]}, x^{[b]}) = \sum_{j=1}^{m}  |x_j^{[a]}-x_j^{[b]}|$$

Minkowski distance (generalizes the two measures above): $$d(x^{[a]}, x^{[b]}) = \left[   \sum_{j=1}^{m}  \left|x_j^{[a]}-x_j^{[b]}\right|^p    \right]^{\frac{1}{p}}$$

- $p=1 \rightarrow$ $\texttt{Manhattan distance}$
- $p=2 \rightarrow$ $\texttt{Euclidean distance}$
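
The two special cases can be checked numerically. The following is a minimal sketch, assuming equal-length numeric inputs and $p \geq 1$; `minkowski_distance` is an illustrative helper name and is not part of the original notebook.

```python
import numpy as np


def minkowski_distance(a, b, p):
    # Hypothetical helper: sum |a_j - b_j|^p over all features, then take the p-th root.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)


a = [2, 3, 4]
b = [12, 22, 34]

print(minkowski_distance(a, b, p=1))  # Manhattan case: 10 + 19 + 30 = 59
print(minkowski_distance(a, b, p=2))  # Euclidean case: sqrt(100 + 361 + 900) = sqrt(1361)
```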


3. Cosine similarity
$$\cos \left(\theta\right)=\frac{a \cdot b}{\|a\| \, \|b\|}$$


## Discrete distance measures
1. Hamming distance $$d(x^{[a]}, x^{[b]}) = \sum_{j=1}^{m}  |x_j^{[a]}-x_j^{[b]}|, \quad \text{where } x_j \in \{0, 1\}$$
2. Jaccard/Tanimoto similarity $$J(A, B)=\frac{ |A\cap{B}| }{ |A\cup{B}| }=\frac{  |A\cap{B}|  }{|A| + |B| - |A\cap{B}|}$$
Dice: (Independent event) $$D(A, B)=\frac{ 2|A\cap{B}| }{|A| + |B|}$$
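
The Dice formula can be sketched in the same way as the Jaccard similarity. The helper names `dice_coefficient` and `jaccard_similarity` below are illustrative assumptions, not part of the original notebook, and the sketch assumes that at least one input is non-empty. The two measures are related by $D = \frac{2J}{1+J}$, which the last line checks on a small example.

```python
def jaccard_similarity(a, b):
    # Hypothetical helper: |A ∩ B| / |A ∪ B| on Python sets.
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


def dice_coefficient(a, b):
    # Hypothetical helper: 2 * |A ∩ B| / (|A| + |B|) on Python sets.
    set_a, set_b = set(a), set(b)
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


A = [1, 2, 3, 4, 5, 6]
B = [5, 6, 7, 8, 9, 10]

j = jaccard_similarity(A, B)  # 2 shared elements out of 10 distinct ones
d = dice_coefficient(A, B)    # 2 * 2 / (6 + 6)

# Dice can be recovered from Jaccard via D = 2J / (1 + J).
print(abs(d - 2 * j / (1 + j)) < 1e-12)
```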

# Implementation with NumPy

In [156]:
import numpy as np


################################
### Continuous distance measures
################################

def euclidean_distance(a, b):
    distance = 0
    for ele_i, ele_j in zip(a, b):
        distance += np.power(ele_i - ele_j, 2)
    return np.sqrt(distance)


def manhattan_distance(a, b):
    distance = 0
    for ele_i, ele_j in zip(a, b):
        distance += abs(ele_i - ele_j)
    return distance



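def cosine_similarity(a, b):
    # Cosine of the angle between a and b: dot product over the product of the norms.
    magnitude_a = np.sqrt(np.sum(np.power(a, 2)))
    magnitude_b = np.sqrt(np.sum(np.power(b, 2)))
    numerator = np.dot(a, b)  # no transpose needed for 1-D inputs
    denominator = magnitude_a * magnitude_b
    similarity = numerator / denominator
    return similarity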




##############################
### Discrete distance measures
##############################

def hamming_distance(a, b):
    # Assumes binary (0/1) entries, so each mismatch contributes exactly 1.
    distance = 0
    for ele_i, ele_j in zip(a, b):
        distance += abs(ele_i - ele_j)
    return distance


def jaccard_distance(a, b):
    set_a, set_b = set(a), set(b)
    numerator = len(set_a & set_b)
    # Use set sizes so duplicate elements in a or b are not double-counted.
    denominator = len(set_a) + len(set_b) - numerator
    jaccard_similarity = numerator / denominator
    return 1 - jaccard_similarity

In [161]:
a = np.array([2, 3, 4])
b = np.array([12, 22, 34])

In [162]:
euclidean_distance(a, b)

36.89173349139343

In [163]:
manhattan_distance(a, b)

59

In [164]:
cosine_similarity(a, b)

0.99360098914121

In [170]:
# A Hamming distance of 0 means a and b are identical; each differing position adds 1.
hamming_distance([1, 1, 1, 1], 
                 [1, 0, 0, 0])

3

In [172]:
jaccard_distance([1, 2, 3, 4, 5, 6],
                [5, 6, 7, 8, 9, 10])

0.8
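
The loop-based continuous measures can also be written without explicit Python loops. The sketch below is one possible vectorized formulation using `np.linalg.norm`; the `*_vec` function names are illustrative and not part of the original notebook, and the inputs are assumed to be equal-length 1-D numeric sequences.

```python
import numpy as np


def euclidean_distance_vec(a, b):
    # L2 norm of the difference vector.
    return np.linalg.norm(np.asarray(a) - np.asarray(b))


def manhattan_distance_vec(a, b):
    # Sum of absolute coordinate differences.
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)))


def cosine_similarity_vec(a, b):
    # Dot product divided by the product of the two L2 norms.
    a, b = np.asarray(a), np.asarray(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


a = np.array([2, 3, 4])
b = np.array([12, 22, 34])

print(euclidean_distance_vec(a, b))
print(manhattan_distance_vec(a, b))
print(cosine_similarity_vec(a, b))
```

On the example vectors these should agree with the loop-based results shown earlier, up to floating-point rounding.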