### Distance!

The Math, for two points $x$ and $y$ with $k$ coordinates each:

$\sqrt{\sum\limits_{i=1}^k(x_i - y_i)^2}$

Made a little less stuffy with a 2D example, for the points $(x_1, y_1)$ and $(x_2, y_2)$:

$\text{Euclidean distance}=\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}$

In a picture:

<img src="images/triangle.png" width=300 align="left">

### Python:

In [1]:
alice = (6.5, 2.5)
bob = (8, 2)
carol = (1, 10)

In [2]:
((alice[0] - bob[0])**2 + (alice[1] - bob[1])**2) ** (1/2)

1.5811388300841898

In [3]:
((alice[0] - carol[0])**2 + (alice[1] - carol[1])**2) ** (1/2)

9.300537618869138

In [4]:
# three dimensions?!
alice = (6.5, 2.5, 4)
eve = (8, 2, 5)

In [5]:
((alice[0] - eve[0])**2 + (alice[1] - eve[1])**2 + (alice[2] - eve[2])**2) ** (1/2)

1.8708286933869707
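Writing out one squared term per coordinate gets tedious as dimensions are added. As a minimal sketch, the same calculation can be wrapped in a small function that works for any number of coordinates. The name `euclidean` is made up for this illustration; the standard library's `math.dist` (Python 3.8+) computes the same quantity.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


alice = (6.5, 2.5, 4)
eve = (8, 2, 5)

d = euclidean(alice, eve)
print(d)                      # same value as the hand-written 3D expression
print(math.dist(alice, eve))  # built-in equivalent
```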

A "Real" Example:

In [6]:
import numpy as np
import pandas as pd

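df = pd.read_csv('data/alcohol.csv')

# The first CSV column has no header, so pandas labels it 'Unnamed: 0'; use it as the index.
df = df.rename(columns={'Unnamed: 0': 'name'}).set_index('name')
# Missing ratings become 0, so an unrated drink counts the same as a rating of 0.
df = df.fillna(0)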

In [7]:
df

name,Old Fashion,Manhattan,Negroni,Margarita,Martini,Whiskey Sour,Beer,Water,Soda Water,Screw Driver
Finn,2,5,0.0,1,5,4.0,5.0,5.0,2.0,0.0
Marina,2,2,0.0,2,4,0.0,4.0,5.0,0.0,1.0
Aman,5,2,3.0,3,5,4.0,3.0,5.0,3.0,0.0
Anita,5,3,3.0,1,5,3.0,1.0,5.0,0.0,5.0
Jordan,4,3,0.0,4,4,4.0,4.0,4.0,1.0,4.0
Ashley,2,2,1.0,3,2,5.0,5.0,5.0,0.0,2.0
Michael,2,2,3.0,3,2,1.0,0.0,5.0,5.0,0.0
Rittik,5,5,4.0,3,4,2.0,4.0,5.0,4.0,0.0
Brandon,5,5,5.0,5,5,5.0,1.0,0.0,0.0,4.0
Max,4,3,1.0,4,2,0.0,4.0,1.0,5.0,4.0


In [8]:
from sklearn.metrics.pairwise import euclidean_distances

In [9]:
np.round(euclidean_distances(df, df), 2)

array([[ 0.  ,  5.74,  6.  ,  8.25,  6.08,  5.66,  8.6 ,  6.24, 10.58,
         9.22],
       [ 5.74,  0.  ,  6.86,  7.42,  6.  ,  5.74,  7.55,  7.48, 11.  ,
         8.  ],
       [ 6.  ,  6.86,  0.  ,  6.63,  5.92,  6.32,  6.32,  4.12,  8.49,
         8.31],
       [ 8.25,  7.42,  6.63,  0.  ,  5.74,  7.48,  8.83,  7.81,  7.35,
         9.11],
       [ 6.08,  6.  ,  5.92,  5.74,  0.  ,  4.36,  8.77,  7.21,  7.68,
         6.78],
       [ 5.66,  5.74,  6.32,  7.48,  4.36,  0.  ,  8.6 ,  7.81,  9.59,
         8.77],
       [ 8.6 ,  7.55,  6.32,  8.83,  8.77,  8.6 ,  0.  ,  6.4 , 10.86,
         7.68],
       [ 6.24,  7.48,  4.12,  7.81,  7.21,  7.81,  6.4 ,  0.  ,  9.  ,
         7.48],
       [10.58, 11.  ,  8.49,  7.35,  7.68,  9.59, 10.86,  9.  ,  0.  ,
         9.54],
       [ 9.22,  8.  ,  8.31,  9.11,  6.78,  8.77,  7.68,  7.48,  9.54,
         0.  ]])
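The matrix above is symmetric with zeros on the diagonal: entry `[i, j]` is the distance between person `i` and person `j`, and everyone is at distance 0 from themselves. To make that concrete, here is a rough NumPy sketch that builds the same kind of matrix with broadcasting. The function name `pairwise_euclidean` is hypothetical, the three points are alice, bob, and carol from earlier, and this is only meant to show the idea, not how scikit-learn implements `euclidean_distances` internally.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return an (n, n) matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    # Broadcasting: (n, 1, k) - (1, n, k) gives every row-to-row difference, shape (n, n, k).
    diff = X[:, None, :] - X[None, :, :]
    # Square, sum over the coordinate axis, and take the square root.
    return np.sqrt((diff ** 2).sum(axis=-1))


points = np.array([
    [6.5, 2.5],  # alice
    [8.0, 2.0],  # bob
    [1.0, 10.0], # carol
])

D = pairwise_euclidean(points)
print(np.round(D, 2))
```

The broadcasted intermediate array has `n * n * k` entries, so this approach is best kept for small inputs like the ones in this notebook.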

In [10]:
similar = pd.DataFrame(
    np.round(euclidean_distances(df, df), 2),
    columns=df.index, 
    index=df.index
)

In [11]:
similar

name,Finn,Marina,Aman,Anita,Jordan,Ashley,Michael,Rittik,Brandon,Max
Finn,0.0,5.74,6.0,8.25,6.08,5.66,8.6,6.24,10.58,9.22
Marina,5.74,0.0,6.86,7.42,6.0,5.74,7.55,7.48,11.0,8.0
Aman,6.0,6.86,0.0,6.63,5.92,6.32,6.32,4.12,8.49,8.31
Anita,8.25,7.42,6.63,0.0,5.74,7.48,8.83,7.81,7.35,9.11
Jordan,6.08,6.0,5.92,5.74,0.0,4.36,8.77,7.21,7.68,6.78
Ashley,5.66,5.74,6.32,7.48,4.36,0.0,8.6,7.81,9.59,8.77
Michael,8.6,7.55,6.32,8.83,8.77,8.6,0.0,6.4,10.86,7.68
Rittik,6.24,7.48,4.12,7.81,7.21,7.81,6.4,0.0,9.0,7.48
Brandon,10.58,11.0,8.49,7.35,7.68,9.59,10.86,9.0,0.0,9.54
Max,9.22,8.0,8.31,9.11,6.78,8.77,7.68,7.48,9.54,0.0


In [12]:
# Sort nearest first, then skip the first entry: Max's distance to himself (0.0).
similar.loc['Max'].sort_values(ascending=True).iloc[1:]

name
Jordan     6.78
Rittik     7.48
Michael    7.68
Marina     8.00
Aman       8.31
Ashley     8.77
Anita      9.11
Finn       9.22
Brandon    9.54
Name: Max, dtype: float64
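Skipping the first sorted entry works here because the only zero distance is Max's distance to himself. If two people had identical ratings, there would be two zeros and the slice might drop the wrong one. A slightly more defensive sketch drops the person by name before sorting. The helper `nearest` is a made-up name, and the small `ratings` frame below holds four rows copied from the table above so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Four rows copied from the ratings table; columns are the ten drinks in order.
ratings = pd.DataFrame(
    [
        [4, 3, 1, 4, 2, 0, 4, 1, 5, 4],  # Max
        [4, 3, 0, 4, 4, 4, 4, 4, 1, 4],  # Jordan
        [5, 5, 4, 3, 4, 2, 4, 5, 4, 0],  # Rittik
        [5, 5, 5, 5, 5, 5, 1, 0, 0, 4],  # Brandon
    ],
    index=["Max", "Jordan", "Rittik", "Brandon"],
    dtype=float,
)


def nearest(ratings, name, n=3):
    """Return the n people closest to `name`, nearest first."""
    # Remove the person themselves by label instead of relying on sort order.
    others = ratings.drop(index=name)
    # Row-wise Euclidean distance from each remaining person to `name`.
    distances = np.sqrt(((others - ratings.loc[name]) ** 2).sum(axis=1))
    return distances.sort_values().head(n)


closest = nearest(ratings, "Max")
print(closest.round(2))
```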

In [13]:
list(zip(df.loc['Jordan'], df.loc['Max']))

[(4.0, 4.0),
 (3.0, 3.0),
 (0.0, 1.0),
 (4.0, 4.0),
 (4.0, 2.0),
 (4.0, 0.0),
 (4.0, 4.0),
 (4.0, 1.0),
 (1.0, 5.0),
 (4.0, 4.0)]

In [14]:
df.loc['Jordan'].index[5]

'Whiskey Sour'
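Reading the pairs by eye shows where Jordan and Max disagree, but the distance is built from squared differences, so it can help to rank the drinks by how much each one contributes. The following is a small sketch with the two rows copied from the table above; the variable names (`jordan`, `max_ratings`, `contributions`) are chosen just for this example.

```python
import pandas as pd

drinks = ["Old Fashion", "Manhattan", "Negroni", "Margarita", "Martini",
          "Whiskey Sour", "Beer", "Water", "Soda Water", "Screw Driver"]

# Ratings copied from the Jordan and Max rows of the table.
jordan = pd.Series([4, 3, 0, 4, 4, 4, 4, 4, 1, 4], index=drinks, dtype=float)
max_ratings = pd.Series([4, 3, 1, 4, 2, 0, 4, 1, 5, 4], index=drinks, dtype=float)

# Each drink's share of the squared distance, largest first.
contributions = ((jordan - max_ratings) ** 2).sort_values(ascending=False)

# The square root of the total recovers the Euclidean distance.
distance = contributions.sum() ** 0.5

print(contributions[contributions > 0])
print(round(distance, 2))
```

Whiskey Sour and Soda Water each differ by 4 points, so together they account for 32 of the 46 units of squared distance between the two.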