## Euclidean Distance

We are going to work with the Euclidean distance function.

### 1. Import libraries

In [18]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### 2. Create our data frame

For the data frame we will use a small, hand-entered sample of NFL players: name, height, weight, and position.

In [19]:
# Create the variables with the data
players = ['DK Metcalf', 'Fletcher Cox', 'Tyreek Hill', 'Bobby Wagner', 'Derrick Henry', 'Keisean Nixon']
height = [6.4, 6.4, 5.10, 6.0, 6.3, 5.10]
weight = [235, 310, 191, 242, 247, 200]
position = ['WR', 'DT', 'WR', 'MLB', 'RB', 'CB']

mydf = {
    'Player Name': players,
    'Height': height,
    'Weight': weight,
    'Player Position': position
}

df = pd.DataFrame(mydf)
df

|   | Player Name | Height | Weight | Player Position |
|---|---|---|---|---|
| 0 | DK Metcalf | 6.4 | 235 | WR |
| 1 | Fletcher Cox | 6.4 | 310 | DT |
| 2 | Tyreek Hill | 5.1 | 191 | WR |
| 3 | Bobby Wagner | 6.0 | 242 | MLB |
| 4 | Derrick Henry | 6.3 | 247 | RB |
| 5 | Keisean Nixon | 5.1 | 200 | CB |
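One thing to watch in the table above: the Height values look like feet and inches written as a decimal (6.4 for 6 ft 4 in). If that reading is right, a float cannot distinguish 5 ft 10 in from 5 ft 1 in, because `5.10` is stored as `5.1`. The sketch below shows one possible workaround, assuming heights are entered as strings such as `"5'10"`. The helper name `feet_inches_to_inches` and the sample strings are hypothetical, not part of the notebook.

```python
# A float drops the trailing zero, so "5.10" and "5.1" are the same number.
print(5.10 == 5.1)  # True


def feet_inches_to_inches(text):
    """Convert a height written as feet'inches (e.g. "5'10") to total inches."""
    feet, inches = text.split("'")
    return int(feet) * 12 + int(inches)


# Hypothetical string inputs, assuming 6.4 means 6 ft 4 in and 5.10 means 5 ft 10 in
samples = ["6'4", "5'10", "5'11"]
inches = [feet_inches_to_inches(s) for s in samples]
print(inches)  # [76, 70, 71]
```

Storing height in a single unit keeps the column numeric and makes differences between players meaningful.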


In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Player Name      6 non-null      object 
 1   Height           6 non-null      float64
 2   Weight           6 non-null      int64  
 3   Player Position  6 non-null      object 
dtypes: float64(1), int64(1), object(2)
memory usage: 324.0+ bytes


### 3. Euclidean distance function

We apply the Euclidean distance formula to each player's Height and Weight and use the result to look for similar players.
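For reference, the general Euclidean distance between two points is written out below. The symbols $p$, $q$, $n$, $a$, and $b$ are notation introduced here, not taken from the notebook. With a single value on each side ($n = 1$), the formula reduces to an absolute difference, which is what the cell that follows computes row by row.

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2},
\qquad
n = 1:\quad d(a, b) = \sqrt{(a - b)^2} = |a - b|
$$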

In [21]:
# Calculate the Euclidean distance between Height and Weight for each player
eucdist = df['Height'] - df['Weight']
df['Eucdist'] = np.sqrt(np.power(eucdist, 2))

df

|   | Player Name | Height | Weight | Player Position | Eucdist |
|---|---|---|---|---|---|
| 0 | DK Metcalf | 6.4 | 235 | WR | 228.6 |
| 1 | Fletcher Cox | 6.4 | 310 | DT | 303.6 |
| 2 | Tyreek Hill | 5.1 | 191 | WR | 185.9 |
| 3 | Bobby Wagner | 6.0 | 242 | MLB | 236.0 |
| 4 | Derrick Henry | 6.3 | 247 | RB | 240.7 |
| 5 | Keisean Nixon | 5.1 | 200 | CB | 194.9 |


#### 3.1 Sorting the values

Now we sort the rows by the 'Eucdist' column in ascending order.

In [22]:
sorted_df = df.sort_values(by='Eucdist', ascending=True)
sorted_df

|   | Player Name | Height | Weight | Player Position | Eucdist |
|---|---|---|---|---|---|
| 2 | Tyreek Hill | 5.1 | 191 | WR | 185.9 |
| 5 | Keisean Nixon | 5.1 | 200 | CB | 194.9 |
| 0 | DK Metcalf | 6.4 | 235 | WR | 228.6 |
| 3 | Bobby Wagner | 6.0 | 242 | MLB | 236.0 |
| 4 | Derrick Henry | 6.3 | 247 | RB | 240.7 |
| 1 | Fletcher Cox | 6.4 | 310 | DT | 303.6 |
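When only the few smallest values matter, a full sort is not required. The sketch below uses `DataFrame.nsmallest` on the same 'Eucdist' values shown above. The variable name `top3` and the choice of three rows are illustrative assumptions.

```python
import pandas as pd

# Eucdist values copied from the output above
df = pd.DataFrame({
    'Player Name': ['DK Metcalf', 'Fletcher Cox', 'Tyreek Hill',
                    'Bobby Wagner', 'Derrick Henry', 'Keisean Nixon'],
    'Eucdist': [228.6, 303.6, 185.9, 236.0, 240.7, 194.9],
})

# Keep only the three rows with the smallest 'Eucdist', already in ascending order
top3 = df.nsmallest(3, 'Eucdist')
print(top3)
```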


#### 3.2 New Player Record

We add a new player record so that we can compare it with the players already in our data frame.

In [23]:
new_player = {
    'Player Name': 'Jayden Reed',
    'Height': 5.11,
    'Weight': 187,
    'Player Position': 'WR'
}

df.loc[len(df)] = new_player
df

|   | Player Name | Height | Weight | Player Position | Eucdist |
|---|---|---|---|---|---|
| 0 | DK Metcalf | 6.4 | 235 | WR | 228.6 |
| 1 | Fletcher Cox | 6.4 | 310 | DT | 303.6 |
| 2 | Tyreek Hill | 5.1 | 191 | WR | 185.9 |
| 3 | Bobby Wagner | 6.0 | 242 | MLB | 236.0 |
| 4 | Derrick Henry | 6.3 | 247 | RB | 240.7 |
| 5 | Keisean Nixon | 5.1 | 200 | CB | 194.9 |
| 6 | Jayden Reed | 5.11 | 187 | WR | NaN |
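The new row has no 'Eucdist' value yet because the dictionary did not include that key. Note also that `df.loc[len(df)]` relies on the default 0..n-1 index. As an alternative sketch, `pd.concat` with `ignore_index=True` appends the record regardless of the current index labels. The one-row starting frame below is a shortened stand-in for the notebook's data frame.

```python
import pandas as pd

# Shortened stand-in for the notebook's data frame (one existing row)
df = pd.DataFrame({
    'Player Name': ['Tyreek Hill'],
    'Height': [5.10],
    'Weight': [191],
    'Player Position': ['WR'],
    'Eucdist': [185.9],
})

new_player = {
    'Player Name': 'Jayden Reed',
    'Height': 5.11,
    'Weight': 187,
    'Player Position': 'WR',
}

# Wrap the dict in a one-row DataFrame and append it; the index is rebuilt as 0..n-1
df = pd.concat([df, pd.DataFrame([new_player])], ignore_index=True)

# The missing 'Eucdist' key shows up as NaN until the column is recomputed
print(df)
```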


In [24]:
df['Eucdist'] = np.sqrt((df['Height'] - df['Weight']) ** 2)

df

|   | Player Name | Height | Weight | Player Position | Eucdist |
|---|---|---|---|---|---|
| 0 | DK Metcalf | 6.4 | 235 | WR | 228.6 |
| 1 | Fletcher Cox | 6.4 | 310 | DT | 303.6 |
| 2 | Tyreek Hill | 5.1 | 191 | WR | 185.9 |
| 3 | Bobby Wagner | 6.0 | 242 | MLB | 236.0 |
| 4 | Derrick Henry | 6.3 | 247 | RB | 240.7 |
| 5 | Keisean Nixon | 5.1 | 200 | CB | 194.9 |
| 6 | Jayden Reed | 5.11 | 187 | WR | 181.89 |


In [25]:
sorted_df = df.sort_values(by='Eucdist', ascending=True)
sorted_df

|   | Player Name | Height | Weight | Player Position | Eucdist |
|---|---|---|---|---|---|
| 6 | Jayden Reed | 5.11 | 187 | WR | 181.89 |
| 2 | Tyreek Hill | 5.1 | 191 | WR | 185.9 |
| 5 | Keisean Nixon | 5.1 | 200 | CB | 194.9 |
| 0 | DK Metcalf | 6.4 | 235 | WR | 228.6 |
| 3 | Bobby Wagner | 6.0 | 242 | MLB | 236.0 |
| 4 | Derrick Henry | 6.3 | 247 | RB | 240.7 |
| 1 | Fletcher Cox | 6.4 | 310 | DT | 303.6 |
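The 'Eucdist' column compares Height with Weight inside one row. A different way to look for players similar to the new record is to treat each player as a point (Height, Weight) and measure the distance from every other player to that point. The sketch below is one possible version of that idea, not the notebook's method. The column name 'Dist to Reed' and the variable names are hypothetical, and the values are used exactly as entered in the data frame.

```python
import numpy as np
import pandas as pd

# Height and Weight exactly as entered in the notebook's data frame
df = pd.DataFrame({
    'Player Name': ['DK Metcalf', 'Fletcher Cox', 'Tyreek Hill', 'Bobby Wagner',
                    'Derrick Henry', 'Keisean Nixon', 'Jayden Reed'],
    'Height': [6.4, 6.4, 5.10, 6.0, 6.3, 5.10, 5.11],
    'Weight': [235, 310, 191, 242, 247, 200, 187],
})

features = ['Height', 'Weight']

# Feature vector of the player we want to compare against
is_target = df['Player Name'] == 'Jayden Reed'
target = df.loc[is_target, features].to_numpy(dtype=float)[0]

# Every other player, as rows of (Height, Weight)
others = df[~is_target].copy()
diffs = others[features].to_numpy(dtype=float) - target

# Row-wise Euclidean distance: square, sum across features, square root
others['Dist to Reed'] = np.sqrt((diffs ** 2).sum(axis=1))

closest = others.sort_values('Dist to Reed').reset_index(drop=True)
print(closest[['Player Name', 'Dist to Reed']])
```

Because Weight is on a much larger scale than Height, it dominates this distance. Rescaling the features before measuring distance is a common design choice when both should count.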


### 4. Change non-numeric data into numeric data

We can use LabelEncoder to convert non-numeric data into numeric data. It assigns an integer code to each distinct label.

In [29]:
# import the library
from sklearn.preprocessing import LabelEncoder

In [31]:
# To work with LabelEncoder we first need to create an instance
lb = LabelEncoder()

# Now let's encode the values of the 'Player Position' column, working on a copy
df_copy = df.copy()

df_copy['Player Position'] = lb.fit_transform(df_copy['Player Position'])
df_copy

|   | Player Name | Height | Weight | Player Position | Eucdist |
|---|---|---|---|---|---|
| 0 | DK Metcalf | 6.4 | 235 | 4 | 228.6 |
| 1 | Fletcher Cox | 6.4 | 310 | 1 | 303.6 |
| 2 | Tyreek Hill | 5.1 | 191 | 4 | 185.9 |
| 3 | Bobby Wagner | 6.0 | 242 | 2 | 236.0 |
| 4 | Derrick Henry | 6.3 | 247 | 3 | 240.7 |
| 5 | Keisean Nixon | 5.1 | 200 | 0 | 194.9 |
| 6 | Jayden Reed | 5.11 | 187 | 4 | 181.89 |
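To see which integer stands for which position, the fitted encoder can be inspected. The sketch below rebuilds the encoding on a plain list with the same position values as the data frame. The names `positions`, `mapping`, and `restored` are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Same position values as the 'Player Position' column, in row order
positions = ['WR', 'DT', 'WR', 'MLB', 'RB', 'CB', 'WR']

lb = LabelEncoder()
codes = lb.fit_transform(positions)

# classes_ holds the labels in sorted order; a label's index is its code
mapping = {str(label): code for code, label in enumerate(lb.classes_)}
print(mapping)  # {'CB': 0, 'DT': 1, 'MLB': 2, 'RB': 3, 'WR': 4}

# inverse_transform turns the codes back into the original labels
restored = lb.inverse_transform(codes)
print(list(restored))
```

The codes follow alphabetical order of the labels, so the numeric gap between two codes says nothing about how similar two positions are.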
