Clustering is an important technique in artificial intelligence, deep learning, and data science, and it also has uses in data cleaning. Today we are going to discuss distance metrics, which are the backbone of clustering. Distance metrics measure the proximity, or distance, between data points, and that distance helps determine whether the points can be clustered together.

### What are distance metrics?

Distance metrics are a key part of several machine learning algorithms. These distance metrics are used in both supervised and unsupervised learning, generally **to calculate the similarity between data points.**

Choosing an appropriate distance metric can improve the performance of a machine learning model, whether it is used for classification or clustering.
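To make this concrete, here is a small sketch (the points are made up for illustration) showing that the choice of metric can change which point counts as a query's nearest neighbor. It uses SciPy's `distance.euclidean` and `distance.cityblock`, which is SciPy's name for Manhattan distance.

```python
from scipy.spatial import distance

# Hypothetical query point and two candidate neighbors
query = (0, 0)
candidates = {"A": (3, 3), "B": (0, 5)}

def nearest(metric):
    # Return the label of the candidate closest to the query under `metric`
    return min(candidates, key=lambda name: metric(query, candidates[name]))

# Euclidean: A is about 4.24 away, B is 5 away -> A wins
print("Nearest under Euclidean:", nearest(distance.euclidean))
# Manhattan: A is 3 + 3 = 6 away, B is 0 + 5 = 5 away -> B wins
print("Nearest under Manhattan:", nearest(distance.cityblock))
```

The same data gives two different "nearest" answers, which is why the metric is a modeling choice rather than a detail.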

Let's say you need to group observations with a clustering algorithm such as K-Means, or classify them with the k-nearest neighbors algorithm (KNN), which uses the nearest neighbors of a point to solve classification or regression problems. How will you define the similarity between different observations? How can we say that two points are similar to each other? Intuitively, two observations are similar if their features are similar, and when we plot them, similar points lie close to each other.
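As a quick illustration (a sketch with made-up feature values), Python's standard-library `math.dist`, which computes the Euclidean distance covered below, shows that an observation is closer to one with similar features than to one with very different features:

```python
import math

# Hypothetical observations, each described by two numeric features
obs_1 = (1.0, 2.0)
obs_2 = (1.5, 1.8)   # features similar to obs_1
obs_3 = (8.0, 9.0)   # features very different from obs_1

# math.dist returns the Euclidean distance between two points (Python 3.8+)
d_12 = math.dist(obs_1, obs_2)
d_13 = math.dist(obs_1, obs_3)
print(f"obs_1 vs obs_2: {d_12:.3f}")
print(f"obs_1 vs obs_3: {d_13:.3f}")
```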

![image.png](attachment:image.png)

Hence, we can calculate the distance between points and use it to define the similarity between them. Here's the million-dollar question: how do we calculate this distance, and what are the different distance metrics in machine learning?

#### Types of Distance Metrics in Machine Learning
* Euclidean Distance
* Manhattan Distance
* Minkowski Distance
* Hamming Distance
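For a quick side-by-side view, all four metrics are available in `scipy.spatial.distance` (Manhattan distance is called `cityblock` there). The vectors below are arbitrary examples. Note that SciPy's `hamming` returns the *fraction* of positions that differ rather than a raw count.

```python
from scipy.spatial import distance

# Arbitrary example vectors; they differ in positions 1 and 2
u = [1, 0, 2, 3]
v = [1, 1, 0, 3]

print("Euclidean:", distance.euclidean(u, v))         # sqrt(0 + 1 + 4 + 0)
print("Manhattan:", distance.cityblock(u, v))         # 0 + 1 + 2 + 0
print("Minkowski (p=3):", distance.minkowski(u, v, p=3))
print("Hamming:", distance.hamming(u, v))             # 2 of 4 positions differ
```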

### Euclidean Distance

Euclidean distance is the length of the straight line connecting two points (vectors). It is the square root of the sum of the squared differences between corresponding elements.
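That definition translates almost word for word into plain Python. This is a minimal sketch; `euclidean_distance` is a hypothetical helper name, not a library function.

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared differences between corresponding elements."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((p_i - q_i) ** 2 for p_i, q_i in zip(p, q)))

print(euclidean_distance((1, 2, 3), (4, 5, 6)))
```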

Many machine learning algorithms, including K-Means, use this distance metric to measure the similarity between observations. Let's say we have two points, as shown below:

![image.png](attachment:image.png)

So, the Euclidean Distance between these two points, A and B, will be:

![image-2.png](attachment:image-2.png)

Here’s the formula for Euclidean Distance:

$$
d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2}
$$
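For example, taking two hypothetical points $p = (1, 2)$ and $q = (4, 6)$:

$$
d(p, q) = \sqrt{(4 - 1)^2 + (6 - 2)^2} = \sqrt{9 + 16} = \sqrt{25} = 5
$$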

We use this formula when we are dealing with two dimensions. It generalizes to an n-dimensional space as:

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}
$$

where

$n$ = number of dimensions\
$p_i$, $q_i$ = the $i$-th coordinates of the data points $p$ and $q$
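As a worked example in three dimensions, take $p = (1, 2, 3)$ and $q = (4, 5, 6)$:

$$
d(p, q) = \sqrt{(4 - 1)^2 + (5 - 2)^2 + (6 - 3)^2} = \sqrt{9 + 9 + 9} = \sqrt{27} \approx 5.196
$$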

Let's code Euclidean distance in Python. This is how we can calculate the Euclidean distance between two points with SciPy:

```python
# importing the library
from scipy.spatial import distance

# defining the points
point_1 = (1, 2, 3)
point_2 = (4, 5, 6)

# computing the euclidean distance
euclidean_distance = distance.euclidean(point_1, point_2)
print('Euclidean Distance b/w', point_1, 'and', point_2, 'is: ', euclidean_distance)
```

Output:

```
Euclidean Distance b/w (1, 2, 3) and (4, 5, 6) is:  5.196152422706632
```
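Since K-Means relies on Euclidean distance, a natural next step is computing distances from many points to several centers at once. The sketch below uses `scipy.spatial.distance.cdist` with made-up points and centroids to mimic only the assignment step of K-Means, labeling each point with its nearest centroid. It is not a full K-Means implementation, which would also update the centroids iteratively.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical data points (one per row) and two hypothetical cluster centroids
points = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 11.0]])
centroids = np.array([[1.0, 2.0], [8.5, 9.5]])

# cdist returns an (n_points, n_centroids) matrix of pairwise distances
dists = cdist(points, centroids, metric="euclidean")

# Assignment step: each point goes to the centroid with the smallest distance
labels = dists.argmin(axis=1)
print(dists.round(3))
print("Cluster labels:", labels)
```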


### Manhattan Distance