<a href="https://colab.research.google.com/github/sivasaiyadav8143/Deep-Learning-with-Python/blob/main/Euclidean_Distance_vs_Cosine_Similarity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## What Is Distance?

Both cosine similarity and Euclidean distance are methods for measuring the proximity between vectors in a vector space.

<center>
<img src="https://github.com/sivasaiyadav8143/Deep-Learning-with-Python/blob/main/ScreenShots/distance.png?raw=1"></center>

We have a 2D vector space in which three distinct points are located: blue, red, and green. We can ask which pair or pairs of points are closest to one another. The answer is a set containing one or more pairs of points:

*  If only one pair is the closest, then the answer can be either (blue, red), (blue, green), or (red, green)
*  If two pairs are the closest, the number of possible sets is three, corresponding to all two-element combinations of the three pairs
*  Finally, if all three pairs are equally close, there is only one possible set that contains them all



This means that the set with the closest pair or pairs of points is one of seven possible sets. How do we determine then which of the seven possible answers is the right one? To do so, we need to first determine a method for measuring distances.
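
The count of seven can be checked by enumerating every non-empty subset of the three pairs. This is a minimal sketch using only the standard library; the names `points`, `pairs` and `candidate_sets` are introduced here for illustration.

```python
from itertools import combinations

points = ["blue", "red", "green"]

# All unordered pairs of distinct points (three of them).
pairs = list(combinations(points, 2))

# Every non-empty subset of those pairs is a possible "closest pairs" answer.
candidate_sets = [
    subset
    for size in range(1, len(pairs) + 1)
    for subset in combinations(pairs, size)
]

for subset in candidate_sets:
    print(subset)

print(len(candidate_sets))  # 7
```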

Two common ways to measure it are:

1. **Euclidean distance** is the straight-line distance between two points in Euclidean space. It corresponds to the L2 norm of the difference between the two vectors.
2. **Cosine similarity** is the cosine of the angle between two vectors. It is generally used to measure closeness when the magnitude of the vectors does not matter.
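
To make the two definitions concrete, here is a minimal pure-Python sketch of both measures. The helper names `euclidean_distance` and `cosine_similarity` are hypothetical and written only for illustration; the rest of this notebook uses `scipy.spatial.distance` instead.

```python
import math

def euclidean_distance(u, v):
    """L2 norm of the difference u - v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

u = [3.0, 4.0]
v = [6.0, 8.0]  # same direction as u, twice the length

print(euclidean_distance(u, v))  # 5.0
print(cosine_similarity(u, v))   # 1.0
```

The two vectors point in the same direction, so their cosine similarity is 1 even though they are 5 units apart.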

**When to use what**

<center>
<img src="https://github.com/sivasaiyadav8143/Deep-Learning-with-Python/blob/main/ScreenShots/Cosine%20Vs%20Euclidean.png?raw=1"></center>

**Case 1: When Cosine similarity is better than Euclidean distance**

Assume OA, OB and OC are three vectors, as illustrated in the figure above. The points A, B and C form an equilateral triangle, so the Euclidean distances between them are the same (AB = BC = CA). In this case, Euclidean distance cannot tell us which of the three vectors are most similar to each other. Cosine similarity, which ignores the differing magnitudes (lengths) of the vectors, shows that OA is more similar to OB than to OC.

```python
# Case 1: cosine similarity is more informative than Euclidean distance
from scipy.spatial import distance

# The points below have been selected to demonstrate the case for cosine similarity
O = [0.00, 0.00]
A = [1.45, 7.56]
B = [7.81, 12.41]
C = [8.83, 4.48]

# Cosine similarity (scipy's distance.cosine returns 1 - similarity)
cos_simA_B = 1 - distance.cosine(A, B)
cos_simB_C = 1 - distance.cosine(B, C)
cos_simA_C = 1 - distance.cosine(A, C)

# Euclidean distances
euc_dstA_B = distance.euclidean(A, B)
euc_dstB_C = distance.euclidean(B, C)
euc_dstA_C = distance.euclidean(C, A)

print('Cosine Similarity measure:\n')
print('Between OA and OB: {}'.format(cos_simA_B))
print('Between OB and OC: {}'.format(cos_simB_C))
print('Between OC and OA: {}\n'.format(cos_simA_C))

print('Euclidean Distances:\n')
print('From A to B: {}'.format(euc_dstA_B))
print('From B to C: {}'.format(euc_dstB_C))
print('From C to A: {}'.format(euc_dstA_C))
```

```
Cosine Similarity measure:

Between OA and OB: 0.9315258342391336
Between OB and OC: 0.8579300679601176
Between OC and OA: 0.6123399158783746

Euclidean Distances:

From A to B: 7.998256059917061
From B to C: 7.995329886877713
From C to A: 7.996924408796171
```


**Case 1: Where Cosine similarity measure is better than Euclidean distance**

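As the output shows, the three Euclidean distances are almost identical (about 8.0), so they cannot separate the pairs. The cosine similarities differ clearly: 0.93 for OA and OB, 0.86 for OB and OC, and 0.61 for OC and OA. Cosine similarity therefore suggests that OA is closer to OB than to OC.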
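
One way to see why the cosine similarities differ is to look at the direction of each vector. The sketch below uses only the standard library to recompute the lengths and angles for the Case 1 points; the helpers `angle_deg` and `length` are hypothetical names introduced for illustration.

```python
import math

A = [1.45, 7.56]
B = [7.81, 12.41]
C = [8.83, 4.48]

def angle_deg(v):
    """Angle between the vector v and the positive x-axis, in degrees."""
    return math.degrees(math.atan2(v[1], v[0]))

def length(v):
    """Euclidean length (magnitude) of a 2D vector."""
    return math.hypot(v[0], v[1])

for name, v in (("OA", A), ("OB", B), ("OC", C)):
    print(name, "length:", round(length(v), 2), "angle:", round(angle_deg(v), 1))

# All three vectors lie in the first quadrant, so the angle between two of
# them is simply the absolute difference of their angles to the x-axis.
angle_AB = abs(angle_deg(A) - angle_deg(B))
angle_BC = abs(angle_deg(B) - angle_deg(C))
angle_AC = abs(angle_deg(A) - angle_deg(C))

print("Angle between OA and OB:", round(angle_AB, 1))
print("Angle between OB and OC:", round(angle_BC, 1))
print("Angle between OA and OC:", round(angle_AC, 1))
```

The smallest angle is between OA and OB, which is why that pair has the highest cosine similarity even though the three points are equally far apart.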

**Case 2: When Euclidean distance is better than Cosine similarity**

Consider another case where the points A’, B’ and C’ lie on a single line through the origin O, as illustrated in the figure above. The vectors OA’, OB’ and OC’ then all point in the same direction, so the cosine similarity of every pair is the same (equal to 1). Euclidean distance is more effective here: it indicates that A’ is closer (more similar) to B’ than to C’.

```python
# Case 2: Euclidean distance is more informative than cosine similarity
from scipy.spatial import distance

# Points on a single line through the origin
A_ = [8.00, 2.00]
B_ = [12.00, 3.00]
C_ = [32.00, 8.00]

# Cosine similarity (scipy's distance.cosine returns 1 - similarity)
cos_simA_B_ = 1 - distance.cosine(A_, B_)
cos_simB_C_ = 1 - distance.cosine(B_, C_)
cos_simA_C_ = 1 - distance.cosine(A_, C_)

# Euclidean distances
dstA_B_ = distance.euclidean(A_, B_)
dstB_C_ = distance.euclidean(B_, C_)
dstA_C_ = distance.euclidean(C_, A_)

print('Cosine Similarity measure:\n')
print('Between OA and OB: {}'.format(cos_simA_B_))
print('Between OB and OC: {}'.format(cos_simB_C_))
print('Between OC and OA: {}\n'.format(cos_simA_C_))

print('Euclidean Distances:\n')
print('From A to B: {}'.format(dstA_B_))
print('From B to C: {}'.format(dstB_C_))
print('From C to A: {}'.format(dstA_C_))
```

```
Cosine Similarity measure:

Between OA and OB: 1.0
Between OB and OC: 1.0
Between OC and OA: 1.0

Euclidean Distances:

From A to B: 4.123105625617661
From B to C: 20.615528128088304
From C to A: 24.73863375370596
```


**Case 2: Euclidean distance is a better measure than Cosine similarity**

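As the output shows, the cosine similarity is the same (1.0) for every pair, but the Euclidean distances differ: A’ and B’ are about 4.1 apart, while C’ is more than 20 away from either of them. Euclidean distance therefore suggests that A’ and B’ are closest, and hence most similar, to each other.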
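
The reason every cosine similarity equals 1 is that the three vectors are scalar multiples of one another. A quick check, sketched with a hypothetical helper `cross_2d` (the z-component of the 2D cross product, which is zero exactly when two vectors are parallel):

```python
A_ = [8.00, 2.00]
B_ = [12.00, 3.00]
C_ = [32.00, 8.00]

def cross_2d(u, v):
    """z-component of the 2D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

print(cross_2d(A_, B_))  # 0.0
print(cross_2d(A_, C_))  # 0.0

# Scale factors relative to A_: B_ = 1.5 * A_ and C_ = 4 * A_
print(B_[0] / A_[0], B_[1] / A_[1])  # 1.5 1.5
print(C_[0] / A_[0], C_[1] / A_[1])  # 4.0 4.0
```

Because cosine similarity depends only on direction, scaling a vector never changes it; only a magnitude-sensitive measure such as Euclidean distance can tell these points apart.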

**When Should We Prefer One Over the Other?**

The decision as to which metric to use depends on the particular task that we have to perform:

* Some tasks, such as preliminary data analysis, benefit from both metrics; each of them allows the extraction of different insights on the structure of the data
* Others, such as text classification, generally function better under Euclidean distances
* Still others, such as retrieval of the most similar texts to a given document, generally function better with cosine similarity
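
The retrieval case can be illustrated with a toy example. The sketch below assumes documents are represented as raw term-count vectors, which is a simplification chosen for illustration (practical systems usually use weighted or learned representations); the documents and all helper names are hypothetical.

```python
import math
from collections import Counter

def term_counts(text, vocabulary):
    """Raw term-count vector for text over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

query_doc = "cats chase mice"
long_doc = " ".join([query_doc] * 10)  # same content, ten times longer
other_doc = "dogs chase cars"          # different topic, same length

vocabulary = sorted(set((query_doc + " " + other_doc).split()))
query_vec = term_counts(query_doc, vocabulary)
long_vec = term_counts(long_doc, vocabulary)
other_vec = term_counts(other_doc, vocabulary)

print("cosine, query vs long: ", cosine_similarity(query_vec, long_vec))    # about 1.0
print("cosine, query vs other:", cosine_similarity(query_vec, other_vec))   # about 0.33
print("euclid, query vs long: ", euclidean_distance(query_vec, long_vec))   # about 15.6
print("euclid, query vs other:", euclidean_distance(query_vec, other_vec))  # 2.0
```

Cosine similarity ranks the longer document with identical content first, while Euclidean distance ranks the unrelated document of similar length as closer, because it is dominated by the difference in document length.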

As is often the case in machine learning, the trick is to know the available techniques and to learn the heuristics for applying them, which comes largely through trial and error.
