# Representing Points

When working with data points, there are three common ways to define the distance between two points:
1. Euclidean Distance
2. Manhattan Distance
3. Hamming Distance

In Python we can represent a point as a list in which each element is the point's value along one dimension.
So `[2, 4]` is a point where x = 2 and y = 4.
We can also represent points with more than two dimensions: `[4, 8, 15, 16, 23]` is a point with 5 dimensions.
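A minimal sketch of this representation using plain Python lists. The variable names are our own. `len()` gives the number of dimensions, and indexing or unpacking picks out individual coordinates.

```python
# A 2-dimensional point: x = 2, y = 4
point_2d = [2, 4]
# A 5-dimensional point
point_5d = [4, 8, 15, 16, 23]

# Unpack the coordinates of the 2D point
x, y = point_2d
# The number of dimensions is just the length of the list
num_dims = len(point_5d)

print(x, y)         # 2 4
print(num_dims)     # 5
print(point_5d[2])  # value along the third dimension: 15
```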



# Euclidean Distance

This is a very common distance formula. To find the Euclidean distance between two points, we first calculate the squared difference between the points along each dimension.
Adding up all of the squared differences and taking the square root of the sum gives the Euclidean distance.
![image.png](attachment:image.png)
![image.png](attachment:image-2.png)
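Written out for two n-dimensional points p and q (the symbols are our own notation), the formula and a worked example for the points `[1, 2]` and `[4, 0]` used below are:

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$

$$
d([1, 2], [4, 0]) = \sqrt{(1 - 4)^2 + (2 - 0)^2} = \sqrt{9 + 4} = \sqrt{13} \approx 3.61
$$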


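```python
def euclidean_distance(p1, p2):
    distance = 0
    # Add up the squared difference along each dimension
    for i in range(len(p1)):
        distance += (p1[i] - p2[i]) ** 2
    # Take the square root of the sum, rounded to 2 decimal places
    return round(distance ** 0.5, 2)

print(euclidean_distance([1, 2], [4, 0]))
print(euclidean_distance([5, 4, 3], [1, 7, 9]))
```

```
3.61
7.81
```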
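As a cross-check, Python's standard library (3.8 and later) provides `math.dist`, which computes the same Euclidean distance without rounding. Below is a minimal sketch that uses it alongside a `zip`-based version of the loop above. The name `euclidean_distance_zip` is our own, and the length check is an extra safeguard the original function does not have.

```python
import math

def euclidean_distance_zip(p1, p2):
    # Refuse points with different numbers of dimensions
    if len(p1) != len(p2):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared differences of matching coordinates, then take the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

print(round(euclidean_distance_zip([1, 2], [4, 0]), 2))  # 3.61
print(round(math.dist([5, 4, 3], [1, 7, 9]), 2))         # 7.81
```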


# Manhattan Distance
Manhattan distance is similar to Euclidean distance. However, instead of squaring the difference along each matching dimension and taking the square root of the sum, we add up the absolute differences between corresponding dimensions.
![image.png](attachment:image.png)
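Written out for two n-dimensional points p and q (our own notation), with the same worked example as before:

$$
d(p, q) = \sum_{i=1}^{n} |p_i - q_i|
$$

$$
d([1, 2], [4, 0]) = |1 - 4| + |2 - 0| = 3 + 2 = 5
$$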

```python
def manhattan_distance(p1, p2):
    distance = 0
    # Add up the absolute difference along each dimension
    for i in range(len(p1)):
        distance += abs(p1[i] - p2[i])
    return distance

print(manhattan_distance([1, 2], [4, 0]))
print(manhattan_distance([5, 4, 3], [1, 7, 9]))
```

```
5
13
```

These values differ from the Euclidean distances for the same pairs of points (3.61 and 7.81). That is expected: each formula measures distance in its own way, and both are valid.
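To see why the values differ, here is a minimal sketch that compares both metrics on the same pairs. Manhattan distance adds up the full length of each axis-aligned step, so it is never smaller than the straight-line Euclidean distance. The helper name `manhattan` is our own, and `math.dist` (Python 3.8+) supplies the Euclidean value.

```python
import math

def manhattan(p1, p2):
    # Sum of absolute differences along each dimension
    return sum(abs(a - b) for a, b in zip(p1, p2))

pairs = [([1, 2], [4, 0]), ([5, 4, 3], [1, 7, 9])]
for p1, p2 in pairs:
    m = manhattan(p1, p2)
    e = math.dist(p1, p2)  # straight-line (Euclidean) distance
    # The axis-aligned path is never shorter than the straight line
    print(p1, p2, m, round(e, 2), m >= e)
```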


# Hamming Distance

Hamming distance is quite different from the other formulas. It ignores the arithmetic difference between values and only checks whether the values in corresponding dimensions are equal. For every dimension where the values differ, we add 1.

## Uses
The Hamming distance is used in spell-checking algorithms. For example, the Hamming distance between "three" and "thete" is 2, because the two words differ at exactly two positions: the third character (r vs. e) and the fourth character (e vs. t).
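Strings support indexing just like lists, so the same idea works on equal-length words. Below is a minimal sketch. The helper `hamming` is our own, written with `zip`, and it adds a length check because Hamming distance is only defined for sequences of equal length.

```python
def hamming(s1, s2):
    # Hamming distance is only defined for sequences of equal length
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    # Count the positions where the characters differ
    return sum(1 for a, b in zip(s1, s2) if a != b)

print(hamming("three", "thete"))  # 2 (the third and fourth characters differ)
print(hamming("three", "threw"))  # 1 (only the last character differs)
```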

```python
def hamming_distance(p1, p2):
    distance = 0
    # Count the dimensions whose values are not equal
    for i in range(len(p1)):
        if p1[i] != p2[i]:
            distance += 1
    return distance

print(hamming_distance([1, 2], [1, 100]))
print(hamming_distance([5, 4, 9], [1, 7, 9]))
```

```
1
2
```


# SciPy Distances

Instead of writing our own functions for these distance formulas, we can use the `scipy.spatial.distance` module from SciPy:
- Euclidean distance: `.euclidean()`
- Manhattan distance: `.cityblock()`
- Hamming distance: `.hamming()`

Note that SciPy's `.hamming()` does not return the number of dimensions that differ, as our function does. Instead, it returns the number of differing dimensions divided by the total number of dimensions. This fraction tells us what proportion of the dimensions differ between the two points.
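To connect the two conventions: SciPy's value is our count divided by the number of dimensions. Below is a minimal pure-Python sketch that reproduces that fraction and converts it back to a count. The name `normalized_hamming` is our own. This is not SciPy's implementation.

```python
def normalized_hamming(p1, p2):
    # Number of dimensions where the values differ
    mismatches = sum(1 for a, b in zip(p1, p2) if a != b)
    # Divide by the total number of dimensions to get a fraction between 0 and 1
    return mismatches / len(p1)

frac = normalized_hamming([5, 4, 9], [1, 7, 9])
print(frac)             # 0.6666666666666666
print(round(frac * 3))  # 2, back to the raw count for 3 dimensions
```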





```python
from scipy.spatial import distance

# Our own functions from above
print(euclidean_distance([1, 2], [4, 0]))
print(manhattan_distance([1, 2], [4, 0]))
print(hamming_distance([5, 4, 9], [1, 7, 9]))

print()  # blank line between the two sets of results

# The equivalent SciPy functions (SciPy's Euclidean result is not rounded)
print(distance.euclidean([1, 2], [4, 0]))
print(distance.cityblock([1, 2], [4, 0]))
print(distance.hamming([5, 4, 9], [1, 7, 9]))
```

```
3.61
5
2

3.605551275463989
5
0.6666666666666666
```
