# Distance metrics from scratch

For real-valued vector spaces (continuous variables):
1. Euclidean (Pythagorean theorem)
2. Manhattan (also known as city block distance or taxicab geometry)
3. Chebyshev
4. Minkowski (generalized distance metric)

For integer-valued vector spaces (categorical data):

5. Hamming (a metric for comparing two binary strings)
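
For reference, the metrics listed above can be written as follows for two vectors $x$ and $y$ of length $n$. The symbols are introduced here for illustration. The Hamming distance is given in its normalized form (the fraction of positions that differ), an assumption chosen to match the output of sklearn's `hamming` metric later in this notebook.

$$
\begin{aligned}
d_{\text{Euclidean}}(x, y) &= \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \\
d_{\text{Manhattan}}(x, y) &= \sum_{i=1}^{n} |x_i - y_i| \\
d_{\text{Chebyshev}}(x, y) &= \max_{1 \le i \le n} |x_i - y_i| \\
d_{\text{Minkowski}}(x, y) &= \Big( \sum_{i=1}^{n} |x_i - y_i|^p \Big)^{1/p}, \quad p \ge 1 \\
d_{\text{Hamming}}(x, y) &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[x_i \ne y_i]
\end{aligned}
$$

Minkowski with $p = 1$ gives Manhattan, $p = 2$ gives Euclidean, and the limit $p \to \infty$ gives Chebyshev.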



### Preference (Euclidean vs Manhattan for high-dimensional data)
- Many high-dimensional indexing structures and algorithms use the Euclidean distance metric as a natural
extension of its traditional use in two- or three-dimensional spatial applications (see paper [2]).
- According to paper [2], the Manhattan distance metric (L1 norm) is consistently preferable to the
Euclidean distance metric (L2 norm) for high-dimensional data mining
applications.
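
The effect behind these two points can be explored with a small experiment. The sketch below is not taken from paper [2]; it assumes uniformly random points and uses the relative contrast `(d_max - d_min) / d_min` between a query point and a sample as a rough measure of how well a metric separates near from far neighbours. The helper names `minkowski` and `relative_contrast` are hypothetical.

```python
import random

def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def relative_contrast(points, query, p):
    """(farthest - nearest) / nearest distance from the query to the points."""
    dists = [minkowski(query, pt, p) for pt in points]
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for dim in (2, 20, 200):
    points = [[rng.random() for _ in range(dim)] for _ in range(200)]
    query = [rng.random() for _ in range(dim)]
    # Store (contrast under L1, contrast under L2) for this dimensionality
    results[dim] = (relative_contrast(points, query, 1),
                    relative_contrast(points, query, 2))
    print(dim, results[dim])
```

On data like this, the contrast shrinks sharply as the dimensionality grows, which is the setting in which the choice between L1 and L2 starts to matter.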

## Getting pairwise distances using sklearn's DistanceMetric

In [1]:
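# In scikit-learn 1.0 and later, DistanceMetric is also available as sklearn.metrics.DistanceMetric
from sklearn.neighbors import DistanceMetric
import sklearn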

In [2]:
dist = DistanceMetric.get_metric('euclidean') #euclidean
X = [[1,2],[7,8]]
dist.pairwise(X)


array([[0.        , 8.48528137],
       [8.48528137, 0.        ]])

In [3]:
DistanceMetric.get_metric('manhattan').pairwise(X) #manhattan

array([[ 0., 12.],
       [12.,  0.]])

In [4]:
DistanceMetric.get_metric('minkowski').pairwise(X) #minkowski

array([[0.        , 8.48528137],
       [8.48528137, 0.        ]])
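
The Minkowski output above is identical to the Euclidean one, which is consistent with an order of `p = 2`. A minimal from-scratch sketch of how the order `p` changes the result is shown below; the function name `minkowski` and the choice of `p = 50` are for illustration only.

```python
def minkowski(u, v, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

X = [[1, 2], [7, 8]]
d1 = minkowski(X[0], X[1], p=1)    # p=1 reduces to Manhattan
d2 = minkowski(X[0], X[1], p=2)    # p=2 reduces to Euclidean
d50 = minkowski(X[0], X[1], p=50)  # large p approaches Chebyshev (6 here)
print(d1, d2, d50)
```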

In [5]:
DistanceMetric.get_metric('hamming').pairwise(X) #hamming

array([[0., 1.],
       [1., 0.]])
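
The value 1 above is the fraction of positions in which the two rows differ: `[1, 2]` and `[7, 8]` differ in both components. A minimal sketch of that normalized form, using a hypothetical helper named `hamming`:

```python
def hamming(u, v):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(u, v)) / len(u)

print(hamming([1, 2], [7, 8]))     # both positions differ
print(hamming("10110", "10011"))   # two of five positions differ
```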

In [6]:
DistanceMetric.get_metric('chebyshev').pairwise(X) #chebyshev

array([[0., 6.],
       [6., 0.]])
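
The Chebyshev distance is the largest absolute difference along any single coordinate, so for `[1, 2]` and `[7, 8]` it is `max(6, 6) = 6`. A minimal sketch, with `chebyshev` as a hypothetical helper name:

```python
def chebyshev(u, v):
    """Largest absolute coordinate-wise difference between u and v."""
    return max(abs(a - b) for a, b in zip(u, v))

print(chebyshev([1, 2], [7, 8]))   # both coordinates differ by 6
print(chebyshev([1, 2], [4, 10]))  # differences are 3 and 8
```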

## Calculating distance metrics using scipy

In [23]:
from scipy.spatial import distance
distance.euclidean(X[0],X[1])

8.48528137423857

In [24]:
distance.cityblock(X[0],X[1]) #manhattan - named as cityblock in scipy

12

## Writing my own pairwise distance metrics from scratch

In [7]:

import math

# Each metric is a small class with a static pairwise() method;
# MyDistanceMetrics.get_metric() looks the class up by name.
# Covers the Euclidean and Manhattan distance metrics.

class MyDistanceMetrics:

    @staticmethod
    def get_metric(metric):
        if metric == 'euclidean':
            return Euclidean
        elif metric == 'manhattan':
            return Manhattan
        raise ValueError(f"unsupported metric: {metric!r}")


class Euclidean:
    @staticmethod
    def pairwise(X):
        # pwise[i][j] is the distance between rows X[i] and X[j];
        # works for any number of rows and any number of columns
        pwise = []
        for a in X:
            templist = []
            for b in X:
                templist.append(math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b))))
            pwise.append(templist)
        return pwise


class Manhattan:
    @staticmethod
    def pairwise(X):
        # Sum of absolute coordinate differences for every pair of rows
        pwise = []
        for a in X:
            templist = []
            for b in X:
                templist.append(sum(abs(p - q) for p, q in zip(a, b)))
            pwise.append(templist)
        return pwise
        
        

    

In [25]:
# calculation of metrics
dist = MyDistanceMetrics.get_metric('euclidean') #euclidean
dist.pairwise(X)

[[0.0, 8.48528137423857], [8.48528137423857, 0.0]]

In [27]:
MyDistanceMetrics.get_metric('manhattan').pairwise(X) #manhattan

[[0, 12], [12, 0]]
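
The same idea extends to the other metrics in this notebook. The sketch below is one possible design, not the only one: instead of one class per metric, it keeps plain distance functions in a dictionary and shares a single `pairwise` loop. All names here (`METRICS`, `pairwise`, and the individual functions) are hypothetical.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def hamming(u, v):
    # Normalized form: fraction of positions that differ
    return sum(a != b for a, b in zip(u, v)) / len(u)

# Registry mapping a metric name to its distance function
METRICS = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
    "hamming": hamming,
}

def pairwise(X, metric="euclidean"):
    """Return the matrix of distances between every pair of rows in X."""
    if metric not in METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    dist = METRICS[metric]
    return [[dist(a, b) for b in X] for a in X]

points = [[1, 2], [7, 8], [1, 8]]
print(pairwise(points, "manhattan"))
print(pairwise(points, "chebyshev"))
print(pairwise(points, "hamming"))
```

Adding a new metric then only requires writing one function and registering it; the pairwise loop stays unchanged.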

## Resources:
- [1] [Sklearn Distance metrics - Docs](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html)
- [2] [On the Surprising Behavior of Distance Metrics in High Dimensional Space - Paper](https://bib.dbvis.de/uploadedFiles/155.pdf)
- [3] [Different Types of Distance Metrics used in Machine Learning - blog](https://medium.com/@kunal_gohrani/different-types-of-distance-metrics-used-in-machine-learning-e9928c5e26c7)
