## 3.5.1 Definition of a distance measure

Suppose we have a set of points, called a *space*. A *distance measure* on this space is a function d(x,y) that takes two points in the space as arguments and produces a real number, and satisfies the following axioms:

1. d(x,y) >= 0 (no negative distances)
1. d(x,y) == 0 if and only if x == y (distances are positive, except for the distance from a point to itself).
1. d(x,y) == d(y,x) (distance is symmetric).
1. d(x,y) <= d(x,z) + d(z,y) (the triangle inequality)

The triangle inequality is the most complex condition. It says, intuitively, that to travel from x to y, we cannot obtain any benefit if we are forced to travel via some particular third point z. The triangle-inequality axiom is what makes all distance measures behave as if distance describes the length of a shortest path from one point to another.
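The four axioms can be tested mechanically on a finite sample of points. The following is a minimal sketch, not a proof technique: `check_axioms` is a hypothetical helper name, and a clean result only means no violation was found in the sample, whereas a reported violation does disprove the axiom.

```python
from itertools import product

def check_axioms(d, points):
    """Return the set of axiom numbers (1-4) that d violates on `points`."""
    points = list(points)
    violated = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                     # axiom 1: no negative distances
            violated.add(1)
        if (d(x, y) == 0) != (x == y):      # axiom 2: zero iff same point
            violated.add(2)
        if d(x, y) != d(y, x):              # axiom 3: symmetry
            violated.add(3)
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y):     # axiom 4: triangle inequality
            violated.add(4)
    return violated

sample = range(10)
print(check_axioms(lambda x, y: abs(x - y), sample))     # prints set()
print(check_axioms(lambda x, y: (x - y) ** 2, sample))   # prints {4}
```

The squared difference fails only the triangle inequality: going from 0 to 2 directly costs 4, while going via 1 costs 1 + 1 = 2, so the detour through z is cheaper.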

## 3.5.7 Exercises for Section 3.5

### !Exercise 3.5.1: On the space of nonnegative integers, which of the following functions are distance measures? If so, prove it; if not, prove that it fails to satisfy one or more of the axioms.




In [1]:
from collections import OrderedDict

def axiomOne(function, x, y):
    """ d(x,y) >= 0 (no negative distances) """
    # Evaluate the function at (x, y); comparing the function object
    # itself to 0 raises a TypeError in Python 3.
    return function(x, y) >= 0

def axiomTwo(function, x, y):
    """ d(x,y) == 0 if and only if x == y (distances are positive, 
        except for the distance from a point to itself). 
    """
    # Both directions of the "if and only if" must hold.
    return (function(x, y) == 0) == (x == y)

def axiomThree(function, x, y):
    """ d(x,y) == d(y,x) (distance is symmetric). """
    return function(x, y) == function(y, x)

def axiomFour(function, x, y, z):
    """ d(x,y) <= d(x,z) + d(z,y) (the triangle inequality) """
    return function(x, y) <= function(x, z) + function(z, y)


def axiomCheck(function, x, y, z):
    """ Check all four axioms at the single sample (x, y, z).

    A False result disproves the axiom; a True result only means
    that this particular sample did not violate it.
    """
    axioms = {}

    axioms[1] = axiomOne(function, x, y)
    axioms[2] = axiomTwo(function, x, y)
    axioms[3] = axiomThree(function, x, y)
    axioms[4] = axiomFour(function, x, y, z)

    return OrderedDict(sorted(axioms.items()))


In [2]:
# (a) max(x,y) == the larger of x and y

def maxDist(x, y):
    """  max(x,y) == the larger of x and y """
    return max(x, y)

# Axiom 2 fails: max(3, 3) == 3, which is not 0 even though x == y
x = 3
y = 3
z = 3

print("When Axiom Two is applied to the function, it returns {}".format(axiomTwo(maxDist, x,y)))
print("\nPart A function is not a distance measure")

axiomCheck(maxDist, x,y,z)

When Axiom Two is applied to the function, it returns False

Part A function is not a distance measure


OrderedDict([(1, True), (2, False), (3, True), (4, True)])

In [3]:
# (b) diff(x,y) == |x - y| (the absolute magnitude of the difference between x and y).

def diff(x,y):
    """ diff(x,y) == |x - y| 
    (the absolute magnitude of the difference between x and y). 
    """
    return abs(x - y)

x = 10
y = 11
z = 3

print("Check one: {}".format(axiomCheck(diff, x,y,z)))


x = 3
y = 3
z = 3

print("Check two: {}".format(axiomCheck(diff, x,y,z)))

print("\nPart B function is a distance measure")


Check one: OrderedDict([(1, True), (2, True), (3, True), (4, True)])
Check two: OrderedDict([(1, True), (2, True), (3, True), (4, True)])

Part B function is a distance measure


In [4]:
# (c) sum(x,y) == x + y

def sumDist(x,y):
    return x + y

x = 3
y = 3
z = 3

print("Check one: {}".format(axiomCheck(sumDist, x,y,z)))
print("\nPart C function is not a distance measure")


Check one: OrderedDict([(1, True), (2, False), (3, True), (4, True)])

Part C function is not a distance measure
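
The exercise asks for proofs, whereas the cells above only test a few sample points. A sketch of the arguments for parts (a), (b) and (c): a single counterexample settles (a) and (c), and for (b) only the triangle inequality needs work, since axioms 1 to 3 follow directly from the properties of the absolute value.

```latex
\begin{align*}
\text{(a)}\quad & \max(1,1) = 1 \neq 0
  && \text{axiom 2 fails, so } \max \text{ is not a distance measure} \\
\text{(b)}\quad & |x-y| = |(x-z) + (z-y)| \le |x-z| + |z-y|
  && \text{axiom 4 holds, so } |x-y| \text{ is a distance measure} \\
\text{(c)}\quad & 1 + 1 = 2 \neq 0
  && \text{axiom 2 fails, so } x+y \text{ is not a distance measure}
\end{align*}
```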


### Exercise 3.5.2: Find the L1 and L2 distances between the points (5,6,7) and (8,2,4).

In [5]:
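# L1 (Manhattan) and L2 (Euclidean) distances are the r = 1 and r = 2 cases of the Lr-norm

import numpy as np

def lrNorm(x, y, r):
    """ Compute the Lr-norm distance between two points
    
    Args:
        x, y: list of int or float, points of equal dimension
        r: int or float, order of the norm (1 = Manhattan, 2 = Euclidean)
        
    Returns:
        float, Lr-norm distance between x and y
    """
    r_flt = float(r)
    x_arr = np.array(x, np.float64)
    y_arr = np.array(y, np.float64)
    
    # Sum of |x_i - y_i|^r over all dimensions
    sigma = np.sum(np.power(np.absolute(x_arr - y_arr), r_flt))
    
    # The r-th root of that sum is the distance
    return np.power(sigma, 1 / r_flt)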
    
    

In [6]:
x = [5,6,7]
y = [8,2,4]

print("L1 norm: {}".format(lrNorm(x, y, 1)))
print("L2 norm: {}".format(lrNorm(x, y, 2)))

L1 norm: 10.0
L2 norm: 5.83095189485
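
As a cross-check on the printed values, the same arithmetic worked by hand, with the coordinate differences 3, 4 and 3 taken in absolute value:

```latex
\begin{align*}
L_1 &= |5-8| + |6-2| + |7-4| = 3 + 4 + 3 = 10 \\
L_2 &= \sqrt{(5-8)^2 + (6-2)^2 + (7-4)^2} = \sqrt{9 + 16 + 9} = \sqrt{34} \approx 5.831
\end{align*}
```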


### Exercise 3.5.4: Find the Jaccard distances between the following pairs of sets:
- {1,2,3,4} and {2,3,4,5}
- {1,2,3} and {4,5,6}

In [7]:
def jaccardSim(x,y):
    """ Returns Jaccard similarity score 
    
    Args: 
      x, y: list of int or float (duplicates are ignored)
      
    Returns:
      float, jaccard similarity score
    """
    x_set = set(x)
    y_set = set(y)
    
    simScore = float(len(x_set & y_set))/ \
    float(len(x_set | y_set))
    
    return simScore

def jaccardDistance(x,y):
    """ Computes the Jaccard distance """
    return (1 - jaccardSim(x,y))

In [8]:
print("(a) (1,2,3,4) and (2,3,4,5) is jaccard distance: {}".format(jaccardDistance([1,2,3,4],[2,3,4,5])))
print("(b) (1,2,3) and (4,5,6) is jaccard distance: {}".format(jaccardDistance([1,2,3],[4,5,6])))

(a) (1,2,3,4) and (2,3,4,5) is jaccard distance: 0.4
(b) (1,2,3) and (4,5,6) is jaccard distance: 1.0
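
The float results can be confirmed with exact rational arithmetic. This is a minimal sketch using the standard-library `fractions` module; `jaccard_distance_exact` is a hypothetical name, and returning 0 for two empty sets is an assumed convention that the exercise does not address.

```python
from fractions import Fraction

def jaccard_distance_exact(a, b):
    """Jaccard distance 1 - |a & b| / |a | b| as an exact fraction."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty sets are treated as identical.
        return Fraction(0)
    return 1 - Fraction(len(a & b), len(a | b))

print(jaccard_distance_exact({1, 2, 3, 4}, {2, 3, 4, 5}))  # prints 2/5
print(jaccard_distance_exact({1, 2, 3}, {4, 5, 6}))        # prints 1
```

In the first pair the intersection {2,3,4} has 3 elements and the union has 5, so the similarity is 3/5 and the distance is 2/5. The second pair is disjoint, so the distance takes its maximum value of 1.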


### Exercise 3.5.5: Compute the cosines of the angles between each of the following pairs of vectors.
- (a) (3,-1,2) and (-2,3,1)
- (b) (1,2,3) and (2,4,6)
- (c) (5,0,-4) and (-1,-6,2)
- (d) (0,1,1,0,1,1) and (0,0,1,0,0,0)

In [24]:
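def cosineDistance(x,y):
    """ Computes the cosine of the angle between x and y.
    
    Note: this is the cosine, not the angle; the cosine distance
    itself is the angle, i.e. the arccosine of the returned value.
    """
    x_arr = np.array(x, np.float64)
    y_arr = np.array(y, np.float64)
    
    # Dot product divided by the product of the two L2-norms
    cos = np.dot(x_arr, y_arr) / \
    (np.sqrt(np.dot(x_arr, x_arr)) * np.sqrt(np.dot(y_arr, y_arr)))
    return cos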
    


In [25]:
# Example 3.13 

x = [1,2,-1]
y = [2,1,1]

cosineDistance(x,y)

0.50000000000000011

In [26]:
# a 
x = [3,-1,2]; y = [-2,3,1]
cosineDistance(x,y)

-0.5

In [27]:
# b
x = [1,2,3]; y = [2,4,6]
cosineDistance(x,y)

1.0

In [28]:
# c
x = [5,0,-4]; y = [-1,-6,2]
cosineDistance(x,y)

-0.31707317073170732

In [29]:
# d 
x = [0,1,1,0,1,1]; y = [0,0,1,0,0,0]
cosineDistance(x,y)

0.5
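
The exercise asks only for the cosines, but the cosine distance is the angle itself, so it can help to convert. Below is a minimal sketch using the standard-library `math` module; `angle_degrees` is a hypothetical helper, and the clamping step is an added safeguard against floating-point results that land just outside [-1, 1].

```python
import math

def angle_degrees(cosine):
    """Convert a cosine value to the corresponding angle in degrees."""
    # Clamp so rounding error (e.g. 1.0000000000000002) cannot break acos.
    clamped = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(clamped))

# Cosines from parts (a) to (d); -13/41 is the exact value for part (c).
cosines = {"a": -0.5, "b": 1.0, "c": -13 / 41, "d": 0.5}
for label, c in sorted(cosines.items()):
    print(label, angle_degrees(c))
```

The angles come out at roughly 120 degrees for (a), 0 degrees for (b), about 108.5 degrees for (c) and 60 degrees for (d). Part (b) gives 0 because (2,4,6) is a positive multiple of (1,2,3), so the two vectors point in the same direction.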