In [None]:
//boring imports
var ml = require('ml-distance')

# Distances

In the last section we reviewed vectors and looked at their `inner products`, a fundamental measure of similarity.

Another important measure on vectors is distance: essentially, how far apart two points are in $R^N$. This seems straightforward to calculate in Cartesian space.

However, there are various definitions of distance. Each is calculated differently and can have a different impact on the performance of an algorithm.


#### First Up - Building a grid



In [None]:
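var X = [];
var step = 0.01;
var n = Math.round(1 / step); // number of steps along each axis

// build a 2d array of coordinates covering [0, 1] x [0, 1]
// integer counters avoid floating-point drift, which can drop the final point at 1
for (var i = 0; i <= n; i++) {
    var x = [];
    for (var j = 0; j <= n; j++) {
        x.push([i * step, j * step])
    }
    X.push(x)
}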

#### Euclidean Distance

Also known as the `L2 Distance`, this is the well-known distance between two vectors that we all learn in school.

$$d_{a,b} = \sqrt{\sum_{i=0}^{N-1} (a_i - b_i)^2}$$
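
As a quick check of the formula by hand, here is a minimal sketch using two made-up points, `p` and `q` (hypothetical values, chosen so the answer is a whole number). It only illustrates the sum-then-square-root steps.

```javascript
// Hypothetical example points: the legs of a 3-4-5 right triangle
var p = [0, 0];
var q = [3, 4];

// Sum the squared differences in each dimension, then take the square root
var squaredSum = p.reduce(function (acc, pi, i) {
    return acc + Math.pow(pi - q[i], 2);
}, 0);
var euclid = Math.sqrt(squaredSum);

console.log(euclid); // 5
```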

An interesting way to visualise this distance measure is to hold one point, $A$, constant and vary the other over a range. This allows us to plot the distance function.

This distance measure produces a cone-shaped surface that is smooth everywhere except at its tip, the point $A$ itself.

In [None]:
var A = [0.5, 0.5]

// compute the distance over our coordinate grid
var D = X.map(col => {
    return col.map(x => {
        return euclideanDistance(x, A);
    })
});

function euclideanDistance(a, b) {
    var sum = a.reduce((acc, x, i) => {
        return acc + Math.pow(a[i] - b[i], 2);
    }, 0)
    return Math.sqrt(sum);
}


// plot the resulting distance surface
var Plot = require('plotly-notebook-js');

var data = {
    z: D,
    type: "surface"
}

var layout = {
    title: "Euclidean distance 'surface' relative to A[0.5, 0.5]",
    width: 700,
    height: 700
}

$$html$$ = Plot.createPlot([data], layout).render()

#### Manhattan Distance

Also known as the `City Block`, `Taxicab` or `L1 Distance`, this measure sums the absolute differences between coordinates, in parallel to the L1 norm.


$$d_{a,b} = \sum_{i=0}^{N-1} \vert a_i - b_i \vert$$
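
To see the formula at work, here is a small sketch with two made-up points, `p` and `q` (hypothetical values). Note that there is no square root: the result is simply the sum of the absolute differences.

```javascript
// Hypothetical example points
var p = [0, 0];
var q = [3, 4];

// Add up the absolute difference in each dimension: |0 - 3| + |0 - 4|
var cityBlock = p.reduce(function (acc, pi, i) {
    return acc + Math.abs(pi - q[i]);
}, 0);

console.log(cityBlock); // 7
```

For the same pair of points the Euclidean distance is 5, so the two measures can disagree about how far apart two points are.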


In [None]:
// compute the distance over our coordinate grid, reuse the coordinate array from above
var D = X.map(col => {
    return col.map(x => {
        return manhattanDistance(x, A);
    })
});

function manhattanDistance(a, b) {
    var sum = a.reduce((acc, x, i) => {
        return acc + Math.abs(a[i] - b[i]);
    }, 0)
    // the L1 distance is the plain sum; no square root is taken
    return sum;
}


// plot the resulting distance surface
var Plot = require('plotly-notebook-js');

var data = {
    z: D,
    type: "surface"
}

var layout = {
    title: "Manhattan distance 'surface' relative to A[0.5, 0.5]",
    width: 700,
    height: 700
}

$$html$$ = Plot.createPlot([data], layout).render()

And there are lots more. Check out all of the options in the `ml-distance` [docs](https://www.npmjs.com/package/ml-distance).


So let's try plotting some. Use the grid and plotting code above (make a copy of the cell if you like) to see what the error / distance / cost surfaces of some of these look like. Interesting ones might be:

 - chebyshev
 - kullbackLeibler


## Similarity

There are also some similarity measures at the bottom of that list.


Why do you think Similarities might be different from distances?

Try some out!
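
As one starting point, here is a hand-written sketch of cosine similarity, which is built from the inner product of the last section, next to a Euclidean distance. The function names `cosineSimilaritySketch` and `euclideanSketch` are made up for this example and are not the library's own code.

```javascript
// Cosine similarity: the inner product divided by the product of the vector lengths
function cosineSimilaritySketch(a, b) {
    var dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Euclidean distance, for comparison
function euclideanSketch(a, b) {
    var sum = a.reduce(function (acc, ai, i) {
        return acc + Math.pow(ai - b[i], 2);
    }, 0);
    return Math.sqrt(sum);
}

var u = [3, 4];

// Identical vectors: the similarity is at its largest, the distance at its smallest
console.log(cosineSimilaritySketch(u, u)); // 1
console.log(euclideanSketch(u, u)); // 0

// Perpendicular vectors have a cosine similarity of 0
console.log(cosineSimilaritySketch([1, 0], [0, 1])); // 0
```

Notice that the two scales run in opposite directions: a distance shrinks towards 0 as vectors become more alike, while this similarity grows towards 1.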


## Further Reading

 - [5 most popular similarity measures in python](http://dataaspirant.com/2015/04/11/five-most-popular-similarity-measures-implementation-in-python/)
 - [excellent cross validated stackexchange question](https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications)

In [None]:
// note the distance library has a funny export structure:
// distances and similarities live under separate keys
console.log('ml.distance', Object.keys(ml.distance))
console.log('ml.similarity', Object.keys(ml.similarity))