### Reviewing our Nearest Neighbors Function

Previously, we used the Pythagorean Theorem to calculate distances between individuals, and ultimately wrote a nearest neighbors function to return a list of the closest neighbors to a given individual.

Once again, here were the locations of Bob and our customers:

| Name | Avenue #| Block # | 
|------|------| ------     |
| Bob    | 4  |     8     | 
| Suzie  | 1  |     11     | 
| Fred   | 5  |     8     | 
| Edgar  | 6  |     13     | 
| Steven | 3  |     6     | 
| Natalie| 5  |     4     | 

And we can represent these individuals in Python as follows:

In [41]:
neighbors = [{'name': 'Bob', 'x': 4, 'y': 8}, {'name': 'Suzie', 'x': 1, 'y': 11}, 
             {'name': 'Fred', 'x': 5, 'y': 8}, {'name': 'Edgar', 'x': 6, 'y': 13},{'name': 'Steven', 'x': 3, 'y': 6}, {'name': 'Natalie', 'x': 5, 'y': 4}]
selected_individual = neighbors[0]
neighbor = neighbors[1]

To write a `nearest_neighbors` function, we broke this into steps.  

1. First, we wrote a function that calculated the distance between one individual and another.  This function is a translation of our Pythagorean Theorem, which says that given a first individual with coordinates $(x_{1}, y_{1})$, and a second individual with coordinates $(x_{2}, y_{2})$, then $distance = \sqrt{(x_{2} - x_{1})^2 + (y_{2} - y_{1})^2}$.

In [31]:
import math

def distance(selected_individual, neighbor):
    # Pythagorean Theorem: sum the squared differences in x and y, then take the square root
    distance_squared = (neighbor['x'] - selected_individual['x'])**2 + (neighbor['y'] - selected_individual['y'])**2
    return math.sqrt(distance_squared)

def distance_between_neighbors(selected_individual, neighbor):
    neighbor_with_distance = neighbor.copy()
    neighbor_with_distance['distance'] = distance(selected_individual, neighbor)
    return neighbor_with_distance

In [42]:
distance_between_neighbors(selected_individual, neighbor)

{'distance': 4.242640687119285, 'name': 'Suzie', 'x': 1, 'y': 11}
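
As a quick check of that output by hand, plugging Bob's coordinates $(4, 8)$ and Suzie's coordinates $(1, 11)$ into the formula gives:

$distance = \sqrt{(1 - 4)^2 + (11 - 8)^2} = \sqrt{9 + 9} = \sqrt{18} \approx 4.24$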

2. Second, we moved on to calculating the distance between a `selected_individual` and all of the other neighbors.  We do this by calling our `distance_between_neighbors` function with the `selected_individual` and each of the remaining neighbors.  Once we have calculated all of the distances, we sort the neighbors by their distance from the `selected_individual` and select the closest ones.

In [38]:
def distance_all(selected_individual, neighbors):
    remaining_neighbors = filter(lambda neighbor: neighbor != selected_individual, neighbors)
    return list(map(lambda neighbor: distance_between_neighbors(selected_individual, neighbor), remaining_neighbors))

In [39]:
def nearest_neighbors(selected_individual, neighbors, number = None):
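    # Default to every other neighbor; an explicit check keeps number=0 from being treated as "all"
    if number is None:
        number = len(neighbors) - 1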
    neighbor_distances = distance_all(selected_individual, neighbors)
    sorted_neighbors = sorted(neighbor_distances, key=lambda neighbor: neighbor['distance'])
    return sorted_neighbors[:number]

In [43]:
nearest_neighbors(selected_individual, neighbors)

[{'distance': 1.0, 'name': 'Fred', 'x': 5, 'y': 8},
 {'distance': 2.23606797749979, 'name': 'Steven', 'x': 3, 'y': 6},
 {'distance': 4.123105625617661, 'name': 'Natalie', 'x': 5, 'y': 4},
 {'distance': 4.242640687119285, 'name': 'Suzie', 'x': 1, 'y': 11},
 {'distance': 5.385164807134504, 'name': 'Edgar', 'x': 6, 'y': 13}]

In [44]:
nearest_neighbors(selected_individual, neighbors, 2)

[{'distance': 1.0, 'name': 'Fred', 'x': 5, 'y': 8},
 {'distance': 2.23606797749979, 'name': 'Steven', 'x': 3, 'y': 6}]

### Nearest by Interest

So far, we have used our nearest neighbors formula to calculate physical distances between individuals.  Data scientists extend this function to find individuals who are close not just physically, but by many other attributes.

For example, imagine that we are recommending movies for friends.  Our technique is to assess whose tastes are most similar, so that we can look to those nearby individuals when recommending a movie.  If two friends have rated movies similarly in the past, we can predict that they will rate another movie similarly.  We can see which friends have interests that are close to each other through our nearest neighbors formula.

Ok, let's start with some ratings of the movie, *Hunger Games: Catching Fire*.

| Name | Hunger Games: Catching Fire | 
|------|---------------------------------| 
| Bob    | 3  |     
| Suzie  | 1  |     
| Fred   | 5  |     
| Edgar  | 2  |     
| Steven | 4  |     
| Natalie| 4.5  |     

Let's plot out these ratings.

![Customers and Bob](./Hunger-Games-Ratings.png "Customers and Bob")
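
Since each person has only one rating so far, the distance between two people is simply the absolute difference between their ratings.  As a minimal sketch (the `ratings` dictionary and the `rating_gaps` helper are illustrative names, not part of the lesson's code), we can list everyone's gap from Natalie:

```python
# Ratings of "Hunger Games: Catching Fire", copied from the table above.
ratings = {'Bob': 3, 'Suzie': 1, 'Fred': 5, 'Edgar': 2, 'Steven': 4, 'Natalie': 4.5}

def rating_gaps(selected_name, ratings):
    """Return (name, gap) pairs for everyone else, smallest gap first."""
    selected_rating = ratings[selected_name]
    gaps = [(name, abs(rating - selected_rating))
            for name, rating in ratings.items() if name != selected_name]
    return sorted(gaps, key=lambda pair: pair[1])

natalie_gaps = rating_gaps('Natalie', ratings)
print(natalie_gaps)
# [('Fred', 0.5), ('Steven', 0.5), ('Bob', 1.5), ('Edgar', 2.5), ('Suzie', 3.5)]
```

Fred and Steven are each half a point away from Natalie, which matches the grouping visible in the plot.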

Just by looking at this graph, we can see that Steven, Natalie and Fred are grouped closest together.  So if we were assessing similarity in movie taste by Hunger Games ratings alone, then to recommend a movie to either Steven or Fred, we would turn to a movie that Natalie likes.  Now let's add in a second movie:

| Name | Hunger Games: Catching Fire | Toy Story 3 | 
|------|---------------------------------| ------ |
| Bob    | 3  |     4     | 
| Suzie  | 1  |     4     | 
| Fred   | 5  |     4     | 
| Edgar  | 2  |     5     | 
| Steven | 4  |     2     | 
| Natalie| 4.5  |     2     | 

Just like in previous lessons, we have two-dimensional data, and given one individual, we can calculate distances from our selected individual to the rest of the individuals.

![Customers and Bob](./ratings-2d.png "Customers and Bob")

Our data would be represented as the following in Python:

In [61]:
neighbors = [{'name': 'Bob', 'HungerGames': 3, 'ToyStory': 4}, {'name': 'Suzie', 'HungerGames': 1, 'ToyStory': 4}, 
{'name': 'Fred', 'HungerGames': 5, 'ToyStory': 4}, {'name': 'Edgar', 'HungerGames': 2, 'ToyStory': 5},
{'name': 'Steven', 'HungerGames': 4, 'ToyStory': 2}, {'name': 'Natalie', 'HungerGames': 4.5, 'ToyStory': 2}]

Our previous methods almost work perfectly.  The only problem is that our previous attributes were stored as `x` and `y` values, whereas now we have stored the attributes as `"HungerGames"` and `"ToyStory"`.  So let's modify our distance formula, which previously was the following:

```python
def distance(selected_individual, neighbor):
    distance_squared = (neighbor['x'] - selected_individual['x'])**2 + (neighbor['y'] - selected_individual['y'])**2
    return math.sqrt(distance_squared)
```

To adapt it, we swap the `'x'` and `'y'` keys for `'HungerGames'` and `'ToyStory'`:

In [74]:
def distance(selected_individual, neighbor):
    # Same formula as before, with the rating keys in place of 'x' and 'y'
    distance_squared = (neighbor['HungerGames'] - selected_individual['HungerGames'])**2 + (neighbor['ToyStory'] - selected_individual['ToyStory'])**2
    return math.sqrt(distance_squared)

In [69]:
selected_individual = neighbors[0]
neighbor = neighbors[1]
distance(selected_individual, neighbor)

2.0

In [67]:
nearest_neighbors(selected_individual, neighbors, 3)

[{'HungerGames': 2,
  'ToyStory': 5,
  'distance': 1.4142135623730951,
  'name': 'Edgar'},
 {'HungerGames': 1, 'ToyStory': 4, 'distance': 2.0, 'name': 'Suzie'},
 {'HungerGames': 5, 'ToyStory': 4, 'distance': 2.0, 'name': 'Fred'}]

### Increasing the Dimensions

So far, we have worked with two-dimensional data, that is, data that we plot on an x and a y axis.  This makes sense when comparing individuals by physical distance; with interests, however, we can increase the number of dimensions.  For example, let's add a third movie to the mix, and then see how nearest neighbors applies.

| Name | Hunger Games: Catching Fire | Toy Story 3 | Frozen |
|------|---------------------------------| ------ | ------ |
| Bob    | 3  |     4     | 1 |
| Suzie  | 1  |     4     | 4 |
| Fred   | 5  |     4     | 1 |
| Edgar  | 2  |     5     | 5 |
| Steven | 4  |     2     | 4 |
| Natalie| 4.5  |     2     | 3 |

In [73]:
from IPython.display import IFrame
IFrame('movies-3d.html', width=1200, height=800)

You can see that the above graph displays the data in three dimensions.  Now how do you calculate the distance between any two points?  Well, again it is just the length of the straight line between the two points.

For example, consider the distance between Fred and Natalie.  The diagonal line between Fred and Natalie is the distance between the two points, and that diagonal line is itself the hypotenuse of a right triangle.  The formula for calculating a hypotenuse in three dimensions is the following:

$distance = \sqrt{(x_{2} - x_{1})^2 + (y_{2} - y_{1})^2 + (z_{2} - z_{1})^2}$

You can watch the [following video](https://www.youtube.com/watch?v=Yi1jYlCzU4E), if you'd like to see why.
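
Applying this to Fred and Natalie, and treating the Hunger Games rating as $x$, the Toy Story 3 rating as $y$ and the Frozen rating as $z$ (this assignment of movies to axes is our own choice for illustration), Fred sits at $(5, 4, 1)$ and Natalie at $(4.5, 2, 3)$, so:

$distance = \sqrt{(4.5 - 5)^2 + (2 - 4)^2 + (3 - 1)^2} = \sqrt{0.25 + 4 + 4} = \sqrt{8.25} \approx 2.87$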

Once we get to a fourth movie, however, our brains can no longer visualize the space.  But that doesn't mean that our distance formula breaks down.  You simply follow the pattern.

So this would be distance in four dimensions.

$distance = \sqrt{(x_{2} - x_{1})^2 + (y_{2} - y_{1})^2 + (z_{2} - z_{1})^2 + (w_{2} - w_{1})^2}$


And so on.  Or, to be formal, the square of the distance between two points is equal to the sum, from the first coordinate to the last coordinate, of the squares of the distances in each dimension.
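
One way to express that pattern in code is to sum the squared differences over a list of attribute names, so the same function works for two, three, or more movies.  The sketch below is only one possible approach: the names `ratings_3d`, `distance_nd`, `nearest_neighbors_nd` and the `'Frozen'` key are our own illustrative choices, not part of the lesson's earlier code.

```python
import math

# Ratings copied from the three-movie table above. The 'Frozen' key is an
# assumed name that follows the 'HungerGames' / 'ToyStory' convention.
ratings_3d = [
    {'name': 'Bob', 'HungerGames': 3, 'ToyStory': 4, 'Frozen': 1},
    {'name': 'Suzie', 'HungerGames': 1, 'ToyStory': 4, 'Frozen': 4},
    {'name': 'Fred', 'HungerGames': 5, 'ToyStory': 4, 'Frozen': 1},
    {'name': 'Edgar', 'HungerGames': 2, 'ToyStory': 5, 'Frozen': 5},
    {'name': 'Steven', 'HungerGames': 4, 'ToyStory': 2, 'Frozen': 4},
    {'name': 'Natalie', 'HungerGames': 4.5, 'ToyStory': 2, 'Frozen': 3},
]

def distance_nd(selected_individual, neighbor, attributes):
    """Distance over however many attributes are listed."""
    distance_squared = sum(
        (neighbor[attribute] - selected_individual[attribute]) ** 2
        for attribute in attributes
    )
    return math.sqrt(distance_squared)

def nearest_neighbors_nd(selected_individual, neighbors, attributes, number=None):
    """Return the closest `number` neighbors (all of them when number is None)."""
    with_distances = []
    for neighbor in neighbors:
        if neighbor == selected_individual:
            continue  # do not compare the selected individual with itself
        neighbor_with_distance = neighbor.copy()
        neighbor_with_distance['distance'] = distance_nd(
            selected_individual, neighbor, attributes)
        with_distances.append(neighbor_with_distance)
    sorted_neighbors = sorted(with_distances, key=lambda n: n['distance'])
    return sorted_neighbors[:number]  # slicing with None keeps the whole list

movies = ['HungerGames', 'ToyStory', 'Frozen']
closest = nearest_neighbors_nd(ratings_3d[0], ratings_3d, movies, 2)
print([(n['name'], round(n['distance'], 2)) for n in closest])
# [('Fred', 2.0), ('Natalie', 3.2)]
```

Passing `['HungerGames', 'ToyStory']` as `attributes` reproduces the two-dimensional distances, and adding a fourth movie only requires a new key in each dictionary and one more entry in the list.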

### Summary

In this lesson, we reviewed the nearest neighbors function and saw how it works: we calculate each distance using the Pythagorean Theorem and then sort the neighbors by that distance.

Then we saw that the nearest neighbors technique has applications beyond calculating physical distance, and can also be used to relate individuals by preferences.  With each new attribute, another axis or dimension is added and the points are plotted.  Calculating nearest neighbors in more than two dimensions is similar to our previous technique, with the only required modification being to the distance formula.