## Problem 3 (1pt)

Implement `find_closest` which takes:

1. A `location`
2. A sequence of `centroids` (locations)

...and returns the element of `centroids` closest to `location`.

Use the `distance` function from `utils.py` to measure distance between locations. The `distance` function calculates the Euclidean distance between 2 locations.
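For two `[x, y]` pairs, the Euclidean distance is the square root of the sum of squared coordinate differences. The stand-in below is written only for illustration; the actual `utils.py` implementation may differ in detail:

```python
from math import sqrt

def distance(pos1, pos2):
    """Euclidean distance between two [x, y] pairs (illustrative stand-in for utils.distance)."""
    return sqrt((pos1[0] - pos2[0]) ** 2 + (pos1[1] - pos2[1]) ** 2)

print(distance([0, 0], [3, 4]))      # 5.0
print(distance([-2, 7], [-3.5, 9]))  # 2.5
```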

### `python3 ok -q 03 -u` quiz

Q: Which of the following types of values can be passed as
an argument to distance?
Choose the number of the correct choice:

0. pair; e.g. `[1, 1]`
1. restaurant; e.g. `make_restaurant('A', [1, 1], ['Food'], 1, [])`
2. number; e.g. 1
3. string of a pair; e.g. `'[1, 1]'`

**Answer**: 0. This can be seen from the `distance` function doctest in `utils.py`.

Q: Consider the list `l = [[4, 1], [-3, 2], [5, 0]]`. Which of
the choices below for `fn` would make `min(l, key=fn)` evaluate
to `[4, 1]`?
Choose the number of the correct choice:

0. `lambda x, y: abs(x - y)`
1. `sum`
2. `lambda x, y: pow(-x, y)`
3. `lambda x: abs(x[0] - x[1])`

**Answer**: 3
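Choices 0 and 2 take two parameters, but `min` calls its `key` function with one element at a time, so they would raise a `TypeError`. The snippet below evaluates the two one-argument choices on `l` to show which element each key function selects:

```python
l = [[4, 1], [-3, 2], [5, 0]]

# min calls the key function once per element and compares the resulting values
print([abs(x[0] - x[1]) for x in l])           # [3, 5, 5]
print(min(l, key=lambda x: abs(x[0] - x[1])))  # [4, 1]

# key=sum produces keys [5, -1, 5], so a different element wins
print(min(l, key=sum))                         # [-3, 2]
```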

In [None]:
>>> import tests.test_functions as test
>>> from recommend import *
>>> distance([0, 0], [3, 4]) # should be a decimal
5.0

In [None]:
>>> distance([6, 1], [6, 1]) # should be a decimal
0.0

In [None]:
>>> distance([-2, 7], [-3.5, 9]) # should be a decimal
2.5

In [4]:
>>> import tests.test_functions as test
>>> from recommend import *
>>> find_closest([6, 1],
...              [[1, 5], [3, 3]])
[3, 3]


In [None]:
>>> find_closest([1, 6],
...              [[1, 5], [3, 3]])
[1, 5]

In [None]:
>>> find_closest([0, 0],
...              [[-2, 0], [2, 0]])
[-2, 0]

In [None]:
>>> find_closest([0, 0],
...              [[1000, 1000]])
[1000, 1000]

In [None]:
>>> import tests.test_functions as test
>>> from recommend import *
>>> # Be sure to use the distance function!
>>> find_closest([0, 0],
...              [[2, 2], [0, 3]])
[2, 2]

In [None]:
>>> find_closest([0, 0],
...              [[5, 5], [2, 7]])
[5, 5]

### Implementation of Problem 3

Looking at the `ok` quiz, `centroids` is a list of `[x, y]` coordinate pairs, and we want the `centroid` that is closest to `location`. This can be done in one line with `min`, passing a `key` function that measures each centroid's distance from `location`.

In [None]:
from utils import distance  # Euclidean distance helper provided in utils.py

def find_closest(location, centroids):
    return min(centroids, key=lambda centroid: distance(location, centroid))

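With the implementation above:
* For every `centroid` in `centroids`, `min` calls the `key` function, which computes the distance between `location` and that `centroid`.
* `min` returns the centroid with the smallest distance. If several centroids are equally close, `min` returns the first of them, which is why `find_closest([0, 0], [[-2, 0], [2, 0]])` returns `[-2, 0]`.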
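For intuition, here is a hypothetical loop-based equivalent of the one-line `min` version (the name `find_closest_loop` and the local `distance` stand-in are illustrative, not part of the project). The strict `<` comparison keeps the earlier centroid on ties, matching `min`. One difference: for an empty `centroids` it returns `None`, whereas `min` raises `ValueError`.

```python
from math import sqrt

def distance(pos1, pos2):
    # Stand-in for utils.distance, assumed to be Euclidean as described above
    return sqrt((pos1[0] - pos2[0]) ** 2 + (pos1[1] - pos2[1]) ** 2)

def find_closest_loop(location, centroids):
    """Hypothetical explicit-loop version of find_closest."""
    best, best_dist = None, None
    for centroid in centroids:
        d = distance(location, centroid)
        # Strict < means a later centroid must be strictly closer to replace the current best
        if best_dist is None or d < best_dist:
            best, best_dist = centroid, d
    return best

print(find_closest_loop([6, 1], [[1, 5], [3, 3]]))   # [3, 3]
print(find_closest_loop([0, 0], [[-2, 0], [2, 0]]))  # [-2, 0] (tie: first one wins)
```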

## Problem 4

The function `group_by_centroid` takes in:
* A sequence of `restaurants`
    * Not only the location, but all attributes of a restaurant (e.g. name, reviews, etc.)
* A sequence of `centroids` (locations)

...and returns a list of `clusters`. Each cluster is a list of restaurants that are all closest to the same centroid.

If a restaurant is equally close to 2 centroids, associate it with the centroid that appears first in the sequence of `centroids`.

### `python ok -q 04 -u` quiz

Q: If centroids is [[-1, 1], [5, -1], [1, 10], [-1, -10]],
to which centroid will [6, 0] be associated?
Choose the number of the correct choice:

0. [-1, 1]
1. [5, -1]
2. [1, 10]
3. [-1, -10]

**Ans**: 1

Q: If centroids is [[1, 1], [1, -1], [-1, 1], [-1, -1]],
to which centroid will [0, 0] be associated?
Choose the number of the correct choice:

0. [1, -1]
1. [-1, 1]
2. [-1, -1]
3. [1, 1]

**Ans**: 3
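Both quiz answers can be double-checked by computing the distances directly. This sketch uses `math.dist` (Python 3.8+) as a stand-in for `utils.distance`; the helper name `closest` is hypothetical:

```python
from math import dist  # available in Python 3.8+

def closest(location, centroids):
    # Same idea as find_closest: min returns the first centroid with the smallest distance
    return min(centroids, key=lambda c: dist(location, c))

# Quiz 1: [5, -1] is about 1.41 away; every other centroid is more than 7 away
print(closest([6, 0], [[-1, 1], [5, -1], [1, 10], [-1, -10]]))  # [5, -1]

# Quiz 2: all four centroids are exactly the same distance from [0, 0], so the first one wins
corners = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
print([dist([0, 0], c) for c in corners])  # four identical values
print(closest([0, 0], corners))            # [1, 1]
```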

### Implementation of Problem 4

The hint suggests using the provided `group_by_first` function, which groups together all values that share the same key in a list of `[key, value]` pairs.

In [None]:
def group_by_first(pairs):
    """Return a list of pairs that relates each unique key in the [key, value]
    pairs to a list of all values that appear paired with that key.

    Arguments:
    pairs -- a sequence of pairs

    >>> example = [ [1, 2], [3, 2], [2, 4], [1, 3], [3, 1], [1, 2] ]
    >>> group_by_first(example)
    [[2, 3, 2], [2, 1], [4]]
    """
    keys = []
    for key, _ in pairs:
        if key not in keys:
            keys.append(key)
    return [[y for x, y in pairs if x == key] for key in keys]

In the doctest above, the first pair in the `example` sequence is `[1, 2]`. The function collects the values of all pairs whose key is `1` (`[1, 2]`, `[1, 3]`, and the final `[1, 2]`), in order, which gives `[2, 3, 2]`.

The next new key is `3`, from `[3, 2]`. The only other pair with key `3` is `[3, 1]`, so the second group is `[2, 1]`.

The pair `[2, 4]` is the only pair with key `2`, so the last group is simply `[4]`. Note that the keys themselves do not appear in the output; the groups are ordered by when each key first appears.
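One detail matters when feeding `group_by_first`: it loops over `pairs` once to collect the keys and then once more per key, so it needs a re-iterable sequence such as a list. Python's built-in `zip` returns a one-shot iterator, which the first loop exhausts. The sketch below uses made-up keys and values to show the difference:

```python
def group_by_first(pairs):
    # Copy of the provided helper; note that it iterates over `pairs` once per key
    keys = []
    for key, _ in pairs:
        if key not in keys:
            keys.append(key)
    return [[y for x, y in pairs if x == key] for key in keys]

sample_keys = [1, 1, 2]
sample_values = ['a', 'b', 'c']

# The built-in zip is exhausted by the first loop, so every group comes back empty
print(group_by_first(zip(sample_keys, sample_values)))        # [[], []]
# Materializing the pairs in a list lets every pass see all of them
print(group_by_first(list(zip(sample_keys, sample_values))))  # [['a', 'b'], ['c']]
```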

Pseudocode:

1. For each restaurant, find the closest `centroid` using the `find_closest` function.
2. Using the `zip` function, pair each closest centroid (the key) with its restaurant (the value).
3. Group the pairs from step 2 using `group_by_first`.

In [None]:
def group_by_centroid(restaurants, centroids):
    # Obtain each restaurant's location with the restaurant_location selector
    restaurant_locations = [restaurant_location(restaurant) for restaurant in restaurants]
    # Find the closest centroid for each restaurant location
    closest_centroids = [find_closest(location, centroids) for location in restaurant_locations]
    # Pair each closest centroid (the key) with its restaurant (the value).
    # list() matters: group_by_first loops over the pairs more than once,
    # and the built-in zip returns a one-shot iterator
    zipped_pairs = list(zip(closest_centroids, restaurants))
    # Group the restaurants by their closest centroid
    return group_by_first(zipped_pairs)
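As a self-contained check of the logic, here is a sketch that builds the `[centroid, restaurant]` pairs with a list comprehension instead of `zip`. It assumes a hypothetical restaurant representation of `[name, location]` with a matching stand-in `restaurant_location`, and uses `math.dist` in place of `utils.distance`. Restaurant `D` is equally close to both centroids, so it joins the one that appears first:

```python
from math import dist

# Hypothetical stand-in: a "restaurant" here is just [name, location]
def restaurant_location(restaurant):
    return restaurant[1]

def find_closest(location, centroids):
    return min(centroids, key=lambda centroid: dist(location, centroid))

def group_by_first(pairs):
    keys = []
    for key, _ in pairs:
        if key not in keys:
            keys.append(key)
    return [[y for x, y in pairs if x == key] for key in keys]

def group_by_centroid(restaurants, centroids):
    # Build [closest centroid, restaurant] pairs directly as a list
    pairs = [[find_closest(restaurant_location(r), centroids), r] for r in restaurants]
    return group_by_first(pairs)

restaurants = [['A', [0, 1]], ['B', [9, 9]], ['C', [1, 0]], ['D', [5, 5]]]
centroids = [[0, 0], [10, 10]]
print(group_by_centroid(restaurants, centroids))
# [[['A', [0, 1]], ['C', [1, 0]], ['D', [5, 5]]], [['B', [9, 9]]]]
```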

## Problem 5

The `find_centroid` function takes in a `cluster` (a list of restaurants) and returns its `centroid`: a location made of the `mean` of the restaurants' latitudes (first number) and the `mean` of their longitudes (second number).

### `python ok -q 05 -u` quiz

In [None]:
>>> from recommend import *
>>> cluster1 = [
...     make_restaurant('A', [-3, -4], [], 3, [make_review('A', 2)]),
...     make_restaurant('B', [1, -1],  [], 1, [make_review('B', 1)]),
...     make_restaurant('C', [2, -4],  [], 1, [make_review('C', 5)]),
... ]
>>> find_centroid(cluster1) # should be a pair of decimals
[0.0, -3.0]

### Implementation of Problem 5

Pseudocode:

1. Access each restaurant's location using the `restaurant_location` selector.
2. Calculate the `mean` of all the latitudes and, separately, the `mean` of all the longitudes.
3. Return the result as a list `[mean of latitudes, mean of longitudes]`.

In [None]:
def find_centroid(cluster):
    # Obtain the location of each restaurant using the restaurant_location selector
    locations = [restaurant_location(restaurant) for restaurant in cluster]
    # Latitudes are at index 0 of each location
    latitudes = [location[0] for location in locations]
    # Longitudes are at index 1 of each location
    longitudes = [location[1] for location in locations]
    # Return the mean latitude and mean longitude as a [latitude, longitude] pair
    return [mean(latitudes), mean(longitudes)]
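An equivalent, more compact variant transposes the locations with `zip(*locations)`, which yields all latitudes and then all longitudes. The sketch below uses hypothetical stand-ins for `restaurant_location` (restaurants as `[name, location]`) and `mean` (assumed to be the arithmetic mean); it reproduces the `cluster1` example from the quiz:

```python
# Hypothetical stand-ins for the project's selector and helper
def restaurant_location(restaurant):
    return restaurant[1]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def find_centroid(cluster):
    locations = [restaurant_location(r) for r in cluster]
    # zip(*locations) transposes [[lat, lon], ...] into (all lats), (all lons)
    return [mean(coords) for coords in zip(*locations)]

cluster1 = [['A', [-3, -4]], ['B', [1, -1]], ['C', [2, -4]]]
print(find_centroid(cluster1))  # [0.0, -3.0]
```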

## Problem 6

Q: What are we using the k-means algorithm to achieve?
Choose the number of the correct choice:

0. Predicting the ratings for k restaurants.
1. Grouping the restaurants into k clusters by location.
2. Finding the mean rating of restaurants for k categories.

**Ans**: 1

Q: What is the first step of the k-means algorithm?
Choose the number of the correct choice:

0. Create a cluster for each centroid consisting of all elements closest to
   that centroid.
1. Find the centroid (average position) of each cluster.
2. Randomly initialize k centroids.

**Ans**: 2

Q: After we randomly initialize k centroids, what is the first step
of the iterative portion of the k-means algorithm?
Choose the number of the correct choice:

0. Create a cluster for each centroid consisting of all elements closest to
   that centroid.
1. Group restaurants by latitude.
2. Find the centroid (average position) of each cluster.
3. Randomly reassign centroids.

**Ans**: 0

Q: What is the second step of the iterative portion of the k-means
algorithm?
Choose the number of the correct choice:

0. Find the centroid (average position) of each cluster.
1. Randomly reassign centroids.
2. Group restaurants by latitude.
3. Create a cluster for each centroid consisting of all elements closest to
   that centroid.

**Ans**: 0

### Implementation of Problem 6

The `k_means` function comes with the following skeleton code:

In [None]:
assert len(restaurants) >= k, 'Not enough restaurants to cluster'
old_centroids, n = [], 0
# Select initial centroids randomly by choosing k different restaurants
centroids = [restaurant_location(r) for r in sample(restaurants, k)]

while old_centroids != centroids and n < max_updates:
    old_centroids = centroids 
    # BEGIN Question 6
    
    
    # END Question 6
    n += 1
    # The cycle stops once the centroids don't change anymore
return centroids

From the provided code above, we get the following information:

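1. `n` counts the iterations of the `while` loop. The loop stops when `n` reaches `max_updates`, or earlier if an update leaves the centroids unchanged (`old_centroids == centroids`).
2. `centroids` is a list of locations, i.e. a list of `[latitude, longitude]` pairs (e.g. `[[1, 3], [5, -2]]`).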

Pseudocode:

1. Group `restaurants` into clusters, where each cluster contains all restaurants closest to the same centroid, by calling `group_by_centroid(restaurants, centroids)`.
2. Find the centroid of each cluster using `find_centroid`.
3. Collect the centroids from step 2 into a single list and bind it to the variable `centroids`, so that the next loop check can compare it with `old_centroids`.

In [None]:
# Inside the while loop (between BEGIN and END Question 6):
# Group restaurants into clusters around the current centroids
clusters = group_by_centroid(restaurants, centroids)
# Find the new centroid of each cluster and bind the list to centroids
centroids = [find_centroid(cluster) for cluster in clusters]
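Putting the pieces together, below is a self-contained sketch of the whole loop. It is not the project's code: it clusters plain `[x, y]` points instead of restaurants, uses `math.dist` in place of `utils.distance`, and inlines simplified helpers. The sample points form two well-separated groups, so every random initialization converges to the same centroids:

```python
from math import dist
from random import sample

def find_closest(location, centroids):
    return min(centroids, key=lambda c: dist(location, c))

def group_by_first(pairs):
    keys = []
    for key, _ in pairs:
        if key not in keys:
            keys.append(key)
    return [[y for x, y in pairs if x == key] for key in keys]

def group_by_centroid(points, centroids):
    # Pair each point with its closest centroid, then group by centroid
    return group_by_first([[find_closest(p, centroids), p] for p in points])

def find_centroid(cluster):
    # Mean of each coordinate across the cluster
    return [sum(coords) / len(coords) for coords in zip(*cluster)]

def k_means(points, k, max_updates=100):
    """Sketch of k-means on plain [x, y] points."""
    assert len(points) >= k, 'Not enough points to cluster'
    old_centroids, n = [], 0
    # Initial centroids: k distinct points chosen at random
    centroids = sample(points, k)
    while old_centroids != centroids and n < max_updates:
        old_centroids = centroids
        clusters = group_by_centroid(points, centroids)   # step 1: assign points
        centroids = [find_centroid(c) for c in clusters]  # step 2: recompute centroids
        n += 1
    return centroids

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(sorted(k_means(points, 2)))  # [[0.0, 0.5], [10.0, 10.5]]
```

Sorting the result makes the check independent of the random order of the initial centroids.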