# KNeighborsClassifier from scratch

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Algorithm
**Input:**  
- `X`: an array of shape `(N,d)` whose rows are samples and columns are features
- `y`: the labels of shape `(N,)`
- `k`: the number of neighbors that vote (a sample that belongs to `X` counts as its own neighbor)
- `algorithm`: `'brute'`, `'ball_tree'`, or `'kd_tree'`

**Output:**  
A tuple `(predict, k_nearest_neighbors)`.  
- `predict`: a function that takes data `X_sample` and outputs their predicted labels
- `k_nearest_neighbors`: a function that takes data `X_sample` and returns an array of shape `(X_sample.shape[0], k)` that stores, for each row of `X_sample`, the indices of its `k` nearest neighbors in `X`

**Steps:**
1. If `algorithm=="brute"`, create the function `k_nearest_neighbors` using the distance matrix between `X_sample` and `X`.  
2. If `algorithm=="ball_tree"` or `algorithm=="kd_tree"`, create the function `k_nearest_neighbors` using `sklearn.neighbors.NearestNeighbors` with the corresponding algorithm.
3. Create the function `predict` that executes the following steps:
    1. Input `X_sample` .
    2. Let `nbrhoods = k_nearest_neighbors(X_sample)` .  
    3. Let `votes = y[nbrhoods]` .
    4. Calculate the most frequent label in each row of `votes` and store the results in `y_new` .
    5. Return `y_new` .
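The steps above can be sketched for the `'brute'` branch alone. This is a minimal sketch, not a reference solution: the helper name `knn_brute_sketch` is hypothetical, the tree-based branches (which would delegate to `sklearn.neighbors.NearestNeighbors`) are omitted, and ties in the vote are assumed to go to the smallest label.

```python
import numpy as np

def knn_brute_sketch(X, y, k):
    """Hypothetical helper covering only the 'brute' branch: returns (predict, k_nearest_neighbors)."""
    X = np.asarray(X, dtype=float)
    classes, y_codes = np.unique(y, return_inverse=True)  # map labels to codes 0..C-1

    def k_nearest_neighbors(X_sample):
        X_sample = np.asarray(X_sample, dtype=float)
        # distance matrix of shape (X_sample.shape[0], N), built by broadcasting
        dist = np.linalg.norm(X_sample[:, np.newaxis, :] - X[np.newaxis, :, :], axis=2)
        # indices of the k smallest distances in each row, nearest first
        return np.argsort(dist, axis=1, kind="stable")[:, :k]

    def predict(X_sample):
        nbrhoods = k_nearest_neighbors(X_sample)  # shape (M, k)
        votes = y_codes[nbrhoods]                 # label codes of the neighbors, shape (M, k)
        # count how often each class code appears in each row: shape (M, C)
        counts = (votes[:, :, np.newaxis] == np.arange(len(classes))).sum(axis=1)
        # argmax breaks ties toward the smallest label
        return classes[counts.argmax(axis=1)]

    return predict, k_nearest_neighbors

# usage on a tiny 1-D data set
X = np.array([[0.0], [1.0], [10.0], [11.0]])
y = np.array(["a", "a", "b", "b"])
predict, k_nearest_neighbors = knn_brute_sketch(X, y, k=3)
print(k_nearest_neighbors(np.array([[0.2]])))  # [[0 1 2]]
print(predict(np.array([[0.2], [10.5]])))      # ['a' 'b']
```

The full distance matrix costs memory proportional to `X_sample.shape[0] * N`, which is fine for the small data sets in this notebook; that cost is one reason the tree-based algorithms exist.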

## Pseudocode
Translate the algorithm into pseudocode.  
This helps you identify the parts you do not yet know how to implement.  

    1. 
    2. 
    3. ...

## Code

In [None]:
### your answer here

## Test
Take some sample data from [KNeighborsClassifier-with-scikit-learn](KNeighborsClassifier-with-scikit-learn.ipynb) and check whether your code produces the same outputs as the existing packages.

##### Name of the data
Description of the data.

In [None]:
### results with your code

In [None]:
### results with existing packages

## Comparison

##### Exercise 1
Let  
```python
t = np.arange(20)
angle = 2 * np.pi / 20 * t
X1 = np.vstack([np.cos(angle), np.sin(angle)]).T
X2 = 5 * X1
X = np.vstack([X1, X2])
y = np.array([0]*20 + [1]*20)
X_sample = 10 * np.random.rand(1000,2) - np.array([5,5])
```

###### 1(a)
Train a $k$-nearest neighbors classification model on `X` and `y` .  
Make a prediction of `X_sample` by:  
1. your code with each of the `algorithm` settings
2. `sklearn.neighbors.KNeighborsClassifier`

The results should be the same.  
Check if this is true.
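One way to run this check is sketched below. It assumes scikit-learn is installed, and it adds a fixed seed through `np.random.default_rng(0)` (an assumption, so the check is reproducible). It uses `sklearn.neighbors.NearestNeighbors` for all three `algorithm` settings, as in step 2 of the algorithm, and then applies the voting step through a hypothetical helper `vote`; your own distance-matrix `k_nearest_neighbors` can be swapped in for the `'brute'` case.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Exercise 1 data; the fixed seed is an added assumption for reproducibility
t = np.arange(20)
angle = 2 * np.pi / 20 * t
X1 = np.vstack([np.cos(angle), np.sin(angle)]).T
X2 = 5 * X1
X = np.vstack([X1, X2])
y = np.array([0]*20 + [1]*20)
rng = np.random.default_rng(0)
X_sample = 10 * rng.random((1000, 2)) - np.array([5, 5])

k = 5  # matches the default n_neighbors of KNeighborsClassifier

def vote(nbrhoods):
    """Hypothetical helper: majority label in each row of y[nbrhoods]; ties go to the smallest label."""
    votes = y[nbrhoods]  # shape (1000, k)
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=y.max() + 1)
    return counts.argmax(axis=1)

predictions = {}
for algorithm in ["brute", "ball_tree", "kd_tree"]:
    nbrs = NearestNeighbors(n_neighbors=k, algorithm=algorithm).fit(X)
    predictions[algorithm] = vote(nbrs.kneighbors(X_sample, return_distance=False))

y_sklearn = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(X_sample)
for algorithm, y_new in predictions.items():
    print(algorithm, np.array_equal(y_new, y_sklearn))
```

With an odd `k` and two labels the vote cannot tie, so any disagreement would point to a difference in the neighbor search rather than in the voting.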

In [None]:
### your answer here

###### 1(b)
Let `y_new` be the prediction of `X_sample` from the previous part.  
Plot the points (rows) in `X` with `c=y` .  
Plot the points (rows) in `X_sample` with `c=y_new` and `alpha=0.1` .

In [None]:
### your answer here

###### 1(c)
Let  
```python
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier()
model.fit(X, y)
```  
and let `k_nearest_neighbors` be one of the outputs of your function, built with `k=5` to match the default `n_neighbors=5` of `KNeighborsClassifier`.  
The results of `k_nearest_neighbors(X_sample)` should be the same as `model.kneighbors(X_sample, return_distance=False)` .  
(Corresponding rows should contain the same indices, possibly in a different order.)  
Check if this is true.
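Because the order within a row may differ, one order-insensitive comparison is to sort each row before comparing. This is a minimal sketch: the helper `same_neighbor_sets` is hypothetical, and it assumes each row holds distinct indices, which is the case for neighbor index arrays.

```python
import numpy as np

def same_neighbor_sets(a, b):
    """Hypothetical helper: True if each row of a holds the same indices as the matching row of b, in any order."""
    a, b = np.asarray(a), np.asarray(b)
    return a.shape == b.shape and np.array_equal(np.sort(a, axis=1), np.sort(b, axis=1))

mine = np.array([[3, 1, 7], [0, 2, 5]])
theirs = np.array([[1, 7, 3], [5, 0, 2]])
print(same_neighbor_sets(mine, theirs))  # True
```

For this exercise, `mine` would be `k_nearest_neighbors(X_sample)` and `theirs` would be `model.kneighbors(X_sample, return_distance=False)`.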

In [None]:
### your answer here

##### Exercise 2
Let  
```python
m,n = 8,8
frames = (m-2) * (n-2)

o = np.array([[1,1,1],
              [1,0,1],
              [1,1,1]])
x = np.array([[1,0,1],
              [0,1,0],
              [1,0,1]])
oo = np.zeros((frames, m, n))
xx = np.zeros((frames, m, n))
count = 0
for i in range(m-2):
    for j in range(n-2):
        oo[count, i:i+3, j:j+3] = o
        xx[count, i:i+3, j:j+3] = x
        count += 1

X = np.vstack([oo, xx]).reshape(2*frames, -1)
y = np.array([0]*frames + [1]*frames)
```

###### 2(a)
Run  
```python
plt.imshow(oo[i], cmap="Greys")
```
with different values of `i` .  
Guess what `oo` and `xx` represent.

In [None]:
### your answer here

###### 2(b)
Train a $k$-nearest neighbors classification model on `X` and `y` .  
Make a prediction `y_new` for the training data `X` .  
What is the outcome?  
Can you give a reason for this phenomenon?
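One way to start reasoning about this is to compare Euclidean distances between individual frames directly. The sketch below reuses the 3x3 patterns `o` and `x` on an 8x8 canvas as in the setup; the helpers `place` and `sq_dist` are hypothetical. Since the pixels are 0 or 1, the squared distance simply counts the pixels in which two frames differ.

```python
import numpy as np

o = np.array([[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
x = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

def place(pattern, i, j, m=8, n=8):
    """Hypothetical helper: put a 3x3 pattern at top-left corner (i, j) of an m x n canvas, flattened."""
    canvas = np.zeros((m, n))
    canvas[i:i+3, j:j+3] = pattern
    return canvas.ravel()

def sq_dist(a, b):
    """Squared Euclidean distance; for 0/1 pixels it counts the pixels that differ."""
    return int(np.sum((a - b) ** 2))

o_00 = place(o, 0, 0)
print("o(0,0) vs x(0,0):", sq_dist(o_00, place(x, 0, 0)))  # 5
print("o(0,0) vs o(0,1):", sq_dist(o_00, place(o, 0, 1)))  # 8
print("o(0,0) vs x(0,1):", sq_dist(o_00, place(x, 0, 1)))  # 7
```

Comparing these counts, and repeating the comparison for other positions and shifts, shows which frames a $k$-nearest neighbors model treats as close to an `o` frame.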

In [None]:
### your answer here