# Coding activity 1: Nearest neighbor smoothers

## Nearest neighbor smoothers

Write a nearest neighbor smoother.

In [2]:
import numpy as np

In [9]:
def nns(xis, yis, x):
    """Return yis[i], where xis[i] is the entry of xis closest to x."""
    
    i = np.argmin(np.abs(xis - x))
    
    return yis[i]

# (Check) Should print 6 666 666 6666.
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
print(nns(xis, yis, -1), nns(xis, yis, 4), nns(xis, yis, 5.1), nns(xis, yis, 100))

6 666 666 6666
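One detail worth checking is how ties are handled. When `x` is exactly halfway between two training points, `np.argmin` returns the index of the *first* minimum, so `nns` favors the entry that appears earlier in `xis`. The check below restates `nns` so that it runs on its own.

```python
import numpy as np

def nns(xis, yis, x):
    """Return yis[i], where xis[i] is the entry of xis closest to x."""
    i = np.argmin(np.abs(xis - x))
    return yis[i]

xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])

# 0.5 is equidistant from xis[0] = 0 and xis[1] = 1; argmin picks the first.
print(nns(xis, yis, 0.5))  # 6

# 3.0 is equidistant from xis[1] = 1 and xis[2] = 5; the earlier entry wins again.
print(nns(xis, yis, 3.0))  # 66
```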


In [15]:
x = [1, 2, 3, 4, 5]
x[[2, 4]]

TypeError: list indices must be integers or slices, not list
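This error explains why the vectorized smoother below needs `yis` to be a NumPy array and not a plain list. NumPy arrays accept a list of integer indices ("fancy indexing") and return the selected entries as a new array.

```python
import numpy as np

x = [1, 2, 3, 4, 5]

# Plain Python lists reject a list of indices.
try:
    x[[2, 4]]
except TypeError as err:
    print("list:", err)

# NumPy arrays accept it and return the selected elements.
print("array:", np.array(x)[[2, 4]])  # array: [3 5]

# A pure-Python alternative is a list comprehension.
print([x[i] for i in [2, 4]])  # [3, 5]
```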

Write a "vectorized" nearest neighbor smoother.

Your solution here should be independent of your solution to the previous exercise.
The naming clash is intentional.

In [13]:
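def nns(xis, yis, xs):
    """Same as nns, but return an array of ys given a sequence of xs."""

    # Index of the nearest training point for each x in xs.
    i = [np.argmin(np.abs(xis - x)) for x in xs]
    # Fancy indexing: this works because yis is a NumPy array, not a list.
    ys = yis[i]

    return ys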

# (Check) Should print [   6  666  666 6666].
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
print(nns(xis, yis, [-1, 4, 5.1, 100]))

[   6  666  666 6666]
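The list comprehension above still loops over `xs` in Python. A loop-free alternative uses NumPy broadcasting. The sketch below assumes one-dimensional `xis` and `xs`, and the name `nns_broadcast` is hypothetical, chosen only to avoid clashing with `nns`. Subtracting a row of training points from a column of test points produces the full matrix of differences at once. `argmin` along the training axis then picks each test point's nearest neighbor.

```python
import numpy as np

def nns_broadcast(xis, yis, xs):
    """Loop-free nearest neighbor smoother for 1-D inputs (hypothetical name)."""
    xis = np.asarray(xis)
    xs = np.asarray(xs)
    # Shape (len(xs), 1) minus shape (1, len(xis)) broadcasts to (len(xs), len(xis)).
    dists = np.abs(xs[:, None] - xis[None, :])
    # For each row (test point), the column index of the closest training point.
    i = np.argmin(dists, axis=1)
    return np.asarray(yis)[i]

xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
print(nns_broadcast(xis, yis, [-1, 4, 5.1, 100]))  # [   6  666  666 6666]
```

This version trades memory for speed. It builds a `len(xs)` by `len(xis)` matrix, which can become large when both sets of points are big.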


Write a "higher order" smoother function, i.e., a function that takes training data as input and returns a function to evaluate on test data. Are there advantages to such an implementation over that of the previous exercise?

In [16]:
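def nns_factory(xis, yis):
    """Return a smoother that maps a sequence of xs to their nearest neighbors' ys."""

    def nns(xs):
        # xis and yis are captured from the enclosing scope (a closure).
        i = [np.argmin(np.abs(xis - x)) for x in xs]
        ys = yis[i]

        return ys

    return nns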

# (Check) Should print [   6  666  666 6666].
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
nns = nns_factory(xis, yis)
print(nns([-1, 4, 5.1, 100]))

[   6  666  666 6666]
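One possible advantage of the factory is that it gives a natural place to do one-time work on the training data. As an illustration only, not the sole answer, the hypothetical `nns_sorted_factory` sorts `xis` once. Each query can then use binary search via `np.searchsorted` in place of a scan over every training point. The sketch assumes one-dimensional training data.

```python
import numpy as np

def nns_sorted_factory(xis, yis):
    """Hypothetical variant of nns_factory that sorts the training data once."""
    # One-time preprocessing: a stable sort, so equal xs keep their original order.
    order = np.argsort(xis, kind="stable")
    xs_sorted = np.asarray(xis, dtype=float)[order]
    ys_sorted = np.asarray(yis)[order]

    def nns(xs):
        xs = np.asarray(xs, dtype=float)
        # Position of the first training point >= each query (0 to n).
        right = np.searchsorted(xs_sorted, xs)
        # Clip so that left = right - 1 and right are both valid indices.
        right = np.clip(right, 1, len(xs_sorted) - 1)
        left = right - 1
        # Choose the closer of the two candidates; ties go to the smaller x.
        use_left = np.abs(xs - xs_sorted[left]) <= np.abs(xs_sorted[right] - xs)
        return ys_sorted[np.where(use_left, left, right)]

    return nns

xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
nns = nns_sorted_factory(xis, yis)
print(nns([-1, 4, 5.1, 100]))  # [   6  666  666 6666]
```

After sorting, each query costs O(log n) rather than O(n). Ties go to the smaller training `x`. This matches the earlier versions when `xis` is already sorted.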


If you don't consider the "higher order function" pattern from the previous exercise to be particularly pythonic, you aren't alone. Let's write a class offering the same functionality, in the <code>sklearn</code> style.

In [29]:
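from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

# Mixins go to the left of BaseEstimator, per the sklearn convention.
class NNS(RegressorMixin, BaseEstimator):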
    
    def fit(self, X, y):
        
        # Check that X and y have compatible shapes.
        X, y = check_X_y(X, y)

        # Store the training data on the instance. (Why?)
        self.X_ = X
        self.y_ = y
        
        # Return the instance. (Why?)
        return self
    
    def predict(self, X):
        
        # Check that fit has been called.
        check_is_fitted(self, ['X_', 'y_'])
        
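        # Validate the input, then lay the test points out as one row of
        # shape (1, n_test). This assumes a single feature.
        X = check_array(X).reshape(1, -1)

        # (n_train, 1) - (1, n_test) broadcasts to (n_train, n_test); take the
        # nearest training index down each column.
        i = np.argmin(np.abs(self.X_ - X), axis=0)
        y = self.y_[i]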
        
        return y
    
# (Check) Should print [   6  666  666 6666    6 6666   66].
X = np.array([0, 1, 5, 6.5]).reshape(-1, 1)
y = np.array([6, 66, 666, 6666])
X_test = np.array([-1, 4, 5.1, 100, 0, 30, 2]).reshape(-1, 1)
S = NNS()
print(S
      .fit(X, y)
      .predict(X_test))

# The signed differences behind predict: one row per training point,
# one column per test point.
print(X - X_test.reshape(1, -1))

[   6  666  666 6666    6 6666   66]
[[ 1.00e+00 -4.00e+00 -5.10e+00 -1.00e+02  0.00e+00 -3.00e+01 -2.00e+00]
 [ 2.00e+00 -3.00e+00 -4.10e+00 -9.90e+01  1.00e+00 -2.90e+01 -1.00e+00]
 [ 6.00e+00  1.00e+00 -1.00e-01 -9.50e+01  5.00e+00 -2.50e+01  3.00e+00]
 [ 7.50e+00  2.50e+00  1.40e+00 -9.35e+01  6.50e+00 -2.35e+01  4.50e+00]]
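The printed matrix is the broadcast difference that `predict` works with. The training data `X` has shape `(4, 1)` and the reshaped test data has shape `(1, 7)`, so their difference has shape `(4, 7)`. `argmin(..., axis=0)` then searches down each column for the closest training point. The standalone check below makes the shapes explicit.

```python
import numpy as np

X = np.array([0, 1, 5, 6.5]).reshape(-1, 1)                     # training points, shape (4, 1)
X_test = np.array([-1, 4, 5.1, 100, 0, 30, 2]).reshape(1, -1)   # test points, shape (1, 7)

diffs = np.abs(X - X_test)      # broadcasts to shape (4, 7)
print(diffs.shape)              # (4, 7)

i = np.argmin(diffs, axis=0)    # nearest training index for each test column
print(i)                        # [0 2 2 3 0 3 1]
```

This also shows why `predict` as written handles only one feature. With more than one column, `reshape(1, -1)` would flatten the features of all test points into a single row, so the broadcast would no longer pair training and test points correctly.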


- What happens if you remove the `.reshape(-1, 1)` method call in the definition of `X` and run the above cell again?

- If you inspect the methods available on `S` by typing `S.<tab>`, you'll notice a `score` method.
  - Where did it come from? What does it do? Find out by typing `S.score?` and pressing `<ctrl+enter>` (or `<command+enter>` on a Mac), the Jupyter keyboard shortcut for running the current cell.
  - Try it out! You'll need to provide `y`-values to go with the `x`-values in `X_test`.
  - Look at the source code of `RegressorMixin`. (Where is it?) Notice that the implementation of the `score` method invokes `predict`, which we have thoughtfully provided. According to the <a href="https://scikit-learn.org/stable/glossary.html#term-regressors"><code>sklearn</code> docs</a>, a *regressor* is a class that implements `fit`, `predict`, and `score`. (Duck typing!)
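For reference, `RegressorMixin.score` returns the coefficient of determination R² of `predict(X)` against the supplied `y`. The sketch below is one way to try it. The targets `y_test` are made-up values for illustration only. The class is restated so that the snippet runs on its own, and it inherits from `BaseEstimator` in addition to `RegressorMixin`, following the usual `sklearn` convention that recent releases expect.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.metrics import r2_score
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

class NNS(RegressorMixin, BaseEstimator):
    """Nearest neighbor smoother for a single feature (restated from above)."""

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.X_ = X
        self.y_ = y
        return self

    def predict(self, X):
        check_is_fitted(self, ['X_', 'y_'])
        X = check_array(X).reshape(1, -1)
        i = np.argmin(np.abs(self.X_ - X), axis=0)
        return self.y_[i]

X = np.array([0, 1, 5, 6.5]).reshape(-1, 1)
y = np.array([6, 66, 666, 6666])
X_test = np.array([-1, 4, 5.1, 100, 0, 30, 2]).reshape(-1, 1)

# Hypothetical "true" targets for X_test, made up for this example.
y_test = np.array([10, 500, 700, 6000, 6, 6500, 100])

S = NNS().fit(X, y)
print(S.score(X_test, y_test))

# score is R^2 of the predictions, so it agrees with r2_score.
print(r2_score(y_test, S.predict(X_test)))
```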