# Coding activity 1: Nearest neighbor smoothers

## Nearest neighbor smoothers

Write a nearest neighbor smoother.

In [2]:
import numpy as np

In [4]:
def nns(xis, yis, x):
    """Return yis[i], where xis[i] is the entry of xis closest to x."""
    
    i = np.argmin(np.abs(xis - x))
    
    return yis[i]

# (Check) Should print 6 666 666 6666.
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
print(nns(xis, yis, -1), nns(xis, yis, 4), nns(xis, yis, 5.1), nns(xis, yis, 100))

6 666 666 6666
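
One detail the check above does not exercise is what happens when a query point is equidistant from two training points. The sketch below repeats the same `nns` definition and probes two such points; it relies on the documented behavior of `np.argmin`, which returns the index of the first minimum. The query values `0.5` and `3` are chosen here for illustration only.

```python
import numpy as np

def nns(xis, yis, x):
    """Return yis[i], where xis[i] is the entry of xis closest to x."""
    i = np.argmin(np.abs(xis - x))
    return yis[i]

xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])

# 0.5 is equidistant from 0 and 1; argmin keeps the first minimum (index 0).
tie_low = nns(xis, yis, 0.5)
print(tie_low)   # 6

# 3 is equidistant from 1 and 5; again the earlier index (1) wins.
tie_mid = nns(xis, yis, 3)
print(tie_mid)   # 66
```

So, with training points stored in increasing order, ties resolve toward the smaller `x`.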


Write a "vectorized" nearest neighbor smoother.

Your solution here should be independent of your solution to the previous exercise.
The naming clash is intentional.

In [47]:
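def nns(xis, yis, xs):
    """Return an array of ys, one per entry of xs, each the yis value at the nearest xis."""
    
    # Training points as a column (n, 1), query points as a row (1, m).
    xis_ = np.reshape(xis, (-1, 1))
    xs_ = np.reshape(xs, (1, -1))
    # Broadcasting gives an (n, m) distance matrix; minimize over the training axis.
    i = np.abs(xis_ - xs_).argmin(axis=0)
    ys = yis[i]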

    return ys

# (Check) Should print [   6  666  666 6666].
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
print(nns(xis, yis, [-1, 4, 5.1, 100]))

[   6  666  666 6666]
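
To see why the reshapes make this work, it can help to look at the intermediate arrays. The following sketch (the variable names `dists`, `idx`, and `loop` are introduced here for illustration) builds the broadcast distance matrix explicitly and compares the result with a plain Python loop over the query points.

```python
import numpy as np

xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
xs = [-1, 4, 5.1, 100]

# A (4, 1) column minus a (1, 4) row broadcasts to a (4, 4) matrix
# with dists[i, j] = |xis[i] - xs[j]|.
dists = np.abs(np.reshape(xis, (-1, 1)) - np.reshape(xs, (1, -1)))
print(dists.shape)  # (4, 4)

# Minimizing over axis 0 picks, for each query (column), the nearest training index.
idx = dists.argmin(axis=0)
print(idx)  # [0 2 2 3]

# Reference: the same predictions computed one query at a time.
loop = np.array([yis[np.argmin(np.abs(xis - x))] for x in xs])
print(np.array_equal(yis[idx], loop))  # True
```

The distance matrix has `len(xis) * len(xs)` entries, so this approach trades memory for the convenience of avoiding an explicit loop.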


Write a "higher order" smoother function, i.e., a function that takes training data as input and returns a function to evaluate on test data. Are there advantages to such an implementation over that of the previous exercise?

In [1]:
import numpy as np

def nns_factory(xis, yis):
    """Return a function mapping test points xs to nearest neighbor predictions."""
    def nns(xs):
        
        # Column of training points against a row of test points.
        xis_ = np.reshape(xis, (-1, 1))
        xs_ = np.reshape(xs, (1, -1))
        i = np.abs(xis_ - xs_).argmin(axis=0)
        ys = yis[i]
        
        return ys
    return nns

# (Check) Should print [   6  666  666 6666].
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
nns = nns_factory(xis, yis)
print(nns([-1, 4, 5.1, 100]))

[   6  666  666 6666]
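
One possible advantage of the factory pattern is that any work depending only on the training data can be done once, when the smoother is built, rather than on every call. The sketch below is a hypothetical variant, not the solution above: `nns_factory_sorted` is a name made up for this example, and it assumes one-dimensional inputs with at least two training points. It sorts the training data once and then answers queries with `np.searchsorted`, which avoids building the full distance matrix.

```python
import numpy as np

def nns_factory_sorted(xis, yis):
    """Hypothetical variant: sort the training data once, query by binary search."""
    xis = np.asarray(xis, dtype=float)
    yis = np.asarray(yis)
    order = np.argsort(xis, kind="stable")
    xs_sorted, ys_sorted = xis[order], yis[order]

    def nns(xs):
        xs = np.asarray(xs, dtype=float)
        # Position of the first sorted training point >= each query,
        # clipped so that a left and a right neighbor always exist.
        right = np.clip(np.searchsorted(xs_sorted, xs), 1, len(xs_sorted) - 1)
        left = right - 1
        # Keep whichever neighbor is closer; ties go to the left neighbor.
        use_left = (xs - xs_sorted[left]) <= (xs_sorted[right] - xs)
        return ys_sorted[np.where(use_left, left, right)]

    return nns

smoother = nns_factory_sorted([0, 1, 5, 6.5], [6, 66, 666, 6666])
print(smoother([-1, 4, 5.1, 100]))  # [   6  666  666 6666]
```

The returned function also takes a single argument, so it can be handed to any code that expects a plain function of `xs`.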

If you don't consider the "higher order function" pattern from the previous exercise to be particularly pythonic, you aren't alone. Let's write a class offering the same functionality, in the <code>sklearn</code> style.

In [3]:
from sklearn.base import RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

class NNS(RegressorMixin):
    
    def fit(self, X, y):
        
        # Check that X and y have compatible shapes.
        X, y = check_X_y(X, y)

        # Store the training data on the instance. (Why?)
        self.X_ = X
        self.y_ = y
        
        ### Your code goes here.

        # Return the instance. (Why?)
        return self
    
    def predict(self, X):
        
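        # Check that fit has been called.
        check_is_fitted(self, ['X_', 'y_'])
        
        # Validate the input, then lay the test points out as a row so that
        # they broadcast against the column of training points in self.X_.
        X = check_array(X).reshape(1, -1)
        
        # For each test point, look up the y-value of the nearest training point.
        i = np.abs(self.X_ - X).argmin(axis=0)
        y = self.y_[i]
        
        return y
    
# (Check) Should print [   6  666  666 6666    6 6666   66].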
X = np.array([0, 1, 5, 6.5]).reshape(-1, 1)
y = np.array([6, 66, 666, 6666])
X_test = np.array([-1, 4, 5.1, 100, 0, 30, 2]).reshape(-1, 1)
S = NNS()
print(S
      .fit(X, y)
      .predict(X_test))

[   6  666  666 6666    6 6666   66]


- What happens if you remove the `.reshape(-1, 1)` method call in the definition of `X` and run the above cell again?

- If you inspect the methods available on `S` by typing `S.<tab>`, you'll notice a `score` method.
  - Where did it come from? What does it do? Find out by typing `S.score?<ctrl+enter>` or, on a Mac, `S.score?<command+enter>`. (`<command+enter>` is a Jupyter keyboard shortcut for running the current cell.)
  - Try it out! You'll need to provide `y`-values to go with the `x`-values in `X_test`.
  - Look at the source code of `RegressorMixin`. (Where is it?) Notice that the implementation of the `score` method invokes `predict`, which we have thoughtfully provided. According to the <a href="https://scikit-learn.org/stable/glossary.html#term-regressors"><code>sklearn</code> docs</a>, a *regressor* is a class that implements `fit`, `predict`, and `score`. (Duck typing!)

In [6]:
S.score(X, y)

1.0
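
A score of 1.0 on the training data is expected here: `RegressorMixin.score` reports the coefficient of determination $R^2$, and every training point is its own nearest neighbor, so the residuals vanish. The sketch below recomputes $R^2$ by hand with NumPy only, as a rough illustration. The helpers `r2` and `predict` are written for this example, and the `y_test` values are made up, since the activity does not supply test targets.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])

def predict(xs):
    """Nearest neighbor predictions, using the same broadcasting as above."""
    d = np.abs(xis.reshape(-1, 1) - np.reshape(xs, (1, -1)))
    return yis[d.argmin(axis=0)]

# On the training data the predictions reproduce yis exactly.
train_score = r2(yis, predict(xis))
print(train_score)  # 1.0

# Hypothetical test targets (invented for illustration).
x_test = np.array([-1, 4, 5.1, 100, 0, 30, 2])
y_test = np.array([0, 600, 700, 7000, 10, 6000, 100])
test_score = r2(y_test, predict(x_test))
print(test_score < 1.0)  # True
```

A perfect training score therefore says little about how well the smoother generalizes; scoring on held-out data is more informative.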

In [26]:
np.argmin(np.abs(xis - x), axis=0)

array([0, 2, 2])

In [31]:
a = [0, 1, 5, 6.5]
b = [-1, 4, 5.1]
xis = np.array([a, a, a]).T
x = np.array([b, b, b, b])

In [33]:
xis - x

array([[ 1. , -4. , -5.1],
       [ 2. , -3. , -4.1],
       [ 6. ,  1. , -0.1],
       [ 7.5,  2.5,  1.4]])

In [34]:
np.reshape(a, (4, 1))

array([[0. ],
       [1. ],
       [5. ],
       [6.5]])

array([0. , 1. , 5. , 6.5])

In [49]:
X = np.array([0, 1, 5, 6.5]).reshape((1, -1))
X

array([[0. , 1. , 5. , 6.5]])