# Nonparametric regression

In [1]:
import numpy as np

## Nearest neighbor smoothers

Write a nearest neighbor smoother.

In [3]:
def nns(xis, yis, x):
    """Return yis[i], where xis[i] is the entry of xis closest to x."""
    
    i = np.argmin(np.abs(x - xis))
    
    return yis[i]

# (Check) Should print "6 666 666 6666".
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
print(nns(xis, yis, -1), nns(xis, yis, 4), nns(xis, yis, 5.1), nns(xis, yis, 100))

6 666 666 6666
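
One detail the check does not exercise is what happens when a query point is equidistant from two training points. The sketch below illustrates it, assuming the same training data as the check; `nearest_value` is a hypothetical name used only here so that it does not clash with `nns`. Because `np.argmin` returns the first index at which the minimum occurs, ties resolve to the training point that appears earlier in `xis`.

```python
import numpy as np

xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])

def nearest_value(xis, yis, x):
    """Hypothetical helper with the same logic as the smoother above."""
    return yis[np.argmin(np.abs(x - xis))]

# 0.5 is equidistant from 0 and 1; np.argmin keeps the first minimal index.
tie = nearest_value(xis, yis, 0.5)
print(tie)  # 6
```

Whether "first in the array" is the right tie-breaking rule depends on the application; averaging the tied values would be another reasonable choice.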


Write a "vectorized" nearest neighbor smoother.

Your solution here should be independent of your solution to the previous exercise.
The naming clash is intentional.

In [6]:
def nns(xis, yis, xs):
    """Return an array of ys: for each x in xs, the yis value whose xis entry is closest to x."""
    ys = yis[[np.argmin(np.abs(x - xis)) for x in xs]]
    return ys

# (Check) Should print [   6  666  666 6666].
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
print(nns(xis, yis, [-1, 4, 5.1, 100]))

[   6  666  666 6666]
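
The solution above still loops over `xs` in Python, which is presumably why "vectorized" is in quotes. As a possible alternative, here is a minimal sketch that pushes the loop into NumPy with broadcasting. The name `nns_broadcast` is hypothetical, and the approach assumes the full table of pairwise distances fits in memory.

```python
import numpy as np

def nns_broadcast(xis, yis, xs):
    """Nearest neighbor smoother evaluated at every x in xs via broadcasting."""
    xs = np.asarray(xs, dtype=float)
    # dists[j, i] = |xs[j] - xis[i]|, shape (len(xs), len(xis)).
    dists = np.abs(xs[:, None] - xis[None, :])
    # For each query (row), take the index of the closest training point.
    return yis[np.argmin(dists, axis=1)]

xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
ys = nns_broadcast(xis, yis, [-1, 4, 5.1, 100])
print(ys)  # same values as the check above: 6, 666, 666, 6666
```

The trade-off is memory: the intermediate array has one entry per (query, training point) pair, so for large inputs the loop or a sorted lookup may be preferable.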


Write a "higher order" smoother function, i.e., a function that takes training data as input and returns a function to evaluate on test data. Are there advantages to such an implementation over that of the previous exercise?

In [9]:
def nns_factory(xis, yis):
    def nns(xs):
        ys = yis[[np.argmin(np.abs(x - xis)) for x in xs]]
        return ys
    return nns

# (Check) Should print [   6  666  666 6666].
xis = np.array([0, 1, 5, 6.5])
yis = np.array([6, 66, 666, 6666])
nns = nns_factory(xis, yis)
print(nns([-1, 4, 5.1, 100]))

[   6  666  666 6666]
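
One possible advantage of the factory pattern is that work depending only on the training data can be done once, when the smoother is created, rather than on every call. The sketch below illustrates this under some assumptions: `nns_factory_sorted` is a hypothetical name, the training data is one-dimensional with at least two points, and ties are broken toward the smaller training `x` (which can differ from the `np.argmin` rule when `xis` is not sorted).

```python
import numpy as np

def nns_factory_sorted(xis, yis):
    """Return a nearest neighbor smoother that presorts the training data."""
    # One-time work: sort the training points by x.
    order = np.argsort(xis)
    xs_sorted, ys_sorted = xis[order], yis[order]

    def smoother(xs):
        xs = np.asarray(xs, dtype=float)
        # Position of the first sorted training x that is >= each query,
        # clipped so that a left and a right neighbor always exist.
        right = np.clip(np.searchsorted(xs_sorted, xs), 1, len(xs_sorted) - 1)
        left = right - 1
        # Keep whichever neighbor is closer; ties go to the left neighbor.
        use_left = (xs - xs_sorted[left]) <= (xs_sorted[right] - xs)
        return ys_sorted[np.where(use_left, left, right)]

    return smoother

# Training data deliberately given out of order.
xis = np.array([5, 0, 6.5, 1])
yis = np.array([666, 6, 6666, 66])
smooth = nns_factory_sorted(xis, yis)
ys = smooth([-1, 4, 5.1, 100])
print(ys)  # same values as the check above: 6, 666, 666, 6666
```

Each query then costs a binary search instead of a scan over all training points, and the returned function carries its training data with it, so callers only need to pass test inputs.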


If you don't consider the "higher order function" pattern from the previous exercise to be particularly Pythonic, you aren't alone. Let's write a class offering the same functionality, in the `sklearn` style.

In [56]:
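from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

# BaseEstimator supplies get_params/set_params, which sklearn estimators are expected to provide.
class NNS(RegressorMixin, BaseEstimator):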
    
    def fit(self, X, y):
        # Check that X and y have compatible shapes.
        X, y = check_X_y(X, y)

        # Store the training data on the instance. (Why?)
        self.X_ = X
        self.y_ = y
        
        return self
    
    def predict(self, X):
        # Check that fit has been called.
        check_is_fitted(self, ['X_', 'y_'])
        
        # Validate input type.
        X = check_array(X)
        
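        # Nearest training row for each test row. This is only correct for a
        # single feature column, since argmin runs over the flattened differences.
        y = self.y_[[np.argmin(np.abs(x - self.X_)) for x in X]]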
        
        return y
    
# (Check) Should print [   6  666  666 6666].
X = np.array([0, 1, 5, 6.5]).reshape((-1, 1))
y = np.array([6, 66, 666, 6666])
X_test = np.array([-1, 4, 5.1, 100]).reshape(-1, 1)
S = NNS().fit(X, y)
print(S.predict(X_test))

[   6  666  666 6666]


## Regression

dataset $D$ $\longrightarrow$ function $f$