# Non-Parametric Regression

![](./helper/global_vs_local.PNG)

In linear regression, we fit a single model to the whole of the data. To capture all the variation in the data, we sometimes have to choose a high-variance model (shown on the left). Even though the points in the middle of the plot might be captured better by a constant line, we are forced to use a polynomial because we are looking at the data at a global level. 

The plot on the right fits the data at a local level. Rather than having a single equation describing the whole plot, the fit can be non-parametric. 

To get there, we could divide the data into pockets, but we don't want to define those pockets explicitly. Instead, we just look at the neighbourhood of the query point and make a decision based on the logic given below.

## 1 Nearest Neighbour 

<img src="helper/K1.PNG" alt="Drawing" style="width: 600px;"/>

<br>
<br>
<br>

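For any query point $ x_{q} $, find the training point closest to it in distance:

$$ X_{NN} = \underset{x_{i}}{\arg\min} \; distance(x_{i}, x_{q} )  $$

The prediction is the target value of that nearest neighbour:

$$ \hat y_{q} = y_{NN} $$

where $ y_{NN} $ is the $ y $ corresponding to the $ x $ closest to $ x_{q} $.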
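As a minimal sketch of this rule for 1D inputs, the snippet below picks the index with the smallest absolute difference and returns the matching target. The function name `predict_1nn` and the toy house data are assumptions made for illustration, not part of the original notes.

```python
def predict_1nn(x_train, y_train, x_query):
    """Return the target of the training point closest to x_query (1-D inputs)."""
    # argmin over i of distance(x_i, x_q); in 1-D the distance is |x_i - x_q|
    nn_index = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x_query))
    return y_train[nn_index]


# Hypothetical toy data: house size -> price
sizes = [800, 1200, 1500, 2100]
prices = [150.0, 210.0, 260.0, 340.0]

print(predict_1nn(sizes, prices, 1400))  # 260.0 (closest size is 1500)
```

Note that nothing is fitted ahead of time: all the work happens when the query arrives.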

## 1 NN in 2 dimensional data  

<br>

<img src="helper/K2.PNG" alt="Drawing" style="width: 300px;"/>

Voronoi Tessellation : The space is divided into N regions, each containing a single data point. Every location inside a region is closer to that region's data point than to any other data point, so a query falling in the region receives that point's value as its prediction.

For 2D data (a similar picture can be imagined in higher dimensions), we can visualize the plane segmented into N regions based on the N data points. 
We don't actually need to construct this division. For every query point, we just find the data point at minimum distance.

### Metrics 

To find neighbours, we need to compute the distance between data points. We used Euclidean distance in the 1D and 2D examples above, but there are many other choices. 

<br>

Eg:
* Scaled Euclidean Distance
    * For our housing price prediction, we may want to account for feature importance, so we can weight each feature's contribution to the distance differently 
    * $$ distance(x_{i},x_{q}) = \sqrt{a_{1}(x_{i}[1]-x_{q}[1])^2 + \dots + a_{j}(x_{i}[j]-x_{q}[j])^2 + \dots + a_{d}(x_{i}[d]-x_{q}[d])^2}$$
     where d = no. of features and $ a_{j} $ = weight of feature j 
* Manhattan Distance
    * Imagine you are driving on the streets of New York: you cover the distance along x, then along y. This is different from Euclidean distance, which just takes the diagonal. 
* Other choices: Mahalanobis, Hamming, cosine similarity etc 
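The two metrics described above can be sketched in plain Python as follows. The function names, the `weights` argument, and the example feature vectors are illustrative assumptions; with all weights equal to 1 the scaled version reduces to ordinary Euclidean distance.

```python
import math


def scaled_euclidean(x_i, x_q, weights):
    """sqrt(sum_j a_j * (x_i[j] - x_q[j])^2), with one weight a_j per feature."""
    return math.sqrt(sum(a * (u - v) ** 2 for a, u, v in zip(weights, x_i, x_q)))


def manhattan(x_i, x_q):
    """Sum of absolute per-feature differences (distance along x, then along y)."""
    return sum(abs(u - v) for u, v in zip(x_i, x_q))


# Hypothetical feature vectors that differ by 3 and 4 in the two features
house = [1.0, 3.0]
query = [4.0, 7.0]

print(scaled_euclidean(house, query, [1.0, 1.0]))  # 5.0 (plain Euclidean: the diagonal)
print(scaled_euclidean(house, query, [4.0, 0.0]))  # 6.0 (second feature ignored)
print(manhattan(house, query))                     # 7.0 (3 along x plus 4 along y)
```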
    
#### Predictive Surfaces change with the metrics chosen 
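The region assigned to each data point, and therefore the prediction for a query, depends on the distance metric, as shown below. 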


<img src="helper/K3.JPG" alt="Drawing" style="width: 400px;"/>
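A small numeric illustration of the same effect: with the two hypothetical points below, the nearest neighbour of the origin is different under Euclidean and Manhattan distance. The helper names are assumptions for this sketch.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))


def manhattan(p, q):
    return sum(abs(u - v) for u, v in zip(p, q))


def nearest(points, query, metric):
    """Return the point with the smallest distance to query under the given metric."""
    return min(points, key=lambda p: metric(p, query))


points = [(3.0, 0.0), (2.0, 2.0)]
query = (0.0, 0.0)

# Euclidean: 3.0 vs about 2.83, so (2.0, 2.0) wins
print(nearest(points, query, euclidean))  # (2.0, 2.0)
# Manhattan: 3.0 vs 4.0, so (3.0, 0.0) wins
print(nearest(points, query, manhattan))  # (3.0, 0.0)
```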

### 1 NN Pseudo Code 

dist1NN = $\infty$

for i = 1, 2,... N: <br>
$\qquad$ $ \delta = distance(x_{i},x_{q}) $ <br>
$\qquad$ if  $ \delta  $ < dist1NN:<br>
$\qquad$$\qquad$ dist1NN = $\delta$ <br>
$\qquad$$\qquad$ $X_{NN} = x_{i}$ <br>
$\qquad$$\qquad$ $Y_{NN} = y_{i}$

$ \hat y_{q} = Y_{NN} $
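One way to turn the pseudo code into runnable Python is sketched below. It keeps the single pass over the data and records the target together with the nearest point inside the `if`, so the returned value always belongs to the closest point found. The names `one_nn` and `euclidean` and the toy data are assumptions for illustration.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))


def one_nn(x_train, y_train, x_query, distance=euclidean):
    """Single pass over the data, tracking the closest point seen so far."""
    dist_1nn = math.inf
    x_nn, y_nn = None, None
    for x_i, y_i in zip(x_train, y_train):
        delta = distance(x_i, x_query)
        if delta < dist_1nn:
            dist_1nn = delta
            x_nn, y_nn = x_i, y_i  # update point and target together
    return x_nn, y_nn


# Hypothetical 2-D training set
x_train = [(1.0, 1.0), (4.0, 4.0), (6.0, 1.0)]
y_train = [100.0, 200.0, 300.0]

print(one_nn(x_train, y_train, (5.0, 2.0)))  # ((6.0, 1.0), 300.0)
```

Each query costs one distance computation per training point, so prediction time grows linearly with N.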

### 1 NN Disadvantage

1 NN **doesn't interpolate well**, as shown below:

<img src="helper/K4.PNG" alt="Drawing" style="width: 800px;"/>


1 NN is also **sensitive to noise** in the data (it is a high-variance model).

## K Nearest Neighbours

To reduce the effect of noise, we can look at the k nearest neighbours rather than only one. (A real estate agent may look at several similar houses and give you their average price rather than rely on just the one closest house.)
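A minimal sketch of this averaging idea, assuming Euclidean distance and an unweighted mean of the k nearest targets. For brevity it sorts all training points by distance instead of maintaining a running list of the k best, and the name `predict_knn` and the toy data (with one deliberately noisy target) are illustrative assumptions.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))


def predict_knn(x_train, y_train, x_query, k):
    """Average the targets of the k training points closest to x_query."""
    # Indices of all training points, ordered by distance to the query
    order = sorted(range(len(x_train)), key=lambda i: euclidean(x_train[i], x_query))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / k


# Hypothetical 1-D data; the target 300.0 at x = 3.0 plays the role of a noisy point
x_train = [(1.0,), (2.0,), (3.0,), (4.0,)]
y_train = [100.0, 110.0, 300.0, 130.0]

print(predict_knn(x_train, y_train, (2.9,), k=1))  # 300.0 (follows the noisy point)
print(predict_knn(x_train, y_train, (2.9,), k=3))  # 180.0 (noise is averaged down)
```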

### K NN Pseudo Code 

\# initialize with the first k entries in the dataset, sorted in ascending distance to the query point ($X_{NN_k} $ is the farthest)<br><br>
$ X_{NN_1}, X_{NN_2},.. X_{NN_k} = x_{1},  x_{2},...  x_{k} $ (reordered to match the sorted distances) <br>
$ DistNN_1, DistNN_2,.. DistNN_k = \text{sorted}(distance(x_{1},x_{q}), distance(x_{2},x_{q}), ... distance(x_{k},x_{q})) $  
<br>
<br>

for i = k+1,... N: <br>
$\qquad$$\qquad$ $ \delta = distance(x_{i},x_{q}) $ <br>
$\qquad$$\qquad$ if $\delta < DistNN_k: $ <br>
    $\qquad$$\qquad$$\qquad$ find j s.t. $ DistNN_j$ 