## Side notes 
_(code snippets, summaries, resources, etc.)_
- My first use of `pylab` for plotting graphs

# Supervised Regression Learning
__Definition: regression model__ (better named a _numerical model_): uses data to build a model that predicts a numerical output from a set of numerical inputs. Types of regression include:
1. Parametric approach (e.g. polynomial regression)
    - Does not store the original data
    - Must be retrained to include more data
    - Training is slower, querying is faster
2. Instance-based approach (data-centric regression)
    - A. K nearest neighbors (KNN)
    - B. Kernel regression (like KNN, but with points weighted according to distance)
    - Stores the data within the model, so it does not need retraining for new data
    - Training is faster, querying is slower
    - Useful when there is no initial guess of the underlying relationship
        - Termed _unbiased_, whereas parametric models are _biased_
        - When a guess is available it makes sense to use it, i.e. being _biased_ in this sense is not a bad thing
    - Can fit any shape
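
To make the two instance-based methods concrete, here is a minimal sketch of both for a single input feature. The function names, the toy data, and the Gaussian weighting (with its `bandwidth` parameter) are assumptions chosen for illustration, not part of the course material; other distance weightings are possible.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Predict by averaging the outputs of the k nearest training points."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    distances = np.abs(x_train - x_query)
    nearest = np.argsort(distances)[:k]  # indices of the k closest points
    return y_train[nearest].mean()

def kernel_predict(x_train, y_train, x_query, bandwidth=1.0):
    """Predict with a weighted average of all points, closer points weigh more.

    The Gaussian weight used here is an assumed choice of kernel.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    weights = np.exp(-0.5 * ((x_train - x_query) / bandwidth) ** 2)
    return np.sum(weights * y_train) / np.sum(weights)

# Made-up toy data lying on the line y = 2x
x_toy = [1, 2, 3, 4, 5]
y_toy = [2, 4, 6, 8, 10]

knn_result = knn_predict(x_toy, y_toy, x_query=3, k=3)
kernel_result = kernel_predict(x_toy, y_toy, x_query=3, bandwidth=1.0)
print(knn_result, kernel_result)
```

Note that neither function has a training step: the "model" is just the stored data, and all the work happens at query time, which matches the faster-training, slower-querying trade-off listed above.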


![parametric regression](more_regressions_images/parametric_regression.png)

![instance-based regression](more_regressions_images/instance-based_regression.png)

![parametric or non quiz](more_regressions_images/parametric_or_non.png)



In [None]:
#
#
# Regression and Classification programming exercises
#
#


#
#   In this exercise we will be taking a small data set and 
#   computing a linear function that fits it, by hand.
#

#   the data set

import numpy as np

sleep = [5,6,7,8,10]
scores = [65,51,75,75,86]


def compute_regression(sleep,scores):

    #	First, compute the average amount of each list

    avg_sleep = sum(sleep) / (1.0*len(sleep))
    avg_scores = sum(scores) / (1.0*len(scores))

    #	Then normalize the lists by subtracting the mean 
    #	value from each entry

    normalized_sleep = [x - avg_sleep for x in sleep]
    normalized_scores = [y - avg_scores for y in scores]

    #	Compute the slope of the line by taking the sum 
    #	over each student of the product of their normalized
    #	sleep times and their normalized test score.
    #	Then divide this by the sum of squares of 
    #	the normalized sleep times.

    sum_of_products = sum( [x * y for x, y in 
                            zip(normalized_sleep, normalized_scores)])
    sum_of_squares = sum( [x * x for x in normalized_sleep])
    
    slope = sum_of_products / (1.0*sum_of_squares)

    #	Finally, We have a linear function of the form
    #	y - avg_y = slope * ( x - avg_x )
    #	Rewrite this function in the form
    #	y = m * x + b
    #	Then return the values m, b
    
    #   => y = slope*x - slope*avg_x + avg_y
    #   Thus,   b = - slope*avg_x + avg_y
    #        => b = avg_y - slope*avg_x
    
    b = avg_scores - slope*avg_sleep 
    m = slope
    
    return m,b

if __name__=="__main__":
    m,b = compute_regression(sleep,scores)
    print("Your linear model is y={}*x+{}".format(m, b))
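
As a sanity check on the hand computation, the same slope and intercept should come out of `numpy.polyfit` with degree 1. The sketch below repeats the closed-form calculation in vectorized form and compares the two; the variable names are chosen here for illustration.

```python
import numpy as np

sleep = [5, 6, 7, 8, 10]
scores = [65, 51, 75, 75, 86]

# polyfit returns coefficients from highest power to lowest: [m, b]
m_np, b_np = np.polyfit(sleep, scores, deg=1)

# Same closed-form least-squares formulas as in compute_regression
x = np.asarray(sleep, dtype=float)
y = np.asarray(scores, dtype=float)
m_hand = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_hand = y.mean() - m_hand * x.mean()

print("polyfit: m={:.4f}, b={:.4f}".format(m_np, b_np))
print("by hand: m={:.4f}, b={:.4f}".format(m_hand, b_hand))
```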

## Numpy Polyfit
- The linear model above, however, breaks down for long sleep durations: with a positive slope it predicts that scores keep rising with ever more sleep.
- One simple solution is using a polynomial regression, fitting a model of the form:
```python
y = p[0] * x**2 + p[1] * x + p[2]
```
- Note that a polynomial regression is a form of _linear regression_
    - because the model is linear in its coefficients, even though it is nonlinear in x
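
One way to see the "linear in its coefficients" point: build a matrix whose columns are x**2, x and 1, then solve an ordinary linear least-squares problem for the coefficients. The sketch below uses the five-point sleep data from the earlier exercise and checks that the result agrees with `numpy.polyfit`; the use of `numpy.vander` and `numpy.linalg.lstsq` is a choice made here for illustration.

```python
import numpy as np

sleep = np.array([5, 6, 7, 8, 10], dtype=float)
scores = np.array([65, 51, 75, 75, 86], dtype=float)

# Design matrix with columns [x**2, x, 1] (powers in decreasing order)
A = np.vander(sleep, N=3)

# Ordinary linear least squares: find p minimizing ||A p - scores||
p_lstsq, _, _, _ = np.linalg.lstsq(A, scores, rcond=None)

# polyfit solves the same problem and uses the same coefficient order
p_polyfit = np.polyfit(sleep, scores, deg=2)

print(p_lstsq)
print(p_polyfit)
```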

Tools that can fit these models:
1. [numpy.`polyfit()`](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.polyfit.html)
    - Takes arrays of x and y values, and a degree
    - Outputs the polynomial's coefficients as an array, highest power first:
```python
p = [p[0], p[1], ..., p[degree]]
```
2. [sklearn.preprocessing.`PolynomialFeatures()`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)
    - Adds features to a dataset which are quadratic (or higher) combinations of the existing features.
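
A short sketch of the `polyfit` coefficient order, using made-up points that lie exactly on a known quadratic so the fitted coefficients can be recognized. `numpy.polyval` then evaluates the fitted polynomial at a new point.

```python
import numpy as np

# Made-up data generated from y = 2*x**2 - 3*x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x ** 2 - 3 * x + 1

# Coefficients come back highest power first, so p is close to [2, -3, 1]
p = np.polyfit(x, y, deg=2)

# Evaluate p[0]*x**2 + p[1]*x + p[2] at x = 4
y_at_4 = np.polyval(p, 4.0)

print(p)
print(y_at_4)
```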


In [None]:
#
#	Polynomial Regression
#
#	In this exercise we will examine more complex models 
#   of test grades as a function of sleep using numpy.polyfit 
#   to determine a good relationship and incorporating more data.
#
#
#   at the end, store the coefficients of the polynomial you 
#   found in coeffs
#

import numpy as np

sleep = [5,6,7,8,10,12,16]
scores = [65,51,75,75,86,80,0]

coeffs = np.polyfit(sleep, scores, deg=2)


In [None]:
import pylab as pl

# Evaluate the polynomial at (other) points
u = np.linspace(-10., 30., 50)

v = np.polyval(coeffs, u)

pl.figure()  # start a new figure; optional here, since plot() would create one implicitly
pl.plot(sleep, scores, ".")
pl.plot(u, v, "--r")
pl.grid(True)
pl.show()
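
Besides inspecting the plot, one rough way to compare candidate degrees is the sum of squared errors (SSE) on the data. The sketch below is an assumption about how one might do this, not part of the original exercise, and the choice of degrees 1 to 3 is arbitrary.

```python
import numpy as np

sleep = np.array([5, 6, 7, 8, 10, 12, 16], dtype=float)
scores = np.array([65, 51, 75, 75, 86, 80, 0], dtype=float)

sse = {}
for degree in (1, 2, 3):
    coeffs_d = np.polyfit(sleep, scores, deg=degree)
    residuals = scores - np.polyval(coeffs_d, sleep)
    sse[degree] = float(np.sum(residuals ** 2))

for degree in sorted(sse):
    print("degree {}: SSE = {:.1f}".format(degree, sse[degree]))
```

A lower SSE on the fitted data alone does not show that a model is better: a higher-degree polynomial always fits the same points at least as well, so the comparison is only a first hint.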