## Boston Housing Assignment

In this assignment you'll use linear regression to estimate house prices in Boston, using a well-known dataset.

Goals:
+  Measure the performance of the fitted linear regression model using $R^{2}$ and mean squared error (MSE)
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using regularization (L2 for Ridge, L1 for Lasso)
> Use sklearn.linear_model.Ridge or sklearn.linear_model.Lasso 
+  Get the best model you can by optimizing the regularization parameter.   
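
For reference, the metrics and penalties named in the goals can be written out as below. The notation is an assumption introduced here, not taken from the assignment: $y_i$ are the true values, $\hat{y}_i$ the predictions, $\bar{y}$ the mean of the true values, $w$ the coefficient vector, and $\alpha$ the regularization strength. The objectives follow the form given in the scikit-learn documentation, with the intercept omitted for brevity.

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

$$
\text{Ridge (L2):}\ \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2,
\qquad
\text{Lasso (L1):}\ \min_{w}\ \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

A lower MSE is better, while $R^{2}$ is at most 1 and can be negative when the model does worse than predicting $\bar{y}$ everywhere.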

In [524]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split  # was sklearn.cross_validation before scikit-learn 0.18
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

In [525]:
bean = datasets.load_boston()
print(bean.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [526]:
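def load_boston():
    """Load the Boston data, standardize the features, and split into train/test sets."""
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    # Scale every feature to zero mean and unit variance
    X = scaler.fit_transform(X)
    # Default split is 75% train / 25% test; no random_state, so the split changes on each run
    return train_test_split(X, y)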

In [527]:
X_train, X_test, y_train, y_test = load_boston()

In [528]:
X_train.shape

(379L, 13L)
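
Note that the helper above fits the scaler on all 506 rows before splitting, so the test rows influence the scaling statistics. A minimal sketch of the alternative, computing the mean and standard deviation from the training rows only, is shown below. It uses NumPy and synthetic data as a stand-in for the Boston features; the names `standardize_train_test` and `X_demo` are hypothetical and not part of the assignment.

```python
import numpy as np

def standardize_train_test(X_train, X_test):
    """Scale both splits using the mean and std computed from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X_train - mean) / std, (X_test - mean) / std

# Synthetic stand-in data (assumption: 506 rows and 13 features, like the Boston set)
rng = np.random.RandomState(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(506, 13))

# Shuffle the row indices and hold out 25% of them as the test split
idx = rng.permutation(len(X_demo))
n_test = int(np.ceil(0.25 * len(X_demo)))
test_idx, train_idx = idx[:n_test], idx[n_test:]

X_train_s, X_test_s = standardize_train_test(X_demo[train_idx], X_demo[test_idx])
print(X_train_s.shape, X_test_s.shape)
```

With this arrangement the training columns have exactly zero mean and unit variance, while the test columns are only approximately standardized, which is the expected behaviour for unseen data.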

### Fitting a Linear Regression

Instantiate a new regression object (line 1), then give it your training data (line 2)
by calling .fit(independent variables, dependent variable).



In [529]:

clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
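
For intuition, fitting an ordinary least squares model amounts to finding the coefficients and intercept that minimize the squared error on the training data. The sketch below does this directly with `numpy.linalg.lstsq` on synthetic data. It is an illustration only, not a claim about how `LinearRegression` is implemented internally, and the names `X_demo`, `y_demo`, and `true_coef` are hypothetical.

```python
import numpy as np

# Synthetic data with known coefficients and a known intercept of 4.0
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo.dot(true_coef) + 4.0 + rng.normal(scale=0.1, size=100)

# Append a column of ones so that the last weight acts as the intercept
X_aug = np.hstack([X_demo, np.ones((len(X_demo), 1))])
weights = np.linalg.lstsq(X_aug, y_demo, rcond=None)[0]
coef, intercept = weights[:-1], weights[-1]

# Predicting is a dot product with the coefficients plus the intercept
y_hat = X_demo.dot(coef) + intercept
print(coef, intercept)
```

The recovered values should land close to the ones used to generate the data, which plays the same role as inspecting `clf.coef_` and `clf.intercept_` after calling `.fit`.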

### Making a Prediction
X_test is our holdout set. We know the answers (y_test), but the model has not seen them.

The command below creates a tuple for each observation, pairing the real value (y_test) with
the value our regressor predicts (clf.predict(X_test)).

Use a similar format to get your r2 and mse metrics working. Use the [scikit-learn API](http://scikit-learn.org/stable/modules/model_evaluation.html) if you need help!

In [530]:
y_pred = clf.predict(X_test)
list(zip(y_test, y_pred))  # list() makes the pairs display under Python 3 as well

[(50.0, 42.984267242422597),
 (23.199999999999999, 16.679181699169),
 (28.600000000000001, 29.496092384816201),
 (18.600000000000001, 19.740977487583393),
 (29.800000000000001, 25.346931434226448),
 (17.5, 17.512364053156126),
 (24.5, 27.554283831503234),
 (36.5, 35.797500531633624),
 (22.100000000000001, 26.222789816059951),
 (20.5, 24.168469376208837),
 (31.600000000000001, 32.838694063153568),
 (28.100000000000001, 25.288843796600794),
 (13.4, 15.2272116103955),
 (7.2000000000000002, 10.318153562220266),
 (11.699999999999999, 16.015495761629268),
 (13.300000000000001, 19.906226592310613),
 (19.100000000000001, 24.351927859942261),
 (36.200000000000003, 28.672969722036132),
 (19.600000000000001, 18.458150974880638),
 (21.600000000000001, 25.231677796343543),
 (50.0, 33.686275052498516),
 (22.199999999999999, 22.53556413340144),
 (34.899999999999999, 29.797136481987714),
 (17.5, 17.32754457330099),
 (16.5, 22.887925770618384),
 (16.100000000000001, 21.247812752564585),
 (18.8000000000

In [531]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_pred)

31.025636554713657

In [532]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.63035073019791865
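
To see what the two metric functions compute, both can be reproduced by hand in a few lines. This is a minimal sketch using NumPy and a small made-up set of values; the helper names `mse_by_hand` and `r2_by_hand` are hypothetical.

```python
import numpy as np

def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r2_by_hand(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up example values, not taken from the Boston data
y_true_demo = [3.0, -0.5, 2.0, 7.0]
y_pred_demo = [2.5, 0.0, 2.0, 8.0]

print(mse_by_hand(y_true_demo, y_pred_demo))  # 0.375
print(r2_by_hand(y_true_demo, y_pred_demo))   # about 0.9486
```

MSE is expressed in squared target units, so it depends on the scale of the prices, whereas $R^{2}$ is unitless and easier to compare across datasets.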

In [533]:
#r2_score(y_test, y_pred, multioutput='variance_weighted')


In [534]:
#r2_score(y_test, y_pred, multioutput='uniform_average')


In [535]:
#r2_score(y_test, y_pred, multioutput='raw_values')

### Implementing Lasso Regression

In [536]:
from sklearn import linear_model
# Lasso adds an L1 penalty on the coefficients; alpha sets its strength
clf = linear_model.Lasso(alpha=0.09765)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
mean_squared_error(y_test, y_pred)


32.961972255796354

In [537]:
#print(clf.coef_)

In [538]:
#print(clf.intercept_)

In [539]:
r2_score(y_test, y_pred)

0.60728061278921786
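
The commented-out `clf.coef_` cells are where the practical difference between an L1 and an L2 penalty would show up: an L1 penalty can set small coefficients to exactly zero, while an L2 penalty only shrinks them. The sketch below illustrates this under a simplifying assumption that does not hold for the Boston data: orthonormal feature columns, with objectives $\tfrac{1}{2}\lVert y - Xw\rVert_2^2 + t\lVert w\rVert_1$ for L1 and $\lVert y - Xw\rVert_2^2 + t\lVert w\rVert_2^2$ for L2. In that special case both solutions have a closed form in terms of the unpenalized coefficients. The names `soft_threshold` and `ols_coef` are hypothetical.

```python
import numpy as np

def soft_threshold(w, t):
    """Move each value toward zero by t, clipping at zero (L1 solution for orthonormal features)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

# Made-up unpenalized coefficients: two large, two small
ols_coef = np.array([3.0, -0.05, 0.8, 0.02])
t = 0.1

lasso_like = soft_threshold(ols_coef, t)  # small coefficients become exactly zero
ridge_like = ols_coef / (1.0 + t)         # every coefficient shrinks, none reaches zero

print(lasso_like)
print(ridge_like)
```

This is why Lasso is often described as performing feature selection, while Ridge keeps every feature in the model with a reduced weight.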

### Implementing Ridge Regression

In [540]:
from sklearn.linear_model import Ridge
# Ridge adds an L2 penalty on the coefficients; alpha sets its strength
clf = Ridge(alpha=.9765)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
mean_squared_error(y_test, y_pred)

31.080946154741319

In [541]:
#print(clf.coef_)

In [542]:
#print(clf.intercept_)

In [543]:
r2_score(y_test, y_pred)

0.62969175408868661
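
The last goal, optimizing the regularization parameter, is not carried out in the cells above, where `alpha` is fixed by hand. One common approach is to try a grid of `alpha` values and keep the one with the lowest error on held-out validation data (separate from the final test set). The sketch below does this with a closed-form ridge solution in NumPy on synthetic data, so it runs without the Boston dataset. The helper names `ridge_fit` and `mse` and the grid itself are hypothetical choices. In scikit-learn, `RidgeCV`, `LassoCV`, and `GridSearchCV` serve the same purpose with cross-validation.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge: centre the data, then solve (X'X + alpha*I) w = X'y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T.dot(Xc) + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T.dot(yc))
    intercept = y_mean - x_mean.dot(coef)  # the intercept is not penalized
    return coef, intercept

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Synthetic stand-in data: 13 features, only the first three matter
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 13))
y_demo = X_demo[:, :3].dot(np.array([3.0, -2.0, 1.0])) + rng.normal(scale=2.0, size=200)

# Fit on the first 150 rows, choose alpha on the remaining 50
X_fit, y_fit = X_demo[:150], y_demo[:150]
X_val, y_val = X_demo[150:], y_demo[150:]

alphas = np.logspace(-3, 3, 13)  # 0.001 ... 1000 on a log scale
val_errors = []
for alpha in alphas:
    coef, intercept = ridge_fit(X_fit, y_fit, alpha)
    val_errors.append(mse(y_val, X_val.dot(coef) + intercept))

best_alpha = alphas[int(np.argmin(val_errors))]
print(best_alpha, min(val_errors))
```

Selecting `alpha` on the test set itself would make the reported test error optimistic, which is why the sketch keeps a separate validation split for the search.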