## Boston Housing Assignment

In this assignment you'll use linear regression to estimate house prices in Boston, using a well-known dataset.

Goals:
+  Measure the performance of the model I created using $R^{2}$ and MSE
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using regularization
> Use sklearn.linear_model.Ridge (L2 penalty) or sklearn.linear_model.Lasso (L1 penalty)
+  Get the best model you can by optimizing the regularization parameter.
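
For reference, the two penalties behind these goals differ only in the norm applied to the coefficient vector. The objectives below follow the form given in the scikit-learn documentation for `Ridge` and `Lasso`. The symbols $X$ (feature matrix), $y$ (targets), $w$ (coefficients) and $n$ (number of training samples) are notation introduced here, not names from the notebook, and $\alpha$ is the regularization parameter.

$$
\text{Ridge (L2):}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso (L1):}\quad \min_{w}\; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

Setting $\alpha = 0$ recovers ordinary least squares in both cases. Larger values shrink the coefficients, and the L1 penalty can set some of them exactly to zero.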

In [993]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split  # replaces sklearn.cross_validation, removed in scikit-learn 0.20
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

In [994]:
bean = datasets.load_boston()
print(bean.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [995]:
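def load_boston():
    """Return a standardized train/test split of the Boston data."""
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    # Rescale every feature to zero mean and unit variance.
    X = StandardScaler().fit_transform(X)
    # Default split: 75% train, 25% test, shuffled randomly on each call.
    return train_test_split(X, y)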
    

In [996]:
X_train, X_test, y_train, y_test = load_boston()

In [997]:
X_train.shape

(379L, 13L)

### Fitting a Linear Regression

Fitting takes two steps: instantiate a new regression object (line 1), then give it your training data
(line 2) by calling .fit(independent variables, dependent variable).



In [998]:

clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
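
For intuition about what `.fit` estimates, the single-feature case has a closed form: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the two means. The sketch below is plain Python on made-up numbers. `fit_simple_ols` is a hypothetical helper, not part of scikit-learn, and `LinearRegression` solves the general multi-feature problem rather than this special case.

```python
def fit_simple_ols(x, y):
    """Least-squares slope and intercept for one feature (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_ols(x, y)
print(slope, intercept)
```

With 13 features the same idea applies, but there is one coefficient per feature and they are solved for jointly.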

### Making a Prediction
X_test is our holdout set of data. We know the answer (y_test) but the computer does not.

Using the command below, I create a tuple for each observation, pairing the real value (y_test) with
the value our regressor predicts (clf.predict(X_test)).

Use a similar format to get your r2 and mse metrics working. Use the [scikit-learn API](http://scikit-learn.org/stable/modules/model_evaluation.html) if you need help!

In [999]:
y_pred = clf.predict(X_test)
list(zip(y_test, y_pred))  # list() so the pairs also display under Python 3

[(16.800000000000001, 20.919601109623148),
 (22.199999999999999, 24.360086400992916),
 (20.100000000000001, 15.260371116612678),
 (30.5, 30.369279756503339),
 (7.5, 13.353111124249729),
 (5.0, 6.648661165551907),
 (23.800000000000001, 25.330804633209585),
 (20.5, 20.132686530746504),
 (24.399999999999999, 24.131553755458754),
 (21.699999999999999, 22.933113223950176),
 (21.699999999999999, 20.518679797594025),
 (13.1, 14.495903358523931),
 (18.5, 19.825318615499093),
 (20.5, 24.310227211449455),
 (34.600000000000001, 34.776851045343648),
 (19.899999999999999, 18.216069150986911),
 (15.4, 17.845888343441608),
 (17.5, 16.900583106762983),
 (24.800000000000001, 26.287120751860233),
 (28.100000000000001, 25.048933440380946),
 (20.300000000000001, 19.572863933751218),
 (13.800000000000001, 6.6850762266073946),
 (14.1, 18.937563503528658),
 (23.399999999999999, 23.890521952730168),
 (12.800000000000001, 13.267175880457344),
 (27.899999999999999, 32.553778438494476),
 (23.899999999999999, 27.

### Testing Accuracy of the Linear Model

In [1000]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_pred)

11.937288667976835

In [1001]:
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

0.83318224193402157
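
For reference, both metrics are short formulas. MSE is the mean of the squared residuals, and $R^{2}$ is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below computes them in plain Python on made-up numbers. The `_manual` function names are hypothetical, chosen so they do not shadow the scikit-learn functions used above.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared residuals."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r2_score_manual(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance the model explains."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, not taken from the Boston data.
y_true = [3.0, -0.5, 2.0, 7.0]
y_hat = [2.5, 0.0, 2.0, 8.0]
mse = mean_squared_error_manual(y_true, y_hat)
r2 = r2_score_manual(y_true, y_hat)
print(mse, r2)
```

An $R^{2}$ of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean of the true values. Taking the square root of the MSE puts the error back in the units of the target.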

### Implement a Lasso Regression Model

In [1002]:
from sklearn import linear_model
# Lasso adds an L1 penalty; alpha sets its strength.
clf = linear_model.Lasso(alpha=0.092)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
mean_squared_error(y_test, y_pred)


11.372064519305649

In [1003]:
r2_score(y_test, y_pred)

0.8410809723667555
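
To see how the L1 and L2 penalties treat a coefficient differently, it helps to look at the simplest case: one feature and no intercept, where both minimizers have a closed form. The sketch below assumes the objectives scikit-learn documents for `Ridge` and `Lasso`. The helper names and the data are made up for illustration.

```python
def ridge_coef_1d(x, y, alpha):
    """Minimizer of sum((y - w*x)**2) + alpha * w**2 for one feature, no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


def lasso_coef_1d(x, y, alpha):
    """Minimizer of sum((y - w*x)**2) / (2n) + alpha * abs(w), same setting."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Soft-thresholding: move sxy toward zero by n*alpha, stopping at zero.
    threshold = n * alpha
    if sxy > threshold:
        return (sxy - threshold) / sxx
    if sxy < -threshold:
        return (sxy + threshold) / sxx
    return 0.0


# Made-up centered data on the line y = 2x, so the unpenalized coefficient is 2.
x = [-1.0, 0.0, 1.0]
y = [-2.0, 0.0, 2.0]
for alpha in (0.0, 1.0, 2.0):
    print(alpha, ridge_coef_1d(x, y, alpha), lasso_coef_1d(x, y, alpha))
```

Ridge shrinks the coefficient smoothly and never reaches zero for a finite alpha, while Lasso subtracts a fixed amount and clips at zero. This is why Lasso can drop features entirely.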

### Implement Ridge Regression

In [1004]:
from sklearn.linear_model import Ridge
import numpy as np
# Ridge adds an L2 penalty; alpha sets its strength.
clf = Ridge(alpha=0.9965)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
mean_squared_error(y_test, y_pred)

11.873534701895254

In [1005]:
r2_score(y_test, y_pred)

0.83407317236096801
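
The notebook does not say how the alpha values 0.092 and 0.9965 were chosen. One common way to optimize the regularization parameter is to score a grid of candidate values with cross-validation on the training set, so that the test set stays untouched until the end. The sketch below shows that approach with `GridSearchCV`. It is a swapped-in technique, not necessarily the one used above. The synthetic data from `make_regression` and the grid of alphas are assumptions made so the block runs on its own.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption, not the Boston dataset).
X, y = make_regression(n_samples=200, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate penalties; the grid itself is an arbitrary illustrative choice.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": alphas},
    scoring="neg_mean_squared_error",  # negated MSE, so higher is better
    cv=5,
)
search.fit(X_train, y_train)  # cross-validates on the training data only

best_alpha = search.best_params_["alpha"]
# The test set is used once, after alpha has been chosen.
test_mse = mean_squared_error(y_test, search.predict(X_test))
print(best_alpha, test_mse)
```

Swapping `Ridge()` for `Lasso()` applies the same search to the L1 model. Tuning alpha against the test set directly tends to give an optimistic score, which is the reason for cross-validating on the training split.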