## Boston Housing Assignment

In this assignment you'll use linear regression to estimate house prices in Boston, using a well-known dataset.

Goals:
+  Measure the performance of the model you create using $R^{2}$ and MSE
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using regularization
> Use sklearn.linear_model.Ridge (L2 penalty) or sklearn.linear_model.Lasso (L1 penalty)
+  Get the best model you can by optimizing the regularization parameter.

In [10]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split  # replaces the removed sklearn.cross_validation module
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

In [82]:
import math

In [11]:
bean = datasets.load_boston()
print(bean.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [12]:
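def load_boston():
    """Return a random train/test split of the standardized Boston data."""
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    # Standardize each feature to zero mean and unit variance.
    # Note: the scaler is fit on all rows, before the split.
    X = StandardScaler().fit_transform(X)
    # No random_state is set, so every call returns a different split.
    return train_test_split(X, y)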
    

In [13]:
X_train, X_test, y_train, y_test = load_boston()

In [14]:
X_train.shape

(379, 13)
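
The load_boston helper above fits the scaler on every row before splitting, so the test rows influence the mean and standard deviation used for scaling. A common alternative is to split first and compute the scaling statistics from the training rows only. The sketch below illustrates the idea on a single toy feature in plain Python; split_then_scale and toy_feature are hypothetical names, not part of scikit-learn or the assignment.

```python
import random
import statistics

def split_then_scale(values, test_fraction=0.25, seed=0):
    """Split first, then standardize using training statistics only."""
    rng = random.Random(seed)          # fixed seed makes the split repeatable
    shuffled = list(values)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]

    mean = statistics.mean(train)      # statistics come from the training rows
    std = statistics.pstdev(train)     # population std, as StandardScaler uses
    train_scaled = [(v - mean) / std for v in train]
    test_scaled = [(v - mean) / std for v in test]   # reuse training statistics
    return train_scaled, test_scaled

# Made-up feature values, purely for illustration.
toy_feature = [4.0, 7.0, 1.0, 9.0, 3.0, 8.0, 2.0, 6.0]
train_scaled, test_scaled = split_then_scale(toy_feature)
print(len(train_scaled), len(test_scaled))
```

With scikit-learn, the same pattern is to call train_test_split first, then scaler.fit_transform on X_train and scaler.transform on X_test.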

### Fitting a Linear Regression

Fitting takes two steps: instantiate a regression object (line 1), then give it your training data
(line 2) by calling .fit(X, y), where X holds the independent variables and y the dependent variable.



In [15]:

clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
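
For intuition about what .fit estimates: LinearRegression performs ordinary least squares, and with a single feature the solution has a simple closed form (slope = covariance of x and y divided by the variance of x). The sketch below is illustrative only; fit_line is a hypothetical helper and the toy data is made up.

```python
def fit_line(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Toy data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```

In the notebook, clf.coef_ and clf.intercept_ hold the corresponding fitted values, one coefficient per feature.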

### Making a Prediction
X_test is our holdout set of data.  We know the answer (y_test) but the computer does not.   

The command below pairs each real value (y_test) with the value our regressor predicts (clf.predict(X_test)),
producing one tuple per observation.

Use a similar format to get your r2 and mse metrics working. Consult the [scikit-learn API](http://scikit-learn.org/stable/modules/model_evaluation.html) if you need help!

In [16]:
list(zip(y_test, clf.predict(X_test)))

[(23.0, 23.56321722671419),
 (17.100000000000001, 17.773377266990032),
 (11.699999999999999, 13.846027104007737),
 (16.199999999999999, 20.037557738625384),
 (10.800000000000001, 11.609878291184199),
 (23.199999999999999, 17.460049587041055),
 (8.8000000000000007, 6.9155451214172849),
 (13.1, 16.961099790171339),
 (18.600000000000001, 17.047267364618232),
 (16.5, 23.179522640651598),
 (22.0, 27.8978596448794),
 (24.800000000000001, 26.105101779814458),
 (15.0, 26.44494015216787),
 (28.699999999999999, 31.558406887049305),
 (31.600000000000001, 32.842620060030711),
 (18.399999999999999, 18.721656635439103),
 (13.5, 12.222423535291682),
 (16.5, 12.189678952892461),
 (13.6, 12.36500721460234),
 (29.0, 32.795663571268904),
 (26.600000000000001, 28.754019014819633),
 (20.600000000000001, 21.924315459044578),
 (20.699999999999999, 26.171914767654332),
 (25.0, 24.892082539194512),
 (20.199999999999999, 15.653267972240512),
 (24.100000000000001, 29.293807794788961),
 (19.5, 18.530497254636384)

## sklearn r2

In [20]:
from sklearn.metrics import r2_score

sklearn.metrics.r2_score(y_true, y_pred, sample_weight=None, multioutput=None)

In [22]:
r2_score(y_test, clf.predict(X_test))

0.64228572140532181
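
To make the number concrete: $R^{2}$ compares the model's squared error with the error of always predicting the mean of y_true, $R^{2} = 1 - SS_{res} / SS_{tot}$. A sketch of that formula in plain Python follows; r2_by_hand is a hypothetical name and the toy values are made up.

```python
def r2_by_hand(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
perfect = r2_by_hand(y_true, [3.0, 5.0, 7.0, 9.0])    # predictions match exactly
baseline = r2_by_hand(y_true, [6.0, 6.0, 6.0, 6.0])   # always predict the mean
worse = r2_by_hand(y_true, [9.0, 7.0, 5.0, 3.0])      # worse than the mean
print(perfect, baseline, worse)
```

A score of 1 is a perfect fit, 0 matches a constant mean prediction, and a negative score means the model does worse than that baseline.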

## sklearn mse

In [23]:
from sklearn.metrics import mean_squared_error

sklearn.metrics.mean_squared_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average')

In [28]:
mean_squared_error(y_test, clf.predict(X_test))

22.685165230187231
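
MSE is the average of the squared differences, so it is expressed in squared target units. Taking the square root (RMSE) returns to the units of the target, which for this dataset is the median home value in thousands of dollars. A plain-Python sketch, where mse_by_hand and rmse_by_hand are hypothetical names and the toy values are made up:

```python
import math

def mse_by_hand(y_true, y_pred):
    """Mean of squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse_by_hand(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse_by_hand(y_true, y_pred))

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]   # only the last prediction is off, by 2
print(mse_by_hand(y_true, y_pred), rmse_by_hand(y_true, y_pred))
```

Applied to the MSE of 22.685 above, the square root is about 4.76, a root-mean-square error of roughly 4,760 dollars on this particular split.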

## sklearn.linear_model.Ridge

In [25]:
from sklearn.linear_model import Ridge

sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver='auto', random_state=None)

In [26]:
clf2 = Ridge(alpha=1.0)

In [27]:
clf2.fit(X_train, y_train)

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [29]:
r2_score(y_test, clf2.predict(X_test))

0.64193827290356653

In [30]:
mean_squared_error(y_test, clf2.predict(X_test))

22.70719937068133

In [41]:
clf3 = Ridge(alpha=0.1)

In [42]:
clf3.fit(X_train, y_train)

Ridge(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [43]:
r2_score(y_test, clf3.predict(X_test))

0.64225033839320567

In [44]:
mean_squared_error(y_test, clf3.predict(X_test))

22.687409114550334

In [45]:
clf4 = Ridge(alpha=0.01)

In [46]:
clf4.fit(X_train, y_train)

Ridge(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [47]:
r2_score(y_test, clf4.predict(X_test))

0.64228217666224663

In [48]:
mean_squared_error(y_test, clf4.predict(X_test))

22.685390027147179
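
The three alphas above give nearly identical scores. One plausible reason: Ridge adds alpha times the sum of squared coefficients to the sum of squared errors, and with a few hundred standardized rows the data term is much larger than an alpha of 1 or less. For a single centered feature the ridge slope has a closed form, $\sum xy / (\sum x^{2} + \alpha)$, which makes the effect easy to see. The sketch below uses that formula; ridge_slope is a hypothetical helper and the data is made up.

```python
def ridge_slope(x, y, alpha):
    """Ridge slope for one feature with an unpenalized intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    # alpha is added to the denominator, shrinking the slope toward zero.
    return sxy / (sxx + alpha)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]   # exactly y = 2x
for alpha in [0.0, 0.01, 0.1, 1.0, 10.0]:
    print(alpha, ridge_slope(x, y, alpha))
```

In this toy data $\sum x^{2}$ is only 10, so alpha = 10 halves the slope. When $\sum x^{2}$ is in the hundreds, alphas of 1 or below barely change the fit, so larger values (for example 10, 100, 1000) may be worth trying.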

## sklearn.linear_model.Lasso

In [50]:
from sklearn.linear_model import Lasso

sklearn.linear_model.Lasso(alpha=1.0, fit_intercept=True, normalize=False, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')

In [54]:
def load_boston():
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X=boston.data
    y=boston.target
    X = scaler.fit_transform(X)
    return train_test_split(X,y)

In [57]:
X_train, X_test, y_train, y_test = load_boston()

In [58]:
clf5 = Ridge(alpha=1)
clf5.fit(X_train, y_train)

Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [59]:
r2_score(y_test, clf5.predict(X_test))

0.70996388724631809

In [60]:
mean_squared_error(y_test, clf5.predict(X_test))

21.239284655976533

In [62]:
clf6 = Ridge(alpha=0.1)
clf6.fit(X_train, y_train)

Ridge(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [63]:
r2_score(y_test, clf6.predict(X_test))

0.70887765253623169

In [64]:
mean_squared_error(y_test, clf6.predict(X_test))

21.318829399531694

In [71]:
print("coef_: ")
print(clf6.coef_)
print("intercept_: ")
print(clf6.intercept_)

coef_: 
[-0.91412395  0.98173632  0.04337537  0.83431446 -2.25460135  2.32015627
  0.485803   -3.27257904  2.59650328 -1.81640859 -2.24272849  0.68407896
 -4.55444311]
intercept_: 
22.5954473113


In [74]:
print("coef_: ")
print(clf.coef_)
print("intercept_: ")
print(clf.intercept_)

coef_: 
[-0.76050294  0.88068976 -0.02333221  0.86104656 -2.15323341  3.34866137
 -0.26413699 -3.09266854  2.19287107 -1.50248465 -2.14235843  1.00387995
 -3.19356979]
intercept_: 
22.7344483166


## Optimizing with Lasso

### alpha 5

In [75]:
X_train, X_test, y_train, y_test = load_boston()

In [76]:
moo = Lasso(alpha=5)

In [78]:
moo.fit(X_train, y_train)

Lasso(alpha=5, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [79]:
r2_score(y_test, moo.predict(X_test))

0.23570191052586209

In [80]:
mean_squared_error(y_test, moo.predict(X_test))

73.174472463827456

In [83]:
math.sqrt(mean_squared_error(y_test, moo.predict(X_test)))

8.55420788055957

### alpha 10

In [84]:
X_train, X_test, y_train, y_test = load_boston()

In [85]:
moo2 = Lasso(alpha=10)

In [86]:
moo2.fit(X_train, y_train)

Lasso(alpha=10, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [87]:
r2_score(y_test, moo2.predict(X_test))

-0.0025978524853031981

In [88]:
mean_squared_error(y_test, moo2.predict(X_test))

76.916304857138655

In [89]:
math.sqrt(mean_squared_error(y_test, moo2.predict(X_test)))

8.770194117414885
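
An $R^{2}$ this close to zero is what one would expect if the penalty has driven every coefficient to zero, leaving a model that predicts the training mean; inspecting moo2.coef_ would confirm whether that happened here. For a single centered feature, the minimizer of scikit-learn's Lasso objective, $\frac{1}{2n}\sum (y - wx)^{2} + \alpha |w|$, is a soft-thresholded version of the least-squares slope. A minimal sketch of that one-feature case follows; soft_threshold and lasso_slope are hypothetical helpers and the data is made up.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_slope(x, y, alpha):
    """Lasso slope for one feature with an unpenalized intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # rho is the covariance of x and y; var is the variance of x.
    rho = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / n
    var = sum((a - x_mean) ** 2 for a in x) / n
    return soft_threshold(rho, alpha) / var

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]   # exactly y = 2x
for alpha in [0.0, 1.0, 4.0, 10.0]:
    print(alpha, lasso_slope(x, y, alpha))
```

Once alpha reaches the absolute covariance between the feature and the target (4 in this toy data), the slope is exactly zero, which is how Lasso removes features.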

### alpha 0.5

In [96]:
X_train, X_test, y_train, y_test = load_boston()

In [92]:
moo3 = Lasso(alpha=0.5)

In [93]:
moo3.fit(X_train, y_train)

Lasso(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [94]:
r2_score(y_test, moo3.predict(X_test))

0.6395876452922703

In [97]:
mean_squared_error(y_test, moo3.predict(X_test))

25.574328013763054

In [98]:
math.sqrt(mean_squared_error(y_test, moo3.predict(X_test)))

5.0571066840401

### alpha 0.1

In [99]:
X_train, X_test, y_train, y_test = load_boston()


In [100]:
moo4 = Lasso(alpha=0.5)  # note: the heading says 0.1, but this run (and its output) used alpha=0.5

In [101]:
moo4.fit(X_train, y_train)

Lasso(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [102]:
r2_score(y_test, moo4.predict(X_test))

0.61148091092667167

In [103]:
mean_squared_error(y_test, moo4.predict(X_test))

33.607495220636601

In [104]:
math.sqrt(mean_squared_error(y_test, moo4.predict(X_test)))

5.797197186627051
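
Each section above scores its model on a fresh random split, because load_boston() is called again every time. Differences between runs therefore mix the effect of alpha with the luck of the split; the two alpha=0.5 runs, for instance, report $R^{2}$ of about 0.640 and 0.611. A fairer comparison fixes one split and loops over the candidate alphas. The sketch below shows that pattern on synthetic one-feature data in plain Python; fit_lasso_1d, r2, and the data are illustrative assumptions, not the notebook's code.

```python
import random

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def fit_lasso_1d(x, y, alpha):
    """One-feature Lasso with an unpenalized intercept: (slope, intercept)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    rho = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / n
    var = sum((a - x_mean) ** 2 for a in x) / n
    slope = soft_threshold(rho, alpha) / var
    return slope, y_mean - slope * x_mean

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic data (an assumption for illustration): y = 3x + noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [3 * x + rng.gauss(0, 1) for x in xs]

# One fixed split, reused for every alpha.
x_train, x_test = xs[:150], xs[150:]
y_train, y_test = ys[:150], ys[150:]

scores = {}
for alpha in [10, 5, 0.5, 0.1]:
    slope, intercept = fit_lasso_1d(x_train, y_train, alpha)
    preds = [slope * x + intercept for x in x_test]
    scores[alpha] = r2(y_test, preds)

best_alpha = max(scores, key=scores.get)
print(scores, best_alpha)
```

With scikit-learn, passing a fixed random_state to train_test_split gives the same repeatable split, after which the alphas can be compared in a single loop.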

### Best Output 

Out of alphas 5, 10, 0.5, and 0.1, 