## Boston Housing Assignment
### Marcus Millender CSC 570
#### mmill41

In this assignment you'll use linear regression to estimate the cost of houses in Boston, using a well-known dataset.

Goals:
+  Measure the performance of the model I created using $R^{2}$ and MSE
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using L2 regularization
> Use sklearn.linear_model.Ridge (L2 penalty) or sklearn.linear_model.Lasso (L1 penalty)
+  Get the best model you can by optimizing the regularization parameter.   

In [6]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

In [8]:
bean = datasets.load_boston()
print(bean.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [9]:
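def load_boston():
    """Load the Boston data, standardize the features, and split train/test."""
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    # Scale every feature to zero mean and unit variance.
    X = scaler.fit_transform(X)
    # Default split: 75% train, 25% test.
    return train_test_split(X, y)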
    

In [10]:
X_train, X_test, y_train, y_test = load_boston()

In [11]:
X_train.shape

(379, 13)
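
The helper above fits `StandardScaler` on all 506 rows before splitting, so statistics from the test rows feed into the scaling. A minimal sketch of the other ordering (split first, then fit the scaler on the training rows only) is shown here. It assumes synthetic data from `make_regression` as a stand-in for the Boston data, and the function name `split_then_scale` is hypothetical, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_then_scale(X, y, random_state=0):
    """Split first, then fit the scaler on the training rows only."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state
    )
    # Mean and variance are estimated from the training rows only.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test


# Synthetic stand-in with the same shape as the Boston data (506 x 13).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = split_then_scale(X, y)
print(X_train.shape, X_test.shape)
```

With the default 75/25 split, the training set has the same 379 rows reported above.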

### Fitting a Linear Regression

It's as easy as instantiating a new regression object (line 1) and giving your regression object your training data
(line 2) by calling .fit(independent variables, dependent variable)



In [12]:

clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
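
To see what `.fit` learns, here is a small sketch on toy data where the true relationship is known. The data (`X_toy`, `y_toy`) and the model name `toy_model` are made up for illustration and are not part of the Boston dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 2*x + 1 with no noise (illustrative only).
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = 2.0 * X_toy.ravel() + 1.0

toy_model = LinearRegression()
toy_model.fit(X_toy, y_toy)      # X is 2-D (samples x features), y is 1-D

print(toy_model.coef_)           # fitted slope, one entry per feature
print(toy_model.intercept_)      # fitted intercept
```

After fitting, `coef_` and `intercept_` hold the learned parameters. On the Boston data, `clf.coef_` would hold 13 values, one per feature.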

### Making a Prediction
X_test is our holdout set of data.  We know the answer (y_test) but the computer does not.   

Using the command below, I create a tuple for each observation, where I'm combining the real value (y_test) with
the value our regressor predicts (clf.predict(X_test))

Use a similar format to get your r2 and mse metrics working.  Use the [scikit learn api](http://scikit-learn.org/stable/modules/model_evaluation.html) if you need help!

In [13]:
list(zip (y_test, clf.predict(X_test)))

[(33.0, 23.386754584408742),
 (43.5, 38.816841847195789),
 (8.6999999999999993, 8.7711867931775878),
 (33.299999999999997, 36.503318172347754),
 (19.399999999999999, 19.305759898657662),
 (29.100000000000001, 30.142894312553416),
 (19.5, 18.136046941719123),
 (17.5, 16.695240627945374),
 (21.600000000000001, 25.094682395300083),
 (28.699999999999999, 25.194360959081852),
 (8.8000000000000007, 2.9653947381990378),
 (22.0, 27.251699454619899),
 (50.0, 33.359221546580656),
 (16.100000000000001, 22.410830345016628),
 (35.100000000000001, 35.101349761625897),
 (18.5, 18.902500063882655),
 (14.4, 3.0597300466773945),
 (20.300000000000001, 23.463547031640879),
 (16.199999999999999, 20.487485395560526),
 (13.1, 13.627438833354191),
 (25.0, 29.275733166026587),
 (7.0, -4.4141301653489045),
 (48.799999999999997, 40.111206216538697),
 (19.5, 16.619817576657198),
 (20.5, 24.286051892425569),
 (18.600000000000001, 20.186589134134302),
 (7.5, 13.268512107926334),
 (20.0, 23.363712330820299),
 (25.30

# HOMEWORK MODULE 2 - W5
+ Marcus Millender
+ mmill41

### Measure the performance of the model I created using  $R^{2}$ and MSE
+ Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error

In [18]:
r2Score=r2_score(y_test, clf.predict(X_test))
mse=mean_squared_error(y_test, clf.predict(X_test))
print("LinearRegression R2 Score is: ", r2Score)
print("LinearRegression MSE is: ", mse)

LinearRegression R2 Score is:  0.739401396721
LinearRegression MSE is:  23.3506100991
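
For reference, MSE is the mean of the squared residuals, and $R^{2}$ is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes both by hand and compares them with the sklearn functions. It uses only the first five (actual, predicted) pairs from the prediction output earlier, with the predictions rounded, so the numbers will not match the full test-set scores above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# First five (actual, predicted) pairs from the earlier output, rounded.
y_true = np.array([33.0, 43.5, 8.7, 33.3, 19.4])
y_pred = np.array([23.39, 38.82, 8.77, 36.50, 19.31])

residuals = y_true - y_pred
mse_manual = np.mean(residuals ** 2)              # mean squared error

ss_res = np.sum(residuals ** 2)                   # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_pred))
print(r2_manual, r2_score(y_true, y_pred))
```

Each printed pair should agree, which confirms that the two sklearn functions implement these formulas.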


### Implement a new model using L2 regularization
+ Use sklearn.linear_model.Ridge (L2 penalty) or sklearn.linear_model.Lasso (L1 penalty). The cells below use Lasso, so the penalty applied here is L1, not L2.

In [17]:
from sklearn.linear_model import Lasso

In [22]:
#alpha 0.1
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
r2Score=r2_score(y_test, lasso.predict(X_test))
mse=mean_squared_error(y_test, lasso.predict(X_test))
print("Lasso R2 Score is: ", r2Score)
print("Lasso MSE is: ", mse)

Lasso R2 Score is:  0.737854497165
Lasso MSE is:  23.4892180883
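
Ridge and Lasso penalize the coefficients differently: Ridge (L2) shrinks them toward zero, while Lasso (L1) can set some of them exactly to zero. The sketch below illustrates the difference on synthetic data from `make_regression`, used as an assumed stand-in for the Boston data. The `alpha=1.0` setting is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 3 of which actually drive the target.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: can zero coefficients out

print("Ridge non-zero coefficients:", np.count_nonzero(ridge.coef_))
print("Lasso non-zero coefficients:", np.count_nonzero(lasso.coef_))
```

Running the same comparison on `X_train` and `y_train` would show which of the 13 Boston features, if any, Lasso drops at a given alpha.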


### Get the best model you can by optimizing the regularization parameter

In [27]:
#alpha 0.05
lasso = Lasso(alpha=0.05)
lasso.fit(X_train, y_train)
r2Score=r2_score(y_test, lasso.predict(X_test))
mse=mean_squared_error(y_test, lasso.predict(X_test))
print("Lasso R2 Score is: ", r2Score)
print("Lasso MSE is: ", mse)

Lasso R2 Score is:  0.74201029497
Lasso MSE is:  23.1168430526


In [28]:
#alpha 0.02
lasso = Lasso(alpha=0.02)
lasso.fit(X_train, y_train)
r2Score=r2_score(y_test, lasso.predict(X_test))
mse=mean_squared_error(y_test, lasso.predict(X_test))
print("Lasso R2 Score is: ", r2Score)
print("Lasso MSE is: ", mse)

Lasso R2 Score is:  0.740927567043
Lasso MSE is:  23.2138595269


In [29]:
#alpha 0.01
lasso = Lasso(alpha=0.01)
lasso.fit(X_train, y_train)
r2Score=r2_score(y_test, lasso.predict(X_test))
mse=mean_squared_error(y_test, lasso.predict(X_test))
print("Lasso R2 Score is: ", r2Score)
print("Lasso MSE is: ", mse)

Lasso R2 Score is:  0.740245694449
Lasso MSE is:  23.2749578631
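
The cells above pick alpha by comparing scores on the test set, which makes the reported test score slightly optimistic. A possible alternative is to choose alpha by cross-validation on the training data only. This sketch uses `GridSearchCV` on synthetic data from `make_regression`, an assumed stand-in for the Boston data. The 5-fold setting and the scoring choice are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

alphas = [0.01, 0.02, 0.05, 0.1]          # the same values tried by hand above
search = GridSearchCV(
    Lasso(),
    param_grid={"alpha": alphas},
    scoring="neg_mean_squared_error",     # higher (less negative) is better
    cv=5,
)
search.fit(X_train, y_train)              # the test set is not touched here

best_alpha = search.best_params_["alpha"]
# Regressor.score returns R^2; it is evaluated once on the held-out test set.
test_r2 = search.best_estimator_.score(X_test, y_test)
print("Best alpha:", best_alpha)
print("Held-out R2:", test_r2)
```

By default `GridSearchCV` refits the best model on the full training set, so `best_estimator_` can be scored on the test set once at the end.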


## Conclusion
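+ Lasso does not change the results meaningfully. Across alpha values from 0.01 to 0.1, $R^{2}$ stays between 0.738 and 0.742 (linear regression: 0.739) and MSE stays between 23.12 and 23.49 (linear regression: 23.35).
+ The best run, alpha = 0.05, improves $R^{2}$ by less than 0.003 over the unregularized model.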