# Course 2: Machine Learning: Regression
## Regression Week 2: Multiple Regression (gradient descent)

In the first notebook we explored multiple regression using graphlab create. Now we will use graphlab along with numpy to solve for the regression weights with gradient descent.

In this notebook we will cover estimating multiple regression weights via gradient descent. You will:
* Add a constant column of 1's to a graphlab SFrame to account for the intercept
* Convert an SFrame into a Numpy array
* Write a predict_output() function using Numpy
* Write a numpy function to compute the derivative of the regression weights with respect to a single feature
* Write gradient descent function to compute the regression weights given an initial weight vector, step size and tolerance.
* Use the gradient descent function to estimate regression weights for multiple features

# 1.  Fire up graphlab create

Make sure you have the latest version of graphlab (>= 1.7). Start GraphLab Create by importing graphlab, or if u just want to use sframe import sframe.

There is a nice graphlab into [here](http://www.analyticsvidhya.com/blog/2015/12/started-graphlab-python/)

In [1]:
#import graphlab as gl
import sframe as sf  #pip install sframe while notebook was running and it found sframe!!

# 2. Load in house sales data

Dataset is from house sales in King County, the region where the city of Seattle, WA is located.

**IMPORTANT**: use the following types for columns when importing the csv files. Otherwise, they may not be imported correctly: [str, str, float, float, float, float, int, str, int, int, int, int, int, int, int, int, str, float, float, float, float]. If your tool of choice requires a dictionary of types for importing csv files (e.g. Pandas), use:

In [2]:
dtype_dict = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 'sqft_living15':float, 'grade':int, 'yr_renovated':int, 'price':float, 'bedrooms':float, 'zipcode':str, 'long':float, 'sqft_lot15':float, 'sqft_living':float, 'floors':str, 'condition':int, 'lat':float, 'date':str, 'sqft_basement':int, 'yr_built':int, 'id':str, 'sqft_lot':int, 'view':int}

In [3]:
# sales = gl.SFrame('kc_house_data.gl/')

# This is how to do if not using GraphLab Create

sales = sf.SFrame.read_csv('kc_house_data.csv', column_type_hints = dtype_dict)

[INFO] sframe.cython.cy_server: SFrame v1.10 started. Logging /tmp/sframe_server_1469358822.log


If we want to do any "feature engineering" like creating new features or adjusting existing ones we should do this directly using the SFrames as seen in the other Week 2 notebook. For this notebook, however, we will work with the existing features.

# 3. Convert to Numpy Array

Although SFrames offer a number of benefits to users (especially when using Big Data and built-in graphlab functions) in order to understand the details of the implementation of algorithms it's important to work with *a library that allows for direct (and optimized) matrix operations*. **Numpy** is a Python solution to work with matrices (or any multi-dimensional "array").

Recall that the predicted value given the weights and the features is just the dot product between the feature and weight vector. Similarly, if we put all of the features row-by-row in a matrix then the predicted value for *all* the observations can be computed by **right multiplying** the "feature matrix" by the "weight vector". 

First we need to take the SFrame of our data and convert it into a 2D numpy array (also called a matrix). To do this we use graphlab's built in .to_dataframe() which converts the SFrame into a Pandas (another python library) dataframe. We can then use Panda's .as_matrix() to convert the dataframe into a numpy matrix.

In [4]:
import numpy as np # note this allows us to refer to numpy as np instead 

Now we will write a function that will accept an SFrame, a list of feature names (e.g. ['sqft_living', 'bedrooms']) and a target feature or response/outcome e.g. ('price') and will return two things:
* A numpy matrix whose columns are the desired features plus a constant column (this is how we create an 'intercept')
* A numpy array containing the values of the output/response


**3.** Next write a function that takes a data set, a list of features (e.g. [‘sqft_living’, ‘bedrooms’]), to be used as inputs, and a name of the output (e.g. ‘price’). This function should return a features_matrix (2D array) consisting of first a column of ones followed by columns containing the values of the input features in the data set in the same order as the input list. It should also return an output_array which is an array of the values of the output in the data set (e.g. ‘price’). e.g. if you’re using SFrames and numpy you can complete the following function:

**Please note you will need GraphLab Create version at least 1.7.1 in order for .to_numpy() to work!**

In [5]:
def get_numpy_data(data_sframe, features, output):
    #   this is how we call this function
    #   (example_features, example_output) = 
    #   get_numpy_data(sales, ['sqft_living'], 'price')
    #   sales is the SFrame, features_list = ['sqft_living', ....], response = 'price')
    
    data_sframe['constant'] = 1 # this is how you add a new column to an SFrame, tis case 'constant'
    
    # add the column 'constant' to the front of the features list 
    # so that we can extract it along with the others:
    features = ['constant'] + features # this is how you combine two lists
    # features = 'constant' + features # this wont work!! - string constant add to a list
    
    # print "print features", features, "\n"   prints nothing!!!
    
    # select the columns of data_SFrame given by the features list 
    # into the SFrame features_sframe (now including constant):
    features_sframe = data_sframe[features]
    # features_sframe = gl.SFrame.to_dataframe(data_sframe[features])  #if use gl.SFrame
    # features_sframe = sf.SFrame.to_dataframe(data_sframe[features])
    
    print "get_numpy_data(): print data_sframe = features_sframe(head 5)\n",features_sframe.head(5), "\n"  
    #prints the 2 columns constant and sqft_living
    
    # the following line will convert the features_SFrame into a numpy matrix:
    feature_matrix = features_sframe.to_numpy()
    len(feature_matrix)
    print "get_numpy_data(): print data_sframe converted to feature_matrix[0:5, :]\n",feature_matrix[0:5, :] , "\n" 
    
    # assign the column of data_sframe associated with the output to the SArray output_sarray
    output_sarray = data_sframe[output]
    
    # the following will convert the SArray into a numpy array by first converting it to a list
    output_array = output_sarray.to_numpy()
    
    return(feature_matrix, output_array)

For testing let's use the 'sqft_living' feature and a constant as our features and price as our output:

In [6]:
(example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price') # the [] around 'sqft_living' makes it a list
print "First row of the feature matrix --> example_features[0, :]\n", example_features[0, :]  
# this accesses the first row of the data matrix the ':' indicates 'all columns', in tis case only 2
print "\nFirst item in array/list --> example_output[0]\n",example_output[0] # and the corresponding output/response series i.e price

get_numpy_data(): print data_sframe = features_sframe(head 5)
+----------+-------------+
| constant | sqft_living |
+----------+-------------+
|    1     |    1180.0   |
|    1     |    2570.0   |
|    1     |    770.0    |
|    1     |    1960.0   |
|    1     |    1680.0   |
+----------+-------------+
[5 rows x 2 columns]
 

get_numpy_data(): print data_sframe converted to feature_matrix[0:5, :]
[[  1.00000000e+00   1.18000000e+03]
 [  1.00000000e+00   2.57000000e+03]
 [  1.00000000e+00   7.70000000e+02]
 [  1.00000000e+00   1.96000000e+03]
 [  1.00000000e+00   1.68000000e+03]] 

First row of the feature matrix --> example_features[0, :]
[  1.00000000e+00   1.18000000e+03]

First item in array/list --> example_output[0]
221900.0


# 4. Predicting output given regression weights
If the features matrix (including a column of 1s for the constant) is stored as a 2D array (or matrix) and the regression weights are stored as a 1D array then the predicted output is just the dot product between the features matrix and the weights (with the weights on the right). Write a function ‘predict_output’ which accepts a 2D array ‘feature_matrix’ and a 1D array ‘weights’ and returns a 1D array ‘predictions’.

Suppose we had the weights [1.0, 1.0] and the features [1.0, 1180.0] and we wanted to compute the predicted output 1.0\*1.0 + 1.0\*1180.0 = 1181.0 this is the dot product between these two arrays. Note the weights are just the model coefficients, Bo, B1, B2 etc or w0,w1,w2 etc. If both features matrix and weights array are numpy arrays we can use np.dot() to compute this:

In [7]:
my_weights = np.array([1.0, 1.0]) # the example weights, just a simple list, 1D
my_features = example_features[0, ] # we'll use the first data point, 1st row 0, n all cols
print "Model Coefficients or weights", my_weights
print "features matrix", my_features
predicted_value = np.dot(my_features, my_weights)
print "Predicted Value", predicted_value

Model Coefficients or weights [ 1.  1.]
features matrix [  1.00000000e+00   1.18000000e+03]
Predicted Value 1181.0


np.dot() also works when dealing with a matrix and a vector. Recall that the predictions from all the observations is just the RIGHT multiplication of the "feature matrix" by the "weight vector" (as in weights on the right) -- dot product between the features *matrix* and the weights *vector*. With this in mind finish the following predict_output function to compute the predictions for an entire matrix of features given the matrix and the weights:

In [8]:
def predict_output(feature_matrix, weights):
    # assume feature_matrix is a numpy matrix containing the features as columns 
    # and weights is a corresponding numpy array (our list of model coefficients)
    # create the predictions vector by using np.dot()
    
    predictions = np.dot(feature_matrix, weights)
    
    return(predictions)

If you want to test your code run the following cell:

In [9]:
test_predictions = predict_output(example_features, my_weights)
print test_predictions[0] # should be 1181.0
print test_predictions[1] # should be 2571.0

1181.0
2571.0


# 5. Computing the Derivative
If we have the values of a single input feature in an array ‘feature’ and the prediction ‘errors’ (predictions - output) then the derivative of the regression cost function with respect to the weight of ‘feature’ is just twice the dot product between ‘feature’ and ‘errors’. Write a function that accepts a ‘feature’ array and ‘error’ array and returns the ‘derivative’ (a single number). 

We are now going to move to computing the derivative of the regression cost function. Recall that the cost function is the sum over the data points of the squared difference between an observed output and a predicted output.

Since the derivative of a sum is the sum of the derivatives we can compute the derivative for a single data point and then sum over data points. We can write the squared difference between the observed output and predicted output for a single point as follows:

(w[0]\*[CONSTANT] + w[1]\*[feature_1] + ... + w[i] \*[feature_i] + ... +  w[k]\*[feature_k]   -   output)^2

Where we have k features and a constant. So the derivative with respect to weight w[i] by the chain rule is:

2\*(w[0]\*[CONSTANT] + w[1]\*[feature_1] + ... + w[i] \*[feature_i] + ... +  w[k]\*[feature_k]   -   output)\* [feature_i]

The term inside the paranethesis is just the error (difference between prediction and output). So we can re-write this as:

2\*error\*[feature_i]

That is, the derivative for the weight for feature i is the sum (over data points) of 2 times the product of the error and the feature itself. In the case of the constant then this is just twice the sum of the errors!

Recall that twice the sum of the product of two vectors is just twice the dot product of the two vectors. Therefore the derivative for the weight for feature_i is just two times the dot product between the values of feature_i and the current errors. 

With this in mind complete the following derivative function which computes the derivative of the weight given the value of the feature (over all data points) and the errors (over all data points).

In [10]:
def feature_derivative(errors, feature):
    # Assume that errors and feature are both numpy arrays of the same length (number of data points)
    # compute twice the dot product of these vectors as 'derivative' and return the value
    
    derivative = 2 * np.dot(errors,feature)
    return(derivative)

To test your feature derivartive run the following:

In [11]:
(example_features, example_output) = get_numpy_data(sales,     ['sqft_living'], 'price') 
# parameters in called function becomes   get_numpy_data(data_sframe, features,      output)

my_weights = np.array([0.0, 0.0]) # this makes all the predictions 0 as B0=B1=0.0
test_predictions = predict_output(example_features, my_weights) 

# just like SFrames, two numpy arrays can be elementwise subtracted with '-': 
errors = test_predictions - example_output 
# prediction errors in this case is just the -example_output, since subtracting from 0

# let's compute the derivative with respect to 'constant', the ":" indicates "all rows"
feature = example_features[: , 0]    #all rows and 1st column
derivative = feature_derivative(errors, feature)

print derivative
print -np.sum(example_output)*2 # should be the same as derivative

get_numpy_data(): print data_sframe = features_sframe(head 5)
+----------+-------------+
| constant | sqft_living |
+----------+-------------+
|    1     |    1180.0   |
|    1     |    2570.0   |
|    1     |    770.0    |
|    1     |    1960.0   |
|    1     |    1680.0   |
+----------+-------------+
[5 rows x 2 columns]
 

get_numpy_data(): print data_sframe converted to feature_matrix[0:5, :]
[[  1.00000000e+00   1.18000000e+03]
 [  1.00000000e+00   2.57000000e+03]
 [  1.00000000e+00   7.70000000e+02]
 [  1.00000000e+00   1.96000000e+03]
 [  1.00000000e+00   1.68000000e+03]] 

-23345850016.0
-23345850016.0


# 6. Gradient Descent
Now we will use our predict_output and feature_derivative to write a gradient descent function. Although we can compute the derivative for all the features simultaneously (the gradient) we will explicitly loop over the features individually for simplicity. Write a gradient descent function that does the following:

*    Accepts a numpy feature_matrix 2D array, a 1D output array, an array of initial weights, a step size and a convergence tolerance.
*    While not converged updates each feature weight by subtracting the step size times the derivative for that feature given the current weights
*    At each step computes the magnitude/length of the gradient (square root of the sum of squared components)
*    When the magnitude of the gradient is smaller than the input tolerance returns the final weight vector.

e.g. if you’re using SFrames and numpy you can complete the following function
*regression_gradient_descent()*  - see below

Now we will write a function that performs a gradient descent. The basic premise is simple. Given a starting point we update the current weights by moving in the negative gradient direction. Recall that the gradient is the direction of *increase* and therefore the negative gradient is the direction of *decrease* and we're trying to *minimize* a cost function. 

The amount by which we move in the negative gradient *direction*  is called the 'step size'. We stop when we are 'sufficiently close' to the optimum. We define this by requiring that the magnitude (length) of the gradient vector to be smaller than a fixed 'tolerance'.

With this in mind, complete the following gradient descent function below using your derivative function above. For each step in the gradient descent we update the weight for each feature befofe computing our stopping criteria

In [12]:
from math import sqrt # recall that the magnitude/length of a vector [g[0], g[1], g[2]] is sqrt(g[0]^2 + g[1]^2 + g[2]^2)

In [13]:
def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False 
    weights = np.array(initial_weights) # make sure it's a numpy array
    
    while not converged:
        # compute the predictions based on feature_matrix and weights using your predict_output() function
        # print "\nWhile: Model Coefficients or weights", weights
        predictions = predict_output(feature_matrix, weights)
        # predictions = np.dot(feature_matrix, weights)   #or simple do this
        # compute the errors
        errors = predictions - output

        gradient_sum_squares = 0 # initialize the gradient sum of squares
        # while we haven't reached the tolerance yet, update each feature's weight
        for i in range(len(weights)): # loop over each weight
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            # compute the derivative for weight[i]:
            feature = feature_matrix[: ,i] 
            derivative = feature_derivative(errors, feature)

            # add the squared value of the derivative to the gradient sum of squares (for assessing convergence)
            gradient_sum_squares += derivative**2
            # print "\t for: gradient_sum_squares = ", gradient_sum_squares
            # subtract the step size times the derivative from the current weight
            weights -= (step_size*derivative)
            print "\t derivative = ", derivative  
            print "\t Weights = ", weights
  
        # compute the square-root of the gradient sum of squares to get the gradient matnigude:
        gradient_magnitude = sqrt(gradient_sum_squares)
        # print "While: gradient_magnitude =", gradient_magnitude, "Tolerance=",tolerance
        if gradient_magnitude < tolerance:
            converged = True
            
    return(weights)

A few things to note before we run the gradient descent. Since the gradient is a sum over all the data points and involves a product of an error and a feature the gradient itself will be very large since the features are large (squarefeet) and the output is large (prices). So while you might expect "tolerance" to be small, small is only relative to the size of the features. 

For similar reasons the step size will be much smaller than you might expect but this is because the gradient has such large values.

# 7. Running the Gradient Descent as Simple Regression
Now split the sales data into training and test data. Like previous notebooks it’s important to use the same seed.

In [14]:
train_data,test_data = sales.random_split(.8,seed=0)

For those students not using SFrames please download the training and testing data csv files!!

# 8. Run the regression_gradient_descent function.
In particular we will use the gradient descent to estimate the model from Week 1 using just an intercept and slope. Use the following parameters:

*    features: ‘sqft_living’
*    output: ‘price’
*    initial weights: -47000, 1 (intercept, sqft_living respectively)
*    step_size = 7e-12
*    tolerance = 2.5e7

Although the gradient descent is designed for multiple regression since the constant is now a feature we can use the gradient descent function to estimat the parameters in the simple regression on squarefeet. The folowing cell sets up the feature_matrix, output, initial weights and step size for the first model:

In [15]:
# let's test out the gradient descent
simple_features = ['sqft_living']
simple_output = 'price'

# build a 2D matrix, 1st column constant 1s', 2nd col sqft_living, n a 1D target/response array
(simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, simple_output)
initial_weights = np.array([-47000., 1.])
step_size = 7e-12
#step_size = 0.5
tolerance = 2.5e9
#tolerance = 1.0e2

print "simple feature matrix = \n", simple_feature_matrix[0:5 , :]
print "\noutput = \n", output[0:5]

get_numpy_data(): print data_sframe = features_sframe(head 5)
+----------+-------------+
| constant | sqft_living |
+----------+-------------+
|    1     |    1180.0   |
|    1     |    2570.0   |
|    1     |    770.0    |
|    1     |    1960.0   |
|    1     |    1680.0   |
+----------+-------------+
[5 rows x 2 columns]
 

get_numpy_data(): print data_sframe converted to feature_matrix[0:5, :]
[[  1.00000000e+00   1.18000000e+03]
 [  1.00000000e+00   2.57000000e+03]
 [  1.00000000e+00   7.70000000e+02]
 [  1.00000000e+00   1.96000000e+03]
 [  1.00000000e+00   1.68000000e+03]] 

simple feature matrix = 
[[  1.00000000e+00   1.18000000e+03]
 [  1.00000000e+00   2.57000000e+03]
 [  1.00000000e+00   7.70000000e+02]
 [  1.00000000e+00   1.96000000e+03]
 [  1.00000000e+00   1.68000000e+03]]

output = 
[ 221900.  538000.  180000.  604000.  510000.]


Next run your gradient descent with the above parameters.

In [16]:
simple_weights = regression_gradient_descent(simple_feature_matrix, output, initial_weights, step_size, tolerance)

	 derivative =  -20314476454.0
	 Weights =  [ -4.69998578e+04   1.14220134e+00]
	 derivative =  -5.05515267032e+13
	 Weights =  [-46645.99711174    355.00288826]
	 derivative =  5298777356.79
	 Weights =  [-46646.03420318    354.96579682]
	 derivative =  1.31786304558e+13
	 Weights =  [-46738.28461637    262.71538363]
	 derivative =  -1378522061.16
	 Weights =  [-46738.27496672    262.72503328]
	 derivative =  -3.43563278644e+12
	 Weights =  [-46714.22553722    286.77446278]
	 derivative =  362230123.091
	 Weights =  [-46714.22807283    286.77192717]
	 derivative =  895656439713.0
	 Weights =  [-46720.4976679    280.5023321]
	 derivative =  -91578816.3181
	 Weights =  [-46720.49702685    280.50297315]
	 derivative =  -233497834960.0
	 Weights =  [-46718.86254201    282.13745799]
	 derivative =  26727830.286
	 Weights =  [-46718.8627291    282.1372709]
	 derivative =  60869340594.6
	 Weights =  [-46719.28881449    281.71118551]
	 derivative =  -4114362.13299
	 Weights =  [-46719.2887856

How do your weights compare to those achieved in week 1 (don't expect them to be exactly the same)? 

**Quiz Question: What is the value of the weight for sqft_living -- the second element of ‘simple_weights’ (rounded to 1 decimal place)?**

In [17]:
print simple_weights  #slope term was 282.6 earlier

[-46719.19910466    281.80089534]


Use your newly estimated weights and your predict_output() function to compute the predictions on all the TEST data (you will need to create a numpy array of the test feature_matrix and test output first:

In [18]:
(test_simple_feature_matrix, test_output) = get_numpy_data(test_data, simple_features, simple_output)

get_numpy_data(): print data_sframe = features_sframe(head 5)
+----------+-------------+
| constant | sqft_living |
+----------+-------------+
|    1     |    1430.0   |
|    1     |    2950.0   |
|    1     |    1710.0   |
|    1     |    2320.0   |
|    1     |    1090.0   |
+----------+-------------+
[5 rows x 2 columns]
 

get_numpy_data(): print data_sframe converted to feature_matrix[0:5, :]
[[  1.00000000e+00   1.43000000e+03]
 [  1.00000000e+00   2.95000000e+03]
 [  1.00000000e+00   1.71000000e+03]
 [  1.00000000e+00   2.32000000e+03]
 [  1.00000000e+00   1.09000000e+03]] 



Now compute your predictions using test_simple_feature_matrix and your weights from above.

In [19]:
simple_predictions = predict_output(test_simple_feature_matrix,  simple_weights) 

**Quiz Question: What is the predicted price for the 1st house in the TEST data set for model 1 (round to nearest dollar)?**
\$356256

In [20]:
print simple_predictions[0:5]

[ 356256.08122763  784593.44214027  435160.33192206  607058.87807779
  260443.77681296]


In [21]:
print test_data['price'][0:5]  #print test_data$price[0:5] SyntaxError: invalid syntax

[310000.0, 650000.0, 233000.0, 580500.0, 535000.0]


In [22]:
np.array(test_data['price'])[0:5]

array([ 310000.,  650000.,  233000.,  580500.,  535000.])

Now that you have the predictions on test data, compute the RSS on the test data set. Save this value for comparison later. Recall that RSS is the sum of the squared errors (difference between prediction and output).

In [23]:
# then compute the residuals (since we are squaring it doesn't matter which order you subtract)
# residuals = test_data['price'] - predictions  -- this works but gives v.long RSS 
# fix convert SFrame column to array or list 1st
residuals1 = np.array(test_data['price'])  - simple_predictions  
# square the residuals and add them up
RSS1 = (residuals1*residuals1).sum()
print RSS1
# This is what we got earlier using graphlabs regression function 
# 1.2012677033e+15

#So appears gradient descent has lower RSS i.e better fit!!

2.75393109803e+14


# Running a multiple regression

Now we will use more than one actual feature. Use the following code to produce the weights for a second model with the following parameters:

In [24]:
model_features = ['sqft_living', 'sqft_living15'] 
# sqft_living15 is the average squarefeet for the nearest 15 neighbors. 
my_output = 'price'
(multi_feature_matrix, multi_output) = get_numpy_data(train_data, model_features, my_output)
multi_initial_weights = np.array([-100000., 1., 1.])
step_size = 1e-12
tolerance = 1e12

get_numpy_data(): print data_sframe = features_sframe(head 5)
+----------+-------------+---------------+
| constant | sqft_living | sqft_living15 |
+----------+-------------+---------------+
|    1     |    1180.0   |     1340.0    |
|    1     |    2570.0   |     1690.0    |
|    1     |    770.0    |     2720.0    |
|    1     |    1960.0   |     1360.0    |
|    1     |    1680.0   |     1800.0    |
+----------+-------------+---------------+
[5 rows x 3 columns]
 

get_numpy_data(): print data_sframe converted to feature_matrix[0:5, :]
[[  1.00000000e+00   1.18000000e+03   1.34000000e+03]
 [  1.00000000e+00   2.57000000e+03   1.69000000e+03]
 [  1.00000000e+00   7.70000000e+02   2.72000000e+03]
 [  1.00000000e+00   1.96000000e+03   1.36000000e+03]
 [  1.00000000e+00   1.68000000e+03   1.80000000e+03]] 



Use the above parameters to estimate the model weights. Record these values for your quiz.

In [25]:
multi_weights = regression_gradient_descent(multi_feature_matrix, multi_output, multi_initial_weights, step_size, tolerance)

	 derivative =  -22088131380.0
	 Weights =  [ -9.99999779e+04   1.02208813e+00   1.02208813e+00]
	 derivative =  -5.42241456419e+13
	 Weights =  [ -9.99457538e+04   5.52462338e+01   5.52462338e+01]
	 derivative =  -4.8982259336e+13
	 Weights =  [-99896.77150689    104.22849311    104.22849311]
	 derivative =  -7491384202.99
	 Weights =  [-99896.76401551    104.23598449    104.23598449]
	 derivative =  -1.90968331561e+13
	 Weights =  [-99877.66718235    123.33281765    123.33281765]
	 derivative =  -1.65844989607e+13
	 Weights =  [-99861.08268339    139.91731661    139.91731661]
	 derivative =  -2444902190.15
	 Weights =  [-99861.08023849    139.91976151    139.91976151]
	 derivative =  -6.95239107806e+12
	 Weights =  [-99854.12784741    146.87215259    146.87215259]
	 derivative =  -5.38373532424e+12
	 Weights =  [-99848.74411209    152.25588791    152.25588791]
	 derivative =  -700199730.268
	 Weights =  [-99848.74341189    152.25658811    152.25658811]
	 derivative =  -2.75373587361e

Use your newly estimated weights and the predict_output function to compute the predictions on the TEST data. Don't forget to create a numpy array for these features from the test set first!

In [26]:
print multi_weights

(multi_feature_matrix, multi_output) = get_numpy_data(test_data, model_features, my_output)

multi_predictions = predict_output(multi_feature_matrix,  multi_weights) 


[-99842.49367741    158.50632259    158.50632259]
get_numpy_data(): print data_sframe = features_sframe(head 5)
+----------+-------------+---------------+
| constant | sqft_living | sqft_living15 |
+----------+-------------+---------------+
|    1     |    1430.0   |     1780.0    |
|    1     |    2950.0   |     2140.0    |
|    1     |    1710.0   |     1030.0    |
|    1     |    2320.0   |     2580.0    |
|    1     |    1090.0   |     1570.0    |
+----------+-------------+---------------+
[5 rows x 3 columns]
 

get_numpy_data(): print data_sframe converted to feature_matrix[0:5, :]
[[  1.00000000e+00   1.43000000e+03   1.78000000e+03]
 [  1.00000000e+00   2.95000000e+03   2.14000000e+03]
 [  1.00000000e+00   1.71000000e+03   1.03000000e+03]
 [  1.00000000e+00   2.32000000e+03   2.58000000e+03]
 [  1.00000000e+00   1.09000000e+03   1.57000000e+03]] 



**Quiz Question: What is the predicted price for the 1st house in the TEST data set for model 2 (round to nearest dollar)?**

In [27]:
print multi_predictions[0]

408962.801836


What is the actual price for the 1st house in the test data set?

In [28]:
print test_data['price'][0]  #print test_data$price[0:5] SyntaxError: invalid syntax

310000.0


**Quiz Question: Which estimate was closer to the true price for the 1st house on the Test data set, model 1 or model 2?**  Model 1 \$356256, Model 2 had \$408962.  So model 1, the simple model with only 1 predictor/feature 'sqft_living', had predictions closer to true price.

Now use your predictions and the output to compute the RSS for model 2 on TEST data.

In [29]:
# then compute the residuals (since we are squaring it doesn't matter which order you subtract)
# residuals = test_data['price'] - predictions  -- this works but gives v.long RSS 
# fix convert SFrame column to array or list 1st
residuals2 = np.array(test_data['price'])  - multi_predictions  
# square the residuals and add them up
RSS_2 = (residuals2*residuals2).sum()
print RSS_2
# This is what we got earlier using Model 1
# 2.75393109803e+14

2.77427930983e+14


**Quiz Question: Which model (1 or 2) has lowest RSS on all of the TEST data? **
Model 1 which only had feature 'sqft_living'. So appears extra feature 'sqft_living15' makes slighltly worse fit. So the more complex model had higher test RSS.