# Regression Week 2: Multiple Regression (Interpretation)

The goal of this first notebook is to explore multiple regression and feature engineering with existing Turi Create functions.

In this notebook you will use data on house sales in King County to predict prices using multiple regression. You will:
* Use SFrames to do some feature engineering
* Use built-in Turi Create functions to compute the regression weights (coefficients/parameters)
* Given the regression weights, predictors, and outcome, write a function to compute the Residual Sum of Squares
* Look at coefficients and interpret their meanings
* Evaluate multiple models via RSS

# Fire up Turi Create

In [2]:
import turicreate as tc
from turicreate import SFrame

import math as mt
import numpy as np

# Load in house sales data

The dataset contains house sales from King County, the region where the city of Seattle, WA is located.

In [3]:
sales = tc.SFrame('home_data.sframe/')

In [4]:
sales

id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront
7129300520,2014-10-13 00:00:00+00:00,221900.0,3.0,1.0,1180.0,5650.0,1.0,0
6414100192,2014-12-09 00:00:00+00:00,538000.0,3.0,2.25,2570.0,7242.0,2.0,0
5631500400,2015-02-25 00:00:00+00:00,180000.0,2.0,1.0,770.0,10000.0,1.0,0
2487200875,2014-12-09 00:00:00+00:00,604000.0,4.0,3.0,1960.0,5000.0,1.0,0
1954400510,2015-02-18 00:00:00+00:00,510000.0,3.0,2.0,1680.0,8080.0,1.0,0
7237550310,2014-05-12 00:00:00+00:00,1225000.0,4.0,4.5,5420.0,101930.0,1.0,0
1321400060,2014-06-27 00:00:00+00:00,257500.0,3.0,2.25,1715.0,6819.0,2.0,0
2008000270,2015-01-15 00:00:00+00:00,291850.0,3.0,1.5,1060.0,9711.0,1.0,0
2414600126,2015-04-15 00:00:00+00:00,229500.0,3.0,1.0,1780.0,7470.0,1.0,0
3793500160,2015-03-12 00:00:00+00:00,323000.0,3.0,2.5,1890.0,6560.0,2.0,0

view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat
0,3,7.0,1180.0,0.0,1955.0,0.0,98178,47.51123398
0,3,7.0,2170.0,400.0,1951.0,1991.0,98125,47.72102274
0,3,6.0,770.0,0.0,1933.0,0.0,98028,47.73792661
0,5,7.0,1050.0,910.0,1965.0,0.0,98136,47.52082
0,3,8.0,1680.0,0.0,1987.0,0.0,98074,47.61681228
0,3,11.0,3890.0,1530.0,2001.0,0.0,98053,47.65611835
0,3,7.0,1715.0,0.0,1995.0,0.0,98003,47.30972002
0,3,7.0,1060.0,0.0,1963.0,0.0,98198,47.40949984
0,3,7.0,1050.0,730.0,1960.0,0.0,98146,47.51229381
0,3,7.0,1890.0,0.0,2003.0,0.0,98038,47.36840673

long,sqft_living15,sqft_lot15
-122.25677536,1340.0,5650.0
-122.3188624,1690.0,7639.0
-122.23319601,2720.0,8062.0
-122.39318505,1360.0,5000.0
-122.04490059,1800.0,7503.0
-122.00528655,4760.0,101930.0
-122.32704857,2238.0,6819.0
-122.31457273,1650.0,9711.0
-122.33659507,1780.0,8113.0
-122.0308176,2390.0,7570.0


# Split data into training and testing.
We use seed=0 so that everyone running this notebook gets the same results.  In practice, you may set a random seed (or let Turi Create pick a random seed for you).  

In [5]:
train_data,test_data = sales.random_split(.8,seed=0)
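As a rough illustration of why a fixed seed makes the split reproducible, here is a minimal plain-Python sketch. The helper `seeded_split` is hypothetical and only mimics the idea; it is not how Turi Create implements `random_split`, and it will not select the same rows.

```python
import random

def seeded_split(rows, fraction, seed):
    """Hypothetical helper: send each row to the first part with probability `fraction`."""
    rng = random.Random(seed)  # a private generator, fully determined by the seed
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

rows = list(range(1000))
train_a, test_a = seeded_split(rows, 0.8, seed=0)
train_b, test_b = seeded_split(rows, 0.8, seed=0)

# Same seed, same split; every row lands in exactly one part.
print(train_a == train_b, len(train_a) + len(test_a) == len(rows))
```

Calling the helper twice with the same seed gives identical parts, which is the property the notebook relies on when it fixes seed=0.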

# Learning a multiple regression model

Recall that we can learn a multiple regression model predicting 'price' from the features example_features = ['sqft_living', 'bedrooms', 'bathrooms'] on the training data with the following code:

(Aside: We set validation_set = None to ensure that the results are always the same)

In [6]:
example_features = ['sqft_living', 'bedrooms', 'bathrooms']
example_model = tc.linear_regression.create(train_data, target = 'price', features = example_features, 
                                                    validation_set = None)

Now that we have fitted the model we can extract the regression weights (coefficients) as an SFrame as follows:

In [7]:
example_weight_summary = example_model.coefficients
print (example_weight_summary)

+-------------+-------+---------------------+--------------------+
|     name    | index |        value        |       stderr       |
+-------------+-------+---------------------+--------------------+
| (intercept) |  None |   87910.0724923957  | 7873.338143401634  |
| sqft_living |  None |  315.40344055210005 | 3.4557003258547296 |
|   bedrooms  |  None | -65080.215552827525 | 2717.4568544207045 |
|  bathrooms  |  None |  6944.020192638836  | 3923.114931441481  |
+-------------+-------+---------------------+--------------------+
[4 rows x 4 columns]



# Making Predictions

In the gradient descent notebook we use numpy to do our regression. In this notebook we will use existing Turi Create functions to analyze multiple regressions.

Recall that once a model is built we can use the .predict() function to find the predicted values for data we pass. For example, using the example model above:

In [8]:
example_predictions = example_model.predict(train_data)
print (example_predictions[0]) # should be 271789.505878

271789.5058780301
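To connect the coefficients to the prediction, the value above can be rebuilt by hand as the intercept plus each weight times its feature. This is a small sketch with the coefficients copied from the printed summary; it assumes the first training row is the first row of the data shown earlier, and the variable names are illustrative only.

```python
# Coefficients copied from the printed summary above.
intercept = 87910.0724923957
weights = {
    'sqft_living': 315.40344055210005,
    'bedrooms': -65080.215552827525,
    'bathrooms': 6944.020192638836,
}

# First row of the data shown earlier (assumed here to be the first training row).
first_house = {'sqft_living': 1180.0, 'bedrooms': 3.0, 'bathrooms': 1.0}

prediction = intercept + sum(weights[name] * first_house[name] for name in weights)
print(prediction)  # agrees with example_predictions[0] up to floating-point rounding
```

Read this way, each weight is the change in predicted price for a one-unit change in its feature while the other features are held fixed.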


# Compute RSS

Now that we can make predictions given the model, let's write a function to compute the RSS of the model. Complete the function below to calculate RSS given the model, data, and the outcome.

In [9]:
def get_residual_sum_of_squares(model, data, outcome):
    # First get the predictions
    predictions = model.predict(data)

    # Then compute the residuals/errors
    residuals = outcome - predictions

    # Then square and add them up
    residuals_squared = residuals*residuals
    RSS = residuals_squared.sum()

    return(RSS)    

Test your function by computing the RSS on TEST data for the example model:

In [10]:
rss_example_test = get_residual_sum_of_squares(example_model, test_data, test_data['price'])
print (rss_example_test) # should be 2.7376153833e+14

273761538330193.0
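For intuition, the same computation can be sketched in plain Python on a handful of made-up numbers. The function name and the data below are illustrative assumptions, not part of the notebook.

```python
def residual_sum_of_squares(predictions, outcomes):
    """Sum of squared differences between observed outcomes and predictions."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(outcomes, predictions))

# Made-up numbers, chosen only so the arithmetic is easy to follow.
predictions = [2.0, 4.0, 6.0]
outcomes = [1.0, 4.0, 8.0]

# Residuals are -1, 0 and 2, so the RSS is 1 + 0 + 4.
rss = residual_sum_of_squares(predictions, outcomes)
print(rss)  # 5.0
```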


# Create some new features

Although we often think of multiple regression as including multiple different features (e.g. # of bedrooms, square feet, and # of bathrooms), we can also consider transformations of existing features, e.g. the log of the square feet, or even "interaction" features such as the product of bedrooms and bathrooms.

You will use the logarithm function to create a new feature, so first you should import it from the math library.

In [11]:
from math import log

Next create the following 4 new features as columns in both TEST and TRAIN data:
* bedrooms_squared = bedrooms\*bedrooms
* bed_bath_rooms = bedrooms\*bathrooms
* log_sqft_living = log(sqft_living)
* lat_plus_long = lat + long

As an example, here's the first one:

In [12]:
train_data['bedrooms_squared'] = train_data['bedrooms'].apply(lambda x: x**2)
test_data['bedrooms_squared'] = test_data['bedrooms'].apply(lambda x: x**2)

In [13]:
# create the remaining 3 features in both TEST and TRAIN data
train_data['bed_bath_rooms'] = train_data['bedrooms']* train_data['bathrooms']
test_data['bed_bath_rooms'] = test_data['bedrooms'] * test_data['bathrooms']

train_data['log_sqft_living'] = train_data['sqft_living'].apply(lambda x: mt.log(x))
test_data['log_sqft_living'] = test_data['sqft_living'].apply(lambda x: mt.log(x))

train_data['lat_plus_long'] = train_data['lat'] + train_data['long']
test_data['lat_plus_long'] = test_data['lat'] + test_data['long']


* Squaring bedrooms will increase the separation between few bedrooms (e.g. 1) and many bedrooms (e.g. 4), since 1^2 = 1 but 4^2 = 16. Consequently, this feature will mostly affect houses with many bedrooms.
* Bedrooms times bathrooms gives what's called an "interaction" feature. It is large when *both* of them are large.
* Taking the log of square feet has the effect of bringing large values closer together and spreading out small values.
* Adding latitude to longitude is totally nonsensical, but we will do it anyway (you'll see why).
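The four transformations can be tried on a single row to see what they produce. The sketch below uses the first row of the data shown earlier; the dictionary layout is an assumption made for illustration, not how SFrames store columns.

```python
from math import log

# First row of the data shown earlier.
house = {'bedrooms': 3.0, 'bathrooms': 1.0, 'sqft_living': 1180.0,
         'lat': 47.51123398, 'long': -122.25677536}

new_features = {
    'bedrooms_squared': house['bedrooms'] ** 2,
    'bed_bath_rooms': house['bedrooms'] * house['bathrooms'],
    'log_sqft_living': log(house['sqft_living']),
    'lat_plus_long': house['lat'] + house['long'],
}
print(new_features)

# The log compresses large values: going from 1,000 to 2,000 square feet
# changes the log by the same amount as going from 4,000 to 8,000.
small_gap = log(2000.0) - log(1000.0)
large_gap = log(8000.0) - log(4000.0)
print(small_gap, large_gap)
```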

**Quiz Question: What is the mean (arithmetic average) value of your 4 new features on TEST data? (round to 2 digits)**

In [14]:
print (('bedrooms_squared: ') + str(test_data['bedrooms_squared'].mean()))
print (('bed_bath_rooms: ') + str(test_data['bed_bath_rooms'].mean()))
print (('log_sqft_living: ') + str(test_data['log_sqft_living'].mean()))
print (('lat_plus_long: ') + str(test_data['lat_plus_long'].mean()))

bedrooms_squared: 12.446677701584301
bed_bath_rooms: 7.503901631591394
log_sqft_living: 7.550274679645938
lat_plus_long: -74.65333497217307


# Learning Multiple Models

Now we will learn the weights for three (nested) models for predicting house prices. The first model will have the fewest features, the second model will add one more feature, and the third will add a few more:
* Model 1: squarefeet, # bedrooms, # bathrooms, latitude & longitude
* Model 2: add bedrooms\*bathrooms
* Model 3: Add log squarefeet, bedrooms squared, and the (nonsensical) latitude + longitude

In [15]:
model_1_features = ['sqft_living', 'bedrooms', 'bathrooms', 'lat', 'long']
model_2_features = model_1_features + ['bed_bath_rooms']
model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']

Now that you have the features, learn the weights for the three different models for predicting target = 'price' using turicreate.linear_regression.create() and look at the value of the weights/coefficients:

In [16]:
# Learn the three models: (don't forget to set validation_set = None)
model_1 = tc.linear_regression.create(train_data, target= 'price', features= model_1_features, validation_set= None)
model_2 = tc.linear_regression.create(train_data, target= 'price', features= model_2_features, validation_set= None)
model_3 = tc.linear_regression.create(train_data, target= 'price', features= model_3_features, validation_set= None)

In [17]:
# Examine/extract each model's coefficients:
model_1_coefficients = model_1.coefficients
model_2_coefficients = model_2.coefficients
model_3_coefficients = model_3.coefficients

print (model_1_coefficients)
print (model_2_coefficients)
print (model_3_coefficients)

+-------------+-------+---------------------+--------------------+
|     name    | index |        value        |       stderr       |
+-------------+-------+---------------------+--------------------+
| (intercept) |  None |  -56140675.74114427 | 1649985.420135553  |
| sqft_living |  None |  310.26332577692136 | 3.1888296040737765 |
|   bedrooms  |  None |  -59577.11606759667 | 2487.2797732245012 |
|  bathrooms  |  None |  13811.840541653264 | 3593.5421329670735 |
|     lat     |  None |  629865.7894714845  | 13120.710032363884 |
|     long    |  None | -214790.28516471002 | 13284.285159576597 |
+-------------+-------+---------------------+--------------------+
[6 rows x 4 columns]

+----------------+-------+---------------------+--------------------+
|      name      | index |        value        |       stderr       |
+----------------+-------+---------------------+--------------------+
|  (intercept)   |  None |  -54410676.1071702  | 1650405.1652726454 |
|  sqft_living   |  None |  

**Quiz Question: What is the sign (positive or negative) for the coefficient/weight for 'bathrooms' in model 1?**

**Quiz Question: What is the sign (positive or negative) for the coefficient/weight for 'bathrooms' in model 2?**

Think about what this means.

### Positive in model 1
### Negative in model 2
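One way to think about a sign change like this: once `bed_bath_rooms` is in the model, the `bathrooms` weight alone no longer describes the effect of adding a bathroom, because the interaction term changes as well. The sketch below uses made-up weights (not the fitted values from model 2) to show how the combined effect can be positive while the `bathrooms` weight itself is negative.

```python
def bathroom_effect(w_bathrooms, w_bed_bath, bedrooms):
    """Change in predicted price for one extra bathroom, holding bedrooms fixed.

    With price = ... + w_bathrooms * bathrooms + w_bed_bath * bedrooms * bathrooms,
    one more bathroom changes the prediction by w_bathrooms + w_bed_bath * bedrooms.
    """
    return w_bathrooms + w_bed_bath * bedrooms

# Hypothetical weights, for illustration only.
w_bathrooms = -70000.0
w_bed_bath = 25000.0

for bedrooms in (1, 3, 5):
    print(bedrooms, bathroom_effect(w_bathrooms, w_bed_bath, bedrooms))
```

With these invented numbers the effect of an extra bathroom is negative for a one-bedroom house and positive from three bedrooms up, so the sign of a single weight has to be read together with the features it interacts with.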

### Now using your three estimated models compute the RSS (Residual Sum of Squares) on the Training data.

# Comparing multiple models

Now that you've learned three models and extracted the model weights, we want to evaluate which model is best.

First use your functions from earlier to compute the RSS on TRAINING Data for each of the three models.

In [18]:
# Compute the RSS on TRAINING data for each of the three models and record the values:
rss_model_1_train = get_residual_sum_of_squares(model_1,train_data,train_data['price'])
rss_model_2_train = get_residual_sum_of_squares(model_2,train_data,train_data['price'])
rss_model_3_train = get_residual_sum_of_squares(model_3,train_data,train_data['price'])

print (('RSS model 1: ') + str(rss_model_1_train))
print (('RSS model 2: ') + str(rss_model_2_train))
print (('RSS model 3: ') + str(rss_model_3_train))

RSS model 1: 971328233545434.4
RSS model 2: 961592067859822.1
RSS model 3: 905276314551640.9


**Quiz Question: Which model (1, 2 or 3) has lowest RSS on TRAINING Data?** Is this what you expected?

Now compute the RSS on TEST data for each of the three models.

In [19]:
# Compute the RSS on TESTING data for each of the three models and record the values:
rss_model_1_test = get_residual_sum_of_squares(model_1,test_data,test_data['price'])
rss_model_2_test = get_residual_sum_of_squares(model_2,test_data,test_data['price'])
rss_model_3_test = get_residual_sum_of_squares(model_3,test_data,test_data['price'])

print (('RSS model 1 test: ') + str(rss_model_1_test))
print (('RSS model 2 test: ') + str(rss_model_2_test))
print (('RSS model 3 test: ') + str(rss_model_3_test))

RSS model 1 test: 226568089093160.56
RSS model 2 test: 224368799994313.0
RSS model 3 test: 251829318963157.28


**Quiz Question: Which model (1, 2 or 3) has lowest RSS on TESTING Data?** Is this what you expected? Think about the features that were added to each model from the previous.

### Model 2
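The recorded values can be compared directly. In the sketch below the dictionary names are assumptions and the numbers are copied from the outputs above. Model 3 has the lowest training RSS but the highest test RSS, a pattern that is consistent with the extra features fitting the training data without generalizing.

```python
# RSS values copied from the outputs above.
rss_train = {
    'model_1': 971328233545434.4,
    'model_2': 961592067859822.1,
    'model_3': 905276314551640.9,
}
rss_test = {
    'model_1': 226568089093160.56,
    'model_2': 224368799994313.0,
    'model_3': 251829318963157.28,
}

best_on_train = min(rss_train, key=rss_train.get)
best_on_test = min(rss_test, key=rss_test.get)
print(best_on_train, best_on_test)  # model_3 model_2
```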

## This function should return a features_matrix (2D array) consisting of first a column of ones followed by columns containing the values of the input features in the data set in the same order as the input list

In [22]:
def get_numpy_data(data_sframe, features, output):
    data_sframe['constant'] = 1 # add a constant column to an SFrame
    
    # prepend variable 'constant' to the features list
    features = ['constant'] + features
    
    # select the columns of data_SFrame given by the ‘features’ list into the SFrame ‘features_sframe’
    features_sframe = data_sframe[features]

    # this will convert the features_sframe into a numpy matrix:
    features_matrix = features_sframe.to_numpy()
   
    # assign the column of data_sframe associated with the target to the variable ‘output_sarray’
    output_sarray = data_sframe[output]

    # this will convert the SArray into a numpy array:
    output_array = output_sarray.to_numpy()
    return(features_matrix, output_array)

In [30]:
(featuree_matrixx, out_array) = get_numpy_data(sales, ['sqft_living', 'bedrooms'],'price')

In [33]:
featuree_matrixx

array([[1.00e+00, 1.18e+03, 3.00e+00],
       [1.00e+00, 2.57e+03, 3.00e+00],
       [1.00e+00, 7.70e+02, 2.00e+00],
       ...,
       [1.00e+00, 1.02e+03, 2.00e+00],
       [1.00e+00, 1.60e+03, 3.00e+00],
       [1.00e+00, 1.02e+03, 2.00e+00]])

In [29]:
out_array

array([221900., 538000., 180000., ..., 402101., 400000., 325000.])
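The layout of the feature matrix can be seen without SFrames or numpy. The following is a plain-Python sketch on the first three rows of the data; `build_feature_matrix` is a hypothetical stand-in for `get_numpy_data` and, unlike it, does not add a 'constant' column to its input.

```python
def build_feature_matrix(rows, features, output):
    """Plain-Python sketch: one list per row, starting with the constant 1.0."""
    feature_matrix = [[1.0] + [row[name] for name in features] for row in rows]
    output_values = [row[output] for row in rows]
    return feature_matrix, output_values

# First three rows of the data shown earlier.
rows = [
    {'sqft_living': 1180.0, 'bedrooms': 3.0, 'price': 221900.0},
    {'sqft_living': 2570.0, 'bedrooms': 3.0, 'price': 538000.0},
    {'sqft_living': 770.0, 'bedrooms': 2.0, 'price': 180000.0},
]

matrix, prices = build_feature_matrix(rows, ['sqft_living', 'bedrooms'], 'price')
print(matrix)
print(prices)
```

The leading column of ones is what lets the first weight act as the intercept.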

## Write a function ‘predict_outcome’ which accepts a 2D array ‘feature_matrix’ and a 1D array ‘weights’ and returns a 1D array ‘predictions’.

In [34]:
def predict_outcome(feature_matrix, weights):
    predictions = np.dot(feature_matrix, weights)
    return(predictions)

In [36]:
my_weights = np.array([1., 1., 1.]) # the example weights
my_weights

array([1., 1., 1.])

In [37]:
featuree_matrixx

array([[1.00e+00, 1.18e+03, 3.00e+00],
       [1.00e+00, 2.57e+03, 3.00e+00],
       [1.00e+00, 7.70e+02, 2.00e+00],
       ...,
       [1.00e+00, 1.02e+03, 2.00e+00],
       [1.00e+00, 1.60e+03, 3.00e+00],
       [1.00e+00, 1.02e+03, 2.00e+00]])

In [39]:
predictions = predict_outcome(featuree_matrixx, my_weights)  # matrix-vector product: feature matrix times weights

In [40]:
predictions

array([1184., 2574.,  773., ..., 1023., 1604., 1023.])
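Each prediction is the dot product of one row of the feature matrix with the weight vector. As a small check, the first three values above can be reproduced in plain Python; `predict_row` is an illustrative helper, not part of the notebook.

```python
def predict_row(feature_row, weights):
    """Dot product of one row of the feature matrix with the weight vector."""
    return sum(x * w for x, w in zip(feature_row, weights))

weights = [1.0, 1.0, 1.0]
feature_rows = [
    [1.0, 1180.0, 3.0],
    [1.0, 2570.0, 3.0],
    [1.0, 770.0, 2.0],
]

predictions = [predict_row(row, weights) for row in feature_rows]
print(predictions)  # [1184.0, 2574.0, 773.0]
```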

## Computing the Derivative

If we have the values of a single input feature in an array 'feature' and the prediction 'errors' (predictions - output), then the derivative of the regression cost function with respect to the weight of 'feature' is just twice the dot product between 'feature' and 'errors'. Write a function that accepts a 'feature' array and an 'errors' array and returns the 'derivative' (a single number).

To see where this comes from, note that the cost is a sum over data points, so its derivative is the sum of the per-point derivatives. For a single data point, the squared difference between the predicted and observed output is:

(w[0]*[CONSTANT] + w[1]*[feature_1] + ... + w[i] *[feature_i] + ... + w[k]*[feature_k] - output)^2

Where we have k features and a constant. So the derivative with respect to weight w[i] by the chain rule is:

2*(w[0]*[CONSTANT] + w[1]*[feature_1] + ... + w[i] *[feature_i] + ... + w[k]*[feature_k] - output)* [feature_i]

The term inside the parentheses is just the error (difference between prediction and output). So we can re-write this as:

2*error*[feature_i]

In [41]:
# Derivative of the cost with respect to the weight of a single feature

def feature_derivative(errors, feature):
    derivative = np.dot(errors,feature) * 2
    return(derivative)

In [43]:
(example_features, example_output) = get_numpy_data(sales, ['sqft_living'], 'price')

my_weights = np.array([0., 0.]) # this makes all the predictions 0

test_predictions = predict_outcome(example_features, my_weights) 

# just like SFrames 2 numpy arrays can be elementwise subtracted with '-': 
errors = test_predictions - example_output # prediction errors in this case is just the -example_output

feature = example_features[:,0] # let's compute the derivative with respect to 'constant', the ":" indicates "all rows"

derivative = feature_derivative(errors, feature)

print (derivative)
print (-np.sum(example_output)*2) # should be the same as derivative

-23345850022.0
-23345850022.0
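A common sanity check for a derivative formula is to compare it with a finite-difference estimate. The sketch below does this for a one-feature model on made-up data; the function names, data, and the step `h` are all assumptions chosen for illustration.

```python
def cost(weights, xs, ys):
    """RSS of the line weights[0] + weights[1] * x over all data points."""
    return sum((weights[0] + weights[1] * x - y) ** 2 for x, y in zip(xs, ys))

def slope_derivative(weights, xs, ys):
    """Analytic derivative of the cost with respect to weights[1]: 2 * dot(errors, xs)."""
    errors = [weights[0] + weights[1] * x - y for x, y in zip(xs, ys)]
    return 2 * sum(e * x for e, x in zip(errors, xs))

# Made-up data and weights, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
weights = [0.5, 1.0]

analytic = slope_derivative(weights, xs, ys)

# Central finite difference: nudge the slope weight up and down by h.
h = 1e-6
numeric = (cost([weights[0], weights[1] + h], xs, ys)
           - cost([weights[0], weights[1] - h], xs, ys)) / (2 * h)

print(analytic, numeric)
```

The two numbers should agree to several decimal places, which supports the "twice the dot product" rule used in `feature_derivative`.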


## Gradient Descent

Now we will write a function that performs a gradient descent. The basic premise is simple. Given a starting point we update the current weights by moving in the negative gradient direction. Recall that the gradient is the direction of increase and therefore the negative gradient is the direction of decrease and we're trying to minimize a cost function.

The amount by which we move in the negative gradient direction is called the 'step size'. We stop when we are 'sufficiently close' to the optimum. We define this by requiring the magnitude (length) of the gradient vector to be smaller than a fixed 'tolerance'.

With this in mind, complete the gradient descent function below using your derivative function above. For each step in the gradient descent, we update the weight for each feature before computing our stopping criterion.

In [48]:
from math import sqrt

In [49]:
def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False
    weights = np.array(initial_weights)
    while not converged:
        
        # compute the predictions based on feature_matrix and weights:
        predictions = predict_outcome(feature_matrix,weights)
        
        # compute the errors as predictions - output:
        errors = predictions - output
        
        gradient_sum_squares = 0 # initialize the gradient sum of squares
        
        # while not converged, update each weight individually:
        for i in range(len(weights)):
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            feature = feature_matrix[:,i]
            
            # compute the derivative for weight[i]:
            derivative = feature_derivative(errors, feature)
            
            # add the squared derivative to the gradient sum of squares
            gradient_sum_squares += derivative**2
            
            # update the weight based on step size and derivative:
            weights[i] -= step_size * derivative
            
        gradient_magnitude = sqrt(gradient_sum_squares)
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)
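To watch the procedure converge on something small, here is a plain-Python sketch of the same idea on made-up data. It is not the notebook's function: `gradient_descent` is a hypothetical name, the `max_iterations` cap is an added safety measure, and the convergence check happens before each update rather than after it.

```python
from math import sqrt

def gradient_descent(feature_matrix, output, initial_weights,
                     step_size, tolerance, max_iterations=100000):
    """Plain-Python gradient descent on the RSS; max_iterations is an added safety cap."""
    weights = list(initial_weights)
    for _ in range(max_iterations):
        # errors = predictions - output, using the current weights
        errors = [sum(x * w for x, w in zip(row, weights)) - y
                  for row, y in zip(feature_matrix, output)]
        # derivative for weight i is 2 * dot(errors, column i)
        gradient = [2 * sum(e * row[i] for e, row in zip(errors, feature_matrix))
                    for i in range(len(weights))]
        if sqrt(sum(g * g for g in gradient)) < tolerance:
            break
        weights = [w - step_size * g for w, g in zip(weights, gradient)]
    return weights

# Made-up data lying exactly on the line y = 1 + 2x.
feature_matrix = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
output = [1.0, 3.0, 5.0, 7.0]

weights = gradient_descent(feature_matrix, output, [0.0, 0.0],
                           step_size=0.02, tolerance=1e-6)
print(weights)  # close to [1.0, 2.0]
```

The step size of 0.02 works here because the features are small; a larger step on the same data can make the weights diverge instead of settling.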

Now split the sales data into training and test data.

In [50]:
train_data,test_data = sales.random_split(.8,seed=0)

Now we will run the regression_gradient_descent function on some actual data. In particular we will use the gradient descent to estimate the model from Week 1 using just an intercept and slope. Use the following parameters:

features: ‘sqft_living’

output: ‘price’

initial weights: -47000, 1 (intercept, sqft_living respectively)

step_size = 7e-12

tolerance = 2.5e7

In [51]:
simple_features = ['sqft_living']
my_output= 'price'
(simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, my_output)
initial_weights = np.array([-47000., 1.])
step_size = 7e-12
tolerance = 2.5e7
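The step size looks tiny because the features are unscaled and the derivatives are correspondingly huge. As a rough worked example, the snippet below reuses the derivative with respect to the constant that was printed earlier (computed on the full sales data with all weights at zero) and applies one update to that single weight. The starting weight of 0 and the alternative step size of 1e-3 are assumptions for illustration.

```python
# Derivative with respect to the constant printed earlier (all weights zero, full sales data).
derivative = -23345850022.0
step_size = 7e-12

# One gradient descent update for that single weight, starting from 0.
old_weight = 0.0
new_weight = old_weight - step_size * derivative
print(new_weight)  # roughly 0.163

# The same derivative with a step size of 1e-3 would move the weight by millions.
print(old_weight - 1e-3 * derivative)
```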

Estimate the slope and intercept of the model 'price = b0 + b1 * sqft_living + e' with the gradient descent function.

In [52]:
simple_weights = regression_gradient_descent(simple_feature_matrix,
                                             output,
                                             initial_weights,
                                             step_size,
                                             tolerance)

In [53]:
print (simple_weights)

[-46999.88716555    281.91211912]


### What is the value of the weight for sqft_living -- the second element of ‘simple_weights’ (rounded to 1 decimal place)?

In [54]:
281.9

281.9

Now build a corresponding ‘test_simple_feature_matrix’ and ‘test_output’ using test_data. Using ‘test_simple_feature_matrix’ and ‘simple_weights’ compute the predicted house prices on all the test data.

In [55]:
#(test_simple_feature_matrix, test_output) = get_numpy_data(test_data, ['sqft_living'], 'price')
(test_simple_feature_matrix, test_output) = get_numpy_data(test_data, simple_features, my_output)

In [56]:
test_simple_feature_matrix

array([[1.00e+00, 1.43e+03],
       [1.00e+00, 2.95e+03],
       [1.00e+00, 1.71e+03],
       ...,
       [1.00e+00, 2.52e+03],
       [1.00e+00, 2.31e+03],
       [1.00e+00, 1.02e+03]])

In [57]:
test_output

array([310000., 650000., 233000., ..., 610685., 400000., 402101.])

In [58]:
test_predictions = predict_outcome(test_simple_feature_matrix, simple_weights)
print (test_predictions)

[356134.44317093 784640.86422788 435069.83652353 ... 663418.65300782
 604217.10799338 240550.4743332 ]


### What is the predicted price for the 1st house in the Test data set for model 1 (round to nearest dollar)?

In [59]:
test_predictions[0]

356134.4431709297

Now compute the RSS on all test data for this model. Record the value and store it for later.

In [60]:
test_residuals = test_output - test_predictions
test_RSS = (test_residuals * test_residuals).sum()
print (test_RSS)

275400047593155.94


### Running a multiple regression

Now we will use the gradient descent to fit a model with more than 1 predictor variable (and an intercept). Use the following parameters:

model features = ‘sqft_living’, ‘sqft_living15’

output = ‘price’

initial weights = [-100000, 1, 1] (intercept, sqft_living, and sqft_living15 respectively)

step size = 4e-12

tolerance = 1e9

In [62]:
multi_features = ['sqft_living','sqft_living15']
my_output= 'price'
(multi_feature_matrix, output) = get_numpy_data(train_data, multi_features, my_output)
initial_weights = np.array( [-100000, 1., 1.] )
step_size = 4e-12
tolerance = 1e9

In [63]:
multi_feature_matrix

array([[1.00e+00, 1.18e+03, 1.34e+03],
       [1.00e+00, 2.57e+03, 1.69e+03],
       [1.00e+00, 7.70e+02, 2.72e+03],
       ...,
       [1.00e+00, 1.53e+03, 1.53e+03],
       [1.00e+00, 1.60e+03, 1.41e+03],
       [1.00e+00, 1.02e+03, 1.02e+03]])

In [65]:
output

array([221900., 538000., 180000., ..., 360000., 400000., 325000.])

Estimate the weights of the model 'price = b0 + b1 * sqft_living + b2 * sqft_living15 + e' with the gradient descent function.

In [66]:
multi_weights = regression_gradient_descent(multi_feature_matrix,
                                             output,
                                             initial_weights,
                                             step_size,
                                             tolerance)

In [67]:
multi_weights # b0, b1, b2

array([-9.99999688e+04,  2.45072603e+02,  6.52795277e+01])

### Use the regression weights from this second model (using sqft_living and sqft_living15) and predict the outcome of all the house prices on the TEST data.

In [68]:
#(test_multi_feature_matrix, test_output) = get_numpy_data(test_data, ['sqft_living','sqft_living15'], 'price')
(test_multi_feature_matrix, test_output) = get_numpy_data(test_data, multi_features, my_output)

In [69]:
test_multi_feature_matrix

array([[1.00e+00, 1.43e+03, 1.78e+03],
       [1.00e+00, 2.95e+03, 2.14e+03],
       [1.00e+00, 1.71e+03, 1.03e+03],
       ...,
       [1.00e+00, 2.52e+03, 2.52e+03],
       [1.00e+00, 2.31e+03, 1.83e+03],
       [1.00e+00, 1.02e+03, 1.02e+03]])

In [70]:
test_output

array([310000., 650000., 233000., ..., 610685., 400000., 402101.])

In [71]:
test_predictions = predict_outcome(test_multi_feature_matrix, multi_weights)
print (test_predictions)

[366651.41203656 762662.39786164 386312.09499712 ... 682087.39928241
 585579.27865729 216559.20396617]


In [72]:
# What is the predicted price for the 1st house in the TEST data set for model 2 (round to nearest dollar)?

test_predictions[0]

366651.4120365591

In [79]:
# What is the actual price for the 1st house in the Test data set?
test_data['price'][0]

310000.0

In [80]:
train_data['price'][0]

221900.0

Which estimate was closer to the true price for the 1st house on the TEST data set, model 1 or model 2?

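### Model 1

Model 1 predicted about 356,134 and model 2 about 366,651 against a true price of 310,000, so model 1 misses by about 46,134 and model 2 by about 56,651.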

In [83]:
# Now compute RSS on all test data for the second model.
# Note: this prints the square root of the RSS, not the RSS itself.

test_residuals = test_output - test_predictions
test_RSS = (test_residuals * test_residuals).sum()
print (sqrt(test_RSS))

16439691.191298092


Which model (1 or 2) has lowest RSS on all of the TEST data?  

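### Model 2

The value printed above is the square root of the RSS. Squaring it gives an RSS of about 2.70e14, which is below model 1's RSS of about 2.754e14.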

In [82]:
error = test_output - test_predictions
SS = np.dot(error,error)  # residual sum of squares
root_RSS = sqrt(SS)       # square root of the RSS
print (root_RSS)

16439691.191298092
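Because the value printed for the second model is a square root, it cannot be compared directly with the RSS recorded for the first model. A minimal check, with both numbers copied from the outputs above and variable names chosen for illustration:

```python
# Values copied from the outputs above.
rss_model_1 = 275400047593155.94       # RSS printed for the sqft_living-only model
root_rss_model_2 = 16439691.191298092  # square root of the RSS printed for the second model

# Undo the square root so both numbers are on the same scale.
rss_model_2 = root_rss_model_2 ** 2

print(rss_model_2)
print(rss_model_2 < rss_model_1)  # True
```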
