## 1. Load the data

```python
import graphlab as gl

sales = gl.SFrame('../data/kc_house_data.gl/')
```

## 2. Split into training data and testing data

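```python
# Roughly 80% of the rows go to training, the rest to testing; the fixed seed makes the split reproducible
train_data, test_data = sales.random_split(.8, seed=0)
len(train_data)  # number of rows in the training set
```

17384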
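GraphLab's `random_split` does the split internally. As a rough illustration of what a seeded fractional split does, the sketch below assigns each row independently to the first part with probability `fraction`. The helper `bernoulli_split` is hypothetical, and per-row assignment is an assumption about the general technique rather than a description of GraphLab's internals, so it will not reproduce the 17384-row training set above.

```python
import random

def bernoulli_split(rows, fraction, seed):
    """Hypothetical helper: put each row in the first split with probability `fraction`.

    Illustrates a seeded random split; it is NOT GraphLab's implementation.
    """
    rng = random.Random(seed)  # private generator, so the same seed gives the same split
    first, second = [], []
    for row in rows:
        # Each row is assigned independently of all others
        (first if rng.random() < fraction else second).append(row)
    return first, second

rows = list(range(1000))
train, test = bernoulli_split(rows, 0.8, seed=0)
print(len(train), len(test))  # close to 800 / 200; exact sizes depend on the seed
```

Because each row is assigned independently, the training fraction is only approximately 0.8, which is why the row count is worth printing after the split.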

## 3. A generic function that returns Simple Linear Regression parameters (slope and intercept)


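$$\text{slope} = \frac{\sum_i x_i y_i - \frac{1}{N}\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{\sum_i x_i^2 - \frac{1}{N}\left(\sum_i x_i\right)^2}$$

$$\text{intercept} = \bar{y} - \text{slope} \cdot \bar{x}$$

where $N$ is the number of observations and $\bar{x}$, $\bar{y}$ are the means of the input and output.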
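As a quick hand check of these formulas, take three made-up points $(1, 2)$, $(2, 4)$ and $(3, 7)$, so $N = 3$:

$$
\begin{aligned}
\textstyle\sum x_i y_i &= 2 + 8 + 21 = 31, \quad \textstyle\sum x_i = 6, \quad \textstyle\sum y_i = 13, \quad \textstyle\sum x_i^2 = 14\\
\text{slope} &= \frac{31 - \tfrac{1}{3}(6)(13)}{14 - \tfrac{1}{3}(6)(6)} = \frac{31 - 26}{14 - 12} = \frac{5}{2} = 2.5\\
\text{intercept} &= \underbrace{\tfrac{13}{3}}_{\text{mean of } y} - 2.5 \cdot \underbrace{2}_{\text{mean of } x} = -\tfrac{2}{3} \approx -0.667
\end{aligned}
$$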

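```python
def simple_linear_regression(input_feature, output):
    """Return (intercept, slope) of the least-squares line predicting output from input_feature."""
    x = input_feature
    y = output
    n = len(input_feature)

    # Sums used by the closed-form formulas above
    xandysum = (x * y).sum()      # sum of X*Y
    xsum = x.sum()                # sum of X
    ysum = y.sum()                # sum of Y
    xsquaresum = (x * x).sum()    # sum of X^2
    xandxsum = xsum * xsum        # (sum of X) * (sum of X)

    numerator = xandysum - (ysum * xsum) / n
    denominator = xsquaresum - xandxsum / n
    slope = numerator / denominator

    # intercept = (mean of Y) - slope * (mean of X)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope
```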
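The function above relies on GraphLab SArray operations. To try the same arithmetic without GraphLab, a pure-Python port over plain lists might look like the sketch below. `simple_linear_regression_lists` is a hypothetical name; converting `n` to float (so the divisions stay floating-point under Python 2) and rejecting a constant input are additions not present in the SArray version.

```python
def simple_linear_regression_lists(xs, ys):
    """Hypothetical pure-Python port of simple_linear_regression; returns (intercept, slope)."""
    n = float(len(xs))  # float keeps the divisions below floating-point under Python 2
    xy_sum = sum(x * y for x, y in zip(xs, ys))
    x_sum = sum(xs)
    y_sum = sum(ys)
    x_squared_sum = sum(x * x for x in xs)

    numerator = xy_sum - x_sum * y_sum / n
    denominator = x_squared_sum - x_sum * x_sum / n
    if denominator == 0:
        # Every x is identical, so no unique line can be fitted
        raise ValueError("input feature is constant; slope is undefined")
    slope = numerator / denominator
    intercept = y_sum / n - slope * (x_sum / n)  # mean of y minus slope times mean of x
    return intercept, slope

# Points lying exactly on y = 1 + 2x should give intercept 1 and slope 2
print(simple_linear_regression_lists([0, 1, 2, 3, 4], [1, 3, 5, 7, 9]))  # (1.0, 2.0)
```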

## 4. Use simple_linear_regression on the training data to estimate the slope and intercept for predicting ‘price’ from ‘sqft_living’

```python
input_feature = train_data['sqft_living']
output = train_data['price']
squarefeet_intercept, sqaurefeet_slope = simple_linear_regression(input_feature, output)
print(sqaurefeet_slope, squarefeet_intercept)  # slope first, then intercept
```

(281.9588385676974, -47116.07657494047)


## 5. Function that predicts the output based on ‘input_feature’, the ‘slope’, and the ‘intercept’

```python
def get_regression_predictions(input_feature, intercept, slope):
    # Works for a single number or element-wise on an SArray
    predicted_output = slope * input_feature + intercept
    return predicted_output
```

## 6. Quiz Question: Using your slope and intercept from (4), what is the predicted price for a house with 2650 sqft?

```python
get_regression_predictions(2650, squarefeet_intercept, sqaurefeet_slope)
```

700074.8456294576
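The quiz answer is just slope × 2650 + intercept. The sketch below plugs in the slope and intercept printed in step 4 as plain Python floats, so it runs without GraphLab; `predict` is a hypothetical stand-in for `get_regression_predictions`.

```python
# Slope and intercept as printed in step 4 (sqft_living model)
slope = 281.9588385676974
intercept = -47116.07657494047

def predict(x, intercept, slope):
    """Hypothetical stand-in for get_regression_predictions on a single number."""
    return slope * x + intercept

price_2650 = predict(2650, intercept, slope)
print(round(price_2650, 2))
```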

## 7. Function that calculates the residual sum of squares (RSS)

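$$\mathrm{RSS} = \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the actual output and $\hat{y}_i$ is the predicted output for observation $i$.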

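```python
def get_residual_sum_of_squares(input_feature, output, intercept, slope):
    # Predicted outputs from the fitted line
    prediction = get_regression_predictions(input_feature, intercept, slope)
    # Sum of the squared differences between actual and predicted outputs
    RSS = ((output - prediction) ** 2).sum()
    return RSS
```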
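The RSS definition can also be checked on a tiny hand-made example without GraphLab. `residual_sum_of_squares` below is a hypothetical list-based counterpart of `get_residual_sum_of_squares`. With the line y = 1 + 1.5x, the points (0, 1), (1, 3), (2, 4) have residuals 0, 0.5 and 0, so the RSS is 0.25.

```python
def residual_sum_of_squares(xs, ys, intercept, slope):
    """Hypothetical list-based counterpart of get_residual_sum_of_squares."""
    # Square each (actual - predicted) difference and add them up
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

print(residual_sum_of_squares([0, 1, 2], [1, 3, 4], intercept=1.0, slope=1.5))  # 0.25
```

A line that passes exactly through every point has an RSS of 0; any other line for the same data gives a positive value.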

## 8. Quiz Question: According to this function and the slope and intercept from (4), what is the RSS for the simple linear regression using square feet to predict prices on TRAINING data?

```python
get_residual_sum_of_squares(input_feature, output, squarefeet_intercept, sqaurefeet_slope)
```

1201918356321967.5

## 9. Calculating input from given output (inverse regression)

```python
def inverse_regression_predictions(output, intercept, slope):
    # Solve output = intercept + slope * input for input
    estimated_input = (output - intercept) / slope
    return estimated_input
```
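Inverting the line only works when the slope is nonzero. The sketch below (the helper name `inverse_prediction` is hypothetical, and the floats are copied from step 4) adds that guard and checks that feeding the estimated square footage back through the forward model recovers the 800000 price.

```python
slope = 281.9588385676974       # sqft_living slope printed in step 4
intercept = -47116.07657494047  # sqft_living intercept printed in step 4

def inverse_prediction(output, intercept, slope):
    """Solve output = intercept + slope * x for x (undefined when slope is 0)."""
    if slope == 0:
        raise ValueError("a zero slope cannot be inverted")
    return (output - intercept) / slope

sqft = inverse_prediction(800000, intercept, slope)
print(round(sqft, 2))
# Round trip: the forward model applied to the estimate gives back the price
print(round(slope * sqft + intercept, 2))
```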

## 10. Quiz Question: According to this function and the regression slope and intercept from (4), what is the estimated square footage for a house costing $800,000?

```python
inverse_regression_predictions(800000, squarefeet_intercept, sqaurefeet_slope)
```

3004.3962476159463

## 11. Using ‘bedrooms’ instead of ‘sqft_living’ to estimate prices

Instead of using ‘sqft_living’ to estimate prices, we could use ‘bedrooms’ (the number of bedrooms in the house). Using the function from (3), calculate the simple linear regression slope and intercept for estimating price from bedrooms.

```python
input_feature_bed = train_data['bedrooms']
output = train_data['price']
bedroom_intercept, bedroom_slope = simple_linear_regression(input_feature_bed, output)
print(bedroom_intercept, bedroom_slope)  # intercept first, then slope
```

(109473.18046928692, 127588.95217458377)
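The bedrooms slope reads directly as the change in predicted price per additional bedroom. Using the intercept and slope printed above as plain floats (the helper `predicted_price` is hypothetical), each extra bedroom adds about 127,589 to the prediction.

```python
bedroom_intercept = 109473.18046928692  # printed in step 11
bedroom_slope = 127588.95217458377      # printed in step 11

def predicted_price(bedrooms):
    """Hypothetical helper: price predicted by the bedrooms-only model."""
    return bedroom_intercept + bedroom_slope * bedrooms

for b in (2, 3, 4):
    print(b, round(predicted_price(b), 2))
```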


## 12. Now that we have two different models, compute the RSS of BOTH models on TEST data

```python
get_residual_sum_of_squares(test_data['sqft_living'], test_data['price'], squarefeet_intercept, sqaurefeet_slope)
```

275402936247141.3

```python
get_residual_sum_of_squares(test_data['bedrooms'], test_data['price'], bedroom_intercept, bedroom_slope)
```

493364582868287.8

## 13. Quiz Question: Which model (square feet or bedrooms) has the lower RSS on TEST data? Think about why this might be the case.

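The square-feet (‘sqft_living’) model has the lower RSS on the test data: about $2.75 \times 10^{14}$, versus about $4.93 \times 10^{14}$ for the bedrooms model. A likely reason is that living area varies over a wide, nearly continuous range and tracks price more closely than the number of bedrooms, which takes only a few integer values.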
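The comparison can also be made explicit with the two test RSS values from step 12, copied here as plain floats. The bedrooms model's test RSS is roughly 1.8 times that of the square-feet model.

```python
# Test-set RSS values as printed in step 12
test_rss = {
    "sqft_living": 275402936247141.3,
    "bedrooms": 493364582868287.8,
}
best = min(test_rss, key=test_rss.get)  # model with the smallest RSS
ratio = test_rss["bedrooms"] / test_rss["sqft_living"]
print(best, round(ratio, 2))
```

Both models are scored on the same test rows, so their RSS values are directly comparable.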