# Housing Price Prediction

Suppose you are selling your house and want to know what a good market price would be. One way to find out is to collect information on recently sold houses and build a model of housing prices.

The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split 

In [2]:
df = pd.read_csv('ex1data2.txt', sep=",", header=None)
df.rename(columns={0: 'size', 1: '# bedrooms', 2: 'price'}, inplace=True)
df.head()

Unnamed: 0,size,# bedrooms,price
0,2104,3,399900
1,1600,3,329900
2,2400,3,369000
3,1416,2,232000
4,3000,4,539900


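Note that the house sizes (the first column of X) are about 1000 times the number of bedrooms (the second column of X). The feature_normalize() function rescales each input feature by subtracting its mean and dividing by its range (maximum minus minimum), so that every feature has zero mean and a comparable scale.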

In [3]:
def feature_normalize(X):
    # Mean normalization: subtract the mean, then divide by the range (max - min)
    return (X - X.mean()) / (X.max() - X.min())

In [8]:
df['normalized_size'] = feature_normalize(df['size'])
df['normalized_rooms'] = feature_normalize(df['# bedrooms'])

In [10]:
df.head()

Unnamed: 0,size,# bedrooms,price,normalized_size,normalized_rooms
0,2104,3,399900,0.028494,-0.042553
1,1600,3,329900,-0.110502,-0.042553
2,2400,3,369000,0.110127,-0.042553
3,1416,2,232000,-0.161247,-0.292553
4,3000,4,539900,0.275598,0.207447
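A quick sanity check of the normalization can be sketched as follows. It restates feature_normalize() with max() and min(), and applies it only to the five sizes shown in the table above (an assumption made for illustration), so the resulting values differ from the table, which uses statistics of the full dataset. The point is that the mean of a normalized column should be close to zero and its range should be one.

```python
import pandas as pd

def feature_normalize(X):
    # Mean normalization: subtract the mean, divide by the range (max - min)
    return (X - X.mean()) / (X.max() - X.min())

# Illustrative sample: only the five sizes shown by df.head(), not the full file
sizes = pd.Series([2104, 1600, 2400, 1416, 3000], name='size')
normalized = feature_normalize(sizes)

# The mean should be approximately 0 and the spread (max - min) exactly 1
normalized_mean = normalized.mean()
normalized_range = normalized.max() - normalized.min()
print(normalized_mean, normalized_range)
```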


In [40]:
# Gradient descent settings
iterations = 500
alpha = 0.1
theta = np.zeros(3)
y = np.array(df['price'])
X = df[['normalized_size', 'normalized_rooms']].values
# Prepend a column of ones so that theta[0] acts as the intercept term
ones = np.ones((len(X), 1))
X = np.hstack((ones, X))

In [41]:
# Compute the squared-error cost J(theta) = 1/(2m) * sum((X.dot(theta) - y)^2)
def compute_square_error(X, y, theta):
    J = (1 / (2 * len(X))) * np.sum((X.dot(theta) - y) ** 2)  # Vectorized form of the squared-error formula
    return J
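To make the cost concrete, here is a minimal worked example on a tiny hand-made dataset. The names X_toy and y_toy and their values are hypothetical, chosen so the result can be checked by hand. With theta set to zero the predictions are all 0, so the cost is (1 + 4 + 9) / (2 * 3); with theta = [0, 1] the line fits the points exactly and the cost is 0.

```python
import numpy as np

def compute_square_error(X, y, theta):
    # Squared-error cost: 1/(2m) * sum of squared residuals
    return np.sum((X.dot(theta) - y) ** 2) / (2 * len(X))

# Hypothetical toy data: a column of ones (intercept) plus one feature
X_toy = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
y_toy = np.array([1.0, 2.0, 3.0])

# All-zero parameters predict 0 everywhere: cost = (1 + 4 + 9) / 6
cost_zero = compute_square_error(X_toy, y_toy, np.zeros(2))

# theta = [0, 1] reproduces y exactly, so the cost vanishes
cost_exact = compute_square_error(X_toy, y_toy, np.array([0.0, 1.0]))
print(cost_zero, cost_exact)
```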

In [None]:
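# Gradient Descent Implementation
def gradient_descent(theta, alpha, X, y, iterations):
    m = len(X)  # number of training examples
    J_history = np.zeros(iterations)  # cost after each update, useful for checking convergence
    for i in range(iterations):
        # Vectorized update: theta := theta - (alpha / m) * X^T (X theta - y)
        # Note: -= modifies the caller's theta array in place
        theta -= alpha / m * ((X.dot(theta) - y).T.dot(X))
        J_history[i] = compute_square_error(X, y, theta)
    return theta  # only theta is returned; J_history is not exposed to the caller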
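One way to check a gradient descent routine like the one above is to run it on data where the answer is known and compare against a direct least-squares solve. The sketch below is a self-contained variant, not the notebook's function: gradient_descent_sketch, cost, and the synthetic X_demo and y_demo are hypothetical names, the variant copies theta and also returns the cost history, and the step size and iteration count are chosen for this synthetic data rather than taken from the settings above.

```python
import numpy as np

def cost(X, y, theta):
    # Squared-error cost: 1/(2m) * sum of squared residuals
    residual = X.dot(theta) - y
    return residual.dot(residual) / (2 * len(X))

def gradient_descent_sketch(theta, alpha, X, y, iterations):
    # Work on a float copy so the caller's array is left untouched
    theta = theta.astype(float).copy()
    m = len(X)
    history = np.zeros(iterations)
    for i in range(iterations):
        gradient = X.T.dot(X.dot(theta) - y) / m
        theta -= alpha * gradient
        history[i] = cost(X, y, theta)
    return theta, history

# Synthetic, noiseless data (an assumption for illustration only)
rng = np.random.default_rng(0)
features = rng.uniform(-0.5, 0.5, size=(50, 2))
X_demo = np.hstack((np.ones((50, 1)), features))
true_theta = np.array([3.0, 2.0, -1.0])
y_demo = X_demo.dot(true_theta)

theta_gd, history = gradient_descent_sketch(np.zeros(3), 0.5, X_demo, y_demo, 2000)

# Reference solution from a direct least-squares solve
theta_ls, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print(theta_gd, theta_ls)
```

If the two parameter vectors agree and the recorded cost never increases, the update rule and step size are behaving as expected. A cost that grows from one iteration to the next usually means alpha is too large.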
