## Your neighbor is a real estate agent and wants help predicting housing prices for regions in the USA. She would like a model that takes a few features of a house and returns an estimate of what the house would sell for.

## She has asked you if you could help her out with your new data science skills. You say yes, and decide that Linear Regression might be a good path to solve this problem!

## Your neighbor then gives you information about a number of houses in regions of the United States. It is all in the data set USA_Housing.csv.


## The data contains the following columns:
'Avg. Area Income': Average income of residents of the city the house is located in.<br>
'Avg. Area House Age': Average age of houses in the same city.<br>
'Avg. Area Number of Rooms': Average number of rooms for houses in the same city.<br>
'Avg. Area Number of Bedrooms': Average number of bedrooms for houses in the same city.<br>
'Area Population': Population of the city the house is located in.<br>
'Price': Price that the house sold at.<br>
'Address': Address of the house.<br>


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

In [2]:
df = pd.read_csv("USA_Housing.csv")
df.head()

,Avg. Area Income,Avg. Area House Age,Avg. Area Number of Rooms,Avg. Area Number of Bedrooms,Area Population,Price,Address
0,79545.458574,5.682861,7.009188,4.09,23086.800503,1059034.0,"208 Michael Ferry Apt. 674\nLaurabury, NE 3701..."
1,79248.642455,6.0029,6.730821,3.09,40173.072174,1505891.0,"188 Johnson Views Suite 079\nLake Kathleen, CA..."
2,61287.067179,5.86589,8.512727,5.13,36882.1594,1058988.0,"9127 Elizabeth Stravenue\nDanieltown, WI 06482..."
3,63345.240046,7.188236,5.586729,3.26,34310.242831,1260617.0,USS Barnett\nFPO AP 44820
4,59982.197226,5.040555,7.839388,4.23,26354.109472,630943.5,USNS Raymond\nFPO AE 09386


In [4]:
df.columns

Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
       'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
      dtype='object')

In [17]:
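# NOTE: the selection has 3 columns, so reshape(-1, 1) stacks them into a single
# column of 15000 interleaved values (rooms, bedrooms, age, rooms, ...).
X = df[["Avg. Area Number of Rooms", "Avg. Area Number of Bedrooms", "Avg. Area House Age"]].values.reshape(-1, 1)
# Keeping the first 5000 values matches the length of y, but row i of X
# no longer describes the same house as row i of y.
X = X[:5000, :]
y = df["Price"].values.reshape(-1, 1)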

In [18]:
X.shape, y.shape

((5000, 1), (5000, 1))
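
The `(5000, 1)` shape of `X` is the result of flattening three feature columns and then truncating. The following is a minimal sketch of the difference between flattening and keeping one row per house. It uses a tiny made-up frame (`demo`, with illustrative numbers) rather than `USA_Housing.csv`, so the names `demo`, `feature_cols`, `flat`, `X_demo` and `y_demo` are assumptions for this example only.

```python
import pandas as pd

# Hypothetical stand-in for df; the values are made up for illustration.
demo = pd.DataFrame({
    "Avg. Area Number of Rooms": [7.0, 6.7, 8.5, 5.6],
    "Avg. Area Number of Bedrooms": [4.09, 3.09, 5.13, 3.26],
    "Avg. Area House Age": [5.7, 6.0, 5.9, 7.2],
    "Price": [1.06e6, 1.51e6, 1.06e6, 1.26e6],
})
feature_cols = [
    "Avg. Area Number of Rooms",
    "Avg. Area Number of Bedrooms",
    "Avg. Area House Age",
]

# Flattening interleaves the three features into one long column:
# rooms of house 0, bedrooms of house 0, age of house 0, rooms of house 1, ...
flat = demo[feature_cols].values.reshape(-1, 1)
print(flat.shape)  # (12, 1)

# Keeping the 2-D array preserves one row per house and one column per feature,
# so row i of X_demo lines up with element i of y_demo.
X_demo = demo[feature_cols].values
y_demo = demo["Price"].values
print(X_demo.shape, y_demo.shape)  # (4, 3) (4,)
```

With the real data set, the same pattern would presumably give an `X` of shape `(5000, 3)`, and the coefficients and scores recorded in this notebook would change accordingly.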

In [20]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=32)
model = LinearRegression()

In [21]:
model.fit(X_train, y_train)

LinearRegression()

In [22]:
print(model.intercept_)
print(model.coef_)

[1241861.41839117]
[[-910.97684208]]
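
As a reminder of what `intercept_` and `coef_` mean, a fitted linear regression predicts `intercept_ + X @ coef_.T`. The sketch below checks this on synthetic, noise-free data; the arrays and the name `model_demo` are assumptions made for the example and are unrelated to the housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): y = 2*x0 - 1*x1 + 0.5*x2 + 10
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = (X_demo @ np.array([2.0, -1.0, 0.5]) + 10.0).reshape(-1, 1)

model_demo = LinearRegression().fit(X_demo, y_demo)

# With a 2-D target, coef_ has shape (n_targets, n_features) = (1, 3)
# and intercept_ has shape (1,).
print(model_demo.coef_.shape, model_demo.intercept_.shape)

# Reproduce predict() by hand from the fitted parameters.
manual = model_demo.intercept_ + X_demo @ model_demo.coef_.T
print(np.allclose(manual, model_demo.predict(X_demo)))
```

A single coefficient such as `[[-910.97684208]]` therefore means the model saw only one feature column.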


In [34]:
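y_pred = model.predict(X_test)
# NOTE: df['Price'][i] is the price in row i of the full data set, not the price
# of the test-set house that y_pred[i] was computed for; y_test[i] is the matching value.
for i in range(10):
    print(f"Actual Price of the House {i+1}: {df['Price'][i]}")
    print(f"Predicted Price of the House {i+1}: {y_pred[i]}\n")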

Actual Price of the House 1: 1059033.5578701235
Predicted Price of the House 1: [1236784.89942955]

Actual Price of the House 2: 1505890.91484695
Predicted Price of the House 2: [1235910.37955868]

Actual Price of the House 3: 1058987.9878760849
Predicted Price of the House 3: [1236056.00809111]

Actual Price of the House 4: 1260616.8066294468
Predicted Price of the House 4: [1236782.98271739]

Actual Price of the House 5: 630943.4893385402
Predicted Price of the House 5: [1234824.37428654]

Actual Price of the House 6: 1068138.0743935304
Predicted Price of the House 6: [1236231.58150712]

Actual Price of the House 7: 1502055.8173744078
Predicted Price of the House 7: [1236413.39485953]

Actual Price of the House 8: 1573936.5644777217
Predicted Price of the House 8: [1235403.85028395]

Actual Price of the House 9: 798869.5328331633
Predicted Price of the House 9: [1237980.65704391]

Actual Price of the House 10: 1545154.8126419624
Predicted Price of the House 10: [1235962.07502501]
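
For a like-for-like comparison, each prediction has to be paired with the target from the same test row. A minimal sketch on synthetic data follows; the data and the names `X_tr`, `X_te`, `y_tr`, `y_te`, `model_demo` and `y_hat` are assumptions for illustration, not the housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression problem (assumed for illustration).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([3.0, 1.0, -2.0]) + 5.0 + rng.normal(scale=0.1, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=32
)
model_demo = LinearRegression().fit(X_tr, y_tr)
y_hat = model_demo.predict(X_te)

# y_te[i] and y_hat[i] refer to the same held-out row.
for i in range(5):
    print(f"Actual: {y_te[i]:.2f}  Predicted: {y_hat[i]:.2f}")
```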



In [37]:
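# R^2 is close to 0 on both the training and the test set, so the model
# explains almost none of the variance in price: it underfits rather than
# merely failing to generalize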

In [36]:
model.score(X_train, y_train), model.score(X_test, y_test)

(1.867965498691948e-05, -0.0037759919006334908)
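
For context, `score` returns the coefficient of determination, R² = 1 - SS_res / SS_tot. A model that always predicts the mean of the targets scores exactly 0, and anything worse than that scores below 0. The sketch below illustrates this with a small made-up target vector; `r2_manual` is a helper defined here for the example.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up targets for illustration.
y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Always predicting the mean gives R^2 = 0.
mean_pred = np.full_like(y_true, y_true.mean())
print(r2_manual(y_true, mean_pred))  # 0.0

# A constant prediction that is off from the mean gives a negative R^2.
off_pred = np.full_like(y_true, y_true.mean() + 1.0)
print(r2_manual(y_true, off_pred) < 0)  # True

# The hand-written version agrees with scikit-learn's r2_score.
print(np.isclose(r2_manual(y_true, off_pred), r2_score(y_true, off_pred)))  # True
```

Scores of about 0 on the training set and slightly below 0 on the test set are therefore what a model that predicts roughly the mean price would produce.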

In [40]:
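# NOTE: y_pred holds predictions for X_test (1000 rows). Truncating y_train to
# 1000 rows makes the shapes match, but any metric computed from y_train and
# y_pred then compares training targets with predictions for unrelated test houses.
y_train = y_train[:1000, :]
y_train.shape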

(1000, 1)

In [46]:
print("\033[1mEvaluation of training performance:\n")
print("\033[0mMAE:", round(mean_absolute_error(y_train, y_pred), 2))
print("RMSE:", round(np.sqrt(mean_squared_error(y_train, y_pred)), 2))
print("R**2:", round(r2_score(y_train, y_pred), 2))

Evaluation of training performance:

MAE: 284251.45
RMSE: 353146.15
R**2: 0.0
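
Training and test performance are usually reported separately, each computed from targets and predictions for the same rows. The following is a minimal sketch of that pattern on synthetic data; the helper `report`, the data, and the variable names are assumptions for this example and do not reproduce the numbers above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def report(name, y_true, y_hat):
    """Print and return MAE, RMSE and R^2 for one split."""
    mae = mean_absolute_error(y_true, y_hat)
    rmse = np.sqrt(mean_squared_error(y_true, y_hat))
    r2 = r2_score(y_true, y_hat)
    print(f"{name}: MAE={mae:.3f}  RMSE={rmse:.3f}  R**2={r2:.3f}")
    return mae, rmse, r2

# Synthetic regression problem (assumed for illustration).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([3.0, 1.0, -2.0]) + 5.0 + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=32
)
model_demo = LinearRegression().fit(X_tr, y_tr)

# Each split is scored against predictions for that same split.
train_metrics = report("train", y_tr, model_demo.predict(X_tr))
test_metrics = report("test", y_te, model_demo.predict(X_te))
```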
