### Factors Affecting House Prices

The factors affecting property prices can be broadly classified into macro-economic factors, government policies, bank policies, and characteristics of individual properties.



#### 1. Individual Properties

Because the goal is to predict overall prices, most characteristics of individual houses, such as location or the local crime rate, should not play a significant role in the aggregate. Some characteristics, however, such as a long-run increase in the average size of a house, could lead to a rise in property prices.

Median House Size - https://fred.stlouisfed.org/series/MEDSQUFEEUS - Data only available since 2016


#### 2. Macro-economic Factors

1. Per Capita GDP - https://fred.stlouisfed.org/series/GDP
2. Unemployment - https://fred.stlouisfed.org/series/UNRATE
3. Supply and Demand of houses - https://fred.stlouisfed.org/series/MSACSR
4. Median Income - https://fred.stlouisfed.org/series/MEHOINUSA672N
5. Performance of the stock market - https://fred.stlouisfed.org/series/SPASTT01USM657N
6. Consumer Price Index - https://fred.stlouisfed.org/series/CPIAUCSL


#### 3. Government Policies

1. Property Taxes - https://fred.stlouisfed.org/series/S210401A027NBEA
2. Interest Rates - https://fred.stlouisfed.org/series/FEDFUNDS
3. Subsidies - https://fred.stlouisfed.org/series/L312051A027NBEA

#### 4. Bank Policies

1. 30 Year Mortgage Rates - https://fred.stlouisfed.org/series/MORTGAGE30US
2. Mortgage Availability
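
As a small convenience, the series listed above can be kept in one lookup table. This is only a sketch: the labels and series IDs are copied from the lists above, while the names `FRED_SERIES` and `fred_url` are illustrative and not part of the original notebook.

```python
# Labels and series IDs copied from the factor lists above.
# FRED_SERIES and fred_url are hypothetical helper names.
FRED_SERIES = {
    "Median House Size": "MEDSQUFEEUS",
    "Per Capita GDP": "GDP",
    "Unemployment": "UNRATE",
    "Supply and Demand of houses": "MSACSR",
    "Median Income": "MEHOINUSA672N",
    "Performance of the stock market": "SPASTT01USM657N",
    "Consumer Price Index": "CPIAUCSL",
    "Property Taxes": "S210401A027NBEA",
    "Interest Rates": "FEDFUNDS",
    "Subsidies": "L312051A027NBEA",
    "30 Year Mortgage Rates": "MORTGAGE30US",
}


def fred_url(series_id: str) -> str:
    """Build the FRED series page URL in the same form as the links above."""
    return f"https://fred.stlouisfed.org/series/{series_id}"


for label, series_id in FRED_SERIES.items():
    print(f"{label}: {fred_url(series_id)}")
```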

In [7]:
import pandas as pd

In [8]:
percapita_df = pd.read_csv('content/GDP PerCapita.csv', names = ["DATE", "Per_Capita_GDP"], skiprows = 1)
unemployment_df = pd.read_csv('content/Unemployment.csv')
supply_df = pd.read_csv('content/Monthly New Houses.csv')
median_income_df = pd.read_csv('content/Median Income.csv')
stocks_df = pd.read_csv('content/Market Performance.csv')
cpi_df = pd.read_csv('content/Consumer Price Index.csv')
taxes_df = pd.read_csv('content/Property Taxes.csv')
intrest_df = pd.read_csv('content/Federal Intrest Rates.csv')
subsidies_df = pd.read_csv('content/Housing Subsidies.csv')
mortgage_df = pd.read_csv('content/MORTGAGE30US.csv')


In [9]:
CS_df = pd.read_csv('content/CSUSHPISA.csv')

CS_df["DATE"] = pd.to_datetime(CS_df["DATE"])

# Keep observations from January 2000 onward; copy so later column assignments
# do not trigger chained-assignment warnings.
data_after_2000 = CS_df["DATE"] >= "2000-01-01"
CS_df = CS_df[data_after_2000].copy()

# Renumber rows from 0 without keeping the old index as a column.
CS_df.reset_index(drop=True, inplace=True)

CS_df["Year"] = CS_df["DATE"].dt.year
CS_df["Month"] = CS_df["DATE"].dt.month

CS_df

Unnamed: 0,DATE,CSUSHPISA,Year,Month
0,2000-01-01,100.552,2000,1
1,2000-02-01,101.339,2000,2
2,2000-03-01,102.126,2000,3
3,2000-04-01,102.922,2000,4
4,2000-05-01,103.678,2000,5
...,...,...,...,...
285,2023-10-01,312.946,2023,10
286,2023-11-01,313.629,2023,11
287,2023-12-01,314.338,2023,12
288,2024-01-01,315.297,2024,1


#### Merging all monthly data

In [10]:
# Start from the index dates, then inner-join each monthly series on DATE.
merged_df = pd.DataFrame()
monthly_data = [CS_df, unemployment_df, supply_df, cpi_df, intrest_df, stocks_df, mortgage_df]
merged_df["DATE"] = CS_df["DATE"]
for data in monthly_data:
    data["DATE"] = pd.to_datetime(data["DATE"])
    merged_df = pd.merge(merged_df, data, on='DATE', how='inner')
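
The same chain of inner joins can also be written with `functools.reduce`. The sketch below uses three tiny hand-made frames (the `CPIAUCSL` frame is deliberately one month short) to show that an inner join keeps only the dates present in every series. The frame names are illustrative.

```python
from functools import reduce

import pandas as pd

dates = pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01"])

# Toy frames for illustration only; values are taken from the first rows shown above.
hpi = pd.DataFrame({"DATE": dates, "CSUSHPISA": [100.552, 101.339, 102.126]})
unrate = pd.DataFrame({"DATE": dates, "UNRATE": [4.0, 4.1, 4.0]})
# This frame is missing March on purpose.
cpi = pd.DataFrame({"DATE": dates[:2], "CPIAUCSL": [169.3, 170.0]})

frames = [hpi, unrate, cpi]

# Fold the list into one frame, inner-joining on DATE at each step.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="DATE", how="inner"),
    frames,
)

print(merged)
```

Because the join is `inner`, the March row disappears from `merged`. With real data, any series that starts late or ends early shortens the final sample in the same way.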



In [11]:
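percapita_df['DATE'] = pd.to_datetime(percapita_df["DATE"])
# GDP is reported quarterly: left-join on DATE, then linearly interpolate the missing months.
merged_df = pd.merge(merged_df, percapita_df, on="DATE", how="left")
merged_df["Per_Capita_GDP"] = merged_df["Per_Capita_GDP"].interpolate()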
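
To make the interpolation step concrete, here is a minimal sketch on four months of data. It uses the two quarterly values visible in the output table (2000-01 and 2000-04); the frame names `monthly`, `quarterly`, and `filled` are illustrative.

```python
import pandas as pd

# Monthly date spine, as in the merged frame.
monthly = pd.DataFrame({"DATE": pd.date_range("2000-01-01", periods=4, freq="MS")})

# Quarterly observations: only the first month of each quarter has a value.
quarterly = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2000-01-01", "2000-04-01"]),
        "Per_Capita_GDP": [49335.0, 50109.0],
    }
)

# The left join leaves NaN in February and March.
filled = pd.merge(monthly, quarterly, on="DATE", how="left")

# Linear interpolation (the pandas default) fills the gaps with evenly spaced values.
filled["Per_Capita_GDP"] = filled["Per_Capita_GDP"].interpolate()

print(filled)
```

The interpolated values are estimates, not observations: each in-between month is a straight-line blend of the two surrounding quarterly figures.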

In [12]:
merged_df

Unnamed: 0,DATE,CSUSHPISA,Year,Month,UNRATE,MSACSR,CPIAUCSL,FEDFUNDS,SPASTT01USM657N,MORTGAGE30US,Per_Capita_GDP
0,2000-01-01,100.552,2000,1,4.0,4.3,169.300,5.45,-0.526490,8.2100,49335.000000
1,2000-02-01,101.339,2000,2,4.1,4.3,170.000,5.73,-4.680037,8.3250,49593.000000
2,2000-03-01,102.126,2000,3,4.0,4.3,171.000,5.85,2.838298,8.2400,49851.000000
3,2000-04-01,102.922,2000,4,3.8,4.4,170.900,6.02,3.884009,8.1525,50109.000000
4,2000-05-01,103.678,2000,5,4.0,4.4,171.200,6.27,-1.061978,8.5150,50080.666667
...,...,...,...,...,...,...,...,...,...,...,...
285,2023-10-01,312.946,2023,10,3.8,7.9,307.531,5.33,-4.121365,7.6200,67513.000000
286,2023-11-01,313.629,2023,11,3.7,8.8,308.024,5.33,3.364690,7.4420,67576.000000
287,2023-12-01,314.338,2023,12,3.7,8.2,308.742,5.33,5.603572,6.8150,67639.000000
288,2024-01-01,315.297,2024,1,3.7,8.3,309.685,5.33,1.614541,6.6425,67702.000000


In [13]:
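# Median income and subsidies are annual series: join them on Year so that
# every month of a year receives that year's value.
annual_data = [median_income_df, subsidies_df]
for data in annual_data:
    data["Year"] = pd.DatetimeIndex(data["DATE"]).year
    data = data.drop(columns=['DATE'])
    merged_df = pd.merge(merged_df, data, how="left", on="Year")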
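
The effect of joining annual data on `Year` can be seen in a small sketch. The income values below are placeholders, not real figures, and the frame names are illustrative.

```python
import pandas as pd

# Four months that straddle a year boundary.
monthly = pd.DataFrame({"DATE": pd.date_range("2000-11-01", periods=4, freq="MS")})
monthly["Year"] = monthly["DATE"].dt.year

# Annual series with one row per year. The values are placeholders, not real data.
annual = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2000-01-01", "2001-01-01"]),
        "MEHOINUSA672N": [100.0, 200.0],
    }
)
annual["Year"] = annual["DATE"].dt.year

# Drop the annual DATE column so the join key is Year alone.
joined = pd.merge(monthly, annual.drop(columns=["DATE"]), how="left", on="Year")

print(joined)
```

Each annual value is repeated for all months of its year, so the annual features change in steps every January. Months in a year with no annual observation get `NaN` and are removed by the later `dropna()`.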


In [14]:
df = merged_df.dropna()

In [15]:
df = df.set_index("DATE")

In [16]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

In [17]:
df.drop(columns = ["Year", "Month"], inplace = True)

In [18]:
y = df.pop("CSUSHPISA")
X = df

In [19]:
# Scale every feature to [0, 1]. Note that the scaler is fit on all rows before the split.
scaler = MinMaxScaler()
X = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
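
The split above is random, and the scaler sees every row before the split. For monthly time series, one possible alternative is to hold out the most recent rows and fit the scaler on the training rows only. The sketch below illustrates that idea on synthetic data; it is not the method used in this notebook, and all names ending in `_demo` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the feature matrix, with rows assumed to be in date order.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Chronological split: the first 80% of rows train, the last 20% test.
split = int(len(X_demo) * 0.8)
X_tr, X_te = X_demo[:split], X_demo[split:]
y_tr, y_te = y_demo[:split], y_demo[split:]

# Fit the scaler on the training rows only, then apply it to the test rows.
scaler_demo = MinMaxScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

With this setup the test rows can fall outside [0, 1] after scaling, which is expected: the scaler has never seen them.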

In [20]:
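model = LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)
# r2_score expects (y_true, y_pred); R2 is not symmetric, so the order matters.
# The value recorded below was computed with the arguments reversed,
# so a rerun may print a slightly different number.
score = r2_score(y_test, pred)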

In [21]:
print(score)

0.9299127972820801
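
`r2_score` takes its arguments as `(y_true, y_pred)`, and unlike MSE or MAE it is not symmetric in them, because the total sum of squares is computed around the mean of the first argument. A minimal sketch with made-up numbers shows how much the order can matter:

```python
from sklearn.metrics import r2_score

# Made-up values for illustration only.
y_true_demo = [1.0, 2.0, 3.0, 4.0]
y_pred_demo = [1.5, 2.0, 2.5, 3.0]

# Correct order: true values first, predictions second.
r2_correct = r2_score(y_true_demo, y_pred_demo)

# Swapped order: the variance of the predictions becomes the baseline.
r2_swapped = r2_score(y_pred_demo, y_true_demo)

print(r2_correct, r2_swapped)
```

In this toy case the two calls disagree even in sign. The gap is usually much smaller when predictions track the targets closely.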


In [22]:
from sklearn.metrics import mean_squared_error

score = mean_squared_error(y_test, pred)
score

115.06276426287084

In [23]:
from sklearn.metrics import mean_absolute_error

score = mean_absolute_error(y_test, pred)
score

8.92195434301409

In [24]:
# Pair each feature name with its fitted coefficient (on min-max scaled features).
for col, coef in zip(df.columns, model.coef_):
    print(f"The coefficient for {col} is {coef}")

The coefficient for UNRATE is 1.0810018510361155
The coefficient for MSACSR is 0.18349428222120484
The coefficient for CPIAUCSL is -148.02292367763243
The coefficient for FEDFUNDS is -1.8201193030442813
The coefficient for SPASTT01USM657N is -2.499907826897885
The coefficient for MORTGAGE30US is 81.44765306635736
The coefficient for Per_Capita_GDP is 182.35515726979628
The coefficient for MEHOINUSA672N is -12.409870671812188
The coefficient for L312051A027NBEA is 178.2744155997099
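
Because the features were min-max scaled, each coefficient above is the change in the index associated with moving that feature from its sample minimum to its sample maximum, holding the others fixed. Dividing by the feature's range converts it to a per-unit effect. The sketch below checks this on synthetic data with known slopes; names ending in `_demo` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic features on very different scales, with known per-unit slopes 3.0 and 0.5.
rng = np.random.default_rng(0)
X_raw = rng.uniform(low=[0, 100], high=[10, 300], size=(50, 2))
y_demo = 3.0 * X_raw[:, 0] + 0.5 * X_raw[:, 1]

scaler_demo = MinMaxScaler()
X_scaled = scaler_demo.fit_transform(X_raw)
model_demo = LinearRegression().fit(X_scaled, y_demo)

# Coefficients on scaled features are per full min-to-max range of each feature.
per_range = model_demo.coef_

# data_range_ holds (max - min) per feature; dividing recovers per-unit slopes.
per_unit = per_range / scaler_demo.data_range_

print(per_range, per_unit)
```

Large coefficients with opposite signs on series that trend together (for example CPI and GDP) can also be a sign of multicollinearity, so individual coefficients should be read with care.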


#### The high R2 score (about 0.93 on the test split) suggests that the model fits well and explains most of the variability in the house price index


Linear regression was used, and many of the basic variables were taken from the following paper: Macroeconomic Factors Affecting Housing Prices: Take the United States as an Example - Xinying Ding