# Linear Regression

To predict our rental housing prices, we will begin with a linear regression model. For the maths behind regression, see this [introduction to linear regression](https://onlinestatbook.com/2/regression/regression.html).
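
As a quick reminder of what the model does, here is the standard ordinary least squares setup. The symbols ($x_j$ for the $p$ predictors, $\beta_j$ for the coefficients, $n$ for the number of training rows) are notation chosen here for illustration and do not appear in the notebook itself.

$$
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
$$

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
$$

The first line is the prediction for one row and the second is the sum of squared residuals that fitting minimizes. scikit-learn's `LinearRegression` fits the coefficients by ordinary least squares in this sense.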

In [5]:
"""
A simple linear regression model that uses all predictors, with price per
unit area as the target variable.
"""

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

import pandas as pd
import pickle

In [11]:
df = pd.read_excel("../../data/processed/real_estate.xlsx")

# Separates the data into predictors ("X") and target ("y")
X = df.drop(columns="price_unit_area")
y = df["price_unit_area"]

# Creates training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fits a linear regression on the training data
reg = LinearRegression().fit(X_train, y_train)

# Calculates R-squared and mean squared error on the test data
r_squared = reg.score(X_test, y_test)
print("R-squared:", r_squared)

y_pred = reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)


# Serializes and saves the regression model
with open("saved_models/linreg.sav", "wb") as f:
    pickle.dump(reg, f)

R-squared: 0.4894348008682481
MSE: 129.85749176246205
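
To make the two printed numbers easier to read, the sketch below recomputes both metrics by hand in plain Python. The helper names (`mean_squared_error_manual`, `r_squared_manual`) and the four demo values are made up for illustration and are not part of the notebook or the dataset. The formulas are the standard definitions that `mean_squared_error` and `reg.score` report for a regressor.

```python
import math


def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared residuals (hypothetical helper)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared_manual(y_true, y_pred):
    """1 - SS_res / SS_tot, the coefficient of determination (hypothetical helper)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only; they are not from real_estate.xlsx
y_true_demo = [10.0, 20.0, 30.0, 40.0]
y_pred_demo = [12.0, 18.0, 33.0, 37.0]

mse_demo = mean_squared_error_manual(y_true_demo, y_pred_demo)
r2_demo = r_squared_manual(y_true_demo, y_pred_demo)
print("Demo MSE:", mse_demo)
print("Demo R-squared:", r2_demo)

# The square root of the MSE printed above is in the same units as the target
rmse_reported = math.sqrt(129.85749176246205)
print("RMSE for the reported MSE:", rmse_reported)
```

Read this way, the reported MSE corresponds to a typical error of about 11.4 in the units of `price_unit_area`, and an R-squared of about 0.49 means the model explains roughly half of the variance in the test targets. Because `train_test_split` is called without a fixed `random_state`, both numbers will vary from run to run.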
