<a href="https://colab.research.google.com/github/socialx-indonesia/bda-tpcc/blob/main/python/002_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

*SocialX Indonesia - Muhammad Apriandito*


---






# **Regression: House Price Prediction**

A property developer in Raccoon City wants a model that predicts house prices from land area (in square feet) using simple linear regression.

#### **Setup**
To build the model, we first need to load the Python packages we will use. Because Google Colaboratory already has all of these packages installed, we only need to import them.

In [None]:
# Load packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

# Load modules
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Set plotting parameters and silence warnings
plt.rcParams['figure.figsize'] = (16, 9)
plt.style.use('ggplot')
warnings.filterwarnings('ignore')

With the required Python packages loaded, we can read our data into the Python environment.

In [None]:
# Import Dataset
df = pd.read_csv('https://raw.githubusercontent.com/socialx-indonesia/bda-tpcc/main/data/house-price.csv', sep=";")

#### **Data Exploration**

In [None]:
# Print Data
df.head(5)

In [None]:
# Check Data Information
df.info()

In [None]:
# Descriptive Statistics
df.describe()

In [None]:
# Visualize the data using scatterplot
sns.scatterplot(x="square_feet", y="house_price", data=df)

#### **Create Simple Linear Regression Model**

In [None]:
# Set Variable
X = df[['square_feet']]
y = df['house_price']

In [None]:
# Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
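
Because the split above is random, the fitted coefficients and the evaluation metrics will differ from run to run. A minimal sketch of one way to make the split repeatable is to pass `random_state` to `train_test_split`. The toy arrays `X_demo` and `y_demo` and the seed value `42` are assumptions chosen for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays (hypothetical), standing in for X and y
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10) * 2

# Splitting twice with the same seed (42 is an arbitrary choice)
split_a = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

# Each split is [X_train, X_test, y_train, y_test]; compare them piece by piece
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Identical splits:", same_split)
print("Train size:", len(split_a[0]), "Test size:", len(split_a[1]))
```

With a fixed seed, rerunning the notebook gives the same training and test rows, which makes the reported metrics easier to compare between runs.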

In [None]:
# Set model
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

In [None]:
# Get Intercept and Coefficient
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
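
To make the intercept and coefficient more concrete: with a single feature, ordinary least squares has a simple closed form, and `LinearRegression` solves the same least-squares problem numerically. The sketch below computes that closed form in plain Python. The toy numbers and the helper name `fit_simple_ols` are hypothetical, introduced only for illustration, and are not taken from `house-price.csv`.

```python
# Toy data (hypothetical values, not taken from house-price.csv)
square_feet = [1000, 1500, 2000, 2500, 3000]
house_price = [200, 290, 410, 500, 600]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_ols(square_feet, house_price)
print("Intercept:", intercept)
print("Slope:", slope)
print("Prediction for 2200 sq ft:", intercept + slope * 2200)
```

A prediction is then just `intercept + slope * x`, which is what `regr.predict` evaluates using `regr.intercept_` and `regr.coef_`.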

In [None]:
# Visualize the regression model that has been created
plt.scatter(X_train, y_train)
plt.plot(X_train, regr.predict(X_train), color = "green")
plt.show()

#### **Model Evaluation**

In [None]:
# Make predictions on the test data
y_pred = regr.predict(X_test)

In [None]:
# Create a dataframe comparing actual and predicted prices
prediction_result = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
prediction_result

In [None]:
# Evaluate the predictions on the test data
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2:", r2_score(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
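
For readers who want to see what these three numbers measure, here is a minimal plain-Python sketch of the standard definitions: MSE is the mean squared residual, RMSE is its square root (in the same units as the price), and R2 is one minus the ratio of residual to total sum of squares. The sample values and the helper name `regression_metrics` are hypothetical and used only for illustration.

```python
import math

# Hypothetical actual and predicted prices, for illustration only
actual = [200.0, 300.0, 400.0, 500.0]
predicted = [210.0, 290.0, 420.0, 480.0]


def regression_metrics(y_true, y_hat):
    """Return (mse, rmse, r2) for paired lists of true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: error left over after prediction
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))
    # Total sum of squares: spread of the true values around their mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2


mse, rmse, r2 = regression_metrics(actual, predicted)
print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)
```

An R2 close to 1 means the model explains most of the variation in price, while RMSE gives a typical error size in price units.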

#### **Deployment**

In [None]:
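# Make a prediction for new data
new_square_feet = 2000
# Use the same column name as in training so the feature names match
new_data = pd.DataFrame({'square_feet': [new_square_feet]})
print('Predicted house price: \n', regr.predict(new_data))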