# Example of Predictive Modelling Using Linear and Polynomial Regression
1. > We want to predict the house price per unit area using the Real estate dataset.
1. > The dataset has six features (transaction date, house age, distance to the nearest MRT station, number of convenience stores, latitude, longitude) and 414 observations, with the house price per unit area as the target.
1. > A linear model has the form: `y = intercept + coeff1*feature1 + coeff2*feature2 + coeff3*feature3 + coeff4*feature4 + ...`
1. > A degree-2 polynomial model with two features has the form: `y = intercept + coeff1*feature1 + coeff2*feature2 + coeff3*feature1^2 + coeff4*feature2^2 + coeff5*feature1*feature2`

* Import libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import pandas as pd

from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.preprocessing import PolynomialFeatures

* Load dataset and show info

In [None]:
df = pd.read_csv('../input/real-estate-price-prediction/Real estate.csv')
display(df)
df.info()
df.describe()

* Separate the features from the target
* Split the data into training and test sets (10% for testing)

In [None]:
X = df.drop(["No","Y house price of unit area"],axis = 1)
y = df["Y house price of unit area"]
X_train, X_test, y_train, y_test = tts(X,y,test_size=0.1,random_state=101)
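With `test_size=0.1`, scikit-learn rounds the test-set size up, so 414 rows give 42 test rows and 372 training rows. The sketch below checks this on placeholder random arrays of the same length (they stand in for the real features and are not the real data).

```python
import numpy as np
from sklearn.model_selection import train_test_split as tts

rng = np.random.default_rng(0)
# Placeholder data with the same shape as the real feature matrix: 414 rows, 6 features
X_demo = rng.normal(size=(414, 6))
y_demo = rng.normal(size=414)

X_tr, X_te, y_tr, y_te = tts(X_demo, y_demo, test_size=0.1, random_state=101)

# ceil(0.1 * 414) = 42 rows go to the test set, the remaining 372 to training
print(X_tr.shape, X_te.shape)  # (372, 6) (42, 6)
```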

* Data analysis
1. Distributions of and relationships between the features
    * Transaction date does not look useful as a raw feature; features derived from the date might be more informative.

In [None]:
sns.pairplot(X_train, diag_kind = "hist")

2. Correlation between features and target
* The features most strongly correlated with the target (in absolute value) are distance to the nearest MRT station, number of convenience stores, latitude, and longitude.
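One way to make this ranking explicit is to sort each feature's correlation with the target by absolute value, since a strong negative correlation is as informative as a strong positive one. The helper `rank_by_target_corr` and the synthetic columns below are hypothetical; with the real data it would be called as `rank_by_target_corr(df.drop(columns=["No"]), "Y house price of unit area")`.

```python
import numpy as np
import pandas as pd

def rank_by_target_corr(frame: pd.DataFrame, target: str) -> pd.Series:
    """Return each feature's Pearson correlation with `target`, strongest (by absolute value) first."""
    corr = frame.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Synthetic data: price falls with distance, rises with stores, and ignores noise
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "distance": rng.uniform(0, 5000, n),
    "stores": rng.integers(0, 10, n).astype(float),
    "noise": rng.normal(size=n),
})
demo["price"] = 50 - 0.008 * demo["distance"] + 3.0 * demo["stores"] + rng.normal(0, 2, n)

ranking = rank_by_target_corr(demo, "price")
print(ranking)
```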

In [None]:
df.corr()

* Predictive Model: Linear
 1. Create the model

In [None]:
model_linear = LinearRegression()

 2. Fit the model on the training data

In [None]:
model_linear.fit(X_train,y_train)
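After fitting, `LinearRegression` stores the intercept in `intercept_` and one coefficient per feature in `coef_`; these are the intercept and coefficients of the linear formula at the top of this notebook. The sketch fits a model to synthetic, noise-free data generated from a known formula and checks that `predict` equals `intercept_ + X @ coef_`. For the real model, `pd.Series(model_linear.coef_, index=X.columns)` would list the coefficients by feature name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# Synthetic data generated from y = 4 + 2*x1 - 3*x2 (no noise)
X_syn = rng.normal(size=(100, 2))
y_syn = 4 + 2 * X_syn[:, 0] - 3 * X_syn[:, 1]

model = LinearRegression().fit(X_syn, y_syn)
print(model.intercept_, model.coef_)  # approximately 4.0 and [ 2. -3.]

# predict() is just the linear formula evaluated with the fitted parameters
manual = model.intercept_ + X_syn @ model.coef_
print(np.allclose(model.predict(X_syn), manual))  # True
```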

 3. Predict the test data

In [None]:
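# Predicted prices for the test set (despite its name, y_actual holds predictions; the true prices are in y_test)
y_actual = model_linear.predict(X_test)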

* Calculating regression metrics

In [None]:
MAE = metrics.mean_absolute_error(y_test,y_actual)
MSE = metrics.mean_squared_error(y_test,y_actual)
RMSE = np.sqrt(MSE)

print(pd.DataFrame([MAE,MSE,RMSE],['MAE','MSE','RMSE'],columns = ['Metrics']))
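The three metrics are simple averages of the prediction errors: MAE is the mean absolute error, MSE the mean squared error, and RMSE the square root of MSE, which puts it back in the target's units (price per unit area). The small made-up arrays below reproduce `metrics.mean_absolute_error` and `metrics.mean_squared_error` with plain NumPy.

```python
import numpy as np
from sklearn import metrics

# Made-up true values and predictions, for illustration only
y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.5, 5.0, 4.0])

errors = y_true - y_hat                # [0.5, 0.0, -2.0]
mae = np.mean(np.abs(errors))          # (0.5 + 0 + 2) / 3, about 0.833
mse = np.mean(errors ** 2)             # (0.25 + 0 + 4) / 3, about 1.417
rmse = np.sqrt(mse)                    # about 1.190

# The hand-computed values agree with scikit-learn
assert np.isclose(mae, metrics.mean_absolute_error(y_true, y_hat))
assert np.isclose(mse, metrics.mean_squared_error(y_true, y_hat))
print(mae, mse, rmse)
```

Because errors are squared before averaging, the single error of 2 dominates MSE and RMSE, while MAE weights all errors linearly.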


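 1. Compare the RMSE with the average price in the test set

In [None]:
# RMSE is in the same units as the price, so expressing it relative to the mean price gives a rough sense of scale
print(f'Mean price: {y_test.mean():.2f}, RMSE: {RMSE:.2f} ({RMSE / y_test.mean():.1%} of the mean)')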
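Another reference point for the RMSE is a baseline that ignores the features and always predicts the mean training price; a useful model should beat it. The sketch uses scikit-learn's `DummyRegressor(strategy="mean")` on synthetic data, so the numbers are illustrative only; with this notebook's variables the same comparison would use `X_train`, `y_train`, `X_test` and `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data: y depends linearly on two features plus noise
rng = np.random.default_rng(2)
X_syn = rng.normal(size=(300, 2))
y_syn = 30 + 5 * X_syn[:, 0] - 4 * X_syn[:, 1] + rng.normal(0, 1, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.1, random_state=101)

def rmse(model):
    """Fit `model` on the training split and return its RMSE on the test split."""
    model.fit(X_tr, y_tr)
    return np.sqrt(mean_squared_error(y_te, model.predict(X_te)))

baseline_rmse = rmse(DummyRegressor(strategy="mean"))
linear_rmse = rmse(LinearRegression())
print(f"baseline RMSE: {baseline_rmse:.2f}, linear RMSE: {linear_rmse:.2f}")
```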

* Residual plots

In [None]:
sns.scatterplot(x = y_test,y = y_test-y_actual)
plt.xlabel("y_test")
plt.ylabel("Residual")
plt.axhline(y=0, color = 'r', ls = '-')
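Residuals are also commonly plotted against the predicted values; for a well-specified linear model they should scatter around zero with no visible pattern. A useful numeric companion: when `LinearRegression` fits an intercept, the residuals on the training data average to zero by construction, so a clearly non-zero mean residual on the test set may hint at a systematic bias. The sketch below uses synthetic data.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 10 + linear combination of three features + noise
rng = np.random.default_rng(3)
X_syn = rng.normal(size=(200, 3))
y_syn = 10 + X_syn @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 1, 200)

model = LinearRegression().fit(X_syn, y_syn)
fitted = model.predict(X_syn)
residuals = y_syn - fitted

# With an intercept, ordinary least squares makes the training residuals sum to zero
print(residuals.mean())  # effectively 0 (floating-point noise)

# Residuals against fitted values
fig, ax = plt.subplots()
ax.scatter(fitted, residuals, s=10)
ax.axhline(0, color="r")
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual")
```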

* Predictive Model: Polynomial
 1. Prepare the polynomial features

In [None]:
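def modelpoly(degree):
    # Expands the features into all polynomial terms up to `degree`, without a constant column
    return PolynomialFeatures(degree=degree, include_bias=False)

modelpoly_deg2 = modelpoly(2)
# PolynomialFeatures learns only the number of input features, so transforming the full X before the split does not leak test information
poly_deg2_fea = modelpoly_deg2.fit_transform(X)
print('shape of new features =', poly_deg2_fea.shape)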
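The number of columns produced by `PolynomialFeatures` can be predicted in advance: with n input features, degree d, and no bias column, there are C(n + d, d) - 1 terms. For six features and degree 2 that is C(8, 2) - 1 = 27 (6 original features, 6 squares, 15 pairwise products). The helper `n_poly_terms` is hypothetical, and the check runs on random data of the same width as the real feature matrix.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def n_poly_terms(n_features: int, degree: int) -> int:
    """Number of monomials of total degree 1..degree in n_features variables (no bias term)."""
    return comb(n_features + degree, degree) - 1

X_demo = np.random.default_rng(0).normal(size=(5, 6))  # 6 features, like the real data
expanded = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_demo)

print(expanded.shape[1], n_poly_terms(6, 2))  # 27 27
print(n_poly_terms(6, 3))  # 83
```

The count grows quickly with the degree, which is one reason to keep the degree low when there are only 414 observations.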

* Separate the features from the target
* Split the polynomial features into training and test sets (same settings as before)

In [None]:
y = df["Y house price of unit area"]
# Same test_size and random_state as the linear model, so both models are tested on the same rows
X_train, X_test, y_train, y_test = tts(poly_deg2_fea, y, test_size=0.1, random_state=101)
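Because `train_test_split` chooses rows based only on the number of samples and `random_state`, reusing `test_size=0.1` and `random_state=101` sends the same rows to the test set as in the linear model, so the two sets of metrics are directly comparable. The sketch confirms this by splitting a row-index array alongside two differently shaped feature arrays (random placeholders for the original and degree-2 features).

```python
import numpy as np
from sklearn.model_selection import train_test_split as tts

rng = np.random.default_rng(0)
n_rows = 414
row_ids = np.arange(n_rows)
X_raw = rng.normal(size=(n_rows, 6))    # stand-in for the original features
X_poly = rng.normal(size=(n_rows, 27))  # stand-in for the degree-2 features

# Split each feature matrix together with the row ids, using the same settings
_, _, _, test_ids_raw = tts(X_raw, row_ids, test_size=0.1, random_state=101)
_, _, _, test_ids_poly = tts(X_poly, row_ids, test_size=0.1, random_state=101)

print(np.array_equal(test_ids_raw, test_ids_poly))  # True
```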

* Data analysis
1. Distributions of and relationships between the polynomial features

In [None]:
sns.pairplot(pd.DataFrame(X_train), diag_kind = "hist")

2. Correlation between the polynomial features

In [None]:
pd.DataFrame(X_train).corr()

* Predictive Model: Linear regression on the polynomial features
 1. Create the model

In [None]:
model_linear = LinearRegression()

 2. Fit the model on the training data

In [None]:
model_linear.fit(X_train,y_train)
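The polynomial expansion and the regression can also be chained in a single scikit-learn `Pipeline`, so the transform is applied automatically at both fit and predict time. This is an alternative arrangement, not what the cells above do; it is shown on synthetic noise-free quadratic data, where a degree-2 pipeline recovers the curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free quadratic: y = 1 + 2x + 3x^2
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1 + 2 * x[:, 0] + 3 * x[:, 0] ** 2

poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
poly_model.fit(x, y)

# The fitted linear step holds the coefficients of x and x^2
reg = poly_model.named_steps["linearregression"]
print(reg.intercept_, reg.coef_)  # approximately 1.0 and [2. 3.]
print(poly_model.predict([[2.0]]))  # approximately [17.]
```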

 3. Predict the test data

In [None]:
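# Predicted prices for the test set (despite its name, y_actual holds predictions; the true prices are in y_test)
y_actual = model_linear.predict(X_test)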

* Calculating regression metrics

In [None]:
MAE = metrics.mean_absolute_error(y_test,y_actual)
MSE = metrics.mean_squared_error(y_test,y_actual)
RMSE = np.sqrt(MSE)

print(pd.DataFrame([MAE,MSE,RMSE],['MAE','MSE','RMSE'],columns = ['Metrics']))


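 1. Compare the RMSE with the average price in the test set

In [None]:
# RMSE is in the same units as the price, so expressing it relative to the mean price gives a rough sense of scale
print(f'Mean price: {y_test.mean():.2f}, RMSE: {RMSE:.2f} ({RMSE / y_test.mean():.1%} of the mean)')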

* Residual plots

In [None]:
sns.scatterplot(x = y_test,y = y_test-y_actual)
plt.xlabel("y_test")
plt.ylabel("Residual")
plt.axhline(y=0, color = 'r', ls = '-')
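Since `modelpoly` takes the degree as a parameter, a natural follow-up is to compare the test RMSE for a few degrees on one fixed split. The helper `rmse_by_degree` below is a hypothetical sketch run on synthetic data; with the real data one would pass `X`, `y` and the notebook's split settings. Higher degrees add many columns, so the test error can rise again through overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def rmse_by_degree(X, y, degrees, test_size=0.1, random_state=101):
    """Return {degree: test RMSE} for polynomial regression, using one fixed train/test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=random_state)
    results = {}
    for degree in degrees:
        model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False), LinearRegression())
        model.fit(X_tr, y_tr)
        results[degree] = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    return results

# Synthetic data with a quadratic effect in the first feature
rng = np.random.default_rng(4)
X_syn = rng.uniform(-2, 2, size=(400, 2))
y_syn = 5 + X_syn[:, 0] ** 2 + 0.5 * X_syn[:, 1] + rng.normal(0, 0.3, 400)

scores = rmse_by_degree(X_syn, y_syn, degrees=[1, 2, 3])
for degree, score in scores.items():
    print(f"degree {degree}: test RMSE = {score:.3f}")
```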