**Hello everyone!**

In this notebook we continue exploring ML algorithms. I recently wrote a notebook on simple Linear Regression, which you can find here: https://www.kaggle.com/javadmaddah/linear-regression-on-real-estate-ds-week-3-1

This time we will use Polynomial Regression, which gives us more power to predict the target variable when its relationship with the features is non-linear.

You can learn more about Polynomial Regression here: 

https://en.wikipedia.org/wiki/Polynomial_regression (English)

https://b.fdrs.ir/3xe (Persian)
    
As before, we use the Real Estate dataset to build a model that predicts house prices based on 6 features. The following sections describe the dataset in more detail.

In [None]:
# First, we import all the libraries we will need.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# 1. Importing the Dataset & Getting Basic Info

In [None]:
df = pd.read_csv('../input/real-estate-price-prediction/Real estate.csv')

In [None]:
df.info()

In [None]:
df.head()

# 2. Exploratory Data Analysis

In [None]:
sns.pairplot(df)

As we can see, there are 6 factors that may affect the price: transaction date, house age, distance to the nearest MRT station, number of convenience stores, latitude, and longitude.

With correlation analysis we can get a basic picture of the dependence between features.

In [None]:
df.corr()

Based on this information, the X3 feature (distance to the nearest MRT station) has the most negative correlation with price, although in the linear model the coefficient of this feature was near 0.

Also we can see the strong correlation between longitude and distance to the nearest MRT station.
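
As a small illustration of reading a correlation table, the sketch below ranks features by their correlation with the target. The `toy` DataFrame and its column names are made up for this example and are not taken from the Real Estate dataset.

```python
import pandas as pd

# Hypothetical toy data: price falls with distance and rises with store count.
toy = pd.DataFrame({
    "distance": [100, 400, 900, 1600, 2500],
    "stores": [9, 7, 5, 2, 1],
    "price": [60, 50, 42, 30, 20],
})

# Take the 'price' column of the correlation matrix, drop the
# self-correlation, and sort from most negative to most positive.
corr_with_price = toy.corr()["price"].drop("price").sort_values()
print(corr_with_price)
```

The same pattern should work on `df` by replacing `"price"` with the target column name.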

# 3. Features and Target Variable

In [None]:
# We can drop the 'No' column: it is just a row index and carries no information about price.

df.drop(['No'], axis = 1, inplace = True)

#X : Features
#y : Target variable

X = df.drop(['Y house price of unit area'], axis = 1)
y = df['Y house price of unit area']


# 4. Train & Test set

In [None]:
# We use train_test_split from sklearn.model_selection to divide the dataset into train and test sets.

from sklearn.model_selection import train_test_split

# The train set is the larger sample of the dataset that the model learns from.
# The test set is the smaller sample on which the model is evaluated.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
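
To make the effect of `test_size` and `random_state` concrete, here is a minimal sketch on a made-up 10-row array (`X_toy` and `y_toy` are illustrative names, not part of this notebook's data). It shows the 70/30 split sizes and that the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# First split: 30% of the rows go to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=101
)

# Second split with the same random_state: expected to be identical.
X_tr2, X_te2, _, _ = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=101
)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```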

# 5. Adjusting Model Parameters

To create a Polynomial Regression model we have to choose the "degree" parameter. The degree sets the highest power of the polynomial terms that are generated from the features, including interaction terms, i.e. products of 2 or more features. The model then fits one coefficient to each of these terms.

So here we write a loop that builds a model for a range of degrees to find out which degree gives the minimum error.

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # creates new columns for powers and interaction terms
from sklearn import metrics # provides the error metrics used to compare predictions with true values

# Train List of RMSE per degree

train_RMSE_list = []

#Test List of RMSE per degree

test_RMSE_list = []

for d in range(1,10):
    
    #1: Preprocessing
    #1-1 : create poly data set for degree (d)
    
    polynomial_converter = PolynomialFeatures(degree=d, include_bias=False) # builds a new feature set that includes powers and interaction terms
    poly_features = polynomial_converter.fit_transform(X) # fit and transform in one step
    
    #2: Split the dataset
    
    X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)
    
    #3: Train the Model
    
    polymodel = LinearRegression() #our model is a Linear Regression
    polymodel.fit(X_train, y_train) # fit estimates the coefficients on the train set and stores them in polymodel.coef_
    
    #4: Predicting on both Train & Test Data
    
    y_train_pred = polymodel.predict(X_train) 
    y_test_pred = polymodel.predict(X_test)
    
    #5: Evaluating the Model
    
    #5-1: RMSE of Train set
    train_RMSE = np.sqrt(metrics.mean_squared_error(y_train, y_train_pred))
    
    #5-2: RMSE of Test Set
    test_RMSE=np.sqrt(metrics.mean_squared_error(y_test, y_test_pred))
    
    #Append the RMSE to the Train and Test List
    
    train_RMSE_list.append(train_RMSE)
    test_RMSE_list.append(test_RMSE)

**Plot the Polynomial degree VS RMSE**

Now we have two lists holding the RMSE for each degree in our specified range. We can draw a line plot to see which degree is the best choice for building the model. Only the first five degrees are plotted.

In [None]:
plt.plot(range(1,6), train_RMSE_list[:5], label='Train RMSE')
plt.plot(range(1,6), test_RMSE_list[:5], label='Test RMSE')

plt.xlabel('Polynomial Degree')
plt.ylabel('RMSE')
plt.legend()

It seems degree = 2 is the best choice for the model.
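
Instead of reading the best degree off the plot, it can also be picked programmatically. The sketch below uses made-up RMSE values (they are not results from this notebook) and takes the degree with the lowest test RMSE.

```python
import numpy as np

# Hypothetical test RMSE values for degrees 1..5, for illustration only.
degrees = list(range(1, 6))
example_test_rmse = [9.1, 7.4, 8.3, 15.2, 40.6]

# np.argmin returns the position of the smallest value in the list.
best_degree = degrees[int(np.argmin(example_test_rmse))]
print(best_degree)  # 2
```

In this notebook the same idea would apply to `test_RMSE_list`, whose first entry corresponds to degree 1.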

# 6. Build the Model

At this step we know the best parameter for the model and can build it. We already created nine models in the last step (degrees 1 to 9), but to have the final model on its own we build that specific model again.
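
As an aside, an alternative that this notebook does not use is to bundle the feature expansion and the regression into one scikit-learn pipeline, so that `fit` and `predict` apply both steps together. A minimal sketch on synthetic, noise-free data (the arrays `x_syn` and `y_syn` are made up for this example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data following y = 1 + 2x + 3x^2 exactly.
x_syn = np.linspace(-2, 2, 20).reshape(-1, 1)
y_syn = 1 + 2 * x_syn.ravel() + 3 * x_syn.ravel() ** 2

# The pipeline expands the features, then fits a linear model on them.
quadratic_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
quadratic_model.fit(x_syn, y_syn)

# Raw inputs can be passed directly; the expansion happens inside.
prediction = quadratic_model.predict(np.array([[3.0]]))
print(prediction)  # approximately [34.]
```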

In [None]:
#1: Preprocessing

polynomial_converter = PolynomialFeatures(degree=2, include_bias=False)
poly_features = polynomial_converter.fit_transform(X) # fit and transform in one step

#2: Split the dataset
    
X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)
    
#3: Train the Model
    
polymodel = LinearRegression() #our model is a Linear Regression
polymodel.fit(X_train, y_train) # fit estimates the coefficients on the train set and stores them in polymodel.coef_

#4: Predicting on the Test Data

y_pred = polymodel.predict(X_test)

#5: Evaluating the Model

MAE = metrics.mean_absolute_error(y_test, y_pred) 
MSE = metrics.mean_squared_error(y_test, y_pred)  
RMSE = np.sqrt(MSE) 

pd.DataFrame([MAE, MSE, RMSE], index=['MAE', 'MSE', 'RMSE'], columns=['Metrics'])

In [None]:
# y_pred covers only the test set, so we compare it with the test-set targets.
print('Predicted mean =', np.mean(y_pred), '\nReal mean =', np.mean(y_test))
print(abs(np.mean(y_pred) - np.mean(y_test)), 'is the difference.')
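
For reference, the three metrics used above can be computed by hand with NumPy. The sketch below uses made-up true and predicted values, not results from this notebook.

```python
import numpy as np

# Hypothetical true and predicted values, for illustration only.
y_true_demo = np.array([10.0, 20.0, 30.0])
y_pred_demo = np.array([12.0, 18.0, 33.0])

errors = y_pred_demo - y_true_demo   # [2, -2, 3]

mae = np.mean(np.abs(errors))        # mean absolute error: 7/3
mse = np.mean(errors ** 2)           # mean squared error: 17/3
rmse = np.sqrt(mse)                  # root of MSE, same units as the target

print(mae, mse, rmse)
```

RMSE penalizes large errors more than MAE does, which is why the two can rank models differently.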