# Polynomial Regression 



**To learn more about linear regression on this dataset, see my previous notebook, [Linear Regression Model for Real Estate](https://www.kaggle.com/amirkonjkav/linear-regression-model-for-real-estate).**


## Linear regression
#### Linear regression requires the relationship between the dependent variable and the independent variables to be linear. What if the data follow a more complex pattern, as in the figure below? Can linear models be used to fit non-linear data? How can we generate a curve that best captures the data?

In [None]:
from IPython.display import Image
Image("../input/imagefolder/A.png")

#### It is very difficult to fit a straight regression line to the graph above with a low error. Instead, we can use polynomial regression to fit a polynomial curve that minimizes the error, that is, the cost function.

# Polynomial

Polynomial regression is a form of linear regression. When the relationship between the dependent and independent variables is non-linear, we add polynomial terms of the features (squares, cubes, and products) to the model. The model stays linear in its coefficients, so it can still be fitted with ordinary linear regression.
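
To make this concrete, here is a minimal sketch on synthetic data (not the real estate dataset). The names `x_demo` and `y_demo` and the quadratic used to generate the data are assumptions chosen only for illustration. The same `LinearRegression` estimator is fitted twice: once on the raw feature and once on the feature expanded with its square.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, noise-free quadratic data (an assumption for illustration only)
x_demo = np.linspace(-3, 3, 50).reshape(-1, 1)
y_demo = 1.0 + 2.0 * x_demo.ravel() + 0.5 * x_demo.ravel() ** 2

# Plain linear regression: a straight line cannot follow the curvature
line = LinearRegression().fit(x_demo, y_demo)
r2_line = line.score(x_demo, y_demo)

# Add the x^2 column, then fit the same linear model on the expanded features
x_demo_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x_demo)
curve = LinearRegression().fit(x_demo_poly, y_demo)
r2_curve = curve.score(x_demo_poly, y_demo)

print(f"R^2 straight line: {r2_line:.3f}")
print(f"R^2 degree 2:      {r2_curve:.3f}")
print("intercept:", curve.intercept_, "coefficients:", curve.coef_)
```

Because the data here are exactly quadratic, the degree 2 fit should recover the generating coefficients, while the straight line leaves the curvature unexplained.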

## Import Libraries

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df=pd.read_csv('/kaggle/input/real-estate-price-prediction/Real estate.csv')
df.head()

In [None]:
df.describe()

## Define X and y

In [None]:
X = df.drop('Y house price of unit area', axis=1)

y = df['Y house price of unit area']

## Preprocessing

The `sklearn.preprocessing` package provides several common utility functions and transformer classes that change raw feature vectors into a representation more suitable for the downstream estimators.
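
As a small illustration of what the `PolynomialFeatures` transformer produces, the sketch below expands a single made-up sample with two features, a = 2 and b = 3. The names `sample`, `expander`, and `expanded` are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One toy sample with two features, a = 2 and b = 3 (made-up values)
sample = np.array([[2.0, 3.0]])

expander = PolynomialFeatures(degree=2, include_bias=True)
expanded = expander.fit_transform(sample)

# Columns are ordered as: 1, a, b, a^2, a*b, b^2
print(expanded)  # [[1. 2. 3. 4. 6. 9.]]
```

With `include_bias=True` the first column is a constant 1. `LinearRegression` already fits an intercept by default, so that column is redundant but harmless here.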

In [None]:
poly_converter=PolynomialFeatures(degree=2, include_bias=True)

poly_features= poly_converter.fit_transform(X)

In [None]:
print('shape of X is :',X.shape)
print('shape of X after using polynomial :',poly_features.shape)

### Comparing these shapes shows that our features expand from 7 to 36: 1 bias column, the 7 original features, and 28 squares and pairwise products!
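
The number of columns can be checked without fitting a model: for `n` input features and degree `d`, a full polynomial expansion with the bias column has C(n + d, d) terms. The sketch below is an illustration under that formula; the helper `n_poly_features` and the placeholder matrix `dummy` are hypothetical names, and `dummy` only stands in for a matrix with 7 columns.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n_features, degree):
    """Number of columns of a full polynomial expansion, bias column included."""
    return comb(n_features + degree, degree)


# Placeholder matrix with 7 columns, standing in for X (values do not matter)
dummy = np.zeros((1, 7))

counts = {}
for d in (1, 2, 3):
    transformer = PolynomialFeatures(degree=d, include_bias=True)
    counts[d] = transformer.fit_transform(dummy).shape[1]
    print(f"degree {d}: formula {n_poly_features(7, d)}, sklearn {counts[d]}")
```

The count grows quickly: by the same formula, 7 features at degree 7 give 3432 columns, which can easily exceed the number of rows in a small dataset.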


## Split data into train and test sets

In [None]:
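# Fix random_state so this split matches the one used for the simple model and the degree loop
X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3, random_state=101)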

## Polynomial Regression Model

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

## Prediction

In [None]:
y_pred = model.predict(X_test)
pd.DataFrame({'Y_Test': y_test,'Y_Pred':y_pred, 'Residuals':(y_test-y_pred) }).head()

In [None]:
MAE_Poly = metrics.mean_absolute_error(y_test, y_pred)
MSE_Poly = metrics.mean_squared_error(y_test, y_pred)
RMSE_Poly = np.sqrt(MSE_Poly)

pd.DataFrame([MAE_Poly, MSE_Poly, RMSE_Poly], index=['MAE', 'MSE', 'RMSE'], columns=['metrics'])
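
For reference, the three metrics above can be computed by hand. The following sketch uses three made-up values (`y_true_demo` and `y_hat_demo` are hypothetical, not taken from the dataset) to show what each metric measures.

```python
import numpy as np

# Made-up targets and predictions, for illustration only
y_true_demo = np.array([3.0, 5.0, 2.0])
y_hat_demo = np.array([2.0, 5.0, 4.0])

errors = y_true_demo - y_hat_demo   # residuals: [1, 0, -2]

mae = np.mean(np.abs(errors))       # mean absolute error
mse = np.mean(errors ** 2)          # mean squared error
rmse = np.sqrt(mse)                 # root of MSE, in the same units as the target

print(mae, mse, rmse)
```

RMSE penalizes large residuals more heavily than MAE, which is why the two can rank models differently.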

## Compare to the simple linear regression

In [None]:
XS_train, XS_test, ys_train, ys_test = train_test_split(X, y, test_size=0.3, random_state=101)

simplemodel = LinearRegression()
simplemodel.fit(XS_train, ys_train)
ys_pred = simplemodel.predict(XS_test)

MAE_simple  = metrics.mean_absolute_error(ys_test,ys_pred)
MSE_simple  = metrics.mean_squared_error(ys_test,ys_pred)
RMSE_simple = np.sqrt(MSE_simple)


pd.DataFrame({'Poly Metrics': [MAE_Poly, MSE_Poly, RMSE_Poly], 
              'Simple Metrics':[MAE_simple, MSE_simple, RMSE_simple]}, 
               index=['MAE', 'MSE', 'RMSE'])

* **We see that there is no significant difference between the simple and the polynomial regression metrics.**

## Choose the best degree 

**We loop over polynomial degrees from 1 to 7 and record the train and test RMSE for each degree.**

In [None]:
# Train List of RMSE per degree
train_RMSE_list=[]

#Test List of RMSE per degree
test_RMSE_list=[]

for d in range(1,8):

    #create poly data set for degree (d)
    polynomial_converter= PolynomialFeatures(degree=d, include_bias=True)
    poly_features= polynomial_converter.fit_transform(X)

    
    #Split the dataset
    X_train, X_test, y_train, y_test = train_test_split(poly_features, y, test_size=0.3,random_state=101)
    
    #Train the Polynomial Model
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    #Predicting on both Train & Test Data
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    
    #Evaluating the Model
    train_RMSE = np.sqrt(metrics.mean_squared_error(y_train, y_train_pred))
    test_RMSE = np.sqrt(metrics.mean_squared_error(y_test, y_test_pred))
    
    #Append the RMSE to the Train and Test List 
    train_RMSE_list.append(train_RMSE)
    test_RMSE_list.append(test_RMSE)
    

In [None]:
plt.plot(range(1,8), train_RMSE_list, label='Train RMSE')
plt.plot(range(1,8), test_RMSE_list, label='Test RMSE')

plt.xlabel('Polynomial Degree')
plt.ylabel('RMSE')
plt.legend()

* **The graph shows that our errors increase as the polynomial degree increases.**
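
A single train/test split can make the choice of degree depend on which rows happen to land in the test set. As an alternative sketch (a swapped-in technique, not the method used above), the degree can be selected with 5-fold cross-validation inside a pipeline. The synthetic data, the names `X_demo`, `y_demo`, `cv_rmse`, and the added `StandardScaler` step are all assumptions; with the real data, `X` and `y` from this notebook would replace the synthetic arrays.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data with a quadratic effect (an assumption for illustration)
rng = np.random.default_rng(101)
X_demo = rng.uniform(-2, 2, size=(200, 2))
y_demo = 1.0 + X_demo[:, 0] ** 2 - X_demo[:, 1] + rng.normal(0, 0.1, size=200)

cv_rmse = {}
for d in range(1, 6):
    # Expand, rescale the expanded columns, then fit a linear model
    pipe = make_pipeline(
        PolynomialFeatures(degree=d, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    )
    scores = cross_val_score(
        pipe, X_demo, y_demo, cv=5, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[d] = -scores.mean()  # scorer returns negated RMSE

best_degree = min(cv_rmse, key=cv_rmse.get)
print(cv_rmse)
print("degree with lowest cross-validated RMSE:", best_degree)
```

Putting the expansion inside the pipeline means it is refitted on each training fold, so no information from the held-out fold leaks into the preprocessing.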