
<h1> <p style="font-family: 'Times New Roman', Times, serif;">Introduction</p></h1>

<i><b>Traditional approaches to valuing real estate can lean towards the qualitative side, relying more on intuition than on sound rationale. Linear regression analysis, however, can offer a robust model of the transactions in an area and so provide better guidance on property valuations.</b></i>

<p style="font-family:verdana;">Regression analysis offers a more scientific approach for real estate valuation</p>
<ul>  
<li>Traditionally, there are three approaches for valuing property: comparable sales, income, and cost.</li>
<li>Regression models provide an alternative that is more flexible and objective. Once a model has been built, it can value new properties with little manual effort, allowing real estate entrepreneurs to focus on their core competencies.</li>
<li>A model can be built with numerous variables that are tested for impact on the value of a property, such as square footage and the number of bedrooms.</li>
<li>Regressions are not a magic bullet. There is always a risk that the variables exhibit autocorrelation and/or multicollinearity, or that an observed correlation between variables is spurious.</li>
</ul>
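Multicollinearity, one of the risks listed above, can be checked before modeling with the variance inflation factor (VIF): for each feature j, regress it on all the other features and compute VIF_j = 1 / (1 - R_j^2). The sketch below is a minimal NumPy implementation on synthetic data (not the real estate dataset); the helper name `variance_inflation_factors` is hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of a 2-D feature array X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Add an intercept column so R^2 is measured around the mean.
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2) if r2 < 1.0 else np.inf)
    return np.array(vifs)

# Synthetic example (an assumption, not the dataset): x3 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + rng.normal(scale=0.05, size=500)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

A common rule of thumb flags VIFs above roughly 5 to 10. Here x1 and x3 should be flagged, while the independent x2 stays close to 1.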


<p style="font-family:verdana;">This notebook builds a linear regression model to predict house prices.</p>

<h2>Dataset</h2>
The dataset is the Real estate price prediction dataset, which is suited to regression analysis, multiple regression, linear regression, and prediction. Since house price is a continuous variable, this is a regression problem. The data contains 8 columns: a row index (No), six features (X1 to X6), and one label (y): Y house price of unit area.

<h3>Import libraries and dataset.</h3>

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

In [None]:
# Create a DataFrame from the Real estate price prediction dataset.

df = pd.read_csv('../input/real-estate-price-prediction/Real estate.csv')


<h3>Check out the data</h3>

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.describe()

<h3>EDA</h3>

In [None]:
g= sns.pairplot(df)
g.map_upper(plt.scatter)

In [None]:
# find the pairwise correlation of all columns in the dataframe.


df.corr()
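
By default, `df.corr()` computes the Pearson correlation coefficient: the covariance of two columns divided by the product of their standard deviations. The minimal NumPy sketch below computes it for a single pair of columns, using hypothetical toy values rather than the dataset, and cross-checks the result against `np.corrcoef`. The helper name `pearson` is hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # center each column
    dy = y - y.mean()
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

# Hypothetical toy values, not taken from the dataset.
distance = [100.0, 400.0, 900.0, 1600.0, 2500.0]
price = [55.0, 48.0, 40.0, 33.0, 20.0]

r = pearson(distance, price)
print(round(r, 3), round(np.corrcoef(distance, price)[0, 1], 3))
```

Pearson correlation only measures linear association, so a strongly nonlinear relationship (for example, price against distance to the MRT station) can show a weaker coefficient than its visual pattern suggests.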

In [None]:
#Heatmap for correlation
sns.heatmap(df.corr(), annot=True,cmap='winter')

In [None]:
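# displot is figure-level and creates its own figure, so size it with height/aspect (10 x 4 inches).
g = sns.displot(df['Y house price of unit area'], kde=True, bins=20, height=4, aspect=2.5)
g.set_axis_labels('house price of unit area', 'Count')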

In [None]:
plt.figure(figsize=(8, 8), dpi=50)

sns.rugplot(df['Y house price of unit area'], height=0.2)



In [None]:
plt.figure(figsize=(5, 5), dpi=100)

sns.scatterplot(data=df, x='X1 transaction date', y='Y house price of unit area', hue='X2 house age', palette='rocket')


In [None]:
plt.figure(figsize=(5, 5), dpi=100)

sns.scatterplot(data=df, x='X3 distance to the nearest MRT station', y='Y house price of unit area', hue='X4 number of convenience stores', palette='rocket')

In [None]:
plt.figure(figsize=(5, 5), dpi=100)

sns.scatterplot(data=df, x='X5 latitude', y='Y house price of unit area', hue='X6 longitude', palette='rocket')

<h1>Training a Linear Regression Model</h1>

<p>First, <b>split</b> the data into an X array that contains the <b>features</b> to train on and a y array that contains the <b>target</b> variable, in this case the "Y house price of unit area" column.</p>

In [None]:
# Drop the target and the 'No' column, which is only a row index, not a property feature.
X = df.drop(['No', 'Y house price of unit area'], axis=1)
y = df['Y house price of unit area']


<p><b>Split</b> the data into <b>training</b> and <b>test</b> sets.</p>

In [None]:
# Train the model on the training set, then use the held-out test set (30% of rows) to evaluate it.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
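
Conceptually, a random train/test split shuffles the row indices and holds out a fraction of them as the test set. The sketch below is a simplified NumPy version; `shuffled_split` is a hypothetical helper, and because it uses a different random number generator than scikit-learn, it will not reproduce the exact rows chosen by `train_test_split(..., random_state=101)`.

```python
import numpy as np

def shuffled_split(n_samples, test_size=0.3, seed=101):
    """Return (train_idx, test_idx) for a random train/test split.

    A simplified sketch of the idea behind train_test_split: shuffle the
    row indices, then hold out a fraction of them as the test set.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)           # random order of row positions
    n_test = int(np.ceil(test_size * n_samples))   # round the test count up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = shuffled_split(10, test_size=0.3)
print("train:", train_idx, "test:", test_idx)
```

Every row lands in exactly one of the two sets, so the test rows are never seen during fitting. scikit-learn also rounds a fractional test size up.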


<h3>Training the model</h3>

In [None]:
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
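
`LinearRegression` fits an ordinary least squares (OLS) model with an intercept. The sketch below applies the same idea directly in NumPy with `np.linalg.lstsq` on synthetic data with known parameters (an assumption, not the real estate data). `fit_ols` is a hypothetical helper. scikit-learn's internal solver may differ in detail, but for full-rank data both approaches should give the same estimates up to floating-point error.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept, solved via np.linalg.lstsq.

    Returns (intercept, coefficients), analogous to the intercept_ and coef_
    attributes of a fitted LinearRegression.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column of ones for the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

# Synthetic data with known parameters: y = 3 + 2*x1 - 0.5*x2 + small noise.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 2))
y_demo = 3 + X_demo @ np.array([2.0, -0.5]) + rng.normal(scale=0.1, size=200)

intercept, coef = fit_ols(X_demo, y_demo)
print("intercept:", intercept, "coefficients:", coef)

# A linear model's prediction is the intercept plus a weighted sum of the features.
y_hat = intercept + X_demo @ coef
```

The estimates should land close to the true values 3, 2, and -0.5. The final line is the same linear combination that `predict` evaluates for a fitted linear model.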

<h3>Regression coefficients</h3>


In [None]:
# One coefficient per feature: the expected change in price per unit of that feature, holding the others fixed.
pd.DataFrame(lin_reg.coef_, X.columns, columns=['Coefficients'])

<h3>Test data predictions</h3>

In [None]:
y_pred = lin_reg.predict(X_test)

<h3>Regression Evaluation Metrics</h3>
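<ul>
    <li><b>Mean Absolute Error (MAE)</b></li>
    <li><b>Mean Squared Error (MSE)</b></li>
    <li><b>Root Mean Squared Error (RMSE)</b></li>
    <li><b>R<sup>2</sup> (coefficient of determination)</b></li>
</ul>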
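These metrics follow directly from the prediction errors e = y_true - y_pred: MAE is the mean of |e|, MSE is the mean of e squared, RMSE is the square root of MSE, and R^2 = 1 - sum(e^2) / sum((y_true - mean(y_true))^2). The minimal NumPy sketch below uses hypothetical toy values (not model output). `regression_metrics` is a hypothetical helper, and the functions in `sklearn.metrics` should give the same numbers.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)  # same units as the target, unlike MSE
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical toy values, not model output.
m = regression_metrics([40.0, 35.0, 50.0, 25.0], [38.0, 37.0, 45.0, 28.0])
print(m)
```

With these values the errors are 2, -2, 5 and -3, so MAE is 3.0 and MSE is 10.5. MSE and RMSE penalize the single large error of 5 more heavily than MAE does.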

In [None]:
MAE = metrics.mean_absolute_error(y_test, y_pred)
MSE = metrics.mean_squared_error(y_test, y_pred)
RMSE = np.sqrt(MSE)
R2 = metrics.r2_score(y_test, y_pred)

pd.DataFrame([MAE, MSE, RMSE, R2], index=['MAE', 'MSE', 'RMSE', 'R2'], columns=['Metrics'])


In [None]:
# Mean target value, a reference scale for judging the size of MAE and RMSE.
df['Y house price of unit area'].mean()

<h3>Residual plots</h3>

In [None]:
test_residuals = y_test - y_pred  # residual = actual - predicted
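
Residuals are actual minus predicted values. The NumPy sketch below uses synthetic data (an assumption, not the notebook's dataset) to illustrate properties worth keeping in mind when reading residual plots. With an intercept, OLS training residuals average to zero and are uncorrelated with the fitted values, yet they are positively correlated with the actual target values by construction. This is why residuals are usually plotted against predicted values: a plot against actual values tends to slope upward even for a well-specified model.

```python
import numpy as np

# Synthetic data (an assumption, not the real estate set): linear signal plus noise.
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(300, 3))
y_demo = 1.5 + X_demo @ np.array([0.8, -1.2, 0.3]) + rng.normal(size=300)

# Fit OLS with an intercept on the first 200 rows; hold out the last 100.
design = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(design[:200], y_demo[:200], rcond=None)

fitted_train = design[:200] @ beta
train_residuals = y_demo[:200] - fitted_train  # residual = actual - predicted
test_residuals_demo = y_demo[200:] - design[200:] @ beta

# With an intercept, training residuals average to zero and are orthogonal to
# every column of the design matrix, hence uncorrelated with the fitted values.
print("mean training residual:", train_residuals.mean())
print("corr(residual, fitted):", np.corrcoef(train_residuals, fitted_train)[0, 1])
# Against the actual values, residuals are positively correlated by construction.
print("corr(residual, actual):", np.corrcoef(train_residuals, y_demo[:200])[0, 1])
# Held-out residuals carry no exact guarantee, but should still center near zero.
print("mean test residual:", test_residuals_demo.mean())
```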

In [None]:
sns.scatterplot(x=y_test, y=y_pred)
plt.xlabel('Y-Test')
plt.ylabel('Y-Pred')

In [None]:
sns.displot(test_residuals, bins=10, kde=True ,color='g', edgecolor='white', linewidth=5)


In [None]:
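# Plot residuals against predicted values: residuals are correlated with the actual values by construction.
sns.scatterplot(x=y_pred, y=test_residuals, color='g')
plt.axhline(y=0, color='r', ls='--')
plt.xlabel('Predicted price')
plt.ylabel('Residual')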