### What is linear regression?
Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

In algebra, a linear relationship between variables is one with a constant rate of change: each variable appears only to the first power. If we plot such a relationship between two variables in two-dimensional space, we get a straight line.

Linear regression predicts the value of a dependent variable **(y)** from a given independent variable **(x)** by finding a linear relationship between **x (input)** and **y (output)**, hence the name. If we plot the independent variable **(x)** on the x-axis and the dependent variable **(y)** on the y-axis, linear regression gives the straight line that best fits the data points (in the least-squares sense, the line that minimizes the sum of squared vertical distances to the points), as shown in the figure below.

The equation of a straight line is:

$$y = mx + b$$

where $m$ is the slope and $b$ is the y-intercept. Linear regression estimates $m$ and $b$ from the data.

![](https://cdn-images-1.medium.com/max/800/1*weGmaJTZewji5_9H2TZetA.png)
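
For a single feature, the least-squares values of $m$ and $b$ have a closed form: $m = \sum_i (x_i-\bar{x})(y_i-\bar{y}) / \sum_i (x_i-\bar{x})^2$ and $b = \bar{y} - m\bar{x}$. The sketch below computes them with NumPy on a small made-up dataset (not the real-estate data) and cross-checks the result against `np.polyfit`, which fits the same least-squares line.

```python
import numpy as np

# Made-up toy data for illustration only: y is roughly 2x + 1 with some noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form ordinary least squares for one feature.
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

# np.polyfit with deg=1 returns [slope, intercept] for the same least-squares line.
m_np, b_np = np.polyfit(x, y, deg=1)

print(f"m = {m:.3f}, b = {b:.3f}")  # m = 1.970, b = 1.090
```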


### Import libraries

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
```

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
```

### Load Dataset


```python
realestate_df = pd.read_csv('../input/real-estate-price-prediction/Real estate.csv')
realestate_df.head()
```

```python
realestate_df.shape  # (number of rows, number of columns)
```

### Dataset Information


```python
realestate_df.info()
```

```python
realestate_df.describe()
```

### Visualization

```python
sns.pairplot(realestate_df, diag_kind='kde')
```

```python
sns.displot(realestate_df['Y house price of unit area'], kde=True)
```

```python
corr_df = realestate_df.corr()
corr_df
```

```python
sns.heatmap(corr_df, annot=True, cmap='Blues')
```
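
`DataFrame.corr()` uses Pearson correlation by default, so each cell of the heatmap is $r = \operatorname{cov}(x, y) / (\sigma_x \sigma_y)$, which ranges from -1 to 1. A sketch on made-up columns (the names `a`, `b` and `c` are hypothetical) that checks this formula against pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical toy columns, not taken from the real-estate dataset.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.1, 5.9, 8.2, 9.9],   # rises with a
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls exactly as a rises
})

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson r: covariance divided by the product of standard deviations."""
    xd, yd = x - x.mean(), y - y.mean()
    # The 1/n factors in covariance and standard deviations cancel out.
    return float(np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2)))

corr = df.corr()  # method='pearson' is the default
r_ab = pearson(df["a"].to_numpy(), df["b"].to_numpy())
r_ac = pearson(df["a"].to_numpy(), df["c"].to_numpy())
print(round(r_ab, 4), round(r_ac, 4))
```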

## Data Preprocessing


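```python
# Features: every column except the row index 'No' and the target.
X = realestate_df.drop(['No', 'Y house price of unit area'], axis=1)
# Target: house price per unit area.
y = realestate_df['Y house price of unit area']
```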

### Train - Test Split


```python
# Hold out 30% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
```
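
`test_size=0.3` holds out 30% of the rows for testing, and a fixed `random_state` makes the shuffle reproducible. A sketch on ten made-up rows (`X_demo` and `y_demo` are hypothetical stand-ins for `X` and `y`):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten hypothetical rows with one feature each; stands in for the real X and y.
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10) * 2.0

# Same settings as above: 30% test data, fixed seed.
split_a = train_test_split(X_demo, y_demo, test_size=0.3, random_state=101)
split_b = train_test_split(X_demo, y_demo, test_size=0.3, random_state=101)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))  # 7 3

# Re-running with the same random_state reproduces the exact same split.
same = all(np.array_equal(p, q) for p, q in zip(split_a, split_b))
print(same)  # True
```

Rows stay paired with their targets after shuffling, so `y_te` still equals twice `X_te`.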

### Linear Regression Model


```python
model = LinearRegression()
```

```python
model.fit(X_train, y_train)
```

```python
# One fitted coefficient (slope) per feature column.
pd.DataFrame(model.coef_, X.columns, columns=['Coefficient'])
```
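
With several features, the fitted model generalizes $y = mx + b$ to $\hat{y} = b + \sum_j m_j x_j$: `model.coef_` holds one slope $m_j$ per column of `X`, and `model.intercept_` holds $b$. The sketch below fits `LinearRegression` on made-up, noise-free data generated from known coefficients and checks that `predict` is just this weighted sum (`X_demo`, `y_demo` and `demo_model` are hypothetical names).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Made-up data with known true coefficients [3.0, -2.0] and intercept 5.0 (no noise).
X_demo = rng.normal(size=(50, 2))
y_demo = 5.0 + X_demo @ np.array([3.0, -2.0])

demo_model = LinearRegression().fit(X_demo, y_demo)

# predict() is the intercept plus the dot product of features and coefficients.
manual_pred = demo_model.intercept_ + X_demo @ demo_model.coef_
print(demo_model.coef_.round(3), round(demo_model.intercept_, 3))
```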

### Prediction 

```python
y_pred = model.predict(X_test)
```

```python
# Side-by-side comparison of predicted and actual prices for the test rows.
pd.DataFrame(data={'predictions': y_pred, 'actual': y_test})
```


### Regression Evaluation Metrics

**MAE (Mean Absolute Error)**
* Calculated as the average of the absolute error values. The absolute value, **abs()**, is a mathematical function that returns the magnitude of a number, making it **non-negative**. Because the difference between an expected and a predicted value may be positive or negative, taking the absolute value forces every error to be positive when calculating the **MAE**, so errors in opposite directions do not cancel out.
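
In symbols, for $n$ observations with actual values $y_i$ and predictions $\hat{y}_i$:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$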

**MSE (Mean Squared Error)**
* Calculated as the mean of the squared differences between predicted and expected target values in a dataset. Squaring inflates, or magnifies, large errors: the larger the difference between the predicted and expected values, the larger the resulting squared error. As a result, MSE **“punishes”** models more heavily for large errors when it is used as a loss function, and it inflates the average error score when it is used as a metric. Note that MSE is expressed in the squared units of the target.
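
Written out, $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. A small sketch of the “punishing” effect, using two made-up sets of errors with the same MAE, where one concentrates the whole error in a single large miss:

```python
import numpy as np

# Hypothetical errors (y_i - y_hat_i) for four predictions.
spread_errors = np.array([2.0, -2.0, 2.0, -2.0])   # four moderate misses
single_miss = np.array([0.0, 0.0, 0.0, 8.0])       # one large miss

def mae(errors: np.ndarray) -> float:
    return float(np.mean(np.abs(errors)))

def mse(errors: np.ndarray) -> float:
    return float(np.mean(errors ** 2))

print(mae(spread_errors), mae(single_miss))  # 2.0 2.0
print(mse(spread_errors), mse(single_miss))  # 4.0 16.0
```

Both sets have an MAE of 2, but squaring makes the single large miss four times as costly under MSE.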

**RMSE (Root Mean Squared Error)**
* An extension of the mean squared error: RMSE is the square root of the MSE, which means that its units are the same as the original units of the target value being predicted.
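
With $y_i$ the actual and $\hat{y}_i$ the predicted value for each of the $n$ observations:

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$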

```python
MAE = metrics.mean_absolute_error(y_test, y_pred)
MSE = metrics.mean_squared_error(y_test, y_pred)
RMSE = np.sqrt(MSE)

pd.DataFrame([MAE, MSE, RMSE], index=['MAE', 'MSE', 'RMSE'], columns=['Metrics'])
```
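
Each metric is a short formula, so the values can be checked by hand. A sketch with NumPy on made-up actual and predicted values (`y_true` and `y_hat` are hypothetical); with default arguments, `metrics.mean_absolute_error` and `metrics.mean_squared_error` compute these same unweighted averages.

```python
import numpy as np

# Made-up actual and predicted values, for illustration only.
y_true = np.array([30.0, 45.0, 25.0, 50.0])
y_hat = np.array([33.0, 41.0, 25.0, 55.0])

errors = y_true - y_hat                   # [-3, 4, 0, -5]
mae = np.mean(np.abs(errors))             # (3 + 4 + 0 + 5) / 4
mse = np.mean(errors ** 2)                # (9 + 16 + 0 + 25) / 4
rmse = np.sqrt(mse)                       # back in the units of the target

print(mae, mse, round(rmse, 4))  # 3.0 12.5 3.5355
```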

```python
# Mean target value, a reference scale for judging the MAE and RMSE above.
realestate_df['Y house price of unit area'].mean()
```

### Residual plots
Residual plots can be used to assess the quality of a regression. They let you examine the underlying statistical assumptions about the residuals, such as constant variance, independence of the errors and normality of their distribution. For these assumptions to hold for a particular regression model, the residuals should be randomly distributed around zero.

Different types of residual plots can be used to check these assumptions and to suggest how to improve the model. For example, if the regression is good, a scatter plot of the residuals should look patternless, with no visible trend. A trend suggests that the residuals are not independent or that the model is missing structure, such as a non-linear relationship. A histogram of the residuals, on the other hand, should show a roughly symmetric, bell-shaped distribution, which suggests that the normality assumption is plausible.

The residuals are the differences between the actual and predicted values on the test set:

$$e_i = y_{\text{test},i} - \hat{y}_{\text{pred},i}$$
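
To see what a trend in the residuals looks like, the sketch below fits a straight line to made-up data that actually follows a curve ($y = x^2$). The line cannot capture the curvature, so the residuals are positive at both ends and negative in the middle, a U-shape rather than random scatter. It uses `np.polyfit` instead of `LinearRegression` only to keep the example NumPy-only; both fit the same least-squares line.

```python
import numpy as np

# Made-up data with a curved (quadratic) relationship, symmetric around x = 0.
x = np.linspace(-3, 3, 61)
y = x ** 2

# Fit a straight line by least squares, then compute residuals y - y_hat.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Left end, centre and right end: positive, negative, positive (a U-shape).
print(residuals[0].round(2), residuals[30].round(2), residuals[-1].round(2))
```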

```python
test_residuals = y_test - y_pred
```
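
A related property of ordinary least squares with an intercept (as in `LinearRegression`, whose `fit_intercept` defaults to `True`): residuals on the *training* data always average to zero, up to rounding, so a zero mean there says nothing about fit quality. Residuals on held-out data, like `test_residuals`, need not average to zero. A NumPy sketch on made-up data, solving least squares with `np.linalg.lstsq` and an explicit column of ones for the intercept (all `_demo` names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up training data: two features, a linear signal, and random noise.
X_train_demo = rng.normal(size=(100, 2))
y_train_demo = 4.0 + X_train_demo @ np.array([1.5, -0.5]) + rng.normal(scale=2.0, size=100)

# Prepend a column of ones so the first coefficient acts as the intercept.
design = np.column_stack([np.ones(len(X_train_demo)), X_train_demo])
coef, *_ = np.linalg.lstsq(design, y_train_demo, rcond=None)

train_residuals = y_train_demo - design @ coef
print(f"mean training residual: {train_residuals.mean():.2e}")
```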

```python
# Predicted vs. actual prices: points close to the diagonal indicate good predictions.
sns.scatterplot(x=y_test, y=y_pred)
plt.xlabel('Y-Test')
plt.ylabel('Y-Pred')
```

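```python
sns.scatterplot(x=y_test, y=test_residuals)
plt.axhline(y=0, color='r', ls='--')  # residuals should scatter randomly around this line
plt.xlabel('Y-Test')
plt.ylabel('Residual')
```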

```python
# distplot is deprecated in seaborn; histplot draws the histogram with a KDE curve.
sns.histplot(test_residuals, bins=25, kde=True)
```