# Evaluating a Linear Regression Model


## 1. Introduction

* **Regression problems** are supervised learning problems in which the response is continuous.
* **Linear regression** is a technique that is useful for regression problems.
* **Classification problems** are supervised learning problems in which the response is categorical.

## Benefits of linear regression

*    widely used
*    runs fast
*    easy to use (not a lot of tuning required)
*    highly interpretable
*    basis for many other methods



# 2. Libraries
*     https://scikit-learn.org/

In [None]:
# imports
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
import matplotlib.pyplot as plt
import numpy as np

# allow plots to appear directly in the notebook
%matplotlib inline

# 3. Read the Real Estate Data
* **Take a look at the data and ask some questions about it.**

In [None]:
df=pd.read_csv('/kaggle/input/real-estate-price-prediction/Real estate.csv')

In [None]:
df.head()

In [None]:
df.shape

In [None]:
df.info()

In [None]:
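# displot is a figure-level function and creates its own figure,
# so a preceding plt.figure() would only leave an empty figure behind
sns.displot(x=df['Y house price of unit area'], kde=True, aspect=2, color='purple')
plt.xlabel('house price')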

In [None]:
plt.figure(figsize=(8,5))
sns.heatmap(df.corr(),annot=True)

# Simple Linear Regression

* Simple linear regression is an approach for predicting a quantitative response using a single feature (also called a "predictor" or "input variable"). It takes the following form:

    y = β0 + β1 x

**What does each term represent?**  

*y* is the response

*x* is the feature

*β0* is the intercept

*β1* is the coefficient for x

*β0* and *β1* are called the model coefficients

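* To create the model, we must "learn" the values of these coefficients from the data. Once we have learned them, we can use the model to predict the response for new values of x. With several features, as in the model fitted below, each feature gets its own coefficient.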
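
As a rough illustration of what "learning" the coefficients means, the sketch below fits y = β0 + β1 x by ordinary least squares on a small made-up dataset. The values in `x` and `y` are hypothetical and unrelated to the real estate data. The formulas are the standard closed-form least-squares solution for a single feature, written in plain Python so the example runs without the notebook's data.

```python
from statistics import mean

# Hypothetical toy data, not taken from the real estate dataset
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope: co-variation of x and y divided by the variation of x
beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
# Least-squares intercept: the fitted line passes through (x_bar, y_bar)
beta0 = y_bar - beta1 * x_bar

# Use the learned coefficients to predict the response for a new x
x_new = 6.0
y_new = beta0 + beta1 * x_new

print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}, prediction at x = {x_new}: {y_new:.2f}")
```

For a single feature, `LinearRegression` solves the same least-squares problem, so its `intercept_` and `coef_` should agree with `beta0` and `beta1` on the same data.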

# 4. Using Train/Test Split

In [None]:
X = df.drop('Y house price of unit area', axis=1)
y = df['Y house price of unit area']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

# 5. Make Model

In [None]:
model = LinearRegression()
model.fit(X_train,y_train)
pd.DataFrame(model.coef_, index= X.columns , columns=['Coef'])

# 6. Using the Model for Prediction

### Model Evaluation Metrics for Regression

**What metrics can we use for regression problems?**

**Mean Absolute Error (MAE)** is the mean of the absolute values of the errors.

**Mean Squared Error (MSE)** is the mean of the squared errors.

**Root Mean Squared Error (RMSE)** is the square root of the mean of the squared errors.
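
To make the three definitions concrete, here is a minimal sketch that computes them by hand. The numbers in `y_true` and `y_hat` are made up for illustration and are not outputs of the model in this notebook.

```python
import math

# Hypothetical true values and predictions (illustration only)
y_true = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.5, 5.0, 4.0, 8.0]

# Error of each prediction: true value minus predicted value
errors = [t - p for t, p in zip(y_true, y_hat)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean of absolute errors
mse = sum(e ** 2 for e in errors) / len(errors)   # mean of squared errors
rmse = math.sqrt(mse)                             # back in the units of y

print(f"MAE = {mae:.4f}, MSE = {mse:.4f}, RMSE = {rmse:.4f}")
```

Because MSE squares each error, the single error of 2.0 contributes far more to MSE and RMSE than to MAE. Here RMSE comes out larger than MAE, and the gap widens as errors become more uneven in size.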

In [None]:
y_pred = model.predict(X_test)
MAE = metrics.mean_absolute_error(y_test, y_pred)
MSE = metrics.mean_squared_error(y_test, y_pred)
RMSE = MSE ** (1/2)

pd.DataFrame([MAE, MSE, RMSE], index=['MAE', 'MSE', 'RMSE'], columns=['Metrics'])

In [None]:
df['Y house price of unit area'].mean()

## 7. Interpreting Model Coefficients

* The mean house price is 38.98 and our RMSE is 6.77, so a typical prediction is off by roughly 6.8 units of price, about 17% of the mean. Note that RMSE is an error size in the units of the response, not a variance.

## 8. Residuals Plot

In [None]:
residuals = y_test - y_pred
sns.scatterplot(x=y_test,y=y_pred)

In [None]:

sns.scatterplot(x=y_test,y=residuals)
plt.axhline(y=0, color='g', ls='--')

**The residual plot should show no clear pattern: the points should scatter randomly around the zero line.**

In [None]:
sns.displot(residuals, bins=30, kde=True)

* **The residuals should be approximately normally distributed.**
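
Besides looking at the histogram, one informal way to check normality is to compare the sorted, standardized residuals with the quantiles of a standard normal distribution, which is the idea behind a Q-Q plot. The sketch below is only an illustration under stated assumptions: `sample` holds synthetic residuals generated for the example (in the notebook you would use the values of `residuals` instead), and `pearson` is a small helper defined here, not a library function.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
# Hypothetical residuals drawn from a normal distribution (illustration only);
# in the notebook, use the values of `residuals` instead
sample = [random.gauss(0, 6.8) for _ in range(200)]

# Standardize and sort the residuals: these are the observed quantiles
mu, sigma = mean(sample), stdev(sample)
observed = sorted((r - mu) / sigma for r in sample)

# Matching quantiles of a standard normal at positions (i + 0.5) / n
n = len(observed)
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


# A value close to 1 means the points of a Q-Q plot lie near a straight line
qq_corr = pearson(theoretical, observed)
print(f"Q-Q correlation: {qq_corr:.3f}")
```

A correlation close to 1 is consistent with normally distributed residuals. Clearly lower values, or systematic bends when `theoretical` is plotted against `observed`, suggest skew or heavy tails. This is a quick diagnostic, not a formal test.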