**Linear Regression with Assumptions**

1. Regression analysis is one of the most widely used methods for prediction.

* Here I explain the linear regression algorithm and its assumptions.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# List every file under the read-only Kaggle input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


1. Linear regression is a **linear model**, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y).

2. More specifically, it assumes that y can be calculated from a linear combination of the input variables (x).

3. When there is a **single input variable (x)**, the method is referred to as simple linear regression; when there is **more than one input variable**, it is referred to as multiple linear regression.

4. Different techniques can be used to prepare or train the linear regression equation from data, the most common of which is called **Ordinary Least Squares**. It is therefore common to refer to a model prepared this way as Ordinary Least Squares Linear Regression or just Least Squares Regression.

**Linear Regression Equation**

$$y = B_0 + B_1 x$$

1. The linear equation assigns one scale factor to each input value or column, called a coefficient and represented by the capital Greek letter Beta (B). 

2. One additional coefficient is also added, giving the line an additional degree of freedom (e.g. moving up and down on a two-dimensional plot) and is often called the intercept or the bias coefficient.

3. In higher dimensions, when we have more than one input (x), the line is called a plane or a hyperplane.
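With $p$ input variables, the single-input equation generalizes to a linear combination with one coefficient per input. This is the equation of a hyperplane:

$$y = B_0 + B_1 x_1 + B_2 x_2 + \dots + B_p x_p$$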

**Types**

* There are many techniques for estimating the coefficients because the model is so well studied. Four common ones follow.

**1. Simple Linear Regression**

With simple linear regression when we have a single input, we can use statistics to estimate the coefficients.

This requires that you calculate statistical properties from the data such as means, standard deviations, correlations and covariance. All of the data must be available to traverse and calculate statistics.
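Here is a minimal sketch of this statistics-based estimate, assuming NumPy is available. The slope is the covariance of x and y divided by the variance of x, which is equivalent to the correlation times the ratio of standard deviations. The intercept then makes the line pass through the point of means. The function name `simple_linear_regression` and the toy data are illustrative only and are not part of this notebook.

```python
import numpy as np

def simple_linear_regression(x, y):
    """Estimate B0 and B1 for y = B0 + B1*x from summary statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # B1 = covariance(x, y) / variance(x); the 1/n factors cancel
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through (mean of x, mean of y)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical toy data lying exactly on y = 2 + 3x
b0, b1 = simple_linear_regression([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # → 2.0 3.0
```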


**2. Ordinary Least Squares**

When we have more than one input we can use Ordinary Least Squares to estimate the values of the coefficients.

The Ordinary Least Squares procedure seeks to minimize the sum of the squared residuals. Given a regression line through the data, we calculate the vertical distance from each data point to the line (the residual), square it, and sum all of the squared errors together.

This approach treats the data as a matrix and uses linear algebra operations to estimate the optimal values for the coefficients. It means that all of the data must be available and you must have enough memory to fit the data and perform matrix operations.
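The matrix approach can be sketched with NumPy as follows. A column of ones is prepended so that $B_0$ is estimated together with the other coefficients. `np.linalg.lstsq` then solves the least-squares problem. The helper `ols_fit` and the two-input toy data are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary Least Squares: return [B0, B1, ..., Bp] minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept B0
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq solves the least-squares problem stably; when X^T X is invertible this
    # gives the same answer as the normal equations (X^T X) B = X^T y
    coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coef

# Hypothetical toy data with two inputs, generated exactly from y = 1 + 2*x1 - 0.5*x2
X = np.array([[0, 0], [1, 0], [0, 2], [3, 1], [2, 4]])
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 1]
print(ols_fit(X, y))  # recovers approximately [1.0, 2.0, -0.5]
```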

**3. Gradient Descent**

When there are one or more inputs, you can estimate the coefficients by iteratively adjusting them to minimize the error of the model on your training data.

This procedure is called Gradient Descent. It starts with random values for each coefficient, and the sum of the squared errors is calculated over the pairs of input and output values. The coefficients are then updated in the direction that reduces the error, with a learning rate used as a scale factor for each step. The process is repeated until a minimum sum of squared errors is reached or no further improvement is possible.

When using this method, you must select a learning rate (alpha) parameter that determines the size of the improvement step taken on each iteration of the procedure.
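Below is a minimal batch gradient descent sketch, assuming NumPy. It minimizes the mean squared error, which is the sum of squared errors divided by n; this scaling only rescales the learning rate. The function name, the `alpha` and `n_iters` values, the fixed random seed and the toy data are illustrative choices, not tuned settings.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=5000):
    """Fit linear-regression coefficients by batch gradient descent on the mean squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_design = np.column_stack([np.ones(len(X)), X])  # column of ones -> intercept B0
    rng = np.random.default_rng(0)                     # seeded for reproducibility
    coef = rng.normal(size=X_design.shape[1])          # start from random coefficients
    n = len(y)
    for _ in range(n_iters):
        residuals = X_design @ coef - y
        # Gradient of the mean squared error with respect to the coefficients
        grad = (2.0 / n) * X_design.T @ residuals
        # Step against the gradient, scaled by the learning rate alpha
        coef -= alpha * grad
    return coef

x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([5.0, 8.0, 11.0, 14.0])  # hypothetical points on y = 2 + 3x
print(gradient_descent(x, y, alpha=0.05))  # converges close to [2.0, 3.0]
```

If `alpha` is too large, the updates diverge. If it is too small, many more iterations are needed to reach the minimum.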

**4. Regularization**

There are extensions of the training of the linear model called regularization methods. 

These methods seek both to minimize the sum of the squared error of the model on the training data (using ordinary least squares) and to reduce the complexity of the model (for example, the number of coefficients or the sum of their absolute sizes).

*  Two popular examples of regularization procedures for linear regression are:

* **Lasso Regression:** Ordinary Least Squares is modified to also minimize the sum of the absolute values of the coefficients (called L1 regularization).
* **Ridge Regression:** Ordinary Least Squares is modified to also minimize the sum of the squared coefficients (called L2 regularization).

These methods are effective when there is collinearity in your input values and ordinary least squares would overfit the training data.
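Ridge regression has a closed-form solution, sketched below with NumPy. The sketch assumes the intercept is left unpenalized, which is handled by centering the data. Lasso's absolute-value penalty has no closed form and is usually fitted iteratively, for example with coordinate descent as in scikit-learn's `Lasso`. The helper `ridge_fit`, the penalty strength `lam` and the nearly collinear toy data are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression via its closed form; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so the intercept can be recovered separately and left unpenalized
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + lam * I) B = Xc^T yc
    p = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef

# Hypothetical, nearly collinear inputs: x2 is x1 plus a little noise
rng = np.random.default_rng(42)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=50)

_, coef_ols = ridge_fit(X, y, lam=0.0)    # lam=0 reduces to ordinary least squares
_, coef_ridge = ridge_fit(X, y, lam=1.0)
print("OLS:  ", coef_ols)    # can be large and of opposite sign when inputs are collinear
print("Ridge:", coef_ridge)  # shrunk, typically sharing the effect between x1 and x2
```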

1. Independent variable (X = YearsExperience) and dependent variable (y = Salary)

In [None]:
# Import the required libraries

import numpy as np
import pandas as pd
from pandas import Series,DataFrame
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [None]:
# read the dataset 

df = pd.read_csv("../input/salary-data-dataset-for-linear-regression/Salary_Data.csv")

**EDA**

In [None]:
df.head()

In [None]:
#check number of rows and number of columns
df.shape     #30 rows and 2 columns

In [None]:
#check dataset information 

df.info()

* There are 30 examples in total, with no missing values.

* One column is a float and the other is an integer.

In [None]:
df.describe()

In [None]:
#check any missing values

df.isnull().sum()

In [None]:
# Check skewness (a large absolute value can hint at outliers or an asymmetric distribution)

df.skew()

In [None]:
df.kurt()

* The skewness and kurtosis values are approximately the same for both columns.

In [None]:
#check any correlation and covariance

df[['YearsExperience','Salary']].cov()

In [None]:
df.corr()

**Visualization**

In [None]:
sns.heatmap(df.corr(),annot=True)
plt.show()

In [None]:
sns.pairplot(df)
plt.show()

* The plots clearly show a linear relationship between the dependent and independent variables.

**Independent and Dependent Variables**

In [None]:
X=df.drop('Salary',axis=1)

In [None]:
y=df.Salary

In [None]:
X.head()

In [None]:
y.head()

**Splitting the data**

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=0,test_size=0.30)

In [None]:
print(X_train.shape)

In [None]:
X_test.shape

**Model Fitting**

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
LR=LinearRegression()

In [None]:
LR.fit(X_train,y_train)

In [None]:
LR.intercept_   # beta 0

In [None]:
LR.coef_        # beta 1

**Prediction**

In [None]:
y_pred=LR.predict(X_test)

In [None]:
y_pred

In [None]:
y_test

**Evaluation**

In [None]:
from sklearn import metrics

In [None]:
R2=metrics.r2_score(y_test,y_pred)

In [None]:
R2

In [None]:
print(metrics.mean_absolute_error(y_test,y_pred))

In [None]:
print(metrics.mean_squared_error(y_test,y_pred))

In [None]:
print(np.sqrt(metrics.mean_squared_error(y_test,y_pred)))
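
The evaluation metrics above can also be computed directly from their definitions. A NumPy sketch follows; the helper `regression_metrics` and the toy values are hypothetical. For the same inputs, the results should agree with `metrics.mean_absolute_error`, `metrics.mean_squared_error` and `metrics.r2_score`.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))                # mean absolute error
    mse = np.mean(residuals ** 2)                   # mean squared error
    rmse = np.sqrt(mse)                             # root mean squared error, in the units of y
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical toy values, not taken from the salary dataset
m = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(m)  # MAE 0.25, MSE 0.125, RMSE about 0.354, R2 0.975
```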

In [None]:
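# Predict the salary for 5 years of experience (same column name as used in training)
print(LR.predict(pd.DataFrame({'YearsExperience': [5]})))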

**Assumptions**

In [None]:
error= y_test-y_pred

In [None]:
error

**No Autocorrelation**

In [None]:
import statsmodels.api as smt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

In [None]:
acf=plot_acf(error)
plt.show()
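
The ACF plot can be complemented by the Durbin-Watson statistic, which ranges from 0 to 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. statsmodels provides it as `statsmodels.stats.stattools.durbin_watson`. The NumPy sketch below only illustrates the formula, using hypothetical residual sequences. Autocorrelation checks are most meaningful when the residuals have a natural order, such as time.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: about 2 means no first-order autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    # Sum of squared differences between consecutive residuals over the residual sum of squares
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residual sequences
rng = np.random.default_rng(0)
independent = rng.normal(size=1000)
trending = np.cumsum(rng.normal(size=1000))  # a random walk: strongly positively autocorrelated
print(durbin_watson(independent))  # close to 2
print(durbin_watson(trending))     # close to 0
```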

**Normality**

In [None]:
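# distplot is deprecated in recent seaborn; histplot with a KDE overlay is the replacement
sns.histplot(error, kde=True)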
plt.xlabel('residual')
plt.show()

**Linearity**

In [None]:
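# Recent seaborn versions require keyword arguments for x and y
sns.regplot(x='YearsExperience', y='Salary', data=df)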
plt.show()

**Homoscedasticity**

In [None]:
sns.scatterplot(x=y_pred,y=error)
plt.xlabel('predicted values')
plt.ylabel('residuals')
plt.xlim([0,150000])
plt.ylim([-8000,8000])
plt.axhline(0, color='blue')  # reference line at zero residual
plt.show()
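
The Breusch-Pagan test gives a numeric complement to the residual plot. The sketch below computes Koenker's studentized form of its LM statistic: n times the R² from regressing the squared residuals on the inputs. Under homoscedasticity, this statistic approximately follows a chi-square distribution with as many degrees of freedom as there are inputs; for one input, the 5% critical value is about 3.84. statsmodels offers a full version as `statsmodels.stats.diagnostic.het_breuschpagan`. The helper name and the simulated data here are hypothetical.

```python
import numpy as np

def breusch_pagan_lm(X, residuals):
    """Koenker's studentized Breusch-Pagan LM statistic: n * R^2 of e^2 regressed on X."""
    X = np.asarray(X, dtype=float).reshape(len(residuals), -1)
    e2 = np.asarray(residuals, dtype=float) ** 2
    # Auxiliary regression of squared residuals on an intercept plus the inputs
    X_design = np.column_stack([np.ones(len(e2)), X])
    coef, *_ = np.linalg.lstsq(X_design, e2, rcond=None)
    fitted = X_design @ coef
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return len(e2) * r2

# Hypothetical simulated residuals
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=500)
homo = rng.normal(scale=1.0, size=500)        # constant spread
hetero = rng.normal(scale=1.0, size=500) * x  # spread grows with x
print(breusch_pagan_lm(x, homo))    # expected to be small
print(breusch_pagan_lm(x, hetero))  # expected to be large
```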

**Multicollinearity**

In [None]:
sns.heatmap(df.corr(),annot=True)
plt.show()
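
This dataset has a single input, so multicollinearity cannot occur here; it only becomes a concern with two or more inputs. In that case, a common check is the variance inflation factor (VIF), and a common rule of thumb treats values above 5 or 10 as a warning sign. statsmodels provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`. The NumPy sketch below computes the VIF from its definition on hypothetical data.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        # Other columns plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Hypothetical inputs: b is nearly a copy of a, c is unrelated
rng = np.random.default_rng(7)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.1, size=200)
c = rng.normal(size=200)
v = variance_inflation_factors(np.column_stack([a, b, c]))
print(v)  # large VIFs for a and b, close to 1 for c
```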