## Regularized linear regression models used in this notebook

### Introduction

**I'm working on improving my machine learning skills, and I enjoy sharing what I learn here. Today I'll share 3 linear regression models. Happy reading!**

1.  [Ridge Regression](#1) 
    * [Ridge Regression Model](#1.1)
    * [Ridge Regression Predict](#1.2)
    * [Ridge Regression Model Tuning](#1.3)
    * [Ridge Regression Final Model](#1.4) 
2. [Lasso Regression](#2)
    * [Lasso Regression Model](#2.1)
    * [Lasso Regression Predict](#2.2)
    * [Lasso Regression Model Tuning](#2.3)
    * [Lasso Regression Final Model](#2.4)  
3.  [ElasticNet Regression](#3)
    * [ElasticNet Regression Model](#3.1) 
    * [ElasticNet Regression Predict](#3.2) 
    * [ElasticNet Regression Model Tuning](#3.3) 
    * [ElasticNet Regression Final Model](#3.4) 

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load


# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**Importing Libraries**

In [None]:
import missingno as msno
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
df = pd.read_csv("/kaggle/input/hitters/Hitters.csv")  # read the Hitters dataset
df.head()  # show the first 5 rows

In [None]:
df.info() 

There are 20 columns in total.
* 16 - integer
* 3 - object or string
* 1 - float

In [None]:
df.isnull().sum() #returns the number of null values in columns

In [None]:
msno.bar(df);  # visualizing missing values

In [None]:
df = df.dropna()  # drop rows with missing values
dms = pd.get_dummies(df[["League", "Division", "NewLeague"]])  # one-hot encode the categorical columns
y = df[["Salary"]]  # target variable
X_ = df.drop(["Salary", "League", "Division", "NewLeague"], axis = 1).astype("float64")  # numeric features
X = pd.concat([X_, dms[["League_N", "Division_W", "NewLeague_N"]]], axis = 1)  # keep one dummy per categorical column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 42)  # hold out 25% for testing

In [None]:
X.head()

In [None]:
y.head()

<a id = "1"></a>
# <font color = "#483D8B"> Ridge Regression </font>

**Regression models predict a dependent variable from independent variables. 
Linear regression, one such model, predicts the output by fitting a linear relationship between the independent inputs and the target. Ridge regression is a linear regression that uses L2 regularization.**
**Our goal is to minimize the sum of squared differences between the real values (y) and the regression line we fit, plus an L2 penalty on the size of the coefficients.**
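
To make the L2 penalty concrete, here is a minimal NumPy sketch of ridge regression solved in closed form. It is only an illustration on synthetic data, not the Hitters data and not how scikit-learn's `Ridge` is implemented internally; the names `ridge_closed_form`, `X_demo` and `y_demo` are made up for this example, and the intercept is left out for brevity.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Minimize ||y - Xb||^2 + alpha * ||b||^2 (no intercept, for brevity)."""
    n_features = X.shape[1]
    # Normal equations with the L2 penalty added to the diagonal
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))  # synthetic features (assumption, not Hitters)
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# A larger alpha pulls the coefficient vector toward zero
for alpha in (0.0, 10.0, 1000.0):
    coef = ridge_closed_form(X_demo, y_demo, alpha)
    print(alpha, np.round(coef, 3), "L2 norm:", round(float(np.linalg.norm(coef)), 3))
```

With `alpha = 0` this reduces to ordinary least squares; as `alpha` grows, the norm of the coefficients shrinks, which is the effect the tuning step later tries to balance against the fit.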

<a id = "1.1"></a>
## Ridge Regression Model 

In [None]:
ridge = Ridge().fit(X_train, y_train)  # fit a ridge model with the default alpha
ridge

In [None]:
ridge.coef_ #coefficients

In [None]:
ridge.intercept_  # intercept (b0)

<a id = "1.2"></a>
## Ridge Regression Predict

In [None]:
y_pred = ridge.predict(X_train)  # predictions on the training set
y_pred[:10] 

In [None]:
#train error
#RMSE = Root Mean Square Error
RMSE = np.sqrt(mean_squared_error(y_train, y_pred))
RMSE

In [None]:
# RMSE with cross validation
np.sqrt(np.mean(-cross_val_score(ridge, X_train, y_train, cv = 10, scoring ="neg_mean_squared_error")))


In [None]:
#test error
y_test_pred = ridge.predict(X_test)  # predictions on the test set
RMSE = np.sqrt(mean_squared_error(y_test, y_test_pred))
RMSE

<a id = "1.3"></a>
## Ridge Regression Model Tuning

In [None]:
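alphas = 10**np.linspace(10,-2,100)*0.5  # candidate alphas from 5e9 down to 0.005
# `normalize` was removed from RidgeCV in scikit-learn 1.2, so it is dropped here;
# alpha is now selected on the same unscaled features the final model is fit on
ridgeCV = RidgeCV(alphas = alphas, scoring = "neg_mean_squared_error", cv = 10)
ridgeCV.fit(X_train, y_train)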

In [None]:
ridgeCV.alpha_ # optimum alpha value

<a id = "1.4"></a>
## Ridge Regression Final Model

In [None]:
ridge_tuned = Ridge(alpha = ridgeCV.alpha_).fit(X_train, y_train)
y_pred = ridge_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))

<a id = "2"></a>
# <font color = "#483D8B"> Lasso Regression </font>

**Lasso (Least Absolute Shrinkage and Selection Operator) is similar to Ridge regression. The main difference is that Ridge regression uses an L2 penalty, while Lasso regression uses an L1 penalty, which can shrink some coefficients exactly to zero.**
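
The difference between the two penalties can be sketched on single coefficients. The snippet below assumes a simplified setting (orthonormal features, loss written as `0.5 * ||y - Xb||^2` plus `alpha * ||b||_1` for Lasso or `0.5 * alpha * ||b||^2` for Ridge), in which each penalized coefficient is a simple function of its unpenalized value. The helper names and the example coefficients are made up for illustration.

```python
import numpy as np

def lasso_shrink(b, alpha):
    """Soft-thresholding: move each coefficient toward zero by alpha, clipping at zero."""
    return np.sign(b) * np.maximum(np.abs(b) - alpha, 0.0)

def ridge_shrink(b, alpha):
    """Proportional shrinkage: scale each coefficient by 1 / (1 + alpha)."""
    return b / (1.0 + alpha)

b_ols = np.array([3.0, -0.4, 0.05, -2.0])  # hypothetical unpenalized coefficients

print("lasso:", lasso_shrink(b_ols, 0.5))  # small coefficients become exactly zero
print("ridge:", ridge_shrink(b_ols, 0.5))  # every coefficient shrinks, none reach zero
```

This is why Lasso is often described as doing variable selection, while Ridge keeps every feature with a smaller weight.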

<a id = "2.1"></a>
## Lasso Regression Model

In [None]:
lasso = Lasso().fit(X_train, y_train)  # fit a lasso model with the default alpha
lasso

In [None]:
lasso.coef_  # coefficients

In [None]:
lasso.intercept_  # intercept (b0)

<a id = "2.2"></a>

## Lasso Regression Predict

In [None]:
lasso.predict(X_train)[:10]  # first 10 predictions on the training set

In [None]:
lasso.predict(X_test)[:10]  # first 10 predictions on the test set

In [None]:
#test error
y_pred = lasso.predict(X_test)
RMSE = np.sqrt(mean_squared_error(y_test, y_pred)) 
RMSE

<a id = "2.3"></a>

## Lasso Regression Model Tuning

In [None]:
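# seeded so the candidate alphas are reproducible; starts at 1 because alpha = 0 is not advised for Lasso
alphas = np.random.default_rng(42).integers(1, 1000, 100)
lassoCV = LassoCV(alphas = alphas, cv = 10, max_iter = 100000).fit(X_train, y_train)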

In [None]:
lassoCV.alpha_ # optimum alpha value

<a id = "2.4"></a>
## Lasso Regression Final Model

In [None]:
final_lasso = Lasso(alpha = lassoCV.alpha_).fit(X_train, y_train)

In [None]:
y_pred = final_lasso.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))

<a id = "3"></a>
# <font color = "#483D8B"> ElasticNet Regression </font>

**Elastic Net regression is a mixture of Ridge regression and Lasso regression: it combines the L2 and L1 penalties in a single model.**
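
As a rough sketch of what "mixture" means here, the penalty term can be written as a weighted sum of the L1 and L2 penalties. The function below follows the form given in scikit-learn's `ElasticNet` documentation, with `alpha` as the overall strength and `l1_ratio` as the mixing weight; the function name and the example coefficients are made up for illustration.

```python
import numpy as np

def elastic_net_penalty(coef, alpha=1.0, l1_ratio=0.5):
    """alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2"""
    coef = np.asarray(coef, dtype=float)
    l1 = np.sum(np.abs(coef))   # Lasso part
    l2_sq = np.sum(coef ** 2)   # Ridge part
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_sq

w = [3.0, -1.0, 0.0]  # hypothetical coefficients

for ratio in (0.0, 0.5, 1.0):
    print("l1_ratio =", ratio, "penalty =", elastic_net_penalty(w, alpha=1.0, l1_ratio=ratio))
```

With `l1_ratio = 1` only the L1 term remains, and with `l1_ratio = 0` only the L2 term remains. The models below use the default `l1_ratio = 0.5`.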

<a id ="3.1"></a>
## ElasticNet Regression Model

In [None]:
em = ElasticNet().fit(X_train, y_train)  # fit an elastic net with the default alpha and l1_ratio

In [None]:
em.coef_

In [None]:
em.intercept_

<a id ="3.2"></a>
## ElasticNet Regression Predict

In [None]:
em.predict(X_train)[:10]

In [None]:
em.predict(X_test)[:10]

In [None]:
y_pred = em.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))

In [None]:
r2_score(y_test, y_pred)

<a id ="3.3"></a>
## ElasticNet Regression Model Tuning

In [None]:
alphas = 10**np.linspace(10,-2,100)*0.5
emCV = ElasticNetCV(alphas = alphas, cv = 10).fit(X_train, y_train)

In [None]:
emCV.alpha_

<a id ="3.4"></a>
## ElasticNet Regression Final Model

In [None]:
elanet_tuned = ElasticNet(alpha = emCV.alpha_).fit(X_train, y_train)
y_pred = elanet_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))