### Bias and Variance: 

Two common problems with model. 

Over-fitting - the model fits the training data but fails to establish a relationship among the features and perfroms poorly on the test data. 

Underfitting - the model neither fits the training data nor the test data. 
    
<img src ="various_fits.png", width = 400, height = 300> 

Bias measures how far we are from the actual value on an average. 

Variance measures spread of the target values. 

<img src ="bias_variance.png", width = 400, height = 300> 

Regularization is a technique to overcome both overfitting or underfitting of the data. 

If $L$ represents the loss function then our goal is to 

minimize (L + regularization term). The regularization term is know as the penalty term. Below we have different regularization with the norms:

L1 norm: $\lambda ||w||_1 = \lambda   \sum |w_j| $

L2 norm: $\lambda ||w||_2^2  = \lambda  \sum w^2_j $

where $w$ is a matrix that contains coefficients for the different features and $\lambda$ is a regularization parameter. 

Least Absolute Shrinkage and Selection Operator (LASSO) regression  uses the L1 regularization. This is used when we have more number of features. 

Ridge regression uses L2 regularization. This is used to prevent multicollinearity. 

The value of $\alpha$ controls the penalty term. If $\alpha$ is very high, then the penalty is high and thus, the magnitude of the coefficients will be small.

#### Important Note: In sklearn for Lasso, Ridge and Elastic, alpha is same as $\lambda$ in the regularization equation. 

#### Hyperparamters are the parameters that the user has to provide manually. Here $\alpha$ is a hyperparameter. 

Elastic net regression is a combination of both L1 and L2 regularization. 

$min( L + \lambda_1 ||w||_1 + \lambda_2 ||w||_2^2) $

$\alpha = \lambda_1 + \lambda_2 $ and 

$l1\_ratio = \frac{\lambda_1}{\lambda_1 + \lambda_2} $

$l1\_ratio = 1$ can only happen when $\lambda_1 = 1$ and $\lambda_2 = 0,$ this will result in Lasso regression.

$l1\_ratio = 0$ can only happen when $\lambda_1 = 0$ and this will result in Ridge regression.

For $l1\_ratio$ between 0 and 1, we get Elastic net regression.

### Relationship between model complexity and error

Reference: http://www.frank-dieterle.de/phd/2_8_1.html


<img src ="model_complexity.png", width = 300, height = 200> 

In [13]:
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso

In [14]:
auto = pd.read_csv('auto_mpg.csv')

In [15]:
print(auto.columns)

Index(['mpg', 'cylinder', 'displacement', 'horse power', 'weight',
       'acceleration', 'model year', 'origin', 'car name'],
      dtype='object')


In [16]:
auto = auto[auto["horse power"] != "?"]

In [17]:
print(auto.shape)

(392, 9)


In [18]:
autox = auto[["displacement", "horse power", "weight", "acceleration"]].copy(deep=True)
autoy = auto[["mpg"]].copy(deep=True)

In [19]:
autox["displacement"] = autox["displacement"].astype(float)
autox["horse power"] = autox["horse power"].astype(float)
autox["weight"] = autox["weight"].astype(float)
autox["accelaration"] = autox["acceleration"].astype(float)

In [20]:
x_train, x_test, y_train, y_test = train_test_split(autox, \
                                                    autoy, \
                                                    test_size=0.2, \
                                                    random_state=4)

In [21]:
print(y_train.shape)


(313, 1)


In [22]:
from sklearn.linear_model import Lasso

lassoReg = Lasso(alpha=0.05, normalize=True)

lassoReg.fit(x_train,y_train)

pred = lassoReg.predict(x_test)
print(pred.shape, y_test.shape)

# calculating mse

mse = np.mean((pred - np.array(y_test).flatten())**2)
print(mse)


b = lassoReg.score(x_test,y_test) # returns the r-squared value
print(b)
print(lassoReg.coef_, lassoReg.intercept_)

(79,) (79, 1)
18.773024487514057
0.7123189217847542
[-0.00121657 -0.0218214  -0.00568267  0.          0.        ] [42.8649594]


In [23]:
from sklearn.linear_model import Ridge

## training the model

ridgeReg = Ridge(alpha=0.05, normalize=True)

ridgeReg.fit(x_train,y_train)

pred = ridgeReg.predict(x_test)

mse = np.mean((y_test - pred)**2)
rd= ridgeReg.score(x_test,y_test) # returns the r-squared value
print(rd)
print(ridgeReg.coef_, ridgeReg.intercept_)

0.7245311044437953
[[-0.01324779 -0.05028545 -0.00428272 -0.09528115 -0.09528115]] [46.92147583]


In [28]:
# Elastic Net code
from sklearn.linear_model import ElasticNet

enReg = ElasticNet(alpha=1, l1_ratio=0.5, normalize=False)

enReg.fit(x_train,y_train)

pred_en = enReg.predict(x_test)
print(pred_en.shape)
print(y_test.shape)

# calculating mse

mse = np.mean((pred_en.reshape(-1,1) - y_test)**2)

# calculating r-squared from Elastic Net

en = enReg.score(x_test,y_test) 
print(en)

(79,)
(79, 1)
0.7253272822673765


In [None]:
# Replacing a value in a column 
"""
df["col_name"] = df["col_name"].replace(value_to_be_replaced, new_value)
"""

In [None]:
# Replacing a value in multiple columns with one value

"""
cols = ["col1", "col2", "col3"]

df["cols"] = df["cols"].replace(value_to_be_replaced, new_value)
"""

In [None]:
# Finding r-squared

"""
from sklearn.metrics import r2_score
r2_score(y_test, yhat) 
"""
# here yhat is same as y predict

In [None]:
"""
In-class activity: For the auto_mpg data, can you pick alpha = 0.1, 
alpha = 0.15, alpha = 0.2 and run Lasso and Ridge regressions? 
Each regression, plot the coefficient values for each alpha. 
"""