### Ridge Regression 

Ridge regression is one of the regularization techniques for linear regression problems. It:

•  maintains accuracy while improving the generalization of the model
•  reduces the magnitude of the coefficients instead of removing variables, so all variables (features) are kept
•  In simple words: "In a regularization technique, we reduce the magnitude of the coefficients while keeping the same number of features."
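The shrinkage described above can be illustrated with a minimal sketch. The synthetic data, the true weights, and `alpha=10.0` are assumptions chosen purely for illustration; `LinearRegression` serves as the unregularized baseline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption for illustration): 5 features, known weights.
rng = np.random.RandomState(0)
X = rng.randn(50, 5)
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + 0.5 * rng.randn(50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# Ridge shrinks the coefficient vector but keeps every feature in the model.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
n_kept = np.count_nonzero(ridge.coef_)

print("OLS coefficient norm:  ", ols_norm)
print("Ridge coefficient norm:", ridge_norm)
print("Non-zero ridge coefficients:", n_kept, "of", X.shape[1])
```

The ridge coefficient norm should come out smaller than the unregularized one, while the number of coefficients stays the same.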

How does regularization work?
• by adding a penalty (complexity) term to the cost function of a complex model

So in ridge regression:
- a small amount of bias is added to the model
- the complexity of the model is reduced; this is also called L2 regularization
- the cost function is altered by adding the penalty term to it
- the amount of bias added to the model is called the ridge regression penalty
- the cost function for ridge regression is given by the equation further below


From the cost function of ridge regression we can see that as λ tends to zero, the equation reduces to the cost function of the ordinary linear regression model. (λ is the penalty strength; it is written as α in the formula below and as `alpha` in scikit-learn.)
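As a rough check of this limit, the sketch below fits `Ridge` with progressively smaller penalties and measures how far its coefficients are from those of `LinearRegression`. The synthetic data and the particular penalty values are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic, well-conditioned data (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = X @ np.array([1.5, -2.0, 0.7]) + 0.1 * rng.randn(40)

ols = LinearRegression().fit(X, y)

gaps = []
for alpha in [10.0, 1.0, 1e-3, 1e-8]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    # Distance between the ridge and the ordinary least-squares coefficients.
    gap = np.linalg.norm(ridge.coef_ - ols.coef_)
    gaps.append(gap)
    print(f"alpha={alpha:g}  distance to OLS coefficients={gap:.2e}")
```

The distance should shrink toward zero as the penalty decreases.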

Ordinary linear or polynomial regression becomes unreliable when there is high collinearity between the independent variables: the coefficient estimates become unstable and very sensitive to noise. Ridge regression can be used to solve such problems.
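A small sketch of the collinearity problem, under assumed synthetic data: the second feature is a near copy of the first, so the design matrix is badly conditioned. The noise levels and `alpha=1.0` are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(0)
n = 100
x1 = rng.randn(n)
x2 = x1 + 1e-6 * rng.randn(n)      # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 0.1 * rng.randn(n)  # the target depends on the shared signal

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

cond = np.linalg.cond(X)
print("Condition number of X:", cond)
print("OLS coefficients:  ", ols.coef_)    # typically large, with opposite signs
print("Ridge coefficients:", ridge.coef_)  # the weight is split between the two copies
```

With ridge, the two coefficients should sum to roughly the true effect of 3, whereas the unregularized fit spreads it over two unstable values.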

$$\sum\limits_{j=1}^m\left(Y_{j}-W_{0}-\sum\limits_{i=1}^nW_{i}X_{ji}\right)^{2}+\alpha\sum\limits_{i=1}^nW_i^2=\text{loss function}+\alpha\sum\limits_{i=1}^nW_i^2$$
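The cost function can be written out directly in NumPy. The helper `ridge_cost` below is a hypothetical name introduced for this sketch, and the data are synthetic; the check relies on scikit-learn's `Ridge` minimizing this same objective with an unpenalized intercept.

```python
import numpy as np
from sklearn.linear_model import Ridge

def ridge_cost(w0, w, X, y, alpha):
    """Sum of squared residuals plus alpha times the sum of squared weights."""
    residuals = y - w0 - X @ w
    return np.sum(residuals ** 2) + alpha * np.sum(w ** 2)

# Synthetic data (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + 0.3 * rng.randn(30)

alpha = 2.0
model = Ridge(alpha=alpha).fit(X, y)
best_cost = ridge_cost(model.intercept_, model.coef_, X, y, alpha)

# Any small perturbation of the fitted weights should raise the cost.
perturbed_costs = [
    ridge_cost(model.intercept_, model.coef_ + 0.05 * rng.randn(4), X, y, alpha)
    for _ in range(5)
]

# With alpha = 0 the penalty vanishes and only the squared-error loss remains.
plain_loss = ridge_cost(model.intercept_, model.coef_, X, y, 0.0)

print("Cost at fitted weights:", best_cost)
print("Costs after perturbation:", perturbed_costs)
```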


### Parameters

**alpha** − {float, array-like}, shape (n_targets,)
Alpha is the tuning parameter that decides how much we want to penalize the model. Larger values mean stronger regularization.
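The array-like form of `alpha` applies one penalty per target when `y` has several columns. The following sketch assumes synthetic two-target data and passes the penalties as a NumPy array; it compares the joint fit with two separate single-target fits.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with two targets (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
W = rng.randn(4, 2)
Y = X @ W + 0.1 * rng.randn(60, 2)

# One penalty per target: weak for the first column, strong for the second.
alphas = np.array([0.1, 100.0])
multi = Ridge(alpha=alphas).fit(X, Y)

# Fitting each target on its own with the matching scalar alpha.
separate = [
    Ridge(alpha=float(a)).fit(X, Y[:, k]).coef_ for k, a in enumerate(alphas)
]

print(multi.coef_.shape)  # one row of coefficients per target
print(multi.coef_)
```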

**fit_intercept** − Boolean
Specifies whether a constant (bias or intercept) should be added to the decision function.

**tol** − float, optional, default=1e-4
It represents the precision of the solution.

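**normalize** − Boolean, optional, default = False
If True, the regressors X will be normalized before regression. The normalization is done by subtracting the mean and dividing by the L2 norm. If fit_intercept = False, this parameter is ignored. Note that this parameter was deprecated in scikit-learn 1.0 and removed in 1.2; in newer versions, scale the features explicitly before fitting.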
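In scikit-learn releases where `normalize` is not available, a common substitute is to scale the features in a pipeline. The sketch below uses `StandardScaler`, which divides by the standard deviation rather than the L2 norm, so it is a related preprocessing step and not an exact replacement. The data and feature scales are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.randn(80, 3) * np.array([1.0, 100.0, 0.01])
y = X @ np.array([1.0, 0.02, 50.0]) + 0.1 * rng.randn(80)

# Standardize each feature, then fit ridge on the scaled features.
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe.fit(X, y)

X_scaled = pipe.named_steps["standardscaler"].transform(X)
print("Scaled means:", X_scaled.mean(axis=0))
print("Scaled stds: ", X_scaled.std(axis=0))
print("Ridge coefficients on scaled features:", pipe.named_steps["ridge"].coef_)
```

Scaling matters for ridge because the penalty acts on the raw size of each coefficient, which depends on the units of the corresponding feature.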

**copy_X** − Boolean, optional, default = True
If True (the default), X will be copied. If set to False, X may be overwritten.

**max_iter** − int, optional
As the name suggests, it represents the maximum number of iterations for the conjugate gradient solver.

**solver** − str, {‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}
This parameter selects the solver used in the computational routines. The options are:

- auto − chooses the solver automatically based on the type of data.
- svd − uses a Singular Value Decomposition of X to calculate the ridge coefficients.
- cholesky − uses the standard scipy.linalg.solve() function to get a closed-form solution.
- lsqr − the fastest option; uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr.
- sparse_cg − uses the conjugate gradient solver from scipy.sparse.linalg.cg.
- sag − uses an iterative process, Stochastic Average Gradient descent.
- saga − also uses an iterative process, an improved version of Stochastic Average Gradient descent.
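The solvers are different routes to the same minimizer, which the sketch below checks on a small dense problem. The synthetic data and the tight `tol=1e-10` (so that the iterative solvers run close to convergence) are assumptions for illustration; the stochastic `sag` and `saga` solvers are left out here.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic, well-conditioned data (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.2 * rng.randn(100)

coefs = {}
for solver in ["svd", "cholesky", "lsqr", "sparse_cg"]:
    model = Ridge(alpha=1.0, solver=solver, tol=1e-10).fit(X, y)
    coefs[solver] = model.coef_
    print(f"{solver:>10}: {model.coef_}")

# Largest coefficient difference between any solver and the SVD solution.
reference = coefs["svd"]
max_gap = max(np.max(np.abs(c - reference)) for c in coefs.values())
print("Largest gap to the svd solution:", max_gap)
```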
    
**random_state** − int, RandomState instance or None, optional, default = None
- represents the seed of the pseudo-random number generator used while shuffling the data (used by the ‘sag’ and ‘saga’ solvers). The options are:
- if int, random_state is the seed used by the random number generator.
- if RandomState instance, random_state is the random number generator.
- if None, the random number generator is the RandomState instance used by np.random
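A minimal sketch of the effect of `random_state`, assuming synthetic standardized data and the `sag` solver: two fits with the same integer seed shuffle the data identically and should therefore return the same coefficients.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data, standardized because sag converges best on scaled features.
rng = np.random.RandomState(0)
X = StandardScaler().fit_transform(rng.randn(200, 4))
y = X @ np.array([1.0, -1.0, 2.0, 0.5]) + 0.1 * rng.randn(200)

# Same seed twice: the stochastic solver visits samples in the same order.
fit_a = Ridge(alpha=1.0, solver="sag", random_state=42).fit(X, y)
fit_b = Ridge(alpha=1.0, solver="sag", random_state=42).fit(X, y)

same_seed_gap = np.max(np.abs(fit_a.coef_ - fit_b.coef_))
print("Largest coefficient difference with identical seeds:", same_seed_gap)
```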

### Attributes

**coef_** − array, shape (n_features,) or (n_targets, n_features)
This attribute provides the weight vector(s).

**intercept_** − float | array, shape (n_targets,)
It represents the independent term in the decision function.

**n_iter_** − array or None, shape (n_targets,)
Available only for the ‘sag’ and ‘lsqr’ solvers; returns the actual number of iterations for each target. Other solvers return None.
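The shapes of these attributes can be inspected directly. The sketch below assumes random synthetic data and compares a single-target fit, a two-target fit, and a fit with the iterative `lsqr` solver.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Random synthetic data (an assumption for illustration).
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y_single = rng.randn(50)
Y_multi = rng.randn(50, 2)

single = Ridge(alpha=1.0, solver="cholesky").fit(X, y_single)
multi = Ridge(alpha=1.0, solver="cholesky").fit(X, Y_multi)
iterative = Ridge(alpha=1.0, solver="lsqr").fit(X, y_single)

# Single target: coef_ is 1-D and intercept_ is a scalar.
print(single.coef_.shape, np.ndim(single.intercept_))
# Two targets: one row of coefficients and one intercept per target.
print(multi.coef_.shape, multi.intercept_.shape)
# n_iter_ is None for the closed-form solver and an array for lsqr.
print(single.n_iter_, iterative.n_iter_)
```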

In [2]:
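from sklearn.linear_model import Ridge
import numpy as np
n_samples, n_features = 15, 10
rs = np.random.RandomState(0)  # fixed seed for reproducibility
y = rs.randn(n_samples)  # random targets
X = rs.randn(n_samples, n_features)  # random feature matrix
model = Ridge(alpha=0.5)  # alpha sets the strength of the L2 penalty
model.fit(X, y)
model.score(X, y)  # R^2 on the training data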

0.7629498741931634

In [3]:
model.coef_

array([ 0.32720254, -0.34503436, -0.2913278 ,  0.2693125 , -0.22832508,
       -0.8635094 , -0.17079403, -0.36288055, -0.17241081, -0.43136046])

In [4]:
model.intercept_

0.5274865723969377
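To connect these outputs, the sketch below rebuilds the same model and reproduces its predictions by hand from `coef_` and `intercept_`, then recomputes the score with `r2_score`. The names `manual_pred` and `manual_r2` are introduced here for illustration. Note that the score is measured on the training data and the targets are random noise, so a high value here reflects fitting 15 samples with 10 features rather than predictive power.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Same setup as the cells above.
n_samples, n_features = 15, 10
rs = np.random.RandomState(0)
y = rs.randn(n_samples)
X = rs.randn(n_samples, n_features)
model = Ridge(alpha=0.5).fit(X, y)

# A ridge prediction is the linear combination of features plus the intercept.
manual_pred = X @ model.coef_ + model.intercept_

# score() reports the coefficient of determination R^2 of those predictions.
manual_r2 = r2_score(y, manual_pred)

print(np.allclose(manual_pred, model.predict(X)))
print(manual_r2, model.score(X, y))
```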