### Name: Satyanistha Das
### Regd. No.: 2241004100

# LASSO REGRESSION

###### Lasso regression (Least Absolute Shrinkage and Selection Operator, LASSO), also called L1-regularized regression, is a linear regression method used in statistics and machine learning. It adds a penalty on the absolute values of the coefficients, which helps reduce overfitting and performs feature selection.
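
###### A common way to write the lasso objective, for $n$ training samples with feature vectors $x_i$, targets $y_i$, weights $m$ and regularization strength $\lambda \ge 0$ (the hyperparameter named `ld` in the code below), is:

$$
J(m) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i^\top m\right)^2 + \lambda \sum_{j=1}^{p} \lvert m_j \rvert
$$

###### The first term measures how well the model fits the data; the second is $\lambda$ times the L1 norm of the $p$ weights. The $\frac{1}{2n}$ scaling is one convention among several.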

###### Key Points:

###### Regularization: Lasso is a regularization technique. Regularization balances how closely the model fits the training data against the model's overall complexity. This helps prevent overfitting, where the model performs well on the training data but poorly on unseen data.

###### L1 Penalty: Unlike ordinary linear regression, Lasso adds a penalty term to the cost function. This term is proportional to the sum of the absolute values of the coefficients (their L1 norm). All coefficients are shrunk towards zero, and those that contribute little to reducing the error are often driven to exactly zero.

###### Feature Selection: This shrinkage property of Lasso leads to feature selection. Features with minimal contribution to the model have their coefficients reduced to zero, effectively removing them from the model. This helps identify the most important features for prediction.
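
###### The difference between shrinking towards zero and landing exactly on zero shows up in the one-dimensional update used by many lasso solvers, called soft-thresholding (the proximal operator of the L1 penalty). This is an illustrative sketch: `soft_threshold` and `ridge_shrink` are hypothetical helper names, and the threshold 1.0 is arbitrary.

```python
import numpy as np

def soft_threshold(z, t):
    """L1 proximal step: shrink each entry of z towards 0 by t.

    Entries with |z| <= t become exactly 0.
    """
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ridge_shrink(z, t):
    """For comparison: proximal step for the L2 penalty (t/2)*m**2.

    It scales every entry by 1/(1+t), so non-zero entries stay non-zero.
    """
    return z / (1.0 + t)

coefs = np.array([3.0, 0.5, -2.0])
print(soft_threshold(coefs, 1.0))  # the small coefficient becomes exactly 0
print(ridge_shrink(coefs, 1.0))    # all coefficients shrink, none vanish
```

###### With threshold 1.0, soft-thresholding maps [3.0, 0.5, -2.0] to [2.0, 0.0, -1.0], while the L2 step maps the same values to [1.5, 0.25, -1.0].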


###### Key advantages of Lasso regression:

###### Prevents overfitting: By penalizing large coefficients, Lasso discourages the model from becoming overly complex and fitting too closely to random noise in the training data.

###### Feature selection: Lasso performs automatic feature selection by driving unimportant features' coefficients to zero. This simplifies the model and improves interpretability.

###### Better generalization: By avoiding overfitting and selecting relevant features, Lasso regression models often generalize better to unseen data.

## Here is the gradient descent implementation of Lasso Regression

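```python
import numpy as np

def predict(x, m):
    # Linear model: dot product of the input row(s) with the weight vector m
    prediction = np.dot(x, m)
    return prediction

def error(x, y, m):
    # Residual for one sample: target minus prediction
    return y - predict(x, m)

def cost(x, y, m):
    # Squared error plus an L1 penalty ld*|m[i]|, divided by 2*(number of samples).
    # The penalty is indexed by the sample index i, so with 2 samples
    # only m[0] and m[1] are penalized.
    e = 0
    ld = 10000  # regularization strength (lambda)
    for i in range(len(x)):
        e = e + error(x[i], y[i], m)**2 + ld*abs(m[i])
    return e / (2*len(x))

def gradient(x, y, m):
    # For each weight j: sum over samples of x[i][j] * residual.
    # This is the negative gradient of the squared-error term only;
    # the L1 penalty from cost() does not enter the update.
    temp = []
    for j in range(len(x[0])):
        m_j = 0
        for i in range(len(x)):
            m_j = m_j + x[i][j]*error(x[i], y[i], m)
        temp.append(m_j)
    return temp

def gradient_descent(x, y, m):
    # The m argument is ignored: the weights always start at zero (column vector)
    m = np.zeros((len(x[0]), 1))
    e = []          # cost after each iteration
    iteration = []  # iteration counter history
    i = 0
    pre_cost = 1
    post_cost = 0
    # Stop once the cost changes by at most 0.001 between iterations,
    # so ld affects when the loop stops but not the direction of each step
    while abs(post_cost - pre_cost) > 0.001:
        pre_cost = cost(x, y, m)
        # Step along the negative gradient with learning rate 0.01
        m = m + np.array(gradient(x, y, m))*0.01
        post_cost = cost(x, y, m)
        e.append(post_cost)
        i += 1
        iteration.append(i)
    print('after', i, ' no. of iteration', 'm:')
    return m

# Toy data: 2 samples, 3 features (the first column is all ones, like an intercept)
x = [[1, 2, 5], [1, 7, 5]]
y = [2, 5]
m = [1, 1, 1]
gradient_descent(x, y, m)
```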

```
after 152  no. of iteration m:

array([[0.03077056],
       [0.59999328],
       [0.15385282]])
```
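
###### In the cell above, `gradient` differentiates only the squared-error term, so `ld` affects only the stopping test, and the returned weights are essentially the minimum-norm least-squares solution $(2/65,\ 39/65,\ 10/65) \approx (0.0308,\ 0.6,\ 0.1538)$. One common way to put the L1 penalty into the update is proximal gradient descent (ISTA), a different technique from the cell above: take a gradient step on the squared error, then soft-threshold. This is a minimal sketch assuming the objective $\frac{1}{2n}\sum_i (y_i - x_i^\top m)^2 + \lambda \sum_j \lvert m_j \rvert$; `lasso_ista`, `soft_threshold`, `lam`, `lr` and `n_iter` are hypothetical names and illustrative values.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t*||.||_1: shrink towards 0, small entries become exactly 0
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, lr=0.01, n_iter=5000):
    """Minimize (1/2n)*||y - X m||^2 + lam*||m||_1 by proximal gradient descent.

    lam, lr and n_iter are illustrative choices, not values from the notebook.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    m = np.zeros(p)
    for _ in range(n_iter):
        residual = y - X @ m
        # Gradient step on the squared-error term only
        z = m + lr * (X.T @ residual) / n
        # Proximal step handles the non-differentiable L1 penalty
        m = soft_threshold(z, lr * lam)
    return m

x = [[1, 2, 5], [1, 7, 5]]
y = [2, 5]
print(lasso_ista(x, y, lam=0.0))    # no penalty: plain least-squares gradient descent
print(lasso_ista(x, y, lam=100.0))  # strong penalty: every weight ends at exactly 0
```

###### With `lam=0.0` the soft-threshold does nothing, so the loop converges to the same weights the cell above printed. With `lam=100.0` all weights are exactly zero, because $\lambda$ exceeds every $\lvert \frac{1}{n} X_j^\top y \rvert$ (here 3.5, 19.5 and 17.5), which makes $m = 0$ the lasso solution.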

# RIDGE REGRESSION

###### Ridge regression, also known as L2-regularized linear regression, adds a penalty on the squared magnitudes of the coefficients to address overfitting and multicollinearity.

######  Overfitting: This occurs when a model performs well on the training data but poorly on unseen data. Ridge regression helps by introducing a bias to the model, reducing its variance and complexity.
######  Multicollinearity: This happens when independent variables in a regression model are highly correlated. Ridge regression helps by shrinking the coefficients of these correlated variables, reducing their influence on the model.
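
###### One common way to write the ridge objective, for $n$ samples with feature vectors $x_i$, targets $y_i$, weights $m$ and penalty strength $\lambda \ge 0$, is:

$$
J(m) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - x_i^\top m\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{p} m_j^2
$$

###### The gradient of the penalty term is $\lambda m$, proportional to each weight, so large weights are pulled in strongly and small weights only weakly; this is why ridge shrinks coefficients without usually setting them to zero. For $\lambda > 0$ the objective is strictly convex, so the minimizer is unique even when features are perfectly correlated.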

###### Key points:

###### Unlike lasso regression (another regularization technique), ridge regression does not eliminate features: it shrinks their coefficients towards zero but generally does not set any of them exactly to zero.
###### A hyperparameter called lambda controls the strength of the penalty term. A higher lambda leads to stronger shrinkage, which typically reduces variance at the cost of increased bias.
###### Ridge regression is a good choice when dealing with multicollinearity or a high number of features relative to the number of observations.
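
###### The effect of lambda can be checked with the standard closed-form ridge solution $m = (X^\top X + \lambda I)^{-1} X^\top y$, which minimizes $\lVert y - Xm \rVert^2 + \lambda \lVert m \rVert^2$ with no intercept term. This sketch reuses the toy data from the notebook cells; `ridge_closed_form` is a hypothetical helper name and the lambda values are arbitrary.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Minimizer of ||y - X m||^2 + lam*||m||^2 (no intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    # Solve (X^T X + lam*I) m = X^T y instead of forming the inverse explicitly
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

x = [[1, 2, 5], [1, 7, 5]]
y = [2, 5]
lams = [0.1, 1.0, 10.0, 100.0]
for lam in lams:
    m = ridge_closed_form(x, y, lam)
    # Larger lam -> smaller norm of m, but the entries stay non-zero
    print(lam, m, np.linalg.norm(m))
```

###### On this data every coefficient stays positive for all four lambda values while the norm of $m$ decreases, which illustrates shrinkage without feature elimination.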


###### Ridge regression has two key advantages:

###### Reduces overfitting: Ordinary least-squares regression can overfit, fitting noise in the training data and performing poorly on unseen data. Ridge regression accepts a small amount of bias in exchange for shrinking the coefficients, which makes the model less complex and better able to generalize to new data.

###### Handles multicollinearity: When independent variables are highly correlated, ordinary least squares can produce unreliable coefficient estimates with high variance. Ridge regression shrinks the coefficients of these correlated variables, reducing their individual influence and yielding a more stable and reliable model.
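
###### A small synthetic example (made-up data, not from the original notebook) shows the multicollinearity effect: two almost identical features, fitted once with ordinary least squares via `np.linalg.lstsq` and once with the ridge closed form $(X^\top X + \lambda I)^{-1} X^\top y$ for an arbitrary $\lambda = 1$.

```python
import numpy as np

# Two nearly identical features: x2 = x1 plus a tiny perturbation (synthetic data)
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = x1 + np.array([0.01, -0.01, 0.01, -0.01])
X = np.column_stack([x1, x2])
y = np.array([1.1, 1.9, 3.2, 3.9])

# Ordinary least squares: no penalty
m_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: minimizes ||y - X m||^2 + lam*||m||^2
lam = 1.0
m_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print("OLS:  ", m_ols)
print("Ridge:", m_ridge)
```

###### Worked by hand, least squares gives roughly [-12.1, 13.1]: two large coefficients of opposite sign that nearly cancel. Ridge gives roughly [0.49, 0.50], splitting the effect between the two correlated features.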

## Here is the gradient descent implementation of Ridge Regression

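```python
import numpy as np

def predict(x, m):
    # Linear model: dot product of the input row(s) with the weight vector m
    prediction = np.dot(x, m)
    return prediction

def error(x, y, m):
    # Residual for one sample: target minus prediction
    return y - predict(x, m)

def costf(x, y, m):
    # Squared error plus an L2 penalty ld*m[i]**2, divided by 2*(number of samples).
    # The penalty is indexed by the sample index i, so with 2 samples
    # only m[0] and m[1] are penalized.
    e = 0
    ld = 10000  # regularization strength (lambda)
    for i in range(len(x)):
        e += error(x[i], y[i], m)**2 + ld*m[i]**2
    return e / (2*len(x))

def gradient(x, y, m):
    # For each weight j: sum over samples of x[i][j] * residual.
    # This is the negative gradient of the squared-error term only;
    # the L2 penalty from costf() does not enter the update.
    temp = []
    for j in range(len(x[0])):
        m_j = 0
        for i in range(len(x)):
            m_j = m_j + x[i][j]*error(x[i], y[i], m)
        temp.append(m_j)
    return temp

def gradient_descent(x, y, m):
    # The m argument is ignored: the weights always start at zero (column vector)
    m = np.zeros((len(x[0]), 1))
    e = []          # cost after each iteration
    iteration = []  # iteration counter history
    i = 0
    pre_cost = 1
    post_cost = 0
    # Stop once the cost changes by at most 0.001 between iterations,
    # so ld affects when the loop stops but not the direction of each step
    while abs(post_cost - pre_cost) > 0.001:
        pre_cost = costf(x, y, m)
        # Step along the negative gradient with learning rate 0.01
        m = m + np.array(gradient(x, y, m))*0.01
        post_cost = costf(x, y, m)
        e.append(post_cost)
        i += 1
        iteration.append(i)
    print('after', i, ' no. of iteration m is ')
    return m

# Toy data: 2 samples, 3 features (the first column is all ones, like an intercept)
x = [[1, 2, 5], [1, 7, 5]]
y = [2, 5]
m = [1, 1, 1]
gradient_descent(x, y, m)
```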

```
after 158  no. of iteration m is 

array([[0.03077012],
       [0.59999554],
       [0.15385058]])
```
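
###### As in the Lasso cell, `gradient` here omits the penalty, so `ld` changes only the stopping test; that is why both cells print nearly the same weights. The following is a minimal sketch that puts the ridge penalty into the update, assuming the objective $\frac{1}{2n}\sum_i (y_i - x_i^\top m)^2 + \frac{\lambda}{2}\sum_j m_j^2$, whose gradient adds $\lambda m$ to the data term. `ridge_gradient_descent`, `lam`, `lr` and `n_iter` are hypothetical names and illustrative values; the result is checked against the closed-form minimizer $(X^\top X + n\lambda I)^{-1} X^\top y$ of the same objective.

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, lr=0.01, n_iter=5000):
    """Minimize (1/2n)*||y - X m||^2 + (lam/2)*||m||^2 with batch gradient descent.

    lam, lr and n_iter are illustrative choices, not values from the notebook.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    m = np.zeros(p)
    for _ in range(n_iter):
        residual = y - X @ m
        # Data-term gradient plus penalty gradient; the penalty pulls every weight towards 0
        grad = -(X.T @ residual) / n + lam * m
        m = m - lr * grad
    return m

x = [[1, 2, 5], [1, 7, 5]]
y = [2, 5]
m_gd = ridge_gradient_descent(x, y, lam=1.0)

# Closed-form minimizer of the same objective: (X^T X + n*lam*I)^(-1) X^T y
X = np.asarray(x, dtype=float)
n, p = X.shape
m_exact = np.linalg.solve(X.T @ X + n * 1.0 * np.eye(p), X.T @ np.asarray(y, dtype=float))
print(m_gd)
print(m_exact)
```

###### The two printed vectors should agree to many decimal places, and both have a smaller norm than the unpenalized weights printed by the cell above.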