# **LASSO (Least Absolute Shrinkage and Selection Operator)**

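**Regularization** reduces overfitting by adding a penalty term, weighted by a hyperparameter λ, to the model's loss function. ***Lasso Regression*** uses the **L1 regularization** technique, in which the penalty is proportional to the sum of the absolute values of the coefficients.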
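
For reference, the objective that the implementation in this notebook appears to minimize can be written as follows. The notation is an assumption introduced here for illustration: m is the number of training examples, n the number of features, and ŷ_i the prediction for example i. The second line gives the gradients used in the update step, where the class treats a weight of exactly zero like a negative weight.

```latex
J(w, b) = \frac{1}{m}\left[\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{n} \lvert w_j \rvert\right],
\qquad \hat{y}_i = \sum_{j=1}^{n} w_j x_{ij} + b

\frac{\partial J}{\partial w_j} = \frac{1}{m}\left[-2\sum_{i=1}^{m} x_{ij}\left(y_i - \hat{y}_i\right) + \lambda s_j\right],
\qquad
\frac{\partial J}{\partial b} = -\frac{2}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right),
\qquad
s_j = \begin{cases} +1 & w_j > 0 \\ -1 & \text{otherwise} \end{cases}
```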

The **penalty** term shrinks the coefficients toward zero and can eliminate some of them entirely, so that the model effectively relies on fewer features. As a result, overfitting can be reduced.

This process is called ***Shrinkage***.
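
To make shrinkage concrete, here is a minimal sketch of the soft-thresholding operator, the closed-form proximal step for an L1 penalty. This is not the update rule used by the `LassoRegression` class in this notebook, which takes subgradient steps instead; `soft_threshold` and the sample weights are hypothetical names and values chosen for illustration.

```python
import numpy as np

def soft_threshold(w, threshold):
    """Move each coefficient toward zero by `threshold`.

    Coefficients whose absolute value is at most `threshold` become zero.
    """
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

# Made-up coefficients: two large, two small
weights = np.array([3.0, -0.4, 0.05, -2.5])
shrunk = soft_threshold(weights, threshold=0.5)
print(shrunk)
```

With a threshold of 0.5, the two small coefficients are set to zero and the two large ones move 0.5 closer to zero, which is the "eliminate or reduce" behaviour described above.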

**Importing Dependencies**

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

**Implementation**

In [2]:
class LassoRegression:
    # Constructor function to initialize hyperparameters
    def __init__(self, learning_rate=0.01, no_of_iterations=1000, l1_penalty=1.0):
        self.learning_rate = learning_rate
        self.no_of_iterations = no_of_iterations
        self.l1_penalty = l1_penalty

    # fit function to fit the dataset to the LASSO Regression model
    def fit(self, X, Y):
        # Convert X and Y to numpy arrays
        self.X = X.values
        self.Y = Y.values

        # Initialize no. of training examples (m) and no. of input features (n)
        self.m, self.n = self.X.shape

        # initialize weights
        self.w = np.zeros(self.n)
        self.b = 0

        for i in range(self.no_of_iterations):
            self.update_weights()

    # function to update weights, i.e. the model parameters
    def update_weights(self):
        Y_prediction = self.predict(self.X)

        # Calculate gradients for Lasso Regression: squared-error gradient
        # plus the L1 subgradient (+penalty if w[j] > 0, otherwise -penalty)
        dw = np.zeros(self.n)
        for j in range(self.n):
            if self.w[j] > 0:
                dw[j] = (-2 * np.dot(self.X[:, j], self.Y - Y_prediction) + self.l1_penalty) / self.m
            else:
                dw[j] = (-2 * np.dot(self.X[:, j], self.Y - Y_prediction) - self.l1_penalty) / self.m

        db = -2 * np.sum(self.Y - Y_prediction) / self.m

        # Updating weights and bias
        self.w = self.w - self.learning_rate * dw
        self.b = self.b - self.learning_rate * db

    # function to predict output based on given input feature(s)
    def predict(self, X):
        return np.dot(X, self.w) + self.b
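
The per-feature loop in `update_weights` can also be expressed without a Python loop. The following is a minimal sketch rather than the notebook's own code: `lasso_gradients` is a hypothetical standalone helper, the input numbers are made up, and it keeps the class's convention of treating a weight of exactly zero like a negative weight.

```python
import numpy as np

def lasso_gradients(X, Y, w, b, l1_penalty):
    """Return (dw, db) for the squared-error loss plus an L1 penalty."""
    m = X.shape[0]
    residual = Y - (X @ w + b)
    # +1 where w > 0, otherwise -1 (same branches as the if/else in the loop)
    sign = np.where(w > 0, 1.0, -1.0)
    dw = (-2.0 * (X.T @ residual) + l1_penalty * sign) / m
    db = -2.0 * np.sum(residual) / m
    return dw, db

# Tiny synthetic check with made-up numbers
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.5])
b = 0.1
dw, db = lasso_gradients(X, Y, w, b, l1_penalty=0.1)
print(dw, db)
```

The vectorized form computes all feature gradients with one matrix product, which tends to matter once the number of features grows beyond the single column used in this notebook.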

**Loading the Dataset**

In [3]:
df = pd.read_csv("/content/drive/MyDrive/Datasets/salary_data.csv")
df.shape

(2217, 2)

**Data Description**

In [4]:
df.head()

| | YearsExperience | Salary |
|---|---|---|
| 0 | 1.1 | 39343 |
| 1 | 1.3 | 46205 |
| 2 | 1.5 | 37731 |
| 3 | 2.0 | 43525 |
| 4 | 2.2 | 39891 |


In [5]:
df.tail()

| | YearsExperience | Salary |
|---|---|---|
| 2212 | 23.3 | 254396 |
| 2213 | 23.7 | 244176 |
| 2214 | 15.7 | 170218 |
| 2215 | 5.0 | 70881 |
| 2216 | 13.7 | 152679 |


In [6]:
df.describe()

| | YearsExperience | Salary |
|---|---|---|
| count | 2217.0 | 2217.0 |
| mean | 12.19991 | 141158.828146 |
| std | 7.209522 | 68201.505858 |
| min | 0.0 | 18581.0 |
| 25% | 5.8 | 80992.0 |
| 50% | 12.1 | 139452.0 |
| 75% | 18.6 | 200821.0 |
| max | 25.0 | 270450.0 |


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2217 entries, 0 to 2216
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   YearsExperience  2217 non-null   float64
 1   Salary           2217 non-null   int64  
dtypes: float64(1), int64(1)
memory usage: 34.8 KB


**Data Preprocessing**

In [8]:
df.isnull().sum()

| Column | Missing values |
|---|---|
| YearsExperience | 0 |
| Salary | 0 |


In [9]:
X = df.drop(columns='Salary')
Y = df['Salary']

**Training Testing Dataset Splitting**

In [10]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=49)

In [11]:
print(X.shape, X_train.shape, X_test.shape)

(2217, 1) (1884, 1) (333, 1)
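
The shapes above are consistent with how `train_test_split` is understood to handle a fractional `test_size`: the test count is rounded up and the remaining rows go to training. A quick arithmetic check, using only the numbers already shown:

```python
import math

n_samples = 2217
test_size = 0.15

# 2217 * 0.15 = 332.55, which rounds up to the test count
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```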


**Training LASSO Regression Model on Training Dataset**

In [12]:
model = LassoRegression(learning_rate=0.001, no_of_iterations=10000, l1_penalty=0.1)

In [13]:
model.fit(X_train, Y_train)

**Model Accuracy Evaluation on Training Dataset**

In [14]:
Y_train_pred = model.predict(X_train)
train_r2_score = r2_score(Y_train, Y_train_pred)
train_mae = mean_absolute_error(Y_train, Y_train_pred)
print("R2 Score on training Dataset: ", train_r2_score)
print("Mean Absolute Error on training Dataset: ", train_mae)

R2 Score on training Dataset:  0.9929560534429169
Mean Absolute Error on training Dataset:  4578.069519561697


**Model Accuracy Evaluation on Testing Dataset**

In [15]:
Y_test_pred = model.predict(X_test)
test_r2_score = r2_score(Y_test, Y_test_pred)
test_mae = mean_absolute_error(Y_test, Y_test_pred)
print("R2 Score on testing Dataset: ", test_r2_score)
print("Mean Absolute Error on testing Dataset: ", test_mae)

R2 Score on testing Dataset:  0.9931912002149403
Mean Absolute Error on testing Dataset:  4436.127637887863
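
For readers who want to see what the two metrics measure, here is a minimal NumPy sketch of the standard definitions of R² and mean absolute error. The helper names `r2` and `mae` and the sample values are hypothetical; the notebook itself relies on scikit-learn's `r2_score` and `mean_absolute_error`.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    return np.mean(np.abs(y_true - y_pred))

# Made-up salaries and predictions, for illustration only
y_true = np.array([40000.0, 60000.0, 80000.0, 100000.0])
y_pred = np.array([42000.0, 58000.0, 81000.0, 99000.0])

print(r2(y_true, y_pred), mae(y_true, y_pred))
```

R² is unitless and compares the model against always predicting the mean, while the mean absolute error is expressed in salary units, so the two numbers answer different questions about the fit.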


**Making a Predictive System**

In [16]:
input_data = [19.1]
prediction = model.predict(np.array([input_data]))
print(f"Predicted Salary for {input_data[0]} yrs of exp is: {prediction[0]:.2f}")

Predicted Salary for 19.1 yrs of exp is: 206159.72
