<a href="https://colab.research.google.com/github/sanyamChaudhary27/ML_models_from_scratch/blob/main/MY_RIDGE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [108]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [109]:
class LinearRegression:
    def __init__(self, n_estimators: int = 100, learning_rate: float = 0.1,
                 ridge: bool = False, lambda_param: float = 0.1,
                 descent: str = 'stochastic'):
        self.n_estimators = n_estimators  # Number of passes over the training data
        self.learning_rate = learning_rate
        self.intercept = None
        self.weight = None
        self.ridge = ridge
        self.lambda_param = lambda_param  # L2 penalty strength, used when ridge=True
        self.descent = descent  # 'stochastic' or 'batch'

    def fit(self, x, y):
        if self.ridge and self.descent == 'stochastic':
            print(f"WARNING: Ridge regularization is enabled (ridge=True) but descent is '{self.descent}'. Ridge regularization is only applied with 'batch' gradient descent.")
        # Convert inputs to NumPy arrays so DataFrames and Series also work
        self.X = np.asarray(x)
        # Ensure y is a 1D array for easier calculations
        self.y = np.asarray(y).ravel()
        n_samples, n_features = self.X.shape
        self.weight = np.zeros(n_features)
        self.intercept = 0.0


        # Using Stochastic gradient descent
        if self.descent=='stochastic':
            for _ in range(self.n_estimators):
                # Shuffle indices for random observation
                shuffled_indices = np.random.permutation(n_samples)
                for i in shuffled_indices:
                    # Predict for a single sample
                    preds = self.predict(self.X[i, :])
                    error = preds - self.y[i]

                    # Update weights and intercept
                    self.weight = self.weight - self.learning_rate * (self.X[i, :] * error)
                    self.intercept = self.intercept - self.learning_rate * error

        # Using Batch gradient descent
        elif self.descent == 'batch':
            for _ in range(self.n_estimators):
                # Predict for the entire batch
                preds = self.predict(self.X)
                error = preds - self.y # Error for the entire batch

                # Calculate gradients
                gradient_weights = (2/n_samples) * np.dot(self.X.T, error)
                gradient_intercept = (2/n_samples) * np.sum(error)

                if self.ridge:
                    # Add L2 regularization term to weight gradient
                    gradient_weights += 2 * self.lambda_param * self.weight

                # Update weights and intercept
                self.weight = self.weight - self.learning_rate * gradient_weights
                self.intercept = self.intercept - self.learning_rate * gradient_intercept

        # Reject unsupported descent values instead of silently skipping training
        else:
            raise ValueError(f"Unknown descent '{self.descent}'; expected 'stochastic' or 'batch'.")

    def predict(self, X):
        if self.weight is None:
            raise ValueError('Please call fit() first')
        return self.intercept + np.dot(X, self.weight)
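
The class above skips the L2 penalty when `descent='stochastic'`. As a rough sketch of how the penalty could be folded into the per-sample update, the hypothetical helper `sgd_ridge_epoch` below (not part of the original class) adds `lambda_param * weight` to each weight step. It follows the stochastic branch's convention of absorbing the factor 2 into the learning rate. Applying the full penalty on every sample is an assumption made here, not the author's method, and the data is synthetic.

```python
import numpy as np


def sgd_ridge_epoch(X, y, weight, intercept, learning_rate=0.01,
                    lambda_param=0.1, rng=None):
    """Run one shuffled pass of per-sample updates with an L2 penalty.

    Hypothetical helper for illustration; not part of the original class.
    """
    rng = np.random.default_rng() if rng is None else rng
    weight = weight.copy()
    for i in rng.permutation(X.shape[0]):
        error = intercept + X[i] @ weight - y[i]
        # Data-fit gradient plus the ridge term; the intercept is not penalized
        weight -= learning_rate * (X[i] * error + lambda_param * weight)
        intercept -= learning_rate * error
    return weight, intercept


# Small synthetic check (assumed data, not the housing set)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y_demo = X_demo @ true_w + 0.7

w_plain, b_plain = np.zeros(3), 0.0
w_ridge, b_ridge = np.zeros(3), 0.0
for _ in range(50):
    w_plain, b_plain = sgd_ridge_epoch(X_demo, y_demo, w_plain, b_plain,
                                       lambda_param=0.0, rng=rng)
    w_ridge, b_ridge = sgd_ridge_epoch(X_demo, y_demo, w_ridge, b_ridge,
                                       lambda_param=0.5, rng=rng)

# The penalized weights should have the smaller norm
print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
```

With `lambda_param=0.0` the helper reduces to the update already used in the stochastic branch, so the two runs differ only in the penalty term.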

In [110]:
from sklearn.datasets import fetch_california_housing

In [111]:
# Load the dataset once and unpack features and target together
X, y = fetch_california_housing(return_X_y=True)

In [112]:
X

array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
          37.88      , -122.23      ],
       [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
          37.86      , -122.22      ],
       [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
          37.85      , -122.24      ],
       ...,
       [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
          39.43      , -121.22      ],
       [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
          39.43      , -121.32      ],
       [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
          39.37      , -121.24      ]])

In [113]:
X = pd.DataFrame(X)
Y = pd.DataFrame(y)

In [114]:
X

Unnamed: 0,0,1,2,3,4,5,6,7
0,8.3252,41.0,6.984127,1.023810,322.0,2.555556,37.88,-122.23
1,8.3014,21.0,6.238137,0.971880,2401.0,2.109842,37.86,-122.22
2,7.2574,52.0,8.288136,1.073446,496.0,2.802260,37.85,-122.24
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25
...,...,...,...,...,...,...,...,...
20635,1.5603,25.0,5.045455,1.133333,845.0,2.560606,39.48,-121.09
20636,2.5568,18.0,6.114035,1.315789,356.0,3.122807,39.49,-121.21
20637,1.7000,17.0,5.205543,1.120092,1007.0,2.325635,39.43,-121.22
20638,1.8672,18.0,5.329513,1.171920,741.0,2.123209,39.43,-121.32


In [115]:
fetch_california_housing()

{'data': array([[   8.3252    ,   41.        ,    6.98412698, ...,    2.55555556,
           37.88      , -122.23      ],
        [   8.3014    ,   21.        ,    6.23813708, ...,    2.10984183,
           37.86      , -122.22      ],
        [   7.2574    ,   52.        ,    8.28813559, ...,    2.80225989,
           37.85      , -122.24      ],
        ...,
        [   1.7       ,   17.        ,    5.20554273, ...,    2.3256351 ,
           39.43      , -121.22      ],
        [   1.8672    ,   18.        ,    5.32951289, ...,    2.12320917,
           39.43      , -121.32      ],
        [   2.3886    ,   16.        ,    5.25471698, ...,    2.61698113,
           39.37      , -121.24      ]]),
 'target': array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894]),
 'frame': None,
 'target_names': ['MedHouseVal'],
 'feature_names': ['MedInc',
  'HouseAge',
  'AveRooms',
  'AveBedrms',
  'Population',
  'AveOccup',
  'Latitude',
  'Longitude'],
 'DESCR': '.. _california_housing_dataset:\n

In [116]:
X.columns = fetch_california_housing().feature_names
Y.columns = fetch_california_housing().target_names

In [117]:
X

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude
0,8.3252,41.0,6.984127,1.023810,322.0,2.555556,37.88,-122.23
1,8.3014,21.0,6.238137,0.971880,2401.0,2.109842,37.86,-122.22
2,7.2574,52.0,8.288136,1.073446,496.0,2.802260,37.85,-122.24
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25
...,...,...,...,...,...,...,...,...
20635,1.5603,25.0,5.045455,1.133333,845.0,2.560606,39.48,-121.09
20636,2.5568,18.0,6.114035,1.315789,356.0,3.122807,39.49,-121.21
20637,1.7000,17.0,5.205543,1.120092,1007.0,2.325635,39.43,-121.22
20638,1.8672,18.0,5.329513,1.171920,741.0,2.123209,39.43,-121.32


In [118]:
from sklearn.preprocessing import StandardScaler
Scaler = StandardScaler()

In [119]:
X_scaled=Scaler.fit_transform(X)

In [120]:
from sklearn.model_selection import train_test_split

In [121]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size = 0.2, random_state = 42)
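
The cells above fit the scaler on the full dataset before splitting, so test-set statistics influence the scaling. A common alternative is to split first and compute the scaling statistics from the training rows only. The sketch below illustrates that order with plain NumPy on synthetic data. `split_then_standardize` is a hypothetical helper and its shuffling will not reproduce the rows chosen by `train_test_split`. In the notebook the same effect would come from calling `Scaler.fit_transform` on the training split and `Scaler.transform` on the test split.

```python
import numpy as np


def split_then_standardize(X, y, test_size=0.2, seed=42):
    """Split rows first, then scale both parts with training statistics only.

    Hypothetical helper for illustration; a NumPy stand-in for the
    train_test_split + StandardScaler combination used in the notebook.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(X.shape[0])
    n_test = int(round(test_size * X.shape[0]))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns

    X_train = (X[train_idx] - mean) / std
    X_test = (X[test_idx] - mean) / std  # reuse the training statistics
    return X_train, X_test, y[train_idx], y[test_idx]


# Synthetic stand-in data (assumed; not the housing set)
rng = np.random.default_rng(1)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
y_demo = rng.normal(size=100)

X_tr, X_te, y_tr, y_te = split_then_standardize(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

The training columns end up with zero mean and unit variance, while the test columns are only approximately standardized, which is the intended behavior.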

In [122]:
model1 = LinearRegression(n_estimators=10000, learning_rate=0.01, descent='batch')

In [123]:
model2 = LinearRegression(n_estimators=10000, learning_rate=0.01, descent='batch', ridge=True)

In [124]:
model1.fit(X_train, y_train)
model2.fit(X_train, y_train)

In [125]:
preds = model1.predict(X_test)
ridge_preds = model2.predict(X_test)

In [126]:
from sklearn.metrics import mean_squared_error

In [127]:
def rmse(y_preds, y_test):
    # mean_squared_error expects (y_true, y_pred); MSE is symmetric, so the value is unchanged
    return np.sqrt(mean_squared_error(y_test, y_preds))

### ROOT MEAN SQUARED ERROR OF NON-L2-REGULARIZED MODEL

In [128]:
print(rmse(preds, y_test))

0.7455822822725463


### ROOT MEAN SQUARED ERROR OF L2-REGULARIZED MODEL

In [129]:
print(rmse(ridge_preds, y_test))

0.7631447120121134


RMSE (Root Mean Squared Error) is the square root of the average squared difference between a model's predictions and the actual values, expressed in the same units as the target (here, median house value in hundreds of thousands of US dollars). A lower RMSE means the predictions are closer to the actual values.
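
Written out, with $y_i$ the actual value, $\hat{y}_i$ the prediction, and $n$ the number of test samples (symbols introduced here for illustration):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
$$

As a small made-up example, predictions $(1, 2, 5)$ against actual values $(1, 2, 3)$ give squared errors $(0, 0, 4)$, a mean of $4/3$, and an RMSE of $\sqrt{4/3} \approx 1.155$.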

Here, the ridge-regularized model has an RMSE of 0.763, slightly higher than the non-regularized model's 0.746. Several factors can explain this:

- Bias-Variance Trade-off: Ridge regression introduces a penalty term (L2 regularization) to shrink the model's coefficients towards zero. This is done to reduce the model's variance and prevent overfitting, especially when dealing with multicollinearity or a large number of features. However, by adding this penalty, ridge regression also introduces a small amount of bias into the model.

- Optimal Lambda (lambda_param): The lambda_param value controls the strength of the regularization. If it is set too high, the model is over-penalized and underfits: it has high bias and a higher RMSE even on the test set, because it is too simple to capture the underlying patterns in the data. Here, model2 uses the default lambda_param of 0.1, which was not tuned.

- No Overfitting in the Baseline Model: The non-regularized LinearRegression model was probably not overfitting much to begin with, since it fits only 8 features on 16,512 training rows. In such cases, regularization has little variance to remove and can slightly increase the test error by introducing unnecessary bias.

- Dataset Characteristics: Some datasets might not inherently benefit much from L2 regularization, especially if multicollinearity is not a major issue or if the relationships are relatively simple.
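
To make the point about penalty strength concrete, the batch branch of `fit` with `ridge=True` is gradient descent on the following objective, read off from its gradient code. The symbols $J$, $w$, $b$, $n$, and $\lambda$ (for `lambda_param`) are notation introduced here:

$$
J(w, b) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i^\top w + b - y_i\right)^2 + \lambda \lVert w \rVert_2^2
$$

$$
\nabla_w J = \frac{2}{n} X^\top (Xw + b - y) + 2\lambda w,
\qquad
\frac{\partial J}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}\left(x_i^\top w + b - y_i\right)
$$

Multiplying $J$ by $n$ leaves the minimizer unchanged and gives the sum-of-squares form, the convention scikit-learn's `Ridge` uses for its `alpha`:

$$
n\,J(w, b) = \sum_{i=1}^{n}\left(x_i^\top w + b - y_i\right)^2 + n\lambda \lVert w \rVert_2^2
$$

With $n = 16{,}512$ training rows and $\lambda = 0.1$, the equivalent sum-of-squares penalty is $n\lambda \approx 1651$. In the idealized case of centered, uncorrelated, unit-variance features, each coefficient would shrink to $1/(1+\lambda) \approx 0.91$ of its least-squares value. The housing features are correlated, so this is only a rough guide to how much bias the default setting introduces.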

**To find the optimal lambda_param for ridge regression, you would typically use cross-validation on the training data**, keeping the test set for the final evaluation. This helps you select a lambda_param that balances bias and variance for better generalization.
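
As a rough illustration of that procedure, the sketch below runs k-fold cross-validation over a few candidate penalties on synthetic data. To keep it fast, it solves ridge in closed form instead of by gradient descent, using the same objective as the batch branch (mean squared error plus `lambda_param` times the squared weight norm). The helpers `ridge_closed_form` and `cv_rmse`, the candidate grid, and the data are all assumptions made for illustration.

```python
import numpy as np


def ridge_closed_form(X, y, lambda_param):
    """Minimize mean squared error + lambda_param * ||w||^2 (intercept unpenalized)."""
    n_samples, n_features = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Normal equations of the centered problem: (Xc^T Xc + n*lambda*I) w = Xc^T yc
    A = Xc.T @ Xc + n_samples * lambda_param * np.eye(n_features)
    weight = np.linalg.solve(A, Xc.T @ yc)
    intercept = y_mean - x_mean @ weight
    return weight, intercept


def cv_rmse(X, y, lambda_param, n_folds=5, seed=0):
    """Average validation RMSE over n_folds shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    scores = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w, b = ridge_closed_form(X[train_idx], y[train_idx], lambda_param)
        residual = X[val_idx] @ w + b - y[val_idx]
        scores.append(np.sqrt(np.mean(residual ** 2)))
    return float(np.mean(scores))


# Synthetic training data (assumed; substitute X_train, y_train in the notebook)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 8))
y_demo = X_demo @ rng.normal(size=8) + rng.normal(scale=0.5, size=300)

candidates = [0.0, 0.001, 0.01, 0.1, 1.0]
results = {lam: cv_rmse(X_demo, y_demo, lam) for lam in candidates}
best_lambda = min(results, key=results.get)
print(results, best_lambda)
```

In the notebook, the folds would be drawn from `X_train` and `y_train` only, and the chosen value would then be passed as `lambda_param` before the final comparison on `X_test`.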