### ***Regularisation***
In linear regression, regularisation is a technique used to prevent overfitting by adding a penalty term to the loss function. This penalty discourages the model from fitting the noise in the training data, leading to better generalisation on unseen data. Intuitively, very large slope (coefficient) values of the best-fit line often signal overfitting, while slopes shrunk too far towards zero lead to underfitting. Regularisation keeps the slope values in check. The strength of the penalty is set by a hyperparameter called lambda (λ), which sklearn exposes as `alpha`.

`Formula:` ***(loss function) + λ * (slope ^ 2)***

With more than one feature, the penalty adds up the squared slopes (coefficients) of all features.


When the slope is large, the penalty term (λ * (slope ^ 2)) becomes significant and increases the overall loss. To minimise the loss, the model is pushed to reduce the slope, which gives a simpler model that is less likely to overfit. Conversely, when the slope is small, the penalty term has little impact on the loss, so the model can fit the data closely without being held back by the regularisation.
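To make the formula concrete, here is a minimal sketch for a single feature. It assumes mean squared error as the loss function; the helper name `ridge_loss` and the toy data are invented for illustration and are not part of the notebook.

```python
import numpy as np

def ridge_loss(X, y, slope, intercept, lam):
    """Mean squared error plus the penalty lam * slope**2 (hypothetical helper)."""
    residuals = y - (slope * X + intercept)
    mse = np.mean(residuals ** 2)   # assumed loss function: MSE
    penalty = lam * slope ** 2      # the λ * (slope ^ 2) term
    return mse + penalty

# Toy data invented for illustration: y is exactly 2 * x
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

for lam in [0.0, 1.0, 10.0]:
    exact = ridge_loss(X, y, slope=2.0, intercept=0.0, lam=lam)
    smaller = ridge_loss(X, y, slope=1.5, intercept=0.0, lam=lam)
    print(f"lambda={lam}: slope 2.0 -> {exact:.3f}, slope 1.5 -> {smaller:.3f}")
```

With λ = 0 the exact slope 2.0 has the lower loss, but with λ = 10 the smaller slope 1.5 has the lower total even though it fits the data worse. This is the trade-off the penalty introduces.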

In [10]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_diabetes
import matplotlib.pyplot as plt

In [11]:
data = load_diabetes()
print(data.DESCR)
data

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

:Number of Instances: 442

:Number of Attributes: First 10 columns are numeric predictive values

:Target: Column 11 is a quantitative measure of disease progression one year after baseline

:Attribute Information:
    - age     age in years
    - sex
    - bmi     body mass index
    - bp      average blood pressure
    - s1      tc, total serum cholesterol
    - s2      ldl, low-density lipoproteins
    - s3      hdl, high-density lipoproteins
    - s4      tch, total cholesterol / HDL
    - s5      ltg, possibly log of serum triglycerides level
    - s6      glu, blood sugar level

Note: Each of these 10 feature variables have bee

{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
          0.01990749, -0.01764613],
        [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
         -0.06833155, -0.09220405],
        [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
          0.00286131, -0.02593034],
        ...,
        [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
         -0.04688253,  0.01549073],
        [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
          0.04452873, -0.02593034],
        [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
         -0.00422151,  0.00306441]], shape=(442, 10)),
 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
         69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
         68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
         87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
        259.,  53., 190., 142.,  75., 142.

In [12]:
from sklearn.model_selection import train_test_split
X = data.data
Y = data.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

((353, 10), (89, 10), (353,), (89,))

In [21]:
from sklearn.linear_model import LinearRegression, Ridge
model_lr = LinearRegression()
model_rr = Ridge(alpha=10000) # high alpha = high regularisation

model_lr.fit(X_train, Y_train)
model_rr.fit(X_train, Y_train)

pred_lr = model_lr.predict(X_test)
pred_rr = model_rr.predict(X_test)

# score() returns the R^2 (coefficient of determination) on the test set
score_lr = model_lr.score(X_test, Y_test)
score_rr = model_rr.score(X_test, Y_test)

print(f"""
      Linear Regression: {score_lr}
      Ridge Regression: {score_rr}
      """)


      Linear Regression: 0.4526027629719197
      Ridge Regression: -0.011710971590750185
      


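- An alpha that is too high makes the model underfit the data: here the Ridge R² score drops to about -0.01, compared with about 0.45 for plain linear regression.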
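One way to see why the heavily regularised model underfits is to look at the size of the fitted coefficients. The sketch below repeats the same split and models as above and compares coefficient norms; the variable names `norm_lr` and `norm_rr` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

X, Y = load_diabetes(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

model_lr = LinearRegression().fit(X_train, Y_train)
model_rr = Ridge(alpha=10000).fit(X_train, Y_train)

# Length of the coefficient vector: a heavy penalty shrinks it towards zero
norm_lr = np.linalg.norm(model_lr.coef_)
norm_rr = np.linalg.norm(model_rr.coef_)
print(f"LinearRegression coefficient norm:  {norm_lr:.3f}")
print(f"Ridge(alpha=10000) coefficient norm: {norm_rr:.3f}")

# With near-zero slopes, the predictions hardly vary from sample to sample
print(f"Spread of Ridge predictions: {model_rr.predict(X_test).std():.3f}")
print(f"Spread of actual targets:    {Y_test.std():.3f}")
```

When the slopes are squeezed this close to zero, the model predicts nearly the same value for every patient, which is consistent with an R² score near zero.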

### ***Alpha Tuning***


In [26]:
alphas = [0, 0.05, 0.1, 0.1005, 0.2, 0.21, 0.5, 1, 1.5, 2, 10]
scores = []

for alpha in alphas:
    temp_rr = Ridge(alpha=alpha)
    temp_rr.fit(X_train, Y_train)
    scores.append({alpha: temp_rr.score(X_test, Y_test)})
    
scores

[{0: 0.45260276297192004},
 {0.05: 0.4589982185720506},
 {0.1: 0.46085219464119254},
 {0.1005: 0.46086372555329524},
 {0.2: 0.46113836892116755},
 {0.21: 0.4609870958472705},
 {0.5: 0.4493973121295206},
 {1: 0.41915292635986545},
 {1.5: 0.38905442818451164},
 {2: 0.36176864948181187},
 {10: 0.161225867509881}]

- Among the values tried, an alpha of 0.2 gives the best test score (about 0.461) in this case.
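Rather than reading the best value off the printed list, the search can be sketched with a single dictionary and `max`. This follows the notebook's approach of scoring on the test split; the names `scores_by_alpha` and `best_alpha` are assumptions made for this example.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, Y = load_diabetes(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

alphas = [0, 0.05, 0.1, 0.1005, 0.2, 0.21, 0.5, 1, 1.5, 2, 10]

# Map each alpha to its R^2 score on the test split
scores_by_alpha = {}
for alpha in alphas:
    temp_rr = Ridge(alpha=alpha).fit(X_train, Y_train)
    scores_by_alpha[alpha] = temp_rr.score(X_test, Y_test)

# Pick the alpha whose score is highest
best_alpha = max(scores_by_alpha, key=scores_by_alpha.get)
print(f"Best alpha: {best_alpha} (R^2 = {scores_by_alpha[best_alpha]:.4f})")
```

Because alpha is chosen on the same split that is used to report the score, the reported score is likely to be slightly optimistic. A separate validation split or cross-validation is a common alternative.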