# Diabetes Dataset Analysis Using Lasso Regression (L1 Regularization)

In this notebook, we will explore the diabetes dataset and apply Lasso regression to understand how different levels of regularization affect the model. The notebook covers the following steps:

1. **Importing Libraries**: We start by importing the necessary libraries for data manipulation, model building, and evaluation.

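2. **Loading the Dataset**: We load the diabetes dataset from `scikit-learn`. It contains ten baseline features (age, sex, BMI, blood pressure, and six blood serum measurements) for 442 patients, with a quantitative measure of disease progression one year after baseline as the target.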

3. **Extracting Features and Target**:
   - We extract the feature matrix (`X`) and the target variable (`y`).

4. **Data Preprocessing**:
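   - We standardize the feature data to have a mean of 0 and a standard deviation of 1. This matters for Lasso regression because the L1 penalty applies the same `alpha` to every coefficient, so features on different scales would otherwise be penalized unevenly.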

5. **Applying Lasso Regression**:
   - We initialize a Lasso regression model with a specific regularization parameter (`alpha`), fit the model on the standardized features, and make predictions.
   - We calculate and display the R² score on the training data to evaluate the model’s fit.

6. **Evaluating Different Alpha Values**:
   - We experiment with different `alpha` values to observe how the regularization strength affects the model’s R² score and the coefficients of the features.

This analysis shows how regularization strength changes the model's fit and its coefficients, which is the basis for choosing an appropriate `alpha` for Lasso regression.


In [9]:
# Import necessary libraries
from sklearn.linear_model import Lasso
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

In [10]:
# Load the diabetes dataset
dataset = load_diabetes()

# Extract features (X) and target (y) from the dataset
X = dataset.data
y = dataset.target

In [11]:
# Standardize the feature data (mean = 0, standard deviation = 1)
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
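
The standardization step can be sanity-checked directly. The following is a minimal sketch, not part of the original notebook; the names `col_means` and `col_stds` are introduced here for illustration. It reloads the data so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

# Reload and standardize the features, as in the cell above
X = load_diabetes().data
X_std = StandardScaler().fit_transform(X)

# Per-column mean and population standard deviation after scaling
col_means = X_std.mean(axis=0)
col_stds = X_std.std(axis=0)

# Both checks compare against the targets up to floating-point tolerance
print("means close to 0:", np.allclose(col_means, 0.0))
print("stds close to 1: ", np.allclose(col_stds, 1.0))
```

`StandardScaler` divides by the population standard deviation, which is also NumPy's default for `std`, so the two computations are directly comparable.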

In [12]:
# Initialize a Lasso regression model with the regularization parameter alpha set to 0.2
# Try other values (e.g. 0.5, 1.5, 3) to see the impact on the R^2 score
regr = Lasso(alpha=0.2)

In [13]:
# Fit the Lasso model on the standardized features and target data
model = regr.fit(X_std, y)

In [14]:
# Predict target values using the fitted model
pred = regr.predict(X_std)

# Display the predicted values
pred

array([204.83683833,  70.17269091, 176.02237593, 163.67900005,
       127.87595467, 105.79854573,  77.42116533, 121.73571947,
       159.90543327, 213.39678066,  98.71681356, 101.61037546,
       114.96987028, 163.14798515, 102.92243203, 173.86234644,
       209.77285251, 182.38939105, 146.30023991, 122.81206413,
       118.39798003,  89.48562254, 115.94840308, 267.72373466,
       163.76645843, 145.47494389,  95.91515769, 177.83524482,
       128.54624517, 184.51485165, 158.74543712,  69.91446651,
       256.7071509 , 109.10184265,  80.32096163,  86.19713657,
       207.00092878, 155.48217387, 241.75001547, 136.58525457,
       152.81007989,  73.43790223, 143.7605098 ,  79.13491462,
       218.47887615, 124.64869745, 140.61838714, 107.34830636,
        77.22231206, 186.4418353 , 157.31479328, 168.29113863,
       133.19045036, 156.67442764, 140.12200959,  73.7701116 ,
       205.15326173,  80.25169477,  99.21099302, 134.88976506,
       114.69340133, 176.47164693,  66.26190012,  99.97

In [15]:
# Calculate the R^2 score on the training data using the model's .score() method
regr.score(X_std, y)

0.5163207426885066
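
The score above is computed on the same data the model was fitted on, so it says little about generalization. As a hedged sketch of one way to check this, the example below holds out a test set and compares the two R² values. The split (`test_size=0.2`, `random_state=42`) and the use of a `Pipeline` are assumptions made for illustration, not choices from the original notebook.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the rows (split parameters are illustrative assumptions)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training rows only,
# then applies the same transformation to the test rows
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.2))
pipe.fit(X_train, y_train)

r2_train = pipe.score(X_train, y_train)
r2_test = pipe.score(X_test, y_test)
print(f"Train R^2: {r2_train:.4f}, Test R^2: {r2_test:.4f}")
```

Putting the scaler inside the pipeline keeps test-set statistics out of the preprocessing step, which is the main reason for this design.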

## Understanding the `alpha` Parameter in Lasso Regression

In Lasso regression, the `alpha` parameter controls the strength of the regularization applied to the model. 
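
For reference, the objective that scikit-learn's `Lasso` minimizes can be written as follows. The symbols are notation chosen here for illustration: $n$ is the number of samples, $w$ the coefficient vector, and $b$ the intercept, which is not penalized.

$$
\min_{w,\,b} \; \frac{1}{2n} \lVert y - Xw - b \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The first term measures the fit to the data, and the second is the L1 penalty whose weight is set by `alpha`. With `alpha = 0` the penalty vanishes and the problem reduces to ordinary least squares.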

- **Low `alpha`**: When `alpha` is small, the regularization effect is weak, allowing the model to fit the training data more closely. This yields more complex models with many non-zero coefficients and a higher risk of overfitting.

- **High `alpha`**: When `alpha` is large, the regularization effect is strong, shrinking coefficients and setting more of them to exactly zero. This simplifies the model by effectively performing feature selection, which can reduce overfitting but leads to underfitting if `alpha` is too large.

Experimenting with several `alpha` values shows where this trade-off lies for a given dataset. Note that the scores in this notebook are computed on the training data, where the fit can only get worse as `alpha` grows; detecting overfitting requires held-out data.
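
Rather than trying values by hand, `alpha` can also be chosen by cross-validation. The sketch below uses scikit-learn's `LassoCV` for this. The candidate grid (`np.logspace(-2, 1, 30)`) and the 5-fold setting are assumptions made for illustration and are not taken from the original notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Candidate alphas from 0.01 to 10 on a log scale (illustrative grid)
alphas = np.logspace(-2, 1, 30)

# LassoCV fits the model for each alpha on each fold and keeps the alpha
# with the lowest mean cross-validated squared error
pipe = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5))
pipe.fit(X, y)

best_alpha = pipe.named_steps["lassocv"].alpha_
print(f"Selected alpha: {best_alpha:.4f}")
```

In this sketch the scaler is fitted once on all rows before the internal cross-validation, which is a simplification; a stricter setup would rescale within each fold.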


In [16]:
# Try different alpha values and compare the fit and the coefficients
for alpha in [0.01, 0.1, 1, 2, 10]:
    model = Lasso(alpha=alpha)
    model.fit(X_std, y)
    # .score() returns the R^2 on the training data (printed as "Accuracy")
    accuracy = model.score(X_std, y)
    print(f"Alpha: {alpha}, Accuracy: {accuracy}, Coefficients: {model.coef_}")

Alpha: 0.01, Accuracy: 0.5177406317436763, Coefficients: [ -0.45318633 -11.38606741  24.73608567  15.40766041 -36.0524547
  21.42406165   4.02985264   8.13661386  35.1475716    3.21328284]
Alpha: 0.1, Accuracy: 0.5173761436545252, Coefficients: [ -0.27769342 -11.15948797  24.85518378  15.2421328  -26.44813964
  13.72566329  -0.           7.05557447  31.57506171   3.1584765 ]
Alpha: 1, Accuracy: 0.5132840923855555, Coefficients: [ -0.          -9.31941253  24.83127631  14.08870568  -4.83892808
  -0.         -10.62279919   0.          24.42081057   2.56212987]
Alpha: 2, Accuracy: 0.509385000432985, Coefficients: [ -0.          -7.56885595  24.6216624   13.17740156  -2.71641858
  -0.         -10.0542958    0.          23.14772448   1.6908785 ]
Alpha: 10, Accuracy: 0.45868609892214773, Coefficients: [ 0.         -0.         22.60037484  6.80123653 -0.         -0.
 -3.08803938  0.         19.58593242  0.        ]
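
The coefficient arrays above are easier to read when mapped back to feature names. The following sketch repeats the same loop and lists which features keep a non-zero coefficient at each `alpha`. The dictionary `kept` is a name introduced here for illustration; `feature_names` comes from the dataset object returned by `load_diabetes`.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

dataset = load_diabetes()
feature_names = dataset.feature_names
X_std = StandardScaler().fit_transform(dataset.data)
y = dataset.target

# For each alpha, record the names of features with a non-zero coefficient
kept = {}
for alpha in [0.01, 0.1, 1, 2, 10]:
    coef = Lasso(alpha=alpha).fit(X_std, y).coef_
    kept[alpha] = [name for name, c in zip(feature_names, coef) if c != 0]
    print(f"alpha={alpha}: {len(kept[alpha])} non-zero -> {kept[alpha]}")
```

Reading the list at the largest `alpha` shows which features the model retains under the strongest penalty, which is the feature-selection effect described earlier.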
