Q1. What is Lasso Regression, and how does it differ from other regression techniques?

Lasso Regression, short for "Least Absolute Shrinkage and Selection Operator," is a type of linear regression that includes regularization. Regularization is a technique used to prevent overfitting by adding a penalty to the model's complexity.

### Key Features of Lasso Regression
1. **L1 Regularization**: Lasso regression adds a penalty equal to the sum of the absolute values of the coefficients, multiplied by a tuning parameter (λ). This is known as L1 regularization.
2. **Feature Selection**: Because of the L1 penalty, Lasso can shrink some of the coefficients to exactly zero, effectively performing feature selection. This makes it useful for models with many features, as it can simplify the model by keeping only the most significant variables.
3. **Overfitting Prevention**: By penalizing large coefficients, Lasso helps in preventing overfitting, especially when the dataset has many features or the model is complex.
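
For reference, the penalized objective described in the list above can be written out explicitly. The scaling below (the factor $1/(2n)$ on the squared-error term) follows the convention used by scikit-learn's `Lasso`, which is an assumption about which convention the reader wants; other texts drop or change that constant, which only rescales λ.

$$
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^2 \;+\; \lambda \sum_{j=1}^{p} |\beta_j|
$$

Setting λ = 0 recovers ordinary least squares, and the intercept $\beta_0$ is not penalized.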

### Differences from Other Regression Techniques

1. **Ridge Regression**:
   - **Penalty**: Ridge regression uses L2 regularization, which adds a penalty equal to the sum of the squared coefficients, multiplied by λ. Unlike Lasso, Ridge shrinks coefficients towards zero but does not set them exactly to zero.
   - **Feature Selection**: Ridge does not perform feature selection; it tends to keep all features with small coefficients, even if their influence on the outcome is minor.

2. **Elastic Net**:
   - **Combination of L1 and L2 Penalties**: Elastic Net combines the penalties of Lasso and Ridge, adding both the absolute values and the squares of the coefficients to the cost function. This makes Elastic Net a more flexible regularization technique.
   - **Feature Selection**: Elastic Net can also perform feature selection but generally keeps more features than Lasso due to the combined penalty.

3. **Ordinary Least Squares (OLS) Regression**:
   - **No Regularization**: OLS does not include any regularization term; it minimizes only the sum of squared residuals. This can lead to overfitting, especially with a large number of features.
   - **No Feature Selection**: OLS includes all features in the model, regardless of their significance.

4. **Principal Component Regression (PCR)**:
   - **Dimensionality Reduction**: PCR reduces the dimensionality of the data by projecting it onto a lower-dimensional space using Principal Component Analysis (PCA). The regression is then performed on these principal components rather than the original features.
   - **No Feature Selection**: PCR does not explicitly select features but transforms the data into a different set of variables.

Lasso Regression's ability to perform feature selection and prevent overfitting makes it a powerful tool in scenarios where interpretability and model simplicity are crucial.
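
The contrast between these techniques can be seen by counting how many coefficients each one sets exactly to zero. The sketch below is illustrative only: the synthetic dataset, the `alpha` values, and the `l1_ratio` are assumptions chosen for the demonstration, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic data: 20 features, only 5 of which actually drive the target.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=1.0, random_state=0
)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

# Fit each model and count coefficients that are exactly zero.
zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    zero_counts[name] = int(np.sum(model.coef_ == 0))
    print(f"{name:>10}: {zero_counts[name]} of {X.shape[1]} coefficients are exactly zero")
```

OLS and Ridge keep every feature, while the L1 term in Lasso removes features outright.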

Q2. What is the main advantage of using Lasso Regression in feature selection?

The main advantage of using **Lasso Regression** for feature selection is its ability to **automatically select important features** by shrinking the coefficients of less important ones to zero. This results in a simpler, more interpretable model²⁵. Here are some key benefits:

1. **Automatic Feature Selection**: Lasso can effectively exclude irrelevant or redundant features by setting their coefficients to zero⁶.
2. **Reduced Overfitting**: By adding a penalty term, Lasso helps prevent overfitting, especially in high-dimensional datasets³.
3. **Improved Model Interpretability**: With fewer features, the model becomes easier to understand and interpret⁵.



In [4]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler

# Load the California Housing dataset
california = fetch_california_housing()
X = pd.DataFrame(california.data, columns=california.feature_names)
y = california.target

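# Standardize the features
# Note: the scaler is fit on the full dataset here, before the split, so
# test-set statistics leak into the scaling. Fitting it on the training split
# only (as the later examples do) avoids this.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)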

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)


# Initialize and fit the Lasso model
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Get the coefficients
lasso_coefficients = pd.Series(lasso.coef_, index=X.columns)

# Print the coefficients
print("Lasso Coefficients:")
print(lasso_coefficients)

# Select features with non-zero coefficients
selected_features = lasso_coefficients[lasso_coefficients != 0].index.tolist()
print("\nSelected Features:")
print(selected_features)


Lasso Coefficients:
MedInc        0.706740
HouseAge      0.106614
AveRooms     -0.000000
AveBedrms     0.000000
Population   -0.000000
AveOccup     -0.000000
Latitude     -0.010394
Longitude    -0.000000
dtype: float64

Selected Features:
['MedInc', 'HouseAge', 'Latitude']


Q3. How do you interpret the coefficients of a Lasso Regression model?

Interpreting the coefficients of a Lasso Regression model involves understanding how the regularization affects the model. Here are the key points:

1. **Shrinkage and Selection**: Lasso Regression applies a penalty to the absolute values of the coefficients, which can shrink some of them to exactly zero. This means Lasso performs both variable selection and regularization, making it useful for models with many predictors².

2. **Magnitude and Sign of Coefficients**: The non-zero coefficients identify the variables the model retained. The sign gives the direction of the relationship between the predictor and the response, and, when the features are standardized, the magnitude reflects its relative strength². A non-zero coefficient is not by itself a test of statistical significance.

3. **Interpretation Similar to Linear Regression**: For the non-zero coefficients, the interpretation is similar to that in ordinary least squares (OLS) regression. A coefficient represents the change in the response variable for a one-unit change in the predictor variable, holding all other predictors constant¹. When the features have been standardized, one unit corresponds to one standard deviation of the original feature.

4. **Bias-Variance Tradeoff**: By introducing a penalty, Lasso reduces the variance of the model at the cost of introducing some bias. This tradeoff can lead to better model performance on new data².

Here's a simple example in Python using the `sklearn` library:




In [9]:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Load the dataset
diamonds = sns.load_dataset('diamonds')

# 2. Preprocess the data
# Remove rows with missing values
diamonds.dropna(inplace=True)


# Convert categorical columns to numeric using one-hot encoding
diamonds_encoded = pd.get_dummies(diamonds, drop_first=True)


In [10]:
diamonds_encoded

Unnamed: 0,carat,depth,table,price,x,y,z,cut_Premium,cut_Very Good,cut_Good,...,color_H,color_I,color_J,clarity_VVS1,clarity_VVS2,clarity_VS1,clarity_VS2,clarity_SI1,clarity_SI2,clarity_I1
0,0.23,61.5,55.0,326,3.95,3.98,2.43,False,False,False,...,False,False,False,False,False,False,False,False,True,False
1,0.21,59.8,61.0,326,3.89,3.84,2.31,True,False,False,...,False,False,False,False,False,False,False,True,False,False
2,0.23,56.9,65.0,327,4.05,4.07,2.31,False,False,True,...,False,False,False,False,False,True,False,False,False,False
3,0.29,62.4,58.0,334,4.20,4.23,2.63,True,False,False,...,False,True,False,False,False,False,True,False,False,False
4,0.31,63.3,58.0,335,4.34,4.35,2.75,False,False,True,...,False,False,True,False,False,False,False,False,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53935,0.72,60.8,57.0,2757,5.75,5.76,3.50,False,False,False,...,False,False,False,False,False,False,False,True,False,False
53936,0.72,63.1,55.0,2757,5.69,5.75,3.61,False,False,True,...,False,False,False,False,False,False,False,True,False,False
53937,0.70,62.8,60.0,2757,5.66,5.68,3.56,False,True,False,...,False,False,False,False,False,False,False,True,False,False
53938,0.86,61.0,58.0,2757,6.15,6.12,3.74,True,False,False,...,True,False,False,False,False,False,False,False,True,False


In [12]:
diamonds

Unnamed: 0,carat,cut,color,clarity,depth,table,price,x,y,z
0,0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
1,0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
2,0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
3,0.29,Premium,I,VS2,62.4,58.0,334,4.20,4.23,2.63
4,0.31,Good,J,SI2,63.3,58.0,335,4.34,4.35,2.75
...,...,...,...,...,...,...,...,...,...,...
53935,0.72,Ideal,D,SI1,60.8,57.0,2757,5.75,5.76,3.50
53936,0.72,Good,D,SI1,63.1,55.0,2757,5.69,5.75,3.61
53937,0.70,Very Good,D,SI1,62.8,60.0,2757,5.66,5.68,3.56
53938,0.86,Premium,H,SI2,61.0,58.0,2757,6.15,6.12,3.74


In [13]:
# Separate features and target
X = diamonds_encoded.drop('price', axis=1)
y = diamonds_encoded['price']


In [15]:
y.head()

Unnamed: 0,price
0,326
1,326
2,327
3,334
4,335


In [16]:
X.head()

Unnamed: 0,carat,depth,table,x,y,z,cut_Premium,cut_Very Good,cut_Good,cut_Fair,...,color_H,color_I,color_J,clarity_VVS1,clarity_VVS2,clarity_VS1,clarity_VS2,clarity_SI1,clarity_SI2,clarity_I1
0,0.23,61.5,55.0,3.95,3.98,2.43,False,False,False,False,...,False,False,False,False,False,False,False,False,True,False
1,0.21,59.8,61.0,3.89,3.84,2.31,True,False,False,False,...,False,False,False,False,False,False,False,True,False,False
2,0.23,56.9,65.0,4.05,4.07,2.31,False,False,True,False,...,False,False,False,False,False,True,False,False,False,False
3,0.29,62.4,58.0,4.2,4.23,2.63,True,False,False,False,...,False,True,False,False,False,False,True,False,False,False
4,0.31,63.3,58.0,4.34,4.35,2.75,False,False,True,False,...,False,False,True,False,False,False,False,False,True,False


In [19]:
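# 3. Split the data
# (The split cell is not shown in this export; test_size and random_state are assumed.)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Standardize the features (fit on the training split only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)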


In [23]:
# 5. Apply Lasso Regression
# Set alpha (regularization strength) parameter; tune this value as needed
alpha = 0.1
lasso = Lasso(alpha=alpha)


# Fit the model
lasso.fit(X_train_scaled, y_train)


In [24]:
# 6. Interpretation of Coefficients
coefficients = lasso.coef_
features = X.columns


In [26]:
features

Index(['carat', 'depth', 'table', 'x', 'y', 'z', 'cut_Premium',
       'cut_Very Good', 'cut_Good', 'cut_Fair', 'color_E', 'color_F',
       'color_G', 'color_H', 'color_I', 'color_J', 'clarity_VVS1',
       'clarity_VVS2', 'clarity_VS1', 'clarity_VS2', 'clarity_SI1',
       'clarity_SI2', 'clarity_I1'],
      dtype='object')

In [27]:
print("Lasso Regression Coefficients:")
for feature, coef in zip(features, coefficients):
    print(f"{feature}: {coef}")


Lasso Regression Coefficients:
carat: 5335.154338575525
depth: -92.72231349314123
table: -59.36071783977903
x: -1125.2451819897985
y: -3.8015437658953375
z: -25.899181350699873
cut_Premium: -33.36104478671528
cut_Very Good: -45.27259259596134
cut_Good: -76.44179139454731
cut_Fair: -145.330377043797
color_E: -83.49295455559671
color_F: -106.0053396754168
color_G: -200.54254092242965
color_H: -361.1399576492727
color_I: -441.2338780343329
color_J: -525.1644284700458
clarity_VVS1: -86.5999289229059
clarity_VVS2: -116.75371030797336
clarity_VS1: -279.2373689830957
clarity_VS2: -458.8014816490574
clarity_SI1: -722.887989372465
clarity_SI2: -996.55338347122
clarity_I1: -619.6018012474327


In [28]:
# 7. Check which features were selected (non-zero coefficients)
selected_features = [feature for feature, coef in zip(features, coefficients) if coef != 0]
print("\nSelected Features (Non-zero coefficients):", selected_features)



Selected Features (Non-zero coefficients): ['carat', 'depth', 'table', 'x', 'y', 'z', 'cut_Premium', 'cut_Very Good', 'cut_Good', 'cut_Fair', 'color_E', 'color_F', 'color_G', 'color_H', 'color_I', 'color_J', 'clarity_VVS1', 'clarity_VVS2', 'clarity_VS1', 'clarity_VS2', 'clarity_SI1', 'clarity_SI2', 'clarity_I1']


In [29]:
# 8. Predict on the test set
y_pred = lasso.predict(X_test_scaled)


In [30]:
y_pred

array([ 711.25497237, 3193.03731072, 1947.5894765 , ...,  882.50656234,
       8708.94267169, 3062.46919031])

In [31]:
# 9. Evaluate the model
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"\nMean Squared Error: {mse}")
print(f"R^2 Score: {r2}")



Mean Squared Error: 1288614.8864275338
R^2 Score: 0.9189388337550065
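
The coefficients printed above are on the standardized scale, so each one describes the effect of a one-standard-deviation change. A minimal sketch of converting such coefficients back to the original units follows. It uses synthetic data rather than the diamonds dataset, and the names `coef_original` and `intercept_original` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (assumed data for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)

# Undo the standardization: divide by each feature's standard deviation,
# then shift the intercept to account for the subtracted means.
coef_original = lasso.coef_ / scaler.scale_
intercept_original = lasso.intercept_ - np.sum(coef_original * scaler.mean_)

# Both parameterizations give the same predictions.
pred_scaled = lasso.predict(X_scaled)
pred_original = X @ coef_original + intercept_original
print("Standardized coefficients: ", lasso.coef_)
print("Original-unit coefficients:", coef_original)
```

The original-unit coefficients read as "change in the response per one unit of the raw feature", which is usually what a reader expects from a regression table.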


Q4. What are the tuning parameters that can be adjusted in Lasso Regression, and how do they affect the model's performance?

In Lasso Regression, the primary tuning parameter is the **regularization parameter** (often denoted as α or lambda). Adjusting this parameter can significantly affect the model's performance. Here’s a detailed look at the key tuning parameters and how they impact the model:

### 1. **Regularization Parameter (α or lambda)**

- **Description**: The regularization parameter controls the strength of the L1 penalty applied to the model coefficients. It determines how much the model's coefficients are shrunk towards zero.

- **Effect on Model**:
  - **High α**: A large α value increases the penalty on the coefficients. This results in more coefficients being pushed to exactly zero, leading to a sparser model with fewer features. While this can help in feature selection and reduce overfitting, it may also cause underfitting if too many coefficients are shrunk to zero.
  - **Low α**: A small α value decreases the penalty on the coefficients, allowing the model to retain more features and fit the training data more closely. However, this may lead to overfitting if the model becomes too complex.
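
The effect of α on sparsity can be checked by fitting the model at several penalty strengths and counting the surviving coefficients. This is a sketch on synthetic data; the grid of `alphas` is an arbitrary assumption, with the largest value picked to be far above anything this dataset needs.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)
X = StandardScaler().fit_transform(X)

alphas = [0.01, 0.1, 1.0, 10.0, 1000.0]
nonzero_counts = []
for alpha in alphas:
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    # Count the coefficients that survived the penalty.
    nonzero_counts.append(int(np.count_nonzero(lasso.coef_)))
    print(f"alpha={alpha:<8} non-zero coefficients: {nonzero_counts[-1]}")
```

At the largest α every coefficient is zero and the model predicts the mean of `y`, which is the underfitting extreme described above.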

### 2. **Feature Scaling**

- **Description**: Although not a parameter in Lasso itself, **feature scaling** (standardization or normalization) is crucial when applying Lasso Regression. It ensures that the L1 penalty is applied uniformly across all features.

- **Effect on Model**:
  - **Without Scaling**: Features with larger scales will disproportionately influence the regularization term, which can lead to biased coefficient shrinkage.
  - **With Scaling**: Ensures that all features contribute equally to the penalty, allowing for more balanced feature selection.

### 3. **Normalization**

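- **Description**: Older versions of scikit-learn exposed a `normalize` option on `Lasso` that rescaled the features before applying the L1 penalty. That option was deprecated in scikit-learn 1.0 and removed in 1.2; the current recommendation is to scale the features explicitly (for example with `StandardScaler`) before fitting.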

- **Effect on Model**:
  - **Normalization**: It ensures that each feature contributes equally to the penalty term, which can improve the effectiveness of Lasso in regularizing the model.
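
One common way to make sure scaling is applied consistently is to bundle the scaler and the Lasso estimator into a single pipeline, so the scaler is fit on the training data only and reused at prediction time. The sketch below assumes synthetic data and an arbitrary `alpha=0.1`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline standardizes the features, then fits Lasso on the scaled values.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X_train, y_train)

# score() scales X_test with the training statistics before predicting.
test_r2 = model.score(X_test, y_test)
print(f"Test R^2: {test_r2:.3f}")
```

Because the scaler lives inside the pipeline, the same object can be passed to cross-validation utilities without leaking test-fold statistics into the scaling step.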

### 4. **Max Iterations**

- **Description**: This parameter determines the maximum number of iterations the optimization algorithm will run to find the optimal coefficients.

- **Effect on Model**:
  - **Insufficient Iterations**: If the maximum number of iterations is too low, the algorithm may not converge to the optimal solution, leading to suboptimal model performance.
  - **Excessive Iterations**: Increasing iterations generally ensures convergence but can lead to longer computation times.

### 5. **Tolerance**

- **Description**: The tolerance parameter sets a threshold for the convergence criterion. It determines when the optimization algorithm should stop iterating if changes in the solution are below this threshold.

- **Effect on Model**:
  - **Low Tolerance**: Ensures higher precision in the solution but may increase computation time.
  - **High Tolerance**: Can lead to faster convergence but may result in a less accurate solution.
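
The interaction between `max_iter` and `tol` can be observed through the fitted attribute `n_iter_`, which reports how many passes the solver actually ran. The sketch below uses deliberately correlated synthetic features (an assumption made so that convergence is slow) and records whether scikit-learn emitted a `ConvergenceWarning`.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
base = rng.normal(size=(300, 1))
# Strongly correlated columns tend to slow coordinate descent down.
X = base + 0.1 * rng.normal(size=(300, 8))
y = X @ rng.normal(size=8) + rng.normal(size=300)

# A very small iteration budget with a strict tolerance.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rushed = Lasso(alpha=0.001, max_iter=2, tol=1e-8).fit(X, y)
hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)

# The same tolerance with a generous iteration budget.
patient = Lasso(alpha=0.001, max_iter=100000, tol=1e-8).fit(X, y)

print(f"max_iter=2:      n_iter_={rushed.n_iter_}, convergence warning: {hit_limit}")
print(f"max_iter=100000: n_iter_={patient.n_iter_}")
```

If a warning appears in practice, raising `max_iter`, loosening `tol`, or scaling the features are the usual remedies.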



### Summary

- **Regularization Parameter (α)**: Controls the strength of the L1 penalty. Adjusting it affects model sparsity and performance.
- **Feature Scaling**: Ensures uniform application of the penalty across features.
- **Normalization**: Puts features on an equal footing for the penalty; in current scikit-learn this is done with an explicit scaler rather than an estimator option.
- **Max Iterations**: Controls the number of iterations for convergence.
- **Tolerance**: Sets the convergence criterion.

Tuning these parameters helps in finding the balance between fitting the training data well and generalizing to new, unseen data.

Q5. Can Lasso Regression be used for non-linear regression problems? If yes, how?

Yes, Lasso Regression can be adapted for non-linear regression problems by incorporating **feature engineering** techniques to transform the original features into a higher-dimensional space where linear relationships can approximate the non-linear patterns. Here are a few common approaches:

1. **Polynomial Features**:
   - By adding polynomial terms (e.g., squares, cubes) of the original features, you can capture non-linear relationships. Lasso Regression can then be applied to these polynomial features.
   - Example in Python:
     ```python
     from sklearn.datasets import make_regression
     from sklearn.preprocessing import PolynomialFeatures
     from sklearn.linear_model import Lasso
     from sklearn.pipeline import Pipeline
     from sklearn.model_selection import train_test_split

     # Generate synthetic data
     X, y = make_regression(n_samples=100, n_features=1, noise=0.1)

     # Create a pipeline with polynomial features and Lasso regression
     model = Pipeline([
         ('poly', PolynomialFeatures(degree=3)),
         ('lasso', Lasso(alpha=0.1))
     ])

     # Split the data
     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

     # Fit the model
     model.fit(X_train, y_train)

     # Predict and evaluate
     y_pred = model.predict(X_test)
     print(f"Predictions: {y_pred}")
     ```

2. **Interaction Features**:
   - Interaction terms (products of pairs of features) can also capture non-linear relationships. These can be added manually or using tools like `PolynomialFeatures` with interaction-only terms.

3. **Kernel Methods**:
   - Kernel methods, such as the kernel trick used in Support Vector Machines (SVMs), can be applied to transform the data into a higher-dimensional space. While Lasso itself doesn't directly support kernel methods, you can use kernel approximations to achieve similar effects.

4. **Basis Functions**:
   - Basis functions like splines or radial basis functions (RBFs) can be used to transform the features. These transformations can then be fed into Lasso Regression.

By transforming the features, Lasso Regression can effectively handle non-linear relationships while still benefiting from the regularization properties that help in feature selection and preventing overfitting.

Example of Tuning Parameters in sklearn Lasso


In [32]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing

# Load and prepare the data
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Apply LassoCV for hyperparameter tuning
lasso_cv = LassoCV(alphas=np.logspace(-4, 4, 100), cv=5, max_iter=10000, n_jobs=-1)
lasso_cv.fit(X_train_scaled, y_train)

# Optimal alpha
print(f"Optimal α: {lasso_cv.alpha_}")

# Model evaluation
from sklearn.metrics import mean_squared_error, r2_score

y_pred = lasso_cv.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")


Optimal α: 0.0006428073117284319
Mean Squared Error: 0.554975164329658
R^2 Score: 0.5764870559154289


Once you’ve created non-linear features, you can apply Lasso Regression to this transformed feature set. The L1 regularization will then act on the new non-linear features.

Here’s how you can apply Lasso Regression after creating polynomial features:



In [33]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load and prepare the data
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Create polynomial features
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)



In [34]:
X.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude
0,8.3252,41.0,6.984127,1.02381,322.0,2.555556,37.88,-122.23
1,8.3014,21.0,6.238137,0.97188,2401.0,2.109842,37.86,-122.22
2,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24
3,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25
4,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25


In [35]:
y.head()

Unnamed: 0,0
0,4.526
1,3.585
2,3.521
3,3.413
4,3.422


In [40]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.2, random_state=42)


In [41]:
# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


In [42]:
# Apply Lasso Regression
lasso = Lasso(alpha=0.1)
lasso.fit(X_train_scaled, y_train)


In [43]:
# Model evaluation
y_pred = lasso.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")

Mean Squared Error: 0.6693861704590882
R^2 Score: 0.48917766775554605


### 1. Polynomial Features

Polynomial features allow you to capture non-linear relationships by adding higher-order terms of the original features. This approach can be implemented using the PolynomialFeatures class from sklearn.



In [44]:
from sklearn.datasets import make_regression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=0.1)


In [45]:
X

array([[ 1.06131312],
       [-1.76275609],
       [ 1.33980067],
       [ 0.40384173],
       [-1.12445238],
       [-0.45499555],
       [-0.08788469],
       [ 1.03510749],
       [-1.13321493],
       [-0.09700894],
       [ 0.2939535 ],
       [-0.54624035],
       [-1.2854569 ],
       [ 0.97414575],
       [-0.73580841],
       [-0.21476538],
       [-1.48134595],
       [-1.31995036],
       [-1.33458955],
       [ 2.32325483],
       [ 0.9598839 ],
       [-0.45229732],
       [ 2.6431347 ],
       [ 0.18406731],
       [ 1.18019221],
       [-2.8399941 ],
       [ 0.10527479],
       [-0.3861997 ],
       [-0.05758518],
       [ 1.46182935],
       [ 0.92282987],
       [ 0.31837387],
       [ 0.22366882],
       [ 0.14905295],
       [-0.34998224],
       [-1.27337814],
       [-0.58392525],
       [ 0.9750912 ],
       [-0.54364097],
       [ 0.54372736],
       [ 0.02389105],
       [-1.28414004],
       [ 0.45608002],
       [ 0.85929844],
       [-0.65585762],
       [-0

In [46]:
y

array([ 16.32124453, -27.16016681,  20.7295591 ,   6.23220587,
       -17.31657596,  -7.07443108,  -1.4149894 ,  15.89956388,
       -17.52308598,  -1.50936767,   4.37146919,  -8.4852868 ,
       -19.79944556,  14.96994942, -11.36689227,  -3.14124169,
       -22.95222502, -20.50261747, -20.6129656 ,  35.70214472,
        14.89826143,  -6.95931545,  40.88254132,   2.79788037,
        18.23649418, -43.88237685,   1.59151861,  -5.94570774,
        -0.89294155,  22.59271409,  14.36296737,   4.93575401,
         3.45741678,   2.32098369,  -5.35377278, -19.56023172,
        -9.05729601,  14.89184161,  -8.34023828,   8.53316152,
         0.25135948, -19.66189156,   6.96944141,  13.15229642,
       -10.25435203,  -1.14025722, -10.41777491,  45.52725207,
        -4.02599812, -11.49708149, -17.02784839, -14.4025692 ,
        27.84680174,   1.40099891,  15.6206213 ,  -0.60008357,
        -4.30232783,  -3.6990392 , -18.78674741,   1.013766  ,
       -13.80119457,   7.90218646, -11.84643981,  23.68

In [47]:
# Create a pipeline with polynomial features and Lasso regression
model = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),  # Polynomial features up to degree 3
    ('lasso', Lasso(alpha=0.1))
])


In [48]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [49]:
# Fit the model
model.fit(X_train, y_train)


In [50]:
# Predict and evaluate
y_pred = model.predict(X_test)


In [51]:
y_pred

array([-42.81636622,   1.35664719, -20.7868287 ,  -1.13378702,
       -10.01576973,   8.25662798,  40.77029981,  13.51632315,
         4.45068964,  16.16637618, -20.41067926,  14.04602448,
        16.49375411,   2.24448437,  -5.22172694, -17.18302818,
         7.70358931,  33.34479861, -19.65511084,   4.82260176])

In [52]:
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
print(f"R^2 Score: {r2_score(y_test, y_pred)}")


Mean Squared Error: 0.032742748891198506
R^2 Score: 0.9999112599394524


### 2. Interaction Features
Interaction features capture the effect of features interacting with each other. You can use PolynomialFeatures with the interaction_only=True parameter to create interaction terms.



In [1]:
from sklearn.datasets import make_regression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data with two features
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)

# Create a pipeline with interaction-only polynomial features and Lasso regression
model = Pipeline([
    ('poly', PolynomialFeatures(degree=2, interaction_only=True)),
    ('lasso', Lasso(alpha=0.1))
])

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)

print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
print(f"R^2 Score: {r2_score(y_test, y_pred)}")


Mean Squared Error: 0.025798744983562306
R^2 Score: 0.999997662112907


### 3. Kernel Methods
Although Lasso itself does not directly support kernel methods, you can use kernel approximations, or switch to a kernel-based model such as Support Vector Regression (SVR), to handle non-linear relationships. The example below takes the second route and fits an SVR with an RBF kernel.



In [2]:
from sklearn.svm import SVR
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=0.1)

# Create and fit the SVR model with RBF kernel
svr = SVR(kernel='rbf', C=1e3, gamma=0.1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
svr.fit(X_train, y_train)

# Predict and evaluate
y_pred = svr.predict(X_test)
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
print(f"R^2 Score: {r2_score(y_test, y_pred)}")


Mean Squared Error: 0.0065284627983630324
R^2 Score: 0.9999785347839895
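
The SVR example above replaces Lasso altogether. To keep Lasso in the loop, one option is a kernel approximation: map the inputs to approximate kernel features and fit Lasso on those. The sketch below uses scikit-learn's `Nystroem` transformer on a synthetic sine-shaped target; the data, `gamma`, `n_components`, and `alpha` are all assumptions picked for illustration.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# A clearly non-linear target: y = sin(2x) plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: Lasso on the raw feature can only fit a straight line.
linear_lasso = Lasso(alpha=0.001).fit(X_train, y_train)

# Lasso on approximate RBF-kernel features.
kernel_lasso = make_pipeline(
    Nystroem(kernel="rbf", gamma=1.0, n_components=50, random_state=0),
    Lasso(alpha=0.001, max_iter=50000),
).fit(X_train, y_train)

r2_linear = linear_lasso.score(X_test, y_test)
r2_kernel = kernel_lasso.score(X_test, y_test)
print(f"Lasso on raw feature:      R^2 = {r2_linear:.3f}")
print(f"Lasso on Nystroem features: R^2 = {r2_kernel:.3f}")
```

The L1 penalty still applies here, but it now selects among kernel features rather than among the original inputs.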


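### 4. Basis Functions
Basis functions like splines or radial basis functions (RBFs) can be used for feature transformations. Note that the example below does not feed basis-function features into Lasso; it fits a Gaussian process regressor with an RBF kernel as a related non-linear alternative.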



In [3]:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=1, noise=0.1)
# Create and fit the Gaussian Process Regressor with RBF kernel
kernel = C(1.0, (1e-3, 1e3)) * RBF(1.0, (1e-2, 1e2))
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
gpr.fit(X_train, y_train)


# Predict and evaluate
y_pred, _ = gpr.predict(X_test, return_std=True)
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
print(f"R^2 Score: {r2_score(y_test, y_pred)}")


Mean Squared Error: 25.34133479206591
R^2 Score: 0.5659582711117649


Example: Lasso Regression With and Without Feature Scaling


In [6]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Without scaling
lasso_no_scaling = Lasso(alpha=0.1)
lasso_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = lasso_no_scaling.predict(X_test)
print(f"Without scaling - Mean Squared Error: {mean_squared_error(y_test, y_pred_no_scaling)}")
print(f"Without scaling - R^2 Score: {r2_score(y_test, y_pred_no_scaling)}")

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

lasso_with_scaling = Lasso(alpha=0.1)
lasso_with_scaling.fit(X_train_scaled, y_train)
y_pred_with_scaling = lasso_with_scaling.predict(X_test_scaled)
print(f"With scaling - Mean Squared Error: {mean_squared_error(y_test, y_pred_with_scaling)}")
print(f"With scaling - R^2 Score: {r2_score(y_test, y_pred_with_scaling)}")


Without scaling - Mean Squared Error: 0.6135115198058131
Without scaling - R^2 Score: 0.5318167610318159
With scaling - Mean Squared Error: 0.6796290284328825
With scaling - R^2 Score: 0.48136113250290735

Note that the same `alpha=0.1` corresponds to a different effective penalty on scaled and unscaled features, so the two scores are not a like-for-like comparison. Tuning α separately for each case (for example with `LassoCV`) gives a fairer picture.


Q6. What is the difference between Ridge Regression and Lasso Regression?

Ridge Regression and Lasso Regression are both regularization techniques used to prevent overfitting in linear regression models, but they differ in how they apply penalties to the coefficients.

### **Ridge Regression (L2 Regularization)**
- **Penalty Term**: Adds a penalty equal to the sum of the squared values of the coefficients ($\lambda \sum_j \beta_j^2$).
- **Effect on Coefficients**: Shrinks the coefficients towards zero but never exactly zero, meaning all features are retained in the model.
- **Use Case**: Useful when you have many features that are all potentially contributing to the prediction, as it reduces the impact of less important features without eliminating them.

### **Lasso Regression (L1 Regularization)**
- **Penalty Term**: Adds a penalty equal to the sum of the absolute values of the coefficients ($\lambda \sum_j |\beta_j|$).
- **Effect on Coefficients**: Can shrink some coefficients to exactly zero, effectively performing feature selection by eliminating less important features.
- **Use Case**: Useful when you want a simpler model that selects only the most important features, making it easier to interpret.

### **Key Differences**
- **Penalty Type**: Ridge uses the sum of squared coefficients, while Lasso uses the sum of absolute coefficients.
- **Feature Selection**: Ridge retains all features, whereas Lasso can eliminate some features by setting their coefficients to zero.
- **Model Complexity**: Ridge tends to produce more complex models with all features included, while Lasso can produce simpler models with fewer features.
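
Written as objective functions, the two methods differ only in the penalty term. The notation below ($n$ observations $y_i$, $p$ predictors $x_{ij}$, intercept $\beta_0$) is assumed here for illustration; $\lambda$ and $\beta_j$ are as above:

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
$$

One practical caveat: scikit-learn's `Lasso` divides the squared-error term by $2n$ while `Ridge` does not, so the same numeric `alpha` does not correspond to the same penalty strength in the two estimators.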



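import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import mean_squared_error

# Sample data: 100 observations, 10 features, linear signal plus Gaussian noise.
# No random seed is set, so the exact numbers vary from run to run.
X = np.random.randn(100, 10)
y = X.dot(np.random.randn(10)) + np.random.randn(100)

# Ridge Regression (L2 penalty)
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge_pred = ridge.predict(X)
ridge_mse = mean_squared_error(y, ridge_pred)

# Lasso Regression (L1 penalty)
# Note: scikit-learn scales the two objectives differently, so alpha=1.0
# is a much stronger penalty for Lasso than for Ridge here.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)
lasso_pred = lasso.predict(X)
lasso_mse = mean_squared_error(y, lasso_pred)

# Both errors are computed on the training data, not on a held-out set.
print(f"Ridge MSE: {ridge_mse:.2f}")
print(f"Lasso MSE: {lasso_mse:.2f}")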


Ridge MSE: 1.13
Lasso MSE: 3.68
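
The MSE figures alone do not show the feature-selection difference. A minimal sketch, assuming synthetic data in which only the first three of ten features matter (the data, the seed, and the `alpha` values are illustrative choices, not taken from the notebook above), counts how many coefficients each model sets to exactly zero:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data (an assumption for illustration): only the first 3 of the
# 10 features actually influence y.
rng = np.random.default_rng(0)
X_sparse = rng.standard_normal((200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y_sparse = X_sparse @ true_coef + 0.5 * rng.standard_normal(200)

ridge_sparse = Ridge(alpha=1.0).fit(X_sparse, y_sparse)
lasso_sparse = Lasso(alpha=0.3).fit(X_sparse, y_sparse)

# Count coefficients that are exactly zero in each fitted model.
ridge_zeros = int(np.sum(ridge_sparse.coef_ == 0))
lasso_zeros = int(np.sum(lasso_sparse.coef_ == 0))

print("Ridge coefficients:", np.round(ridge_sparse.coef_, 3))
print("Lasso coefficients:", np.round(lasso_sparse.coef_, 3))
print(f"Exact zeros - Ridge: {ridge_zeros}, Lasso: {lasso_zeros}")
```

In a setup like this, Ridge would be expected to keep small non-zero weights on the irrelevant features, while Lasso removes some or all of them.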


Q7. Can Lasso Regression handle multicollinearity in the input features? If yes, how?

Yes, Lasso Regression can mitigate multicollinearity in the input features. Its penalty term constrains the size of the coefficients, which stabilizes estimates that ordinary least squares would otherwise inflate when features are strongly correlated.

### How Lasso Regression Handles Multicollinearity

1. **Regularization and Shrinkage**: Lasso (Least Absolute Shrinkage and Selection Operator) adds an $\ell_1$ penalty to the loss function, which can force some of the regression coefficients to be exactly zero. This penalty discourages the model from fitting all the correlated features simultaneously; it tends to keep one or a few of them and shrink the rest towards zero. The result is a simpler model that is less affected by multicollinearity.

2. **Variable Selection**: By driving some coefficients to zero, Lasso performs feature selection. When several features are strongly correlated, Lasso tends to keep one feature from the group and set the rest to zero. This removes redundant predictors, although which member of the group is kept can be somewhat arbitrary and may change between samples.

3. **Bias-Variance Trade-off**: By adding a regularization term, Lasso introduces bias into the model but reduces variance. This trade-off can lead to better generalization on unseen data, especially when multicollinearity can cause large variance in ordinary least squares (OLS) estimates.

In summary, Lasso Regression addresses multicollinearity by selecting a subset of features and shrinking others to zero, thereby reducing the complexity of the model and mitigating the issues associated with multicollinearity.
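
The effect can be illustrated with a small sketch. It assumes synthetic data in which `x2` is a near-duplicate of `x1`; the variable names, the seed, and `alpha=0.1` are hypothetical choices made for this example. OLS is free to split the shared signal between the two correlated columns using large offsetting weights, whereas the L1 penalty keeps the total coefficient magnitude in check. Lasso often, though not always, zeroes one of the two.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(42)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)   # near-duplicate of x1
x3 = rng.standard_normal(n)               # independent feature
X_corr = np.column_stack([x1, x2, x3])
y_corr = 3.0 * x1 + 2.0 * x3 + 0.5 * rng.standard_normal(n)

ols = LinearRegression().fit(X_corr, y_corr)
lasso_corr = Lasso(alpha=0.1).fit(X_corr, y_corr)

# Total absolute weight placed on the coefficients by each model.
ols_l1 = np.abs(ols.coef_).sum()
lasso_l1 = np.abs(lasso_corr.coef_).sum()

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Lasso coefficients:", np.round(lasso_corr.coef_, 2))
print(f"Sum of |coef| - OLS: {ols_l1:.2f}, Lasso: {lasso_l1:.2f}")
```

Re-running with different seeds is a quick way to see how unstable the OLS weights on `x1` and `x2` are compared with the Lasso weights.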

Q8. How do you choose the optimal value of the regularization parameter (lambda) in Lasso Regression?

Choosing the optimal value of the regularization parameter (lambda) in Lasso Regression is crucial for balancing model complexity and performance. Here are some common methods to determine the best lambda:

1. **Cross-Validation**:
   - **k-Fold Cross-Validation**: This is the most widely used method. The dataset is divided into k subsets, and the model is trained k times, each time using a different subset as the validation set and the remaining k-1 subsets as the training set. The lambda that minimizes the average error across all k trials is chosen³.
   - **Leave-One-Out Cross-Validation (LOOCV)**: This is a special case of k-fold cross-validation where k equals the number of observations. It can be computationally expensive but provides a thorough evaluation.

2. **Information Criteria**:
   - **Akaike Information Criterion (AIC)** and **Bayesian Information Criterion (BIC)**: These criteria can be used to select the lambda that balances model fit and complexity. Lower values of AIC or BIC indicate a better model¹.

3. **Grid Search**:
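   - This involves specifying a range of lambda values and evaluating the model performance for each value. The lambda that results in the best performance metric (e.g., mean squared error) is selected. In practice the metric is measured on held-out data, so grid search is usually combined with cross-validation.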

4. **Regularization Path**:
   - Some algorithms, like LARS (Least Angle Regression), can compute the entire path of coefficients for different lambda values efficiently. This allows you to visualize how coefficients shrink as lambda increases and choose an optimal point.
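
The methods above can be sketched with scikit-learn, which calls the regularization parameter `alpha` rather than lambda. This is a minimal illustration on synthetic data from `make_regression`; the dataset, the seed, and the variable names are assumptions for the example. `LassoCV` combines a grid of candidate values with k-fold cross-validation, `LassoLarsIC` selects by AIC or BIC, and `lasso_path` returns the coefficients along the regularization path.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LassoLarsIC, lasso_path

# Synthetic data (assumed for illustration): 20 features, 5 informative.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# 1) Cross-validation over an automatically generated grid of alphas.
lasso_cv = LassoCV(cv=5, random_state=0).fit(X_demo, y_demo)
print("Alpha chosen by 5-fold CV:", lasso_cv.alpha_)

# 2) Information criterion (BIC here; criterion="aic" is also available).
lasso_bic = LassoLarsIC(criterion="bic").fit(X_demo, y_demo)
print("Alpha chosen by BIC:", lasso_bic.alpha_)

# 3) Regularization path. lasso_path does not fit an intercept,
#    so the data is centered first.
X_centered = X_demo - X_demo.mean(axis=0)
y_centered = y_demo - y_demo.mean()
alphas, coefs, _ = lasso_path(X_centered, y_centered)

# coefs has shape (n_features, n_alphas); alphas are in decreasing order.
nonzero_per_alpha = np.count_nonzero(coefs, axis=0)
print("Non-zero coefficients at largest alpha: ", nonzero_per_alpha[0])
print("Non-zero coefficients at smallest alpha:", nonzero_per_alpha[-1])
```

Plotting each row of `coefs` against `alphas` gives the usual path diagram, in which coefficients enter the model one by one as the penalty weakens.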

