# Q1. What is Lasso Regression, and how does it differ from other regression techniques?

Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a **linear regression technique** that adds **L1 regularization** to the loss function. The penalty is proportional to the sum of the absolute values of the coefficients, which shrinks them and can drive some to **exactly zero**, enabling **automatic feature selection**.
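
For reference, the objective can be written as below. This is a sketch that assumes the scaling used by scikit-learn's `Lasso` (the factor $\frac{1}{2n}$); other texts drop or change that factor, which only rescales $\lambda$. The symbols $n$ (samples), $p$ (features), $\beta_0$ (intercept) and $\beta_j$ (coefficients) are notation introduced here.

$$
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
$$

The intercept $\beta_0$ is not penalized; only the $p$ feature coefficients enter the L1 term.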

### Differences from Other Regression Techniques:
1. **Ordinary Least Squares (OLS):** OLS has no regularization, so coefficient estimates can become unstable and the model can overfit when features are highly correlated or when there are many features relative to the number of samples.
2. **Ridge Regression:** Ridge applies **L2 regularization**, shrinking coefficients toward zero but never setting them to exactly zero, meaning all features are retained.
3. **Elastic Net Regression:** Elastic Net combines both L1 (Lasso) and L2 (Ridge) penalties, balancing feature selection and coefficient shrinkage.

Lasso is particularly useful when working with **high-dimensional datasets** and **sparse models**, as it helps identify the most important predictors by eliminating irrelevant features.
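
A minimal sketch of the comparison above, assuming scikit-learn is available. The data is synthetic and the `alpha` values are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic data (an assumption for illustration):
# 10 features, but only the first 3 influence the target.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

# Count how many coefficients each model sets to exactly zero.
zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    zero_counts[name] = int(np.sum(model.coef_ == 0.0))
    print(f"{name:>10}: {zero_counts[name]} of {n_features} coefficients are exactly zero")
```

On data like this, OLS and Ridge keep every feature, while the L1 penalty in Lasso removes most of the irrelevant ones.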


# Q2. What is the main advantage of using Lasso Regression in feature selection?

The main advantage of **Lasso Regression** in feature selection is its ability to **automatically eliminate irrelevant or less important features** by shrinking their coefficients to **exactly zero**. This leads to a **sparser model**, improving interpretability and reducing overfitting.

### Key Benefits:
- **Feature Selection:** Unimportant features get removed, simplifying the model.
- **Better Interpretability:** Only the most relevant predictors remain.
- **Reduced Multicollinearity:** Helps in handling correlated variables by selecting one and ignoring the others.
- **Improved Model Efficiency:** With fewer features, predictions are cheaper to compute and the model often generalizes better.

This makes Lasso Regression especially useful for **high-dimensional datasets** where feature selection is crucial.
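
One way to use Lasso purely as a feature selector is scikit-learn's `SelectFromModel`. The sketch below assumes synthetic data, hypothetical feature names `x0` to `x7`, and an arbitrary `alpha=0.2`.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption): only x0, x1 and x2 drive the target.
rng = np.random.default_rng(1)
feature_names = np.array([f"x{i}" for i in range(8)])  # hypothetical names
X = rng.normal(size=(300, 8))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(scale=0.5, size=300)

# Standardize first so the L1 penalty treats all features on the same scale.
selector = make_pipeline(StandardScaler(), SelectFromModel(Lasso(alpha=0.2)))
selector.fit(X, y)

mask = selector[-1].get_support()          # True for features Lasso kept
selected = feature_names[mask].tolist()
X_reduced = selector.transform(X)          # only the selected columns remain
print("selected features:", selected)
print("reduced shape:", X_reduced.shape)
```

The reduced matrix can then be passed to any downstream model, which is useful when Lasso is wanted for selection but not for the final fit.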


# Q3. How do you interpret the coefficients of a Lasso Regression model?

In **Lasso Regression**, the coefficients represent the relationship between each independent variable and the dependent variable, similar to **Ordinary Least Squares (OLS) regression**. However, due to the **L1 regularization penalty**, some coefficients may shrink to **exactly zero**, effectively removing those features from the model.

### Interpretation:
1. **Non-Zero Coefficients:** These indicate that the corresponding features contribute to the prediction.
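2. **Zero Coefficients:** These features are dropped from the model. At the chosen regularization strength they add too little predictive value to justify the penalty, which is not the same as a test of statistical significance.
3. **Magnitude of Coefficients:** Larger absolute values suggest a stronger influence on the dependent variable, but magnitudes are only comparable when the features are on the same scale (for example, after standardization).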

### Example:
If we apply Lasso to predict house prices:
- A coefficient of **50,000** for `square_footage` means that increasing the house size by 1 unit increases the price by **₹50,000**, assuming other factors remain constant.
- A coefficient of **0** for `garden_size` means that Lasso has dropped `garden_size`: given the other features, it did not improve the fit enough to offset the penalty.

This ability to **select important features** makes Lasso a powerful tool for both **prediction and feature selection**.
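
The house-price interpretation can be sketched on made-up data. Everything below is hypothetical: the data is randomly generated so that price depends on `square_footage` and `bedrooms` but not on `garden_size`, and `alpha=2000` is an arbitrary choice. Features are standardized before fitting, then the coefficients are converted back to per-unit effects.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic, hypothetical housing data: garden_size has no effect on price.
rng = np.random.default_rng(42)
n = 300
square_footage = rng.uniform(500, 3000, size=n)
bedrooms = rng.integers(1, 6, size=n).astype(float)
garden_size = rng.uniform(0, 200, size=n)
price = 50.0 * square_footage + 10_000.0 * bedrooms + rng.normal(scale=5_000.0, size=n)

feature_names = ["square_footage", "bedrooms", "garden_size"]
X = np.column_stack([square_footage, bedrooms, garden_size])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)
model = Lasso(alpha=2_000.0).fit(X_std, price)

# Coefficient on a standardized feature: price change per one standard deviation.
per_std = dict(zip(feature_names, model.coef_))
# Dividing by the feature's standard deviation gives the change per raw unit.
per_unit = dict(zip(feature_names, model.coef_ / scaler.scale_))

for name in feature_names:
    print(f"{name:>15}: per std = {per_std[name]:10.1f}, per unit = {per_unit[name]:8.2f}")
```

The per-unit value for `square_footage` comes out somewhat below the true 50 used to generate the data, because the L1 penalty biases non-zero coefficients toward zero.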


# Q4. What are the tuning parameters that can be adjusted in Lasso Regression, and how do they affect the model's performance?

### **Tuning Parameter in Lasso Regression:**
The main tuning parameter in **Lasso Regression** is the regularization strength, written **λ (lambda)** in most textbooks and exposed as `alpha` (**α**) in scikit-learn. It controls the strength of the **L1 regularization**, that is, how much penalty is applied to the model coefficients.

### **Effect of α (λ) on Model Performance:**
1. **Small α (Close to 0)**:
   - **Minimal regularization**, making Lasso behave like **Ordinary Least Squares (OLS)**.
   - **More features are retained**, leading to a **complex model**.
   - **Higher risk of overfitting**.

2. **Moderate α**:
   - **Some coefficients shrink to zero**, performing **feature selection**.
   - **Balances model complexity and performance**.
   - **Reduces overfitting** while maintaining relevant features.

3. **Large α**:
   - **Strong regularization**, shrinking more coefficients to **exactly zero**.
   - **Fewer features are used**, leading to a **simpler model**.
   - **May underfit the data** if too many important features are removed.
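
The three regimes above can be seen by sweeping `alpha` and counting the surviving coefficients. This is a sketch on synthetic data; the four `alpha` values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption): 3 informative features out of 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=0.5, size=200)

alphas = [0.001, 0.1, 1.0, 10.0]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    n_nonzero = int(np.count_nonzero(model.coef_))
    nonzero_counts.append(n_nonzero)
    print(f"alpha={alpha:<6} non-zero coefficients: {n_nonzero:2d}  "
          f"train R^2: {model.score(X, y):.3f}")
```

At the largest value every coefficient is zero and the model predicts the mean of `y`, which is the underfitting extreme described above.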

### **Tuning α (λ):**
To find the optimal **α**, we can use:
- **Cross-validation over a set of candidate values** (`GridSearchCV` or `RandomizedSearchCV`)
- **Cross-validation along a regularization path** (e.g., `LassoCV` in scikit-learn)
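
A minimal sketch of the grid-search option, assuming synthetic data and a log-spaced candidate grid chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic data (an assumption): 3 informative features out of 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Candidate values on a log scale; the range is an arbitrary choice.
param_grid = {"alpha": np.logspace(-3, 1, 20)}
search = GridSearchCV(
    Lasso(max_iter=10_000),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_alpha = float(search.best_params_["alpha"])
print(f"best alpha: {best_alpha:.4f}")
print(f"cross-validated MSE at best alpha: {-search.best_score_:.4f}")
```

`search.best_estimator_` is refit on the full data with the chosen `alpha` and can be used directly for prediction.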

### **Other Parameters in Lasso:**
- `max_iter`: Maximum number of iterations the solver is allowed to run.
- `tol`: Tolerance used as the solver's stopping criterion.
- `fit_intercept`: Whether to include an intercept term.

### **Conclusion:**
Tuning **α** (λ) is crucial for balancing the **bias-variance tradeoff**. A well-chosen value gives **good generalization** by avoiding both **overfitting and underfitting**.


# Q5. Can Lasso Regression be used for non-linear regression problems? If yes, how?

### **Lasso Regression and Non-Linearity**
Lasso Regression is a **linear model**: it assumes the target is a **linear combination** of the input features. However, it can handle **non-linear regression problems** if the input features are transformed first, because the model only needs to be linear in its coefficients.

### **How to Use Lasso for Non-Linear Regression?**
1. **Feature Engineering with Polynomial Features**  
   - Convert the original features into **polynomial terms** (e.g., quadratic, cubic).
   - Example: If `X` is the original feature, transform it into `X²`, `X³`, etc.
   - Use **PolynomialFeatures** from `sklearn.preprocessing` to generate non-linear terms.

2. **Apply Lasso Regression on Transformed Features**  
   - After transforming the data into a higher-dimensional space, Lasso can be used to **select the most important features** while preventing overfitting.

# Q6. What is the difference between Ridge Regression and Lasso Regression?

### **Key Differences Between Ridge and Lasso Regression**
| Feature            | Ridge Regression | Lasso Regression |
|-------------------|----------------|----------------|
| **Regularization Type** | L2 (sum of squared coefficients) | L1 (sum of absolute values of coefficients) |
| **Effect on Coefficients** | Shrinks coefficients but does not set them to zero | Shrinks coefficients and can set some to zero (feature selection) |
| **Feature Selection** | No feature selection (keeps all features) | Performs feature selection (eliminates less important features) |
| **Best Used When** | All features are important, and multicollinearity is present | Some features are irrelevant, and automatic feature selection is needed |
| **Computational Complexity** | Has a closed-form solution, so it is usually cheap to fit | Has no closed-form solution; fitted with an iterative solver such as coordinate descent |
| **Handling Multicollinearity** | Stabilizes estimates by shrinking correlated coefficients toward each other | Tends to keep one feature from a correlated group and set the others to zero |
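
Why the two penalties behave so differently is easiest to see in a simplified setting. The sketch below assumes an orthonormal design ($X^\top X = I$) and the objectives $\tfrac{1}{2}\lVert y - X\beta\rVert^2 + \tfrac{\lambda}{2}\lVert\beta\rVert_2^2$ for Ridge and $\tfrac{1}{2}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$ for Lasso. Under those assumptions each solution is a simple function of the OLS estimate: Ridge rescales it, Lasso soft-thresholds it. The function names and the example values are hypothetical.

```python
import numpy as np

def ridge_shrink(beta_ols, lam):
    """Ridge solution for an orthonormal design: uniform rescaling."""
    return beta_ols / (1.0 + lam)

def lasso_shrink(beta_ols, lam):
    """Lasso solution for an orthonormal design: soft-thresholding."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

beta_ols = np.array([3.0, -0.4, 0.2, -1.5])  # hypothetical OLS estimates
lam = 0.5

ridge_coef = ridge_shrink(beta_ols, lam)
lasso_coef = lasso_shrink(beta_ols, lam)
print("ridge:", ridge_coef)
print("lasso:", lasso_coef)
```

Ridge divides every coefficient by the same factor, so none of them reaches zero. Lasso subtracts `lam` from each magnitude, so any coefficient whose OLS magnitude is at most `lam` is set to zero.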


# Q7. Can Lasso Regression handle multicollinearity in the input features? If yes, how?

### **Yes, Lasso Regression can handle multicollinearity, but with limitations.**

### **How Lasso Handles Multicollinearity**
1. **Feature Selection:** Lasso applies L1 regularization, which forces some feature coefficients to become exactly zero. This helps remove redundant or highly correlated features, reducing multicollinearity.
2. **Sparse Model:** By eliminating some features, Lasso selects the most relevant ones, preventing overfitting caused by multicollinearity.
3. **Bias-Variance Tradeoff:** Lasso introduces bias to the model, reducing variance and making it more robust in the presence of correlated features.

### **Limitations**
- **Arbitrary Feature Selection:** If two or more features are highly correlated, Lasso may arbitrarily keep one and eliminate the others. Which one survives can change with small changes in the data, and useful information can be lost.
- **Not Ideal for Highly Correlated Data:** Ridge regression is often preferred when dealing with extreme multicollinearity because it shrinks coefficients without setting them to zero.
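
The contrast between the two methods on correlated inputs can be sketched with two nearly identical features. The data is synthetic, the `alpha` values are arbitrary, and the exact Lasso split depends on the random seed and solver tolerance.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption): x2 is almost a copy of x1,
# and both contribute equally to the target.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1, max_iter=100_000).fit(X, y)

# Gap between the two coefficients: small means the weight is shared evenly.
ridge_gap = abs(ridge.coef_[0] - ridge.coef_[1])
lasso_gap = abs(lasso.coef_[0] - lasso.coef_[1])

print("ridge coefficients:", ridge.coef_, "gap:", round(ridge_gap, 3))
print("lasso coefficients:", lasso.coef_, "gap:", round(lasso_gap, 3))
```

Both models recover a combined effect close to 6, but Ridge spreads it almost evenly across the pair, whereas the Lasso split is typically lopsided and not determined by any real difference between the features.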

### **Conclusion**
Lasso can handle multicollinearity by performing feature selection, but it may arbitrarily drop correlated features. If retaining all correlated features is important, **Ridge Regression** might be a better choice.


# Q8. How do you choose the optimal value of the regularization parameter (lambda) in Lasso Regression?

### **The optimal value of lambda in Lasso Regression is chosen using cross-validation.**

### **Methods to Select Lambda (λ)**
1. **Cross-Validation (CV):**  
   - Perform **k-fold cross-validation** (commonly 5-fold or 10-fold) on the training data.
   - Choose the λ value that minimizes the validation error (e.g., Mean Squared Error).
   - This ensures a balance between underfitting and overfitting.

2. **Grid Search:**  
   - Define a range of λ values (e.g., from `0.001` to `10`).
   - Train the model for each λ value and evaluate it on held-out data, typically with cross-validation.
   - Select the λ that gives the lowest validation error.

3. **Lasso Path (Using LARS Algorithm):**  
   - Compute the entire path of solutions for different λ values.
   - Use **Akaike Information Criterion (AIC)** or **Bayesian Information Criterion (BIC)** to choose the best λ.

4. **Regularization Path Visualization:**  
   - Plot feature coefficients vs. λ values.
   - Identify the point where important features remain while unimportant ones shrink to zero.
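
A sketch of three of these approaches in scikit-learn, assuming synthetic data: `LassoCV` for cross-validation along a path, `LassoLarsIC` for BIC-based selection on the LARS path, and `lasso_path` for the raw coefficient path that one would plot. Note that scikit-learn calls λ `alpha`.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LassoLarsIC, lasso_path

# Synthetic data (an assumption): 3 informative features out of 10.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# 1. k-fold cross-validation over an automatically generated alpha grid.
cv_model = LassoCV(cv=5).fit(X, y)

# 2. Information criterion (BIC) evaluated along the LARS path.
bic_model = LassoLarsIC(criterion="bic").fit(X, y)

# 3. Coefficient path: one column of coefficients per alpha value.
path_alphas, path_coefs, _ = lasso_path(X, y)

print(f"alpha chosen by 5-fold CV: {cv_model.alpha_:.4f}")
print(f"alpha chosen by BIC:       {bic_model.alpha_:.4f}")
print(f"path shape (features x alphas): {path_coefs.shape}")
```

Plotting each row of `path_coefs` against `path_alphas` (for example with a log-scaled x-axis) gives the regularization path visualization described in item 4.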

### **Conclusion**
Selecting λ using **cross-validation** is the most common and effective method. A small λ leads to minimal regularization (risking overfitting), while a large λ results in strong regularization (risking underfitting). The optimal λ balances these effects.
