# Regression-2

#### Q1. Explain the concept of R-squared in linear regression models. How is it calculated, and what does it represent?

**R-squared (coefficient of determination)** is a statistical measure used to assess how well the independent variables (predictors) explain the variability of the dependent variable in a linear regression model. It represents the proportion of the variance in the dependent variable that is explained by the independent variables.
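* It is calculated as: R-squared = 1 - (SSR / SST), *where SSR is the sum of squared residuals (the squared differences between actual and predicted values), and SST is the total sum of squares (the squared differences between actual values and the mean of the dependent variable).*
* For a least-squares model with an intercept, evaluated on its training data, R-squared ranges from 0 to 1. A value closer to 1 indicates that a larger portion of the variability in the dependent variable is explained by the model's predictors. On held-out data, or for a model fitted without an intercept, R-squared can be negative, which means the model predicts worse than simply using the mean.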
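
The formula above can be sketched in a few lines of plain Python. The function name `r_squared` and the sample values are illustrative assumptions, not part of any library.

```python
def r_squared(y_true, y_pred):
    """R-squared = 1 - SSR / SST, following the definition above."""
    mean_y = sum(y_true) / len(y_true)
    # SSR: sum of squared residuals (actual minus predicted)
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # SST: total sum of squares (actual minus mean of actual)
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ssr / sst

# Hypothetical actual values and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

r2 = r_squared(y_true, y_pred)
print(f"R-squared: {r2:.3f}")  # SSR = 0.5, SST = 20, so this prints 0.975
```

Predicting the mean for every observation gives SSR = SST and therefore an R-squared of 0, which is why 0 serves as the "no better than the mean" baseline.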

#### Q2. Define adjusted R-squared and explain how it differs from the regular R-squared.

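**Adjusted R-squared** adjusts the R-squared value to account for the number of predictors in the model. Regular R-squared never decreases when a predictor is added to a least-squares model, even if that predictor is irrelevant. Adjusted R-squared penalizes each additional predictor, so it rises only when the new predictor improves the fit by more than would be expected by chance. It's especially useful when comparing models with different numbers of predictors.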

* Adjusted R-squared = 1 - [(1 - R-squared) * (n - 1) / (n - p - 1)],
*where n is the number of observations and p is the number of predictors.*
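
As a small worked example of the formula above, the sketch below compares two hypothetical models fitted to the same 50 observations. The function name and the R-squared values are assumptions chosen for illustration.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical models: the larger one has a slightly higher raw R-squared
adj_small = adjusted_r_squared(r2=0.900, n=50, p=5)
adj_large = adjusted_r_squared(r2=0.905, n=50, p=10)

print(f"5 predictors:  {adj_small:.4f}")
print(f"10 predictors: {adj_large:.4f}")
```

Here the 10-predictor model has the higher R-squared but the lower adjusted R-squared, suggesting the five extra predictors do not earn their place.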

#### Q3. When is it more appropriate to use adjusted R-squared?

Adjusted R-squared is more appropriate when we're comparing models with different numbers of predictors. It helps us to determine if the additional predictors actually improve the model's fit and explain variability, considering the potential risk of overfitting.

#### Q4. What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics calculated, and what do they represent?

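* **MSE (Mean Squared Error):** The average of squared differences between predicted and actual values. Squaring penalizes larger errors more heavily, and the result is in squared units of the target.
* **RMSE (Root Mean Squared Error):** The square root of MSE. It keeps the emphasis on larger errors but is expressed in the same units as the target, which makes it easier to interpret.
* **MAE (Mean Absolute Error):** The average of absolute differences between predicted and actual values. It is also in the units of the target and weights all errors in proportion to their size.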
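
A minimal sketch of the three calculations, using only the standard library. The function names and the sample data are illustrative assumptions.

```python
import math

def mse(y_true, y_pred):
    # Average of squared differences
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Square root of MSE, back in the units of the target
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    # Average of absolute differences
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual values and predictions (errors of 2, 2, 3, 3)
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

mse_value = mse(y_true, y_pred)    # (4 + 4 + 9 + 9) / 4 = 6.5
rmse_value = rmse(y_true, y_pred)  # sqrt(6.5), roughly 2.55
mae_value = mae(y_true, y_pred)    # (2 + 2 + 3 + 3) / 4 = 2.5
print(mse_value, round(rmse_value, 2), mae_value)
```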

#### Q5. Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in regression analysis.

* **Advantages:**
    * All three metrics are common and easy to understand.
    * RMSE and MSE give more weight to larger errors, which might be important in some applications.
    * MAE is more robust to outliers.

* **Disadvantages:**
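    * None of the three metrics indicates whether errors are overestimates or underestimates, because squaring or taking absolute values discards the sign.
    * RMSE and MSE are sensitive to outliers: a single very large error can dominate the score. MSE is also in squared units, which makes it harder to interpret.
    * No single metric is best in every case. The choice depends on the specific problem and on how costly large errors are relative to small ones.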
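
The outlier sensitivity described above can be seen with a small sketch. The error values are made up for illustration, and errors are assumed to be defined as predicted minus actual.

```python
import math

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical prediction errors (predicted minus actual)
clean_errors = [1.0, -1.0, 1.0, -1.0]
with_outlier = clean_errors + [20.0]  # one large overestimate

mae_clean, rmse_clean = mae(clean_errors), rmse(clean_errors)
mae_outlier, rmse_outlier = mae(with_outlier), rmse(with_outlier)

# The signed mean error keeps the direction that MAE and RMSE discard
mean_signed_error = sum(with_outlier) / len(with_outlier)

print(f"MAE:  {mae_clean:.2f} -> {mae_outlier:.2f}")
print(f"RMSE: {rmse_clean:.2f} -> {rmse_outlier:.2f}")
print(f"Mean signed error: {mean_signed_error:.2f}")
```

Both metrics start at 1.0 here. After the outlier is added, RMSE grows noticeably more than MAE, and only the signed mean error reveals that the model is overestimating on average.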

#### Q6. Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is it more appropriate to use?

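**Lasso (Least Absolute Shrinkage and Selection Operator) regularization** adds a penalty term to the linear regression cost function that's proportional to the sum of the absolute values of the regression coefficients (an L1 penalty). It can drive some coefficients to exactly zero, effectively performing feature selection. Ridge regularization instead penalizes the sum of squared coefficients (an L2 penalty), which shrinks coefficients towards zero but does not set them exactly to zero. Lasso is more appropriate when we suspect that many predictors are irrelevant and want a sparser, more interpretable model.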
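
The difference is easiest to see in a simplified setting. The sketch below assumes orthonormal predictors (X^T X = I), a cost of the form (1/2) * squared error + penalty, an L1 penalty of `lam * |b|` for Lasso, and an L2 penalty of `(lam / 2) * b^2` for Ridge. Under those assumptions each penalized coefficient is a simple function of the ordinary least squares coefficient. The function names and coefficient values are hypothetical.

```python
def ridge_shrink(beta_ols, lam):
    # Ridge rescales every coefficient by the same factor; never exactly zero
    return beta_ols / (1.0 + lam)

def lasso_shrink(beta_ols, lam):
    # Lasso soft-thresholds: subtract lam from the magnitude, floor at zero
    magnitude = max(abs(beta_ols) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if beta_ols > 0 else -magnitude

# Hypothetical OLS coefficients: one strong predictor, two weak ones
ols_coefs = [3.0, -0.4, 0.2]
lam = 0.5

ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]
lasso_coefs = [lasso_shrink(b, lam) for b in ols_coefs]

print("Ridge:", [round(b, 3) for b in ridge_coefs])
print("Lasso:", [round(b, 3) for b in lasso_coefs])
```

With these values Ridge keeps all three coefficients non-zero, while Lasso sets the two weak ones to exactly zero and keeps only the strong predictor. With correlated predictors there is no such closed form for Lasso, and iterative solvers are used instead.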

#### Q7. How do regularized linear models help to prevent overfitting in machine learning? Provide an example to illustrate.

Regularization techniques like Lasso and Ridge add a penalty to the magnitude of the coefficients, preventing them from becoming too large. This helps in reducing overfitting by simplifying the model and avoiding fitting noise in the data.

**Example:** In a linear regression predicting house prices, regularization can shrink the coefficients of less relevant features, preventing them from exerting excessive influence on the predictions.
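
To make the shrinkage concrete, here is a minimal sketch of Ridge regression with two centered features, solved in closed form as (X^T X + lam * I) b = X^T y. The data, the feature meanings, and the function name `fit_ridge_2d` are illustrative assumptions rather than a real house-price dataset.

```python
def fit_ridge_2d(X, y, lam):
    """Solve (X^T X + lam * I) b = X^T y for two features via Cramer's rule."""
    a11 = sum(r[0] * r[0] for r in X) + lam
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X) + lam
    b1 = sum(r[0] * t for r, t in zip(X, y))
    b2 = sum(r[1] * t for r, t in zip(X, y))
    det = a11 * a22 - a12 * a12
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

# Hypothetical centered data: feature 0 could be size, feature 1 a weaker signal
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.5], [1.0, -1.0], [2.0, 0.5]]
y = [-5.0, -4.0, 0.5, 2.0, 6.5]

ols_coefs = fit_ridge_2d(X, y, lam=0.0)     # no penalty: ordinary least squares
ridge_coefs = fit_ridge_2d(X, y, lam=10.0)  # strong penalty

print("lam = 0: ", [round(b, 3) for b in ols_coefs])
print("lam = 10:", [round(b, 3) for b in ridge_coefs])
```

With only five observations, the unpenalized fit reproduces the training data exactly. The penalized fit gives up some training accuracy in exchange for smaller coefficients, which is the trade that tends to reduce overfitting on new data.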

#### Q8. Discuss the limitations of regularized linear models and explain why they may not always be the best choice for regression analysis.

* Choice of hyperparameters (like the regularization strength) can be challenging and might require cross-validation.
* Regularization might oversimplify the model by pushing some coefficients too close to zero, potentially ignoring important variables.
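* In cases where all predictors are relevant, regularization might not be the best choice: Lasso in particular may discard useful predictors, and the shrinkage introduces bias into the coefficient estimates.
* Regularized linear models are still linear, so they cannot capture nonlinear relationships unless suitable transformed features are added.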
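
The cross-validation step mentioned above can be sketched as follows for a single-feature Ridge model without an intercept. The data, the candidate strengths, and the helper names are all hypothetical, and real projects would normally rely on a library implementation instead.

```python
def ridge_coef_1d(xs, ys, lam):
    # Closed-form Ridge slope for one feature and no intercept
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, k):
    """Mean validation MSE over k folds (fold = index modulo k)."""
    n = len(xs)
    fold_scores = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        val = [i for i in range(n) if i % k == fold]
        coef = ridge_coef_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_scores.append(sum((ys[i] - coef * xs[i]) ** 2 for i in val) / len(val))
    return sum(fold_scores) / k

# Hypothetical data with a slope of roughly 2
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [-6.2, -3.9, -2.1, 1.8, 4.3, 5.9]

candidates = [0.0, 0.1, 1.0, 10.0]
scores = {lam: cv_mse(xs, ys, lam, k=3) for lam in candidates}
best_lam = min(scores, key=scores.get)

for lam in candidates:
    print(f"lam = {lam:>4}: CV MSE = {scores[lam]:.3f}")
print("Selected:", best_lam)
```

On data this clean, a heavy penalty such as `lam = 10` scores clearly worse than no penalty, which illustrates the oversimplification risk in the list above.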

#### Q9. You are comparing the performance of two regression models using different evaluation metrics. Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better performer, and why? Are there any limitations to your choice of metric?

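The two figures cannot be compared directly, because they are different metrics. For any set of predictions, RMSE is always greater than or equal to MAE, so an RMSE of 10 for Model A and an MAE of 8 for Model B are consistent with either model being the better one. A fair comparison requires computing the same metric (or both metrics) for both models on the same test data. Which metric to prioritize depends on the problem: RMSE penalizes large errors more heavily, while MAE weights all errors in proportion to their size and is less affected by outliers.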
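
A constructed example shows why the comparison needs a shared metric. The predictions below are hypothetical and chosen so that Model A has an RMSE of 10 and Model B has an MAE of 8, matching the question.

```python
import math

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

y_true = [100.0, 150.0, 200.0, 250.0]
pred_a = [110.0, 140.0, 210.0, 240.0]  # Model A: every prediction is off by 10
pred_b = [100.0, 150.0, 200.0, 282.0]  # Model B: three exact, one off by 32

rmse_a, mae_a = rmse(y_true, pred_a), mae(y_true, pred_a)
rmse_b, mae_b = rmse(y_true, pred_b), mae(y_true, pred_b)

print(f"Model A: RMSE = {rmse_a:.1f}, MAE = {mae_a:.1f}")
print(f"Model B: RMSE = {rmse_b:.1f}, MAE = {mae_b:.1f}")
```

In this construction Model B wins on MAE (8 versus 10) but loses on RMSE (16 versus 10), so the ranking depends entirely on which metric reflects the cost of errors in the application.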

#### Q10. You are comparing the performance of two regularized linear models using different types of regularization. Model A uses Ridge regularization with a regularization parameter of 0.1, while Model B uses Lasso regularization with a regularization parameter of 0.5. Which model would you choose as the better performer, and why? Are there any trade-offs or limitations to your choice of regularization method?

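The regularization type and parameter alone do not tell us which model performs better. The values 0.1 and 0.5 are also not directly comparable, because they scale different penalties (squared versus absolute coefficients). The better performer should be decided by evaluating both models on the same validation data with the same metric, ideally after tuning each regularization strength by cross-validation. As a starting expectation, Model A (Ridge) with a regularization parameter of 0.1 might be preferred if we suspect all predictors are relevant but need some degree of regularization. Model B (Lasso) with a regularization parameter of 0.5 is more appropriate if we suspect that many predictors are irrelevant and can be effectively removed.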

Trade-offs: Ridge tends to shrink coefficients towards zero without making them exactly zero, while Lasso can lead to some coefficients being exactly zero, effectively performing feature selection. This trade-off affects model interpretability and complexity.