## Evaluating Regression Models (Q1-Q5)

**Q1. R-squared:**

R-squared (R²) is a statistical measure used in linear regression to assess how well the regression line fits the data. It represents the proportion of the variance in the dependent variable (Y) that is explained by the independent variable(s) (X).

**Calculation:**

R² = 1 - (Σ (yi - ŷi)² ) / (Σ (yi - y̅)²)

* yi: actual value
* ŷi: predicted value
* y̅: mean of actual values

**Interpretation:**

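* 0: The model explains none of the variance; it does no better than always predicting the mean y̅.
* 1: Perfect fit (all points fall exactly on the regression line)
* Values between 0 and 1 give the fraction of variance explained, with higher values suggesting a better fit.
* Negative values are possible on held-out data (or for a model without an intercept) and mean the model predicts worse than the mean.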
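
As a minimal sketch of the formula above, the following plain-Python function computes R² from actual and predicted values. The function name `r_squared` and the sample numbers are illustrative assumptions, not part of the original notes.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, following the formula above."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # Σ (yi - ŷi)²
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # Σ (yi - y̅)²
    return 1 - ss_res / ss_tot


# Hypothetical data, chosen only for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

r2 = r_squared(y_true, y_pred)                 # close to 1: good fit
r2_mean_only = r_squared(y_true, [6.0] * 4)    # always predicting the mean
print(r2, r2_mean_only)
```

Predicting the mean for every point makes the two sums equal, which is why that baseline scores exactly 0.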

**Q2. Adjusted R-squared:**

Adjusted R-squared (adjusted R²) penalizes R² for the number of independent variables in the model. It accounts for model complexity: adding a variable raises adjusted R² only if the variable improves the fit by more than would be expected by chance.

**Difference from R-squared:**

* For a least-squares fit, R² on the training data never decreases when a variable is added, even if the variable is not truly explanatory.
* Adjusted R² can decrease when an added variable contributes too little to offset the penalty for the extra parameter.
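
The notes do not give the formula, so as a hedged illustration: one common form, for n observations and p independent variables, is adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1). The sketch below applies it to two hypothetical fits; the function name and the R² values are assumptions made up for the example.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical fits on the same 50 observations:
# the larger model has a slightly higher R² but twice as many variables.
small_model = adjusted_r_squared(0.900, n_samples=50, n_features=5)
large_model = adjusted_r_squared(0.905, n_samples=50, n_features=10)
print(round(small_model, 4), round(large_model, 4))
```

Here the five extra variables raise R² from 0.900 to 0.905, yet the adjusted value of the larger model comes out lower, which is the behavior the bullets describe.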

**Q3. When to use Adjusted R-squared:**

* When comparing models with different numbers of independent variables.
* As a rough guard against overfitting when no held-out data is available. It is not a substitute for evaluating the model on unseen data.

**Q4. Error Metrics in Regression:**

* **Root Mean Squared Error (RMSE):** Squares the errors (differences between predicted and actual values), averages them, and takes the square root, which returns the result to the units of the original variable. It measures the typical magnitude of the error, with large errors weighted more heavily.

* **Mean Squared Error (MSE):** Squares the errors and then averages them. Sensitive to outliers because squaring magnifies large errors.

* **Mean Absolute Error (MAE):** Takes the absolute value of the errors and then averages them. Less sensitive to outliers than MSE, and the result is in the units of the predicted variable.

**Calculation:**

* RMSE = √(Σ (yi - ŷi)² / n)
* MSE = Σ (yi - ŷi)² / n
* MAE = Σ |yi - ŷi| / n
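
The three formulas can be sketched in a few lines of plain Python. The helper name `regression_errors` and the sample values are illustrative assumptions.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MSE, RMSE, MAE) for paired actual and predicted values."""
    n = len(y_true)
    residuals = [y - p for y, p in zip(y_true, y_pred)]
    mse = sum(r ** 2 for r in residuals) / n   # Σ (yi - ŷi)² / n
    rmse = math.sqrt(mse)                      # √MSE, back in original units
    mae = sum(abs(r) for r in residuals) / n   # Σ |yi - ŷi| / n
    return mse, rmse, mae


# Hypothetical data: residuals are -1, 0, 1, -4.
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 20.0]

mse, rmse, mae = regression_errors(y_true, y_pred)
print(mse, rmse, mae)
```

For these values MSE is 4.5 and MAE is 1.5; the single residual of -4 contributes 16 of the 18 squared-error units but only 4 of the 6 absolute-error units.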

**Q5. Advantages and Disadvantages:**

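* **RMSE:** Expressed in the original units, so easy to interpret; penalizes large errors more than small ones. Disadvantage: Sensitive to outliers.
* **MSE:** Advantage: Smooth and differentiable, so easier to work with mathematically (for example, as a loss function). Disadvantage: Sensitive to outliers, and harder to interpret because the result is in squared units.
* **MAE:** Less sensitive to outliers, interpretable in original units. Disadvantage: Weights all errors in proportion to their size, so it does not single out large errors, and it is not differentiable at zero.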
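
To make the outlier sensitivity concrete, the sketch below compares the two metrics on a hypothetical set of errors before and after one error is replaced by a much larger one. The error values are made up for illustration.

```python
import math


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


# Hypothetical prediction errors, with and without a single outlier.
clean = [1.0, 1.0, 1.0, 1.0]
with_outlier = [1.0, 1.0, 1.0, 9.0]

mae_ratio = mae(with_outlier) / mae(clean)      # MAE grows by a factor of 3
rmse_ratio = rmse(with_outlier) / rmse(clean)   # RMSE grows by sqrt(21), about 4.58
print(mae_ratio, rmse_ratio)
```

The same outlier inflates RMSE noticeably more than MAE, because its contribution is squared before averaging.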

## Regularization for Linear Regression (Q6-Q8)

**Q6. Lasso Regularization:**

Lasso (L1) regularization adds a penalty proportional to the sum of the absolute values of the coefficients. This shrinks coefficients towards zero and can set some of them to exactly zero. This helps to:

* Reduce model complexity and prevent overfitting.
* Perform feature selection: By driving coefficients to zero, it effectively removes irrelevant features from the model.

**Difference from Ridge Regularization:**

* Ridge (L2) regularization penalizes the sum of the squared coefficients. It shrinks all coefficients towards zero but generally does not set them to exactly zero.
* Lasso performs feature selection, while ridge only reduces coefficient magnitudes and keeps every feature in the model.
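
The difference is easiest to see with a single feature and no intercept, where both estimators have a closed form. The sketch below assumes the objectives (1/2) Σ (yi - w·xi)² + (λ/2) w² for ridge and (1/2) Σ (yi - w·xi)² + λ|w| for lasso; libraries may scale these terms differently. All names and data are hypothetical.

```python
def ridge_1d(x, y, lam):
    """Ridge solution for one feature: w = Σxy / (Σx² + λ)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def lasso_1d(x, y, lam):
    """Lasso solution for one feature: soft-threshold Σxy by λ, then divide by Σx²."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    if sxy > lam:
        return (sxy - lam) / sxx
    if sxy < -lam:
        return (sxy + lam) / sxx
    return 0.0  # weak signal: coefficient is set to exactly zero


x = [1.0, 2.0, 3.0]
y_weak = [0.1, -0.1, 0.1]    # hypothetical feature with almost no signal
y_strong = [2.0, 4.0, 6.0]   # hypothetical feature with a clear signal

ridge_weak, lasso_weak = ridge_1d(x, y_weak, 0.5), lasso_1d(x, y_weak, 0.5)
ridge_strong, lasso_strong = ridge_1d(x, y_strong, 0.5), lasso_1d(x, y_strong, 0.5)
print(ridge_weak, lasso_weak, ridge_strong, lasso_strong)
```

For the weak feature, ridge returns a small nonzero coefficient while lasso returns exactly 0.0; for the strong feature both shrink the coefficient slightly below the unpenalized value of 2 and keep it.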

**Use Lasso When:**

* You suspect some features might be irrelevant or redundant.
* Feature interpretability is important, and you want to identify the most important features.

**Q7. Regularization and Overfitting:**

Overfitting occurs when a model fits the noise in the training data rather than the underlying pattern, and therefore fails to generalize to unseen data. Regularization helps prevent overfitting by:

* **Penalizing model complexity:** A penalty term that grows with the magnitude of the coefficients is added to the cost function, so the model is discouraged from fitting the training data too closely. This accepts a small increase in bias in exchange for lower variance.

**Example:** Imagine a model predicting exam grades based on hours studied (X1) and difficulty level (X2). Without regularization, it might overfit to random noise in the training data and not generalize well to unseen difficulty levels. Lasso regularization could set the coefficient for difficulty level (X2) to zero, effectively removing it from the model if it's not truly relevant.
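
The penalty idea can be sketched as a cost function. The version below assumes the cost is MSE plus λ times the sum of absolute coefficients (L1) or squared coefficients (L2); the function name, the two candidate models, and their coefficients are hypothetical.

```python
def penalized_cost(y_true, y_pred, coefs, lam, penalty):
    """Training MSE plus a penalty on coefficient size."""
    n = len(y_true)
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    if penalty == "l1":
        size = sum(abs(w) for w in coefs)   # lasso-style penalty
    elif penalty == "l2":
        size = sum(w ** 2 for w in coefs)   # ridge-style penalty
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return mse + lam * size


y_true = [1.0, 2.0, 3.0]

# Hypothetical model A: perfect training fit, but large coefficients.
cost_a = penalized_cost(y_true, [1.0, 2.0, 3.0], coefs=[5.0, -4.0], lam=0.1, penalty="l1")

# Hypothetical model B: slightly worse training fit, small coefficients.
cost_b = penalized_cost(y_true, [1.1, 1.9, 3.1], coefs=[1.0, 0.0], lam=0.1, penalty="l1")

print(cost_a, cost_b)
```

Model A has zero training error but a penalized cost of about 0.9, while Model B comes in at about 0.11, so the penalized objective prefers the simpler model.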

**Q8. Limitations of Regularized Models:**

* **Tuning the regularization parameter:** The strength of the regularization (penalty term) needs to be tuned, typically with cross-validation. Too much regularization causes underfitting (model is too simple); too little leaves the model free to overfit.
* **Loss of information:** Lasso regularization might set the coefficients of useful features to zero, especially when features are correlated with each other, potentially losing some information.
* **Not a silver bullet:** Regularization doesn't guarantee perfect performance, and other model selection techniques might be necessary.
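
Tuning can be sketched with a simple hold-out split: fit the model for each candidate λ and keep the one with the lowest validation error. The example uses a single-feature ridge fit with no intercept to stay dependency-free; the data, the grid, and the function names are assumptions, and in practice k-fold cross-validation is the more usual choice.

```python
def fit_ridge_1d(x, y, lam):
    """Single-feature ridge without intercept: w = Σxy / (Σx² + λ)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def mse(x, y, w):
    return sum((t - w * a) ** 2 for a, t in zip(x, y)) / len(x)


# Hypothetical training and validation data.
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [1.2, 2.3, 3.1, 4.4]
x_val, y_val = [5.0, 6.0], [5.1, 5.9]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]  # candidate values of λ
val_errors = {
    lam: mse(x_val, y_val, fit_ridge_1d(x_train, y_train, lam)) for lam in grid
}
best_lam = min(val_errors, key=val_errors.get)
print(best_lam, val_errors)
```

On this toy data the validation error falls as λ increases from 0 to 1 and then rises sharply for 10 and 100, which mirrors the overfitting and underfitting ends of the trade-off.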

## Choosing Evaluation Metrics (Q9)

**Choosing Between Model A (RMSE 10) and Model B (MAE 8):**

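**The two numbers cannot be compared directly, because they are different metrics.** For any set of errors, RMSE is at least as large as MAE, so an RMSE of 10 being larger than an MAE of 8 does not show that Model A is worse. Evaluate both models with the same metric on the same data before choosing. Keep in mind how each metric behaves:

* **RMSE:** Sensitive to outliers. A single large error can significantly inflate the RMSE.
* **MAE:** Less sensitive to outliers, because every error counts in proportion to its size rather than its square.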
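
A small sketch shows why an MAE of 8 says little about how a model would score on RMSE. Both hypothetical error sets below have an MAE of exactly 8, yet their RMSE values fall on opposite sides of Model A's RMSE of 10. The error values are invented for illustration.

```python
import math


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


# Two hypothetical error profiles with the same MAE.
spread_evenly = [6.0, 6.0, 10.0, 10.0]
one_big_miss = [0.0, 0.0, 0.0, 32.0]

print(mae(spread_evenly), rmse(spread_evenly))  # MAE 8, RMSE below 10
print(mae(one_big_miss), rmse(one_big_miss))    # MAE 8, RMSE above 10
```

Depending on how its errors are distributed, Model B could therefore be better or worse than Model A on RMSE, which is why both models need to be scored with the same metric.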

**Considerations:**

* **Distribution of errors:** If your errors have a long tail with many outliers, MAE might be a better choice.
* **Cost of large errors:** If large errors are particularly problematic in your application (e.g., predicting stock prices), RMSE might be more relevant.

**Limitations:**

Both metrics have limitations, and the best choice depends on the specific context and the cost function associated with errors in your application.

**Further Exploration:**

* **Visualize the distribution of errors:** This can help you understand if outliers are a concern.
* **Consider using both metrics:** Report both RMSE and MAE to get a more comprehensive picture of model performance.

## Choosing Regularization (Q10)

**Comparing Model A (Ridge, λ=0.1) and Model B (Lasso, λ=0.5):**

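**Difficult to choose definitively without additional information.** The two λ values are not directly comparable, because they scale different penalties (squared coefficients for Ridge, absolute values for Lasso), so λ=0.5 is not necessarily "stronger" than λ=0.1. Compare the models on held-out performance instead. Here's a breakdown: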

* **Ridge regularization:** Reduces coefficient magnitudes but doesn't necessarily set them to zero.
* **Lasso regularization:** Can set coefficients to zero, potentially performing feature selection.

**Choosing Based on Context:**

* **Feature interpretability:** If understanding the most important features is crucial, Lasso might be better (assuming it doesn't eliminate relevant features).
* **Multicollinearity:** If features are highly correlated, Ridge might be preferable as it can reduce the impact of multicollinearity without discarding features entirely.
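
The multicollinearity point can be illustrated with an extreme case: two identical features. The sketch below solves the two-feature ridge normal equations (XᵀX + λI) w = Xᵀy directly, with no intercept; the function name and data are hypothetical, and the 2×2 solve is written out by hand only to avoid external dependencies.

```python
def ridge_two_features(x1, x2, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features, no intercept."""
    a11 = sum(a * a for a in x1) + lam
    a22 = sum(b * b for b in x2) + lam
    a12 = sum(a * b for a, b in zip(x1, x2))
    b1 = sum(a * t for a, t in zip(x1, y))
    b2 = sum(b * t for b, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("normal equations are singular")
    # Cramer's rule for the 2x2 system.
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


# Hypothetical data: x2 is an exact copy of x1, and y = 2 * x1.
x1 = [1.0, 2.0, 3.0]
x2 = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

try:
    ridge_two_features(x1, x2, y, lam=0.0)   # plain least squares
    singular_without_penalty = False
except ValueError:
    singular_without_penalty = True          # no unique solution exists

w1, w2 = ridge_two_features(x1, x2, y, lam=0.1)
print(singular_without_penalty, w1, w2)
```

Without the penalty the system has no unique solution. With λ=0.1, ridge splits the weight evenly between the two copies (each close to 1), keeping both features rather than discarding one.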

**Trade-offs and Limitations:**

* **Tuning regularization parameter:** Both require tuning the regularization parameter (λ) to avoid underfitting or overfitting.
* **Lasso might discard important features:** This can lead to a loss of information.
* **Regularization isn't a guarantee:** It helps prevent overfitting but doesn't ensure the best model. Explore other model selection techniques as well.

