## Ans : 1

R-squared, also known as the coefficient of determination, is a statistical measure used to assess the goodness of fit of a linear regression model. It represents the proportion of the variance in the dependent variable (target variable) that can be explained by the independent variables (predictor variables) in the model.

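R-squared is usually computed from sums of squares. With SST the total sum of squares and SSE the residual (error) sum of squares:

R-squared = 1 - SSE / SST

For an ordinary least-squares fit with an intercept, evaluated on the data it was fitted to, SST = SSR + SSE, where SSR is the explained (regression) sum of squares. R-squared can therefore equivalently be calculated by dividing the explained sum of squares by the total sum of squares:

R-squared = SSR / SST

Here SSR is the sum of squared differences between the predicted values and the mean of the dependent variable, SSE is the sum of squared differences between the actual and predicted values, and SST is the sum of squared differences between the actual values and the mean of the dependent variable. Some texts use SSR for the residual sum of squares instead, so check which convention a source follows.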

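For a least-squares fit with an intercept, R-squared ranges from 0 to 1: 0 indicates that the model explains none of the variability in the dependent variable (it does no better than predicting the mean), and 1 indicates that it explains all of it. When R-squared is computed on new data, or for a model fitted without an intercept, it can fall below 0, meaning the model predicts worse than simply using the mean.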
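A minimal pure-Python sketch that computes R-squared both as the explained ratio SSR/SST and as 1 - SSE/SST (SSE being the residual sum of squares), assuming an ordinary least-squares line with an intercept. The helper names (`fit_simple_ols`, `r_squared`, `explained_ratio`) and the five data points are hypothetical, chosen only for illustration; for this kind of fit the two ratios agree.

```python
def fit_simple_ols(x, y):
    """Least-squares line y ~ a + b*x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def r_squared(y, y_pred):
    """1 - SSE/SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    return 1 - sse / sst


def explained_ratio(y, y_pred):
    """SSR/SST, with SSR measured around the mean of the actual values."""
    mean_y = sum(y) / len(y)
    ssr = sum((pi - mean_y) ** 2 for pi in y_pred)          # explained sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return ssr / sst


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_simple_ols(x, y)
y_pred = [a + b * xi for xi in x]
print(round(r_squared(y, y_pred), 4), round(explained_ratio(y, y_pred), 4))
```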

## Ans : 2

Adjusted R-squared is a modified version of R-squared that takes into account the number of predictors in the model. In least-squares regression, R-squared never decreases when a predictor is added, even a completely irrelevant one, so on its own it rewards adding variables and cannot reveal overfitting caused by them.

Adjusted R-squared is calculated using the formula:

Adjusted R-squared = 1 - [(1 - R-squared) * (n - 1) / (n - p - 1)]

where n is the number of observations and p is the number of predictors in the model.

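Adjusted R-squared penalizes unnecessary predictors by adjusting for degrees of freedom: it rises only when a new predictor reduces the unexplained variance enough to offset the degree of freedom it uses, and otherwise it falls (it can even become negative). This makes it a fairer measure of goodness of fit than R-squared, especially when comparing models with different numbers of predictors.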
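A small sketch of the formula, with a hypothetical R-squared of 0.90 held fixed while the number of predictors grows; the function name `adjusted_r2` is illustrative. Holding R-squared fixed isolates the penalty term: with n = 30 observations, the adjusted value falls from about 0.896 at p = 1 to about 0.678 at p = 20.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors (intercept not counted in p)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R-squared needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: the same R-squared of 0.90 on n = 30 observations,
# reached with more and more predictors.
values = {p: adjusted_r2(0.90, 30, p) for p in (1, 5, 10, 20)}
for p, value in values.items():
    print(f"p = {p:2d}: adjusted R-squared = {value:.4f}")
```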

## Ans : 3

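Adjusted R-squared is more appropriate to use when comparing models with different numbers of predictors. It does not prevent overfitting by itself, but because it penalizes extra predictors it helps detect models that have been made more complex without a real gain in fit, giving a more realistic assessment than R-squared (predictive power on new data is still best checked on held-out data). Adjusted R-squared reflects the trade-off between goodness of fit and the number of predictors used in the model.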

If two models have similar R-squared values but differ in the number of predictors, the model with the higher adjusted R-squared is generally preferred: it explains a comparable proportion of the variation in the dependent variable while accounting for the complexity introduced by additional predictors.
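A worked comparison with made-up numbers: two candidate models fitted to the same n = 50 observations, one with 3 predictors and R-squared 0.80, the other with 10 predictors and R-squared 0.81. The names and values are hypothetical; the point is that the larger model's extra 0.01 of R-squared does not pay for its 7 extra predictors.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 50
# Hypothetical candidates: name -> (R-squared, number of predictors)
candidates = {"small": (0.80, 3), "large": (0.81, 10)}

scores = {name: adjusted_r2(r2, n, p) for name, (r2, p) in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)  # "small" has the higher adjusted R-squared (about 0.787 vs 0.761)
```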

## Ans : 4

RMSE (Root Mean Squared Error), MSE (Mean Squared Error), and MAE (Mean Absolute Error) are commonly used evaluation metrics in regression analysis to measure the performance of a regression model.

RMSE is calculated as the square root of the average of the squared differences between the predicted values and the actual values:

RMSE = sqrt(MSE)

MSE is calculated as the average of the squared differences between the predicted values and the actual values:
    
MSE = (1/n) * Σ(predicted - actual)^2

MAE is calculated as the average of the absolute differences between the predicted values and the actual values:

MAE = (1/n) * Σ|predicted - actual|

RMSE is expressed in the same units as the dependent variable and measures the typical magnitude of the errors made by the model. It equals the standard deviation of the residuals only when the residuals have mean zero (with the standard deviation computed using divisor n). Because errors are squared before averaging, RMSE, like MSE, is more sensitive to large errors than MAE.

MSE represents the average squared difference between the predicted and actual values. It is commonly used because of its mathematical properties: it is differentiable everywhere, and squaring gives larger errors more weight.

MAE represents the average absolute difference between the predicted and actual values. It provides a measure of the average magnitude of the errors made by the model but does not emphasize large errors as much as RMSE.
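The three formulas translate directly into code. A minimal pure-Python sketch; the function names and the four actual/predicted pairs are invented for illustration.

```python
import math


def mse(actual, predicted):
    """Mean of squared errors."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of MSE, back in the units of the target."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean of absolute errors."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 7.5, 10.0]
predicted = [2.5, 5.0, 8.0, 12.0]  # errors: -0.5, 0.0, 0.5, 2.0
print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

Here MSE is 1.125, RMSE is about 1.06 and MAE is 0.75: the single error of 2.0 dominates the squared terms. RMSE is never smaller than MAE for the same errors.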

## Ans : 5

Advantages of RMSE:

- RMSE gives more weight to large errors (errors are squared before averaging), making it suitable when large errors are especially costly.
- It is expressed in the same units as the dependent variable, which makes it easy to interpret.
- It provides a measure of the dispersion of errors around the regression line.
- RMSE is widely used and has well-established mathematical properties.

Disadvantages of RMSE:

- RMSE is highly sensitive to outliers, which can dominate the overall evaluation.
- Because the errors are squared, a few extreme errors can outweigh many small ones.

Advantages of MSE:

- MSE is differentiable everywhere, which suits gradient-based optimization algorithms (it is the loss minimized by ordinary least squares).
- It provides a measure of the average squared difference between predicted and actual values.

Disadvantages of MSE:

- MSE is in the squared units of the dependent variable, which makes it hard to interpret directly.
- It magnifies the impact of large errors due to squaring.

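Advantages of MAE:

- MAE is less sensitive to outliers and large errors than RMSE and MSE.
- It is in the same units as the dependent variable and gives the average absolute difference between predicted and actual values.

Disadvantages of MAE:

- MAE weights every unit of error equally, so it does not single out large errors. Like RMSE and MSE, it also ignores the sign (direction) of the errors.
- It is not differentiable at zero error, which makes it less convenient as a loss for gradient-based optimization.
- It can understate poor performance when occasional large errors are much more costly than small ones.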

The choice of evaluation metric depends on the specific context and requirements of the regression analysis. RMSE is commonly used when large errors are especially costly, while MAE is preferred when every error should count in proportion to its size and robustness to outliers matters. MSE is often used for mathematical convenience, for example as a training loss where differentiability is important.
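To see the sensitivity difference concretely, the sketch below compares hypothetical residuals with and without a single large error. Flipping the sign of every error leaves both metrics unchanged, since neither looks at direction. The data are invented for illustration.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


typical = [1.0, -1.0] * 5                  # ten errors of size 1
with_outlier = typical[:-1] + [-20.0]      # replace the last one with a large miss
flipped = [-e for e in with_outlier]       # same sizes, opposite directions

for name, errs in [("typical", typical), ("with outlier", with_outlier), ("flipped", flipped)]:
    print(f"{name:>12}: RMSE = {rmse(errs):.3f}, MAE = {mae(errs):.3f}")
```

One outlier multiplies MAE by 2.9 (from 1.0 to 2.9) but RMSE by about 6.4 (from 1.0 to about 6.395).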

## Ans : 6

Lasso regularization, also known as L1 regularization, is a technique used to add a penalty term to the linear regression cost function to encourage sparse (zero) coefficients. It achieves this by adding the absolute values of the coefficients as a regularization term.

The Lasso regularization term is calculated as the sum of the absolute values of the coefficients multiplied by a regularization parameter (lambda or alpha):

Lasso regularization term = lambda * Σ|coefficient|

The cost function for Lasso regression is the sum of squared differences between the predicted values and the actual values, plus the Lasso regularization term. The intercept is usually left out of the penalty.

Lasso regularization differs from Ridge regularization (L2 regularization) in the type of penalty applied. While Lasso adds the absolute values of the coefficients, Ridge adds the squared values of the coefficients as a regularization term.

Lasso regularization tends to drive some coefficients to exactly zero, effectively performing feature selection by eliminating irrelevant predictors. This makes Lasso particularly useful when dealing with high-dimensional datasets where feature selection is desired.
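A minimal sketch of how the L1 penalty produces exact zeros: cyclic coordinate descent with soft-thresholding, written in plain Python for a cost of the sum of squared errors plus lambda times the sum of absolute coefficients. It assumes centred features and target, so no intercept is fitted; the function names and the tiny data set (built as y = 2·x1 + 0.1·x2) are hypothetical. Note that scikit-learn's `Lasso` scales the squared-error term by 1/(2·n_samples), so its `alpha` is not numerically the same as lambda here.

```python
def soft_threshold(z: float, t: float) -> float:
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise sum((y - X w)^2) + lam * sum(|w_j|) by cyclic coordinate descent.

    X is a list of rows. Columns of X and y are assumed centred (mean zero),
    so no intercept is fitted. Each column must be nonzero.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Threshold is lam / 2 because the squared-error term has no 1/2 factor.
            w[j] = soft_threshold(rho, lam / 2) / z
    return w


# Hypothetical centred data: columns x1 and x2, target y = 2*x1 + 0.1*x2.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-3.9, -2.1, 0.0, 1.9, 4.1]

w_ols = lasso_coordinate_descent(X, y, lam=0.0)    # no penalty: least squares
w_lasso = lasso_coordinate_descent(X, y, lam=2.0)  # L1 penalty
print(w_ols, w_lasso)
```

With lambda = 0 the fit recovers the least-squares coefficients [2.0, 0.1]; with lambda = 2 the weak feature x2 is set exactly to zero and the coefficient of x1 is shrunk from 2.0 to 1.9.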

## Ans : 7

Regularized linear models, such as Ridge and Lasso regression, help prevent overfitting in machine learning by introducing a penalty term that discourages large coefficient values. By limiting the magnitude of the coefficients, these models reduce the complexity of the model and the potential for overfitting.

For example, consider a dataset with 100 features (predictors) and 1000 observations. Without regularization, a linear regression model assigns a nonzero coefficient to every one of the 100 features and fits noise along with signal, resulting in a complex model that might perform well on the training data but generalize poorly to new data.

By applying regularization, such as Ridge or Lasso regression, the model adds a penalty term that shrinks the coefficients toward zero. This reduces the influence of irrelevant or noisy features, and Lasso can remove some of them entirely. The regularization parameter controls the trade-off between goodness of fit on the training data and model complexity: larger values give simpler models with more bias but less variance.

In the case of Lasso regularization, it can drive some coefficients exactly to zero, effectively performing feature selection. This helps in identifying the most relevant features and simplifying the model, further reducing the risk of overfitting.
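For a single centred feature, both penalties have closed-form solutions, which makes the shrinkage easy to trace as the regularization parameter grows. A sketch under that one-feature assumption, using a cost of the sum of squared errors plus the penalty; the function names and the data (x = [-2, -1, 0, 1, 2], y = 2x) are hypothetical.

```python
def ridge_coef(rho: float, z: float, lam: float) -> float:
    """Minimiser of sum((y - w*x)^2) + lam * w^2 for one centred feature."""
    return rho / (z + lam)


def lasso_coef(rho: float, z: float, lam: float) -> float:
    """Minimiser of sum((y - w*x)^2) + lam * |w| for one centred feature."""
    t = lam / 2
    if rho > t:
        return (rho - t) / z
    if rho < -t:
        return (rho + t) / z
    return 0.0


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [2 * xi for xi in x]
rho = sum(xi * yi for xi, yi in zip(x, y))  # sum of x*y = 20.0
z = sum(xi * xi for xi in x)                # sum of x^2 = 10.0

for lam in (0.0, 1.0, 10.0, 40.0, 100.0):
    print(f"lambda = {lam:5.1f}: ridge w = {ridge_coef(rho, z, lam):.3f}, "
          f"lasso w = {lasso_coef(rho, z, lam):.3f}")
```

Ridge shrinks the coefficient smoothly (2.0, 1.818, 1.0, 0.4, 0.182) but never reaches zero; Lasso reaches exactly zero once lambda is at least 2·|rho| (here 40).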

## Ans : 8

Regularized linear models have certain limitations that may make them not always the best choice for regression analysis:

Feature Interpretability: Regularization shrinks the coefficients toward zero, so they are biased estimates of the underlying effects, which makes it harder to interpret the importance of individual features. When interpretability is crucial, traditional linear regression may be preferred.

Over-regularization: If the regularization parameter is set too high, it can lead to underfitting, where the model is too simple and fails to capture important relationships in the data.

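Sensitivity to Scaling: Regularized linear models are sensitive to the scaling of the features. The penalty acts on coefficient sizes, and coefficient sizes depend on feature units: a feature measured on a small scale needs a large coefficient and is penalized heavily, while the same feature on a large scale is barely penalized. Features should therefore usually be standardized before fitting.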

Non-linear Relationships: Regularized linear models assume a linear relationship between the predictors and the target variable. If the relationship is highly non-linear, other regression techniques, such as polynomial regression or tree-based models, may be more appropriate.

High-Dimensional Data: When the number of predictors is much larger than the number of observations, Lasso can select at most as many predictors as there are observations, and from a group of highly correlated predictors it tends to keep one more or less arbitrarily. In such cases, elastic net regression (which combines the L1 and L2 penalties) or dimensionality reduction methods might be more suitable.

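Computational Complexity: Unlike Ridge, Lasso has no closed-form solution and must be fitted iteratively (for example by coordinate descent). Choosing the regularization parameter usually requires cross-validation over many candidate values, which adds cost for very large datasets or a high number of predictors.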

The choice of regression technique depends on the specific characteristics of the dataset, the goal of the analysis, and the trade-offs between interpretability, flexibility, and computational requirements.
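The scaling point can be seen with a one-feature ridge fit (closed form, centred data): the same hypothetical feature expressed in metres and in millimetres, with the same target and the same lambda. The metre version is shrunk heavily, the millimetre version hardly at all, and standardizing the feature first removes the dependence on units. Function names and data are illustrative assumptions.

```python
import statistics


def ridge_slope(x, y, lam):
    """Minimiser of sum((y - w*x)^2) + lam * w^2; x and y assumed centred."""
    rho = sum(xi * yi for xi, yi in zip(x, y))
    z = sum(xi * xi for xi in x)
    return rho / (z + lam)


def standardize(x):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu, sd = statistics.mean(x), statistics.pstdev(x)
    return [(xi - mu) / sd for xi in x]


y = [-6.0, -3.0, 0.0, 3.0, 6.0]            # true relation: y = 3 * (x in metres)
x_m = [-2.0, -1.0, 0.0, 1.0, 2.0]          # feature in metres
x_mm = [1000.0 * v for v in x_m]           # same feature in millimetres
lam = 10.0

# Prediction for the last observation (true value 6.0) under each scaling.
pred_m = ridge_slope(x_m, y, lam) * x_m[-1]        # heavily shrunk
pred_mm = ridge_slope(x_mm, y, lam) * x_mm[-1]     # barely shrunk
s_m, s_mm = standardize(x_m), standardize(x_mm)
pred_m_std = ridge_slope(s_m, y, lam) * s_m[-1]
pred_mm_std = ridge_slope(s_mm, y, lam) * s_mm[-1]
print(pred_m, pred_mm, pred_m_std, pred_mm_std)
```

With lambda = 10, the metre model predicts 3.0 for a true value of 6.0, while the millimetre model predicts almost exactly 6.0; after standardization both predict the same value (2.0).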

## Ans : 9

Both RMSE and MAE are evaluation metrics used to measure the performance of regression models, but they capture different aspects of the errors.

In this case, Model A has an RMSE of 10, the square root of its average squared error, and Model B has an MAE of 8, its average absolute error.

The choice of the better performer depends on the context and the specific requirements of the problem. If larger errors should be penalized more heavily, RMSE is the more suitable metric. If every error should count in proportion to its size and large errors should not be overly penalized, MAE may be the better choice.

Model B's lower number (8) suggests that, on average, its predictions are closer to the actual values, so if absolute error is the criterion, Model B might be considered the better performer. Strictly, however, the two numbers are not directly comparable: for any set of errors, RMSE is at least as large as MAE, so Model A's MAE could be below 8 and Model B's RMSE could be above 10. A sound comparison computes the same metric (or both metrics) for both models on the same test data.

Limitations of the choice of metric:

- The choice of RMSE or MAE depends on the specific problem and its requirements. There is no universally "correct" metric, and the choice should align with the problem's context and objectives.
- RMSE is more sensitive to outliers and extreme errors because of squaring. If the dataset contains them, MAE might provide a more robust evaluation.
- RMSE and MAE do not consider the direction of errors. In some scenarios, the direction of errors might be more important than their magnitude, and other metrics like Mean Percentage Error (MPE) or directional statistics might be more suitable.
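A hypothetical illustration of why the two numbers in the question cannot be ranked directly: the residuals below are invented so that Model A's RMSE is 10 and Model B's MAE is 8, as stated, yet Model A has the lower MAE and Model B the lower RMSE.

```python
import math


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical residuals (actual - predicted) on the same four test points.
errors_a = [0.0, 0.0, 0.0, -20.0]   # Model A: mostly exact, one large miss
errors_b = [8.0, -8.0, 8.0, -8.0]   # Model B: consistent moderate misses

print("A: RMSE", rmse(errors_a), "MAE", mae(errors_a))  # RMSE 10.0, MAE 5.0
print("B: RMSE", rmse(errors_b), "MAE", mae(errors_b))  # RMSE 8.0, MAE 8.0
```

Which model looks better flips with the metric, so both models should be scored with the same metric on the same test set, and the metric itself should follow from whether one large error is worse than several moderate ones.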

## Ans : 10 

The choice of the better performer between Model A (Ridge regularization) and Model B (Lasso regularization) depends on the specific goals and requirements of the problem.

Ridge regularization (L2 regularization) adds the squared values of the coefficients as a penalty term, while Lasso regularization (L1 regularization) adds the absolute values of the coefficients.

When comparing the models, the regularization parameter also plays a crucial role. A higher regularization parameter places a stronger penalty on the coefficients, potentially shrinking them more.

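In this case, Model A (Ridge regularization) has a lower regularization parameter of 0.1 compared to Model B (Lasso regularization) with a regularization parameter of 0.5. Because the two penalties have different forms (and libraries scale the loss differently for each), these two values are not directly comparable measures of regularization strength.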

The choice of the better performer depends on the importance of sparsity (zero coefficients) and the trade-offs between interpretability and model complexity.

If sparsity is not a critical requirement, Model A (Ridge regularization) might be preferred. Ridge regularization generally performs well when dealing with multicollinearity (highly correlated predictors), because it spreads weight across correlated predictors instead of picking one. It shrinks the coefficients towards zero but, in practice, does not make them exactly zero.

On the other hand, if sparsity is desired and feature selection is crucial, Model B (Lasso regularization) might be the better performer. Lasso regularization can drive some coefficients to exactly zero, effectively performing feature selection. It is particularly useful when dealing with high-dimensional datasets where feature reduction is necessary.

Trade-offs and limitations:

- Ridge regularization keeps all predictors in the model with reduced impact, which is useful when many predictors each carry some signal. However, it does not perform explicit feature selection, so the resulting model can be harder to interpret.
- Lasso regularization performs feature selection by driving some coefficients to zero, providing a sparser and usually more interpretable model. However, with correlated predictors it tends to keep one and drop the others somewhat arbitrarily, so it might exclude relevant predictors.

The choice between Ridge and Lasso regularization depends on the specific dataset, the goal of the analysis, and the importance of interpretability and sparsity. In practice, the better performer is the model with lower error on held-out data (for example, measured by cross-validation), so it is recommended to experiment with both methods and several parameter values.
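Because Ridge penalizes squared coefficients and Lasso absolute ones, the values 0.1 and 0.5 do not by themselves say which model is regularized more strongly. A quick arithmetic sketch using the penalty forms only (ignoring any library-specific scaling of the loss), with hypothetical single coefficients of different sizes:

```python
def ridge_penalty(w, lam):
    """lam * sum of squared coefficients (L2)."""
    return lam * sum(wj * wj for wj in w)


def lasso_penalty(w, lam):
    """lam * sum of absolute coefficients (L1)."""
    return lam * sum(abs(wj) for wj in w)


for size in (0.5, 5.0, 10.0):
    w = [size]
    print(f"|w| = {size:4.1f}: ridge (lambda=0.1) = {ridge_penalty(w, 0.1):.3f}, "
          f"lasso (lambda=0.5) = {lasso_penalty(w, 0.5):.3f}")
```

Below |w| = 5 the Lasso term is larger, above it the Ridge term is, so which model is "more regularized" depends on the coefficient sizes (and therefore on feature scaling). The two models are best compared on held-out error.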