## Q.1

R-squared in Linear Regression Models:
R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variables in a regression model.

It measures the goodness of fit of the regression model.
Higher R-squared values indicate a better fit of the model to the data.

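Calculation:

R-squared = 1 - SSR/SST

where
SSR = sum of squared residuals (errors), i.e. the sum of (actual - predicted)² over all observations
SST = total sum of squares, i.e. the sum of (actual - mean of actual)² over all observations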
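The formula above can be checked by hand on a small example. The sketch below uses made-up observed and predicted values (they are illustrative assumptions, not data from this notebook) and computes SSR, SST, and R-squared with NumPy.

```python
import numpy as np

# Hypothetical observed values and model predictions (illustrative only).
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

ss_res = np.sum((y_true - y_pred) ** 2)         # SSR: sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # SST: total sum of squares
r_squared = 1 - ss_res / ss_tot

print(f"SSR = {ss_res:.2f}, SST = {ss_tot:.2f}, R-squared = {r_squared:.4f}")
```

Here the residuals are small relative to the spread of the observed values, so R-squared comes out close to 1.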

## Q.2

Adjusted R-squared

Adjusted R-squared is a modified version of the R-squared statistic that adjusts for the number of predictors in the model. It provides a more accurate measure of the goodness of fit, especially when multiple predictors are involved.


R-squared: Adding more predictors to the model can never decrease the R-squared value. This can lead to overfitting, where the model appears to have a better fit because it captures random noise in the data.

Adjusted R-squared: It can decrease if the added predictors do not improve the model. This helps to mitigate the risk of overfitting by penalizing the addition of unnecessary predictors.

R-squared ≥ adjusted R-squared (for a model with at least one predictor, adjusted R-squared never exceeds R-squared)
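Adjusted R-squared is commonly computed as 1 - (1 - R-squared)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors. The sketch below applies that formula to synthetic data (the data, the helper name `r2_and_adjusted`, and the choice of five noise predictors are assumptions made for illustration). It fits ordinary least squares once with one useful predictor and once with five unrelated predictors added.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept; return (R-squared, adjusted R-squared) on the same data."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)  # least-squares fit
    residuals = y - design @ coef
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

# Synthetic data: one predictor that matters, five that are pure noise.
rng = np.random.default_rng(0)
n = 50
x_useful = rng.normal(size=(n, 1))
y = 2.0 * x_useful[:, 0] + rng.normal(scale=1.0, size=n)
x_noise = rng.normal(size=(n, 5))

r2_small, adj_small = r2_and_adjusted(x_useful, y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([x_useful, x_noise]), y)

print(f"1 predictor : R-squared = {r2_small:.4f}, adjusted = {adj_small:.4f}")
print(f"6 predictors: R-squared = {r2_big:.4f}, adjusted = {adj_big:.4f}")
```

R-squared for the larger model cannot be lower than for the smaller one, because the smaller model is a special case of the larger. Whether adjusted R-squared drops depends on how much the extra predictors happen to fit the noise in this particular sample.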

## Q.3

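Adjusted R-squared is more appropriate for feature selection and for comparing models with different numbers of predictors, because it penalizes predictors that do not improve the fit.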

## Q.4


RMSE, MSE, and MAE in Regression Analysis
RMSE (Root Mean Squared Error), MSE (Mean Squared Error), and MAE (Mean Absolute Error) are common metrics used to evaluate the performance of regression models. These metrics provide insights into the accuracy of the predictions made by the model.
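For reference, MSE is the mean of the squared errors, RMSE is the square root of MSE, and MAE is the mean of the absolute errors. The following is a minimal worked example on four made-up data points (the values are assumptions chosen so the arithmetic is easy to follow).

```python
import numpy as np

# Hypothetical actual and predicted values (illustrative only).
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 10.0, 15.0, 24.0])

errors = y_true - y_pred        # [-1, 2, 0, -4]
mse = np.mean(errors ** 2)      # (1 + 4 + 0 + 16) / 4 = 5.25
rmse = np.sqrt(mse)             # square root of 5.25, back in the units of y
mae = np.mean(np.abs(errors))   # (1 + 2 + 0 + 4) / 4 = 1.75

print(f"MSE = {mse:.2f}, RMSE = {rmse:.4f}, MAE = {mae:.2f}")
```

The single error of 4 contributes 16 of the 21 squared-error units, which is why RMSE ends up above MAE here.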

## Q.5

Advantages and disadvantages of MSE, RMSE, and MAE

Advantages - 
MSE - 1. The equation is differentiable, 2. For linear regression it is convex, so it has a single global minimum
RMSE - 1. The equation is differentiable, 2. It is in the same unit as the target variable
MAE - 1. It is robust to outliers, 2. It is in the same unit as the target variable

Disadvantages - 
MSE - 1. Not robust to outliers, 2. It is not in the same unit as the target variable (it is in squared units)
RMSE - 1. Not robust to outliers
MAE - 1. Convergence usually takes more time, because the absolute value is not differentiable at zero
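The robustness point can be made concrete with a small sketch. The two error vectors below are invented for illustration. They are identical except that the last error is replaced by an outlier.

```python
import numpy as np

def rmse(errors):
    """Root mean squared error of a vector of prediction errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

def mae(errors):
    """Mean absolute error of a vector of prediction errors."""
    return float(np.mean(np.abs(errors)))

clean = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
with_outlier = np.array([1.0, -1.0, 1.0, -1.0, 11.0])  # last error is an outlier

print(f"clean:        RMSE = {rmse(clean):.1f}, MAE = {mae(clean):.1f}")
print(f"with outlier: RMSE = {rmse(with_outlier):.1f}, MAE = {mae(with_outlier):.1f}")
```

One outlier moves RMSE from 1 to 5 but MAE only from 1 to 3, because squaring gives the large error far more weight.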

## Q.6 

Lasso (Least Absolute Shrinkage and Selection Operator) regularization is a technique used in regression analysis to enhance the prediction accuracy and interpretability of the model by adding a penalty term to the loss function. This penalty term is proportional to the sum of the absolute values of the coefficients.

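Lasso regression is usually used for feature selection, because its L1 penalty can shrink some coefficients to exactly zero. Ridge regression, on the other hand, uses an L2 penalty (the sum of the squared coefficients), which shrinks coefficients toward zero without eliminating them, so it reduces overfitting while keeping every feature in the model.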
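A minimal sketch of this difference, assuming synthetic data in which only the first three of ten features affect the target. The `alpha` values are arbitrary choices for illustration, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first three features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coefs = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coefs + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.3).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Exact zeros: Lasso =", n_zero_lasso, ", Ridge =", n_zero_ridge)
```

Lasso sets coefficients of uninformative features to exactly zero, while Ridge leaves them small but nonzero.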

## Q.7 

How Regularized Linear Models Help Prevent Overfitting
Regularized linear models, such as Lasso and Ridge regression, help prevent overfitting by adding a penalty term to the loss function. This penalty discourages the model from fitting the noise in the training data too closely, which can lead to better generalization on new, unseen data.

Overfitting is when a model has a low training error but a high testing error.


import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error

# Generate synthetic data
np.random.seed(0)
X = np.random.randn(100, 20)
true_coefs = np.random.randn(20)
y = X @ true_coefs + np.random.randn(100) * 0.5  # Adding noise

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit a linear regression model (without regularization)
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_pred_train = lin_reg.predict(X_train)
y_pred_test = lin_reg.predict(X_test)
print("Linear Regression - Training MSE:", mean_squared_error(y_train, y_pred_train))
print("Linear Regression - Test MSE:", mean_squared_error(y_test, y_pred_test))

# Fit a Ridge regression model (with L2 regularization)
ridge_reg = Ridge(alpha=1.0)
ridge_reg.fit(X_train, y_train)
y_pred_train_ridge = ridge_reg.predict(X_train)
y_pred_test_ridge = ridge_reg.predict(X_test)
print("Ridge Regression - Training MSE:", mean_squared_error(y_train, y_pred_train_ridge))
print("Ridge Regression - Test MSE:", mean_squared_error(y_test, y_pred_test_ridge))

# Fit a Lasso regression model (with L1 regularization)
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X_train, y_train)
y_pred_train_lasso = lasso_reg.predict(X_train)
y_pred_test_lasso = lasso_reg.predict(X_test)
print("Lasso Regression - Training MSE:", mean_squared_error(y_train, y_pred_train_lasso))
print("Lasso Regression - Test MSE:", mean_squared_error(y_test, y_pred_test_lasso))


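Linear Regression - Training MSE: 0.2185086272725248
Linear Regression - Test MSE: 0.2484655963354167
Ridge Regression - Training MSE: 0.22249463723800686
Ridge Regression - Test MSE: 0.25866081223030235
Lasso Regression - Training MSE: 0.4962445180930407
Lasso Regression - Test MSE: 0.8842904250904079

In this run the unregularized model already generalizes well (its training and test MSE are close), so the Ridge and Lasso penalties raise the test error instead of lowering it, markedly so for Lasso with alpha=0.1. Regularization is expected to help more when there are many features relative to the number of samples, or when the features are noisy or highly correlated.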
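The run above uses 80 training rows for 20 features, which leaves limited room for overfitting. As a hedged variation, the sketch below uses a setup where overfitting is more likely: 30 training rows, 25 features, and a true signal that depends on only three of them. All sizes, coefficients, and `alpha` values here are assumptions chosen for illustration. The results depend on the random seed, so the comparison should be read from the printed output rather than assumed.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

# Hypothetical setup: few training rows, many features, sparse true signal.
rng = np.random.default_rng(0)
n_train, n_test, n_features = 30, 1000, 25
true_coefs = np.zeros(n_features)
true_coefs[:3] = [3.0, -2.0, 1.5]   # only three features matter

X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train @ true_coefs + rng.normal(size=n_train)
y_test = X_test @ true_coefs + rng.normal(size=n_test)

models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.2),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = (train_mse, test_mse)
    print(f"{name:6s} train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

Unregularized least squares always has the lowest training MSE of the three, since it minimizes exactly that quantity. The gap between each model's training and test MSE is the quantity to compare when judging overfitting.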


## Q.8 

Limitations of Regularized Linear Models
While regularized linear models such as Ridge and Lasso regression are powerful tools for mitigating overfitting and improving model generalization, they have certain limitations and may not always be the best choice for regression analysis. Here are some of the key limitations:

Assumption of Linearity:

- Regularized linear models assume a linear relationship between the predictors and the response variable. If the true relationship is nonlinear, these models may not perform well, even with regularization.
- Nonlinear patterns in the data can be better captured by other methods like polynomial regression, decision trees, or neural networks.

Sensitivity to the Choice of Regularization Parameter:

- The performance of regularized models depends heavily on the choice of the regularization parameter (λ for both Ridge and Lasso; scikit-learn calls it `alpha`). Selecting the optimal value typically requires cross-validation, which can be computationally intensive.
- An incorrect choice of λ can lead to either underfitting (too much regularization) or overfitting (too little regularization).

Computational Complexity:

- For very large datasets with high dimensionality, regularized models can become computationally expensive, particularly Lasso regression, which has no closed-form solution and must be solved iteratively.
- While algorithms like coordinate descent have made Lasso more tractable, the computational burden can still be significant for very large-scale problems.

Handling Multicollinearity:

- While Ridge regression can mitigate the effects of multicollinearity by shrinking coefficients, it does not eliminate the problem entirely.
- Lasso regression can perform variable selection and might drop some correlated predictors, but which one it keeps can be arbitrary, so the model can be unstable when predictors are highly correlated.

Feature Selection in Lasso:

- Lasso's feature selection can be both an advantage and a limitation. If the true model is not sparse (i.e., many predictors are relevant), Lasso might drop important variables, leading to underfitting.
- Lasso can also be unstable when the number of predictors is much larger than the number of observations, as small changes in the data can lead to large changes in the selected model.

Interpretability:

- While regularized models can simplify interpretation by reducing the number of predictors (in the case of Lasso), the resulting coefficients can still be difficult to interpret, especially when interactions or nonlinearities are present.
- Other techniques like decision trees or rule-based models might provide more intuitive interpretations.

When Regularized Linear Models May Not Be the Best Choice

Nonlinear Relationships:

- If the underlying relationship between predictors and the response variable is nonlinear, techniques such as polynomial regression, support vector machines, decision trees, or neural networks may be more appropriate.

Complex Interactions:

- When there are complex interactions between predictors, methods that can model interactions explicitly, such as decision trees, random forests, or gradient boosting machines, might be more suitable.

High-Dimensional, Sparse Data:

- In high-dimensional, sparse datasets (e.g., text data, genomic data), methods like support vector machines with appropriate kernels or specialized techniques like L1-regularized logistic regression for classification might be more effective.

Computational Constraints:

- When computational resources are limited, the repeated fitting needed to tune λ can be a burden, and a simpler model with fewer hyperparameters to tune might be preferred.
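The linearity limitation can be illustrated with a short sketch. It assumes synthetic data where the target is a quadratic function of a single feature, and compares a plain Ridge model with the same Ridge model fitted on degree-2 polynomial features. The data, the split, and `alpha=1.0` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic nonlinear data: y depends on x squared.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = x[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

# Ridge on the raw feature can only fit a straight line.
linear_ridge = Ridge(alpha=1.0).fit(x_train, y_train)

# Ridge on [1, x, x^2] can represent the quadratic shape.
poly_ridge = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
poly_ridge.fit(x_train, y_train)

mse_linear = mean_squared_error(y_test, linear_ridge.predict(x_test))
mse_poly = mean_squared_error(y_test, poly_ridge.predict(x_test))

print(f"Ridge on x        - Test MSE: {mse_linear:.3f}")
print(f"Ridge on x, x^2   - Test MSE: {mse_poly:.3f}")
```

No value of the regularization parameter fixes the first model, because the problem is the missing nonlinear term and not the size of the coefficients.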


## Q.9

Sensitivity to Outliers:

- Model A (RMSE): If large errors are especially costly in your problem and you want to penalize them more heavily, RMSE is more appropriate.
- Model B (MAE): If you prefer a metric that reflects the typical size of an error and is less affected by outliers, MAE is more appropriate.

Nature of the Data:

- If the data contains outliers or a few large errors, RMSE will reflect this more than MAE.
- If the errors are more uniformly distributed, MAE might give a clearer picture of model performance.

Consistency:

- Since the two models are evaluated using different metrics, a direct comparison is challenging. Ideally, both models should be evaluated using the same metric for a fair comparison.

Conclusion

Given the metrics provided:

Model B's MAE of 8 is numerically lower than Model A's RMSE of 10, but the two numbers are not directly comparable. For any set of predictions, RMSE is at least as large as MAE, so Model A's MAE could be anywhere at or below 10, possibly below 8. Model B cannot be declared the better performer without knowing the MAE for Model A or the RMSE for Model B.
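To see why the reported figures alone cannot rank the models, the sketch below builds two hypothetical scenarios. Both are consistent with "Model A has RMSE 10, Model B has MAE 8", yet they point to opposite conclusions. The error vectors are invented purely for illustration.

```python
import numpy as np

def rmse(errors):
    """Root mean squared error of a vector of prediction errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

def mae(errors):
    """Mean absolute error of a vector of prediction errors."""
    return float(np.mean(np.abs(errors)))

# Scenario 1: both models make errors of constant size.
a1 = np.array([10.0, 10.0, 10.0, 10.0])   # Model A: RMSE 10, MAE 10
b1 = np.array([8.0, 8.0, 8.0, 8.0])       # Model B: RMSE 8,  MAE 8

# Scenario 2: both models make mostly small errors plus one large one.
a2 = np.array([0.0, 0.0, 0.0, 20.0])      # Model A: RMSE 10, MAE 5
b2 = np.array([2.0, 2.0, 2.0, 26.0])      # Model B: MAE 8, RMSE about 13.1

for label, a, b in [("Scenario 1", a1, b1), ("Scenario 2", a2, b2)]:
    print(f"{label}: A RMSE={rmse(a):.2f} MAE={mae(a):.2f} | "
          f"B RMSE={rmse(b):.2f} MAE={mae(b):.2f}")
```

In the first scenario Model B is better on both metrics, and in the second Model A is better on both, even though the reported numbers (A: RMSE 10, B: MAE 8) are the same in each.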

## Q.10

Factors to Consider in Choosing the Better Model

Model Performance (Validation Metrics):

- The ultimate decision should be based on model performance metrics (e.g., RMSE, MAE) on a validation set. These metrics provide a direct comparison of how well each model generalizes to unseen data.

Nature of the Data and Problem:

- High Dimensionality and Feature Selection: If the dataset has many features, some of which are potentially irrelevant or redundant, Lasso (Model B) might be preferable due to its ability to perform feature selection by setting some coefficients to zero.
- Multicollinearity: If the features are highly correlated, Ridge (Model A) might be better because it tends to handle multicollinearity by distributing the coefficients among the correlated variables.

Regularization Parameter (λ):

- The choice of λ affects the degree of regularization. Comparing a Ridge model with λ=0.1 and a Lasso model with λ=0.5 might not be entirely fair: the two values multiply different penalties (squared coefficients versus absolute coefficients), so they are not on the same scale and represent different strengths of regularization. Ideally, one should perform cross-validation to find the optimal λ for each model.

Interpretability:

- If model interpretability is important and you need a sparse model where some coefficients are exactly zero, Lasso (Model B) is more appropriate.
- If you are more concerned with minimizing overall prediction error and are less worried about the number of non-zero coefficients, Ridge (Model A) might be better.

Trade-offs and Limitations

Bias-Variance Trade-off:

- Both penalties accept some bias in exchange for lower variance. Ridge shrinks all coefficients smoothly, which tends to give stable estimates, especially with correlated features.
- Lasso's selected set of features can change with small changes in the data, which can add variance, but excluding irrelevant features can also reduce variance when the true model is sparse.

Model Complexity:

- Ridge maintains all features but shrinks their coefficients, leading to less drastic model simplification.
- Lasso can lead to simpler models by eliminating some features, but this can also mean missing out on important features if λ is not tuned properly.

Computational Cost:

- Lasso can be computationally more intensive, especially for very high-dimensional data, because it has no closed-form solution and is fitted iteratively, whereas Ridge has a closed-form solution.

Conclusion and Recommendation

To choose the better performer between Model A (Ridge) and Model B (Lasso), you should:

1. Evaluate Both Models on the Same Metric: Compare the performance metrics (e.g., RMSE, MAE) of both models on a validation set. This direct comparison is crucial for making an informed decision.
2. Consider Feature Selection Needs: If reducing the number of features is important for interpretability or reducing model complexity, Lasso might be the better choice.
3. Analyze Regularization Parameters: Ensure that the regularization parameters are optimally chosen for both models using cross-validation.

Practical Approach

1. Cross-validation: Perform cross-validation to find the optimal λ for both Ridge and Lasso models.
2. Model Comparison: Compare the models using the same evaluation metrics on a validation set.
3. Contextual Considerations: Consider the nature of the data and the importance of feature selection or handling multicollinearity.
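The practical approach above can be sketched with scikit-learn's `RidgeCV` and `LassoCV`, which choose the regularization strength (named `alpha` in scikit-learn, λ in the text) by cross-validation. The synthetic data, the `alpha` grid, and the 75/25 split are assumptions for illustration. With real data, the ranking of the two models may differ.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: 20 features, only the first five influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_coefs = np.zeros(20)
true_coefs[:5] = [4.0, -3.0, 2.0, -1.5, 1.0]
y = X @ true_coefs + rng.normal(scale=1.0, size=200)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: tune alpha for each model by cross-validation on the training set.
alphas = np.logspace(-3, 2, 30)
models = {
    "Ridge": RidgeCV(alphas=alphas).fit(X_train, y_train),
    "Lasso": LassoCV(alphas=alphas, cv=5).fit(X_train, y_train),
}

# Step 2: compare both models on the same metrics and the same validation set.
results = {}
for name, model in models.items():
    pred = model.predict(X_val)
    rmse = float(np.sqrt(mean_squared_error(y_val, pred)))
    mae = float(mean_absolute_error(y_val, pred))
    results[name] = {"alpha": float(model.alpha_), "rmse": rmse, "mae": mae}
    print(f"{name}: alpha={model.alpha_:.4f}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```

Because each model gets its own tuned `alpha` and both are scored on the same held-out rows with the same metrics, the resulting numbers can be compared directly.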
