Q1. Explain the concept of R-squared in linear regression models. How is it calculated, and what does it 
represent?
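Ans:-R-squared (R²) is a statistical measure of the proportion of the variance in the dependent variable (the variable being predicted) that is explained by the independent variable(s) in a regression model. It is calculated as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squared deviations of the dependent variable from its mean. In linear regression, R-squared indicates the goodness of fit of the model: a value of 1 means the model explains all of the variance, and a value of 0 means it explains none.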

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable
y = np.array([2, 4, 5, 4, 5])  # Dependent variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Make predictions
y_pred = model.predict(X)

# Calculate R-squared
r_squared = r2_score(y, y_pred)

print("R-squared:", r_squared)
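
The same value can be obtained directly from the definition R² = 1 - SS_res / SS_tot. The sketch below is a minimal illustration using only NumPy. It fits the least-squares line with np.polyfit, which is a substitute for the LinearRegression call above rather than the method used there, and the variable names are chosen for illustration.

```python
import numpy as np

# Same sample data as in the cell above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Least-squares line: y_hat = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean of y
r_squared_manual = 1 - ss_res / ss_tot

print("Manual R-squared:", r_squared_manual)
```

For this data the fitted line is y = 0.6x + 2.2, with SS_res = 2.4 and SS_tot = 6, so R² is 0.6 up to floating-point rounding.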


Q2. Define adjusted R-squared and explain how it differs from the regular R-squared. 
Ans:-Adjusted R-squared is a modification of the traditional R-squared (coefficient of determination) that accounts for the number of predictors (independent variables) in a regression model. While R-squared tells you the proportion of the variance in the dependent variable that is explained by the model, adjusted R-squared adjusts this value based on the number of predictors and provides a more accurate measure of a model's goodness of fit.
Key points about adjusted R-squared:

Penalty for Adding Variables: If adding a new variable does not improve the fit enough to offset the lost degree of freedom, the adjusted R-squared will decrease. This helps prevent overfitting by discouraging the inclusion of unnecessary variables.

Dependence on Sample Size and Variables: The adjusted R-squared depends on both the sample size (n) and the number of predictors (k), providing a more nuanced evaluation of model performance.

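Range: Adjusted R-squared is never greater than R-squared and is at most 1. Unlike R-squared for a least-squares fit with an intercept, it can be negative when the model explains very little of the variance relative to the number of predictors. A higher adjusted R-squared indicates a better fit after accounting for model complexity.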

In [None]:
import statsmodels.api as sm
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable
y = np.array([2, 4, 5, 4, 5])  # Dependent variable

# Add a constant term to the independent variable matrix
X_with_constant = sm.add_constant(X)

# Create a linear regression model using statsmodels
model = sm.OLS(y, X_with_constant).fit()

# Get the R-squared and the number of predictors
r_squared = model.rsquared
num_predictors = len(model.params) - 1  # Exclude the intercept term

# Calculate the adjusted R-squared
n = len(y)
adjusted_r_squared = 1 - ((1 - r_squared) * (n - 1) / (n - num_predictors - 1))

print("Adjusted R-squared:", adjusted_r_squared)
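
To see how the penalty grows with the number of predictors, the formula used above can be wrapped in a small helper. This is an illustrative sketch: the function name adjusted_r2 and the inputs (R² = 0.6, n = 10) are assumptions chosen for the example and are not part of statsmodels.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (intercept excluded)."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hold R-squared and sample size fixed, and increase the number of predictors
results = {k: adjusted_r2(0.6, n=10, k=k) for k in (1, 2, 3)}
for k, value in results.items():
    print("k =", k, "adjusted R-squared =", round(value, 4))
```

With R² held at 0.6 and n = 10, the adjusted value falls from 0.55 (k = 1) to about 0.486 (k = 2) and 0.4 (k = 3): extra predictors that do not raise R² lower the adjusted score.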


Q3. When is it more appropriate to use adjusted R-squared?
Ans:-Adjusted R-squared is more appropriate to use when comparing and evaluating models with different numbers of predictors (independent variables). It helps address the potential issue of overfitting and provides a more nuanced assessment of a model's goodness of fit by penalizing the inclusion of unnecessary variables.

Here are situations in which adjusted R-squared is particularly useful:

Comparing Models: When you have multiple regression models with different numbers of predictors, comparing their adjusted R-squared values can help you determine which model provides the best balance between goodness of fit and model complexity. Adjusted R-squared accounts for the trade-off between adding more predictors and improving the fit.

Variable Selection: In situations where you are considering adding or removing predictors from your model, adjusted R-squared can guide the variable selection process. It discourages the inclusion of variables that do not significantly contribute to improving the fit.

Preventing Overfitting: Overfitting occurs when a model captures noise in the training data rather than the underlying pattern. Adjusted R-squared helps mitigate overfitting by penalizing the model for including variables that don't contribute meaningfully to the explanation of the dependent variable.

In [None]:
import statsmodels.api as sm
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable
y = np.array([2, 4, 5, 4, 5])  # Dependent variable

# Add a constant term to the independent variable matrix
X_with_constant = sm.add_constant(X)

# Create two linear regression models using statsmodels with different numbers of predictors
model1 = sm.OLS(y, X_with_constant).fit()  # Model with one predictor
model2 = sm.OLS(y, sm.add_constant(np.column_stack((X, X**2)))).fit()  # Model with two predictors (quadratic term)

# Calculate R-squared and adjusted R-squared for each model
r_squared1 = model1.rsquared
adjusted_r_squared1 = model1.rsquared_adj

r_squared2 = model2.rsquared
adjusted_r_squared2 = model2.rsquared_adj

# Display the results
print("Model 1 - R-squared:", r_squared1, "Adjusted R-squared:", adjusted_r_squared1)
print("Model 2 - R-squared:", r_squared2, "Adjusted R-squared:", adjusted_r_squared2)


Q4. What are RMSE, MSE, and MAE in the context of regression analysis? How are these metrics 
calculated, and what do they represent?

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

# Sample data
y_actual = np.array([2, 4, 5, 4, 5])  # Actual values
y_predicted = np.array([2.5, 3.8, 4.2, 3.5, 5.1])  # Predicted values

# Mean Squared Error (MSE)
mse = mean_squared_error(y_actual, y_predicted)

# Root Mean Squared Error (RMSE)
rmse = np.sqrt(mse)

# Mean Absolute Error (MAE)
mae = mean_absolute_error(y_actual, y_predicted)

# Display the results
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("Mean Absolute Error (MAE):", mae)
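
MSE is the mean of the squared errors, RMSE is the square root of MSE, and MAE is the mean of the absolute errors. The sketch below computes all three directly with NumPy on the same sample values, as a cross-check of the scikit-learn calls above. The names ending in _manual are chosen for illustration.

```python
import numpy as np

y_actual = np.array([2, 4, 5, 4, 5])               # Actual values
y_predicted = np.array([2.5, 3.8, 4.2, 3.5, 5.1])  # Predicted values

errors = y_actual - y_predicted

mse_manual = np.mean(errors ** 2)      # mean of the squared errors
rmse_manual = np.sqrt(mse_manual)      # square root puts it back in the units of y
mae_manual = np.mean(np.abs(errors))   # mean of the absolute errors

print("MSE:", mse_manual)
print("RMSE:", rmse_manual)
print("MAE:", mae_manual)
```

For these values the squared errors sum to 1.19, so MSE is 0.238, RMSE is about 0.488, and MAE is 0.42.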


Q5. Discuss the advantages and disadvantages of using RMSE, MSE, and MAE as evaluation metrics in 
regression analysis.
Ans:-Mean Squared Error (MSE):
Advantages:

Sensitivity to Errors: Squaring the errors penalizes larger errors more, which is desirable when large errors are considered more critical.
Mathematical Simplicity: The squared term is differentiable everywhere, which makes gradients and other mathematical operations simple.
Disadvantages:

Sensitivity to Outliers: MSE gives higher weight to outliers due to the squaring, which may not be suitable if your data contains extreme values.
Units: The units of MSE are the square of the units of the dependent variable, which makes interpretation difficult.
Root Mean Squared Error (RMSE):
Advantages:

Interpretability: RMSE is on the same scale as the dependent variable, making it easier to interpret than MSE.
Sensitivity to Errors: Like MSE, it penalizes larger errors more, providing a balance between sensitivity and interpretability.
Disadvantages:

Sensitivity to Outliers: Like MSE, RMSE is sensitive to outliers because the errors are squared before averaging.
Not an Average Error: Although RMSE is in the units of the dependent variable, it is not the typical error size. It is always at least as large as MAE, and the gap widens as the errors become more uneven.
Mean Absolute Error (MAE):
Advantages:

Robustness to Outliers: MAE is less sensitive to outliers since it does not square the errors.
Interpretability: MAE is in the same units as the dependent variable, making it easily interpretable.
Disadvantages:

Linear Weight on All Errors: MAE weights each error in proportion to its size, so it does not single out large errors. This may not be appropriate if large errors are especially costly.
Mathematical Complexity: The absolute value is not differentiable at zero, which makes gradient-based operations less straightforward.
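
The difference in outlier sensitivity can be illustrated with a small sketch. The error values below are hypothetical and chosen only to make the arithmetic easy to follow; the helper names rmse and mae are illustrative.

```python
import numpy as np

def rmse(errors):
    """Root mean squared error of a vector of prediction errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

def mae(errors):
    """Mean absolute error of a vector of prediction errors."""
    return float(np.mean(np.abs(errors)))

clean_errors = np.array([1.0, 1.0, 1.0, 1.0])    # every prediction off by 1
outlier_errors = np.array([1.0, 1.0, 1.0, 9.0])  # one large miss

print("Clean:   MAE =", mae(clean_errors), "RMSE =", rmse(clean_errors))
print("Outlier: MAE =", mae(outlier_errors), "RMSE =", rmse(outlier_errors))
```

When all errors are equal, both metrics are 1. With a single error of 9, MAE rises to 3 while RMSE rises to about 4.58 (the square root of 21), because the squared term lets the one large error dominate.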

Q6. Explain the concept of Lasso regularization. How does it differ from Ridge regularization, and when is 
it more appropriate to use?
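Ans:-Lasso (L1) regularization adds a penalty proportional to the sum of the absolute values of the coefficients to the least-squares loss, whereas Ridge (L2) regularization penalizes the sum of their squared values.

Differences and When to Use Lasso:
Sparse Solutions: Lasso tends to produce sparse coefficient vectors by driving some coefficients to exactly zero. This makes it useful for feature selection, where irrelevant features are automatically excluded from the model.

Feature Selection: If you suspect that only a small number of features are truly relevant in your regression model, Lasso is a good choice.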

Outliers: The L1 penalty acts on the coefficients, not on the residuals. Lasso and Ridge both minimize squared error, so neither is robust to outliers in the data.
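
The contrast between the two penalties is easiest to see in a simplified setting. Assuming orthonormal features (XᵀX = I) and the objectives (1/2)||y - Xb||² + λ||b||₁ for Lasso and (1/2)||y - Xb||² + (λ/2)||b||² for Ridge, the Lasso solution soft-thresholds each least-squares coefficient while the Ridge solution rescales it. The sketch below applies both rules; the coefficient values and λ are hypothetical, and the simplification does not describe how scikit-learn solves the general problem.

```python
import numpy as np

ols_coef = np.array([3.0, 0.5, -0.2, -2.0])  # hypothetical least-squares coefficients
lam = 1.0                                    # hypothetical penalty strength

# Lasso (orthonormal case): shrink each coefficient by lam, clipping at zero
lasso_coef = np.sign(ols_coef) * np.maximum(np.abs(ols_coef) - lam, 0.0)

# Ridge (orthonormal case): scale every coefficient by the same factor
ridge_coef = ols_coef / (1.0 + lam)

print("Non-zero Lasso coefficients:", np.count_nonzero(lasso_coef))
print("Non-zero Ridge coefficients:", np.count_nonzero(ridge_coef))
```

The two coefficients smaller than λ in absolute value become exactly zero under the Lasso rule, while the Ridge rule halves all four and leaves none at zero.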

In [None]:
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt

# Create a synthetic dataset
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and fit a Lasso regression model
alpha = 0.1  # Regularization parameter
lasso_model = Lasso(alpha=alpha)
lasso_model.fit(X_train_scaled, y_train)

# Print the coefficients
print("Lasso Coefficients:", lasso_model.coef_)

# Plot the regression line
plt.scatter(X_test, y_test, label='Test Data')
plt.plot(X_test, lasso_model.predict(X_test_scaled), color='red', label='Lasso Regression')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.legend()
plt.show()


Q7. How do regularized linear models help to prevent overfitting in machine learning? Provide an 
example to illustrate.
Ans:-Regularized linear models add a penalty on the size of the coefficients to the loss function. The penalty discourages large coefficients, which limits the model's ability to fit noise in the training data and so reduces overfitting. There are two common types of regularization for linear models: L1 regularization (Lasso) and L2 regularization (Ridge).

L1 Regularization (Lasso):

The penalty term is the sum of the absolute values of the coefficients.
Encourages sparse solutions by driving some coefficients to exactly zero.
Useful for feature selection.
L2 Regularization (Ridge):

The penalty term is the sum of the squared values of the coefficients.
Encourages smaller, but non-zero, coefficients.
Reduces the impact of individual features without necessarily excluding them entirely.
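
The shrinking effect of the L2 penalty can be sketched with the closed-form ridge solution. This is a minimal NumPy illustration under stated assumptions: the data, the polynomial degree, and the alpha value are made up, the helper ridge_fit is hypothetical, and unlike scikit-learn's Ridge it also penalizes the intercept column.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)

# Degree-9 polynomial features: columns 1, x, x^2, ..., x^9
X_poly = np.vander(x, N=10, increasing=True)

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: solve (X^T X + alpha * I) b = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Unregularized least squares versus a lightly penalized fit
ols_coef, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
ridge_coef = ridge_fit(X_poly, y, alpha=1e-3)

ols_norm = np.linalg.norm(ols_coef)
ridge_norm = np.linalg.norm(ridge_coef)
print("Coefficient norm without penalty:", ols_norm)
print("Coefficient norm with ridge penalty:", ridge_norm)
```

The penalized fit has a smaller coefficient norm than the unpenalized one. Very large polynomial coefficients are typical of a curve that chases noise, which is why shrinking them tends to give a smoother fit.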

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# Create synthetic data with a sinusoidal relationship plus noise
np.random.seed(42)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Polynomial regression without regularization (degree=15)
degree = 15
model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
model.fit(X_train, y_train)

# Polynomial regression with Lasso regularization (L1)
alpha_lasso = 0.01
lasso_model = make_pipeline(PolynomialFeatures(degree), Lasso(alpha=alpha_lasso))
lasso_model.fit(X_train, y_train)

# Polynomial regression with Ridge regularization (L2)
alpha_ridge = 0.01
ridge_model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=alpha_ridge))
ridge_model.fit(X_train, y_train)

# Plot the results
X_plot = np.arange(0, 5, 0.01)[:, np.newaxis]
plt.scatter(X, y, s=20, edgecolor="black", c="darkorange", label="data")
plt.plot(X_plot, model.predict(X_plot), color="cornflowerblue", label="Linear Regression")
plt.plot(X_plot, lasso_model.predict(X_plot), color="red", label="Lasso Regression")
plt.plot(X_plot, ridge_model.predict(X_plot), color="green", label="Ridge Regression")
plt.xlabel("data")
plt.ylabel("target")
plt.title("Polynomial Regression with Regularization")
plt.legend()
plt.show()


Q8. Discuss the limitations of regularized linear models and explain why they may not always be the best 
choice for regression analysis.
Ans:-While regularized linear models are powerful tools for preventing overfitting and addressing multicollinearity in regression analysis, they have some limitations that may make them less suitable in certain situations. Here are some of the key limitations:

Sensitivity to Scaling:

Regularized linear models are sensitive to the scale of the features. If features are not scaled properly, the regularization term may impact some features more than others, leading to biased coefficient estimates. It's essential to standardize or normalize features before applying regularization.
Inability to Handle Categorical Variables Directly:

Regularization techniques are designed for numerical features, and they may not handle categorical variables directly. One-hot encoding or other preprocessing steps are often required to use regularized models with categorical variables.
Loss of Interpretability:

The penalty shrinks coefficients toward zero, so they are biased and can no longer be read as unbiased estimates of each feature's effect. L1 regularization (Lasso) may also drive some coefficients to exactly zero. This is useful for feature selection, but when features are correlated the choice of which one to keep can be somewhat arbitrary, so an excluded feature is not necessarily unimportant.
Model Selection Challenge:

Choosing the appropriate regularization strength (alpha) can be challenging. If the regularization strength is too high, the model may underfit the data, while if it's too low, the model may overfit. Cross-validation is often used to find the optimal value for alpha, but this adds complexity to the model selection process.
Limited Effectiveness with Large Feature Spaces:

Regularization has limits in high-dimensional feature spaces, especially when the number of features is much larger than the number of observations. In that setting Lasso can select at most as many features as there are observations, and the selected set can be unstable. Other techniques, such as feature engineering or dimensionality reduction, might be more appropriate.
Non-Smooth Objective Function:

The L1 penalty (Lasso) is not differentiable at zero, so the objective function is non-smooth and has no closed-form solution. Specialized solvers such as coordinate descent are needed, which can affect convergence speed on large datasets. The L2 penalty (Ridge) is smooth and does not have this problem.
Potential Loss of Information:

In some cases, aggressive regularization may lead to a significant reduction in model flexibility, potentially causing a loss of important information present in the data.
Assumption of Linearity:

Regularized linear models assume a linear relationship between the features and the target variable. If the true relationship is highly non-linear, non-linear models might be more appropriate.
Outliers Impact:

Regularized linear models still minimize squared error, so they remain sensitive to outliers. The penalty acts on the coefficients, not on the residuals, and extreme data points can disproportionately influence the fitted model.
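
The scaling limitation listed above is usually handled by standardizing each feature before fitting. A minimal sketch of that step with NumPy follows; the two features and their scales are hypothetical. In practice the mean and standard deviation would be computed on the training data only and reused for the test data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical features on very different scales
X = np.column_stack([
    rng.normal(0.0, 1.0, 200),      # values of order 1
    rng.normal(0.0, 1000.0, 200),   # values of order 1000
])

# Standardize: subtract the column mean and divide by the column standard deviation
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)
X_scaled = (X - col_mean) / col_std

print("Std before scaling:", X.std(axis=0))
print("Std after scaling:", X_scaled.std(axis=0))
```

After this step both columns have mean 0 and standard deviation 1, so a single penalty strength applies to their coefficients on a comparable footing.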

Q9. You are comparing the performance of two regression models using different evaluation metrics. 
Model A has an RMSE of 10, while Model B has an MAE of 8. Which model would you choose as the better 
performer, and why? Are there any limitations to your choice of metric?
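
One point relevant to this question: RMSE and MAE are different metrics, and for the same set of errors RMSE is never smaller than MAE, so an RMSE of 10 and an MAE of 8 cannot be compared directly. The sketch below uses hypothetical error values, chosen only for illustration, to show that a single model could produce both numbers.

```python
import numpy as np

# Hypothetical prediction errors from ONE model
errors = np.array([2.0, 14.0])

mae_value = np.mean(np.abs(errors))          # (2 + 14) / 2
rmse_value = np.sqrt(np.mean(errors ** 2))   # sqrt((4 + 196) / 2)

print("MAE:", mae_value)
print("RMSE:", rmse_value)
```

Here the same errors give an MAE of 8 and an RMSE of 10. A fair comparison of Model A and Model B would need both models scored with the same metric on the same data.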