# Regression



---

**1. What is Simple Linear Regression?**

Simple Linear Regression is a statistical method that models the relationship between two variables: one independent (X) and one dependent (Y). It fits a straight line, $Y = mX + c$, to predict the value of Y from X.

---

**2. What are the key assumptions of Simple Linear Regression?**

Key assumptions include linearity, independence of errors, homoscedasticity (constant variance of errors), normal distribution of errors, and no significant outliers. Violating these assumptions can lead to misleading results or poor model performance.

---

**3. What does the coefficient m represent in the equation Y = mX + c?**

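The coefficient **m** is the slope. It represents the change in the dependent variable **Y** for every one-unit increase in the independent variable **X**. Its sign gives the direction of the relationship (positive or negative) and its magnitude gives the rate of change. The strength of the relationship is measured separately, for example by the correlation coefficient or R².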

---

**4. What does the intercept c represent in the equation Y = mX + c?**

The intercept **c** is the value of **Y** when **X = 0**. It represents the starting point of the line on the Y-axis. It gives context to the equation and may or may not have real-world interpretability depending on the dataset.

---

**5. How do we calculate the slope m in Simple Linear Regression?**

The slope **m** is calculated using:
$m = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$
where **n** is the number of observations. This is the least squares estimate: together with the intercept $c = \bar{Y} - m\bar{X}$, it minimizes the sum of squared differences between actual and predicted values.
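
As a minimal sketch of the formula above, the following computes the slope by hand on a small made-up dataset and cross-checks it against `np.polyfit`. The data values are hypothetical, and the intercept is obtained from the standard least squares result that the fitted line passes through the point of means.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

n = len(x)

# Slope from the closed-form least squares formula.
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x) ** 2)

# Intercept: the least squares line passes through (mean(x), mean(y)).
c = np.mean(y) - m * np.mean(x)

# Cross-check against NumPy's degree-1 polynomial fit (returns slope, then intercept).
m_ref, c_ref = np.polyfit(x, y, 1)

print(f"m = {m:.4f}, c = {c:.4f}")
```

For this data the sums are small enough to verify by hand, which gives a slope of 0.6 and an intercept of 2.2.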

---

**6. What is the purpose of the least squares method in Simple Linear Regression?**

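The least squares method minimizes the sum of the squares of the residuals (errors) between actual and predicted values. This gives the best-fitting line through the data in the sense of smallest total squared error. It does not by itself guarantee accurate predictions, particularly when the model assumptions are violated or outliers are present.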

---

**7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?**

R² measures how well the regression line fits the data. An R² of 1 means a perfect fit, while 0 means the model explains none of the variance and does no better than predicting the mean of Y. It represents the proportion of variance in the dependent variable explained by the independent variable.
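
A short sketch of how R² can be computed from its definition, $R^2 = 1 - SS_{res}/SS_{tot}$. The helper name `r_squared` and the toy data are illustrative assumptions, not part of any library.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit a straight line and score it.
m, c = np.polyfit(x, y, 1)
r2 = r_squared(y, m * x + c)
print(f"R^2 = {r2:.3f}")
```

Predicting the mean of Y for every observation gives an R² of 0 under this definition, and reproducing Y exactly gives 1.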


---

**8. What is Multiple Linear Regression?**

Multiple Linear Regression is a method that models the relationship between one dependent variable and two or more independent variables. The equation is:
$Y = b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n$
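
One way to estimate the coefficients of this equation is ordinary least squares on a design matrix with a leading column of ones. The sketch below uses `np.linalg.lstsq` on synthetic, noise-free data. The "true" coefficients (1.0, 2.0, -3.0) are made up so that the recovered values can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical predictors, 200 observations.
X = rng.normal(size=(200, 2))

# Synthetic target with assumed coefficients b0=1.0, b1=2.0, b2=-3.0 and no noise.
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept b0.
A = np.column_stack([np.ones(len(X)), X])

# Solve the least squares problem min ||A b - y||^2.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef
print(b0, b1, b2)
```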

---

**9. What is the main difference between Simple and Multiple Linear Regression?**

Simple Linear Regression uses one independent variable, while Multiple Linear Regression uses two or more. The latter can model more complex relationships and accounts for more influencing factors.

---

**10. What are the key assumptions of Multiple Linear Regression?**

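Assumptions include: a linear relationship between the predictors and the dependent variable, normally distributed errors, no perfect (or severe) multicollinearity among the predictors, homoscedasticity, and independence of errors. These ensure the coefficient estimates and their significance tests are reliable and interpretable.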

---

**11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?**

Heteroscedasticity means non-constant variance of residuals. It violates a model assumption: the coefficient estimates remain unbiased but are no longer the most efficient, and the usual standard errors are biased, which leads to unreliable confidence intervals and hypothesis tests.

---

**12. How can you improve a Multiple Linear Regression model with high multicollinearity?**

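You can remove or combine highly correlated variables, use dimensionality reduction techniques like PCA, or apply regularization methods such as Ridge or Lasso regression. Regularization does not remove the correlation between predictors, but it stabilizes the coefficient estimates that multicollinearity would otherwise inflate.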
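
To illustrate the regularization option, here is a sketch of the closed-form ridge estimator, $(X^TX + \alpha I)^{-1}X^Ty$, applied to two nearly identical predictors. This is a hand-rolled version for illustration, not scikit-learn's `Ridge`. The data, the helper name `ridge_coefficients`, and the choice `alpha=1.0` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two hypothetical predictors; x2 is almost an exact copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=n)

# Center the data so no intercept column is needed (and none is penalized).
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge_coefficients(Xc, yc, alpha):
    """Solve (X^T X + alpha * I) b = X^T y; alpha=0 reduces to ordinary least squares."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)

ols = ridge_coefficients(Xc, yc, alpha=0.0)
ridge = ridge_coefficients(Xc, yc, alpha=1.0)

print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

The ridge coefficient vector never has a larger norm than the least squares one. With near-duplicate predictors the penalty mainly suppresses the poorly determined difference between the two coefficients.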

---

**13. What are some common techniques for transforming categorical variables for use in regression models?**

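Common techniques include one-hot encoding (one 0/1 column per category), dummy variables (one-hot encoding with one category dropped as the reference level, which avoids perfect collinearity with the intercept), and label encoding. Label encoding assigns an integer to each category, which implies an ordering, so in regression it is generally appropriate only for ordinal variables.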
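
A small sketch of one-hot and dummy encoding using only NumPy. The `colors` array is a hypothetical categorical column. Libraries such as pandas and scikit-learn offer dedicated encoders, which are not shown here.

```python
import numpy as np

# Hypothetical categorical column.
colors = np.array(["red", "green", "blue", "green"])

# np.unique returns the sorted categories and, for each row, the index of its category.
categories, codes = np.unique(colors, return_inverse=True)

# One-hot encoding: one 0/1 column per category.
one_hot = np.eye(len(categories))[codes]

# Dummy encoding: drop the first category so it becomes the reference level.
dummies = one_hot[:, 1:]

print(categories)
print(one_hot)
print(dummies)
```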

---

**14. What is the role of interaction terms in Multiple Linear Regression?**

Interaction terms allow modeling of the combined effect of two or more variables on the dependent variable. They help capture non-additive relationships that a basic linear model may miss.
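
As an illustrative sketch, an interaction term can be added as an extra column holding the product of two predictors. The data below is synthetic and noise-free, and the coefficients (1.0, 2.0, 0.5, 4.0) are assumed so that the fit can be checked.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical predictors.
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)

# Synthetic target where the effect of x1 depends on x2 (assumed coefficients, no noise).
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 4.0 * x1 * x2

# Design matrix: intercept, main effects, and the x1*x2 interaction column.
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)
```

In this model the change in Y per unit of `x1` is `2.0 + 4.0 * x2`, so it varies with `x2` rather than being a single constant.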

---

**15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?**

In Simple Linear Regression, the intercept is the value of Y when X is 0. In Multiple Linear Regression, it’s the predicted Y when all independent variables are zero, which may or may not be meaningful.

---

**16. What is the significance of the slope in regression analysis, and how does it affect predictions?**

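The slope shows the rate of change in the dependent variable for a unit change in the independent variable, so it directly scales how much a prediction moves when the predictor changes. A statistically significant slope means the estimated effect is unlikely to be zero. It does not by itself mean the effect is large, which depends on the slope's magnitude and the units of the variables.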

---

**17. How does the intercept in a regression model provide context for the relationship between variables?**

The intercept anchors the regression line on the Y-axis. It provides a baseline prediction when all predictors are zero and helps contextualize the relationship between variables.

---

**18. What are the limitations of using R² as a sole measure of model performance?**

R² never decreases when predictors are added, so it cannot by itself reveal overfitting, and it is a poor basis for comparing models with different numbers of predictors. It also doesn’t show whether the model is biased. Adjusted R² and residual analysis give better insight.

---

**19. How would you interpret a large standard error for a regression coefficient?**

A large standard error suggests the coefficient estimate is unstable and may not be significantly different from zero, indicating less confidence in the variable’s effect on the outcome.

---

**20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?**

It’s identified by a funnel shape (residual spread that widens or narrows) or another systematic pattern in the residuals vs. fitted values plot. Addressing it is important because it affects the validity of confidence intervals and significance tests.
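
A residual plot is the usual tool. As a crude numeric stand-in for it, the sketch below simulates data whose noise grows with X, fits a line, and compares the residual spread in the lower and upper halves of the fitted values. The simulated data and the halving heuristic are assumptions for illustration, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: the noise standard deviation is proportional to x.
x = np.linspace(1, 10, 500)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)

# Fit a straight line and compute residuals.
m, c = np.polyfit(x, y, 1)
fitted = m * x + c
resid = y - fitted

# Split observations into low and high fitted-value halves.
order = np.argsort(fitted)
half = len(order) // 2
low, high = resid[order[:half]], resid[order[half:]]

# A ratio far from 1 suggests the residual spread changes with the fitted value.
spread_ratio = high.std() / low.std()
print(f"residual spread ratio (high / low): {spread_ratio:.2f}")
```

Plotting `resid` against `fitted` for this data would show the funnel shape directly.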

---

**21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?**

It means the model may be overfitting. High R² can occur by adding variables, but adjusted R² penalizes irrelevant features, showing the true explanatory power of the model.
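
The penalty can be seen in the standard formula $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where n is the number of observations and p the number of predictors. A minimal sketch follows. The function name and the example numbers are illustrative.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof

# Same hypothetical R^2 of 0.90 on 30 observations, with few vs. many predictors.
few = adjusted_r_squared(0.90, n_samples=30, n_predictors=2)
many = adjusted_r_squared(0.90, n_samples=30, n_predictors=20)

print(f"2 predictors:  {few:.3f}")
print(f"20 predictors: {many:.3f}")
```

With the same R², the model using 20 predictors receives a noticeably lower adjusted R² than the one using 2.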

---

**22. Why is it important to scale variables in Multiple Linear Regression?**

Scaling helps when predictors are on different scales. It improves convergence for gradient-based optimizers and makes coefficients comparable, especially when regularization is used.
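
A minimal sketch of standardization (zero mean, unit variance per column) with NumPy. The feature values are hypothetical. scikit-learn's `StandardScaler` performs the same transformation.

```python
import numpy as np

# Hypothetical predictors on very different scales: floor area and number of rooms.
X = np.array([
    [1200.0, 2.0],
    [1500.0, 3.0],
    [900.0, 1.0],
    [2000.0, 4.0],
])

# Column-wise statistics, computed on the training data only.
mean = X.mean(axis=0)
std = X.std(axis=0)

# Standardize: each column now has mean 0 and standard deviation 1.
X_scaled = (X - mean) / std
print(X_scaled)

# New data must be transformed with the same training mean and std.
x_new = np.array([[1600.0, 3.0]])
x_new_scaled = (x_new - mean) / std
```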

---

**23. What is polynomial regression?**

Polynomial regression models nonlinear relationships by adding powers of the independent variable to the model (e.g., X², X³). It fits curved lines rather than straight ones.

---

**24. How does polynomial regression differ from linear regression?**

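Polynomial regression includes higher-degree terms of the predictors, allowing it to model non-linear trends, while simple linear regression only models straight-line relationships. Polynomial regression is still linear in its coefficients, so it can be fitted with the same least squares machinery.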

---

**25. When is polynomial regression used?**

It’s used when the data shows a nonlinear trend that a straight line can’t capture, such as a parabola or U-shape. It’s common in physics, economics, and forecasting.

---

**26. What is the general equation for polynomial regression?**

The general form is:
$Y = b_0 + b_1X + b_2X^2 + b_3X^3 + \dots + b_nX^n$
where **n** is the degree of the polynomial.

---

**27. Can polynomial regression be applied to multiple variables?**

Yes, polynomial regression can be extended to multiple variables by including interaction and higher-order terms of each variable, but it increases model complexity and risk of overfitting.

---

**28. What are the limitations of polynomial regression?**

It’s prone to overfitting, especially with high-degree terms, sensitive to outliers, and less interpretable than linear models. Also, it may not generalize well outside the data range.

---

**29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?**

Use cross-validation, adjusted R², AIC/BIC, and residual plots. These help assess whether added complexity improves the model or just causes overfitting.
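
As a sketch of the cross-validation approach, the code below scores polynomial degrees 1 to 5 with a hand-written 5-fold split, using `np.polyfit` and `np.polyval`. The quadratic data-generating process, noise level, and fold count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data from a quadratic trend plus noise.
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(scale=0.5, size=x.size)

# Shuffle the indices once and split them into 5 folds.
folds = np.array_split(rng.permutation(len(x)), 5)

def cv_mse(x, y, degree, folds):
    """Mean held-out squared error of a degree-`degree` polynomial across the folds."""
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[test_idx] = False
        coefs = np.polyfit(x[train_mask], y[train_mask], degree)
        pred = np.polyval(coefs, x[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

scores = {d: cv_mse(x, y, d, folds) for d in range(1, 6)}
best_degree = min(scores, key=scores.get)

for d, s in scores.items():
    print(f"degree {d}: CV MSE = {s:.3f}")
print("selected degree:", best_degree)
```

The straight-line fit should score far worse than degree 2 here. Differences among the higher degrees are typically small, which is why a simpler degree is often preferred when scores are close.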

---

**30. Why is visualization important in polynomial regression?**

Visualization helps to understand the model fit, detect overfitting or underfitting, identify patterns and residual trends, and communicate findings effectively to stakeholders.

---

**31. How is polynomial regression implemented in Python?**

Use `PolynomialFeatures` from `sklearn.preprocessing` to transform the input data, then fit `LinearRegression` (from `sklearn.linear_model`) or another regression model on the transformed data.
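
A minimal sketch of that workflow, chaining the two steps with `make_pipeline`. The noise-free quadratic data and the choice of `degree=2` are assumptions made so the result is easy to verify.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data following y = 1 + 2x + 3x^2.
X = np.linspace(-2, 2, 30).reshape(-1, 1)
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 0] ** 2

# Expand X into [x, x^2], then fit an ordinary linear regression on the expanded features.
# include_bias=False because LinearRegression already fits its own intercept.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)

# Predict at a new point.
pred = model.predict(np.array([[3.0]]))
print(pred)

# Inspect the fitted coefficients for x and x^2, and the intercept.
linreg = model.named_steps["linearregression"]
print(linreg.coef_, linreg.intercept_)
```

Wrapping both steps in a pipeline keeps the feature expansion and the regression together, so new inputs passed to `predict` are transformed the same way as the training data.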

---



