# **Theoretical Questions**


1. What is Simple Linear Regression?
* Simple Linear Regression models the relationship between two variables by fitting a straight line (Y = mX + c) that best predicts the dependent variable Y from the independent variable X.

2. What are the key assumptions of Simple Linear Regression?
* The key assumptions are linearity, independence of observations, homoscedasticity (constant variance of residuals), and normality of residuals. Multicollinearity is not a concern here because the model has only one predictor.

3. What does the coefficient m represent in the equation Y = mX + c?
* The coefficient m represents the slope of the regression line, indicating the expected change in Y for a one-unit increase in X.

4. What does the intercept c represent in the equation Y = mX + c?
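* The intercept c is the predicted value of Y when X equals zero, that is, the point where the regression line crosses the Y-axis. It serves as a baseline level of the dependent variable, but is only directly interpretable when X = 0 lies within the observed range.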

5. How do we calculate the slope m in Simple Linear Regression?
* The slope m is calculated using the least squares formula m = Σ[(Xi – X̄)(Yi – Ȳ)] / Σ[(Xi – X̄)²], which minimizes the sum of squared residuals.
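
The formula above can be sketched directly in NumPy. This is a minimal illustration rather than a reference implementation: the function name `fit_simple_linear_regression` and the sample data are assumptions made for the example, and the intercept is obtained from the standard companion formula c = Ȳ – m·X̄.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (m, c) for the least squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared X deviations
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    c = y_mean - m * x_mean
    return m, c

# Illustrative data lying exactly on y = 2x + 1
m, c = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)
```

Because the sample points lie exactly on a line, the recovered slope and intercept match the generating values (2 and 1) up to floating-point rounding.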

6. What is the purpose of the least squares method in Simple Linear Regression?
* The least squares method finds the line parameters (m and c) that minimize the sum of squared differences between observed and predicted Y values, ensuring the best fit.

7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?
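* R² measures the proportion of variance in Y explained by X. For a least squares fit with an intercept it ranges from 0 to 1 on the training data; a higher R² indicates a closer fit to that data, though not necessarily better predictions on new data.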
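
As a small sketch of the definition, R² can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper name `r_squared` and the toy values are assumptions for illustration only.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_obs = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(y_obs, y_obs)        # predictions equal the observations
mean_only = r_squared(y_obs, [6.0] * 4)  # always predicting the mean of y_obs
print(perfect, mean_only)
```

The two cases mark the ends of the usual range: perfect predictions give 1, and a model that only predicts the mean gives 0.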

8. What is Multiple Linear Regression?
* Multiple Linear Regression models the relationship between one dependent variable and two or more independent variables using a linear equation: Y = b0 + b1X1 + b2X2 + ... + bnXn.

9. What is the main difference between Simple and Multiple Linear Regression?
* Simple Linear Regression involves one predictor variable, while Multiple Linear Regression includes multiple predictor variables to explain variance in the dependent variable.

10. What are the key assumptions of Multiple Linear Regression?
* The assumptions are linearity, independence of errors, homoscedasticity, normality of residuals, and low multicollinearity among predictors.

11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?
* Heteroscedasticity occurs when residuals have non-constant variance. The OLS coefficient estimates remain unbiased but become inefficient, and the usual standard errors are invalid, which can distort hypothesis tests and confidence intervals.

12. How can you improve a Multiple Linear Regression model with high multicollinearity?
* To address multicollinearity, you can remove or combine correlated predictors, apply regularization methods like Ridge or Lasso, or use dimensionality reduction (e.g., PCA).
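
One of the remedies listed above, Ridge regularization, can be sketched with its closed-form solution in NumPy. The synthetic data, the penalty value `alpha=1.0`, and the helper name `ridge_coefficients` are all assumptions chosen for the illustration; in practice a library implementation with a tuned penalty would normally be used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly a copy of x1 (high multicollinearity)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=n)

# Center predictors and response so the intercept can be left out
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution (X'X + alpha*I)^-1 X'y; alpha=0 gives OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

beta_ols = ridge_coefficients(Xc, yc, alpha=0.0)
beta_ridge = ridge_coefficients(Xc, yc, alpha=1.0)
print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

With nearly duplicated predictors, the unpenalized coefficients can be large and unstable; the ridge penalty shrinks the coefficient vector toward zero, trading a little bias for lower variance.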

13. What are some common techniques for transforming categorical variables for use in regression models?
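* Techniques include one-hot (dummy) encoding, label (ordinal) encoding, and target/frequency encoding, chosen based on the model and the cardinality of the feature. In linear models, label encoding imposes an order and equal spacing on the categories, so it suits only genuinely ordinal features.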
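
A minimal sketch of one-hot encoding with pandas follows. The column names and values are made up for the example; `drop_first=True` drops one category per feature so the dummy columns are not perfectly collinear with the intercept.

```python
import pandas as pd

# Hypothetical data: one numeric and one categorical predictor
df = pd.DataFrame({
    "size": [50, 80, 65, 90],
    "city": ["Berlin", "Paris", "Tokyo", "Paris"],
})

# One dummy column per category, minus the first ("Berlin" becomes the baseline)
encoded = pd.get_dummies(df, columns=["city"], drop_first=True)
print(encoded)
```

Each remaining dummy coefficient is then read as the difference from the baseline category.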

14. What is the role of interaction terms in Multiple Linear Regression?
* Interaction terms model the combined effect of two predictors on the response, capturing non-additive relationships when the effect of one variable depends on another.
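
To make this concrete, the sketch below adds a product column `x1 * x2` to the design matrix and fits it by least squares. The data are synthetic and noise-free, generated from assumed coefficients (1, 2, 3, 0.5), so the fit should recover them.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 5, size=50)
x2 = rng.uniform(0, 5, size=50)

# Assumed true model: the effect of x1 on y depends on the level of x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 0.5 * x1 * x2

# Design matrix: intercept, main effects, and the interaction column
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # order: intercept, b1, b2, b_interaction
```

With the interaction included, the slope of Y with respect to x1 is b1 + b_interaction · x2 rather than a single constant.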

15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?
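* In Simple Linear Regression, the intercept is the expected Y when the single predictor X is zero. In Multiple Linear Regression, it is the expected Y when all predictors are zero simultaneously, which may not be meaningful if that combination lies outside the observed range.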

16. What is the significance of the slope in regression analysis, and how does it affect predictions?
* The slope quantifies the rate of change in the dependent variable for a unit change in the predictor, directly influencing predictions.

17. How does the intercept in a regression model provide context for the relationship between variables?
* The intercept sets the baseline from which the effect of predictors is measured, anchoring the regression line in the data space.

18. What are the limitations of using R² as a sole measure of model performance?
* R² does not penalize model complexity and never decreases when predictors are added, even irrelevant ones, so it cannot detect overfitting on its own; adjusted R², AIC, and BIC account for complexity.
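
The complexity penalty in adjusted R² can be sketched with its standard formula, 1 – (1 – R²)(n – 1)/(n – p – 1), where n is the number of observations and p the number of predictors. The function name and the numbers below are illustrative assumptions.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same raw R^2 and sample size, different numbers of predictors
adj_few = adjusted_r_squared(0.90, n_samples=20, n_predictors=2)
adj_many = adjusted_r_squared(0.90, n_samples=20, n_predictors=10)
print(adj_few, adj_many)
```

For the same raw R², the model with more predictors receives the lower adjusted value, which is the behaviour plain R² lacks.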

19. How would you interpret a large standard error for a regression coefficient?
* A large standard error, relative to the size of the coefficient, indicates high uncertainty in the estimate, so the coefficient may not be statistically distinguishable from zero. Common causes are small samples, noisy data, and multicollinearity.

20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?
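* Heteroscedasticity appears as a funnel or fan shape in residual vs. fitted plots, with the spread of residuals widening or narrowing as fitted values change; addressing it is crucial because it violates the constant-variance assumption and makes standard errors, confidence intervals, and p-values unreliable.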
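
As a rough numerical stand-in for reading the plot, the sketch below fits a line to synthetic data whose noise grows with X, then compares the residual spread in the lower and upper halves of the fitted values. This is a crude check built on assumed data, not a formal test; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(1.0, 10.0, 200)
# Assumed data-generating process: noise standard deviation grows with x
y = 2.0 * x + 1.0 + rng.normal(scale=0.5 * x)

# Fit a straight line and compute residuals
m, c = np.polyfit(x, y, deg=1)
fitted = m * x + c
residuals = y - fitted

# Compare residual spread for low vs. high fitted values
order = np.argsort(fitted)
half = len(order) // 2
spread_low = residuals[order[:half]].std()
spread_high = residuals[order[half:]].std()
print(spread_low, spread_high)
```

A clearly larger spread in one half corresponds to the funnel shape seen in a residual vs. fitted plot; plotting `fitted` against `residuals` with a scatter plot would show the same pattern visually.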

21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
* It suggests that added predictors did not significantly improve model fit after accounting for the number of variables, indicating possible overfitting.

22. Why is it important to scale variables in Multiple Linear Regression?
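* In ordinary least squares, scaling does not change the fit or the predictions, but it puts coefficients on a comparable footing and improves numerical stability. It becomes essential with regularization (Ridge, Lasso), where the penalty depends on coefficient magnitude and therefore on each predictor's units.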
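
The sketch below illustrates the effect of standardizing predictors in a plain least squares fit: the coefficients change scale, while the predictions do not. The data and the helper `ols_fit_predict` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Two predictors on very different scales (illustrative values)
X = np.column_stack([rng.normal(50, 10, n), rng.normal(0.5, 0.1, n)])
y = 0.2 * X[:, 0] + 30.0 * X[:, 1] + rng.normal(size=n)

def ols_fit_predict(X, y):
    """Fit OLS with an intercept; return coefficients and in-sample predictions."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

coef_raw, pred_raw = ols_fit_predict(X, y)

# Standardize each predictor to mean 0 and standard deviation 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
coef_std, pred_std = ols_fit_predict(X_std, y)

print("raw slopes:         ", coef_raw[1:])
print("standardized slopes:", coef_std[1:])
```

After standardization each slope is expressed per standard deviation of its predictor, which makes magnitudes comparable across predictors.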

23. What is polynomial regression?
* Polynomial regression models nonlinear relationships by adding polynomial terms of predictors to a linear model.

24. How does polynomial regression differ from linear regression?
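* Polynomial regression captures curved trends by including higher-degree terms of the predictor, whereas a plain linear regression on X fits a straight line. The model is still linear in its coefficients, so it is fitted with the same least squares machinery.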

25. When is polynomial regression used?
* It is used when data exhibit nonlinear patterns that cannot be captured by a straight line.

26. What is the general equation for polynomial regression?
* Y = b0 + b1X + b2X² + ... + bdX^d, where d is the polynomial degree.

27. Can polynomial regression be applied to multiple variables?
* Yes, by including polynomial terms for each predictor and their interactions.
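
For two predictors and degree 2, the expanded feature set contains each predictor, each square, and the cross term. A short sketch using scikit-learn's `PolynomialFeatures` on a single made-up row shows the resulting columns.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One illustrative observation with x1 = 2 and x2 = 3
X = np.array([[2.0, 3.0]])

# Degree-2 expansion without the constant column
poly = PolynomialFeatures(degree=2, include_bias=False)
expanded = poly.fit_transform(X)
print(expanded)  # columns: x1, x2, x1^2, x1*x2, x2^2
```

The number of generated columns grows quickly with the degree and the number of predictors, which is one reason overfitting becomes a concern.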

28. What are the limitations of polynomial regression?
* It can overfit with high-degree terms, is sensitive to outliers, and may exhibit oscillations at data extremes.

29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?
* Cross-validation, adjusted R², AIC, BIC, and evaluating residual plots to balance bias and variance.
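
Of the methods listed, cross-validation can be sketched as follows with scikit-learn. The data are synthetic (an assumed quadratic relationship plus noise), and the candidate degrees and fold settings are arbitrary choices for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(size=100)  # assumed quadratic signal
X = x.reshape(-1, 1)

# Shuffled folds so each fold covers the whole range of x
cv = KFold(n_splits=5, shuffle=True, random_state=0)
degrees = [1, 2, 3, 4, 5]
cv_mse = {}
for d in degrees:
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[d] = -scores.mean()  # flip sign: scikit-learn reports negative MSE

best_degree = min(cv_mse, key=cv_mse.get)
print(cv_mse, best_degree)
```

The straight-line model (degree 1) underfits the curved signal and shows a much larger cross-validated error than the higher degrees; among degrees that fit comparably well, the simplest is usually preferred.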

30. Why is visualization important in polynomial regression?
* Visualization helps assess fit, identify overfitting, and understand the shape of relationships.

31. How is polynomial regression implemented in Python?
* Use `PolynomialFeatures` from `sklearn.preprocessing` to expand X, then fit a `LinearRegression` model on the transformed features.
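
A minimal sketch of that workflow is shown below, chaining the two steps with `make_pipeline`. The data are noise-free and generated from an assumed quadratic, y = 1 + 2x + 3x², so the fitted model should reproduce it closely.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data from y = 1 + 2x + 3x^2
X = np.arange(-5, 6, dtype=float).reshape(-1, 1)
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 0] ** 2

# Expand X to [x, x^2], then fit ordinary least squares on the expanded features
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)

prediction = model.predict(np.array([[3.0]]))
print(prediction)
```

Using a pipeline keeps the feature expansion and the regression together, so new inputs passed to `predict` are transformed in the same way as the training data.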
