# Regression Theory Questions

1. What is Simple Linear Regression?
  - Simple Linear Regression is a statistical technique used to model the relationship between one independent variable and one dependent variable. It assumes this relationship can be represented by a straight line. The method helps in predicting the value of the dependent variable based on the independent variable.

2. What are the key assumptions of Simple Linear Regression?
  - Simple Linear Regression assumes that the relationship between variables is linear, the errors are independent, and the variance of errors is constant. It also assumes that the residuals are normally distributed. Linearity and independence matter for the estimates themselves, while constant variance and normality matter mainly for standard errors, confidence intervals, and hypothesis tests.

3. What does the coefficient m represent in Y = mX + c?
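  - The coefficient m represents the slope of the regression line. It indicates how much the dependent variable Y changes, on average, when the independent variable X increases by one unit. Its sign shows the direction of the relationship. Its magnitude depends on the units of X and Y, so the strength of the association is better judged by the correlation coefficient or R².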

4. What does the intercept c represent in Y = mX + c?
  - The intercept c represents the value of Y when X is zero. It shows where the regression line crosses the Y-axis. This provides a starting point for the relationship between the variables.

5. How do we calculate the slope m in Simple Linear Regression?
  - The slope m is the covariance between X and Y divided by the variance of X: m = Cov(X, Y) / Var(X). It measures how changes in X are associated with changes in Y. Together with the intercept c = mean(Y) - m × mean(X), it defines the best-fitting line through the data points.
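
The covariance-over-variance formula can be checked on a small example. This is a minimal sketch on hypothetical toy data chosen to lie exactly on Y = 2X + 1; the variable names are illustrative.

```python
import numpy as np

# Hypothetical toy data that lies exactly on Y = 2X + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Slope: sample covariance of X and Y divided by sample variance of X
m = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Intercept: chosen so the line passes through the point of means
c = y.mean() - m * x.mean()

print(m, c)
```

Because the data are noise-free, the recovered slope and intercept match the generating line (2 and 1) up to floating-point rounding.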

6. What is the purpose of the least squares method?
  - The least squares method is used to find the best-fitting regression line by minimizing the sum of squared differences between observed and predicted values. It ensures that the regression line is as close as possible to the data points.  

7. How is R² interpreted?
  - R², also called the coefficient of determination, measures how much of the variation in the dependent variable is explained by the independent variable. A higher R² value means the model explains more of the data variation.
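
One common way to compute R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper below is an illustrative sketch (the function name `r_squared` is an assumption, not a library call).

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_obs = np.array([2.0, 4.0, 6.0, 8.0])
perfect = r_squared(y_obs, y_obs)                       # predictions match exactly
baseline = r_squared(y_obs, np.full(4, y_obs.mean()))   # always predict the mean
print(perfect, baseline)
```

A model that predicts every point exactly scores 1, while a model that always predicts the mean of Y scores 0.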

8. What is Multiple Linear Regression?
  - Multiple Linear Regression is a technique used to model the relationship between one dependent variable and two or more independent variables. It helps analyze how several factors together influence the outcome.

9. What is the main difference between Simple and Multiple Linear Regression?
  - Simple Linear Regression uses only one independent variable to predict the dependent variable, while Multiple Linear Regression uses two or more independent variables. This makes Multiple Regression more suitable for complex real-world problems.

10. What are the key assumptions of Multiple Linear Regression?
  - Multiple Linear Regression assumes a linear relationship between the predictors and the dependent variable, no perfect multicollinearity among predictors (and ideally little strong correlation between them), constant variance of errors, independence of observations, and normally distributed residuals. These assumptions support unbiased coefficient estimates and valid standard errors.

11. What is heteroscedasticity?
  - Heteroscedasticity occurs when the variability of errors is not constant across all levels of the independent variables. The ordinary least squares coefficient estimates remain unbiased but become inefficient, and the usual standard errors are biased, which makes hypothesis tests and confidence intervals unreliable.

12. How can multicollinearity be reduced?
  - Multicollinearity can be reduced by removing or combining highly correlated independent variables. Using techniques such as ridge regression or principal component analysis can also help improve the model.
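
Before removing or combining predictors, it helps to see which ones are highly correlated. One widely used diagnostic, not named in the answer above, is the variance inflation factor (VIF). The sketch below computes it with plain NumPy on hypothetical simulated data; the `vif` helper is illustrative, not a library function.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)                    # unrelated predictor

v_before = vif(np.column_stack([x1, x2, x3]))
v_after = vif(np.column_stack([x1, x3]))     # after dropping the redundant x2
print(v_before, v_after)
```

In this simulation the two near-duplicate predictors get very large VIFs, and dropping one of them brings the remaining values close to 1.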

13. How are categorical variables used in regression?
  - Categorical variables are transformed into numerical form using techniques like dummy coding or one-hot encoding. This allows regression models to include non-numeric data.  
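
As a rough illustration of both encodings, the sketch below builds the indicator columns by hand with NumPy on a hypothetical `colors` variable. Libraries such as pandas and scikit-learn provide ready-made encoders; this version only shows what they produce.

```python
import numpy as np

colors = np.array(["red", "green", "blue", "green", "red"])

# Sorted unique categories and the index of each observation's category
categories, codes = np.unique(colors, return_inverse=True)

# One-hot encoding: one 0/1 column per category
one_hot = np.eye(len(categories), dtype=int)[codes]

# Dummy coding: drop the first category (the reference level) so the
# columns are not perfectly collinear with the intercept
dummies = one_hot[:, 1:]

print(categories)
print(one_hot)
print(dummies)
```

With dummy coding, each remaining coefficient is interpreted relative to the dropped reference category.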

14. What is the role of interaction terms?
  - Interaction terms allow the effect of one independent variable to depend on another variable. They help capture more complex relationships between predictors and the dependent variable.
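
An interaction term is usually added as the product of two predictors. The following sketch uses hypothetical noise-free data generated from Y = 1 + 2·x1 + 3·x2 + 4·x1·x2 and recovers the coefficients by least squares.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)

# Hypothetical data-generating process with an interaction effect
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Design matrix: intercept, main effects, and the product (interaction) column
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

Here the effect of a one-unit increase in x1 is 2 + 4·x2, so it depends on the value of x2.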

15. How does the interpretation of the intercept differ between Simple and Multiple Linear Regression?
  - In Simple Linear Regression, the intercept is the value of Y when X is zero. In Multiple Linear Regression, it represents the value of Y when all independent variables are zero, which may or may not be meaningful.

16. What is the significance of the slope?
  - The slope indicates the rate at which the dependent variable changes with the independent variable. It plays a key role in making predictions and understanding the relationship between variables.    

17. How does the intercept provide context?
  - The intercept provides a baseline level of the dependent variable when all predictors are zero. It helps in understanding where the regression line starts.

18. What are the limitations of R²?
  - R² does not indicate whether the model is correct or whether important variables are missing. It never decreases when more variables are added, even if they do not improve the model, so it cannot be used on its own to compare models with different numbers of predictors.

19. How do you interpret a large standard error?
  - A large standard error means the estimated coefficient is not very precise. This indicates uncertainty and reduces confidence in the reliability of the coefficient.
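
For simple linear regression, the standard error of the slope is the square root of the residual variance divided by the sum of squared deviations of X from its mean. The sketch below, on hypothetical simulated data, shows how noisier data lead to a larger standard error for the same underlying line; the helper name is illustrative.

```python
import numpy as np

def slope_and_standard_error(x, y):
    """Fit Y = mX + c by least squares and return (m, standard error of m)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, c = np.polyfit(x, y, 1)
    resid = y - (m * x + c)
    residual_variance = np.sum(resid ** 2) / (len(x) - 2)  # n - 2 degrees of freedom
    se = np.sqrt(residual_variance / np.sum((x - x.mean()) ** 2))
    return m, se

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
noise = rng.normal(size=x.size)

# Hypothetical data: same underlying line, two noise levels
m_low, se_low = slope_and_standard_error(x, 2 * x + 1 + 0.1 * noise)
m_high, se_high = slope_and_standard_error(x, 2 * x + 1 + 5.0 * noise)
print(se_low, se_high)
```

Because the second data set uses the same noise pattern scaled up 50 times, its standard error is 50 times larger.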

20. How is heteroscedasticity identified?
  - Heteroscedasticity can be seen in residual plots as a funnel or uneven spread of errors. It is important to address it because it affects the accuracy of hypothesis tests and confidence intervals.  
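
Alongside a residual plot, a quick numeric check is to compare the spread of residuals in different ranges of X. This is an informal sketch on hypothetical simulated data, not a formal test; formal alternatives such as the Breusch-Pagan test exist in statistics libraries.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 200)

# Hypothetical data whose noise standard deviation grows with x
y = 3 + 2 * x + rng.normal(scale=0.5 * x)

m, c = np.polyfit(x, y, 1)
resid = y - (m * x + c)

# Compare residual spread in the lower and upper halves of x
half = len(x) // 2
spread_low = resid[:half].std()
spread_high = resid[half:].std()
print(spread_low, spread_high)
```

A clearly larger spread in one half corresponds to the funnel shape seen in a residual plot.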

21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
  - If a model has a high R² but a low adjusted R², it means that although the model appears to fit the data well, many of the variables may not actually be useful. Adjusted R² penalizes unnecessary predictors, so a low value indicates overfitting. This suggests that some independent variables do not significantly contribute to explaining the dependent variable.
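
The penalty can be seen in the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The numbers below are hypothetical and only illustrate how the same R² is discounted more heavily as p grows.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

few_predictors = adjusted_r_squared(0.90, n=30, p=2)
many_predictors = adjusted_r_squared(0.90, n=30, p=20)
print(few_predictors, many_predictors)
```

With 30 observations, an R² of 0.90 corresponds to an adjusted R² of roughly 0.89 with 2 predictors but only about 0.68 with 20.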

22. Why is it important to scale variables in Multiple Linear Regression?
  - Scaling is important because variables may have different units and ranges, which makes coefficients hard to compare and can hurt numerical stability. Ordinary least squares gives the same predictions with or without scaling, but scaling helps gradient-based solvers converge faster and is especially important for regularization methods like ridge or lasso regression, whose penalties depend on the size of the coefficients.
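
A common form of scaling is standardization: subtract each column's mean and divide by its standard deviation. The sketch below uses hypothetical income and age columns.

```python
import numpy as np

# Hypothetical predictors on very different scales: income and age
X = np.array([
    [30000.0, 25.0],
    [52000.0, 40.0],
    [75000.0, 31.0],
    [41000.0, 58.0],
])

mean = X.mean(axis=0)
std = X.std(axis=0)

# Each column now has mean 0 and standard deviation 1
X_scaled = (X - mean) / std
print(X_scaled)
```

In practice, the mean and standard deviation are usually computed on the training data only and then reused to transform new data.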

23. What is polynomial regression?
  - Polynomial regression is a type of regression where the relationship between the independent and dependent variables is modeled as a polynomial equation. It is used when data shows a curved or non-linear trend rather than a straight line.  

24. How does polynomial regression differ from linear regression?
  - Linear regression fits a straight line to the data, while polynomial regression fits a curved line. Polynomial regression includes higher-order terms such as X², X³, and so on to better model complex patterns. The model is still linear in its coefficients, so it can be fitted with the same least squares method.

25. When is polynomial regression used?
  - Polynomial regression is used when the relationship between variables is non-linear but still smooth and continuous. It is especially helpful when data trends form curves instead of straight lines.  

26. What is the general equation for polynomial regression?

  - The general equation is $Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + \cdots + b_n X^n$. Here, n represents the degree of the polynomial, and the coefficients determine the shape of the curve.
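
To connect the equation to a fit, the sketch below builds the columns X⁰, X¹, X², X³ explicitly and solves for the coefficients by least squares. The cubic and its coefficients are hypothetical and noise-free.

```python
import numpy as np

x = np.linspace(-2, 2, 21)

# Hypothetical cubic with b0 = 1, b1 = -2, b2 = 0.5, b3 = 3
y = 1 - 2 * x + 0.5 * x**2 + 3 * x**3

# Columns are X^0, X^1, X^2, X^3, matching b0..b3 in the equation
design = np.vander(x, N=4, increasing=True)
b, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(b, 6))
```

The recovered coefficients match the generating values, which shows that fitting a polynomial is an ordinary linear regression on the powers of X.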

27. Can polynomial regression be applied to multiple variables?
  - Yes, polynomial regression can be extended to multiple variables by including polynomial and interaction terms for each variable. This allows the model to capture more complex relationships among predictors.

28. What are the limitations of polynomial regression?
  - Polynomial regression can easily overfit the data, especially with a high-degree polynomial. It is also sensitive to outliers and may perform poorly when predicting outside the range of the data.

29. How can we evaluate model fit when selecting the degree of a polynomial?
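  - Model fit can be evaluated using methods such as R², adjusted R², cross-validation, and mean squared error. Because R² on the training data never decreases as the degree grows, adjusted R² and cross-validated error are the more informative guides to whether the polynomial degree is too low or too high.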
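
One way to compare degrees is k-fold cross-validation of the mean squared error. The sketch below is a minimal NumPy version on hypothetical quadratic data; the `cv_mse` helper and its defaults are illustrative assumptions.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean squared error of a polynomial of the given degree under k-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(x))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)                  # indices not in this fold
        coefs = np.polyfit(x[train], y[train], degree)   # fit on the training part
        pred = np.polyval(coefs, x[fold])                # predict the held-out part
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 60)
# Hypothetical data: a quadratic trend plus noise
y = 1 + 2 * x - 1.5 * x**2 + rng.normal(scale=1.0, size=x.size)

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

A straight line (degree 1) underfits this data and has a much larger cross-validated error than degree 2; higher degrees typically bring little or no further improvement.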

30. Why is visualization important in polynomial regression?
  - Visualization helps in understanding how well the polynomial curve fits the data. It allows us to see overfitting, underfitting, and the overall pattern of the relationship between variables.

31. How is polynomial regression implemented in Python?
  - Polynomial regression in Python is typically implemented using libraries such as NumPy and scikit-learn. The PolynomialFeatures class is used to generate polynomial terms, and then a linear regression model is applied to fit the data.  
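
A minimal sketch of that workflow, assuming scikit-learn is installed and using hypothetical noise-free quadratic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# scikit-learn expects a 2-D feature array, hence the reshape
x = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 2 + 0.5 * x.ravel() - 1.0 * x.ravel() ** 2   # hypothetical quadratic

# Step 1 generates X and X^2; step 2 fits a linear model on those columns
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(x, y)

lin = model.named_steps["linearregression"]
print(lin.intercept_, lin.coef_)

prediction = model.predict(np.array([[1.0]]))
print(prediction)
```

Setting `include_bias=False` leaves the intercept to `LinearRegression`, so the constant term is not duplicated. The degree of 2 is chosen here only because the example data are quadratic.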