

1. **What is Simple Linear Regression?**
   A regression method that models the relationship between one independent variable (**X**) and one dependent variable (**Y**) using a straight line.

2. **Key assumptions of Simple Linear Regression:**

   * Linearity between X and Y
   * Independence of residuals
   * Homoscedasticity (constant variance of residuals)
   * Normality of residuals

3. **What does the coefficient `m` represent in `Y = mX + c`?**
   It is the **slope** of the line: the expected change in Y for a one-unit increase in X.

4. **What does the intercept `c` represent in `Y = mX + c`?**
   The predicted value of Y when X = 0, i.e. the point where the line crosses the Y-axis.

5. **How do we calculate the slope `m` in Simple Linear Regression?**

   $$
   m = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{\sum{(X_i - \bar{X})^2}}
   $$
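
As a minimal sketch, the slope formula above can be evaluated directly with NumPy. The data values and variable names are made up for illustration, and the intercept line uses the standard least-squares identity c = Ȳ - m·X̄, which the formula above does not state.

```python
import numpy as np

# Illustrative data (made up): y is roughly 2x + 1 with small deviations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

x_mean, y_mean = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by the sum of squared X deviations
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the fitted line passes through the point (x_mean, y_mean)
c = y_mean - m * x_mean

print(f"slope={m:.3f}, intercept={c:.3f}")
```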

6. **Purpose of the least squares method in SLR:**
   To minimize the **sum of squared residuals** (errors) between observed and predicted Y values.

7. **How is R² interpreted in SLR?**
   Proportion of variance in Y explained by X.
   R² = 0.85 → 85% of variation in Y is explained by X.
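
One way to make this interpretation concrete is to compute R² from its usual definition, 1 minus the ratio of residual to total sum of squares. The helper name `r_squared` is hypothetical, chosen for this sketch.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation left unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation in Y
    return 1.0 - ss_res / ss_tot

# A perfect fit explains all variation; predicting the mean explains none
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```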

---



8. **What is Multiple Linear Regression?**
   A regression model where the dependent variable is predicted using **two or more** independent variables.

9. **Main difference between SLR and MLR:**
   SLR uses **one** predictor; MLR uses **multiple** predictors.

10. **Key assumptions of MLR:**

    * Linearity
    * Independence
    * Homoscedasticity
    * No multicollinearity
    * Normal distribution of errors

11. **What is heteroscedasticity?**
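    Unequal spread (non-constant variance) of residuals across the range of predictors or fitted values. It violates the homoscedasticity assumption: coefficient estimates remain unbiased, but standard errors, and therefore confidence intervals and p-values, become unreliable.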

12. **How to improve MLR with high multicollinearity:**

    * Remove/reduce correlated features
    * Use **Ridge/Lasso regression**
    * Use **PCA** or **feature selection**
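
To illustrate the Ridge option, here is a rough sketch of the closed-form ridge solution on centered data, compared with ordinary least squares on two nearly collinear predictors. The function name `ridge_fit`, the synthetic data, and the penalty `alpha=1.0` are all assumptions made for this example; a library implementation may differ in details such as scaling.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients on centered data (intercept not penalized)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) beta = Xc^T yc
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)  # almost perfectly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=100)

beta_ols = ridge_fit(X, y, alpha=0.0)    # alpha = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, alpha=1.0)  # the penalty shrinks the coefficients
print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

The OLS coefficients on `x1` and `x2` tend to be large and unstable because the data barely distinguish the two predictors, while the ridge coefficients are pulled toward a more balanced split of the shared effect.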

13. **Common techniques for transforming categorical variables:**

    * **One-hot encoding**
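    * **Label encoding** (arbitrary integer codes; a linear model reads them as an ordering, so this suits targets or tree-based models better than nominal predictors)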
    * **Ordinal encoding** (if categories have order)
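
The difference between integer codes and one-hot columns can be sketched with plain NumPy. The `colors` column is hypothetical, and the codes here simply follow alphabetical order.

```python
import numpy as np

# Hypothetical categorical column
colors = np.array(["red", "green", "blue", "green", "red"])

# np.unique returns the sorted categories and an integer code for each row
categories, codes = np.unique(colors, return_inverse=True)

# One-hot encoding: one indicator column per category
one_hot = np.eye(len(categories), dtype=int)[codes]

print(categories)
print(codes)
print(one_hot)
```

When the regression includes an intercept, one of the indicator columns is usually dropped, because the full set sums to 1 in every row and is perfectly collinear with the intercept.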

14. **Role of interaction terms in MLR:**
    To capture combined effects of variables (e.g., X1 \* X2), where the effect of one predictor on the outcome depends on the level of another.
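
A small sketch of an interaction term in practice: the product column is added to the design matrix and fitted like any other predictor. The data are synthetic, with a true interaction coefficient of 1.5 chosen arbitrarily for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
# Synthetic outcome whose x1 effect depends on x2 (interaction coefficient 1.5)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x1 * x2 + rng.normal(scale=0.1, size=500)

# Design matrix: intercept, main effects, and the product term
design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)  # order: intercept, x1, x2, x1*x2
```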

15. **Interpretation of intercept (SLR vs MLR):**

    * SLR: Y when X = 0
    * MLR: Y when **all X variables = 0** (may not be meaningful if unrealistic)

16. **Significance of slope:**
    It quantifies the relationship: **how much Y changes** per one-unit increase in X (holding the other predictors constant in MLR).

17. **How intercept provides context:**
    It anchors the line; helps define the baseline value of Y when all predictors are zero.

18. **Limitations of R²:**

    * Doesn’t indicate causation
    * Never decreases when variables are added (even irrelevant ones)
    * Doesn’t assess model generalization
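
The second limitation can be checked numerically. In this sketch (synthetic data, hypothetical helper `ols_r2`), five pure-noise predictors are appended to a one-predictor model, and R² does not go down.

```python
import numpy as np

def ols_r2(X, y):
    """R^2 of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
y = 2.0 * x[:, 0] + rng.normal(size=50)
noise = rng.normal(size=(50, 5))  # predictors unrelated to y

r2_base = ols_r2(x, y)
r2_padded = ols_r2(np.column_stack([x, noise]), y)
print(r2_base, r2_padded)
```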

19. **Interpret large standard error for a coefficient:**
    The estimate is unstable or imprecise, possibly due to **multicollinearity** or **insufficient data**.

20. **Identify heteroscedasticity in residual plots:**
    Look for **funnel shapes**, where the residuals fan out (or narrow) as the fitted values increase. It is important to address because it leads to **biased standard errors** and therefore unreliable inference.
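
As a crude numeric stand-in for eyeballing the plot, the residual spread can be compared between the lower and upper halves of the predictor range. This is only a sketch on synthetic data whose noise grows with x; formal tests such as Breusch-Pagan are the more rigorous route.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 400)
# Synthetic data whose noise standard deviation grows with x
y = 2.0 * x + rng.normal(scale=0.5 * x)

# Fit a straight line (np.polyfit returns the highest-degree coefficient first)
m, c = np.polyfit(x, y, deg=1)
residuals = y - (m * x + c)

# Compare residual spread in the lower and upper halves of x
low, high = residuals[:200], residuals[200:]
print(low.std(), high.std())
```

A clearly larger spread in one half is the numeric counterpart of the funnel shape.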

21. **High R² but low adjusted R²:**
    The model may be **overfitting** with too many non-informative predictors.
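
For reference, a short sketch of the usual adjusted R² formula, which penalizes the number of predictors p relative to the sample size n. The function name and the example numbers are illustrative.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same raw R^2, but more predictors give a lower adjusted value
few = adjusted_r2(0.90, n_samples=30, n_predictors=2)
many = adjusted_r2(0.90, n_samples=30, n_predictors=20)
print(few, many)
```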

22. **Why scale variables in MLR:**

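    * Helps with **interpretability**: coefficient magnitudes become comparable across predictors
    * Necessary for **regularization** (e.g., Ridge, Lasso), where the penalty depends on coefficient size
    * Keeps penalties and gradient-based optimizers from being dominated by variables with larger ranges (plain OLS predictions are unaffected by scaling)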
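
A minimal sketch of standardization (z-scoring) with NumPy; the two columns are hypothetical predictors on very different scales.

```python
import numpy as np

# Hypothetical predictors on very different scales (e.g. income and age)
X = np.array([[50_000.0, 25.0],
              [82_000.0, 41.0],
              [61_000.0, 33.0],
              [120_000.0, 58.0]])

# Z-score each column: subtract the column mean, divide by the column std
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

In practice the mean and standard deviation are computed on the training split only and then reused to transform validation and test data.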

---



23. **What is Polynomial Regression?**
    Regression that models a nonlinear relationship by introducing polynomial terms (e.g., X², X³).

24. **Difference from linear regression:**
    Linear regression fits a straight line; polynomial regression fits a **curve**, although the model is still linear in its coefficients.

25. **When to use polynomial regression:**
    When the data shows a **non-linear trend** that can't be captured by a straight line.

26. **General equation for polynomial regression:**

    $$
    Y = b_0 + b_1X + b_2X^2 + b_3X^3 + \dots + b_nX^n
    $$

27. **Can it be applied to multiple variables?**
    Yes, this becomes **Multivariate Polynomial Regression** (e.g., terms like $X_1^2, X_1X_2$).
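
To show which terms appear with more than one variable, the sketch below builds every monomial up to a given degree by hand. The helper `poly_terms` is hypothetical and written only for illustration; it is not a library function.

```python
from itertools import combinations_with_replacement

import numpy as np

def poly_terms(X, degree):
    """All monomials of the columns of X up to `degree`, including the constant."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    columns = [np.ones(n)]
    for d in range(1, degree + 1):
        # Each combination of column indices (with repeats) is one monomial
        for combo in combinations_with_replacement(range(p), d):
            columns.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(columns)

# One sample with x1 = 2, x2 = 3; columns are 1, x1, x2, x1^2, x1*x2, x2^2
X = np.array([[2.0, 3.0]])
print(poly_terms(X, degree=2))
```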

28. **Limitations of polynomial regression:**

    * Overfitting (especially at high degrees)
    * Poor extrapolation
    * Sensitive to outliers

29. **Model fit methods for selecting polynomial degree:**

    * **Adjusted R²**
    * **Cross-validation (CV)**
    * **AIC/BIC (information criteria)**
    * **Residual plots**
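
The cross-validation option can be sketched with NumPy alone. The helper `cv_mse`, the fold count, and the synthetic quadratic data are assumptions made for this example; the idea is to pick the degree with the lowest held-out error rather than the best training fit.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean squared error of a polynomial of the given degree under k-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)  # every index not in the held-out fold
        coefs = np.polyfit(x[train], y[train], deg=degree)
        pred = np.polyval(coefs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=200)
# Synthetic data generated from a quadratic plus noise
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.5, size=200)

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

Degree 1 underfits and shows a much higher held-out error, while degrees above 2 typically give little or no improvement on this data.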

30. **Why visualization is important in polynomial regression:**
    To verify **fit quality**, detect **overfitting**, and understand **nonlinear trends**.

31. **How to implement polynomial regression in Python:**

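```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic cubic data for illustration only; replace with your own X and y
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree-3 polynomial features followed by ordinary least squares
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```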

---


