
**1. What is Simple Linear Regression?**
A statistical method used to model the relationship between one independent variable (X) and one dependent variable (Y) using a straight line:
**Y = mX + c**

---

**2. What are the key assumptions of Simple Linear Regression?**

* Linearity
* Independence of errors
* Homoscedasticity (constant variance of errors)
* Normality of residuals
* No influential outliers (a practical check rather than a formal assumption)

---

**3. What does the coefficient m represent in the equation Y = mX + c?**
It is the **slope**, indicating how much Y changes for a one-unit increase in X.

---

**4. What does the intercept c represent in the equation Y = mX + c?**
It is the value of Y when X is zero; the point where the line intersects the Y-axis.

---

**5. How do we calculate the slope m in Simple Linear Regression?**

$$
m = \frac{\sum{(X_i - \bar{X})(Y_i - \bar{Y})}}{\sum{(X_i - \bar{X})^2}}
$$
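
As a rough illustration, the formula can be applied directly in plain Python. The helper name `fit_line` and the four data points are made up for this sketch; the intercept uses the standard identity c = Ȳ − mX̄.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line Y = mX + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-deviations; denominator: sum of squared X deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    c = y_bar - m * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m, c


# Illustrative data lying exactly on Y = 2X + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```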

---

**6. What is the purpose of the least squares method in Simple Linear Regression?**
To find the best-fit line by minimizing the sum of squared residuals (errors between actual and predicted values).

---

**7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?**
R² shows the proportion of variance in Y explained by X.
For example, **R² = 0.80** means 80% of the variation in Y is explained by X.
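
One way to compute it, sketched under the usual definition R² = 1 − SS_res / SS_tot. The function name `r_squared` and the sample values are illustrative assumptions, not part of any library.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted values
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(y_true, y_pred)
print(round(r2, 2))  # 0.95
```

Here SS_res = 1 and SS_tot = 20, so 95% of the variation in these values is accounted for by the predictions.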

---


**8. What is Multiple Linear Regression?**
A regression model involving **more than one** independent variable to predict a dependent variable:
**Y = b₀ + b₁X₁ + b₂X₂ + ... + bₙXₙ**

---

**9. What is the main difference between Simple and Multiple Linear Regression?**
Simple uses **one** predictor; multiple uses **two or more** predictors.

---

**10. What are the key assumptions of Multiple Linear Regression?**

* Linearity
* No multicollinearity
* Homoscedasticity
* Independence
* Normality of residuals

---

**11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?**
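It means the **variance of errors is not constant** across observations. The OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are wrong, so significance tests and confidence intervals become unreliable.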

---

**12. How can you improve a Multiple Linear Regression model with high multicollinearity?**

* Remove correlated predictors
* Use PCA (Principal Component Analysis)
* Apply **Ridge or Lasso Regression**
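
To make the Ridge option concrete, here is a small sketch using the closed-form ridge solution on synthetic, nearly collinear data. The data, the helper `ridge_coefficients`, and the penalty `alpha=1.0` are all assumptions chosen for illustration; the model has no intercept to keep the algebra short.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)


def ridge_coefficients(X, y, alpha):
    """Closed-form ridge solution (X'X + alpha*I)^-1 X'y, without an intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


ols_coef = ridge_coefficients(X, y, alpha=0.0)    # plain least squares
ridge_coef = ridge_coefficients(X, y, alpha=1.0)  # penalized

print("OLS:  ", ols_coef)
print("Ridge:", ridge_coef)
```

With correlated predictors the unpenalized coefficients tend to be unstable, while the penalty shrinks them toward a more stable split of the shared effect.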

---

**13. What are some common techniques for transforming categorical variables for use in regression models?**

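* **Label Encoding** (integer codes; implies an order, so best kept for ordinal categories)
* **One-Hot Encoding** (one 0/1 column per category)
* **Dummy Variables** (one-hot with one category dropped as the baseline, which avoids perfect collinearity with the intercept)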
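
A minimal pure-Python sketch of one-hot and dummy encoding. The helper `one_hot` and the color values are hypothetical; in practice a library routine would normally be used instead.

```python
def one_hot(values, drop_first=False):
    """Return (column_names, rows) of 0/1 indicators for a list of categories."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]  # the dropped category becomes the baseline
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]

names, encoded = one_hot(colors)
# names   -> ['blue', 'green', 'red']
# encoded -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]

dummy_names, dummies = one_hot(colors, drop_first=True)
# dummy_names -> ['green', 'red'];  'blue' is encoded as [0, 0]
```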

---

**14. What is the role of interaction terms in Multiple Linear Regression?**
They model the combined effect of two or more variables, capturing relationships like:
**Y = b₀ + b₁X₁ + b₂X₂ + b₃(X₁\*X₂)**
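
The sketch below fits this equation with NumPy by adding the product X₁·X₂ as an extra column. The data are synthetic and noise-free, with coefficients chosen arbitrarily for the example, so least squares should recover them.

```python
import numpy as np

rng = np.random.default_rng(42)
x1 = rng.uniform(0, 5, size=100)
x2 = rng.uniform(0, 5, size=100)

# Noise-free synthetic response with known b0, b1, b2, b3 = 1, 2, -1, 0.5
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2

# Design matrix columns: intercept, X1, X2, and the interaction X1*X2
design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(coef, 3))
```

With a nonzero b₃, the effect of X₁ on Y is b₁ + b₃X₂, so it depends on the level of X₂.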

---

**15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?**
In simple regression: value of Y when X = 0.
In multiple regression: Y when **all X’s = 0**, which may not be meaningful.

---

**16. What is the significance of the slope in regression analysis, and how does it affect predictions?**
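It shows the rate of change in Y for a unit increase in X. A statistically significant slope is evidence that Y is linearly associated with X, and its magnitude determines how strongly predictions respond to changes in X.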

---

**17. How does the intercept in a regression model provide context for the relationship between variables?**
It gives the **baseline value** of Y when all predictors are zero, helping understand the offset in predictions.

---

**18. What are the limitations of using R² as a sole measure of model performance?**

* R² never decreases when variables are added, even irrelevant ones
* It does not detect overfitting
* It does not measure prediction error on unseen data

---

**19. How would you interpret a large standard error for a regression coefficient?**
It suggests **low precision** in estimating that coefficient: the confidence interval is wide. Common causes are a small sample, noisy data, or multicollinearity among predictors.

---

**20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?**
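Look for **funnel shapes** in residual vs. predicted plots. Left unaddressed, it makes standard errors, p-values, and confidence intervals unreliable, although the coefficient estimates themselves remain unbiased.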
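
Alongside the plot, a crude numeric check is possible: see whether the absolute residuals grow with the fitted values. The sketch below uses synthetic data whose noise widens with X. The variable names are made up, and this is an informal diagnostic, not a substitute for a formal test such as Breusch-Pagan.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 300)
# Synthetic data whose noise standard deviation grows with x (a funnel)
y = 2.0 * x + rng.normal(scale=0.5 * x)

m, c = np.polyfit(x, y, deg=1)  # returns slope first, then intercept
fitted = m * x + c
residuals = y - fitted

# Positive correlation between fitted values and |residuals| hints at a funnel
spread_corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]
print(round(spread_corr, 2))
```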

---

**21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?**
It suggests **overfitting**; added variables do not meaningfully improve the model.
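
The gap can be seen from the standard formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. A small sketch with made-up numbers:

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Same R^2 = 0.90 on 30 observations, with 2 versus 20 predictors
few = adjusted_r_squared(0.90, n_samples=30, n_predictors=2)
many = adjusted_r_squared(0.90, n_samples=30, n_predictors=20)
print(round(few, 3), round(many, 3))  # 0.893 0.678
```

The same R² looks far less impressive once the 20 predictors are accounted for.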

---

**22. Why is it important to scale variables in Multiple Linear Regression?**
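Scaling puts features on a comparable footing, so coefficient magnitudes can be compared and regularization penalties (Ridge, Lasso) treat features evenly. It also improves convergence in gradient-based optimization. Predictions from plain OLS are unaffected by scaling.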
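
A minimal sketch of standardization with NumPy, on made-up features with very different scales. It also checks that plain least squares predictions are the same before and after scaling, which is why scaling matters mostly for regularized or gradient-based fitting. The helper `ols_predict` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
# Two hypothetical features on very different scales
X = np.column_stack([rng.normal(50_000, 10_000, 100), rng.normal(3, 1, 100)])
y = 0.001 * X[:, 0] + 4.0 * X[:, 1] + rng.normal(size=100)

# Standardize: subtract each column's mean, divide by its standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)


def ols_predict(features, target):
    """Fit OLS with an intercept and return in-sample predictions."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


same_fit = np.allclose(ols_predict(X, y), ols_predict(X_scaled, y))
print(same_fit)
```

On real data, the mean and standard deviation should be computed from the training split only and then reused on the test split.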

---

**23. What is polynomial regression?**
A type of regression where the relationship between the independent variable and the dependent variable is modeled as an **nth-degree polynomial**.

---

**24. How does polynomial regression differ from linear regression?**
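Polynomial regression can model **non-linear** curves, while simple linear regression fits a **straight line**. The model is still linear in its coefficients, so it is fitted with the same least squares approach.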

---

**25. When is polynomial regression used?**
When the data shows a **curved or non-linear** relationship between variables.

---

**26. What is the general equation for polynomial regression?**

$$
Y = b_0 + b_1X + b_2X^2 + b_3X^3 + ... + b_nX^n
$$
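
Because the equation is linear in the coefficients b₀ … bₙ, it can be fitted by ordinary least squares on the columns 1, X, X², and so on. A short NumPy sketch on an exact quadratic, with data invented for the example:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2  # exact quadratic: b0=1, b1=2, b2=3

# Columns 1, x, x^2 in increasing order of power
design = np.vander(x, N=3, increasing=True)
b, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(b, 3))
```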

---

**27. Can polynomial regression be applied to multiple variables?**
Yes, it's called **Multiple Polynomial Regression**, using terms like X₁², X₁X₂, etc.

---

**28. What are the limitations of polynomial regression?**

* Risk of **overfitting** with high degrees
* **Poor extrapolation** beyond training data
* **Hard to interpret** higher-degree terms

---

**29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?**

* **Cross-validation**
* **Adjusted R²**
* **AIC/BIC scores**
* **Residual analysis**
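
As an illustration of the cross-validation option, the sketch below compares polynomial degrees by k-fold mean squared error using only NumPy. The helper `cv_mse`, the fold count, and the synthetic quadratic data are assumptions made for this example.

```python
import numpy as np


def cv_mse(x, y, degree, k=5, seed=0):
    """Mean squared error of a polynomial fit, averaged over k held-out folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)  # every index not in the held-out fold
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 120)
y = 0.5 * x**2 - x + rng.normal(scale=0.3, size=x.size)  # synthetic quadratic

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

A degree that is too low shows a clearly higher held-out error; among degrees with similar scores, the simpler one is usually preferred.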

---

**30. Why is visualization important in polynomial regression?**
It helps understand the **fit of the curve**, detect **overfitting**, and communicate model behavior.

---

**31. How is polynomial regression implemented in Python?**

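```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative synthetic data (an assumption; substitute your own X and y)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Expand features to degree 2, then fit ordinary least squares on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

---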

