### 1. What is Simple Linear Regression?

Simple Linear Regression is a statistical method that models the relationship between two variables:

* **Independent variable (X)**: the predictor.
* **Dependent variable (Y)**: the outcome.

It finds the best-fitting straight line, Y = mX + c, for predicting Y from X. "Best" usually means the line that minimizes the sum of squared errors (least squares).

---

### 2. What are the key assumptions of Simple Linear Regression?

* There is a **linear relationship** between X and Y.
* The residuals (errors) are **normally distributed**.
* The residuals have **constant variance** (homoscedasticity).
* Observations are **independent**.

---

### 3. What does the coefficient m represent in the equation Y = mX + c?

* **m is the slope.**
  It shows how much Y changes when X increases by one unit.

---

### 4. What does the intercept c represent in the equation Y = mX + c?

* **c is the intercept.**
  It is the predicted value of Y when X is 0, i.e. the point where the line crosses the Y-axis.

---

### 5. How do we calculate the slope m in Simple Linear Regression?

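Using the least-squares method on n data points, the slope is:

$m = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$

Once m is known, the intercept follows as $c = \frac{\sum Y - m\sum X}{n} = \bar{Y} - m\bar{X}$.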
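As a quick numerical check, the sketch below evaluates the summation formula on a small made-up dataset (the values are assumptions for illustration), derives the intercept as $c = \frac{\sum Y - m\sum X}{n}$, and compares both with NumPy's `np.polyfit`, which fits the same least-squares line. `slope_intercept` is a hypothetical helper.

```python
import numpy as np

def slope_intercept(x, y):
    """Least-squares slope m and intercept c from the summation formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x) ** 2)
    c = (np.sum(y) - m * np.sum(x)) / n  # same as mean(y) - m * mean(x)
    return m, c

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c = slope_intercept(x, y)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
m_ref, c_ref = np.polyfit(x, y, 1)
print(m, c, m_ref, c_ref)
```

For this data the formula gives m ≈ 1.99 and c ≈ 0.05, matching `np.polyfit`.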

---

### 6. What is the purpose of the least squares method in Simple Linear Regression?

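It finds the line that minimizes the **sum of squared errors** (the squared differences between actual and predicted Y values). Squaring keeps positive and negative errors from cancelling out and penalizes large errors more heavily.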
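To make the idea concrete, this minimal NumPy sketch finds the least-squares line with `np.linalg.lstsq` on hypothetical data, then nudges the slope or the intercept and confirms that every nudge raises the sum of squared errors. The helper `sse` is illustrative.

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sse(m, c):
    """Sum of squared errors between actual y and predicted m*x + c."""
    residuals = y - (m * x + c)
    return float(np.sum(residuals**2))

# Design matrix: one column for x (slope) and a column of ones (intercept)
A = np.column_stack([x, np.ones_like(x)])
(m_ls, c_ls), *_ = np.linalg.lstsq(A, y, rcond=None)

best = sse(m_ls, c_ls)
# Moving either parameter away from the least-squares solution increases the SSE
perturbed = [sse(m_ls + dm, c_ls + dc) for dm, dc in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]]
print(best, min(perturbed))
```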

---

### 7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?

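R² measures the **proportion of the variation in Y that the model explains**: $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the total sum of squares around the mean of Y.

* R² = 1 → Perfect fit
* R² = 0 → Model explains none of the variability (no better than always predicting the mean of Y)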
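A minimal sketch of the R² computation, assuming NumPy and scikit-learn are available. The data is made up, and `r_squared` is a hypothetical helper checked against scikit-learn's `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot: share of the variance in y explained by y_pred."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical data and a least-squares line fitted with np.polyfit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
m, c = np.polyfit(x, y, 1)
y_pred = m * x + c

print(r_squared(y, y_pred), r2_score(y, y_pred))
```

Predicting the mean of y for every observation gives exactly R² = 0, and perfect predictions give R² = 1.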

---

### 8. What is Multiple Linear Regression?

It is a regression model with **two or more independent variables** used to predict the dependent variable.

---

### 9. What is the main difference between Simple and Multiple Linear Regression?

* **Simple Linear Regression:** One independent variable.
* **Multiple Linear Regression:** Two or more independent variables.

---

### 10. What are the key assumptions of Multiple Linear Regression?

* Linear relationship
* Independence of errors
* Homoscedasticity (constant variance)
* No multicollinearity
* Errors are normally distributed

---

### 11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

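Heteroscedasticity means the **variance of residuals is not constant** across the range of fitted values (or of the predictors).
The coefficient estimates remain unbiased but are no longer the most efficient, and the usual **standard errors are incorrect**, so t-tests, p-values, and confidence intervals become unreliable.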

---

### 12. How can you improve a Multiple Linear Regression model with high multicollinearity?

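* Remove one variable from each group of highly correlated variables
* Use **Principal Component Analysis (PCA)** to replace correlated predictors with uncorrelated components
* Center or standardize variables (this mainly helps when the collinearity comes from polynomial or interaction terms built from the same variable)
* Combine similar predictors into a single feature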
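Before choosing a fix, multicollinearity is commonly diagnosed with the variance inflation factor, $VIF_j = \frac{1}{1 - R_j^2}$, where $R_j^2$ comes from regressing predictor j on all the other predictors. The sketch below computes it with plain NumPy on synthetic data; `vif` is a hypothetical helper, and values above roughly 5 or 10 are often treated as a warning sign (a rule of thumb, not a strict threshold).

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept
        A = np.column_stack([others, np.ones(n)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic data: x2 is almost a copy of x1, x3 is independent of both
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
factors = vif(np.column_stack([x1, x2, x3]))
print(factors)
```

Here x1 and x2 get very large VIFs, while x3 stays close to 1.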

---

### 13. What are some common techniques for transforming categorical variables for use in regression models?

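* **One-Hot Encoding:** Create one binary (0/1) column per category.
* **Label Encoding:** Assign an integer to each category. This imposes an artificial order and spacing, so in a linear model it is only appropriate for ordinal categories.
* **Dummy Variables:** Create 0/1 columns for all categories except one (k - 1 columns for k categories). The omitted category becomes the baseline, which avoids perfect multicollinearity with the intercept (the "dummy variable trap").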
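A short pandas sketch of the difference between one-hot encoding and dummy coding, using a hypothetical `city` column. `pd.get_dummies` with `drop_first=True` produces the k - 1 dummy columns; `dtype=int` asks for 0/1 integers instead of booleans.

```python
import pandas as pd

# Hypothetical categorical feature
df = pd.DataFrame({"city": ["Paris", "London", "Tokyo", "London"]})

# One-hot encoding: one 0/1 column per category (k columns)
one_hot = pd.get_dummies(df["city"], dtype=int)

# Dummy coding: drop the first category (k - 1 columns); it becomes the baseline
dummies = pd.get_dummies(df["city"], drop_first=True, dtype=int)

print(one_hot.columns.tolist())   # ['London', 'Paris', 'Tokyo']
print(dummies.columns.tolist())   # ['Paris', 'Tokyo']
```

A row of all zeros in `dummies` represents the baseline category (London here).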

---

### 14. What is the role of interaction terms in Multiple Linear Regression?

Interaction terms capture **how the effect of one variable changes depending on the value of another variable**. An interaction between X₁ and X₂ is added as their product, X₁X₂, which lets the model represent more complex relationships.
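The sketch below uses synthetic data generated with a known interaction (an assumption for illustration), adds the product x1 * x2 as an extra column, and fits scikit-learn's `LinearRegression`. The fitted coefficients show that the effect of x1 depends on the value of x2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a true interaction: y = 1 + 2*x1 + 3*x2 + 4*x1*x2 + noise
rng = np.random.default_rng(42)
x1 = rng.uniform(0, 1, 500)
x2 = rng.uniform(0, 1, 500)
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2 + rng.normal(0, 0.1, 500)

# The interaction term is the product of the two predictors, added as a new column
X = np.column_stack([x1, x2, x1 * x2])
model = LinearRegression().fit(X, y)
b1, b2, b3 = model.coef_

# With an interaction, the slope of x1 depends on x2: dy/dx1 = b1 + b3 * x2
for x2_value in (0.0, 1.0):
    print(f"effect of x1 when x2={x2_value}: {b1 + b3 * x2_value:.2f}")
```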

---

### 15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

* In Simple Regression: Intercept is Y when X = 0.
* In Multiple Regression: Intercept is Y when all X variables = 0 (which might not always make real-world sense).

---

### 16. What is the significance of the slope in regression analysis, and how does it affect predictions?

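The slope is the **rate of change of Y for each one-unit change in X**: changing X by ΔX changes the prediction by slope × ΔX. Its sign gives the direction of the relationship, and its magnitude gives the size of the effect in units of Y per unit of X, so on its own it does not measure how strong or reliable the relationship is.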

---

### 17. How does the intercept in a regression model provide context for the relationship between variables?

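It gives a **starting point** for Y: the predicted value when all X variables are zero. The intercept has a meaningful interpretation only when X = 0 is plausible and close to the observed data; otherwise it mainly anchors the line and should not be interpreted on its own.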

---

### 18. What are the limitations of using R² as a sole measure of model performance?

* A high R² doesn’t guarantee good predictions on new data.
* It never decreases when variables are added, even if they are not useful.
* It doesn’t detect overfitting, since it is usually computed on the training data.

---

### 19. How would you interpret a large standard error for a regression coefficient?

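It means the **coefficient estimate is imprecise**: it would vary a lot from sample to sample.
The t-statistic (coefficient ÷ standard error) will be small, so the variable may not be a statistically reliable predictor. Common causes are multicollinearity, a small sample, noisy data, and little variation in that predictor.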
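A minimal sketch of where the standard error of a slope comes from in simple linear regression, $SE(m) = \sqrt{\frac{SS_{res}/(n-2)}{\sum (X - \bar{X})^2}}$, on synthetic data with the same true slope but different noise levels. `slope_standard_error` is a hypothetical helper; its result is compared with the `stderr` field returned by SciPy's `scipy.stats.linregress`.

```python
import numpy as np
from scipy import stats

def slope_standard_error(x, y):
    """Standard error of the least-squares slope in simple linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    m, c = np.polyfit(x, y, 1)
    residuals = y - (m * x + c)
    sigma2 = np.sum(residuals**2) / (n - 2)  # residual variance (2 parameters estimated)
    return np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))

# Synthetic data: same true slope (0.5), small vs. large noise
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y_quiet = 0.5 * x + rng.normal(0, 0.2, x.size)
y_noisy = 0.5 * x + rng.normal(0, 5.0, x.size)

se_quiet = slope_standard_error(x, y_quiet)
se_noisy = slope_standard_error(x, y_noisy)
print(se_quiet, stats.linregress(x, y_quiet).stderr)
print(se_noisy)
```

The noisy data produces a much larger standard error for the same true slope, and hence a smaller t-statistic.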

---

### 20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?

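In a plot of residuals against fitted values (or against X):

* Homoscedasticity → residuals form an even band around zero.
* Heteroscedasticity → the spread of residuals widens or narrows as the fitted values increase, often in a funnel shape.

It is important to address because it makes standard errors, and therefore test statistics and confidence intervals, unreliable.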
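Residual plots are the usual visual check. As a rough numerical companion, the sketch below fits a line to synthetic data whose noise grows with X and compares the residual variance in the lowest and highest thirds of the fitted values. This is only an informal check in the spirit of the Goldfeld-Quandt test, not the formal test itself.

```python
import numpy as np

# Synthetic heteroscedastic data: noise standard deviation grows with x
rng = np.random.default_rng(7)
x = np.linspace(1, 10, 300)
y = 3 + 2 * x + rng.normal(0, 0.3 * x)

m, c = np.polyfit(x, y, 1)
fitted = m * x + c
residuals = y - fitted

# Sort residuals by fitted value and compare spread in the lowest vs. highest third
order = np.argsort(fitted)
k = len(order) // 3
low_var = np.var(residuals[order[:k]])
high_var = np.var(residuals[order[-k:]])
ratio = high_var / low_var
print(f"variance ratio (high / low fitted values): {ratio:.1f}")
```

A ratio far from 1 corresponds to the funnel shape in a residual-versus-fitted plot; with constant variance it would stay close to 1.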

---

### 21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

It means the model might have **unnecessary variables.** Adjusted R² penalizes each added predictor, so it drops when a new variable does not improve the fit enough to justify its inclusion.
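Adjusted R² is computed as $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$ for n observations and p predictors. The sketch below (synthetic data, hypothetical helper `adjusted_r2`) adds ten pure-noise predictors to a one-predictor model: R² cannot go down, while adjusted R² typically does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted in p)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic data: y depends on one real predictor; 10 pure-noise columns are then added
rng = np.random.default_rng(3)
n = 40
x_real = rng.normal(size=(n, 1))
y = 2 * x_real[:, 0] + rng.normal(size=n)
X_big = np.hstack([x_real, rng.normal(size=(n, 10))])

results = {}
for name, X in [("1 predictor", x_real), ("11 predictors", X_big)]:
    r2 = LinearRegression().fit(X, y).score(X, y)  # score() returns R^2
    results[name] = (r2, adjusted_r2(r2, n, X.shape[1]))
    print(f"{name}: R2={results[name][0]:.3f}, adjusted R2={results[name][1]:.3f}")
```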

---

### 22. Why is it important to scale variables in Multiple Linear Regression?

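In ordinary least squares, scaling does not change the fitted values or R². It matters when variables have **different units or ranges** because it makes coefficient magnitudes comparable across predictors, improves numerical stability, and is essential for methods that are sensitive to scale, such as regularized regression (Ridge, Lasso) and gradient-descent-based fitting.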
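The following sketch, on synthetic predictors with very different ranges, fits ordinary least squares with and without scikit-learn's `StandardScaler`. The predictions match; only the coefficients change, and the standardized ones (change in y per standard deviation of each predictor) are directly comparable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic predictors on very different ranges
rng = np.random.default_rng(5)
X = np.column_stack([rng.uniform(0, 2, 100), rng.uniform(0, 5000, 100)])
y = 3 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 100)

raw = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# Same fitted values: OLS is unaffected by linear rescaling of the predictors
print(np.allclose(raw.predict(X), scaled.predict(X)))
# Different coefficients: raw ones are per original unit, scaled ones per standard deviation
print(raw.coef_, scaled[-1].coef_)
```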

---

### 23. What is polynomial regression?

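It is a regression that models a **curved** (non-linear) relationship between X and Y by including powers of X (X², X³, ...) as predictors. The model is still linear in its coefficients, so it can be fitted with ordinary least squares.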

---

### 24. How does polynomial regression differ from linear regression?

* Linear Regression: Straight-line relationship.
* Polynomial Regression: Curved relationship using X², X³, etc.

---

### 25. When is polynomial regression used?

When data shows a **non-linear pattern** that cannot be captured by a straight line.

---

### 26. What is the general equation for polynomial regression?

Y = b₀ + b₁X + b₂X² + b₃X³ + ... + bₙXⁿ
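Because the equation is linear in the coefficients, it can be fitted by ordinary least squares on a design matrix whose columns are 1, X, X², ..., Xⁿ. This NumPy sketch builds that matrix with `np.vander(..., increasing=True)` for degree 3 and recovers hypothetical coefficients from noiseless data.

```python
import numpy as np

# Hypothetical "true" coefficients b0..b3, noiseless data for illustration
x = np.linspace(-3, 3, 50)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.3 * x**3

# Design matrix with columns [1, X, X^2, X^3]
degree = 3
A = np.vander(x, degree + 1, increasing=True)

# Least-squares solution gives b0, b1, b2, b3 in increasing order of power
b, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(b, 6))
```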

---

### 27. Can polynomial regression be applied to multiple variables?

Yes. With several predictors, the model includes powers of each variable and usually cross terms such as X₁X₂. This is often called **Multivariate Polynomial Regression**.

---

### 28. What are the limitations of polynomial regression?

* Prone to **overfitting** with high degrees.
* Coefficients of high-degree terms are hard to interpret.
* Predictions can swing wildly outside the range of the training data (poor extrapolation).

---

### 29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?

* **Cross-Validation**
* **Adjusted R²**
* **Residual Analysis**
* **AIC / BIC scores** (model selection criteria)
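A minimal sketch of cross-validation for degree selection, assuming scikit-learn and synthetic quadratic data: each candidate degree is scored by 5-fold cross-validated mean squared error, and the degree with the lowest error is chosen.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a quadratic trend plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = 1 + 0.5 * X[:, 0] - 1.5 * X[:, 0] ** 2 + rng.normal(0, 1.0, 120)

cv_mse = {}
for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # scikit-learn maximizes scores, so MSE is reported as a negative number
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best_degree = min(cv_mse, key=cv_mse.get)
print(best_degree, {d: round(v, 3) for d, v in cv_mse.items()})
```

Unlike training error, which keeps falling as the degree grows, cross-validated error rises again once extra terms start fitting noise.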

---

### 30. Why is visualization important in polynomial regression?

Visualization helps:

* Understand if the curve fits well.
* Detect overfitting.
* Show the relationship between X and Y clearly.

---

### 31. How is polynomial regression implemented in Python?

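Example using `numpy` (to create illustrative data) and scikit-learn (`sklearn`). The data is synthetic and only for demonstration:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Illustrative data: a noisy quadratic relationship (X must be 2-D for sklearn)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2 + rng.normal(0, 0.5, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Degree-2 polynomial: expand X into [1, X, X^2], then fit ordinary least squares
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```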
