

---

### **Simple Linear Regression:**

**1. What is Simple Linear Regression?**  
Simple Linear Regression models the relationship between **two variables** — one independent (X) and one dependent (Y) — by fitting a straight line: **Y = mX + c**.

---



**2. What are the key assumptions of Simple Linear Regression?**  
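- Linearity: The relationship between X and Y is linear.  
- Independence: Observations (and therefore their errors) are independent of each other.  
- Homoscedasticity: Residuals have constant variance across all values of X.  
- Normality: Residuals are approximately normally distributed (this matters mainly for p-values and confidence intervals).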

---

**3. What does the coefficient m represent in the equation Y = mX + c?**  
The **slope (m)** represents the **expected change in Y** for a **one-unit increase in X**.

---

**4. What does the intercept c represent in the equation Y = mX + c?**  
The **intercept (c)** is the **predicted value of Y when X = 0**.

---

**5. How do we calculate the slope m in Simple Linear Regression?**  
\[
m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
\]
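
As a minimal sketch, the formula above can be evaluated directly with NumPy. The data values are made up for illustration, and the intercept line uses the standard identity c = Ȳ − m·X̄ (the fitted line passes through the point of means), which is not stated in the formula itself.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = X.mean(), Y.mean()

# Slope: sum of cross-deviations divided by sum of squared X deviations
m = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)

# Intercept: the least squares line passes through (x_mean, y_mean)
c = y_mean - m * x_mean

print(f"m = {m:.2f}, c = {c:.2f}")  # prints "m = 0.60, c = 2.20"
```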

---

**6. What is the purpose of the least squares method in Simple Linear Regression?**  
It finds the line that **minimizes the sum of squared differences** between observed and predicted Y values (minimizes errors).
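
To make the "minimizes" part concrete, the sketch below fits a line with `np.polyfit` and then checks that nudging the slope or intercept in either direction only increases the sum of squared errors. The data and the helper name `sse` are illustrative assumptions.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def sse(m, c):
    """Sum of squared errors of the line Y = mX + c on the data above."""
    return float(np.sum((Y - (m * X + c)) ** 2))

# Least squares fit of a degree-1 polynomial; returns [slope, intercept]
m_hat, c_hat = np.polyfit(X, Y, deg=1)
best = sse(m_hat, c_hat)

# SSE for lines slightly away from the least squares solution
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]
nudged = [sse(m_hat + dm, c_hat + dc) for dm, dc in nudges]

print(best, nudged)
```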

---

**7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?**  
R² measures the **proportion of the variance in Y** that is **explained by X**.  
- R² = 1 → perfect fit  
- R² = 0 → the model explains none of the variance in Y (no linear relationship captured)
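
A short sketch of how R² can be computed by hand, using the standard definition R² = 1 − SS_res / SS_tot. The data are made up for illustration.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit the line and compute predictions
m, c = np.polyfit(X, Y, deg=1)
Y_hat = m * X + c

ss_res = np.sum((Y - Y_hat) ** 2)      # variation left unexplained by the line
ss_tot = np.sum((Y - Y.mean()) ** 2)   # total variation of Y around its mean

r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.2f}")  # prints "R^2 = 0.60"
```

Here the line explains 60% of the variance in Y.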

---

### **Multiple Linear Regression:**

**8. What is Multiple Linear Regression?**  
Regression where **multiple independent variables** predict a **single dependent variable**.

---

**9. What is the main difference between Simple and Multiple Linear Regression?**  
- **Simple**: One predictor (X).  
- **Multiple**: Two or more predictors (X₁, X₂, ... Xn).

---

**10. What are the key assumptions of Multiple Linear Regression?**  
- Linearity  
- Independence of errors  
- Homoscedasticity  
- Normality of errors  
- No multicollinearity between predictors

---

**11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?**  
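**Heteroscedasticity** means **non-constant variance** of residuals.  
The coefficient estimates remain unbiased but become less efficient, and the usual **standard errors are biased** → **unreliable p-values and confidence intervals** → **wrong conclusions**.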
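
The effect on standard errors can be sketched by comparing the classical formula with White's heteroscedasticity-consistent (HC0) estimator, which is one common remedy and is not part of the answer above. The simulated data, seed, and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
# Simulated data whose noise standard deviation grows with x
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
XtX_inv = np.linalg.inv(A.T @ A)

# Classical standard errors: assume one constant error variance
sigma2 = resid @ resid / (n - A.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 "sandwich" standard errors: use each observation's squared residual
meat = A.T @ (A * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("classical:", se_classical)
print("robust   :", se_robust)
```

When the two sets of standard errors differ noticeably, inference based on the classical ones should be treated with caution.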

---

**12. How can you improve a Multiple Linear Regression model with high multicollinearity?**  
- Remove highly correlated predictors  
- Combine predictors  
- Use Ridge or Lasso Regression  
- Apply Principal Component Analysis (PCA)
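
Before applying any of these remedies, it helps to find out which predictors are involved. One common diagnostic (not mentioned in the list above) is the variance inflation factor, VIF = 1 / (1 − R²), where R² comes from regressing one predictor on all the others. The sketch below uses simulated data, and the helper name `vif` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1
x3 = rng.normal(size=n)                  # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the rest."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print(vifs)
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity. In this simulation `x1` and `x2` score far above that range, while `x3` stays close to 1.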

---

**13. What are some common techniques for transforming categorical variables for use in regression models?**  
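- **One-hot encoding** (one 0/1 column per category; drop one level to avoid perfect collinearity with the intercept)  
- **Label encoding** (arbitrary integer codes; this implies an order, so it is usually unsuitable for unordered categories in linear models)  
- **Ordinal encoding** (for ordered categories)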
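
A minimal NumPy-only sketch of one-hot encoding follows. The category values are made up; in practice a library helper such as `pandas.get_dummies` is the more usual choice.

```python
import numpy as np

# Hypothetical categorical column
colors = np.array(["red", "green", "blue", "green", "red"])

# Sorted unique categories, plus an integer code per row (a label encoding)
categories, codes = np.unique(colors, return_inverse=True)

# One-hot encoding: pick the row of the identity matrix matching each code
one_hot = np.eye(len(categories), dtype=int)[codes]

# Drop the first column so the remaining dummies are not collinear
# with the intercept; "blue" becomes the reference level
dummies = one_hot[:, 1:]

print(categories)  # ['blue' 'green' 'red']
print(one_hot)
```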

---

**14. What is the role of interaction terms in Multiple Linear Regression?**  
They **capture the combined effect** of two or more variables that is **different from their individual effects**.
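
As a sketch, an interaction term is simply the product of two predictors added as an extra column. The simulated data and the coefficient values below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 5, n)
x2 = rng.uniform(0, 5, n)

# Simulated response whose true model includes an x1 * x2 term
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rng.normal(0, 0.1, n)

# Design matrix: intercept, main effects, and the interaction column
A = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2, b12 = coef

print(f"b1 = {b1:.2f}, b2 = {b2:.2f}, b12 = {b12:.2f}")
```

With an interaction in the model, the effect of a one-unit change in `x1` is no longer a single number: it equals `b1 + b12 * x2`, so it depends on the value of `x2`.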

---

**15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?**  
- **Simple**: Value of Y when X = 0.  
- **Multiple**: Value of Y when **all predictors = 0** (may not always be meaningful).

---

**16. What is the significance of the slope in regression analysis, and how does it affect predictions?**  
The slope tells us how much the dependent variable **changes for a one-unit change** in the independent variable, **holding other variables constant** (in multiple regression).

---

**17. How does the intercept in a regression model provide context for the relationship between variables?**  
It sets the **baseline value** for Y when **all predictors are at their reference values (usually 0)**.

---

**18. What are the limitations of using R² as a sole measure of model performance?**  
- R² **never decreases** when more variables are added (even if they are useless).  
- It doesn’t indicate **causality**, nor whether the model is correctly specified or will predict well on new data.  
- Better to also check **Adjusted R²**.
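
The first and third points can be sketched with simulated data: adding pure-noise predictors cannot lower R², while Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) applies a penalty for each of the p predictors. The helper name `r2_and_adjusted` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)
noise = rng.normal(size=(n, 5))  # five predictors unrelated to y

def r2_and_adjusted(features, y):
    """Return (R^2, adjusted R^2) for an OLS fit with an intercept."""
    n, p = features.shape
    A = np.column_stack([np.ones(n), features])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adjusted = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adjusted

r2_small, adj_small = r2_and_adjusted(x[:, None], y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([x, noise]), y)

print(f"1 predictor : R2 = {r2_small:.4f}, adjusted = {adj_small:.4f}")
print(f"6 predictors: R2 = {r2_big:.4f}, adjusted = {adj_big:.4f}")
```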

---

**19. How would you interpret a large standard error for a regression coefficient?**  
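It suggests the coefficient estimate is **imprecise** (its confidence interval is wide) and possibly **not statistically significant**. Common causes are a small sample, noisy data, or multicollinearity.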

---

**20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?**  
- Identified by a **funnel shape** or **pattern** in residual vs fitted plots.  
- Important because it **violates assumptions** and **biases standard errors**.
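
A rough numeric stand-in for inspecting the plot is to compare the residual spread at low and high fitted values. This is a sketch on simulated data, not a formal test; the seed and the split into halves are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = rng.uniform(1, 10, n)
# Simulated data whose noise standard deviation grows with x
y = 2.0 + 3.0 * x + rng.normal(0, 0.5 * x)

# Fit a line, then compute fitted values and residuals
m, c = np.polyfit(x, y, deg=1)
fitted = m * x + c
resid = y - fitted

# Compare residual spread in the lower and upper halves of the fitted values
order = np.argsort(fitted)
spread_low = resid[order[: n // 2]].std()
spread_high = resid[order[n // 2 :]].std()

print(f"spread (low fitted) = {spread_low:.2f}, spread (high fitted) = {spread_high:.2f}")
```

A scatter plot of `fitted` against `resid` for the same data would show the funnel shape: the points fan out as the fitted values increase.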

---

**21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?**  
It means **some predictors** are **not actually useful**, and **Adjusted R² penalizes** for those unnecessary variables.

---

**22. Why is it important to scale variables in Multiple Linear Regression?**  
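- Scaling does not change the fit or predictions of ordinary least squares, but it makes coefficient magnitudes **comparable** across predictors and can improve numerical stability.  
- Regularized models (like Ridge or Lasso) **penalize coefficient size**, so they are sensitive to predictor scale and generally need **standardized input**.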
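
A minimal sketch of standardization (subtract the mean, divide by the standard deviation) using NumPy. The two columns are hypothetical values for age in years and income in dollars.

```python
import numpy as np

# Two hypothetical predictors on very different scales: age, income
X = np.array([
    [25, 40000.0],
    [32, 52000.0],
    [47, 98000.0],
    [51, 61000.0],
    [62, 75000.0],
])

# Column-wise statistics; in practice these come from the training data only
mean = X.mean(axis=0)
std = X.std(axis=0)

# Each column now has mean 0 and standard deviation 1
X_scaled = (X - mean) / std

print(X_scaled.round(2))
```

The same `mean` and `std` should be reused to scale validation or new data, so that all inputs are transformed consistently.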

---

### **Polynomial Regression:**

**23. What is polynomial regression?**  
It models the relationship between X and Y as an **nth degree polynomial** (not just a straight line).

---

**24. How does polynomial regression differ from linear regression?**  
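- **Linear Regression** fits a **straight line**.  
- **Polynomial Regression** fits a **curved line** by adding powers of X as extra predictors. The model is still linear in its coefficients, so it is fitted with ordinary least squares.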

---

**25. When is polynomial regression used?**  
When the relationship between X and Y is **non-linear**, but can still be modeled with a **polynomial curve**.

---

**26. What is the general equation for polynomial regression?**  
\[
Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3 + \dots + b_n X^n
\]

---

**27. Can polynomial regression be applied to multiple variables?**  
Yes, it's called **multivariate polynomial regression**, and it includes terms like \( X_1^2, X_1X_2 \), etc.

---

**28. What are the limitations of polynomial regression?**  
- Can easily **overfit** data.  
- Higher-degree polynomials become **unstable** outside the training range (extrapolation problem).

---

**29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?**  
- Cross-validation  
- Adjusted R²  
- AIC (Akaike Information Criterion) / BIC (Bayesian Information Criterion)
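
The cross-validation option can be sketched with NumPy alone. The data are simulated from a quadratic, and the helper name `cv_mse`, the fold count, and the seeds are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
# Simulated data from a degree-2 polynomial plus noise
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0, 0.5, x.size)

def cv_mse(x, y, degree, k=5, seed=1):
    """Mean squared error of a polynomial fit, averaged over k folds."""
    idx = np.arange(x.size)
    np.random.default_rng(seed).shuffle(idx)
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)             # indices not in this fold
        coefs = np.polyfit(x[train], y[train], deg=degree)
        pred = np.polyval(coefs, x[fold])           # predict held-out points
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

# Score each candidate degree and keep the one with the lowest CV error
scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)

print(scores)
print("best degree:", best_degree)
```

The degree with the lowest cross-validated error is a reasonable default. When several degrees score almost the same, the lower one is usually the safer choice.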

---

**30. Why is visualization important in polynomial regression?**  
To visually check if the **curve fits the data** well without **overfitting** or **underfitting**.

---

**31. How is polynomial regression implemented in Python?**  
Example using **scikit-learn**:
```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

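# Example data following y = x^2
X = [[1], [2], [3], [4], [5]]
y = [1, 4, 9, 16, 25]

# Expand X into polynomial features [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Fit an ordinary linear regression on the expanded features
model = LinearRegression()
model.fit(X_poly, y)

# Predict on the training inputs
y_pred = model.predict(X_poly)

# For new inputs, reuse the fitted transformer (transform, not fit_transform)
y_new = model.predict(poly.transform([[6]]))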
```

---

