# Regression


### 1. What is Simple Linear Regression?
Simple Linear Regression is a statistical method that models the relationship between a dependent variable and one independent variable using a straight line.

---

### 2. What are the key assumptions of Simple Linear Regression?
- Linearity
- Independence of errors
- Homoscedasticity (constant variance of errors)
- Normal distribution of residuals

---

### 3. What does the coefficient m represent in the equation Y = mX + c?
The coefficient **m** represents the **slope** of the line, indicating the change in Y for a one-unit change in X.

---

### 4. What does the intercept c represent in the equation Y = mX + c?
The intercept **c** represents the value of Y when X is 0.

---

### 5. How do we calculate the slope m in Simple Linear Regression?
The slope is calculated as:  
**m = Σ[(Xi - X̄)(Yi - Ȳ)] / Σ[(Xi - X̄)²]**
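
As a minimal sketch, the formula can be evaluated directly with NumPy. The five data points are hypothetical and chosen only for illustration. The intercept line relies on the standard fact that the least squares line passes through the point (X̄, Ȳ).

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = x.mean(), y.mean()

# m = sum((xi - x_mean) * (yi - y_mean)) / sum((xi - x_mean)^2)
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# The fitted line passes through (x_mean, y_mean), which fixes the intercept
c = y_mean - m * x_mean

print(f"slope m = {m:.3f}, intercept c = {c:.3f}")
```

For this data the slope comes out at about 0.6 and the intercept at about 2.2, which matches what `np.polyfit(x, y, deg=1)` returns.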

---

### 6. What is the purpose of the least squares method in Simple Linear Regression?
The least squares method minimizes the sum of the squared differences between observed and predicted values, providing the best-fitting line.
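
To make the "best-fitting" claim concrete, the sketch below compares the sum of squared errors of the least squares line with that of a few nudged lines. The data and the helper name `sse` are hypothetical, used only for illustration.

```python
import numpy as np

def sse(m, c, x, y):
    """Sum of squared differences between observed and predicted values."""
    return float(np.sum((y - (m * x + c)) ** 2))

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# np.polyfit with deg=1 returns the least squares slope and intercept
m_hat, c_hat = np.polyfit(x, y, deg=1)
best = sse(m_hat, c_hat, x, y)

# Shifting the slope or the intercept away from the fit increases the error
nudges = [(0.1, 0.0), (0.0, 0.5), (-0.1, -0.2)]
nudged = [sse(m_hat + dm, c_hat + dc, x, y) for dm, dc in nudges]

print(best, nudged)
```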

---

### 7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?
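R² indicates the proportion of the variance in the dependent variable explained by the independent variable. For a least squares fit with an intercept, it ranges from 0 to 1 on the data used for fitting, and values closer to 1 indicate that the line explains more of the variance.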
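
A short sketch of how R² can be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the data are assumptions for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation left unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

m, c = np.polyfit(x, y, deg=1)
r2 = r_squared(y, m * x + c)
print(f"R^2 = {r2:.3f}")
```

Here about 60% of the variance in `y` is explained by the fitted line.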

---

### 8. What is Multiple Linear Regression?
Multiple Linear Regression models the relationship between one dependent variable and two or more independent variables.

---

### 9. What is the main difference between Simple and Multiple Linear Regression?
Simple Linear Regression uses one independent variable, while Multiple Linear Regression uses multiple independent variables.

---

### 10. What are the key assumptions of Multiple Linear Regression?
- Linearity
- Independence of errors
- Homoscedasticity
- No multicollinearity
- Normal distribution of residuals

---

### 11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?
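Heteroscedasticity means that the error variance is not constant across observations. The coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are biased, which makes hypothesis tests and confidence intervals unreliable.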

---

### 12. How can you improve a Multiple Linear Regression model with high multicollinearity?
- Remove or combine correlated predictors
- Use dimensionality reduction (e.g., PCA)
- Apply regularization techniques (e.g., Ridge or Lasso)
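
To decide which predictors to remove or combine, one common diagnostic (not named in the list above) is the variance inflation factor, VIF. The sketch below is a plain NumPy version under that assumption; the helper name `vif` and the simulated data are hypothetical.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2_j), with R^2_j from regressing column j on the rest."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                  # unrelated to the other two

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

The two near-duplicate columns get very large factors while the independent column stays close to 1, which points to `x1` and `x2` as the pair to merge or prune.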

---

### 13. What are some common techniques for transforming categorical variables for use in regression models?
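- One-hot encoding: one binary (0/1) column per category
- Dummy variables: one-hot encoding with one category dropped as the reference level, which avoids perfect collinearity with the intercept
- Label encoding: integer codes per category, suitable mainly for ordinal categories because it imposes an order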
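
A small sketch of dummy-variable encoding in plain NumPy, assuming the first category in sorted order serves as the reference level. The helper name `dummy_encode` and the color data are hypothetical.

```python
import numpy as np

def dummy_encode(values, drop_first=True):
    """Return (column names, 0/1 matrix); optionally drop the first category."""
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    matrix = np.array([[1 if v == cat else 0 for cat in kept] for v in values])
    return kept, matrix

colors = ["red", "green", "blue", "green"]

# "blue" sorts first, so it becomes the reference level (a row of zeros)
cols, encoded = dummy_encode(colors)
print(cols)
print(encoded)
```

Calling `dummy_encode(colors, drop_first=False)` keeps all three columns, which corresponds to full one-hot encoding.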

---

### 14. What is the role of interaction terms in Multiple Linear Regression?
Interaction terms model the combined effect of two or more variables, capturing relationships that aren't purely additive.
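
As an illustration, an interaction term is simply the product of two predictors added as an extra column. The sketch below simulates data with an assumed interaction coefficient of 3 and recovers it with least squares; all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 1, size=300)
x2 = rng.uniform(0, 1, size=300)

# Assumed data-generating process: the effect of x1 on y depends on x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 3.0 * x1 * x2 + rng.normal(scale=0.05, size=300)

# Design matrix columns: intercept, x1, x2, interaction x1*x2
design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

print(beta)  # last entry estimates the interaction coefficient
```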

---

### 15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?
In Simple Linear Regression, the intercept is the expected value of Y when X is zero. In Multiple Linear Regression, it's the expected value of Y when all independent variables are zero.

---

### 16. What is the significance of the slope in regression analysis, and how does it affect predictions?
The slope indicates how much the dependent variable is expected to change with a one-unit change in the predictor. It directly affects prediction outcomes.

---

### 17. How does the intercept in a regression model provide context for the relationship between variables?
The intercept provides a baseline value for the dependent variable, offering insight into the starting point of the model when all predictors are zero.

---

### 18. What are the limitations of using R² as a sole measure of model performance?
- It doesn't indicate whether the model is biased
- It never decreases when predictors are added, so irrelevant variables can inflate it
- It doesn't measure predictive accuracy on new data
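
The inflation effect can be seen in a small simulation: adding pure-noise predictors to a least squares fit does not lower the training R². The helper `r2_ols` and the simulated data are assumptions for illustration.

```python
import numpy as np

def r2_ols(X, y):
    """Training R^2 of a least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 10))  # ten predictors unrelated to y

r2_base = r2_ols(x, y)
r2_noisy = r2_ols(np.column_stack([x, noise]), y)
print(r2_base, r2_noisy)
```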

---

### 19. How would you interpret a large standard error for a regression coefficient?
A large standard error means the coefficient is estimated imprecisely: its confidence interval is wide, and the estimate may not be significantly different from zero.
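
For simple linear regression, the standard error of the slope can be computed by hand as the square root of s² / Σ(Xi - X̄)², where s² is the residual variance with n - 2 degrees of freedom. The sketch below uses hypothetical data.

```python
import numpy as np

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

m, c = np.polyfit(x, y, deg=1)
residuals = y - (m * x + c)

# Residual variance: two parameters (slope, intercept) were estimated
s2 = np.sum(residuals ** 2) / (n - 2)

# Standard error of the slope and the corresponding t-statistic
se_m = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
t_stat = m / se_m

print(f"slope = {m:.3f}, SE = {se_m:.3f}, t = {t_stat:.3f}")
```

With only five points the standard error is large relative to the slope, so the t-statistic is modest.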

---

### 20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?
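Heteroscedasticity often appears as a funnel or fan shape in a plot of residuals against fitted values, where the spread widens or narrows as the fitted value grows. It matters because it violates the constant-variance assumption, leaving estimates inefficient and standard errors unreliable.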
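
A rough numerical counterpart to eyeballing the funnel is to compare the residual spread for low and high fitted values. This is only an informal check on simulated data, not a formal test; the noise model below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 400)

# Assumed noise model: the error standard deviation grows with x
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)

m, c = np.polyfit(x, y, deg=1)
fitted = m * x + c
residuals = y - fitted

# Split residuals into the lower and upper half of the fitted values
order = np.argsort(fitted)
low, high = residuals[order[:200]], residuals[order[200:]]
spread_ratio = high.std() / low.std()

print(f"residual spread ratio (high / low) = {spread_ratio:.2f}")
```

A ratio well above 1 corresponds to the widening funnel one would see in the residual plot.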

---

### 21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
It may indicate that the model includes unnecessary predictors that do not contribute to explaining the variance in the dependent variable.
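
The standard adjusted R² formula makes the penalty visible: 1 - (1 - R²)(n - 1) / (n - p - 1), with n observations and p predictors. The sketch below applies it to hypothetical numbers.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 penalizes R^2 for the number of predictors used."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same R^2 of 0.90 on 30 observations, with few versus many predictors
adj_few = adjusted_r2(0.90, n_samples=30, n_predictors=2)
adj_many = adjusted_r2(0.90, n_samples=30, n_predictors=20)

print(f"2 predictors:  {adj_few:.3f}")
print(f"20 predictors: {adj_many:.3f}")
```

With 2 predictors the adjusted value stays near 0.89, while with 20 predictors it drops to roughly 0.68 even though R² is unchanged.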

---

### 22. Why is it important to scale variables in Multiple Linear Regression?
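Scaling puts predictors on a comparable footing, which matters most when using regularization (the penalty depends on coefficient magnitude) or gradient-based optimizers. For plain least squares, scaling changes the coefficients and their units but not the predictions.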
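
A minimal sketch of z-score standardization with NumPy, using hypothetical income and age columns on very different scales. It also checks that, for plain least squares, the predictions stay the same after scaling while the coefficients change.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical predictors on very different scales
income = rng.normal(50_000, 15_000, size=100)
age = rng.normal(40, 10, size=100)
X = np.column_stack([income, age])
y = 0.0002 * income + 0.3 * age + rng.normal(size=100)

# Standardize each column to mean 0 and standard deviation 1
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

def fit_predict(X, y):
    """Least squares fit with an intercept; returns coefficients and fitted values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

coef_raw, pred_raw = fit_predict(X, y)
coef_scaled, pred_scaled = fit_predict(X_scaled, y)

print(coef_raw[1:], coef_scaled[1:])  # coefficients differ in scale
```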

---

### 23. What is polynomial regression?
Polynomial regression models the relationship between the dependent variable and the independent variable(s) as an nth-degree polynomial.

---

### 24. How does polynomial regression differ from linear regression?
Linear regression fits a straight line, while polynomial regression fits a curve by including higher-degree terms of the independent variable. The model is still linear in its coefficients, so it can be fitted with ordinary least squares.

---

### 25. When is polynomial regression used?
It is used when the data shows a non-linear relationship that cannot be captured by a straight line.

---

### 26. What is the general equation for polynomial regression?
**Y = β₀ + β₁X + β₂X² + β₃X³ + ... + βnXⁿ + ε**
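
In matrix terms, each power of X becomes one column of the design matrix, and the β coefficients are found by least squares. The sketch below assumes a noise-free cubic purely for illustration, so the fit recovers the coefficients almost exactly.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])

# Assumed cubic for illustration: y = 1 + 2x - x^2 + 0.5x^3 (no noise term)
y = 1.0 + 2.0 * x - 1.0 * x ** 2 + 0.5 * x ** 3

# Design matrix with columns 1, x, x^2, x^3
design = np.vander(x, N=4, increasing=True)

# Solve for beta_0 .. beta_3 by least squares
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta)
```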

---

### 27. Can polynomial regression be applied to multiple variables?
Yes, it can be extended to multiple predictors by including polynomial terms for each variable and their interactions.
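
As a sketch of what this expansion looks like, scikit-learn's `PolynomialFeatures` generates the powers and cross terms for two predictors. The single input row is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One hypothetical observation with two predictors: x1 = 2, x2 = 3
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2)
expanded = poly.fit_transform(X)

# Columns: 1, x1, x2, x1^2, x1*x2, x2^2
print(expanded)
```

The number of columns grows quickly with the degree and the number of predictors, which is one reason high-degree multivariate fits overfit easily.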

---

### 28. What are the limitations of polynomial regression?
- Prone to overfitting with high-degree polynomials
- Difficult to interpret
- Sensitive to outliers

---

### 29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?
- Cross-validation
- Adjusted R²
- AIC/BIC
- Residual analysis
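
Cross-validation over candidate degrees might be sketched as follows. The simulated cubic data, the degree range, and the 5-fold setup are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(200, 1))
# Assumed cubic trend plus noise
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=200)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # Default scoring for regressors is R^2 on the held-out fold; higher is better
    scores[degree] = cross_val_score(model, X, y, cv=cv).mean()

best_degree = max(scores, key=scores.get)
print(scores)
print("best degree:", best_degree)
```

Scores typically jump once the degree is high enough to capture the trend and then flatten or decline, which suggests choosing the smallest degree near the plateau.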

---

### 30. Why is visualization important in polynomial regression?
Visualization helps in understanding the data fit, identifying overfitting, and interpreting the relationship between variables.

---

### 31. How is polynomial regression implemented in Python?
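```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data (assumed): a noisy cubic trend
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = X.ravel() ** 3 - 2 * X.ravel() + rng.normal(scale=1.0, size=50)

# Example: 3rd-degree polynomial
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X, y)            # expand features, then fit by least squares
y_pred = model.predict(X)  # predictions on the training inputs
```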