# REGRESSION ASSIGNMENT QUESTIONS

# 1. What is Simple Linear Regression?
Simple Linear Regression models the relationship between two variables: one independent variable (X) and one dependent variable (Y), using the equation:
  **Y = mX + c**,
where **m** is the slope and **c** is the intercept.


---



# 2. What are the key assumptions of Simple Linear Regression?

* Linearity
* Independence of errors
* Homoscedasticity (constant variance of errors)
* Normality of residuals
* Multicollinearity is not a concern here, since there is only one X (it becomes an assumption in Multiple Linear Regression)


---


# 3. What does the coefficient *m* represent in Y = mX + c?
The **slope (m)** shows how much Y changes for a one-unit increase in X.


---


# 4. What does the intercept *c* represent in Y = mX + c?
The **intercept (c)** is the predicted value of Y when X = 0.


---


# 5. How do we calculate the slope *m* in Simple Linear Regression?
$$
m = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sum{(x_i - \bar{x})^2}}
$$
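
The formula can be checked by hand. Below is a minimal sketch, assuming NumPy and a made-up five-point dataset (the values of `x` and `y` are illustrative only). It also uses the standard companion result that the fitted line passes through the point of means, so `c = ȳ - m·x̄`.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by sum of squared x-deviations.
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the least-squares line passes through (x_mean, y_mean).
c = y_mean - m * x_mean

print(f"m = {m:.2f}, c = {c:.2f}")  # m = 0.60, c = 2.20
```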

---


# 6. What is the purpose of the least squares method?
It chooses the coefficients (m and c) that minimize the **sum of squared residuals** (the squared differences between observed and predicted values).
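
As a rough illustration, the sketch below (assuming NumPy and made-up data) fits a line with `np.polyfit` and checks that nudging the slope or intercept in either direction only increases the sum of squared residuals. The helper name `sse` is hypothetical.

```python
import numpy as np

def sse(m, c, x, y):
    """Sum of squared residuals for the line y = m*x + c."""
    residuals = y - (m * x + c)
    return float(np.sum(residuals ** 2))

# Hypothetical toy data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# np.polyfit with deg=1 returns the least-squares slope and intercept.
m_hat, c_hat = np.polyfit(x, y, deg=1)
best = sse(m_hat, c_hat, x, y)

# Every perturbed line has a larger sum of squared residuals.
perturbed = [
    sse(m_hat + dm, c_hat + dc, x, y)
    for dm, dc in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]
]
print(best, perturbed)
```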


---


# 7. How is R² interpreted in Simple Linear Regression?
R² measures the proportion of variance in Y explained by X.

* R² = 1: perfect fit
* R² = 0: model explains none of the variance
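
R² can be computed directly as 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch, assuming NumPy and made-up data (the function name `r_squared` is hypothetical):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)            # unexplained variation
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

m, c = np.polyfit(x, y, deg=1)
y_pred = m * x + c
r2 = r_squared(y, y_pred)
print(round(r2, 3))  # 0.6
```

Predicting the mean of Y for every point gives R² = 0, and reproducing Y exactly gives R² = 1, which matches the two cases listed above.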



---



# 8. What is Multiple Linear Regression?
It models the relationship between one dependent variable and **two or more independent variables**:
  **Y = b₀ + b₁X₁ + b₂X₂ + ... + bₙXₙ**
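
One way to estimate the coefficients is ordinary least squares on a design matrix with a leading column of ones for b₀. The sketch below assumes NumPy and synthetic data generated from made-up coefficients (`true_b` is hypothetical), so the recovered values can be compared with the known ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two synthetic predictors and made-up true coefficients (b0, b1, b2).
X = rng.normal(size=(n, 2))
true_b = np.array([1.5, 2.0, -3.0])
y = true_b[0] + X @ true_b[1:] + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept, then the predictors.
A = np.column_stack([np.ones(n), X])

# Least-squares solution for [b0, b1, b2].
b_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(b_hat)
```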


---



# 9. What is the main difference between Simple and Multiple Linear Regression?

* **Simple:** One independent variable
* **Multiple:** Two or more independent variables



---




# 10. What are the key assumptions of Multiple Linear Regression?

* Linearity
* Independence of errors
* Homoscedasticity
* Normality of residuals
* No multicollinearity




---




# 11. What is heteroscedasticity?
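Heteroscedasticity is unequal variance of the residuals across the range of an independent variable (or of the fitted values). The coefficient estimates remain unbiased but become inefficient, and the usual standard errors are biased, which makes t-tests and confidence intervals unreliable.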



---



# 12. How can you improve a Multiple Linear Regression model with high multicollinearity?

* Remove/reduce correlated features
* Use **Principal Component Analysis (PCA)**
* Use **regularization** (Ridge/Lasso)
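
To illustrate the regularization option, here is a minimal sketch of closed-form Ridge regression, assuming NumPy, centered data (so no intercept is needed), and two synthetic, nearly identical predictors. The function `ridge_fit` and the penalty value `alpha=1.0` are illustrative choices, not a recommendation.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients for centered X and y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

# Center the data so the intercept drops out.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

ols = ridge_fit(Xc, yc, alpha=0.0)     # alpha = 0 reduces to ordinary least squares
ridge = ridge_fit(Xc, yc, alpha=1.0)   # the penalty shrinks the unstable direction

print("OLS:  ", ols)
print("Ridge:", ridge)
```

With collinear predictors the OLS coefficients tend to be large and unstable, while Ridge shrinks them toward a shared value whose sum still tracks the combined effect.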



---



# 13. What are some common techniques for transforming categorical variables for use in regression models?

* **One-Hot Encoding**
* **Label Encoding** (only when the categories have a natural order, e.g., low < medium < high)
* **Binary Encoding**, **Target Encoding** (advanced)
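
One-hot encoding can be sketched with NumPy alone, as below; the `colors` feature is made up. In practice, library helpers such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` are the usual choice. Dropping one indicator column is a common extra step (an assumption here, not something stated above) that avoids perfect collinearity with the intercept.

```python
import numpy as np

# Hypothetical categorical feature.
colors = np.array(["red", "green", "blue", "green", "red"])

# categories is sorted; codes gives each row's index into categories.
categories, codes = np.unique(colors, return_inverse=True)

# One indicator column per category.
one_hot = np.eye(len(categories), dtype=int)[codes]

# Drop the first column so the remaining dummies are not collinear
# with a model intercept.
dummies = one_hot[:, 1:]

print(categories)  # ['blue' 'green' 'red']
print(one_hot)
```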



---



# 14. What is the role of interaction terms in Multiple Linear Regression?
They model situations where the effect of one variable depends on another (e.g., **X₁ × X₂**).
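
A minimal sketch of fitting an interaction term, assuming NumPy and synthetic data generated with a made-up interaction coefficient of 2.0. The product column `x1 * x2` is simply added to the design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Made-up data-generating process with a true interaction coefficient of 2.0.
y = 1.0 + 0.5 * x1 - 1.0 * x2 + 2.0 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, main effects, and the interaction column.
A = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b12 = np.linalg.lstsq(A, y, rcond=None)[0]

def slope_of_x1(x2_value):
    """Effect of a one-unit increase in x1, which depends on x2."""
    return b1 + b12 * x2_value

print(slope_of_x1(0.0), slope_of_x1(1.0))
```

With an interaction present, the slope of X₁ is no longer a single number: it is b₁ + b₁₂·X₂.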



---




# 15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

* **Simple:** Y when X = 0
* **Multiple:** Y when all Xᵢ = 0 (often less interpretable)




---




# 16. What is the significance of the slope in regression analysis, and how does it affect predictions?
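The slope indicates how much the dependent variable is expected to change when an independent variable increases by one unit, holding the others constant. In prediction, each one-unit change in that variable shifts the predicted value by the amount of the slope.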




---




# 17. How does the intercept in a regression model provide context for the relationship between variables?
It sets a **baseline value** for predictions. Though it may not always have a real-world interpretation, it anchors the regression line.




---




# 18. What are the limitations of using R² as a sole measure of model performance?

* Doesn’t penalize for overfitting
* Can increase with more variables even if they aren’t meaningful
* Doesn’t indicate whether predictors are statistically significant
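
The second point can be demonstrated numerically. The sketch below assumes NumPy and synthetic data, and adds five pure-noise predictors to a one-variable model: R² cannot go down, while adjusted R² applies a penalty for the extra terms. The helper names `fit_r2` and `adjusted_r2` are hypothetical.

```python
import numpy as np

def fit_r2(A, y):
    """R² of an ordinary least-squares fit on design matrix A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2, n, p):
    """Adjusted R²; p is the number of predictors, excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
noise = rng.normal(size=(n, 5))          # predictors unrelated to y

A_small = np.column_stack([np.ones(n), x])
A_big = np.column_stack([A_small, noise])

r2_small, r2_big = fit_r2(A_small, y), fit_r2(A_big, y)
print("R²:         ", r2_small, r2_big)
print("Adjusted R²:", adjusted_r2(r2_small, n, 1), adjusted_r2(r2_big, n, 6))
```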



---



# 19. How would you interpret a large standard error for a regression coefficient?
A large standard error implies **low precision** in the estimate of that coefficient: its confidence interval is wide, and the coefficient may not be statistically significant.
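
For reference, standard errors under the classical OLS assumptions come from the diagonal of σ²(AᵀA)⁻¹, where A is the design matrix and σ² is the residual variance estimate. A minimal sketch, assuming NumPy and noisy synthetic data (the function name `ols_with_se` is hypothetical):

```python
import numpy as np

def ols_with_se(X, y):
    """Return OLS coefficients and their standard errors (intercept first)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = n - A.shape[1]                      # observations minus parameters
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)     # coefficient covariance matrix
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
x = rng.normal(size=(40, 1))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(scale=3.0, size=40)  # deliberately noisy

coef, se = ols_with_se(x, y)
t_stats = coef / se   # small |t| means the estimate is imprecise relative to its size
print(coef, se, t_stats)
```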




---


# 20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?
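In a plot of residuals against fitted values (or against a predictor), look for a **funnel shape**, where the residuals fan out or in. Addressing it matters because heteroscedasticity biases the standard errors, which invalidates hypothesis tests and confidence intervals.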
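
A numeric stand-in for the visual check is to compare the residual variance in the lower and upper halves of X. This is only a rough sketch, assuming NumPy and synthetic data whose noise grows with X; it is not a formal test, and the split point is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = np.linspace(1.0, 10.0, n)

# Synthetic data whose noise standard deviation grows with x.
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)

m, c = np.polyfit(x, y, deg=1)
residuals = y - (m * x + c)

# x is sorted, so the first and second halves are the low and high ranges.
low = residuals[: n // 2]
high = residuals[n // 2 :]
variance_ratio = high.var(ddof=1) / low.var(ddof=1)

# A ratio far from 1 suggests the residual spread changes with x.
print(variance_ratio)
```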



---




# 21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?
It suggests **overfitting**: the model likely includes irrelevant variables that raise R² without adding real explanatory power.



---





# 22. Why is it important to scale variables in Multiple Linear Regression?

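* Needed when using regularization (Ridge/Lasso), because the penalty depends on the scale of each coefficient
* Makes coefficients comparable across features measured in different units
* Improves numerical stability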
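
Standardization (z-scoring) is one common way to scale. A minimal sketch, assuming NumPy and a made-up feature matrix with columns on very different scales; the function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(X):
    """Return z-scored columns plus the means and standard deviations used."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

# Hypothetical features on very different scales.
X = np.array([
    [1200.0, 2.0],
    [1500.0, 3.0],
    [900.0, 1.0],
    [2000.0, 4.0],
])

X_scaled, mean, std = standardize(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # approximately 0 and 1 per column
```

The returned `mean` and `std` should come from the training data and be reused to transform any test data, so that no information leaks from the test set.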




---





# 23. What is polynomial regression?
It models a non-linear relationship by adding powers of the independent variable(s), e.g.:
  **Y = b₀ + b₁X + b₂X² + ... + bₙXⁿ**



---




# 24. How does polynomial regression differ from linear regression?
Polynomial regression is still **linear in the parameters**, but its features include **non-linear terms** of X (X², X³, etc.), so the fitted curve is not a straight line.




---




# 25. When is polynomial regression used?
When a **non-linear** relationship exists between X and Y but you still want to use linear models.



---



# 26. What is the general equation for polynomial regression?
  **Y = b₀ + b₁X + b₂X² + ... + bₙXⁿ**



---




# 27. Can polynomial regression be applied to multiple variables?
Yes. For example, include squared terms such as **X₁²** and cross terms such as **X₁×X₂**. The number of terms grows quickly with the degree, so the risk of **overfitting** increases.



---




# 28. What are the limitations of polynomial regression?

* Sensitive to outliers
* Overfitting with high-degree polynomials
* Difficult to interpret



---



# 29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?

* **Cross-validation (CV)**
* **Adjusted R²**
* **AIC/BIC**
* **Residual plots**
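
As an illustration of the cross-validation option, the sketch below assumes NumPy and synthetic data generated from a quadratic. It scores each candidate degree by k-fold mean squared error; the function name `cv_mse`, the fold count, and the degree range are arbitrary choices.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean squared error of a polynomial of the given degree under k-fold CV."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], deg=degree)   # fit on training folds
        pred = np.polyval(coefs, x[test])                    # predict held-out fold
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 120)
# Synthetic data that is truly quadratic, plus noise.
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.5, size=x.size)

scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

A degree that is too low shows a clearly higher CV error (underfitting), while higher degrees typically stop improving or get slightly worse.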



---



# 30. Why is visualization important in polynomial regression?
Visualization helps you:

* Understand the **shape** of the model
* Identify **overfitting/underfitting**
* Interpret predictions more intuitively


---


# 31. How to implement polynomial regression in Python?

```python
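import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic cubic data, assumed here only so the example runs end to end.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Example: polynomial of degree 3 (expand the features, then fit a linear model)
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Test R^2:", model.score(X_test, y_test))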
```

---

