# Assignment Theory Questions

### 1. What is Simple Linear Regression?

->  Simple Linear Regression is a statistical method used to model the relationship between two variables: one independent variable (X) and one dependent variable (Y). It fits a straight line (Y = mX + c) through the data points to predict the value of Y based on X.


### 2. What are the key assumptions of Simple Linear Regression?

-> Linearity: Relationship between X and Y is linear.
Independence: Observations are independent of each other.
Homoscedasticity: Constant variance of errors.
Normality: Residuals (errors) are normally distributed.


### 3. What does the coefficient m represent in the equation Y = mX + c?

-> m is the slope of the regression line. It represents the change in Y for a one-unit change in X.


### 4. What does the intercept c represent in the equation Y = mX + c?

-> c is the intercept of the line. It indicates the value of Y when X = 0.


### 5. How do we calculate the slope m in Simple Linear Regression?

-> Using the least squares method:

m = (n * Σ(xy) - Σx * Σy) / (n * Σ(x²) - (Σx)²)

Here n is the number of observations. The intercept then follows as c = mean(Y) - m * mean(X).
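The slope formula can be checked numerically. The following is a minimal sketch using NumPy; the data values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical sample data, used only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])

n = len(x)
# Slope from the closed-form least squares formula.
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
# Intercept: the fitted line passes through (mean(x), mean(y)).
c = np.mean(y) - m * np.mean(x)

# Cross-check against NumPy's own degree-1 least squares fit.
m_ref, c_ref = np.polyfit(x, y, 1)
print(m, c)
```

For this data the formula gives a slope of about 2.0 and an intercept of about 0.06, matching `np.polyfit`.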


### 6. What is the purpose of the least squares method in Simple Linear Regression?

-> To find the best-fitting line by minimizing the sum of squared differences (errors) between the actual Y values and predicted Y values.
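In the notation Y = mX + c, the quantity being minimized can be written as follows (the symbol S and the index i are introduced here only for illustration):

\[
S(m, c) = \sum_{i=1}^{n} \bigl(y_i - (m x_i + c)\bigr)^{2}
\]

Setting the partial derivatives of S with respect to m and c to zero gives the closed-form slope and intercept.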


### 7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?

-> R² measures the proportion of the variance in the dependent variable that is predictable from the independent variable.
R² = 1: perfect fit
R² = 0: the model explains none of the variance in Y (no linear relationship)
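As a rough illustration, R² can be computed as 1 - SS_res / SS_tot. The helper name `r_squared` and the data below are assumptions made for this sketch.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # unexplained variation
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r_squared(y, y)                            # predictions equal the data
baseline = r_squared(y, np.full_like(y, y.mean()))   # always predict the mean
```

Here `perfect` is 1.0 and `baseline` is 0.0, which matches the two boundary cases.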


### 8. What is Multiple Linear Regression?

-> Multiple Linear Regression involves more than one independent variable to predict the dependent variable. The equation is: \(Y=b_{0}+b_{1}X_{1}+b_{2}X_{2}+\dots +b_{n}X_{n}\)
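A minimal sketch of fitting this equation with two predictors, assuming NumPy's `np.linalg.lstsq` as the solver. The data is synthetic and noise-free, so the known coefficients should be recovered.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # two synthetic predictors X1, X2
y = 1.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1]    # assumed true b0, b1, b2

# Prepend a column of ones so the first coefficient acts as the intercept b0.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef
```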


### 9. What is the main difference between Simple and Multiple Linear Regression?

-> Simple Linear Regression uses one independent variable.
Multiple Linear Regression uses two or more independent variables.


### 10. What are the key assumptions of Multiple Linear Regression?

-> Same as Simple Linear Regression, plus:
No multicollinearity between predictors.
No autocorrelation in the errors.
The model is correctly specified (no relevant variables omitted).


### 11. What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?

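-> Heteroscedasticity means the variance of the errors is not constant across observations. The coefficient estimates remain unbiased but become inefficient, and the usual standard errors are biased, which can lead to incorrect conclusions about predictor significance.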


### 12. How can you improve a Multiple Linear Regression model with high multicollinearity?

-> Remove highly correlated variables.
Use Principal Component Analysis (PCA).
Apply Ridge or Lasso regression.
Combine correlated predictors.
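Before removing or combining predictors, it helps to see which ones are highly correlated. The sketch below is one simple way to do that with a correlation matrix. The data is synthetic and the 0.9 threshold is an assumption, not a fixed rule.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.05 * rng.normal(size=500)   # nearly a copy of x1
x3 = rng.normal(size=500)               # unrelated predictor
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)     # pairwise correlations between columns
threshold = 0.9                         # assumed cut-off for "highly correlated"
pairs = [
    (i, j)
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[1])
    if abs(corr[i, j]) > threshold
]
```

Only the pair `(0, 1)` is flagged, so one of those two columns is a candidate for removal or combination.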


### 13. What are some common techniques for transforming categorical variables for use in regression models?

-> One-hot encoding
Label encoding
Binary encoding
Ordinal encoding
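As a small illustration of the first two techniques, the sketch below builds label codes and a one-hot matrix with NumPy alone. The `colors` array is hypothetical. Note that `np.unique` sorts the categories, so the integer codes carry no real ordering.

```python
import numpy as np

colors = np.array(["red", "green", "blue", "green"])

# Label encoding: each category is mapped to an integer code.
categories, codes = np.unique(colors, return_inverse=True)

# One-hot encoding: one column per category, with a single 1 per row.
one_hot = np.eye(len(categories), dtype=int)[codes]
```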


### 14. What is the role of interaction terms in Multiple Linear Regression?

-> Interaction terms let the effect of one variable on Y depend on the value of another. They capture combined effects (e.g., X1 * X2) that the individual variables alone cannot explain.
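A minimal sketch of adding an interaction term: the product X1 * X2 is treated as one extra column in the design matrix. The data is synthetic and the coefficient values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
# Assumed true model with an interaction effect of 3.0 (noise-free).
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 3.0 * x1 * x2

# Columns: intercept, x1, x2, and the interaction x1 * x2.
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The last entry of `coef` is the interaction coefficient. Without that column, the model could not represent the combined effect.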


### 15. How can the interpretation of intercept differ between Simple and Multiple Linear Regression?

-> In Simple Linear Regression, the intercept is the expected value of Y when X = 0.
In Multiple Linear Regression, it's the expected value of Y when all Xs = 0 (which may not be meaningful if 0 is outside the data range).


### 16. What is the significance of the slope in regression analysis, and how does it affect predictions?

-> The slope shows how much the predicted Y changes per one-unit change in X, so it directly scales every prediction. A low p-value for the slope means the estimated effect of X on Y is statistically distinguishable from zero.


### 17. How does the intercept in a regression model provide context for the relationship between variables?

-> The intercept represents the predicted value of Y when all X variables are 0. It gives a baseline and helps interpret the overall model, especially when zero is meaningful for predictors.


### 18. What are the limitations of using R² as a sole measure of model performance?

-> R² doesn't indicate causation.
It never decreases when variables are added, even if they're irrelevant.
It doesn't detect overfitting or model bias.
It doesn't measure prediction accuracy on new data.


### 19. How would you interpret a large standard error for a regression coefficient?

-> A large standard error indicates high uncertainty in the estimate of the coefficient. If it is large relative to the coefficient itself, the coefficient might not be significantly different from zero.
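To make this concrete, the sketch below computes OLS standard errors from the usual formula, the square root of the diagonal of σ² (AᵀA)⁻¹, where A is the design matrix with an intercept column. The helper name `ols_standard_errors` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def ols_standard_errors(X, y):
    """Return OLS coefficients (intercept first) and their standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = A.shape[0] - A.shape[1]                  # n minus number of coefficients
    sigma2 = resid @ resid / dof                   # estimated error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)          # coefficient covariance matrix
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
x = rng.normal(size=100)
noise = rng.normal(size=100)
coef_low, se_low = ols_standard_errors(x, 2.0 * x + 0.1 * noise)    # little noise
coef_high, se_high = ols_standard_errors(x, 2.0 * x + 5.0 * noise)  # much more noise
```

With the same predictor but noisier Y, the standard errors grow in proportion to the noise, so the coefficient is estimated less precisely.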


### 20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?

-> Identification: Residuals fan out or form a pattern (non-random scatter).
Importance: It violates regression assumptions, affecting standard errors, leading to unreliable hypothesis tests and confidence intervals.
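The fan shape can also be checked numerically. The following sketch, under assumed synthetic data where the noise grows with X, compares the residual spread for low and high fitted values. This is an informal check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 400)
# Synthetic data whose error standard deviation grows with x (heteroscedastic).
y = 3.0 + 2.0 * x + rng.normal(scale=0.5 * x)

m, c = np.polyfit(x, y, 1)
fitted = m * x + c
resid = y - fitted

# Split the residuals by fitted value and compare their spread.
order = np.argsort(fitted)
half = len(x) // 2
spread_low = resid[order[:half]].std()
spread_high = resid[order[half:]].std()
ratio = spread_high / spread_low
```

A ratio well above 1 corresponds to residuals fanning out in a plot of residuals against fitted values.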


### 21. What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?

-> It means the model has many variables, but some may be irrelevant. Adjusted R² penalizes for adding unnecessary variables, so a low adjusted R² suggests overfitting.
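The penalty can be seen from the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The numbers below are hypothetical.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R² for a model with an intercept and n_predictors predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same R² of 0.90 on 20 observations, with few versus many predictors.
few = adjusted_r_squared(0.90, n_samples=20, n_predictors=2)
many = adjusted_r_squared(0.90, n_samples=20, n_predictors=15)
```

With 2 predictors the adjusted value stays near 0.89, while with 15 predictors it drops to about 0.525.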


### 22. Why is it important to scale variables in Multiple Linear Regression?

-> Puts variables on a comparable scale, so coefficient magnitudes can be compared.
Helps gradient-based optimization algorithms converge.
Necessary for regularization methods like Ridge/Lasso, whose penalties depend on coefficient size.
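One common way to scale is standardization (z-scores), sketched below with NumPy. The feature values are hypothetical, and other scalers such as min-max would also be reasonable.

```python
import numpy as np

# Two hypothetical features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std   # each column now has mean 0 and standard deviation 1
```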


### 23. What is polynomial regression?

-> Polynomial regression models the relationship between X and Y as an nth-degree polynomial:
\(Y=\beta _{0}+\beta _{1}X+\beta _{2}X^{2}+\dots +\beta _{n}X^{n}+\epsilon \)


### 24. How does polynomial regression differ from linear regression?

-> Linear regression models a straight-line relationship. Polynomial regression models a curved relationship using higher-degree terms of X. It is still linear in the coefficients, so it can be fit with ordinary least squares.


### 25. When is polynomial regression used?

-> When data shows a non-linear pattern.
When the relationship cannot be well-fit by a straight line.


### 26. What is the general equation for polynomial regression?

-> y = b₀ + b₁x + b₂x² + ... + bₙxⁿ + ε
n is the degree of the polynomial.


### 27. Can polynomial regression be applied to multiple variables?

-> Yes. It becomes multivariate polynomial regression, involving interaction and polynomial terms of multiple independent variables.
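As a small sketch, scikit-learn's `PolynomialFeatures` generates both the polynomial and the interaction terms. The single input row below is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One observation with two variables: x1 = 2, x2 = 3.
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2, include_bias=True)
X_poly = poly.fit_transform(X)
# Columns are: 1, x1, x2, x1^2, x1*x2, x2^2
```

For this row the expanded features are 1, 2, 3, 4, 6 and 9, where the 6 is the interaction term x1 * x2.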


### 28. What are the limitations of polynomial regression?

-> High degree polynomials can overfit.
Poor extrapolation beyond the training data.
Computationally expensive for high degrees.
Interpretability decreases as degree increases.


### 29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?

-> Cross-validation
Adjusted R²
AIC/BIC
Residual analysis
Validation set performance (RMSE, MAE)
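The validation-set approach can be sketched as follows: fit several degrees on a training split and compare RMSE on held-out data. The data is synthetic (a quadratic plus noise), and the split size and degree range are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, size=200)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(scale=0.3, size=200)

# Simple hold-out split: first 150 points for training, the rest for validation.
x_train, x_val = x[:150], x[150:]
y_train, y_val = y[:150], y[150:]

rmse = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x_train, y_train, degree)   # least squares polynomial fit
    pred = np.polyval(coeffs, x_val)
    rmse[degree] = float(np.sqrt(np.mean((y_val - pred) ** 2)))

best_degree = min(rmse, key=rmse.get)   # degree with the lowest validation RMSE
```

The validation RMSE should drop sharply from degree 1 to degree 2 and then flatten. Higher degrees bring little or no gain on held-out data.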


### 30. Why is visualization important in polynomial regression?

-> It helps:
Understand fit quality.
Detect overfitting/underfitting.
Communicate model behavior clearly.



### 31. How is polynomial regression implemented in Python?

```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
```
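A possible usage sketch for such a pipeline, assuming synthetic, noise-free quadratic data so the prediction can be checked by hand:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic data following y = 1 + 2x + 0.5x^2 exactly (an assumption for the demo).
X = np.linspace(-3, 3, 50).reshape(-1, 1)   # scikit-learn expects a 2-D feature array
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 0] ** 2

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

pred = model.predict(np.array([[4.0]]))     # 1 + 2*4 + 0.5*16 = 17
```

Because the data is an exact quadratic, the degree-2 pipeline reproduces it, and the prediction at x = 4 is about 17.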