**Easy Level:**

1. What is linear regression?
2. Explain the difference between simple linear regression and multiple linear regression.
3. What is the goal of linear regression?
4. How do you represent a linear regression equation mathematically?
5. What is the role of the dependent variable in linear regression?
6. Define the terms "independent variable" and "coefficient" in the context of linear regression.
7. What is the least squares method, and how is it used in linear regression?
8. What is the difference between the intercept and slope in a linear regression equation?
9. Explain the concept of residuals in linear regression.
10. How do you assess the goodness of fit in linear regression?


**1. What is linear regression?**


**Linear regression** is a statistical method used to model the relationship between one dependent variable and one or more independent variables by fitting a linear equation to observed data.

---

**2. Explain the difference between simple linear regression and multiple linear regression.**

* **Simple Linear Regression**: Involves **one independent variable** and one dependent variable.

  $$
  y = \beta_0 + \beta_1 x + \epsilon
  $$
* **Multiple Linear Regression**: Involves **two or more independent variables** predicting a single dependent variable.

  $$
  y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon
  $$

---

**3. What is the goal of linear regression?**
The goal is to **find the best-fitting linear relationship** between the input features and the output variable, so that we can:

* Understand the strength and direction of relationships.
* Predict outcomes for new data.

---

**4. How do you represent a linear regression equation mathematically?**

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon
$$

Where:

* $y$: Dependent variable
* $\beta_0$: Intercept
* $\beta_1, \dots, \beta_n$: Coefficients
* $x_1, \dots, x_n$: Independent variables
* $\epsilon$: Error term

---

**5. What is the role of the dependent variable in linear regression?**
The **dependent variable** (also called target or output) is the variable the model is trying to predict or explain based on the independent variables.

---

**6. Define the terms "independent variable" and "coefficient" in the context of linear regression.**

* **Independent variable**: Input variable(s) that influence the dependent variable.
* **Coefficient ($\beta_i$)**: The weight attached to an independent variable. A one-unit increase in $x_i$ is associated with a change of $\beta_i$ in the expected value of $y$, holding all other variables constant.

---

**7. What is the least squares method, and how is it used in linear regression?**
The **least squares method** minimizes the **sum of squared residuals** (differences between actual and predicted values):

$$
\min \sum (y_i - \hat{y}_i)^2
$$

This method helps determine the optimal coefficients for the linear regression model.
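
As a minimal sketch of the least squares fit described above, the snippet below uses NumPy's `np.linalg.lstsq` on a small made-up dataset. The values of `x` and `y` are illustrative assumptions, chosen to lie exactly on $y = 1 + 2x$ so the result is easy to check by eye.

```python
import numpy as np

# Toy data lying exactly on y = 1 + 2x (assumed values, not from a real dataset)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# lstsq returns the coefficients that minimize sum((y - X @ beta) ** 2)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta
sse = np.sum((y - y_hat) ** 2)  # sum of squared residuals at the optimum
print("beta:", beta, "SSE:", sse)
```

Because the data sit exactly on a line, the recovered coefficients match the intercept and slope used to build them and the sum of squared residuals is zero up to floating-point error.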

---

**8. What is the difference between the intercept and slope in a linear regression equation?**

* **Intercept ($\beta_0$)**: Value of $y$ when all independent variables are zero. It shifts the regression line up or down.
* **Slope ($\beta_i$)**: Represents the change in $y$ for a one-unit change in $x_i$. It determines the direction and steepness of the line.

---

**9. Explain the concept of residuals in linear regression.**
**Residuals** are the differences between the actual and predicted values:

$$
\text{Residual} = y_i - \hat{y}_i
$$

They represent the error or noise the model couldn’t explain.
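
To make the definition concrete, here is a small sketch that fits a line with `np.polyfit` and computes the residuals. The four data points are assumed values picked for illustration.

```python
import numpy as np

# Illustrative data (assumed values, not taken from a real dataset)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

# Fit a straight line; for degree 1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

# Residual for each observation: actual minus predicted
residuals = y - y_hat
print("Residuals:", residuals)
print("Sum of residuals:", residuals.sum())
```

When the model includes an intercept, OLS residuals sum to zero (up to floating-point error), which is a quick sanity check on a fit.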

---

**10. How do you assess the goodness of fit in linear regression?**
Common metrics:

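* **R-squared ($R^2$)**: Proportion of variance in $y$ explained by the model. It lies between 0 and 1 for an OLS fit with an intercept evaluated on its training data, but can be negative on held-out data.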
* **Adjusted R-squared**: Adjusts $R^2$ for number of predictors.
* **Mean Squared Error (MSE)** or **Root Mean Squared Error (RMSE)**: Measure average prediction error.
* **Residual plots**: Visual inspection of model performance and assumption validity.
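
The numeric metrics above can be computed directly from the residuals. The sketch below does so for a one-predictor fit on assumed toy data; `n` and `p` follow the usual convention of counting observations and predictors (excluding the intercept).

```python
import numpy as np

# Illustrative data (assumed values)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

n = len(y)  # number of observations
p = 1       # number of predictors, excluding the intercept

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares

r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
mse = ss_res / n
rmse = np.sqrt(mse)

print(f"R2={r2:.4f}  adjusted R2={adj_r2:.4f}  MSE={mse:.4f}  RMSE={rmse:.4f}")
```

Adjusted $R^2$ is always at most $R^2$; the gap widens as predictors are added without a matching gain in explained variance.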


**Medium Level:**

11. What are the assumptions of linear regression?
12. How do you handle multicollinearity in multiple linear regression?
13. Describe the process of model selection in linear regression.
14. What is the purpose of the coefficient of determination (R-squared) in linear regression?
15. How do you interpret the p-value in linear regression?
16. Explain the concept of homoscedasticity and its implications in linear regression.
17. What is regularization, and why might it be applied to linear regression?
18. Discuss the difference between L1 and L2 regularization techniques in linear regression.
19. What is the normality assumption in linear regression, and how is it tested?
20. How can you deal with outliers in linear regression?



### **11. What are the assumptions of linear regression?**

1. **Linearity** – The relationship between independent and dependent variables is linear.
2. **Independence** – Observations are independent of each other.
3. **Homoscedasticity** – Constant variance of errors across all levels of independent variables.
4. **Normality of errors** – The residuals (errors) should be normally distributed.
5. **No multicollinearity** – Independent variables should not be highly correlated with each other.

---

### **12. How do you handle multicollinearity in multiple linear regression?**

* **Remove correlated predictors**.
* **Combine variables** using techniques like PCA.
* **Use regularization** (e.g., Ridge regression).
* **Check Variance Inflation Factor (VIF)** to detect multicollinearity (a VIF above 5 or 10 is a common rule-of-thumb warning sign).
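
A hedged sketch of how VIF can be computed by hand: each predictor is regressed on the others, and $\text{VIF}_j = 1 / (1 - R_j^2)$. The helper `vif` and the synthetic data are assumptions made for illustration; in practice a statistics library would normally be used.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic data (an assumption for illustration): x3 is nearly a copy of x1
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print("VIF per column:", vifs)
```

Here `x1` and `x3` should show large VIFs because each is almost a linear function of the other, while the independent `x2` stays close to 1.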

---

### **13. Describe the process of model selection in linear regression.**

1. **Feature selection** – Choose relevant features (manual or automated).
2. **Fit multiple models** – Try different combinations of predictors.
3. **Evaluate metrics** – Use R², Adjusted R², AIC, BIC, RMSE.
4. **Cross-validation** – Check how the model generalizes to unseen data.
5. **Choose best tradeoff** between simplicity and performance.

---

### **14. What is the purpose of the coefficient of determination (R-squared) in linear regression?**

* Measures the **proportion of variance** in the dependent variable explained by the model.
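* Ranges from **0 to 1** for an OLS fit with an intercept on its training data (it can be negative on held-out data).
* Higher R² means a better fit to the data at hand, but it **does not imply causation or rule out overfitting**.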

---

### **15. How do you interpret the p-value in linear regression?**

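* The p-value tests the null hypothesis that a coefficient equals zero: it is the probability of seeing an estimate at least this extreme if the true coefficient were zero.
* **Low p-value (conventionally < 0.05)** → The predictor is statistically significant.
* **High p-value** → Not enough evidence that the predictor is associated with the dependent variable.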

---

### **16. Explain the concept of homoscedasticity and its implications in linear regression.**

* Homoscedasticity = **Equal variance of residuals** across all values of predictors.
* If violated (i.e., heteroscedasticity), it can:

  * Distort standard errors
  * Lead to **invalid confidence intervals** and **p-values**
* Detected using **residual plots** or **Breusch-Pagan test**.

---

### **17. What is regularization, and why might it be applied to linear regression?**

* Regularization adds a **penalty** to the loss function to discourage large coefficients.
* Helps prevent **overfitting**, especially when:

  * There are many features
  * Features are correlated
* Common types: **Ridge (L2)** and **Lasso (L1)** regression.

---

### **18. Discuss the difference between L1 and L2 regularization techniques in linear regression.**

| Feature           | L1 (Lasso)             | L2 (Ridge)        |
| ----------------- | ---------------------- | ----------------- |
| Penalty           | Sum of absolute values | Sum of squares    |
| Feature Selection | Yes (can shrink to 0)  | No                |
| Sparse model?     | Yes                    | No                |
| Use case          | Feature reduction      | Multicollinearity |

---

### **19. What is the normality assumption in linear regression, and how is it tested?**

* Assumes that the **residuals (errors)** follow a **normal distribution**.
* Important for valid **confidence intervals** and **hypothesis tests**.
* Tested using:

  * **Histogram** or **Q-Q plot** of residuals
  * **Shapiro-Wilk** or **Kolmogorov–Smirnov test**

---

### **20. How can you deal with outliers in linear regression?**

* **Identify** using boxplots, residual plots, or leverage and influence measures (e.g., hat values, Cook's distance).
* **Solutions**:

  * Remove them (if truly anomalous)
  * Use **robust regression** (e.g., Huber regression)
  * **Transform variables** (e.g., log transformation)
  * Cap or winsorize the values
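
As one way to carry out the identification step, the sketch below computes leverage and Cook's distance with NumPy. The data are an assumption: ten points on a line, with the last one shifted upward to act as an outlier.

```python
import numpy as np

# Illustrative data (assumed): points on y = 1 + 2x, with the last one shifted upward
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x
y[-1] += 20.0  # inject a single outlier

X = np.column_stack([np.ones_like(x), x])
n, p = X.shape  # p counts the intercept as a parameter

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Leverage: diagonal of the hat matrix H = X (X^T X)^-1 X^T
H = X @ np.linalg.solve(X.T @ X, X.T)
leverage = np.diag(H)

# Cook's distance combines residual size and leverage
s2 = resid @ resid / (n - p)  # residual variance estimate
cooks_d = (resid ** 2 / (p * s2)) * leverage / (1 - leverage) ** 2

print("Most influential point:", int(np.argmax(cooks_d)))
```

The shifted point sits at the edge of the `x` range, so it has both a large residual and high leverage, and its Cook's distance dominates the rest.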

---


**Hard Level:**

21. Describe the gradient descent algorithm and its use in linear regression.
22. What are the potential issues with using linear regression for a dataset with a nonlinear relationship?
23. Explain the bias-variance trade-off in the context of linear regression.
24. Discuss the concept of ridge regression and its advantages over ordinary least squares (OLS) regression.
25. What is the purpose of feature scaling or normalization in linear regression?
26. How does cross-validation help in evaluating the performance of a linear regression model?
27. Describe the differences between linear regression and logistic regression.
28. What is the impact of outliers on the coefficients and predictions of a linear regression model?
29. Explain the concept of heteroscedasticity and how to address it in linear regression.
30. What is the difference between ridge regression and LASSO regression?


### **21. Describe the gradient descent algorithm and its use in linear regression.**

* **Gradient Descent** is an optimization algorithm that minimizes the cost function (e.g., Mean Squared Error).
* It updates parameters (weights) iteratively, where $\alpha$ is the learning rate:

$$
\text{new } w = w - \alpha \cdot \frac{\partial \text{Loss}}{\partial w}
$$

* In **linear regression**, it finds the best-fitting line by reducing the prediction error.

> Used when analytical solutions (like the Normal Equation) are too expensive or infeasible for large datasets.
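
A minimal sketch of batch gradient descent on the MSE, applying the update rule above. The toy data, the learning rate `alpha`, and the iteration count `n_iters` are all assumed values chosen so the loop converges.

```python
import numpy as np

# Toy data on the line y = 1 + 2x (assumed values for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
X = np.column_stack([np.ones_like(x), x])  # intercept column + feature

w = np.zeros(X.shape[1])  # start from all-zero weights
alpha = 0.1               # learning rate (hypothetical choice)
n_iters = 2000            # number of update steps (hypothetical choice)

for _ in range(n_iters):
    error = X @ w - y
    grad = (2.0 / len(y)) * (X.T @ error)  # gradient of the MSE with respect to w
    w = w - alpha * grad                   # w <- w - alpha * dLoss/dw

print("Weights [intercept, slope]:", w)
```

If `alpha` is set too large the iterates diverge, and if it is too small convergence is slow, which is why the learning rate usually needs tuning.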

---

### **22. What are the potential issues with using linear regression for a dataset with a nonlinear relationship?**

* Linear regression assumes a straight-line relationship.
* On nonlinear data:

  * **Poor fit** and high residuals
  * **Low R² score**
  * **Biased predictions**
* Solution: Use **polynomial regression** or **nonlinear models** (like decision trees, SVR, etc.).
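
The sketch below illustrates the problem and the polynomial fix on assumed toy data that follow $y = x^2$: a straight line explains none of the variance, while a degree-2 fit (still linear in its coefficients) explains all of it. The helper `r_squared` is introduced here for illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Assumed toy data following y = x^2
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Straight-line fit
lin = np.polyfit(x, y, 1)
r2_linear = r_squared(y, np.polyval(lin, x))

# Degree-2 polynomial fit (still linear in the coefficients)
quad = np.polyfit(x, y, 2)
r2_quad = r_squared(y, np.polyval(quad, x))

print("R2 linear:", round(r2_linear, 6), " R2 quadratic:", round(r2_quad, 6))
```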

---

### **23. Explain the bias-variance trade-off in the context of linear regression.**

* **Bias**: Error from incorrect assumptions (e.g., assuming linearity).
* **Variance**: Error from sensitivity to small fluctuations in training data.
* **Trade-off**:

  * **High bias** → underfitting
  * **High variance** → overfitting

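Compared with more flexible models, linear regression typically has **low variance and high bias**: it is stable across training samples but can underfit when the true relationship is nonlinear.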

---

### **24. Discuss the concept of ridge regression and its advantages over ordinary least squares (OLS) regression.**

* **Ridge Regression** adds a penalty to large coefficients:

$$
\text{Loss} = \text{MSE} + \lambda \sum w^2
$$

* **Advantages**:

  * Reduces **overfitting**
  * Handles **multicollinearity**
  * Keeps all features (shrinks but doesn’t eliminate)
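
A hedged sketch of the closed-form ridge solution for the loss written above. Setting the gradient of $\text{MSE} + \lambda \sum w^2$ to zero gives $(X^T X + n\lambda I)\,w = X^T y$. The helper `ridge_fit`, the toy data, and the value of `lam` are assumptions; the sketch also assumes centered data so that no intercept is needed.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize MSE + lam * sum(w^2); assumes X and y are centered (no intercept)."""
    n, k = X.shape
    # Setting the gradient to zero gives (X^T X + n * lam * I) w = X^T y
    return np.linalg.solve(X.T @ X + n * lam * np.eye(k), X.T @ y)

# Assumed toy data: two nearly collinear, centered predictors; y depends only on the first
X = np.array([[-2.0, -1.9], [-1.0, -1.1], [0.0, 0.0], [1.0, 1.1], [2.0, 1.9]])
y = 2.0 * X[:, 0]

w_ols = ridge_fit(X, y, lam=0.0)    # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=0.1)  # lam = 0.1 is a hypothetical penalty strength

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```

With the penalty switched on, the weight vector shrinks and is shared between the two correlated predictors, and neither coefficient is driven exactly to zero.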

---

### **25. What is the purpose of feature scaling or normalization in linear regression?**

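* Puts features on a **comparable scale**, so that coefficient magnitudes and penalty terms are comparable across features. (Plain OLS predictions do not change when a feature is rescaled; only its coefficient does.)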
* Especially important for:

  * **Gradient descent** (converges faster)
  * **Regularized models** (like Ridge/Lasso) so no feature dominates the penalty term.
* Methods: **Standardization** (z-score), **Min-Max scaling**
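
Both methods are a line of NumPy each. The feature matrix below is an assumed example with two columns on very different scales.

```python
import numpy as np

# Assumed toy features on very different scales (e.g., years vs. dollars)
X = np.array([[1.0, 20000.0],
              [2.0, 35000.0],
              [3.0, 50000.0],
              [4.0, 80000.0]])

# Standardization (z-score): zero mean and unit variance per column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max scaling: map each column to the [0, 1] range
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_std)
print(X_minmax)
```

In a real workflow the means, standard deviations, minima, and maxima would be computed on the training split only and then reused to transform validation and test data.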

---

### **26. How does cross-validation help in evaluating the performance of a linear regression model?**

* Cross-validation (e.g., **k-fold CV**) splits data into **training + validation sets** multiple times.
* Provides a more **robust estimate** of model performance.
* Helps detect:

  * **Overfitting** (validation error ≫ training error)
  * **Underfitting** (both errors high)
* Supports choosing models that **generalize** better to unseen data.
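
A minimal k-fold sketch written with NumPy only. The helper `kfold_mse`, the seeds, and the synthetic data are assumptions for illustration; a library routine would usually be preferred in practice.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Return the validation MSE of an OLS fit for each of k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))      # shuffle before splitting
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        pred = X[val_idx] @ beta
        scores.append(np.mean((y[val_idx] - pred) ** 2))
    return np.array(scores)

# Synthetic data (assumed): y = 3 + 2x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
X = np.column_stack([np.ones_like(x), x])

scores = kfold_mse(X, y, k=5)
print("Per-fold MSE:", scores)
print("Mean CV MSE:", scores.mean())
```

Each observation is used for validation exactly once, so the mean of the per-fold scores is a less optimistic estimate than the training error.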

---

### **27. Describe the differences between linear regression and logistic regression.**

| Aspect        | Linear Regression             | Logistic Regression                |
| ------------- | ----------------------------- | ---------------------------------- |
| Output        | Continuous                    | Binary (0/1)                       |
| Function      | Linear                        | Sigmoid (S-shaped)                 |
| Loss Function | Mean Squared Error            | Log Loss (Cross-Entropy)           |
| Use Case      | Predict values (e.g., prices) | Classification (e.g., spam vs not) |

---

### **28. What is the impact of outliers on the coefficients and predictions of a linear regression model?**

* **Outliers** have a **large influence** on the fitted line.
* Can **distort coefficients**, reducing model accuracy.
* Predictions become unreliable.
* Solutions:

  * **Remove or cap outliers**
  * Use **robust regression**
  * **Log-transform** or use **median-based techniques**

---

### **29. Explain the concept of heteroscedasticity and how to address it in linear regression.**

* **Heteroscedasticity**: Non-constant variance of residuals.

  * Violates regression assumptions
  * Leads to **inefficient estimates** and **biased standard errors**
* Detection:

  * Residual plots
  * Breusch–Pagan test
* Solutions:

  * **Log-transform** the dependent variable
  * Use **Weighted Least Squares**
  * Use **heteroscedasticity-robust standard errors** (e.g., White/HC estimators) or switch to **robust regression**
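
As a sketch of the Weighted Least Squares remedy, the helper below rescales each row by the square root of its weight and then solves an ordinary least squares problem. The function name `wls_fit`, the toy data, and the assumption that the error variance grows like $x^2$ (hence weights $1/x^2$) are all illustrative.

```python
import numpy as np

def wls_fit(X, y, weights):
    """Weighted least squares: minimize sum(weights * (y - X @ beta) ** 2)."""
    sw = np.sqrt(weights)
    # Scaling each row by sqrt(weight) turns WLS into an ordinary least squares problem
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Assumed toy data where the scatter grows with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.3, 7.6, 10.8])
X = np.column_stack([np.ones_like(x), x])

# Assumption for illustration: error variance is proportional to x^2,
# so each observation is weighted by 1 / x^2
weights = 1.0 / x ** 2

beta_wls = wls_fit(X, y, weights)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("WLS:", beta_wls)
print("OLS:", beta_ols)
```

Observations believed to be noisier receive smaller weights, so they pull less on the fitted line than they would under OLS.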

---

### **30. What is the difference between ridge regression and LASSO regression?**

| Feature          | Ridge (L2)                   | Lasso (L1)                         |
| ---------------- | ---------------------------- | ---------------------------------- |
| Penalty Term     | $\lambda \sum w^2$           | $\lambda \sum \lvert w \rvert$     |
| Shrinks to Zero? | No                           | Yes (feature selection)            |
| Use Case         | Multicollinearity            | Sparse models, feature elimination |
| Solution Type    | Always has a unique solution | May zero out some coefficients     |



**Advanced Level:**

31. How can you handle missing data in linear regression?
32. Discuss the concept of endogeneity in linear regression and how to address it.
33. What is the Durbin-Watson statistic, and what does it measure in linear regression?
34. Explain the concept of autocorrelation in the residuals of a time series linear regression model.
35. What are generalized linear models (GLMs), and how do they extend linear regression?
36. Describe the assumptions and applications of logistic regression in contrast to linear regression.
37. How can you perform feature selection in linear regression effectively?
38. Discuss the differences between forward, backward, and stepwise regression selection methods.
39. What is the concept of regularized linear regression, and how does it relate to ridge and LASSO regression?
40. Can you implement linear regression from scratch in Python or another programming language?




**31. How can you handle missing data in linear regression?**
Handling missing data is critical for ensuring model accuracy. Common strategies include:

* **Deletion**:

  * *Listwise deletion*: Remove rows with any missing values (reasonable when data are missing completely at random, MCAR).
* **Imputation**:

  * *Mean/Median imputation*: For numerical data.
  * *KNN or regression-based imputation*: More accurate but complex.
  * *Multiple Imputation*: Generates multiple imputations and combines estimates.
* **Modeling missingness**: Use missingness as a feature if it may carry information.
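
A small sketch of mean imputation combined with missingness indicators, using NumPy only. The feature matrix is an assumed example with `NaN` marking missing entries.

```python
import numpy as np

# Assumed toy feature matrix with missing entries marked as NaN
X = np.array([[1.0, np.nan],
              [2.0, 4.0],
              [np.nan, 6.0]])

missing = np.isnan(X)                        # True where a value is missing
col_means = np.nanmean(X, axis=0)            # column means that ignore NaN
X_imputed = np.where(missing, col_means, X)  # fill gaps with the column mean

# Optional: keep missingness indicators as extra features
X_with_flags = np.column_stack([X_imputed, missing.astype(float)])

print(X_imputed)
print(X_with_flags)
```

Mean imputation is simple but shrinks the variance of the imputed column, which is one reason the more elaborate methods in the list above exist.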

---

**32. Discuss the concept of endogeneity in linear regression and how to address it.**
**Endogeneity** occurs when an independent variable is correlated with the error term. It violates the assumption of exogeneity, leading to biased and inconsistent estimates.

**Causes**:

* Omitted variable bias
* Simultaneity (mutual causality)
* Measurement error

**Solutions**:

* **Instrumental Variables (IV)**: Use instruments uncorrelated with the error term but correlated with the endogenous regressor.
* **Two-stage least squares (2SLS)**: A common IV-based method.
* **Control function approach**: Adjusts the regression to account for endogeneity.
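
The following simulation is a hedged sketch of 2SLS with a single instrument. Every number in it is an assumption: `u` plays the role of an omitted variable that drives both `x` and `y`, `z` is a valid instrument by construction, and the true slope on `x` is set to 2.

```python
import numpy as np

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data (all values are assumptions for illustration).
# u is an unobserved confounder that affects both x and y, so x is endogenous.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                      # instrument: affects x, not y directly
u = rng.normal(size=n)                      # unobserved confounder
x = z + u + 0.5 * rng.normal(size=n)        # endogenous regressor
y = 2.0 * x + u + 0.5 * rng.normal(size=n)  # true slope on x is 2

ones = np.ones(n)

# Naive OLS of y on x: biased because x is correlated with the error (u)
beta_ols = ols(np.column_stack([ones, x]), y)[1]

# Stage 1: regress x on the instrument and keep the fitted values
Z = np.column_stack([ones, z])
x_hat = Z @ ols(Z, x)

# Stage 2: regress y on the fitted values from stage 1
beta_2sls = ols(np.column_stack([ones, x_hat]), y)[1]

print("OLS slope: ", beta_ols)
print("2SLS slope:", beta_2sls)
```

The OLS slope should land noticeably above 2, while the 2SLS slope should be close to it. Note that running the two stages by hand gives the right point estimate but not valid standard errors; dedicated IV routines correct for that.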

---

**33. What is the Durbin-Watson statistic, and what does it measure in linear regression?**
The **Durbin-Watson (DW) statistic** tests for **autocorrelation in the residuals**, particularly first-order serial correlation.
**Formula**:

$$
DW = \frac{\sum_{t=2}^n (e_t - e_{t-1})^2}{\sum_{t=1}^n e_t^2}
$$

**Interpretation**:

* The statistic ranges from 0 to 4.
* DW ≈ 2 → No first-order autocorrelation
* DW < 2 (toward 0) → Positive autocorrelation
* DW > 2 (toward 4) → Negative autocorrelation
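
The formula translates directly into NumPy. In this sketch the helper `durbin_watson` and the two hand-picked residual sequences are assumptions for illustration.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hand-picked residual sequences (assumed values for illustration)
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every step
persistent = [1.0, 1.0, -1.0, -1.0]   # neighbouring values tend to agree

dw_alt = durbin_watson(alternating)
dw_per = durbin_watson(persistent)

print("Alternating residuals:", dw_alt)
print("Persistent residuals: ", dw_per)
```

The alternating sequence gives 3.0 (above 2, negative autocorrelation) and the persistent one gives 1.0 (below 2, positive autocorrelation), matching the interpretation above.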

---

**34. Explain the concept of autocorrelation in the residuals of a time series linear regression model.**
**Autocorrelation (serial correlation)** occurs when residuals are correlated across time.

**Why it matters**:

* Violates the assumption of independent errors.
* Leads to underestimated standard errors → inflated t-stats → misleading significance.

**Detection**:

* Durbin-Watson test
* ACF/PACF plots

**Remedies**:

* Include lagged variables
* Use time series models (e.g., ARIMA)
* Apply GLS (Generalized Least Squares)

---

**35. What are generalized linear models (GLMs), and how do they extend linear regression?**
GLMs **extend linear regression** by allowing:

* **Response variables** with error distributions other than normal (e.g., binomial, Poisson).
* A **link function** to model the relationship between predictors and the expected value of the response.

**Components**:

1. **Random component**: Distribution of response (e.g., binomial, Poisson)
2. **Systematic component**: Linear combination of inputs
3. **Link function**: Connects the mean of the response to the linear predictor

**Examples**:

* Logistic regression (logit link + binomial)
* Poisson regression (log link + Poisson)

---

**36. Describe the assumptions and applications of logistic regression in contrast to linear regression.**
**Logistic Regression** is used for **binary classification** rather than for predicting a continuous value.

**Key differences**:

* **Output**: Probability (0–1), not an unbounded continuous value
* **Loss function**: Negative log-likelihood (log loss), not MSE
* **Link function**: Logit (log-odds); its inverse is the logistic (sigmoid) function

**Assumptions of Logistic Regression**:

* Independent observations
* Linear relationship between independent variables and the **log-odds**
* No severe multicollinearity
* Sufficiently large sample size

**Applications**:

* Credit risk prediction
* Disease classification
* Email spam detection

---

**37. How can you perform feature selection in linear regression effectively?**
Methods include:

* **Filter Methods**:

  * Correlation matrix
  * Variance Threshold
* **Wrapper Methods**:

  * Forward selection
  * Backward elimination
  * Stepwise selection
  * Recursive Feature Elimination (RFE)
* **Embedded Methods**:

  * Lasso (L1) regularization (shrinks some coefficients to zero)
* **Model-based**:

  * Use feature importance from models (e.g., Random Forest)

---

**38. Discuss the differences between forward, backward, and stepwise regression selection methods.**

* **Forward Selection**:
  Starts with no variables, adds one at a time based on improvement in model (e.g., AIC, adjusted R²).

* **Backward Elimination**:
  Starts with all variables, removes the least significant one iteratively.

* **Stepwise Selection**:
  Combines forward and backward — adds/removes variables at each step.

**Use**:

* Forward: When few predictors are likely relevant.
* Backward: When you start with many.
* Stepwise: Balanced approach but prone to overfitting if not validated.
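
Forward selection can be sketched in a few lines. The version below uses AIC as the improvement criterion, which is one of several reasonable choices; the helpers `aic` and `forward_select` and the synthetic data (where only features 0 and 2 matter) are assumptions for illustration.

```python
import numpy as np

def aic(X, y):
    """Gaussian AIC (up to a constant) of an OLS fit of y on the columns of X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y):
    """Greedily add the feature that lowers AIC most; stop when none helps."""
    n, k = X.shape
    ones = np.ones((n, 1))
    selected, remaining = [], list(range(k))
    best = aic(ones, y)  # start from the intercept-only model
    while remaining:
        scores = [(aic(np.hstack([ones, X[:, selected + [j]]]), y), j)
                  for j in remaining]
        score, j = min(scores)
        if score >= best:
            break  # no candidate improves the criterion
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected

# Synthetic data (assumed): only features 0 and 2 influence y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

selected = forward_select(X, y)
print("Selected features, in order:", selected)
```

The two informative features should be picked first. A noise feature can still slip in afterwards by chance, which is the overfitting risk mentioned above and the reason to validate the final model on held-out data.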

---

**39. What is the concept of regularized linear regression, and how does it relate to ridge and LASSO regression?**
Regularization **adds a penalty** to the loss function to prevent overfitting and reduce variance.

* **Ridge Regression (L2 penalty)**:

  $$
  \text{Loss} = \text{MSE} + \lambda \sum w_i^2
  $$

  Shrinks coefficients but doesn't make them exactly zero.

* **LASSO Regression (L1 penalty)**:

  $$
  \text{Loss} = \text{MSE} + \lambda \sum |w_i|
  $$

  Performs both shrinkage and feature selection (some coefficients become exactly zero).

* **Elastic Net**: Combines L1 and L2.
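
To show how the L1 penalty produces exact zeros, here is a hedged sketch of coordinate descent with soft-thresholding for the LASSO loss written above. The helpers `soft_threshold` and `lasso_cd`, the toy design matrix, and the value of `lam` are assumptions; the sketch omits the intercept for brevity.

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho toward zero by t, returning exactly 0 when |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for MSE + lam * sum(|w_j|), without an intercept."""
    n, k = X.shape
    w = np.zeros(k)
    for _ in range(n_sweeps):
        for j in range(k):
            # Residual with feature j's current contribution removed
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            z = X[:, j] @ X[:, j] / n
            # Minimizer of z * w^2 - 2 * rho * w + lam * |w| in one coordinate
            w[j] = soft_threshold(rho, lam / 2.0) / z
    return w

# Assumed toy design with orthogonal columns; y = 2 * x1 + 0.1 * x2
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = 2.0 * X[:, 0] + 0.1 * X[:, 1]

w_lasso = lasso_cd(X, y, lam=0.5)  # lam = 0.5 is a hypothetical penalty strength
print("Lasso weights:", w_lasso)
```

The weak second feature is set exactly to zero while the strong first feature is shrunk from 2 to 1.75, which is the combined shrinkage and selection behaviour described above.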

---

**40. Can you implement linear regression from scratch in Python or another programming language?**
Yes, here's a basic Python example using NumPy:

```python
import numpy as np

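# Input data: the first column of ones is the intercept term, added manually
X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 2, 3], dtype=float)

# Closed-form solution (Normal Equation): w = (X^T X)^-1 X^T y
# Solving the linear system is more numerically stable than forming the inverse explicitly
X_T_X = X.T @ X
X_T_y = X.T @ y
weights = np.linalg.solve(X_T_X, X_T_y)

print("Weights:", weights)  # [intercept, slope], approximately [0, 1]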
```

This gives the coefficients of a simple linear regression model without using libraries like `scikit-learn`.

