**1. What is Simple Linear Regression?**

**Answer:**
Simple Linear Regression (SLR) is a supervised learning technique used to model the linear relationship between a **single independent variable (X)** and a **dependent variable (Y)**. The goal is to find the best-fitting straight line (regression line), which minimizes the sum of squared differences between the predicted and actual values of Y. The model is represented by the equation:

$$
Y = mX + c
$$

Where `m` is the slope and `c` is the intercept.

---

**2. What are the key assumptions of Simple Linear Regression?**

**Answer:**
The effectiveness of SLR relies on several assumptions:

1. **Linearity**: The relationship between X and Y is linear.
2. **Independence**: The residuals (errors) are independent of each other.
3. **Homoscedasticity**: Constant variance of residuals across values of X.
4. **Normality**: The residuals are normally distributed.
5. **No multicollinearity**: (Not relevant in SLR but important in MLR).

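Violating these assumptions can bias the coefficient estimates (e.g., when the true relationship is nonlinear). It can also make standard errors, confidence intervals, and hypothesis tests unreliable (e.g., with heteroscedastic or correlated errors).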
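The independence assumption can also be checked numerically. The minimal sketch below computes the Durbin-Watson statistic, $DW = \sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2$, on two simulated residual series. Values near 2 suggest no first-order autocorrelation, and values well below 2 suggest positive autocorrelation. The residuals are synthetic, and `durbin_watson` is a helper defined here for illustration.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
independent = rng.normal(size=200)                 # uncorrelated residuals
autocorrelated = np.cumsum(rng.normal(size=200))   # random walk: strongly correlated

dw_ind = durbin_watson(independent)
dw_auto = durbin_watson(autocorrelated)
print(round(dw_ind, 2), round(dw_auto, 2))  # roughly 2, and close to 0
```

In a real analysis, you would pass the residuals of the fitted model in time (or collection) order.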

---

**3. What does the coefficient 'm' represent in Y = mX + c?**

**Answer:**
The coefficient **‘m’** is the **slope** of the regression line. It quantifies the change in the dependent variable Y for a **one-unit increase in the independent variable X**. A positive m indicates a direct relationship, while a negative m indicates an inverse relationship.

---

**4. What does the intercept 'c' represent in Y = mX + c?**

**Answer:**
The intercept **‘c’** is the value of Y when X = 0. It represents the point where the regression line crosses the Y-axis and serves as a baseline for prediction. It only has a practical interpretation when X = 0 is a plausible value within or near the range of the observed data.

---

**5. How do we calculate the slope (m) in Simple Linear Regression?**

**Answer:**
The slope is derived using the **least squares method**, which minimizes the squared differences between actual and predicted Y values. The formula is:

$$
m = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
$$

This captures the **covariance of X and Y**, normalized by the **variance of X**.
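The formula translates directly into NumPy. The sketch below uses a small made-up dataset. It also computes the intercept with the standard least squares result $c = \bar{Y} - m\bar{X}$, which follows from the fitted line passing through the point of means. As a cross-check, it compares the result against `np.polyfit` with degree 1.

```python
import numpy as np

# Small made-up dataset for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

x_bar, y_bar = X.mean(), Y.mean()
m = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)  # slope
c = y_bar - m * x_bar  # the least squares line passes through (x_bar, y_bar)

# Cross-check against NumPy's degree-1 least squares fit
m_np, c_np = np.polyfit(X, Y, deg=1)
print(m, c)  # ≈ 1.97 0.09
```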

---

**6. What is the purpose of the least squares method in Simple Linear Regression?**

**Answer:**
The **least squares method** aims to minimize the **sum of squared residuals** (the vertical distances between observed and predicted Y values). It ensures the regression line has the best fit by reducing the overall prediction error across all data points.
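To see the "best fit" claim concretely, the sketch below solves the least squares problem with `np.linalg.lstsq` on made-up data. It then checks that nudging the slope away from the solution increases the sum of squared residuals. The helper `ssr` is defined here for illustration.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

def ssr(m, c):
    """Sum of squared residuals for the line Y = mX + c."""
    residuals = Y - (m * X + c)
    return np.sum(residuals ** 2)

# Design matrix [X, 1] so lstsq solves for (m, c) jointly
A = np.column_stack([X, np.ones_like(X)])
(m_hat, c_hat), *_ = np.linalg.lstsq(A, Y, rcond=None)

best = ssr(m_hat, c_hat)
worse = ssr(m_hat + 0.1, c_hat)  # a different line has a larger SSR
print(best < worse)  # True
```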

---

**7. How is the coefficient of determination (R²) interpreted in Simple Linear Regression?**

**Answer:**
**R²**, or the coefficient of determination, measures the **proportion of variance in the dependent variable** that is predictable from the independent variable.

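* R² = 1 → Perfect fit: the model explains all of the variance in Y
* R² = 0 → No explanatory power: the model does no better than predicting the mean of Y

In SLR, R² equals the square of the Pearson correlation between X and Y. A high R² suggests a strong linear relationship, but it does **not imply causation**.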
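A minimal NumPy sketch on made-up data computes R² as $1 - SS_{res}/SS_{tot}$. It also confirms that, for a single predictor, R² matches the squared Pearson correlation.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

m, c = np.polyfit(X, Y, deg=1)
Y_hat = m * X + c

ss_res = np.sum((Y - Y_hat) ** 2)     # variation left unexplained by the line
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total variation around the mean
r2 = 1 - ss_res / ss_tot

# In SLR, R² equals the squared Pearson correlation between X and Y
r = np.corrcoef(X, Y)[0, 1]
print(r2, r ** 2)
```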


**Question 8. What is Multiple Linear Regression?**

**Answer:**
Multiple Linear Regression (MLR) is an extension of simple linear regression that models the relationship between **one dependent variable (Y)** and **two or more independent variables (X₁, X₂, ..., Xₙ)**.
The model is expressed as:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon
$$

Each coefficient $\beta_i$ represents the effect of its corresponding variable $X_i$, assuming all other variables are held constant.
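As a sketch, scikit-learn's `LinearRegression` (the same class used later in this guide) estimates $\beta_0$ and $\beta_1, \beta_2$ for two predictors. The data here are made up and generated without noise from $Y = 1 + 2X_1 + 3X_2$, so the fit should recover those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise)
X = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 5], [5, 2]], dtype=float)
Y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

model = LinearRegression().fit(X, Y)
print(model.intercept_)  # beta_0 ≈ 1
print(model.coef_)       # [beta_1, beta_2] ≈ [2, 3]
```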

---

**Question 9. What is the main difference between Simple and Multiple Linear Regression?**

**Answer:**
The primary difference is the number of independent variables:

* **SLR** uses **one independent variable**
* **MLR** uses **two or more independent variables**

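MLR can capture **more complex relationships** by considering multiple predictors at once. It improves accuracy when the added predictors carry real information about Y; adding irrelevant predictors can instead lead to overfitting.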

---

**Question 10. What are the key assumptions of Multiple Linear Regression?**

**Answer:**
MLR relies on the following assumptions:

1. **Linearity**: The relationship between independent variables and the dependent variable is linear.
2. **Independence of errors**: Residuals are not correlated.
3. **Homoscedasticity**: Constant variance of residuals across all levels of predictors.
4. **Normality of residuals**: Errors are normally distributed.
5. **No multicollinearity**: Independent variables should not be highly correlated with each other.

**Question 11. What is heteroscedasticity, and how does it affect MLR results?**

**Answer:**
**Heteroscedasticity** occurs when the variance of residuals is **not constant** across all levels of the independent variables.
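This violates the homoscedasticity assumption. The OLS coefficient estimates remain unbiased, but heteroscedasticity leads to:

* Biased standard errors
* Invalid hypothesis tests (t-tests, F-tests)
* Less reliable confidence intervals
* Inefficient coefficient estimates (no longer minimum variance)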

It is often detected via **residual plots**.
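Besides residual plots, a formal check is the Breusch-Pagan test. The minimal NumPy sketch below implements its n·R² form (Koenker's studentized version): regress the squared OLS residuals on the predictors and multiply that auxiliary R² by n. The data are simulated, and `fit_ols` and `breusch_pagan_lm` are helpers defined here. A library routine such as statsmodels' `het_breuschpagan` would also report a p-value.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with an intercept column; returns coefficients and residuals."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, y - A @ beta

def breusch_pagan_lm(X, y):
    """LM statistic: n * R^2 from regressing squared residuals on the predictors."""
    _, resid = fit_ols(X, y)
    e2 = resid ** 2
    _, aux_resid = fit_ols(X, e2)
    r2_aux = 1 - np.sum(aux_resid ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return len(y) * r2_aux

rng = np.random.default_rng(42)
x = rng.uniform(1, 10, size=500)
y_homo = 2 + 3 * x + rng.normal(0, 1, size=500)       # constant error variance
y_het = 2 + 3 * x + rng.normal(0, 1, size=500) * x    # error spread grows with x

lm_homo = breusch_pagan_lm(x, y_homo)
lm_het = breusch_pagan_lm(x, y_het)
# With one predictor, compare against the chi-square(1) 5% critical value, about 3.84
print(round(lm_homo, 2), round(lm_het, 2))
```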

**Question 12. How can you improve a Multiple Linear Regression model with high multicollinearity?**

**Answer:**
To handle multicollinearity (when independent variables are highly correlated), first use **Variance Inflation Factor (VIF)** analysis to identify the problematic variables, then consider:

* **Removing or combining correlated predictors**
* **Using PCA (Principal Component Analysis)** to replace correlated predictors with uncorrelated components
* **Applying regularization techniques** like Ridge or Lasso regression
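VIF for predictor $j$ is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing that predictor on all the others. The minimal NumPy sketch below computes it for simulated data in which `x2` is nearly a copy of `x1`. The helper `vif` is defined here for illustration. A commonly cited rule of thumb treats VIF values above about 5 or 10 as a warning sign.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 regresses column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly a copy of x1
x3 = rng.normal(size=200)                  # unrelated predictor

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs.round(1))  # large for x1 and x2, close to 1 for x3
```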

**Question 13. What are some common techniques for transforming categorical variables in regression?**

**Answer:**
To include categorical variables in a regression model:

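* **One-Hot Encoding**: Converts categories into binary columns (e.g., 'Red', 'Blue' → [1, 0], [0, 1]). When the model has an intercept, drop one column (the reference category) to avoid perfect multicollinearity, known as the dummy variable trap.
* **Label Encoding**: Assigns numeric labels (use cautiously, as it implies an order and equal spacing between categories)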
* **Ordinal Encoding**: For ordered categories (e.g., Low < Medium < High)

These techniques help in incorporating non-numeric features into numerical models.
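As a minimal NumPy sketch of one-hot encoding, the hypothetical helper `one_hot` below takes an explicit category order. Its optional `drop_first` flag removes the first category so it becomes the reference level. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job.

```python
import numpy as np

def one_hot(values, categories, drop_first=False):
    """One binary column per category; optionally drop the first as the reference level."""
    values = np.asarray(values)
    encoded = (values[:, None] == np.asarray(categories)[None, :]).astype(int)
    return encoded[:, 1:] if drop_first else encoded

colors = ["Red", "Blue", "Red", "Green"]
cats = ["Red", "Blue", "Green"]
print(one_hot(colors, cats))
# [[1 0 0]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]]
print(one_hot(colors, cats, drop_first=True))  # "Red" becomes the baseline
```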


**Question 14. What is the role of interaction terms in MLR?**

**Answer:**
**Interaction terms** capture the **combined effect** of two or more variables on the dependent variable.
For instance, the effect of **X₁ on Y** might change depending on **X₂**.
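A model with an interaction term:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \cdot X_2) + \varepsilon
$$

Here a one-unit increase in $X_1$ changes Y by $\beta_1 + \beta_3 X_2$, so the effect of $X_1$ depends on the value of $X_2$.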

Useful when variables interact in a **non-additive** way.
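In practice, the interaction is just one more column, the elementwise product $X_1 \cdot X_2$, fed to ordinary least squares. A minimal NumPy sketch follows. The data are simulated without noise from $Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$, an assumption made only so that the recovered coefficients are easy to check.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 5, size=50)
x2 = rng.uniform(0, 5, size=50)
# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2 + 4*(X1*X2)
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Columns: intercept, X1, X2, and the interaction X1 * X2
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta.round(6))  # ≈ [1. 2. 3. 4.]

# The slope of Y with respect to X1 depends on X2: beta_1 + beta_3 * X2
for x2_value in (0.0, 2.0):
    print(x2_value, beta[1] + beta[3] * x2_value)  # ≈ 2 at X2 = 0, ≈ 10 at X2 = 2
```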


**Question 15. How can the interpretation of intercept differ between SLR and MLR?**

**Answer:**

* In **SLR**, the intercept is the expected value of Y when **X = 0**.
* In **MLR**, the intercept represents the expected value of Y when **all independent variables = 0**.
  This may not always be interpretable or realistic depending on the variables involved.
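One common way to make the intercept meaningful is to center the predictors. After subtracting each predictor's mean, the slopes are unchanged and the intercept equals the mean of Y, i.e., the expected Y at average predictor values. The minimal NumPy sketch below uses a single made-up predictor; the height/weight relationship is hypothetical, and `fit_line` is a helper defined here. The same idea applies to every column in MLR.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical predictor where X = 0 lies far outside the data (height in cm)
height = rng.uniform(150, 190, size=100)
weight = -100 + 1.0 * height + rng.normal(0, 5, size=100)

def fit_line(x, y):
    """OLS fit returning [intercept, slope]."""
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

raw = fit_line(height, weight)
centered = fit_line(height - height.mean(), weight)

print(raw)       # intercept is the predicted weight at height = 0 cm, an extrapolation
print(centered)  # same slope; intercept is the mean weight (Y at the average height)
```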

---

**Question 16. What is the significance of the slope in regression analysis?**

**Answer:**
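Each slope (coefficient) in a regression model represents the **average change in the dependent variable Y** for a **one-unit change in the independent variable**, assuming all other variables are held constant.
It quantifies the **direction** (sign) and **magnitude** of the effect. Because the magnitude depends on the units of the variables, raw slopes are not directly comparable across predictors.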


**Question 17. How does the intercept in a regression model provide context for the relationship between variables?**

**Answer:**
The intercept offers a **baseline value** of the target variable when all predictors are zero. While it sometimes lacks real-world meaning, it is essential for positioning the regression line or plane in multi-dimensional space.

**18. What are the limitations of using R² as a sole measure of model performance?**

**Answer:**
R² only indicates the **proportion of variance explained**, but:

* It never decreases when a variable is added, even an irrelevant one.
* It doesn't measure **predictive accuracy** on unseen data.
* It doesn't penalize model complexity.

Hence, **Adjusted R²**, **RMSE**, and **cross-validation** scores are preferred for holistic evaluation.
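Adjusted R² applies a penalty to R² for the number of predictors $p$, given $n$ observations: $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$. The sketch below uses illustrative values, not results from a real model. It shows that the same R² earns a lower adjusted score when more predictors are used.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors (the intercept is not counted in p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R² = 0.90, but more predictors means a larger penalty
print(adjusted_r2(0.90, n=50, p=2))   # ≈ 0.8957
print(adjusted_r2(0.90, n=50, p=20))  # ≈ 0.8310
```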

**19. How would you interpret a large standard error for a regression coefficient?**

**Answer:**
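A **large standard error** indicates high variability in the estimate of the coefficient, meaning the model is **less certain** about that coefficient's value. The t-statistic is the coefficient divided by its standard error, so a large standard error produces a small t-value and a wide confidence interval. The variable may then not be statistically significant. Common causes include multicollinearity and small sample sizes.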
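The standard errors come from the diagonal of $\hat{\sigma}^2 (A^\top A)^{-1}$, where $A$ is the design matrix with an intercept column and $\hat{\sigma}^2$ is the residual variance. The minimal NumPy sketch below uses simulated data with one informative predictor and one pure-noise predictor. It shows how the ratio of coefficient to standard error (the t-statistic) separates them. The helper `ols_with_se` is defined here for illustration; statsmodels' OLS summary reports the same quantities.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients, standard errors, and t-statistics (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])
    n, k = A.shape                      # k = number of predictors + 1
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - k)    # unbiased residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

rng = np.random.default_rng(7)
x_signal = rng.normal(size=100)
x_noise = rng.normal(size=100)          # unrelated to y
y = 1 + 2 * x_signal + rng.normal(size=100)

beta, se, t = ols_with_se(np.column_stack([x_signal, x_noise]), y)
print(np.round(t, 2))  # |t| is large for x_signal and small for x_noise
```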

**20. How can heteroscedasticity be identified in residual plots, and why is it important to address it?**

**Answer:**
Plot residuals vs. predicted values:

* If the spread of residuals increases or decreases with the predicted values (e.g., a fan or cone shape), heteroscedasticity is present.
* It is important to address because, although the coefficient estimates remain unbiased, they become **inefficient**. The standard errors are also biased, which makes **hypothesis tests and confidence intervals invalid**.


**21. What does it mean if an MLR model has a high R² but low adjusted R²?**

**Answer:**
This means the model contains many variables that **do not improve prediction**.

* **R²** never decreases when a variable is added.
* **Adjusted R²** penalizes variables that do not improve the fit enough to justify their inclusion.

A large gap between the two suggests the model may be **overfitted or poorly specified**.

**22. Why is it important to scale variables in Multiple Linear Regression?**

**Answer:**
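Scaling (standardization or normalization):

* Puts variables on the **same scale**, so variables with large ranges do not dominate the penalty term in **regularized regression** (e.g., Ridge, Lasso).
* Helps gradient-based solvers converge and improves **numerical stability**.
* Makes coefficients comparable, since each one is then expressed per standard deviation of its predictor.

For ordinary least squares, scaling does not change the predictions or R²; it only rescales the coefficients.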
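A quick scikit-learn check on made-up data shows the distinction for plain OLS. Standardizing two predictors on very different scales changes the coefficients (they become "per standard deviation"), but it leaves the fitted values unchanged. A penalized model such as Ridge would generally give different predictions after scaling.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
# Two made-up predictors on very different scales
X = np.column_stack([rng.uniform(0, 1, 100), rng.uniform(0, 10_000, 100)])
y = 3 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 100)

raw = LinearRegression().fit(X, y)
X_std = StandardScaler().fit_transform(X)  # each column: mean 0, standard deviation 1
std = LinearRegression().fit(X_std, y)

print(raw.coef_)  # in original units, not directly comparable
print(std.coef_)  # change in y per one standard deviation of each predictor
print(np.allclose(raw.predict(X), std.predict(X_std)))  # True: OLS predictions unchanged
```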

**23. What is Polynomial Regression?**

**Answer:**
Polynomial Regression is a form of linear regression in which the **relationship between independent and dependent variables is modeled as an nth-degree polynomial**. It is still linear in the coefficients β, which is why it can be fit with ordinary least squares.
Instead of fitting a straight line (as in linear regression), it fits a **curve** to capture nonlinear patterns.
Equation:

$$
Y = \beta_0 + \beta_1X + \beta_2X^2 + \cdots + \beta_nX^n + \varepsilon
$$


**24. How does Polynomial Regression differ from Linear Regression?**

**Answer:**

* **Linear Regression** fits a straight line (1st-degree polynomial).
* **Polynomial Regression** fits a **curved line** by adding powers of X (e.g., X², X³).

This lets it model more complex, nonlinear relationships.


**25. When is Polynomial Regression used?**

**Answer:**
Polynomial Regression is used when:

* The data shows **nonlinear patterns** that a straight line cannot capture.
* The residuals from a linear regression indicate **non-random patterns**.

It is common in modeling growth curves, market trends, or physical phenomena.

**26. What is the general equation for Polynomial Regression?**

**Answer:**

$$
Y = \beta_0 + \beta_1X + \beta_2X^2 + \beta_3X^3 + \cdots + \beta_nX^n + \varepsilon
$$

Where:

* $\beta_i$ are the coefficients
* $X^i$ (for $i = 1, \dots, n$) are the polynomial terms, where $n$ is the degree
* $\varepsilon$ is the error term


**27. Can Polynomial Regression be applied to multiple variables?**

**Answer:**
Yes. Polynomial regression can be extended to **multiple variables**, resulting in **interaction and power terms** for each variable.
For example, with X and Z:

$$
Y = \beta_0 + \beta_1X + \beta_2Z + \beta_3X^2 + \beta_4XZ + \beta_5Z^2 + \varepsilon
$$

This is also known as **multivariate polynomial regression**.


**28. What are the limitations of Polynomial Regression?**

**Answer:**

* **Overfitting**: High-degree polynomials can model noise instead of the trend.
* **Interpretability**: As the degree increases, the model becomes complex and hard to explain.
* **Instability**: Slight changes in the data can cause large swings in the curve, especially near the boundaries of the data range (an effect related to Runge’s phenomenon).
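The overfitting point can be seen on training data alone. The sketch below draws 10 simulated points from a straight line plus noise. A degree-9 polynomial has 10 coefficients, so it passes through every point and drives training error to essentially zero by fitting the noise. It uses NumPy's `Polynomial.fit`, which rescales the x-range internally for numerical stability; `train_mse` is a helper defined here.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(0, 0.2, size=10)  # the true trend is a straight line plus noise

def train_mse(degree):
    """Mean squared error on the training points for a least squares polynomial fit."""
    p = Polynomial.fit(x, y, deg=degree)
    return np.mean((y - p(x)) ** 2)

print(train_mse(1))  # small but nonzero: the noise stays in the residuals
print(train_mse(9))  # ~0: 10 coefficients interpolate all 10 points, noise included
```

A near-zero training error here signals memorized noise, not a better model, which is why the degree should be chosen on held-out data.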


**29. What methods can be used to evaluate model fit when selecting the degree of a polynomial?**

**Answer:**

* **Cross-validation (k-fold)**: Helps detect overfitting.
* **Adjusted R²**: Penalizes unnecessary complexity.
* **AIC/BIC (Akaike/Bayesian Information Criteria)**: Lower values indicate a better trade-off between fit and complexity.
* **Residual plots**: Check for randomness in residuals.
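A minimal sketch of degree selection by k-fold cross-validation with scikit-learn follows. The data are simulated from a quadratic, an assumption made so the comparison is easy to read. Each candidate degree is wrapped in a `PolynomialFeatures` → `LinearRegression` pipeline, and the degree with the lowest average validation MSE is chosen.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = 0.5 + X[:, 0] - 2 * X[:, 0] ** 2 + rng.normal(0, 0.5, size=80)  # quadratic truth

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=False),
                          LinearRegression())
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()  # scikit-learn reports negated MSE, so flip the sign

best_degree = min(cv_mse, key=cv_mse.get)
print({d: round(m, 3) for d, m in cv_mse.items()})
print("best degree:", best_degree)
```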

**30. Why is visualization important in Polynomial Regression?**

**Answer:**
Visualization helps to:

* **Understand** the shape and flexibility of the fitted curve.
* Detect **underfitting (too simple)** or **overfitting (too complex)** visually.
* Communicate results intuitively to stakeholders.

Plots of the data, the fitted curve, and the residuals give crucial insights into model behavior.


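**31. How is Polynomial Regression implemented in Python?**

**Answer:**
A common approach with scikit-learn is to expand X into polynomial features with `PolynomialFeatures` and then fit an ordinary `LinearRegression` on the expanded features: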

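```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Toy data that follows y = x^2 + 2 exactly
X = [[1], [2], [3], [4]]
y = [3, 6, 11, 18]

# Expand X into [x, x^2]; include_bias=False because LinearRegression fits its own intercept
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Ordinary least squares on the expanded features
model = LinearRegression()
model.fit(X_poly, y)

# Predictions on the training points
y_pred = model.predict(X_poly)

# New inputs must go through the same transformer (transform, not fit_transform)
print(model.predict(poly.transform([[5]])))  # ≈ [27.]
```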