Q1. Explain the difference between simple linear regression and multiple linear regression. Provide an
example of each.
Q2. Discuss the assumptions of linear regression. How can you check whether these assumptions hold in
a given dataset?
Q3. How do you interpret the slope and intercept in a linear regression model? Provide an example using
a real-world scenario.
Q4. Explain the concept of gradient descent. How is it used in machine learning?
Q5. Describe the multiple linear regression model. How does it differ from simple linear regression?
Q6. Explain the concept of multicollinearity in multiple linear regression. How can you detect and
address this issue?
Q7. Describe the polynomial regression model. How is it different from linear regression?
Q8. What are the advantages and disadvantages of polynomial regression compared to linear
regression? In what situations would you prefer to use polynomial regression?

### **Q1: Difference between Simple Linear Regression and Multiple Linear Regression**
- **Simple Linear Regression**: Involves one independent variable (predictor) and one dependent variable (response).  
  **Example**: Predicting house price based on square footage.  

  \[
  y = \beta_0 + \beta_1x + \epsilon
  \]

  where \( y \) is the house price, \( x \) is the square footage, \( \beta_0 \) is the intercept, \( \beta_1 \) is the slope, and \( \epsilon \) is the error term.

- **Multiple Linear Regression**: Involves two or more independent variables \( x_1, \dots, x_n \), each with its own coefficient \( \beta_i \), and one dependent variable.  
  **Example**: Predicting house price based on square footage, number of bedrooms, and location.

  \[
  y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n + \epsilon
  \]
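
As a minimal sketch of the two models, the snippet below fits both by ordinary least squares with NumPy. The data, the `fit_ols` helper, and the coefficient values are hypothetical and chosen only for illustration; location is left out because a categorical predictor would need encoding first.

```python
import numpy as np

# Hypothetical, noise-free data (illustration only).
sqft = np.array([800.0, 1000.0, 1200.0, 1500.0, 1800.0, 2000.0])
bedrooms = np.array([2.0, 2.0, 3.0, 3.0, 4.0, 4.0])
price = 50_000 + 100 * sqft + 10_000 * bedrooms

def fit_ols(X, y):
    """Least-squares fit; returns [beta_0, beta_1, ...] with beta_0 the intercept."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simple linear regression: one predictor (square footage).
beta_simple = fit_ols(sqft.reshape(-1, 1), price)

# Multiple linear regression: two predictors (square footage, bedrooms).
beta_multiple = fit_ols(np.column_stack([sqft, bedrooms]), price)
```

Because the hypothetical prices were generated from both predictors, only the multiple regression recovers the generating coefficients; the simple model folds the effect of bedrooms into its intercept and slope.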

---

### **Q2: Assumptions of Linear Regression and How to Check Them**
1. **Linearity**: The relationship between independent and dependent variables should be linear.  
   **Check**: Scatter plots, residual plots.  
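2. **Independence**: The errors should be independent of one another (no autocorrelation).  
   **Check**: Durbin-Watson test on the residuals; values near 2 suggest no first-order autocorrelation.  
3. **Homoscedasticity**: The residuals should have constant variance across all fitted values.  
   **Check**: Residuals-versus-fitted plot (the spread should stay roughly constant), Breusch-Pagan test.  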
4. **Normality**: Residuals should be normally distributed.  
   **Check**: Histogram, Q-Q plot, Shapiro-Wilk test.  
5. **No Multicollinearity**: Independent variables should not be highly correlated with one another.  
   **Check**: Variance Inflation Factor (VIF).  
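
Some of these checks can be sketched numerically. The example below is a rough illustration on hypothetical synthetic data: it computes the Durbin-Watson statistic by hand and compares the residual spread in the lower and upper halves of the fitted values. The `durbin_watson` helper is written here for illustration, and the spread comparison is an informal stand-in for a residual plot, not a formal test.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: between 0 and 4, near 2 without first-order autocorrelation."""
    residuals = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Hypothetical data: a linear trend plus independent, constant-variance noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)

# Fit y = b0 + b1 * x by least squares and compute the residuals.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Independence check.
dw = durbin_watson(residuals)

# Informal homoscedasticity check: residual spread for low vs. high fitted values.
order = np.argsort(fitted)
spread_low = residuals[order[:100]].std()
spread_high = residuals[order[100:]].std()
```

A `dw` value far from 2, or very different `spread_low` and `spread_high`, would be a reason to look more closely at the residual plots.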

---

### **Q3: Interpretation of Slope and Intercept**
- **Intercept (\(\beta_0\))**: The predicted value of \( y \) when all independent variables are zero.  
- **Slope (\(\beta_1\))**: The expected change in \( y \) for a one-unit increase in \( x \) (with any other predictors held constant in a multiple regression).  

**Example**:  
If a model predicts **salary** based on **years of experience**:

\[
\text{Salary} = 30{,}000 + 5{,}000 \times (\text{Years of Experience})
\]

- **Intercept (30,000)**: The predicted base salary is ₹30,000 when experience = 0.  
- **Slope (5,000)**: Each additional year of experience raises the predicted salary by ₹5,000.  
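
The same interpretation can be checked with a few lines of arithmetic. The function name `predict_salary` is hypothetical; the coefficients are the ones from the example equation.

```python
def predict_salary(years_experience, intercept=30_000, slope=5_000):
    """Predicted salary from the example model: intercept + slope * years."""
    return intercept + slope * years_experience

base = predict_salary(0)                               # intercept: prediction at zero experience
after_three = predict_salary(3)                        # prediction after three years
gain_per_year = predict_salary(4) - predict_salary(3)  # slope: change per extra year
```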

---

### **Q4: Gradient Descent in Machine Learning**
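Gradient descent is an optimization algorithm that minimizes a cost function \( J(\theta) \) by repeatedly moving the model parameters a small step in the direction opposite to the gradient.  
1. Start with random initial values for the parameters \( \theta_j \) (in regression, the coefficients \(\beta_0, \beta_1, \dots\)).  
2. Compute the cost function (for regression, typically the Mean Squared Error).  
3. Compute the gradient (the partial derivatives of the cost with respect to each parameter).  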
4. Update parameters using:

   \[
   \theta_j := \theta_j - \alpha \cdot \frac{\partial}{\partial \theta_j} J(\theta)
   \]

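   where \( \alpha \) is the learning rate, which sets the step size: too large a value can make the updates diverge, and too small a value makes convergence slow.  
5. Repeat steps 2 to 4 until convergence (for example, until the gradient is close to zero or the cost stops decreasing).  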
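
The steps above can be sketched for linear regression with a Mean Squared Error cost. This is a minimal batch gradient descent under stated assumptions, not a production implementation: the data are hypothetical and noise-free, the parameters start at zero rather than at random values so the run is reproducible, and the defaults for `alpha`, `n_iters`, and `tol` are illustrative choices.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=10_000, tol=1e-10):
    """Batch gradient descent for linear regression with J(theta) = mean((X theta - y)^2)."""
    m = len(y)
    Xb = np.column_stack([np.ones(m), X])      # prepend the intercept column
    theta = np.zeros(Xb.shape[1])              # step 1 (zeros here, for reproducibility)
    for _ in range(n_iters):
        error = Xb @ theta - y                 # prediction error; the cost is mean(error**2)
        gradient = (2.0 / m) * (Xb.T @ error)  # step 3: partial derivatives of the cost
        if np.linalg.norm(gradient) < tol:     # step 5: stop once the gradient is near zero
            break
        theta = theta - alpha * gradient       # step 4: update rule
    return theta

# Hypothetical data generated from y = 2 + 3x.
x = np.linspace(0, 1, 50)
y = 2.0 + 3.0 * x
theta = gradient_descent(x.reshape(-1, 1), y)
```

On this data the returned `theta` approaches the generating intercept and slope. With predictors on much larger scales, the same `alpha` could diverge, which is one reason features are often standardized first.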

---

### **Q5: Multiple Linear Regression Model**
- It models a dependent variable based on **multiple independent variables**.  
- Equation:

  \[
  y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_nx_n + \epsilon
  \]

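- **Difference from Simple Linear Regression**: It has more than one predictor variable, so each coefficient \( \beta_i \) describes the effect of its predictor while the other predictors are held constant.  

**Example**: Predicting car price based on **age, mileage, and brand** (a categorical predictor such as brand must first be encoded, for example as dummy variables).  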

---

### **Q6: Multicollinearity in Multiple Linear Regression**
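**Definition**: Multicollinearity occurs when two or more independent variables are highly correlated. It inflates the variance of the coefficient estimates and makes it difficult to determine the individual effect of each variable.  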

**How to Detect?**  
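1. **Correlation Matrix**: High pairwise correlations (a common rule of thumb is \(|r| > 0.7\)) suggest multicollinearity.  
2. **Variance Inflation Factor (VIF)**: \( \text{VIF}_j = 1 / (1 - R_j^2) \), where \( R_j^2 \) comes from regressing predictor \( j \) on the other predictors. A VIF above 5 (some sources use 10) suggests multicollinearity.  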

**How to Fix?**  
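1. Remove one variable from each group of highly correlated variables.  
2. Use **Principal Component Analysis (PCA)** to replace the correlated predictors with uncorrelated components.  
3. Use **Ridge Regression** or **Lasso Regression**, which regularize the coefficients.  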
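
As a hedged illustration of detection, the sketch below computes the VIF directly from its definition, \( 1 / (1 - R_j^2) \), using NumPy. The `vif` helper and the synthetic predictors are hypothetical; `x2` is deliberately built as a near copy of `x1`.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (regress it on the other columns)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical predictors: x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
corr = np.corrcoef(X, rowvar=False)  # pairwise correlation matrix
```

Here `x1` and `x2` get large VIFs and a pairwise correlation close to 1, while `x3` stays near 1, so dropping either `x1` or `x2` would be a reasonable first fix.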

---

### **Q7: Polynomial Regression Model**
Polynomial regression extends linear regression by adding polynomial terms to capture non-linear relationships.

Equation:

\[
y = \beta_0 + \beta_1x + \beta_2x^2 + \dots + \beta_nx^n + \epsilon
\]

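**Difference from Linear Regression**:  
- Linear regression with a single predictor fits a **straight line**.  
- Polynomial regression fits a **curve** (quadratic, cubic, etc.) by adding powers of \( x \) as extra predictors. The model is still linear in the coefficients \( \beta_i \), so it can be fitted with the same least-squares method.  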

**Example**: Predicting house prices where the effect of size is non-linear.  
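
A minimal sketch of this idea: build a design matrix with columns \( 1, x, x^2 \) and solve the same least-squares problem as in linear regression. The data are hypothetical and noise-free, generated from a known quadratic.

```python
import numpy as np

# Hypothetical data from y = 1 + 2x + 0.5x^2.
x = np.linspace(-2, 2, 21)
y = 1.0 + 2.0 * x + 0.5 * x**2

# Polynomial features: columns are 1, x, x^2.
X_poly = np.vander(x, N=3, increasing=True)
beta_poly, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

# Straight-line fit on the same data, for comparison.
X_line = np.vander(x, N=2, increasing=True)
beta_line, *_ = np.linalg.lstsq(X_line, y, rcond=None)

mse_poly = np.mean((y - X_poly @ beta_poly) ** 2)
mse_line = np.mean((y - X_line @ beta_line) ** 2)
```

The quadratic fit recovers the generating coefficients, while the straight line leaves a systematic error because it cannot represent the curvature.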

---

### **Q8: Advantages and Disadvantages of Polynomial Regression**
**Advantages**:  
- Captures non-linear relationships.  
- More flexible than simple linear regression.  

**Disadvantages**:  
- Prone to overfitting when the polynomial degree is high.  
- More complex to interpret and more expensive to fit, since each added degree adds a term.  

**When to Use?**  
- When the relationship between the variables is non-linear.  
- When a linear fit leaves large errors or a curved pattern in the residuals.  
- Use **cross-validation** to choose the degree and check for overfitting.  
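
One way to run such a check is k-fold cross-validation over candidate degrees. The sketch below uses NumPy's `polyfit` and `polyval`; the `cv_mse` helper, the synthetic quadratic data, the fold count, and the candidate degrees are all assumptions made for illustration.

```python
import numpy as np

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean held-out squared error of a degree-`degree` polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)  # fit on the training folds
        pred = np.polyval(coeffs, x[test])               # predict the held-out fold
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Hypothetical noisy data with a quadratic trend.
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 60)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=1.0, size=x.size)

scores = {degree: cv_mse(x, y, degree) for degree in (1, 2, 6)}
best_degree = min(scores, key=scores.get)
```

A degree whose held-out error is clearly lower than the linear fit supports using polynomial regression. If a higher degree lowers the training error but not the held-out error, that points to overfitting.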
