# **Best Fit Line**

A **best fit line** is the straight line that most closely describes the relationship between an independent variable ($x$) and a dependent variable ($y$) in regression analysis. It is chosen to minimize the overall error between the predicted values and the actual data points, most commonly measured as the mean of the squared vertical distances between them.

---

## **How to Find the Best Fit Line**

The equation of the best fit line is:  
$$ y = mx + c $$  
Where:
- $m$ = slope of the line (how steep it is),
- $c$ = y-intercept (where the line crosses the y-axis).

To determine $m$ and $c$, we minimize a cost function that measures the prediction error, using an optimization technique such as **Gradient Descent**.
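To make the idea of a cost concrete, the sketch below evaluates two candidate lines on a small, made-up dataset. The function names `predict` and `mean_squared_error` and the data values are assumptions chosen for illustration; the data lie exactly on $y = 2x + 1$, so that line should have zero error.

```python
def predict(m, c, xs):
    """Evaluate y = m*x + c for each x."""
    return [m * x + c for x in xs]


def mean_squared_error(m, c, xs, ys):
    """Average squared vertical distance between the line and the points."""
    n = len(xs)
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / n


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

cost_flat = mean_squared_error(0.0, 0.0, xs, ys)  # poor guess: (1 + 9 + 25 + 49) / 4 = 21.0
cost_true = mean_squared_error(2.0, 1.0, xs, ys)  # the generating line: 0.0
```

The lower the cost, the better the line fits; finding the best fit line amounts to searching for the $(m, c)$ pair with the lowest cost.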

---

## **Gradient Descent Technique**

### **Overview**
Gradient Descent is an iterative optimization algorithm that minimizes the cost function by adjusting the model parameters ($m$ and $c$) step by step in the direction of steepest descent, that is, opposite to the gradient.

---

### **Steps in Gradient Descent**

1. **Initialize Parameters:**
   - Start with initial guesses for $m$ and $c$ (e.g., $m = 0$, $c = 0$).

2. **Define the Cost Function:**
   - For linear regression, the cost function is Mean Squared Error (MSE):
     $$ J(m, c) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (mx_i + c) \right)^2 $$

3. **Compute the Gradients:**
   - Partial derivatives of the cost function with respect to $m$ and $c$ are:
     $$ \frac{\partial J}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \cdot \left( y_i - (mx_i + c) \right) $$
     $$ \frac{\partial J}{\partial c} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - (mx_i + c) \right) $$

4. **Update the Parameters:**
   - Update $m$ and $c$ using the gradients and a learning rate ($\alpha$):
     $$ m_{\text{new}} = m_{\text{old}} - \alpha \cdot \frac{\partial J}{\partial m} $$
     $$ c_{\text{new}} = c_{\text{old}} - \alpha \cdot \frac{\partial J}{\partial c} $$
   - The learning rate determines the step size. A small $\alpha$ slows convergence, while a large $\alpha$ risks overshooting the minimum and, if it is too large, causes the cost to grow instead of shrink.

5. **Iterate Until Convergence:**
   - Repeat the update step until the cost function converges to a minimum or the change in parameters becomes negligible.
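The five steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the function names, the default learning rate `alpha=0.05`, the iteration cap, the tolerance, and the toy data are all assumptions, and the learning rate in particular would need tuning for other datasets.

```python
def gradients(m, c, xs, ys):
    """Partial derivatives of the MSE cost with respect to m and c."""
    n = len(xs)
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    grad_m = -2.0 / n * sum(x * r for x, r in zip(xs, residuals))
    grad_c = -2.0 / n * sum(residuals)
    return grad_m, grad_c


def gradient_descent(xs, ys, alpha=0.05, max_iters=10000, tol=1e-12):
    """Batch gradient descent for the line y = m*x + c."""
    m, c = 0.0, 0.0  # Step 1: initial guesses
    for _ in range(max_iters):
        grad_m, grad_c = gradients(m, c, xs, ys)  # Step 3: gradients of the cost
        new_m = m - alpha * grad_m                # Step 4: update rule
        new_c = c - alpha * grad_c
        converged = abs(new_m - m) < tol and abs(new_c - c) < tol
        m, c = new_m, new_c
        if converged:                             # Step 5: stop when updates are negligible
            break
    return m, c


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, c = gradient_descent(xs, ys)  # m approaches 2 and c approaches 1
```

The stopping rule here checks the change in parameters; checking the change in the cost between iterations would be an equally reasonable choice.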

---

### **Visualizing Gradient Descent**
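- **Cost Function Landscape:**
  Gradient Descent aims to find the global minimum of the cost function. For linear regression with MSE, $J(m, c)$ is a convex, bowl-shaped surface over $m$ and $c$, so any minimum the algorithm reaches is a global one.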
- **Parameter Updates:** 
  Each iteration moves $m$ and $c$ closer to the values that minimize $J(m, c)$.
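One way to see this behaviour without drawing the surface is to record $J(m, c)$ after every update and compare learning rates. The sketch below does that on made-up data; the helper names `mse` and `cost_history`, the two learning rates, and the iteration count are assumptions picked for illustration.

```python
def mse(m, c, xs, ys):
    """Mean squared error of the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cost_history(xs, ys, alpha, iters):
    """Run gradient descent from m = c = 0 and record the cost after each update."""
    n = len(xs)
    m, c = 0.0, 0.0
    history = [mse(m, c, xs, ys)]
    for _ in range(iters):
        residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
        grad_m = -2.0 / n * sum(x * r for x, r in zip(xs, residuals))
        grad_c = -2.0 / n * sum(residuals)
        m -= alpha * grad_m
        c -= alpha * grad_c
        history.append(mse(m, c, xs, ys))
    return history


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

small_step = cost_history(xs, ys, alpha=0.05, iters=50)  # cost falls at every step
large_step = cost_history(xs, ys, alpha=0.3, iters=50)   # steps overshoot; cost grows
```

Plotting `small_step` against the iteration number gives the familiar downward-sloping curve, while `large_step` shows what overshooting the bottom of the bowl looks like on this particular dataset.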

---

### **Advantages of Gradient Descent**
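- Scales to large datasets, particularly when gradients are computed on subsets of the data (see the variants below).
- Reliable for convex cost functions such as MSE in linear regression, where a suitably small learning rate leads to the global minimum.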

---

### **Variants of Gradient Descent**
1. **Batch Gradient Descent:**
   Uses the entire dataset to compute the gradients in each iteration. Updates are stable, but each one is computationally expensive for large datasets.
2. **Stochastic Gradient Descent (SGD):**
   Updates the parameters after each individual data point. Each update is cheap, but the path toward the minimum is noisier.
3. **Mini-Batch Gradient Descent:**
   Updates the parameters using small batches of data, balancing the stability of batch updates against the speed of SGD.
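The three variants differ only in how many data points feed each gradient estimate, so a single function with a `batch_size` parameter can sketch all of them. In the example below, the function name, the defaults (`alpha=0.02`, `epochs=2000`, `seed=0`), and the toy data are assumptions for illustration; setting `batch_size` to the dataset size gives batch gradient descent, and setting it to 1 gives SGD.

```python
import random


def minibatch_gradient_descent(xs, ys, alpha=0.02, batch_size=2, epochs=2000, seed=0):
    """Fit y = m*x + c, updating the parameters once per batch of points."""
    rng = random.Random(seed)        # fixed seed so runs are repeatable
    indices = list(range(len(xs)))
    m, c = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(indices)         # visit the points in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            residuals = [ys[i] - (m * xs[i] + c) for i in batch]
            # Gradient of the MSE restricted to the current batch.
            grad_m = -2.0 / len(batch) * sum(xs[i] * r for i, r in zip(batch, residuals))
            grad_c = -2.0 / len(batch) * sum(residuals)
            m -= alpha * grad_m
            c -= alpha * grad_c
    return m, c


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

m_batch, c_batch = minibatch_gradient_descent(xs, ys, batch_size=len(xs))  # batch
m_sgd, c_sgd = minibatch_gradient_descent(xs, ys, batch_size=1)            # SGD
m_mini, c_mini = minibatch_gradient_descent(xs, ys, batch_size=2)          # mini-batch
```

Because this toy data has no noise, all three settings approach the same line. On noisy real data, the SGD and mini-batch estimates would typically keep fluctuating around the minimum unless the learning rate is reduced over time.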

---

## **Summary**
- The **best fit line** is determined by minimizing the cost function, typically MSE.
- **Gradient Descent** optimizes the line's parameters ($m$ and $c$) iteratively by stepping against the gradient of the cost function until convergence.
