### Cost Function Explanation

In this lecture, we explore the cost function in detail and how it helps us find the best parameters for a model, using linear regression as the example. Let's break it down step by step.

#### Recap: The Linear Model

You want to fit a straight line to your training data. Your model is defined as:

$$
f_{w, b}(x) = wx + b
$$
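As a minimal sketch (in Python, which the original notes do not use), the model is a one-line function. The name `predict` and the sample $ (w, b) $ pairs are illustrative choices, not part of the lecture. Each pair produces a different straight line:

```python
def predict(x: float, w: float, b: float) -> float:
    """Linear model f_{w,b}(x) = w * x + b."""
    return w * x + b


xs = [1.0, 2.0, 3.0]
# Each (w, b) pair defines a different straight line through these inputs.
for w, b in [(1.0, 0.0), (0.5, 1.0), (2.0, -1.0)]:
    print(f"w={w}, b={b}:", [predict(x, w, b) for x in xs])
# w=1.0, b=0.0: [1.0, 2.0, 3.0]
# w=0.5, b=1.0: [1.5, 2.0, 2.5]
# w=2.0, b=-1.0: [1.0, 3.0, 5.0]
```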

where:

- $ w $ and $ b $ are the model parameters (the weight and the bias, respectively),
- $ x $ is the input feature, and
- $ f_{w, b}(x) $ is the predicted output.

For different values of $ w $ and $ b $, you get different straight lines, and you want to find the values of $ w $ and $ b $ that minimize the difference between the predicted values $ f_{w, b}(x) $ and the actual values $ y $.

#### What the Cost Function Does

The cost function $ J(w, b) $ measures how far the model's predictions are from the true values across the training set. The goal is to minimize it, that is, to find the $ w $ and $ b $ that make the cost as small as possible:

$$
\min_{w, b} J(w, b)
$$

To build intuition, we'll first simplify the model and see how the cost function works when there is only one parameter, $ w $, with $ b = 0 $.

#### Simplified Model: $ b = 0 $

For simplicity, let's assume that the bias term $ b $ is set to zero, so the model becomes:

$$
f_w(x) = wx
$$

This leaves us with just one parameter, $ w $. The cost function now depends only on $ w $ and is defined as half the average of the squared errors between the predicted values and the true values:

$$
J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left(f_w(x^{(i)}) - y^{(i)}\right)^2
$$

where:

- $ m $ is the number of training examples,
- $ f_w(x^{(i)}) $ is the predicted value for the $ i $-th training example,
- $ y^{(i)} $ is the true value for the $ i $-th training example.
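A direct translation of $ J(w) $ into Python might look like the sketch below. The function name `compute_cost` and the plain-list inputs are assumptions for illustration; the data is the three-point training set used in the examples later in this lecture.

```python
def compute_cost(w: float, xs: list[float], ys: list[float]) -> float:
    """Return J(w) = (1 / (2m)) * sum_i (w * x_i - y_i)^2 for the model f_w(x) = w * x."""
    m = len(xs)
    if m == 0 or m != len(ys):
        raise ValueError("xs and ys must be non-empty and have the same length")
    # Squared error between the prediction w * x and the true value y, per example.
    squared_errors = [(w * x - y) ** 2 for x, y in zip(xs, ys)]
    # Average over the m examples, with the extra factor of 1/2 from the definition.
    return sum(squared_errors) / (2 * m)


xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
print(compute_cost(1.0, xs, ys))  # 0.0
```

The factor of $ \frac{1}{2} $ does not change which $ w $ minimizes the cost; by convention it is included so that the derivative of $ J(w) $ comes out cleaner.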

Our goal is to find the value of $ w $ that minimizes this cost function $ J(w) $.

#### Visualizing the Cost Function

Let's now see how the cost changes as we choose different values for $ w $, comparing each line $ f_w(x) $ with the cost $ J(w) $ it produces, using a training set of three points: (1, 1), (2, 2), and (3, 3).

1. **For $ w = 1 $**:

   - The model $ f_w(x) = x $ fits the data perfectly for the points (1, 1), (2, 2), and (3, 3), as the line passes exactly through all of them.
   - The cost function $ J(1) = 0 $, as there is no error between the predicted values and the actual values.

2. **For $ w = 0.5 $**:

   - The model $ f_w(x) = 0.5x $ now underestimates the values for each point.
   - The squared errors for each point are:
     - For $ x = 1 $, the predicted value is $ 0.5 $, so the squared error is $ (0.5 - 1)^2 = 0.25 $.
     - For $ x = 2 $, the predicted value is $ 1 $, so the squared error is $ (1 - 2)^2 = 1 $.
     - For $ x = 3 $, the predicted value is $ 1.5 $, so the squared error is $ (1.5 - 3)^2 = 2.25 $.
   - The cost $ J(0.5) $ is the sum of these squared errors divided by $ 2m = 2 \times 3 $:

   $$
   J(0.5) = \frac{1}{2 \times 3} \left(0.25 + 1 + 2.25\right) = \frac{3.5}{6} \approx 0.58
   $$

3. **For $ w = 0 $**:

   - The model $ f_w(x) = 0 $ is just a horizontal line at $ y = 0 $, and the errors are quite large.
   - The squared errors for each point are:
     - For $ x = 1 $, the squared error is $ (0 - 1)^2 = 1 $,
     - For $ x = 2 $, the squared error is $ (0 - 2)^2 = 4 $,
     - For $ x = 3 $, the squared error is $ (0 - 3)^2 = 9 $.
   - The cost $ J(0) $ is the sum of these squared errors divided by $ 2m = 2 \times 3 $:

   $$
   J(0) = \frac{1}{2 \times 3} \left(1 + 4 + 9\right) = \frac{14}{6} \approx 2.33
   $$
   $$
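The three hand calculations above can be checked with a few lines of Python. This is a sketch; `J` is just a local helper mirroring the formula for $ J(w) $ with $ b = 0 $.

```python
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
m = len(xs)


def J(w: float) -> float:
    """Cost J(w) = (1 / (2m)) * sum_i (w * x_i - y_i)^2 on the three-point training set."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


for w in (1.0, 0.5, 0.0):
    print(f"J({w}) = {J(w):.2f}")
# J(1.0) = 0.00
# J(0.5) = 0.58
# J(0.0) = 2.33
```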

#### How the Cost Function Relates to the Line

As we change $ w $, the line $ f_w(x) $ changes its slope, and the cost function $ J(w) $ reflects how well the model fits the data. The closer the predicted values are to the true values, the smaller the cost $ J(w) $.

- When $ w = 1 $, the model fits the data perfectly, and the cost is zero.
- As $ w $ moves away from 1, the cost increases because the model's predictions get further from the true values.
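For this particular training set, where every $ y^{(i)} $ equals $ x^{(i)} $, the cost can be written in closed form, which makes the claim above exact:

$$
J(w) = \frac{1}{2 \times 3} \sum_{i=1}^{3} \left(w x^{(i)} - x^{(i)}\right)^2 = \frac{(w - 1)^2}{6} \left(1^2 + 2^2 + 3^2\right) = \frac{7}{3} (w - 1)^2
$$

This is a parabola in $ w $ whose lowest point is $ J(1) = 0 $. It also gives $ J(0.5) = \frac{7}{12} \approx 0.58 $ and $ J(0) = \frac{7}{3} \approx 2.33 $, matching the values computed earlier.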

#### Minimizing the Cost Function

To find the optimal value for $ w $, we need to minimize the cost function $ J(w) $. This can be done using optimization techniques like gradient descent. The value of $ w $ that minimizes the cost function will give us the best fit for the data.
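As a minimal sketch of that idea (not the lecture's own code), the loop below runs plain gradient descent on $ J(w) $ for the three-point training set. It uses the derivative $ \frac{dJ}{dw} = \frac{1}{m} \sum_{i=1}^{m} \left(w x^{(i)} - y^{(i)}\right) x^{(i)} $, obtained by differentiating $ J(w) $. The function name `gradient_descent`, the starting value `w0 = 0`, the learning rate `alpha = 0.1`, and the 100 iterations are arbitrary illustrative choices.

```python
def gradient_descent(xs, ys, w0=0.0, alpha=0.1, num_iters=100):
    """Minimize J(w) = (1/(2m)) * sum (w*x - y)^2 with plain gradient descent (b fixed at 0)."""
    m = len(xs)
    w = w0
    for _ in range(num_iters):
        # dJ/dw = (1/m) * sum_i (w * x_i - y_i) * x_i
        grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / m
        # Step against the gradient, scaled by the learning rate.
        w -= alpha * grad
    return w


w_best = gradient_descent([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(f"w after gradient descent: {w_best:.4f}")  # w after gradient descent: 1.0000
```

For this data the gradient simplifies to $ \frac{14}{3}(w - 1) $, so each step shrinks the distance to $ w = 1 $ by the factor $ 1 - \frac{14}{3}\alpha $. A learning rate above $ \frac{3}{7} \approx 0.43 $ would make the updates overshoot by more each step and diverge.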

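In this example, $ w = 1 $ gives the smallest possible cost: $ J(w) $ is built from squared errors and so can never be negative, which makes $ J(1) = 0 $ the minimum and $ w = 1 $ a perfect fit to the data.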

#### Summary

- The cost function $ J(w) $ measures the difference between the model's predictions and the actual values.
- We use the cost function to find the best parameters (in this case, $ w $) that minimize the error.
- When the model fits the data well, the cost is small.
- The goal of linear regression is to find the parameters that minimize the cost ($ J(w) $ in this simplified case, $ J(w, b) $ in general), which gives the best-fit line.
