### Linear Regression: Cost Function and Its Role

In linear regression, one of the first essential steps is defining the **cost function**, which measures how far the model's predictions are from the true targets. Training uses this measurement to adjust the model and improve its predictions.

#### The Training Set

We start with a **training set** that contains:

- Input features $ x $
- Output targets $ y $

The model we're going to use is a linear function:

$$
f_{w,b} (x) = wx + b
$$

Here, $ w $ and $ b $ are called the **parameters** of the model. These parameters are adjusted during training to improve the model's predictions.

- $ w $ is referred to as the **weight** or **coefficient**, and it determines the slope of the line.
- $ b $ is the **bias** or **intercept**, and it determines where the line crosses the y-axis.

---

![Example](images/TrainingSet.png)

---

#### Effect of Parameters $ w $ and $ b $

The values of $ w $ and $ b $ affect the model's output in the following ways:

- When $ w = 0 $ and $ b = 1.5 $, the function $ f(x) $ is constant:

  $$
  f(x) = 0 \cdot x + 1.5 = 1.5
  $$

  This creates a **horizontal line** at $ y = 1.5 $.

- When $ w = 0.5 $ and $ b = 0 $, the function is:

  $$
  f(x) = 0.5x
  $$

  The line passes through the origin (0, 0), and the slope is $ 0.5 $.

- When $ w = 0.5 $ and $ b = 1 $, the function is:

  $$
  f(x) = 0.5x + 1
  $$

  The line has a slope of $ 0.5 $ and crosses the y-axis at $ y = 1 $.
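
The three cases above can be checked numerically. The following is a minimal sketch: the helper name `predict` and the sample inputs in `xs` are assumptions made for illustration and do not come from the text.

```python
def predict(x, w, b):
    """Linear model f_{w,b}(x) = w * x + b for a single input x."""
    return w * x + b

xs = [0.0, 1.0, 2.0]  # illustrative inputs (assumed, not from the text)

# The three (w, b) pairs discussed above.
for w, b in [(0.0, 1.5), (0.5, 0.0), (0.5, 1.0)]:
    ys = [predict(x, w, b) for x in xs]
    print(f"w={w}, b={b}: {ys}")
```

With $ w = 0 $ every output equals $ b $, and with $ b = 0 $ the output at $ x = 0 $ is $ 0 $, which matches the horizontal line and the line through the origin described above.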

---

![Linear graph](images/LinearGraph.png)

---

#### Training Set Example

Let's assume a dataset with $ m $ examples, each consisting of an input $ x^i $ and a target $ y^i $. The superscript $ i $ is an index identifying the $ i $-th training example, not an exponent. For a given input $ x^i $, the model predicts $ \hat{y}^i $ using the linear function $ f_{w,b}(x^i) = wx^i + b $.
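
As a rough illustration of this notation, the sketch below computes $ \hat{y}^i $ for each example of a tiny training set. The data values and the choice $ w = 0.5 $, $ b = 1 $ are invented for the example and are not fitted to anything.

```python
# Hypothetical training set with m = 3 examples (values invented for illustration).
x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]
m = len(x_train)

w, b = 0.5, 1.0  # arbitrary parameter values, not fitted

# y_hat[i] is the prediction w * x + b for the corresponding training input.
y_hat = [w * x + b for x in x_train]

for i in range(m):
    # Python lists are 0-indexed, so example number i + 1 lives at position i.
    print(f"i={i + 1}: x={x_train[i]}, y={y_train[i]}, y_hat={y_hat[i]}")
```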

#### The Goal of Linear Regression

The objective is to find values of $ w $ and $ b $ such that the predicted values $ \hat{y}^i $ are as close as possible to the true target values $ y^i $ for all the training examples.

#### Cost Function: Measuring the Error

The **cost function** measures how far off the predictions $ \hat{y}^i $ are from the true values $ y^i $. The error for each prediction is:

$$
\text{Error} = \hat{y}^i - y^i
$$

Errors can be positive or negative, so we square each one; this keeps errors of opposite sign from cancelling when they are added together. For each example $ i $, the squared error is:

$$
\text{Squared Error} = (\hat{y}^i - y^i)^2
$$

#### Total and Average Error

To measure the overall error across all training examples, we sum the squared errors:

$$
\text{Total Squared Error} = \sum_{i=1}^m (\hat{y}^i - y^i)^2
$$

The total grows with the number of training examples $ m $, so a larger training set would produce a larger total even if the fit were no worse. To remove this dependence, we divide by $ m $ to get the **average squared error**:

$$
\text{Average Squared Error} = \frac{1}{m} \sum_{i=1}^m (\hat{y}^i - y^i)^2
$$

By convention, we also divide by 2, which simplifies the derivatives used later in optimization:

$$
J(w, b) = \frac{1}{2m} \sum_{i=1}^m (\hat{y}^i - y^i)^2
$$

This is the **cost function** $ J(w, b) $, also called the **squared error cost function**. The reason it's called squared error is that we're squaring the difference between the predicted and actual values.
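
The formula can be translated into code almost term by term. This is a minimal sketch under stated assumptions: the function name `compute_cost` and the training data are hypothetical and chosen so the results are easy to verify by hand.

```python
def compute_cost(x_train, y_train, w, b):
    """Squared error cost J(w, b) = (1 / (2m)) * sum((y_hat - y)^2)."""
    m = len(x_train)
    total_squared_error = 0.0
    for x, y in zip(x_train, y_train):
        y_hat = w * x + b                  # prediction for this example
        error = y_hat - y                  # signed error
        total_squared_error += error ** 2  # accumulate the squared error
    return total_squared_error / (2 * m)

# Hypothetical data lying exactly on the line y = x.
x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]

print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # perfect fit, so the cost is 0.0
print(compute_cost(x_train, y_train, w=0.5, b=0.0))  # (0.25 + 1.0 + 2.25) / 6
```

The loop mirrors the steps above: compute each error, square it, sum over all examples, then divide by $ 2m $.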

#### Why the Division by 2?

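Dividing by 2 scales $ J(w, b) $ by a constant, so it does not change which values of $ w $ and $ b $ minimize it. It does simplify later calculations: when $ J(w, b) $ is differentiated during optimization, the factor of 2 produced by the squared term cancels the $ \frac{1}{2} $.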
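
To see where the factor of 2 goes, here is a short derivation sketch. It substitutes $ \hat{y}^i = wx^i + b $ and differentiates $ J(w, b) $ with respect to each parameter using the chain rule:

$$
\frac{\partial J}{\partial w} = \frac{1}{2m} \sum_{i=1}^m 2 (wx^i + b - y^i) \, x^i = \frac{1}{m} \sum_{i=1}^m (\hat{y}^i - y^i) \, x^i
$$

$$
\frac{\partial J}{\partial b} = \frac{1}{2m} \sum_{i=1}^m 2 (wx^i + b - y^i) = \frac{1}{m} \sum_{i=1}^m (\hat{y}^i - y^i)
$$

Without the $ \frac{1}{2} $, both derivatives would carry an extra factor of 2, which changes their scale but not where they equal zero.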

#### Interpretation of the Cost Function

The cost function $ J(w, b) $ measures how well the model's predictions match the true targets. The goal in linear regression is to adjust the parameters $ w $ and $ b $ so that the cost function $ J(w, b) $ is as small as possible.

- If $ J(w, b) $ is **large**, it means the model's predictions are far from the actual values.
- If $ J(w, b) $ is **small**, the model’s predictions are close to the actual values.
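
To make the large versus small distinction concrete, the sketch below evaluates the cost for a few values of $ w $ with $ b $ held at 0. The helper `compute_cost`, the data, and the candidate values of $ w $ are all assumptions chosen for illustration; the data lie exactly on the line $ y = x $, so $ w = 1 $ is the best fit.

```python
def compute_cost(xs, ys, w, b):
    """Squared error cost J(w, b) over the training set (xs, ys)."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical data generated by y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

candidate_ws = [0.0, 0.5, 1.0, 1.5]  # illustrative values, b fixed at 0
costs = [compute_cost(xs, ys, w, b=0.0) for w in candidate_ws]

for w, cost in zip(candidate_ws, costs):
    print(f"w={w}: J={cost:.4f}")
```

The cost shrinks as $ w $ approaches 1, reaches 0 at $ w = 1 $, and rises again once $ w $ overshoots.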

---

![Example](images/CFF.png)

---

In the next steps, we will use optimization techniques, such as **gradient descent**, to adjust $ w $ and $ b $ to minimize $ J(w, b) $.
