# Cost Function for Linear Regression

The cost function measures how well the model's predictions match the training data. Loss is the error for a single data point, while cost is the average loss over the entire training dataset.

Minimizing the cost with respect to $w$ and $b$ yields the parameter values that best fit the training data. For the linear model $f_{w,b}(x) = wx + b$ and $n$ training examples, the cost is:

$$ J(w, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 $$
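
As a quick check of the formula, consider a hypothetical three-point dataset $(1, 1)$, $(2, 2)$, $(3, 3)$ (chosen for illustration, not the training data used in this notebook) with the linear model $f_{w,b}(x) = wx + b$. With $w = 1$, $b = 0$ every prediction is exact, so $J(1, 0) = 0$. With $w = 0.5$, $b = 0$ the predictions are $0.5$, $1$, $1.5$, and:

$$ J(0.5, 0) = \frac{1}{2 \cdot 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] = \frac{0.25 + 1 + 2.25}{6} = \frac{3.5}{6} \approx 0.583 $$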



In [1]:
%matplotlib ipympl
import numpy as np
import matplotlib.pyplot as plt

from utils_common import generate_data
from utils_grad_dec import plt_intuition
plt.style.use('ggplot')

In [2]:
data_set = np.sort(generate_data(1, 2, 300, 500, 10, 1))

x_train = data_set[0]  # features
y_train = data_set[1]  # target values

## Computing Cost


The code below calculates cost by looping over each example. In each iteration:
- a prediction, `f_wb`, is calculated;
- the difference between the prediction and the target is calculated and squared;
- the squared difference is added to the running total.

After the loop, the total is divided by $2n$ to give the cost.

In [3]:
def compute_cost(x, y, w, b):
    """
    Computes the cost function for linear regression.

    Args:
      x (ndarray (n,)): Data, n examples
      y (ndarray (n,)): target values
      w,b (scalar)    : model parameters
    Returns:
      total_cost (float): The cost of using w,b as the parameters for
        linear regression to fit the data points in x and y
    """
    # number of training examples
    n = x.shape[0] 
    
    cost_sum = 0 
    for i in range(n): 
        f_wb = w * x[i] + b   
        cost = (f_wb - y[i]) ** 2  
        cost_sum = cost_sum + cost  
    total_cost = (1 / (2 * n)) * cost_sum  

    return total_cost
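
The same quantity can also be computed without an explicit Python loop. The following is a minimal sketch, assuming `x` and `y` are one-dimensional arrays of equal length; the name `compute_cost_vectorized` and the three-point demo data are hypothetical and are not part of this notebook's utilities.

```python
import numpy as np


def compute_cost_vectorized(x, y, w, b):
    """Half the mean squared error of the linear model f(x) = w * x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prediction error for every example at once
    residuals = w * x + b - y
    # mean(residuals ** 2) / 2 equals (1 / (2 * n)) * sum(residuals ** 2)
    return float(np.mean(residuals ** 2) / 2)


# Hypothetical demo data lying exactly on the line y = x
x_demo = np.array([1.0, 2.0, 3.0])
y_demo = np.array([1.0, 2.0, 3.0])

cost_exact = compute_cost_vectorized(x_demo, y_demo, w=1.0, b=0.0)
cost_off = compute_cost_vectorized(x_demo, y_demo, w=0.5, b=0.0)
print(cost_exact, cost_off)
```

For a perfect fit the cost is zero, and it grows as the line moves away from the data. On large datasets the array form is typically much faster than the loop, although both evaluate the same formula.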

## Cost Function Intuition

Below, use the slider control to select the value of $w$ that minimizes cost. It can take a few seconds for the plot to update.

In [4]:
plt_intuition(x_train,y_train)
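
What the slider does by hand can be approximated in code by evaluating the cost over a range of candidate $w$ values and keeping the smallest. This is a rough sketch under stated assumptions: the toy data, the fixed intercept `b_fixed`, and the candidate range are all hypothetical, and the helper `cost` is defined here only to keep the example self-contained.

```python
import numpy as np


def cost(x, y, w, b):
    """Half the mean squared error of the linear model f(x) = w * x + b."""
    return float(np.mean((w * x + b - y) ** 2) / 2)


# Hypothetical toy data lying exactly on the line y = 200 * x + 100
x_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_demo = 200.0 * x_demo + 100.0

b_fixed = 100.0                      # assumed intercept, held constant
w_candidates = np.arange(-150, 451)  # assumed integer range of slopes to try

# Evaluate the cost for every candidate slope and pick the minimum
costs = np.array([cost(x_demo, y_demo, w, b_fixed) for w in w_candidates])
best_w = int(w_candidates[np.argmin(costs)])
print(best_w, costs.min())
```

Plotting `costs` against `w_candidates` would show a bowl-shaped curve with a single minimum, which is the shape the interactive plot is meant to convey.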
