# Ng Machine Learning

#### Hypothesis and Linear Regression

Once a training set is fed to a learning algorithm, the algorithm outputs a function called the <b>hypothesis</b>, denoted by $h$. The job of the hypothesis function is to take an input, say the size of a house, and output a prediction, here the price of that house.

Therefore, $h$ maps from $x$ to $y$, where $x$ is the independent variable (size of the house) and $y$ is the dependent variable (price of the house).

There are many ways to represent the hypothesis $h$. A simple one is a straight line:

$h_{\theta}(x) = {\theta}_0 + {\theta}_1 x$

This means that the prediction $h_{\theta}(x)$ is a linear function of $x$. With a single input variable, this is called <b>Univariate Linear Regression</b> (or single variable linear regression).
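As a minimal sketch, the hypothesis can be written as a plain Python function. The name `hypothesis`, the parameter values and the house sizes below are made up for illustration and are not from the course material.

```python
def hypothesis(theta0, theta1, x):
    """Return the prediction h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Hypothetical parameters and house sizes, chosen only for illustration.
theta0, theta1 = 50.0, 0.25
sizes = [1000, 1500, 2000]

# One predicted price per house size.
predicted_prices = [hypothesis(theta0, theta1, size) for size in sizes]
print(predicted_prices)
```

Changing `theta0` shifts the line up or down, and changing `theta1` changes its slope.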

#### Univariate Linear Regression and Cost Function

<b>m</b>: Number of training examples

In the equation $h_{\theta}(x) = {\theta}_0 + {\theta}_1 x$, the ${\theta}_i$ are called the <b>parameters</b> of the model.

${\theta}_0$ and ${\theta}_1$ determine what the hypothesis line looks like: ${\theta}_0$ is the intercept and ${\theta}_1$ is the slope.

For example, if ${\theta}_0 = 1.5$ and ${\theta}_1 = 0$, we get a straight horizontal line at height $1.5$.

So the idea is to find the ${\theta}_0$ and ${\theta}_1$ values that minimize the cost function.

In other words, we choose ${\theta}_0$ and ${\theta}_1$ so that $h_{\theta}(x)$ is close to $y$ for our training examples $(x,y)$.

So we want $$ J({\theta_0,\theta_1}) = \frac{1}{2m} \sum _{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right)^2$$ to be as small as possible. Each term in the sum is the squared difference between the output of the hypothesis and the actual price of the house. Here $m$ is the number of training examples: dividing by $m$ averages the squared error over all the training examples, and the extra factor of $\frac{1}{2}$ is there only to make the math easier (it cancels the 2 that appears when the square is differentiated).

Here $J({\theta_0,\theta_1})$ is the <b>squared error cost function</b>.

Our goal is to minimize this cost function.
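The squared error cost can be sketched directly from its formula. The function names and the toy data below are assumptions made for illustration. The data lie exactly on the line $y = 2x$, so one of the two parameter choices fits perfectly.

```python
def hypothesis(theta0, theta1, x):
    """Prediction h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) over the m training examples."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Hypothetical toy training set lying on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect_fit_cost = cost(0.0, 2.0, xs, ys)  # the line y = 2x matches every point
worse_fit_cost = cost(0.0, 1.0, xs, ys)    # the line y = x misses every point
print(perfect_fit_cost, worse_fit_cost)
```

The cost is zero only when every prediction matches its target, and it grows as the line moves away from the data.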

#### Example with cost function:

Say our hypothesis is simplified so that ${\theta}_0 = 0$, which leaves the slope ${\theta}_1$ as the only parameter:

$h_{\theta}(x) = {\theta}_1 x$

So there are two key functions we're interested in:

$(a)$ The hypothesis function $h_{\theta}(x)$, which for a fixed ${\theta_1}$ is a function of $x$ (the size of the house), and

$(b)$ The cost function $J({\theta_1})$, which is a function of our parameter ${\theta_1}$ (the parameter that controls the slope of our line through the points).

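So if I choose ${\theta_1} = 1$, $h_{\theta}(x)$ will go through the points $(1,1)$, $(2,2)$, $(3,3)$ and so on. In this example the training examples are $(1,1)$, $(2,2)$ and $(3,3)$, so the line passes through all of them.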

Now what will the value of $J({\theta_1})$ (the cost function) be, if ${\theta_1} = 1$?

As a reminder, our cost function was: $$ J({\theta_0,\theta_1}) = \frac{1}{2m} \sum _{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right)^2$$

Where in our case ${\theta_0}$ will be $0$.

Since ${\theta_1} = 1$, the value inside the parentheses is $0$ for every training example (the predicted value equals the actual value), so the cost function is $0$.

Therefore, $J(1) = 0$

For each value of ${\theta_1}$, these were the values of $J({\theta_1})$:

<li> ${\theta_1} = 1$, $J({\theta_1}) = 0$ </li>
<li> ${\theta_1} = 0.5$, $J({\theta_1}) \approx 0.583$ </li>
<li> ${\theta_1} = 0$, $J({\theta_1}) \approx 2.33$ </li>
<li> ${\theta_1} = -0.5$, $J({\theta_1}) = 5.25$ </li>

So we want to choose the value of ${\theta_1}$ that minimizes $J({\theta_1})$, which is the first entry in the list above: ${\theta_1} = 1$, $J({\theta_1}) = 0$.
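The listed costs can be reproduced with a short sketch. It assumes the training set is the three points $(1,1)$, $(2,2)$, $(3,3)$, which is consistent with the values above. The function name `cost_theta1` is made up for illustration.

```python
def cost_theta1(theta1, xs, ys):
    """Squared error cost J(theta1) for the simplified hypothesis h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Assumed training set: the points (1, 1), (2, 2), (3, 3).
xs = [1, 2, 3]
ys = [1, 2, 3]

# Evaluate the cost for each candidate slope.
candidates = [1, 0.5, 0, -0.5]
costs = {theta1: cost_theta1(theta1, xs, ys) for theta1 in candidates}

for theta1, j in costs.items():
    print(f"theta1 = {theta1:>4}: J = {j:.3f}")

# The candidate with the smallest cost.
best_theta1 = min(costs, key=costs.get)
print("best theta1:", best_theta1)
```

Of the four candidates, ${\theta_1} = 1$ gives the smallest cost, matching the choice made above.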


#### Example with cost function:

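The previous example used a hypothesis with ${\theta}_0 = 0$, so the cost depended only on ${\theta}_1$.

If both ${\theta}_0$ and ${\theta}_1$ are allowed to vary, the cost function $J({\theta_0,\theta_1})$ becomes a bowl-shaped surface in 3D. The same surface can also be drawn in 2D as a contour plot, where each curve joins parameter pairs that have the same cost.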
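To get a feel for this shape without plotting, a rough sketch can evaluate $J({\theta_0,\theta_1})$ on a small grid of parameter values. The training set $(1,1)$, $(2,2)$, $(3,3)$ and the grid values are assumptions chosen for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Assumed training set: the points (1, 1), (2, 2), (3, 3).
xs = [1, 2, 3]
ys = [1, 2, 3]

# Hypothetical grid of parameter values around the best fit.
theta0_values = [-1.0, -0.5, 0.0, 0.5, 1.0]
theta1_values = [0.0, 0.5, 1.0, 1.5, 2.0]

# grid[i][j] holds J(theta0_values[i], theta1_values[j]).
grid = [[cost(t0, t1, xs, ys) for t1 in theta1_values] for t0 in theta0_values]

for t0, row in zip(theta0_values, grid):
    print(f"theta0 = {t0:>4}: " + "  ".join(f"{j:6.3f}" for j in row))
```

The smallest entry sits at ${\theta}_0 = 0$, ${\theta}_1 = 1$, the bottom of the bowl. Grid cells that hold equal values would lie on the same curve of a contour plot.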

#### Gradient Descent:

<b>Gradient Descent</b> is used to minimize the cost function. The idea is that we start with some ${\theta}_0$ and ${\theta}_1$ and keep changing them a little at a time to reduce $J({\theta_0,\theta_1})$, until we hopefully end up at a minimum.

This is similar to a person walking toward the lowest point in a valley by taking small steps, each time in the direction that goes downhill the fastest.

Gradient descent formula (applied for $j = 0$ and $j = 1$, updating both parameters simultaneously): ${\theta}_j := {\theta}_j - {\alpha}\frac{\partial}{\partial {\theta}_j}J({\theta_0,\theta_1})$

A few things to note here:
<li> ${\alpha}$ is the learning rate, i.e. how fast you want to descend (how big a step each update takes) </li>
<li> $\frac{\partial}{\partial {\theta}_j}J({\theta_0,\theta_1})$ is the partial derivative of the cost with respect to ${\theta}_j$; it tells you which direction to move in </li>
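The update rule can be sketched for univariate linear regression. For the squared error cost, the partial derivatives work out to $\frac{1}{m}\sum_{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right)$ for ${\theta}_0$ and $\frac{1}{m}\sum_{i=1}^m \left(h_\theta(X^{(i)})-Y^{(i)}\right)X^{(i)}$ for ${\theta}_1$. The function name, the learning rate, the iteration count and the training data below are assumptions chosen for illustration, not values from the course.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000, theta0=0.0, theta1=0.0):
    """Minimize the squared error cost for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    for _ in range(iterations):
        # Prediction error h_theta(x) - y for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients use the old parameter values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Assumed training set: the points (1, 1), (2, 2), (3, 3).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

On this data the parameters should approach ${\theta}_0 = 0$ and ${\theta}_1 = 1$. A learning rate that is too large can make the updates overshoot and diverge, while a very small one makes progress slow.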