# [Machine Learning, Andrew Ng, Stanford](https://www.coursera.org/learn/machine-learning)

<br>


# What Is Machine Learning?
***A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E***

**Example**: predicting the weather using a large amount of historical data:
1. *task* **T** = the weather prediction
2. *performance* **P** = the probability of correctly predicting the weather
3. *experience* **E** = the process of examining the large amount of historical data

## Supervised Learning
***In supervised learning we already know that there is a relationship between the input and the output and we have an idea regarding how the output should look***
### Regression
We are trying to map input variables to some continuous function (predicting results with a continuous, real-valued output)

**Example**: predict the weather, or the price of a product, or stocks, based on historical data
### Classification
We are trying to map input variables into discrete categories (predicting results in a discrete output)

**Example**: given a patient with a tumor, predict whether the tumor is malignant or benign (0, 1 values)
## Unsupervised Learning
***We have little to no idea what the results should look like. We can derive structure from data where we don't necessarily know the effect of the variables. Data can be clustered based on relationships among the variables in the data***

**Example**: Create groups (*Clustering*) from a large collection of genes that are related by different variables, such as lifespan, location, roles, etc.

**Example**: The *Cocktail Party Algorithm* allows us to find structure in chaotic environments, like identifying individual voices and music from a mesh of sounds at a party.
# Linear Regression with one Variable
## Model Representation
$m$ = number of training examples (e.g. rows in a table of data)

$x$ = input variable / features (e.g. living area or size of a house)

$y$ = "output" variable / "target" variable (e.g. price of the house)

![model_repr.png](images/model_represent.png "Model Representation")

$(x, y)$ = one training example

$(x^{(i)}, y^{(i)}) = i^{th}$ training example. $i$ being an ***index*** into the data set (e.g. $(x^{(3)}, y^{(3)})$ is the data on the 3rd row in the table above - (153, 315))
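
The notation above can be sketched in Python. This is only an illustration: the names `training_set` and `example` are hypothetical, and every row except the third one, (153, 315), is a made-up placeholder. Note that the notes count examples from 1, while Python lists count from 0.

```python
# Hypothetical training set of (x, y) pairs.
# Only the third row (153, 315) comes from the notes; the rest are made up.
training_set = [
    (100, 200),  # (x^(1), y^(1)), made-up placeholder
    (120, 250),  # (x^(2), y^(2)), made-up placeholder
    (153, 315),  # (x^(3), y^(3)), the row quoted in the notes
    (180, 360),  # (x^(4), y^(4)), made-up placeholder
]

# m = number of training examples
m = len(training_set)


def example(i):
    """Return the i-th training example, using the 1-based index of the notes."""
    return training_set[i - 1]


x3, y3 = example(3)
print(m, x3, y3)
```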
### How a supervised learning algorithm works
If we feed a *training set* to a *learning algorithm*, its job will be to output a function $h$ *(hypothesis)*. The *hypothesis* takes an input $x$ (e.g. the size of a house) and tries to output the estimated value of $y$ (the estimated price of the house).

![supervised_learning.jpg](images/supervised_learning.jpg "Supervised Learning")

In other words, $h$ is a function that maps from $x$'s to $y$'s
### Representing the hypothesis *$h$*
*When designing a learning algorithm, we need to decide how to represent the hypothesis $h$*

<span style="font-size:18px;">$h_\theta(x) =\theta_0 +\theta_1 x$</span>
(*note*: this may also be written as $h(x)$, without the $\theta$ subscript)

Example: if we take a data set of housing prices and plot it on an *$x, y$ plane*, with the *$x$ axis* representing the size of the houses and the *$y$ axis* representing the price, we are going to predict that *$y$ is a linear function of $x$* - prices increase with the size of the house:

<span style="font-size:18px;">$h_\theta(x) = \theta_0 + \theta_1 x$</span> (see the image below)

![representing_h.png](images/representing_h.png "Representing the hypothesis")

This particular model is named **Linear Regression With One Variable** (*$x$*), or **Univariate Linear Regression**.

## Cost Function (Squared error cost function)
The accuracy of $h$ can be measured using a **cost function** $J$. This function takes half the average of the squared differences between the hypothesis outputs $h_\theta(x^{(i)})$ and the actual outputs $y^{(i)}$ over the training set. Minimizing it will help us pick the best possible straight line through our data.

The $\theta_{i}$'s (theta i's) in the hypothesis $h_\theta(x) =\theta_0 +\theta_1x$ are the **parameters** of the model.
### Choosing the $\theta_{i}$'s
If $\theta_0 = 1.5$ and $\theta_1 = 0$, then the hypothesis will be a line parallel to the $x$ axis:

$h_\theta(x) = 1.5 + 0 \cdot x$

![theta0-1_5-theta1-0.png](images/theta0-1_5-theta1-0.png "Theta 0 = 1.5, Theta 1 = 0")

If $\theta_0 = 0$ and $\theta_1 = 0.5$, the hypothesis $h_\theta(x)$ will look like this:

$h_\theta(x) = 0 + 0.5\cdot x$

![theta0-0-theta1-0_5.png](images/theta0-0-theta1-0_5.png "Theta 0 = 0, Theta 1 = 0.5")

If $\theta_0 = 1$ and $\theta_1 = 0.5$, then $h_\theta(x)$ will look like this:

$h_\theta(x) = 1 + 0.5\cdot x$

![theta0-1-theta1-0_5.png](images/theta0-1-theta1-0_5.png "Theta 0 = 1, Theta 1 = 0.5")
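
The three parameter choices above can be checked with a few lines of Python. This is a minimal sketch: the function name `hypothesis` and the sample inputs in `xs` are assumptions made for illustration, not part of the course material.

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


# Arbitrary sample inputs, chosen only for illustration.
xs = [0, 1, 2, 3]

# The three (theta0, theta1) pairs discussed above.
for theta0, theta1 in [(1.5, 0), (0, 0.5), (1, 0.5)]:
    predictions = [hypothesis(theta0, theta1, x) for x in xs]
    print(f"theta0={theta0}, theta1={theta1}: {predictions}")
```

With $\theta_1 = 0$ every prediction equals $\theta_0$, which is why the first line is flat. Changing $\theta_0$ from 0 to 1 while keeping $\theta_1 = 0.5$ shifts each prediction up by 1 without changing the slope.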


In linear regression we want to come up with values for the parameters $\theta_0$ and $\theta_1$ so that the straight line fits the data well (passes as close as possible to most of the data points - see the image below):

![good_theta0_1.png](images/good_theta0_1.png "Good values for Theta 0 and 1")

We are going to choose values for $\theta_0$ and $\theta_1$ so that the hypothesis $h_\theta(x)$ is as close as possible to $y$ for our training examples $(x, y)$ - $x$ being the size of the house and $y$ its value.

So, in our case, with linear regression we will solve a minimization problem:

<p style="text-align: center;"><span style="font-size:18px;">$\underset{\theta_0,\, \theta_1}{\text{minimize}}$</span></p>

We want to find the values of $\theta_0$ and $\theta_1$ so that the average squared difference between the *predictions (hypothesis) on the training set* and the *actual values of the houses in the training set* is **minimized**.

For this, we will have to **minimize** the *cost function* $J$:

<p style="text-align: center;"><span style="font-size:18px;">$\underset{\theta_0,\, \theta_1}{\text{minimize}}\, J(\theta_0,\, \theta_1)$</span></p>

And the *cost function* $J$ is:

<p style="text-align: center;"><span style="font-size:18px;">$J(\theta_0,\, \theta_1) = \frac{1}{2m}\displaystyle\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$</span></p>

The *cost function* formula can be read like this: for each of the $m$ examples in the training set, we take the squared difference between the output of the hypothesis $h_\theta(x^{(i)})$ - where the size $x^{(i)}$ of the house is the input - and the actual price of the house $y^{(i)}$. We then sum these squared differences and divide by $2m$; dividing by $2m$ rather than $m$ does not change which parameters minimize $J$. The hypothesis $h_\theta(x^{(i)})$ is calculated as $\theta_0 + \theta_1x^{(i)}$, as shown earlier.
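
The squared error cost function $J$ can be sketched in plain Python as follows. The function names `hypothesis` and `cost` and the toy data set are assumptions made for illustration; the data is made up so that the points lie exactly on the line $y = x$.

```python
def hypothesis(theta0, theta1, x):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared error cost: J = 1/(2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(xs)  # number of training examples
    squared_errors = [
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    ]
    return sum(squared_errors) / (2 * m)


# Made-up toy data lying exactly on the line y = x.
xs = [1, 2, 3]
ys = [1, 2, 3]

print(cost(0, 1, xs, ys))    # the line y = x fits perfectly, so the cost is zero
print(cost(0, 0.5, xs, ys))  # a flatter line fits worse, so the cost is larger
```

For $\theta_0 = 0$ and $\theta_1 = 0.5$ the squared errors are $0.25$, $1$ and $2.25$, so $J = \frac{3.5}{2 \cdot 3} \approx 0.58$. Minimizing $J$ means searching for the pair $(\theta_0, \theta_1)$ that drives this value as low as possible.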
