# Introduction to Machine Learning

## What is Machine Learning?

Computational Statistical Inference

Given a dataset ($D : (x, y)$), can we obtain a rule ($R : y = \hat{f}(x)$) by which we can infer a conclusion ($C : \text{Yes if } y > 0$)?

Consider a film recommendation dataset which has $(Age, Like?)$ ...

Can we use this to determine a cutoff for $Age$ at which we should recommend the film?

In [5]:
d = [(10, +1), (30, -1), (25, +1)]

age = float(input("Age? "))

if age >= 25:
    print("Recommend!")
else:
    print("Consider another film!")

Age? 25
Recommend!
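
One way to check a cutoff like this against the historical data is to apply the rule to every past observation and count how often it agrees. The sketch below reuses the three points in `d`; the helper name `recommend` and its `cutoff` parameter are illustrative assumptions, not part of the original notebook.

```python
# Historical observations: (age, liked?) with +1 = liked, -1 = did not like
d = [(10, +1), (30, -1), (25, +1)]

def recommend(age, cutoff=25):
    """Hypothetical helper: +1 (recommend) if age >= cutoff, else -1."""
    return +1 if age >= cutoff else -1

correct = 0
for age, liked in d:
    prediction = recommend(age)
    # count the observations where the rule agrees with history
    correct += (prediction == liked)
    print(age, "predicted:", prediction, "actual:", liked)

print("Correct:", correct, "of", len(d))
```

On these three points the hand-picked cutoff agrees with only one observation, which is the kind of feedback a machine can use to choose a better rule.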


## Aside: What data do you need for ML?

Historical observations: $(x, y)$

* $x$ -- known *features* (also called fields, columns, or variables)
* $y$ -- unknown prediction *target*

$y$ is unknown at *prediction time*, but we require it in the historical data. 

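For example, predicting $Default$ from $Balance$:

* $Default = \hat{f}(Balance, ...)$
* $y = \hat{f}(x, ...)$

The historical data then needs the columns $..., Balance, Default$.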

Everything in your historical data *must* be known. 

## What types of Machine Learning are there?



* Supervised Learning $(x, y)$
    * Regression -- "trending"
        * $Rating = \hat{f}(Age)$ <- information about $y$ from $x$
        * $y \in \mathbb{R}$, i.e., float
        * e.g., Price, Profit, Age, Rating...
    * Classification -- "labelling"
        * e.g., $y \in \{+1, -1\}$, i.e., int
        * Yes|No, London|Manchester|Leeds, Fraud|NotFraud
* Unsupervised Learning ($x, ...$)
    * e.g., mean(Age) <- information about $x$
    * Clustering
        * Do observations reflect groups?
            * e.g., do we see:
                * Young & Leeds, Old & London?
    * Compression
        * $x_{big} \rightarrow x_{small}$ 
    
    
Supervised Learning applies whenever we have a historical dataset $(x, y)$ that contains $y$, i.e., the thing we are trying to predict.

Supervised Learning is prediction using historical data. 

Unsupervised Learning is any other sort of statistical inference. Typically we are just analysing patterns in known (certain) data, and not looking to predict anything.



## What is Supervised Learning?

$\mathcal{D} = (x_0, x_1, \dots, y)$

Find: $\hat{y} = \hat{f}(x_0, x_1, \dots)$

Given: a loss, $l(a, b) = y - \hat{f}(x; a, b)$, i.e., actual minus prediction

## How does a machine find $\hat{f}$ ?

$\hat{f}(x) = ax + b$

i.e., how does it find $(a, b)$?

One method: try lots of $(a, b)$ values at random. For each pair, compute the predictions over the dataset $D$, compare them with the known $y$ values, and keep the pair whose predictions are closest.
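
A minimal sketch of this random search, under some stated assumptions: "closest" is measured here as the total absolute gap between actual and predicted values, candidates are drawn uniformly from $[-10, 10]$, and the names `total_gap`, `best_a`, and `best_b` are illustrative. None of these choices come from the original notebook.

```python
import random

# Small illustrative dataset of (x, y) pairs
dataset = [(1, 10), (3, 20)]

def total_gap(a, b, data):
    """Sum of absolute differences between actual y and prediction a*x + b."""
    return sum(abs(y - (a * x + b)) for x, y in data)

rng = random.Random(0)  # fixed seed so the run is repeatable

best_a, best_b = None, None
best_gap = float("inf")
for _ in range(10_000):
    # draw a random candidate (a, b); the range [-10, 10] is an arbitrary choice
    a = rng.uniform(-10, 10)
    b = rng.uniform(-10, 10)
    gap = total_gap(a, b, dataset)
    # keep the candidate only if it is closer than anything seen so far
    if gap < best_gap:
        best_a, best_b, best_gap = a, b, gap

print("Best (a, b):", best_a, best_b)
print("Total gap:", best_gap)
```

Random guessing is wasteful, but it shows the essential loop: propose parameters, score them against the data, keep the best.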

## How does Supervised Learning work?

In [32]:
dataset = [
    (1, 10), 
    (3, 20)
]

Suppose we try some random $(a, b)$:

In [33]:
a = 5
b = 1

In [34]:
for (x, y) in dataset:
    print("Prediction:", a * x + b )
    print("Actual:",  y )
    print("Loss:", y - (a * x + b))
    print()

Prediction: 6
Actual: 10
Loss: 4

Prediction: 16
Actual: 20
Loss: 4



In [35]:
a = 6
b = 3

total_loss = 0
for (x, y) in dataset:
    # add this point's loss (actual - prediction) to the total
    total_loss += y - (a * x + b)
total_loss

0

In [38]:
dataset

[(1, 10), (3, 20)]

In [37]:
for (x, y) in dataset:
    print("Prediction:", a * x + b )
    print("Actual:",  y )
    print("Loss:", y - (a * x + b))
    print()

Prediction: 9
Actual: 10
Loss: 1

Prediction: 21
Actual: 20
Loss: -1



## Aside: Improving our loss

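Above, notice that we are told we have a total error of $0$, despite mispredicting **two** points (all of our points)! The individual errors, $+1$ and $-1$, cancel when summed. Squaring each error before adding it makes every misprediction count as a positive loss: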

In [44]:
a = 6
b = 3

total_loss = 0
for (x, y) in dataset:
    total_loss += (y - (a * x + b)) ** 2
                    
total_loss

2

## How does a machine use a loss?

We decide the $loss$; the machine then uses the dataset to find the parameters, here $(a, b)$, that minimize it.
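
As a rough sketch of that division of labour, the code below fixes the squared loss and lets a brute-force search pick $(a, b)$. The integer grid $0..10$ and the names `squared_loss` and `candidates` are assumptions made for illustration; this is not how practical methods search.

```python
dataset = [(1, 10), (3, 20)]

def squared_loss(a, b, data):
    """Total squared error of the line a*x + b over the dataset."""
    return sum((y - (a * x + b)) ** 2 for x, y in data)

# Candidate values for a and b: integers 0..10 (an arbitrary, assumed grid)
candidates = [(a, b) for a in range(11) for b in range(11)]

# The "machine" step: pick the candidate with the smallest loss
best_a, best_b = min(candidates, key=lambda ab: squared_loss(ab[0], ab[1], dataset))

print("Best (a, b):", (best_a, best_b))
print("Loss:", squared_loss(best_a, best_b, dataset))
```

On this dataset the search returns $(a, b) = (5, 5)$ with a loss of $0$, compared with a loss of $2$ for $(6, 3)$.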

## Exercise

Consider the following dataset:

In [45]:
loans = [
    (1000, +1), # loan_amount, settled
    (2000, -1),
    (500,  +1)
]

### Q1. Predictions

Using the test `if amount > a`, print `1` if it passes, otherwise `-1`.

HINT: choose `a = 500`

### Q2. Compute Loss

How bad is this rule?

Loop over all loans and use your test above to compute a prediction for each loan.

Compare this prediction with the historical answer. 

Print out the prediction and the loss for every point.


### Q3. Compute the total loss

Add up the loss for all points. 

HINT: in the exercise above, our prediction is computed using an `if`, not using a formula `a*x+b`. 