# Python for Machine Learning

## The (Supervised) Machine Learning Set Up

### General
#### target
the variable we are trying to predict
$y$ 

the estimate of the target
$\hat{y}$

#### feature(s)    
a single feature...

$x$ 

multiple features...
$X$ 

#### datasets
the training dataset (in-sample)

$\mathcal{D}_{train} = \{ (x_0^0, x_1^0, \dots, y^0),  (x_0^1, x_1^1, \dots, y^1), \dots \}$

#### relationships
you can calculate $y$ from $x$ ...

$y = f(x)$
    
the estimate for $f$    
$\hat{y} = \hat{f}(x)$


#### loss

how bad the estimate for a single *point* is...

$loss(\hat{y}^i, y^i)$

total loss:
how wrong *every* point is, summed over the dataset

$L = \sum_i loss(\hat{y}^i, y^i)$

### An Example Problem

* $y$ : a film rating
* $x$ : a user's age

In [1]:
def f_rating(x):
    return 0.08 * x + 0.5

In [25]:
y0 = f_rating(21)

In [26]:
y0

2.1799999999999997

In [9]:
{
    10  : f_rating(10),
    0   : f_rating(0), 
    80  : f_rating(80)
}

{10: 1.3, 0: 0.5, 80: 6.9}

In [12]:
def fhat_rating(x):
    return 0.07 * x + 0.6

In [28]:
yhat0 = fhat_rating(21)

In [29]:
yhat0

2.0700000000000003

In [17]:
abs( fhat_rating(21) - f_rating(21) ) 

0.10999999999999943

In [18]:
def loss_rating(yhat, y):
    return (yhat - y) ** 2

In [30]:
loss_rating(yhat0, y0)

0.012099999999999875

In [31]:
Dtrain = [
    (10, 3), # (x, y)
    (17, 3.1),
    (18, 4.2),
    (21, 5.6),
    (32, 5.6),
    (41, 7),
    (70, 7.5),
]

In [39]:
yhat = []
loss = []

for (x, y) in Dtrain:
    prediction = fhat_rating(x)
    error = loss_rating(prediction, y)
    
    yhat.append(prediction)
    loss.append(error)
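
The loop above collects one loss per point; the total loss $L$ from the set-up section is the sum of those values. A minimal sketch, which repeats `fhat_rating`, `loss_rating` and `Dtrain` from the cells above so that it runs on its own. The names `total_loss` and `mean_loss` are introduced here for illustration only.

```python
def fhat_rating(x):
    # the estimated relationship between age and rating
    return 0.07 * x + 0.6

def loss_rating(yhat, y):
    # squared error for a single point
    return (yhat - y) ** 2

Dtrain = [
    (10, 3), # (x, y)
    (17, 3.1),
    (18, 4.2),
    (21, 5.6),
    (32, 5.6),
    (41, 7),
    (70, 7.5),
]

# per-point loss, in the same order as Dtrain
loss = [loss_rating(fhat_rating(x), y) for (x, y) in Dtrain]

# total loss L: how wrong every point is, added together
total_loss = sum(loss)

# dividing by the number of points gives an average that does not grow with dataset size
mean_loss = total_loss / len(Dtrain)

print(total_loss, mean_loss)
```

A lower `total_loss` for a different `fhat_rating` would mean that estimate fits `Dtrain` better under this particular loss.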

---

Regression

* $y \in \mathbb{R}$

```python
type(y) is float
```

In [43]:
type(y) is float

True

Classification

* binary classification
    * e.g., Like vs Dislike
    * $y \in \{ -1, +1 \}$

* multiclass classification
    * $y \in \{London, Leeds, Manchester, \dots \}$

Classes require a numerical representation to arrive at a computational solution...

$y \in \{0, 1, 2, \dots \}$
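
One way to obtain such a representation is to give each class label an integer index. A minimal sketch in plain Python; the label list and the observations in `y_raw` are hypothetical, and `class_to_index` / `index_to_class` are names chosen for this example.

```python
# the known class labels (illustrative subset of the cities above)
labels = ["London", "Leeds", "Manchester"]

# map each label to an integer: London -> 0, Leeds -> 1, Manchester -> 2
class_to_index = {label: index for index, label in enumerate(labels)}

# the reverse mapping, to turn a numerical prediction back into a label
index_to_class = {index: label for label, index in class_to_index.items()}

# hypothetical raw targets, as they might appear in a dataset
y_raw = ["Leeds", "London", "Manchester", "Leeds"]

# the numerical targets
y = [class_to_index[label] for label in y_raw]

print(y)
print([index_to_class[index] for index in y])
```

The integers here are only identifiers: their order is arbitrary and does not imply that one class is "greater" than another.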

In [44]:
def f_classify(x):
    if x > 200:
        return -1 
    else:
        return +1

In [46]:
y = f_classify(180)

In [47]:
classes = {-1, +1}

y in classes

True

## Unsupervised Learning

$X = (x_1, x_2, \dots)$

Clustering

* goal: find $P(x_1, x_2, \dots)$

* finding patterns between features
* how features are (jointly) distributed
* how probable are certain observations (of particular features) with respect to each other
* e.g., how likely is it to see a young person watching a long film?
    * $P(x_1 \text{ is Young}, x_2 \text{ is Long})$
    * $P( 12 < x_1 < 18, 120 < x_2 < 360)$
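
The probability above can be roughly estimated from data by counting how often both conditions hold together. A minimal sketch, assuming $x_1$ is a viewer's age in years and $x_2$ is a film's length in minutes; the observations in `X` are made up for illustration, and `is_young` / `is_long` are hypothetical helper names.

```python
# hypothetical observations: (x_1 = age in years, x_2 = film length in minutes)
X = [
    (14, 130),
    (16, 95),
    (17, 150),
    (25, 140),
    (34, 90),
    (45, 200),
    (13, 125),
    (60, 110),
]

def is_young(age):
    # 12 < x_1 < 18
    return 12 < age < 18

def is_long(length):
    # 120 < x_2 < 360
    return 120 < length < 360

# count the observations where both conditions hold at once
joint_count = sum(1 for (age, length) in X if is_young(age) and is_long(length))

# empirical estimate of P(12 < x_1 < 18, 120 < x_2 < 360)
p_joint = joint_count / len(X)

print(p_joint)
```

With so few observations this is only a crude frequency estimate; it illustrates what the joint probability measures rather than how a clustering method would compute it.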
        
        
Dimensionality Reduction (compression)

* len(Xnew) < len(Xold), and info(Xnew) ~= info(Xold)

* reducing the number of columns in $X$
* either by...
    * using new columns (that summarize/encode the original columns)
    * deleting columns 
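
Both options can be sketched on a tiny table. This is an illustration under assumptions, not a specific reduction method: the columns (film length in minutes, film length in hours, critic score, audience score) and their values are hypothetical, and averaging two scores is just one simple way to encode two columns as one.

```python
# hypothetical rows: (minutes, hours, critic_score, audience_score)
X_old = [
    (120, 2.0, 7.0, 8.0),
    (90, 1.5, 5.0, 6.0),
    (150, 2.5, 9.0, 8.0),
]

# deleting columns: hours repeats the information in minutes, so drop it
X_deleted = [(minutes, critic, audience)
             for (minutes, hours, critic, audience) in X_old]

# using a new column: replace the two scores with their average,
# and drop the redundant hours column as well
X_new = [(minutes, (critic + audience) / 2)
         for (minutes, hours, critic, audience) in X_old]

# fewer columns than before: len(Xnew) < len(Xold)
print(len(X_old[0]), len(X_deleted[0]), len(X_new[0]))
print(X_new)
```

Dropping `hours` loses nothing, since it can be recomputed from `minutes`; the averaged score loses some detail, which is why the information is only approximately preserved.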