# The Supervised Machine Learning Setup


###  General
  * $y$ -- target
      * the variable we are trying to predict 
      
  * $x$ -- feature
      * the variable we predict from; with multiple features, $x = (x_0, x_1, \dots)$
      

### Dataset
  * $\mathcal{D}_{train} = \{ (x_0^0, x_1^0, \dots, y^0), (x_0^1, x_1^1, \dots, y^1), \dots \}$
      * subscripts index features; superscripts index observations
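
As a minimal sketch of this notation, a training set can be stored as a list of tuples, one per observation, with the features first and the target last. The second feature (number of films watched), the name `D_example`, and all values are hypothetical, invented purely for illustration.

```python
# Each tuple is one observation: (x_0, x_1, y).
# x_0 = age, x_1 = films watched (hypothetical), y = film rating.
# All values are made up for illustration.
D_example = [
    (21, 5, 5.6),
    (32, 12, 7.0),
    (41, 30, 7.5),
]

# Split every observation into its feature vector and its target.
features = [row[:-1] for row in D_example]
targets = [row[-1] for row in D_example]

print(features)  # [(21, 5), (32, 12), (41, 30)]
print(targets)   # [5.6, 7.0, 7.5]
```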


### Relationships
  * $y = f(x)$
      * you can calculate y from x
      
  * $\hat{y}$ -- the estimate for $y$
  
  * $\hat{y} = \hat{f}(x)$ -- where $\hat{f}$ is the estimate for $f$
  
### Loss
  
  * $loss(\hat{y}^i, y^i)$ -- how wrong the estimate is at a single *point* $i$
  
      * $L = \sum_i loss(\hat{y}^i, y^i)$ -- total loss
          * how wrong the entire model is: the loss summed over *every* point
      

### An Example Problem

 * $y$: film rating 
 * $x$: user's age 

In [4]:
def f_rating(x):
    return 0.08 * x + 0.5

In [24]:
y0 = f_rating(21)

y0

2.1799999999999997

In [50]:
### Capture age and film rating in a dictionary.
### A dictionary stores key-value pairs;
### here the key is the age and the value is the film rating.

{
    10: f_rating(10),
    0: f_rating(0),
    80: f_rating(80),
}



{10: 1.3, 0: 0.5, 80: 6.9}

In [51]:
def fhat_rating(x):
    return 0.07 * x + 0.6 

In [52]:
yhat0 = fhat_rating(21)

yhat0 

2.0700000000000003

In [54]:
def loss_rating(yhat, y): 
    return (yhat - y) ** 2

In [55]:
loss_rating(yhat0, y0)

0.012099999999999875

In [56]:
Dtrain = [
    (10, 3), # (x, y) = (age, rating)
    (17, 3.1),
    (18, 4.2),
    (21, 5.6),
    (32, 7),
    (41, 7.5),
    (51, 8),
    (69, 8.5),
    (81, 9),
]

In [57]:
yhat = []  # prediction for each training point
loss = []  # squared error for each training point

for (x, y) in Dtrain: 
    prediction = fhat_rating(x)
    error = loss_rating(prediction, y)
    
    yhat.append(prediction)
    loss.append(error)
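
The loop above collects the per-point losses but stops short of the total loss $L$. As a minimal sketch, the total can be obtained by summing the per-point losses; the definitions are repeated so the cell runs on its own, and the name `total_loss` is an assumption of this sketch.

```python
def fhat_rating(x):
    # Estimated relationship between age and rating (same as above).
    return 0.07 * x + 0.6

def loss_rating(yhat, y):
    # Squared error for a single point.
    return (yhat - y) ** 2

Dtrain = [(10, 3), (17, 3.1), (18, 4.2), (21, 5.6), (32, 7),
          (41, 7.5), (51, 8), (69, 8.5), (81, 9)]

# Total loss: the per-point loss summed over the whole training set.
total_loss = sum(loss_rating(fhat_rating(x_i), y_i) for x_i, y_i in Dtrain)

print(round(total_loss, 4))  # 87.6358
```

A single number like this makes it possible to compare two candidate estimates $\hat{f}$ on the same data: the one with the lower total loss fits the training points better.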
    

### Regression 

   * $y \in \mathbb{R}$
       * this means $y$ is in the set of real numbers
       * in Python, a real-valued target is typically stored as a `float`

```python
type(y) is float
```

In [58]:
type(y) is float

False
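
The `False` above most likely arises because `y` still holds the last rating from the loop over `Dtrain`, which is the integer `9` rather than a float. As a hedged sketch, `isinstance` with `numbers.Real` from the standard library is a looser check that accepts both `int` and `float`; the list `ratings` is just a few values copied from `Dtrain`.

```python
import numbers

ratings = [3, 3.1, 4.2, 9]  # a few ratings taken from Dtrain

for y_i in ratings:
    # Strict check: rejects the ints 3 and 9.
    strict = type(y_i) is float
    # Looser check: numbers.Real accepts both int and float values.
    loose = isinstance(y_i, numbers.Real)
    print(y_i, strict, loose)
```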

### Classification

* binary classification 
    * e.g. like or dislike
    * $y \in \{ -1 , 1 \}$
    
* multiclass classification 
    * $y \in \{\text{London}, \text{Leeds}, \text{Manchester}, \dots \}$
    
* Classes require a numerical representation to arrive at a computational solution
    * $y \in \{0, 1, 2, 3, \dots \}$
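
One simple way to get such a numerical representation is to number the class names in order of appearance. This is a minimal sketch: the names `class_to_index`, `index_to_class`, and the `labels` list are hypothetical, chosen for illustration.

```python
cities = ["London", "Leeds", "Manchester"]

# Assign each class name an integer code, in order of appearance.
class_to_index = {name: index for index, name in enumerate(cities)}
# Inverse mapping, to turn a numeric prediction back into a name.
index_to_class = {index: name for name, index in class_to_index.items()}

labels = ["Leeds", "London", "Leeds", "Manchester"]  # hypothetical targets
encoded = [class_to_index[name] for name in labels]

print(class_to_index)  # {'London': 0, 'Leeds': 1, 'Manchester': 2}
print(encoded)         # [1, 0, 1, 2]
```

The integer codes are only identifiers; their order carries no meaning (Leeds is not "greater than" London).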

In [42]:
def f_classify(x):
    if x > 200:
        return -1 
    else:
        return +1
    

In [45]:
y = f_classify(100)


In [None]:
classes = {-1, 1} 
# this is a set of class labels (numbers), not a dictionary

```python
  y in classes
```
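
As a quick sketch, the same membership test can be run over several inputs to confirm that the classifier only ever returns a valid class. The `inputs` list is hypothetical; `f_classify` and `classes` are repeated so the cell runs on its own.

```python
def f_classify(x):
    # Same rule as above: -1 above the threshold of 200, otherwise +1.
    if x > 200:
        return -1
    else:
        return +1

classes = {-1, 1}

inputs = [0, 100, 200, 201, 1000]  # hypothetical inputs
outputs = [f_classify(x) for x in inputs]

print(outputs)                             # [1, 1, 1, -1, -1]
print(all(y in classes for y in outputs))  # True
```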