## The Mathematics of Machine Learning in Python

## Part 1: Sketching the need for "Empirical Loss Minimization"

$X$

* hours asleep
* amount of REM sleep
* caffeine intake (ml)
* resting heart rate (hr)


$y$

* sleep quality

In [24]:
def f_true(hours, rem, caffine, hr):
    a, b, c, d = 1/12, 1/100, 1/1000, 1/120
    
    return (a * hours + b * rem + c * caffine + d * hr)/4

In [25]:
f_true(7, 50, 100, 60)

0.42083333333333334

In [26]:
D = [
    ((h, r, c, hr), f_true(h, r, c, hr))
    for h in range(0, 12)
    for r in range(0, 100, 10)
    for c in range(0, 1000, 100)
    for hr in range(0, 120)
]

In [27]:
len(D)

144000

In [28]:
from random import sample

In [29]:
Din = sample(D, 100)

In [30]:
def f1(hours, rem, caffine, hr):
    return 0.1 * hours + 0.1 * rem + 0.2 * caffine + 0.3 * hr


def f2(hours, rem, caffine, hr):
    return 0.01 * hours + 0.01 * rem + 0.02 * caffine + 0.03 * hr

In [31]:
yhat = [ ((h, r, c, hr), f1(h, r, c, hr), y) for ((h, r, c, hr), y) in Din ]

In [32]:
# yhat = [ ((h, r, c, hr), f2(h, r, c, hr), y) for ((h, r, c, hr), y) in Din ]
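The cells above produce predictions from `f1` and `f2` but never score them. A minimal sketch of how the two candidates could be compared is to sum the squared errors over the sample. The helper name `total_squared_loss` and the fixed seed are assumptions made for illustration; they are not part of the original notebook.

```python
from random import Random


def f_true(hours, rem, caffine, hr):
    a, b, c, d = 1/12, 1/100, 1/1000, 1/120
    return (a * hours + b * rem + c * caffine + d * hr) / 4


def f1(hours, rem, caffine, hr):
    return 0.1 * hours + 0.1 * rem + 0.2 * caffine + 0.3 * hr


def f2(hours, rem, caffine, hr):
    return 0.01 * hours + 0.01 * rem + 0.02 * caffine + 0.03 * hr


# Same universe D as above; the seed is an assumption, used only for repeatability.
D = [((h, r, c, hr), f_true(h, r, c, hr))
     for h in range(0, 12) for r in range(0, 100, 10)
     for c in range(0, 1000, 100) for hr in range(0, 120)]
Din = Random(0).sample(D, 100)


def total_squared_loss(f, data):
    """Sum of squared differences between f's predictions and the observed y."""
    return sum((f(*x) - y) ** 2 for x, y in data)


loss_f1 = total_squared_loss(f1, Din)
loss_f2 = total_squared_loss(f2, Din)
loss_true = total_squared_loss(f_true, Din)

print("f1:", loss_f1)
print("f2:", loss_f2)
print("f_true:", loss_true)
```

A single number per candidate makes "which function is better on $D_{in}$" a question with a definite answer, which is what motivates searching for the function with the smallest empirical loss.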

## Part 2: A Systematic Procedure

In [33]:
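def fhat(hours, rem, caffine, hr, w=(0.1, 0.1, 0.1, 0.1, 0.25)):
    # w[0..3] weight the four inputs; w[4] rescales the whole sum.
    # The default is a tuple so it cannot be mutated between calls.
    return (w[0] * hours + w[1] * rem + w[2] * caffine + w[3] * hr) * w[4]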

In [34]:
fhat(10, 10, 10, 10)

1.0

In [35]:
w = [0.01, 0.01, 0.01, 0.01,  0.25]
fhat(10, 10, 10, 10, w)

0.1

In [36]:
from random import random


w = [random(), random(), random(), random(),  random()]

In [37]:
history = []


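for i in range(0, 100):
    w[0] = i/10000

    total_loss = sum([ (fhat(h, r, c, hr, w) - y)**2 for ((h, r, c, hr), y) in Din ])

    # Store a snapshot of w. Appending w itself would leave every entry pointing
    # at the same list, so min(history) would always report the last w[0] tried.
    history.append(( total_loss, list(w)))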

In [38]:
min(history)

(232609.18662593488,
 [0.0099,
  0.1162779824635809,
  0.5787788124643216,
  0.7471934998739974,
  0.14586547779776948])
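The loop above varies only `w[0]`, so the remaining random weights keep the loss large. One possible extension, sketched here under stated assumptions, is to sweep each of the four input weights in turn and keep the best value found for each. The names `total_loss`, `candidates`, and `rng`, the fixed seed, the candidate grid, and the choice to hold `w[4]` at 0.25 are all illustrative assumptions, not the notebook's method.

```python
from random import Random


def f_true(hours, rem, caffine, hr):
    a, b, c, d = 1/12, 1/100, 1/1000, 1/120
    return (a * hours + b * rem + c * caffine + d * hr) / 4


def fhat(hours, rem, caffine, hr, w):
    return (w[0] * hours + w[1] * rem + w[2] * caffine + w[3] * hr) * w[4]


def total_loss(w, data):
    """Sum of squared errors of fhat with weights w over the sample."""
    return sum((fhat(*x, w) - y) ** 2 for x, y in data)


rng = Random(0)  # fixed seed: an assumption, for repeatability only
D = [((h, r, c, hr), f_true(h, r, c, hr))
     for h in range(0, 12) for r in range(0, 100, 10)
     for c in range(0, 1000, 100) for hr in range(0, 120)]
Din = rng.sample(D, 100)

# Random starting weights for the inputs; the overall scale w[4] is held fixed.
w = [rng.random() for _ in range(4)] + [0.25]
start_loss = total_loss(w, Din)

# Candidate values 0.0, 0.0005, ..., 0.1 for each input weight (an assumed grid).
candidates = [i / 2000 for i in range(0, 201)]

for sweep in range(3):
    for j in range(4):
        # Include the current value so a sweep can never make the loss worse.
        options = candidates + [w[j]]
        w[j] = min(options, key=lambda v: total_loss(w[:j] + [v] + w[j + 1:], Din))

end_loss = total_loss(w, Din)
print(start_loss, end_loss, w)
```

Because each coordinate update picks the best of a set that contains the current value, the loss is non-increasing from one update to the next. The result is still only as good as the grid allows.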

## Exercise

We imagine there is a true function `true_profit()` which could provide a universe $D$ of all possible $(X, y)$, where $X$ is the age of a customer as they enter our store and $y$ is the profit we make from them.

$D$ is infinite in size (i.e., cardinality), but below we simulate only 100 values.

$D_{in}$ is a sample from $D$. It is always finite, and therefore small relative to $D$, even at a size of, e.g., 1bn.

Below you are given such a sample. The exercise is to determine the best function to model $D_{in}$, i.e., to predict $y$ from $X$.

* Strategies
    * define several candidate functions and compare the predictions each one makes
    * use a loop and a Python function with a parameter that can vary
    * etc.

* Justify your choice of function.

$X : \text{Age}$

$y : \text{Profit}$

In [39]:
def true_profit(x_age):
    return 0.1 * x_age + 0.50

In [118]:
D = [(age,  true_profit(age) + random()) for age in range(0, 100)]

In [119]:
Din = sample(D, 10)

In [120]:
Din

[(21, 3.567284000051815),
 (90, 10.183520659825815),
 (3, 1.5165393365489808),
 (38, 5.123303975978282),
 (36, 4.835294664987518),
 (48, 5.782461108524877),
 (22, 3.0753825299250557),
 (44, 4.965009393320488),
 (15, 2.230077918425267),
 (5, 1.151635185643644)]

In [121]:
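from itertools import product

# Candidate (slope, intercept) pairs on a grid from -10.0 to 9.9 in steps of 0.1.
weights  = list(product([a/10 for a in range(-100, 100)], repeat=2))

def loss_square(w, dataset, yhat):
    # Sum of squared residuals. `w` is accepted so existing calls still work,
    # but it is not added to the loss: a signed sum(w) term would reward negative weights.
    return sum([(y - yh) ** 2 for (x, y), yh in zip(dataset, yhat)])

history = [
    (loss_square(w, Din, [w[0]*x + w[1] for x, y in Din]), w) for w in weights
]

min(history)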
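One way to justify the weights a grid search returns is to compare them with the closed-form least-squares line for the same sample. The sketch below swaps in that formula (slope $= S_{xy}/S_{xx}$, intercept $= \bar{y} - \text{slope}\cdot\bar{x}$) and uses the ten `Din` pairs printed above; the helper name `sse` is an assumption introduced for illustration.

```python
# The sample printed above, copied verbatim.
Din = [(21, 3.567284000051815),
       (90, 10.183520659825815),
       (3, 1.5165393365489808),
       (38, 5.123303975978282),
       (36, 4.835294664987518),
       (48, 5.782461108524877),
       (22, 3.0753825299250557),
       (44, 4.965009393320488),
       (15, 2.230077918425267),
       (5, 1.151635185643644)]

n = len(Din)
x_mean = sum(x for x, _ in Din) / n
y_mean = sum(y for _, y in Din) / n

# Closed-form simple linear regression.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in Din)
s_xx = sum((x - x_mean) ** 2 for x, _ in Din)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean


def sse(w0, w1):
    """Sum of squared errors of the line w0 * x + w1 on Din."""
    return sum((y - (w0 * x + w1)) ** 2 for x, y in Din)


print(slope, intercept, sse(slope, intercept))
```

Since `random()` has mean 0.5, the simulated data follow roughly `0.1 * age + 1.0`, so a slope near 0.1 and an intercept near 1.0 are what a good fit should recover. A grid with step 0.1 can only approximate these values.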

---

In [90]:
coin = {'H', 'T'}

In [91]:
from itertools import product

In [105]:
three = set(product(coin, coin, coin))

In [107]:
three

{('H', 'H', 'H'),
 ('H', 'H', 'T'),
 ('H', 'T', 'H'),
 ('H', 'T', 'T'),
 ('T', 'H', 'H'),
 ('T', 'H', 'T'),
 ('T', 'T', 'H'),
 ('T', 'T', 'T')}

In [108]:
{ same for same in three if len(set(same)) == 1}

{('H', 'H', 'H'), ('T', 'T', 'T')}
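If the coin is assumed fair and the tosses independent (assumptions not stated above), all eight outcomes are equally likely, and the probability of an event is the size of the event divided by the size of the sample space. A short sketch, with `all_same` and `p_all_same` as illustrative names:

```python
from fractions import Fraction
from itertools import product

coin = {'H', 'T'}

# Sample space: every ordered outcome of three tosses.
three = set(product(coin, repeat=3))

# Event: all three tosses show the same face.
all_same = {outcome for outcome in three if len(set(outcome)) == 1}

# Equally likely outcomes (fair-coin assumption): P(event) = |event| / |space|.
p_all_same = Fraction(len(all_same), len(three))
print(p_all_same)  # prints 1/4
```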