# Lecture 02 - Is Learning Feasible?

```python
import math
import numpy as np
import pandas as pd
import seaborn as sns
```

### Hoeffding's Inequality

<img src="img/hoeffding-equation.PNG" />

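Hoeffding's Inequality bounds how far the mean of a sample can drift from the mean of the population it was drawn from. It lets us say, in a probabilistic sense, that the mean ν of a **sample** of size N (`nu` in the code below) is *probably approximately* equal to the population mean μ (`mi`): the probability that the two differ by more than a tolerance ε shrinks exponentially as N grows.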
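Assuming the image shows the standard form from the lecture, the inequality for N independent draws of values in [0, 1] reads as follows, with μ the population mean (`mi`), ν the sample mean (`nu`) and ε the tolerance (`error`):

$$
\mathbb{P}\left[\,|\nu - \mu| > \epsilon\,\right] \le 2e^{-2\epsilon^{2}N}
$$

For ε = 0.05 and N = 400, the values used in the single-bin experiment, the right-hand side evaluates to:

$$
2e^{-2 \cdot 0.05^{2} \cdot 400} = 2e^{-2} \approx 0.271
$$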

#### Single bin

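```python
# Create the population (the "bin"): 1000 marbles, each either 0 or 1.
bin = np.random.randint(2, size=1000)
mi = bin.mean()  # population mean (mu)
print(mi)

# Draw a sample of N = 400 marbles (with replacement) and compute its mean (nu).
nu = np.random.choice(bin, 400, replace=True).mean()
print(nu)
```

```
0.498
0.5375
```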


```python
error = 0.05  # tolerance epsilon

# Run 10 rounds. In each round, draw 1000 samples of size 400 and count the
# "bad events": samples whose mean is more than `error` away from mi.
# Note: this cell samples WITHOUT replacement, unlike the previous cell;
# Hoeffding's bound also holds for sampling without replacement.
for i in range(10):
    bad_event = 0

    for j in range(1000):
        random_nu = np.random.choice(bin, 400, replace=False).mean()
        if abs(random_nu - mi) > error:
            bad_event += 1

    print(f"Round {i+1}: frequency of bad events equals {bad_event} of 1000")
```

```
Round 1: frequency of bad events equals 7 of 1000
Round 2: frequency of bad events equals 10 of 1000
Round 3: frequency of bad events equals 9 of 1000
Round 4: frequency of bad events equals 9 of 1000
Round 5: frequency of bad events equals 9 of 1000
Round 6: frequency of bad events equals 9 of 1000
Round 7: frequency of bad events equals 12 of 1000
Round 8: frequency of bad events equals 8 of 1000
Round 9: frequency of bad events equals 14 of 1000
Round 10: frequency of bad events equals 10 of 1000
```
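The loop above samples without replacement, while the inequality is usually stated for independent draws. As a sketch of that i.i.d. setting, the same experiment can be run with replacement and vectorized with NumPy's `Generator` API. The function name `bad_event_frequency`, the seed and the synthetic population are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def bad_event_frequency(population, sample_size, error, n_samples, rng):
    """Fraction of samples (drawn with replacement) whose mean is more than
    `error` away from the population mean."""
    mu = population.mean()
    # Each row is one sample of `sample_size` values drawn with replacement.
    samples = rng.choice(population, size=(n_samples, sample_size), replace=True)
    nus = samples.mean(axis=1)  # one sample mean (nu) per row
    return np.mean(np.abs(nus - mu) > error)

rng = np.random.default_rng(0)              # illustrative seed
population = rng.integers(0, 2, size=1000)  # hypothetical bin of 0/1 marbles
freq = bad_event_frequency(population, sample_size=400, error=0.05,
                           n_samples=1000, rng=rng)
print(f"Frequency of bad events: {freq * 1000:.0f} of 1000")
```

Drawing 400 of 1000 marbles without replacement gives a narrower spread of sample means than drawing with replacement, so the frequency here may come out higher than in the loop above; it should still stay below the Hoeffding threshold.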


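```python
# Applying Hoeffding's Inequality: P[|nu - mi| > error] <= 2 * exp(-2 * error**2 * N), with N = 400.
max_prob = 2 * math.exp(-2 * error**2 * 400)

# max_prob is a probability, so multiply by 1000 (not 100) to express it per 1000 samples.
print(f"Hoeffding's threshold of bad events: {int(max_prob * 1000)} of 1000")
```

```
Hoeffding's threshold of bad events: 270 of 1000
```

The observed frequencies (7 to 14 bad events per 1000) stay far below this threshold. That is expected: the inequality is an upper bound that holds regardless of how the population's values in [0, 1] are distributed, so it is usually loose.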
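Solving the same inequality for N gives the sample size needed to push the probability of a bad event below some target δ: N ≥ ln(2/δ) / (2ε²). A small sketch; the helper names `hoeffding_bound` and `min_sample_size` and the choice δ = 0.05 are illustrative.

```python
import math

def hoeffding_bound(error, n):
    """Upper bound on P[|nu - mu| > error] for a sample of size n."""
    return 2 * math.exp(-2 * error**2 * n)

def min_sample_size(error, delta):
    """Smallest n for which hoeffding_bound(error, n) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * error**2))

print(round(hoeffding_bound(0.05, 400), 4))  # 0.2707, the threshold used above
for n in (100, 1000, 2000):
    print(n, round(hoeffding_bound(0.05, n), 4))
print(min_sample_size(0.05, 0.05))  # 738
```

For N = 100 the bound is about 1.21, larger than 1, so it says nothing; the guarantee only becomes useful once N is large relative to 1/ε².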


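#### Multiple bins (coin example)

Tossing one fair coin 10 times and getting 10 heads is rare. Repeat the same experiment with 1000 fair coins at once, however, and it becomes likely that at least one of them comes up all heads purely by chance.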

```python
event_count = 0

# Each round: toss one fair coin 10 times (1 = heads) and
# count the rounds in which all 10 tosses are heads.
for i in range(10000):
    coin_toss = np.random.randint(2, size=10)
    if coin_toss.sum() == 10:
        event_count += 1

print(f"All heads: {event_count} of 10000 ({round(event_count/100, 1)}%) rounds.")
```

```
All heads: 11 of 10000 (0.1%) rounds.
```
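Because the tosses are independent, the exact probability of 10 heads in 10 tosses of a fair coin is (1/2)^10 = 1/1024, about 0.098%, or roughly 9.8 expected all-heads rounds out of 10000, in line with the 11 observed above. A quick check; the helper name `p_all_heads` is illustrative.

```python
def p_all_heads(n_flips, p_heads=0.5):
    """Probability that a coin with P(heads) = p_heads shows heads on all n_flips independent tosses."""
    return p_heads ** n_flips

p = p_all_heads(10)
print(p)                    # 0.0009765625
print(round(p * 10000, 2))  # expected all-heads rounds out of 10000: 9.77
```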


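```python
event_count = 0

# Each round: 1000 coins, each tossed 10 times (rows = coins, columns = tosses).
# Count the rounds in which at least one coin lands heads on all 10 tosses.
for i in range(10000):
    df = pd.DataFrame(data=np.random.randint(0, 2, size=(1000, 10)))
    df['heads'] = df.sum(axis=1)  # number of heads for each coin
    if df['heads'].max() == 10:
        event_count += 1

print(f"All heads: {event_count} of 10000 ({round(event_count/100, 1)}%) rounds.")
```

```
All heads: 6309 of 10000 (63.1%) rounds.
```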
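The jump from 0.1% to 63.1% can be checked analytically, assuming independent fair coins: one coin shows 10 heads with probability p = 2^-10, so at least one of 1000 coins does with probability 1 − (1 − 2^-10)^1000 ≈ 0.624. The sketch below computes that value and re-runs the experiment with NumPy's `Generator.binomial` instead of building a DataFrame every round. The function names and the seed are illustrative assumptions.

```python
import numpy as np

def p_some_coin_all_heads(n_coins, n_flips, p_heads=0.5):
    """Exact probability that at least one of n_coins independent coins
    shows heads on all n_flips tosses."""
    p_single = p_heads ** n_flips
    return 1 - (1 - p_single) ** n_coins

def simulate(n_rounds, n_coins, n_flips, rng):
    """Fraction of rounds in which at least one coin shows all heads."""
    # heads[r, c] = number of heads coin c got in round r (Binomial(n_flips, 0.5)).
    heads = rng.binomial(n_flips, 0.5, size=(n_rounds, n_coins))
    return np.mean((heads == n_flips).any(axis=1))

exact = p_some_coin_all_heads(1000, 10)
print(round(exact, 4))  # 0.6236

rng = np.random.default_rng(0)  # illustrative seed
print(simulate(10000, 1000, 10, rng))
```

With 10000 rounds the sampling noise of the estimate is about half a percentage point, so the 63.1% observed above is consistent with the exact value of about 62.4%.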
