# Monte Carlo (Biased Coin Toss)

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/phunc20/biblio/blob/main/people/aurelien_geron/homl/07-ensemble_learning/notebooks/04.05.coin_toss_monte_carlo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
</table>

In [1]:
import sys

In [None]:
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    %pip install matplotlib==3.6.1 numpy==1.23.1 \
         scikit-learn==1.1.2 scipy==1.9.0 tqdm==4.64.0

Readers who are already familiar with Monte Carlo methods
can safely skip this notebook.  
For those who choose to stay, perhaps the most valuable thing
to learn here is the `__new__` method of Python classes.

Readers who already know about `__new__` as well, and
who are short on time, are probably better off skipping this notebook entirely.

## Bernoulli Distribution

In mathematics, the _distribution_ that behaves
like a _coin toss_ bears the family name
Bernoulli, a family several of whose members made great contributions to
physics and mathematics in the 17th and 18th centuries.

We won't go into what a distribution is or
what a random variable is. For us, it suffices to know that a _random
variable with Bernoulli distribution_ is simply something that will turn out to be
- either $1$
- or $0$

just as a tossed coin lands on either Head or Tail,  
and that there is a constant probability $p \in [0, 1]$ such that

$$
\begin{aligned}
  P(X=1) &= p \\
  P(X=0) &= 1-p
\end{aligned}
$$

(For convenience, one can think of Head as $1$ and Tail as $0$.)

**Rmk.** If, by the end of this notebook, you find yourself interested in random
variables or probability theory in general (unlikely, I suspect),
there are many online resources for digging deeper. For example, the French school of
probability is particularly famous; cf. e.g. [Jean-François Le Gall](https://www.imo.universite-paris-saclay.fr/~jean-francois.le-gall/IPPA2.pdf)

## Simulation

Enough theory. Let's get back to code.

We are simply going to simulate a lot of tosses of a coin biased towards Head
($p = 0.51$) and see whether Heads end up in the majority with a frequency of about
- $75\%$ with $1{,}000$ tosses
- $97\%$ with $10{,}000$ tosses

In [1]:
import random

In [2]:
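class Bernoulli(int):
    def __new__(cls, p=0.51):
        """
        Think of p as the probability of turning up Head.
        """
        # __new__ receives the class (cls), not an instance; whatever it
        # returns becomes the result of calling Bernoulli(p).
        #return int(np.random.uniform() < p)
        return random.uniform(0,1) < p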

**Rmk.** For more info on `__new__`, cf. e.g.
[YouTuber mCoding](https://www.youtube.com/watch?v=-zsV0_QrfTw)
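
To make the role of `__new__` concrete, here is a minimal sketch. The class name `CoinFlip` and its `__init__` body are made up for illustration and are not part of this notebook's code. It shows that whatever `__new__` returns is what the call evaluates to, and that Python skips `__init__` when that object is not an instance of the class.

```python
import random


class CoinFlip(int):  # hypothetical class, for illustration only
    def __new__(cls, p=0.51):
        # The return value of __new__ is the result of CoinFlip(p).
        return random.random() < p

    def __init__(self, p=0.51):
        # Never reached here: __new__ returned a bool, which is not
        # an instance of CoinFlip, so Python skips __init__.
        raise RuntimeError("__init__ should not run")


x = CoinFlip(1.0)  # random.random() is always < 1.0, so x is True
print(type(x))                  # <class 'bool'>
print(isinstance(x, CoinFlip))  # False
print(x + x)                    # 2
```

Because the result is a plain `bool`, it still adds up like an integer, which is all a coin-toss counter needs.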

### Unit Test(s)

Let's use the following function as a unit test for our
`Bernoulli` class: we expect the head frequency it returns to be close to
`p`.

In [3]:
def get_head_freq(*, n_tosses=10_000, p=0.51):
    Xs = [Bernoulli(p) for _ in range(n_tosses)]
    n_heads = sum(Xs)
    return n_heads/n_tosses

In [4]:
get_head_freq()

0.5039
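
How close is "close"? As a rough guide, assuming the tosses are independent, write $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$ for the head frequency of $n$ Bernoulli tosses (the symbol $\hat{p}$ is my notation, not the notebook's). Then

$$
\begin{aligned}
  E[\hat{p}] &= p \\
  \mathrm{SD}(\hat{p}) &= \sqrt{\frac{p(1-p)}{n}}
\end{aligned}
$$

With the defaults of `get_head_freq` ($n = 10{,}000$, $p = 0.51$), the standard deviation is $\sqrt{0.51 \times 0.49 / 10{,}000} \approx 0.005$, so a result such as $0.5039$, about $0.006$ away from $p$, is unsurprising.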

In [5]:
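# Step through p = 0.0, 0.1, ..., 1.0 using an integer counter, so that
# the loop does not depend on accumulated floating-point error.
for i in range(11):
    p = i / 10
    print(f"{p = :.1f}")
    print(f"{get_head_freq(p=p) = }")
    print()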

p = 0.0
get_head_freq(p=p) = 0.0

p = 0.1
get_head_freq(p=p) = 0.1

p = 0.2
get_head_freq(p=p) = 0.2026

p = 0.3
get_head_freq(p=p) = 0.3055

p = 0.4
get_head_freq(p=p) = 0.4011

p = 0.5
get_head_freq(p=p) = 0.4949

p = 0.6
get_head_freq(p=p) = 0.5947

p = 0.7
get_head_freq(p=p) = 0.6935

p = 0.8
get_head_freq(p=p) = 0.8033

p = 0.9
get_head_freq(p=p) = 0.8985

p = 1.0
get_head_freq(p=p) = 1.0



Back to Monte Carlo.

In [6]:
from tqdm.auto import tqdm

In [7]:
def get_freq_more_heads_Bernoulli(
    *,
    n_tosses=1_000,
    p=0.51,
    n_experiments=100,
):
    """
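    In each experiment, we toss a biased coin n_tosses times and record
    whether Heads outnumber Tails. The return value is the fraction of
    experiments in which they do.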
    """
    experiments = []
    for _ in tqdm(range(n_experiments)):
        Xs = [Bernoulli(p) for _ in range(n_tosses)]
        S = sum(Xs)
        more_heads_than_tails = S > n_tosses/2
        experiments.append(more_heads_than_tails)
    return sum(experiments)/n_experiments

In [8]:
get_freq_more_heads_Bernoulli()

0.69

In [9]:
get_freq_more_heads_Bernoulli(n_experiments=1000)

0.718

In [10]:
get_freq_more_heads_Bernoulli(
    n_tosses=10_000,
    p=0.51,
    n_experiments=200,
)

0.985

In [11]:
get_freq_more_heads_Bernoulli(
    n_tosses=10_000,
    p=0.51,
    n_experiments=1000,
)

0.975
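
The estimates above fluctuate from run to run (0.69 vs. 0.718, then 0.985 vs. 0.975). A rough way to gauge that noise, assuming the experiments are independent, is the standard error of a frequency. The helper `mc_standard_error` below is a hypothetical name, not something defined elsewhere in this notebook.

```python
import math


def mc_standard_error(freq, n_experiments):
    """Approximate standard error of a frequency estimated from
    n_experiments independent yes/no experiments."""
    return math.sqrt(freq * (1 - freq) / n_experiments)


# Frequencies reported by the runs above, paired with their n_experiments.
runs = [(0.69, 100), (0.718, 1000), (0.985, 200), (0.975, 1000)]
for freq, n_experiments in runs:
    se = mc_standard_error(freq, n_experiments)
    print(f"{freq = }, {n_experiments = }, standard error ~ {se:.3f}")
```

Since the error shrinks like $1/\sqrt{\text{n\_experiments}}$, quadrupling the number of experiments roughly halves it.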

## Another Simulation (Binomial Distribution)

Still in the style of the Monte Carlo method, we could work one
level higher and use the binomial distribution instead
of the Bernoulli distribution. After all, a binomial distribution
simply describes the probability of obtaining $k$ Heads in
$n$ coin tosses.
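
Written out, with $S$ denoting the number of Heads in $n$ independent tosses of a coin with head probability $p$ (the symbol $S$ is my choice, mirroring the variable `S` in the code earlier):

$$
\begin{aligned}
  P(S=k) &= \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n \\
  P\left(S > \tfrac{n}{2}\right) &= \sum_{k > n/2} \binom{n}{k} p^k (1-p)^{n-k}
\end{aligned}
$$

The second line is the quantity the simulations estimate: the probability that Heads outnumber Tails.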

In [12]:
import numpy as np

In [13]:
def get_freq_more_heads_binomial(
    *,
    n_tosses=1_000,
    p=0.51,
    n_experiments=100,
):
    Ss = np.random.binomial(n_tosses, p, size=n_experiments)
    return np.sum(Ss > n_tosses/2) / n_experiments

In [14]:
get_freq_more_heads_binomial()

0.75

In [15]:
get_freq_more_heads_binomial(n_tosses=10_000)

0.98
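
Since binomial probabilities are known in closed form, the simulated frequencies can be cross-checked against a direct summation. The sketch below is one way to do this with the standard library only. `prob_majority_heads` is a hypothetical helper, and evaluating in log space via `math.lgamma` is my choice to avoid underflow for large `n_tosses`. It uses the same strict inequality as the simulations, so a tie does not count as a majority.

```python
import math


def prob_majority_heads(n_tosses, p):
    """P(S > n_tosses / 2) for S ~ Binomial(n_tosses, p), with 0 < p < 1."""
    log_p = math.log(p)
    log_q = math.log1p(-p)  # log(1 - p)
    total = 0.0
    # Sum the probability mass over every k strictly greater than n_tosses / 2.
    for k in range(n_tosses // 2 + 1, n_tosses + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), computed with lgamma.
        log_pmf = (
            math.lgamma(n_tosses + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_tosses - k + 1)
            + k * log_p
            + (n_tosses - k) * log_q
        )
        total += math.exp(log_pmf)
    return total


for n_tosses in (1_000, 10_000):
    print(f"{n_tosses = }, {prob_majority_heads(n_tosses, 0.51) = :.4f}")
```

The simulated frequencies should land near these values, up to Monte Carlo noise.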

## Q&A

**(?1)** Isn't `random.uniform(0, 1) < p` an instance of `bool`? Why does `Bernoulli`'s `__new__` return a `bool` instead of an `int`?

**(R1)** `bool` is a subclass of `int` in Python. Indeed,

In [16]:
isinstance(True, int), isinstance(False, int)

(True, True)

In [17]:
True == 1, False == 0

(True, True)

In [18]:
True == 2

False

In [19]:
isinstance(True, bool), isinstance(False, bool)

(True, True)

In [20]:
issubclass(bool, int)

True

**(?2)** Why couldn't we have defined our `Bernoulli` class as follows?
```python
class Bernoulli(int):
    def __new__(cls, p=0.51):
        return super().__new__(cls)

    def __init__(self, p=0.51):
        self = random.uniform(0,1) < p
```

**(R2)** Let's run it to understand it better.

In [21]:
class Bernoulli(int):
    def __new__(cls, p=0.51):
        return super().__new__(cls)

    def __init__(self, p=0.51):
        self = random.uniform(0,1) < p

In [22]:
for p in np.linspace(0, 1, num=11):
    print(f"{p = :.1f}")
    print(f"{Bernoulli(p) = }")
    print()

p = 0.0
Bernoulli(p) = 0

p = 0.1
Bernoulli(p) = 0

p = 0.2
Bernoulli(p) = 0

p = 0.3
Bernoulli(p) = 0

p = 0.4
Bernoulli(p) = 0

p = 0.5
Bernoulli(p) = 0

p = 0.6
Bernoulli(p) = 0

p = 0.7
Bernoulli(p) = 0

p = 0.8
Bernoulli(p) = 0

p = 0.9
Bernoulli(p) = 0

p = 1.0
Bernoulli(p) = 0



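As you can see, such a definition always gives `0`. This is because
- `__new__` calls `int.__new__(cls)` without a value, which initializes the instance to `0`
- Python integers are immutable, so once the instance has been created as `0` in `__new__`, `__init__` cannot modify it. The assignment `self = ...` inside `__init__` merely rebinds the local name `self`; it has no effect on the object itself.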
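
For contrast, a variant that does yield a genuine instance of the subclass has to hand the value to `int.__new__` at creation time. The following is only a sketch under that assumption; the class name `BernoulliInt` is hypothetical and not part of this notebook.

```python
import random


class BernoulliInt(int):  # hypothetical name, for illustration only
    def __new__(cls, p=0.51):
        # Decide the value first, then create the immutable int with it.
        value = int(random.random() < p)
        return super().__new__(cls, value)


always_head = BernoulliInt(1.0)  # random.random() < 1.0 always holds
always_tail = BernoulliInt(0.0)  # random.random() < 0.0 never holds
print(always_head, always_tail)               # 1 0
print(isinstance(always_head, BernoulliInt))  # True
```

Whether this is preferable to returning a plain `bool` is a matter of taste; for summing tosses, both behave the same.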