# Lecture 4

## Random Experiments and Simulations

Consider the following questions:

1) If you flip a coin 20 times, how many times do you think it will come up heads?

2) If you flip the coin 20 times and it comes up heads 6 times, do you think it is a fair or unfair coin? How confident can you be in your answer?

Can you conduct an experiment to answer these questions? 

What problems may you encounter in conducting this experiment? How can we overcome these problems?

1) We expect the answer to be about 10, because on average a fair coin comes up heads half the time.

2) We don't know how to answer this yet.

If we take a fair coin, flip it 20 times, count the number of heads, and then repeat that experiment many times, we can estimate how often 6 or fewer heads occurs. If this happens very rarely (say, less than 5% of the time), then we can say that the coin is unlikely to be fair.

*We use 6 or fewer heads because an outcome such as 5 heads is even more extreme than 6 heads. We want to count how often we see an outcome at least as extreme as 6 heads.*

The problem is that we may need to repeat the experiment (flipping the coin 20 times) many times to estimate accurately how often 6 or fewer heads come up. This may require thousands of coin flips.

We can overcome this problem by using a computer to flip the coin in a **simulation**. A **computer simulation** is a computer program that models reality and allows us to conduct experiments that:
* would require a lot of time to carry out in real life
* would require a lot of resources to carry out in real life
* would not be possible to repeat in real life (for instance, simulation of the next day's weather or stock market performance)

Let's build simulations of our coin-flip experiment and learn about some Python libraries along the way.

```python
import random

faces = ['H', 'T']
```
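The cell that runs the experiment does not appear in these notes. A minimal sketch, assuming we draw from the `faces` list with the standard-library `random.choices` (which samples `k` items with replacement) and using a hypothetical helper name `flip_coins`, could look like this:

```python
import random

faces = ['H', 'T']

def flip_coins(num_flips=20):
    """Hypothetical helper: flip a fair coin num_flips times and return the number of heads."""
    flips = random.choices(faces, k=num_flips)
    return flips.count('H')

# Repeat the 20-flip experiment a few times and print the number of heads each time
for experiment in range(10):
    print('Experiment', experiment, ':', flip_coins(), 'heads')
```

Each run prints different counts, since the flips are random.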

Suppose we want to see how often 6 or fewer heads occurs. Rather than printing the result of every experiment, we can reduce the output by printing only those extreme events:
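A minimal sketch of this filtering step, assuming 1,000 repetitions of the 20-flip experiment (an arbitrary choice for illustration):

```python
import random

faces = ['H', 'T']
num_experiments = 1000   # assumption: number of repeated 20-flip experiments

for experiment in range(num_experiments):
    # One experiment: 20 flips of a fair coin
    num_heads = random.choices(faces, k=20).count('H')
    # Only report the extreme outcomes (6 or fewer heads)
    if num_heads <= 6:
        print('Experiment', experiment, ':', num_heads, 'heads')
```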

We don't really care about the particular experiment on which those events occur; we only want to know how often they occur. Let's add a counter to our simulation:
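A minimal sketch of the counter, assuming 10,000 repetitions (again an illustrative choice) and a hypothetical variable name `extreme_count`:

```python
import random

faces = ['H', 'T']
num_experiments = 10_000   # assumption: chosen for illustration

# extreme_count tallies how many experiments gave 6 or fewer heads
extreme_count = 0
for _ in range(num_experiments):
    num_heads = random.choices(faces, k=20).count('H')
    if num_heads <= 6:
        extreme_count += 1

print(extreme_count, 'of', num_experiments, 'experiments had 6 or fewer heads')
```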

Let's visualize this data using a bar graph:

```python
import matplotlib.pyplot as plt
%matplotlib inline
```
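The plotting cell itself is missing from these notes. A minimal sketch, assuming we tally how many experiments produced each possible number of heads (0 through 20) and draw one bar per outcome with `plt.bar`; `num_experiments` and `counts` are illustrative names:

```python
import random
import matplotlib.pyplot as plt

faces = ['H', 'T']
num_flips = 20
num_experiments = 10_000   # assumption: chosen for illustration

# counts[k] = number of experiments that produced exactly k heads (k = 0..20)
counts = [0] * (num_flips + 1)
for _ in range(num_experiments):
    num_heads = random.choices(faces, k=num_flips).count('H')
    counts[num_heads] += 1

plt.bar(range(num_flips + 1), counts)
plt.xlabel('Number of heads in 20 flips')
plt.ylabel('Number of experiments')
plt.show()
```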

**Definition** The **relative frequency** of an event is the number of times that an event occurs divided by the number of times the experiment is conducted. 

Let $N$ denote the number of experiments, and $K$ denote the number of possible outcomes. 

Let $a_i$ denote the number of times the $i$th outcome is observed.

Then 
$$ \sum_{i=1}^{K} a_i =N $$

Let $r_i$ denote the relative frequency of outcome $i$. Then
\begin{align} \sum_{i=1}^{K} r_i &= \sum_{i=1}^{K}
\frac{a_i}{N} \\
&= \frac{1}{N} \sum_{i=1}^{K} a_i \\
&=\frac{1}{N} N = 1
\end{align}



We still haven't answered our question about whether the coin is fair. Often in simulation, we don't care about determining the relative frequencies of all the outcomes; we only care about determining the probability of some event. Let's modify the experiment to calculate the relative frequency of getting 6 or fewer heads in 20 flips of a fair coin:
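A minimal sketch of this modification, wrapped in a hypothetical helper `relative_frequency` so that the threshold and the number of simulated experiments are easy to vary. Changing `max_heads` to 4 addresses the 4-or-fewer question; a run with 1,000,000 experiments works the same way but takes noticeably longer in pure Python:

```python
import random

faces = ['H', 'T']

def relative_frequency(max_heads, num_experiments, num_flips=20):
    """Hypothetical helper: estimate the relative frequency of getting
    max_heads or fewer heads in num_flips flips of a fair coin."""
    count = 0
    for _ in range(num_experiments):
        num_heads = random.choices(faces, k=num_flips).count('H')
        if num_heads <= max_heads:
            count += 1
    return count / num_experiments

# See how the estimate behaves as the number of experiments grows
for num_experiments in [100, 1_000, 10_000, 100_000]:
    print(num_experiments, relative_frequency(6, num_experiments))
```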

* How does the relative frequency change with the number of experiments simulated?

* For 1M simulated experiments, what is the relative frequency of 6 or fewer heads?

* What is your conclusion about whether this could be a fair coin?

* For 1M simulated experiments, what is the relative frequency of 4 or fewer heads?

This is an example of **binary hypothesis testing**. In this case, we set up two hypotheses:

$H_0$ (the *null hypothesis*): the observed effect is just caused by randomness in the sampling; it is not real in the underlying system or data. For this example, our null hypothesis is that the coin is actually fair.

$H_1$ (the *alternative hypothesis*): the observed effect is not just caused by random sampling. In this example, the alternative hypothesis is that the coin is biased toward tails.

In classical statistics/hypothesis testing, we say that an effect is statistically significant if the probability, under the null hypothesis, of observing an effect at least as large as the one observed (the *$p$-value*) is smaller than some small threshold $\alpha$. Typical values of $\alpha$ are 0.05 or 0.01, but many now argue for even smaller values. **The threshold for statistical significance must always be chosen before the experiment is conducted; otherwise, there is too much temptation to adjust the threshold based on the observed $p$-value.**
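The decision rule can be sketched in code. This assumes a one-sided test matching $H_1$ (bias toward tails), an observed result of 6 heads in 20 flips, $\alpha = 0.05$ fixed in advance, and 100,000 simulated experiments under $H_0$; all of these are illustrative choices, and the $p$-value is only a simulation estimate:

```python
import random

faces = ['H', 'T']
alpha = 0.05            # significance threshold, fixed BEFORE looking at the data
observed_heads = 6      # observed result: 6 heads in 20 flips
num_experiments = 100_000

# Estimate the p-value: how often a fair coin (H0) gives a result at least as
# extreme as the observed one, i.e. observed_heads or fewer heads
extreme = sum(
    random.choices(faces, k=20).count('H') <= observed_heads
    for _ in range(num_experiments)
)
p_value = extreme / num_experiments

if p_value < alpha:
    print(f'p = {p_value:.4f} < {alpha}: reject H0 (coin appears biased toward tails)')
else:
    print(f'p = {p_value:.4f} >= {alpha}: do not reject H0 (consistent with a fair coin)')
```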

In classical hypothesis testing, we do *not* test the alternative hypothesis directly, nor can we use side information that we may already have about the two hypotheses.

**Definition** The **probability** of an event is a number between 0 and 1 that quantifies how likely that event is to occur. An event that cannot occur has probability 0, and an event that is sure to occur has probability 1. The probabilities of all possible outcomes of an experiment sum to 1.

**Definition** We say an experiment is **fair** if every outcome is equally likely.

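Consider a fair experiment with $K$ outcomes, and let $p_i$ denote the probability of outcome $i$. Since every outcome is equally likely, $p_i = p_1$ for every $i$, so

\begin{align}
\sum_{i=1}^{K} p_i &= 1 \\
\sum_{i=1}^{K} p_1 &= 1 \\
K p_1 &= 1 \\
p_1 &= \frac{1}{K}
\end{align}

and therefore $p_i = \frac{1}{K}$ for every outcome $i$.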

So, for instance, the probability of rolling any particular number on a fair die is 1/6. Let's compare these probabilities to the relative frequencies:
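A minimal sketch of this comparison, assuming 60,000 simulated rolls of a fair six-sided die (an arbitrary choice). Each face's relative frequency is printed next to the probability $1/6$:

```python
import random

num_rolls = 60_000           # assumption: chosen for illustration
faces = [1, 2, 3, 4, 5, 6]

# Tally how many times each face of a fair die comes up
counts = {face: 0 for face in faces}
for _ in range(num_rolls):
    counts[random.choice(faces)] += 1

# Compare each relative frequency with the probability 1/6
for face in faces:
    print(face, counts[face] / num_rolls, 1 / 6)
```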