## Introductory experiment: drawing cards from two stacks

- Imagine a game with **two stacks of cards**.
- Each stack contains **winning cards** and **blanks**.
- You have to decide **which stack has more wins**.
- How many pairs of cards (one from each stack) do you have to draw?

In [1]:
from IPython.core.display import HTML

In [2]:
# setup notebook
%matplotlib inline
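# IPython.display.set_matplotlib_formats is deprecated; matplotlib_inline provides the replacement
from matplotlib_inline.backend_inline import set_matplotlib_formats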
set_matplotlib_formats('svg')
from cards_code import *

### Draw cards from two different stacks, one from each stack at a time

In [3]:
interactive_experiment(seed=0, initial_cards=1)

You can keep on drawing cards for as long as you like and the results keep on changing!

**Spoilers below!**
.

.

.

Scroll down when you are finished drawing cards

.

.

.

.

.

.

.

.

.

.

.

.

.

**Spoilers below!**

### What can we know after drawing a certain number of cards?

In [None]:
interactive_experiment(seed=0, initial_cards=25)

### When have we drawn enough cards to be certain?

In [None]:
interactive_experiment(seed=0, initial_cards=100)

### It can take a while to see which stack is better

In [None]:
stack1, stack2 = draw_cards(
    p_win_1 = 0.5, # winning probability for stack 1
    p_win_2 = 0.4, # stack 2 wins 20% less often than stack 1 (10 percentage points)!
    n_cards = 150,
    seed     = 0
)

plot_wins(stack1, stack2);
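
`draw_cards` and `plot_wins` come from `cards_code`, which is not shown here. As a rough sketch of the idea, assuming each card is an independent draw that wins with a fixed probability, the running fraction of wins per stack could be simulated as follows. The helper `simulate_stack` and all variable names are hypothetical and not part of `cards_code`, so the numbers will not match the plot above.

```python
import numpy as np


def simulate_stack(p_win, n_cards, rng):
    """Hypothetical helper: return 0/1 outcomes, where 1 is a winning card."""
    return (rng.random(n_cards) < p_win).astype(int)


rng = np.random.default_rng(0)
n_cards = 150

sim1 = simulate_stack(0.5, n_cards, rng)  # stack 1: wins half of the time
sim2 = simulate_stack(0.4, n_cards, rng)  # stack 2: wins 40% of the time

# Fraction of wins observed after 1, 2, ..., n_cards draws.
draws = np.arange(1, n_cards + 1)
running_1 = np.cumsum(sim1) / draws
running_2 = np.cumsum(sim2) / draws

print(f"after {n_cards} cards: stack 1 = {running_1[-1]:.2f}, stack 2 = {running_2[-1]:.2f}")
```

Early in the sequence the running fractions fluctuate strongly; they only settle near the true probabilities as more cards are drawn.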

### Repeat the experiment
- Draw 25 cards from each stack
- Estimate each stack's winning probability as number of wins / 25
- Repeat 100 times

### Each repetition yields a different estimate of each stack's winning probability

In [None]:
means = repeated_experiment_means(
    p_win_1  = 0.5,
    p_win_2  = 0.4,
    n_cards  = 25,
    n_repeats = 100,
    seed      = 1
)
fig, ax = plot_experiments(means)
fig.tight_layout(pad=2)

- A **histogram** shows how often we observed each outcome
- The **standard deviation (std.)** over the repetitions summarizes the spread of these estimates, i.e. their uncertainty, in one number
- It corresponds to the **standard error** of the mean obtained in a single experiment
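
To illustrate the last two points, here is a minimal sketch that repeats a 25-card experiment 100 times for one stack and compares the standard deviation of the estimates with the textbook standard error of a proportion, sqrt(p(1-p)/n). It assumes independent draws with a fixed winning probability and does not use `cards_code`; the variable names are illustrative and the values will differ from the plot above.

```python
import numpy as np

rng = np.random.default_rng(1)
p_win, n_cards, n_repeats = 0.5, 25, 100

# Number of wins in each repetition, converted to an estimated probability.
wins = rng.binomial(n_cards, p_win, size=n_repeats)
estimates = wins / n_cards

# Spread of the estimates over all repetitions (sample standard deviation).
spread = estimates.std(ddof=1)

# Standard error expected for a single experiment with n_cards draws.
standard_error = np.sqrt(p_win * (1 - p_win) / n_cards)

print(f"std. over repetitions: {spread:.3f}")
print(f"standard error:        {standard_error:.3f}")
```

The two numbers should be close but not identical, because 100 repetitions only give an approximate estimate of the spread.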

### A single small experiment (25 cards)

- **Error bars** show the **standard error** of a single experiment
- Here they **overlap**: the difference between the stacks is smaller than the uncertainty
- **We can't decide** which stack is better

In [None]:
stack1, stack2 = draw_cards(
    p_win_1 = 0.5,
    p_win_2 = 0.4,
    n_cards = 25,
    seed     = 0
)
plot_p_win(stack1, stack2);
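
How `plot_p_win` computes its error bars is not shown here. A minimal sketch of the overlap check, assuming the bars span plus or minus one standard error and that the standard error follows the binomial formula sqrt(p(1-p)/n), could look like this. The functions and the win counts are hypothetical and are not taken from the experiment above.

```python
import math


def standard_error(wins, n_cards):
    """Standard error of an estimated winning probability (binomial formula)."""
    p_hat = wins / n_cards
    return math.sqrt(p_hat * (1 - p_hat) / n_cards)


def error_bars_overlap(wins_1, wins_2, n_cards):
    """True if the intervals p_hat +/- one standard error of both stacks overlap."""
    p1, p2 = wins_1 / n_cards, wins_2 / n_cards
    se1 = standard_error(wins_1, n_cards)
    se2 = standard_error(wins_2, n_cards)
    return abs(p1 - p2) <= se1 + se2


# Hypothetical counts: 13 and 10 wins out of 25 cards.
print(error_bars_overlap(13, 10, 25))  # True
```

With only 25 cards each standard error is roughly 0.1, about as large as the true difference between the stacks, so overlapping bars are the expected outcome.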

### A bigger experiment (1000 cards)

- The **error bars don't overlap**
- We can say the difference is **statistically significant**
- We can be very certain that stack 1 has more wins

In [None]:
# add a star here. explain significance

In [None]:
stack1, stack2 = draw_cards(
    p_win_1 = 0.5,
    p_win_2 = 0.4,
    n_cards = 1000,
    seed     = 0
)
plot_p_win(stack1, stack2);
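
As a back-of-the-envelope check on how many cards are needed, the sketch below solves for the smallest number of cards per stack at which error bars of plus or minus one standard error stop overlapping. It is an idealized estimate: it assumes both measured probabilities land exactly on their true values and uses the binomial standard error. The function name is hypothetical and not part of `cards_code`.

```python
import math


def min_cards_for_separation(p_win_1, p_win_2):
    """Smallest n with |p1 - p2| > SE1 + SE2, where SE = sqrt(p * (1 - p) / n)."""
    spread = math.sqrt(p_win_1 * (1 - p_win_1)) + math.sqrt(p_win_2 * (1 - p_win_2))
    gap = abs(p_win_1 - p_win_2)
    # Non-overlap requires sqrt(n) > spread / gap, i.e. n > (spread / gap) ** 2.
    return math.floor((spread / gap) ** 2) + 1


print(min_cards_for_separation(0.5, 0.4))  # 98
```

Around this threshold random fluctuations can still make the bars overlap in any single experiment, so a clear separation needs considerably more cards, as with `n_cards = 1000` in the cell above.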

# How do we calculate these quantities?
See the next section, `practical basics`!

In [None]:
# export to slideshow AFTER saving the notebook
#!jupyter nbconvert --execute 1_cards.ipynb --to slides --no-input
# --ExecutePreprocessor.store_widget_state=True