# Simple Pólya urn simulation

This notebook follows the discussion in chapters 4 and 10 of the book *Odds and ends* by Jonathan Weisberg, [available here](https://jonathanweisberg.org/vip/the-gamblers-fallacy.html#the-gamblers-fallacy).

We have already seen urns, and sampling from them with and without replacement:


*   When sampling with replacement, we put each marble we take out back into the urn.
*   When sampling without replacement, we do not put the marble back.

A Pólya urn uses a different procedure: when we take out a marble, we put it back in the urn **and** add **one extra marble** of the same colour. So the number of marbles in the urn grows by one with every draw.
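To make the contrast concrete, here is a minimal sketch of a single draw under each of the three schemes. The function names `draw_with_replacement`, `draw_without_replacement`, and `draw_polya` are hypothetical helpers for illustration. Each one takes the urn as a list of colour strings plus a `random.Random` instance, and returns the drawn marble together with the urn after the draw, leaving the input list unchanged.

```python
import random

def draw_with_replacement(urn, rng):
    """Draw a marble and put it back: the urn is unchanged."""
    marble = rng.choice(urn)
    return marble, list(urn)

def draw_without_replacement(urn, rng):
    """Draw a marble and leave it out: the urn shrinks by one."""
    new_urn = list(urn)
    marble = new_urn.pop(rng.randrange(len(new_urn)))
    return marble, new_urn

def draw_polya(urn, rng):
    """Draw a marble, put it back, and add one more of the same colour: the urn grows by one."""
    marble = rng.choice(urn)
    return marble, list(urn) + [marble]

rng = random.Random(0)  # seeded so the example is reproducible
urn = ['B', 'W']
for step in (draw_with_replacement, draw_without_replacement, draw_polya):
    marble, after = step(urn, rng)
    print(step.__name__, marble, after)
```

Only the Pólya draw makes the colour just drawn more likely on the next draw.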

Run the simulation below to see how a Pólya urn changes over time. To simulate a different starting urn, change the line

```
my_urn = ['B', 'W']
```

for example to `['B', 'B', 'W']`.

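```python
import random

def polya(urn, sampling_num):
    """Simulate sampling_num draws from a Pólya urn and return the final urn."""
    urn = list(urn)  # work on a copy so the caller's list is not modified

    for _ in range(sampling_num):
        print("the urn now")
        print(urn)

        # draw one marble uniformly at random
        sampled_marble = random.choice(urn)
        print("we sampled a ", sampled_marble, " marble.")
        # the marble goes back, plus one extra of the same colour;
        # append (not +=) so a name like 'red' is added as one marble, not 'r', 'e', 'd'
        urn.append(sampled_marble)

    return urn

my_urn = ['B', 'W']
polya(my_urn, 10)
```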

Output of one run (the draws are random, so your run will differ). The last line is the urn returned by `polya`:

```
the urn now
['B', 'W']
we sampled a  W  marble.
the urn now
['B', 'W', 'W']
we sampled a  W  marble.
the urn now
['B', 'W', 'W', 'W']
we sampled a  B  marble.
the urn now
['B', 'W', 'W', 'W', 'B']
we sampled a  W  marble.
the urn now
['B', 'W', 'W', 'W', 'B', 'W']
we sampled a  W  marble.
the urn now
['B', 'W', 'W', 'W', 'B', 'W', 'W']
we sampled a  W  marble.
the urn now
['B', 'W', 'W', 'W', 'B', 'W', 'W', 'W']
we sampled a  W  marble.
the urn now
['B', 'W', 'W', 'W', 'B', 'W', 'W', 'W', 'W']
we sampled a  W  marble.
the urn now
['B', 'W', 'W', 'W', 'B', 'W', 'W', 'W', 'W', 'W']
we sampled a  W  marble.
the urn now
['B', 'W', 'W', 'W', 'B', 'W', 'W', 'W', 'W', 'W', 'W']
we sampled a  W  marble.

['B', 'W', 'W', 'W', 'B', 'W', 'W', 'W', 'W', 'W', 'W', 'W']
```
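For longer runs the printed trace gets unwieldy. Below is a minimal sketch of a hypothetical non-printing variant, `polya_quiet`; the name and its `seed` parameter are assumptions for illustration. It adds marbles with `list.append`, so multi-character colour names such as `'red'` are stored as single marbles, and it draws from its own `random.Random(seed)` so that a run can be repeated exactly.

```python
import random
from collections import Counter

def polya_quiet(urn, sampling_num, seed=None):
    """Hypothetical non-printing Pólya urn simulation; returns the final urn.

    Uses a private random.Random(seed), so the same seed reproduces the same run.
    """
    rng = random.Random(seed)
    urn = list(urn)  # copy, so the caller's urn is untouched
    for _ in range(sampling_num):
        urn.append(rng.choice(urn))  # draw, return it, add one more of the same colour
    return urn

final = polya_quiet(['red', 'not-red'], 1000, seed=42)
counts = Counter(final)
print(counts)
print("fraction red:", counts['red'] / len(final))
```

Because marbles are only ever added, both colours are still in the urn at the end of any run. The final fraction of red, however, can differ from one seed to another.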



What makes the Pólya urn interesting? We can think of it as a simple model of inductive reasoning about an unknown proportion in nature, where the contents of the urn represent our subjective probabilities. Say (as usual) that you want to know the proportion of red to non-red squirrels in the University forest. Before going into the forest, you assign probability $0.5$ to the next squirrel you observe being red. We can model this situation with the urn ```['red', 'not-red']```.

When you see a red squirrel, you become a bit more confident that the forest contains many red squirrels, so you update the urn to ```['red', 'not-red', 'red']```. Now the probability that the next squirrel you see is red is $\frac{2}{3}$. This is exactly the Pólya urn sampling process.
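The arithmetic of this update can be checked with Python's `fractions.Fraction`, which keeps probabilities exact. The helper `prob_of` is a hypothetical name used only for this sketch; it treats the probability of a colour as the share of marbles of that colour in the urn.

```python
from fractions import Fraction

def prob_of(urn, colour):
    """Probability that a uniform draw from urn has the given colour, as an exact fraction."""
    return Fraction(urn.count(colour), len(urn))

before = ['red', 'not-red']
after = before + ['red']  # Pólya update after observing one red squirrel

print(prob_of(before, 'red'))  # 1/2
print(prob_of(after, 'red'))   # 2/3
```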

In fact, if you start with an urn containing exactly one positive and one negative instance, ```['positive','negative']```, and update it as a Pólya urn after every observation, then your probabilities will match those prescribed by the **rule of succession**.

According to the rule of succession, after $n$ observations, of which $k$ are positive and $n-k$ are negative, the probability that observation $n+1$ is positive is $\frac{k+1}{n+2}$.
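As a sketch, the rule of succession is a one-line function of $k$ and $n$. The name `rule_of_succession` is a hypothetical helper, and exact fractions are used so the results can be compared with urn probabilities without rounding error.

```python
from fractions import Fraction

def rule_of_succession(k, n):
    """Probability that observation n+1 is positive, given k positives among n observations."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return Fraction(k + 1, n + 2)

print(rule_of_succession(0, 0))   # 1/2: no observations yet
print(rule_of_succession(1, 1))   # 2/3: one red squirrel seen
print(rule_of_succession(3, 10))  # 1/3: 3 positives out of 10
```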

From the urn perspective, you start with the Pólya urn ```['positive','negative']```. Each observation adds one object, so after $n$ observations the urn holds $n+2$ objects (it started with two). Since $k$ of the observations are positive, the urn holds $k+1$ positive objects (it started with one). So the fraction of positive objects in the urn is $\frac{k+1}{n+2}$, exactly as the rule of succession prescribes.
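The counting argument can also be checked mechanically. The sketch below uses a hypothetical helper, `urn_matches_succession`, which replays a sequence of observations (assumed here to be the strings `'positive'` and `'negative'`) through a ```['positive','negative']``` Pólya urn. After each observation it compares the urn's fraction of positive objects with $\frac{k+1}{n+2}$.

```python
from fractions import Fraction

def urn_matches_succession(observations):
    """After each observation, check the Pólya urn's positive fraction equals (k+1)/(n+2).

    observations: list of 'positive' / 'negative' strings (assumed input format).
    """
    urn = ['positive', 'negative']
    k = 0  # number of positive observations so far
    for n, obs in enumerate(observations, start=1):
        urn.append(obs)  # Pólya update: one more object of the observed kind
        if obs == 'positive':
            k += 1
        urn_prob = Fraction(urn.count('positive'), len(urn))
        if urn_prob != Fraction(k + 1, n + 2):
            return False
    return True

print(urn_matches_succession(['positive', 'negative', 'positive', 'positive']))  # True
```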




