# Simulation of random events with Python 

This notebook contains a function that numerically simulates sampling from a bag filled with tiles showing the numbers 1, 2, 3, 4, 5, or 6.

It can also be interpreted as a simulation of rolling a six-sided die.



In [None]:
# -*- coding: utf-8 -*-
"""
Created on Mon Feb 20 14:10:29 2017

@author: OET
"""
import numpy as np
def sample_from_population(n, loaded=False):
    """This function will simulate the tile sampling experiment.
    You can also think of it as a simulator for rolling a 6-sided die.
    
    Usage: res = sample_from_population(30) 
    Input parameters:
           n: the number of times to draw a tile from the population (or roll the die)
    Optional input parameters:
            loaded: either True or False
            (False is default, creating uniform probability for all events)           
    Output: 
           a numpy array with the event numbers
           
           
    Last Update: 2019-03-28 by OET"""
           
    if loaded:
        # loaded dice simulation / tile population with one event having lower probability
        population = np.array([1, 1, 1, 1, 1, 1,
                               2, 2, 2, 2, 2, 2,
                               3, 3, 3, 3, 3, 3,
                               4, 4, 4,
                               5, 5, 5, 5, 5, 5,
                               6, 6, 6, 6, 6, 6])
    else:
        # fair dice simulation / tile population with uniform (even) probability
        population = np.array([1, 1, 1, 1, 1, 1,
                               2, 2, 2, 2, 2, 2,
                               3, 3, 3, 3, 3, 3,
                               4, 4, 4, 4, 4, 4,
                               5, 5, 5, 5, 5, 5,
                               6, 6, 6, 6, 6, 6])
    # draw n random indices (sampling with replacement) and return the events
    return population[np.random.randint(0, np.size(population), size=n)]


In [None]:
help(sample_from_population)

### Supporting code: Calculating the frequency of events (histograms)

In [None]:
# 30 trials sampling from the population of tiles / 30 times rolling a 6-sided die 

n=30

# store the summary result from one experiment
yfair=sample_from_population(n,loaded=False)
yloaded=sample_from_population(n,loaded=True)

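# bin edges, centered on the integer events 1..6
use_bins=np.arange(0.5,6.5+1,1) # [0.5, 1.5, ... ,6.5]
# np.histogram returns the counts per bin and the bin edges
count_fair,bin_edges=np.histogram(yfair,bins=use_bins)
count_loaded,_=np.histogram(yloaded,bins=use_bins)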
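
The counts returned by `np.histogram` can be turned into relative frequencies by dividing by the number of trials. The sketch below is self-contained: as an assumption, it draws fair rolls with a seeded `np.random.default_rng` generator instead of calling `sample_from_population`, and the names `rng`, `rolls`, and `rel_freq` are illustrative only.

```python
import numpy as np

# Assumption: a seeded generator stands in for the notebook's sampling function
rng = np.random.default_rng(42)
n = 30
rolls = rng.integers(1, 7, size=n)  # fair die: integers 1..6 (upper bound exclusive)

# Bin edges centered on the integer events: 0.5, 1.5, ..., 6.5
bin_edges = np.arange(0.5, 7.5, 1)
counts, _ = np.histogram(rolls, bins=bin_edges)

# Relative frequency = absolute count / number of trials
rel_freq = counts / n

for face, (c, f) in enumerate(zip(counts, rel_freq), start=1):
    print(f"event {face}: count={c:2d}, relative frequency={f:.3f}")
```

The relative frequencies always sum to 1, which makes them easier to compare across experiments with different sample sizes than the raw counts.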


### Plotting the histogram

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(yfair,bins=use_bins,color="gold",edgecolor="purple",width=0.8)
plt.show()

---
### Breakout group work (30min)

Use the `help()` function to find out how the function *sample_from_population()* can be applied.
 - What arguments does the function expect?
 - How can you use the keyword parameter to toggle between a population with equal probability for each event ('fair' population) and a population in which one event occurs less frequently than the other five elementary events ('loaded' population)?
 - Which event (which number) has a lower probability than the other five events in the 'loaded' simulation?
 
What is the function returning when you call it?
 - What is the first object (type, values, and what do they represent)?
 - What is the second object (type, values, and what do they represent)?

Apply the function several times and use the histogram code in this notebook to study the frequency of the events.





### Programming activity:

Develop a loop that repeats the simulation of the whole 'sampling with replacement' experiment (the experiment we did manually with the real tiles):

- Repeat the whole experiment as if you had 10, 20, or 1000 more days, and on each day you repeated the sampling-with-replacement experiment with your chosen sample size (e.g. 30+30=60 trials).

- Calculate the relative frequency for each event in the 'fair' and the 'loaded' experiment during each iteration.
- Calculate the difference between the relative frequencies of the 'fair' and the 'loaded' experiment.
- Use a 2-dimensional array to save all results.


### Tip:

Decide first how often you want to repeat the whole experiment. Then create an empty 2-dimensional array that can store the differences from each iteration. For example, an array with 1000 rows and 6 columns can store all differences for 1000 experiments.

Alternatively, you can concentrate only on the event number that you suspect has a lower probability in the loaded experiment. Start with a 1-dimensional array and assign the corresponding difference in relative frequency to that array.
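
One possible shape for such a loop is sketched below. It is not the only solution and makes several assumptions: the populations are rebuilt with `np.repeat` so the block runs on its own, draws are made with a seeded `np.random.default_rng` generator rather than `sample_from_population`, and the helper `relative_frequency` and the variables `n_experiments`, `n_trials`, and `diff` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded so the run is reproducible

# Same tile populations as in sample_from_population, rebuilt with np.repeat
fair_pop = np.repeat(np.arange(1, 7), 6)                     # 36 tiles
loaded_pop = np.repeat(np.arange(1, 7), [6, 6, 6, 3, 6, 6])  # 33 tiles


def relative_frequency(sample):
    """Return the relative frequency of the events 1..6 in a sample."""
    counts = np.bincount(sample, minlength=7)[1:]  # drop the unused slot for 0
    return counts / sample.size


n_experiments = 1000  # how often the whole experiment is repeated ("days")
n_trials = 60         # draws per experiment

# One row per experiment, one column per event
diff = np.zeros((n_experiments, 6))

for i in range(n_experiments):
    # rng.choice samples with replacement by default
    fair = rng.choice(fair_pop, size=n_trials)
    loaded = rng.choice(loaded_pop, size=n_trials)
    diff[i, :] = relative_frequency(fair) - relative_frequency(loaded)

# Average difference per event over all repetitions
print(diff.mean(axis=0))
```

Each row of `diff` holds one experiment and each column one event, so a column mean that stays clearly away from zero points to the event whose probability differs between the two populations. For the 1-dimensional variant, store only one column, for example `diff[i, k]` for a chosen event index `k`.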
