# Lecture 4a

## Introduction to Functions, Counting in Numpy, and Accelerating Simulations

## Introduction to Functions

In [2]:
import random
faces=['T','H']

Consider the simulation from last time for determining the relative frequency of getting 6 or fewer heads on 20 flips of a fair coin:

In [3]:

num_sims=1000000
flips=20

event_count=0
for sim in range(num_sims):
    coins=random.choices(faces, k=flips)
    num_heads=coins.count('H')
    if num_heads <= 6:
        event_count+=1

print("Relative frequency of 6 or fewer heads is ~ ", event_count/num_sims)


Relative frequency of 6 or fewer heads is ~  0.057854


Let's consider how to further improve this code. 

We begin by turning this simulation into a function. New functions in Python are defined using the ```def``` keyword, followed by the function name, the arguments in parentheses, and then a colon. The commands to be run in the function follow in an indented block.

Note that it is helpful to know how to indent a whole block of code in Jupyter. Choose the Help->Keyboard Shortcuts menu and then look under the Edit Mode section for the Indent command. For instance, on the Mac, it is Command-].  When you want to turn a code block into a function, copy and paste it into a new cell and then indent it using the keyboard command. Then add the ```def``` statement above it.

It is easiest to understand through an example:

In [None]:
num_sims=1000000
flips=20

event_count=0
for sim in range(num_sims):
    coins=random.choices(faces, k=flips)
    num_heads=coins.count('H')
    if num_heads <= 6:
        event_count+=1
        
print("Relative frequency of 6 or fewer heads is ~ ", event_count/num_sims)


Now we can call the function by its name followed by parentheses. Since we have provided default values for all of the function's arguments, we do not even have to provide any arguments:
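
As a minimal sketch, wrapping the simulation in a ```def``` with default arguments and then calling it with empty parentheses might look like the following. The function name ```coin_sim``` is a hypothetical choice, and the ```return``` line is an addition so that the result can be reused; the loop itself is the one shown earlier. Expect the call to take a few seconds.

```python
import random

faces = ['T', 'H']

# Hypothetical function name; the body is the simulation loop from above.
def coin_sim(num_sims=1000000, flips=20):
    event_count = 0
    for sim in range(num_sims):
        coins = random.choices(faces, k=flips)
        num_heads = coins.count('H')
        if num_heads <= 6:
            event_count += 1
    rel_freq = event_count / num_sims
    print("Relative frequency of 6 or fewer heads is ~ ", rel_freq)
    return rel_freq  # assumption: returned so the value can be reused

# No arguments given, so both defaults (1,000,000 sims of 20 flips) are used.
result = coin_sim()
```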

We can pass arguments to the function according to their position, keyword, or both. For instance, to only run 100k simulations, we can do either of the following:
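
The two calling styles can be sketched as below. The function is defined inline so the cell runs on its own; its name ```coin_sim``` and its ```return``` value are assumptions made for illustration.

```python
import random

faces = ['T', 'H']

# Hypothetical function wrapping the coin-flip simulation.
def coin_sim(num_sims=1000000, flips=20):
    event_count = 0
    for sim in range(num_sims):
        coins = random.choices(faces, k=flips)
        if coins.count('H') <= 6:
            event_count += 1
    rel_freq = event_count / num_sims
    print("Relative frequency of 6 or fewer heads is ~ ", rel_freq)
    return rel_freq

# By position: the first argument fills num_sims; flips keeps its default.
freq_positional = coin_sim(100000)

# By keyword: the argument is named explicitly.
freq_keyword = coin_sim(num_sims=100000)
```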

Keyword arguments can appear in any order and can appear after positional arguments:
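
A short sketch of both cases follows, again using a hypothetical ```coin_sim``` function (the name and the ```return``` value are assumptions) defined inline so the cell is self-contained.

```python
import random

faces = ['T', 'H']

# Hypothetical function wrapping the coin-flip simulation.
def coin_sim(num_sims=1000000, flips=20):
    event_count = 0
    for sim in range(num_sims):
        coins = random.choices(faces, k=flips)
        if coins.count('H') <= 6:
            event_count += 1
    rel_freq = event_count / num_sims
    print("Relative frequency of 6 or fewer heads is ~ ", rel_freq)
    return rel_freq

# Keyword arguments given in the opposite order from the definition.
freq_reordered = coin_sim(flips=20, num_sims=100000)

# A positional argument first, followed by a keyword argument.
freq_mixed = coin_sim(100000, flips=20)
```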

However, positional arguments cannot follow keyword arguments:
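
Python rejects such a call when it parses the code, before anything runs. The sketch below shows this by compiling the offending call from a string so that the error can be caught; the name ```coin_sim``` is hypothetical and is never actually called here.

```python
# A call that places a positional argument after a keyword argument.
bad_call = "coin_sim(num_sims=100000, 20)"

raised = False
try:
    # compile() only parses the text; the function is never looked up or run.
    compile(bad_call, "<example>", "eval")
except SyntaxError as err:
    raised = True
    print("SyntaxError:", err.msg)
```

Typing the same call directly into a notebook cell would stop with the same ```SyntaxError```.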

Now let's see how long it takes to run this function. We will use Jupyter's built-in ```%timeit``` magic:

If you have programmed in MATLAB, you may have been told to avoid ```for``` loops because they slow everything down. The same is true in Python. Instead, we replace the lists with 2-D arrays, where one dimension indexes the coin flips within an experiment, and the other dimension indexes the different experiments.

Since we are creating an *array* of values, we will be generating 1s and 0s instead of 'H's and 'T's.  We will use the ```numpy.random``` submodule. It will be convenient to import it as ```npr```. We will also use other parts of ```numpy```, so we will import it as ```np```, as usual:

In [3]:
import numpy as np
import numpy.random as npr

Let's start by simulating flipping a fair coin 20 times again. Here we just choose 20 random values that are equally likely to be 0 (representing tails) or 1 (representing heads). We use the ```randint()``` function from ```numpy.random```:
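
A minimal sketch of such a call, assuming the ```npr``` alias from above; the variable name ```coins``` is chosen here for illustration.

```python
import numpy.random as npr

# Draw 20 integers from {0, 1}; the upper bound 2 is exclusive.
coins = npr.randint(2, size=20)
print(coins)
```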

Now, we can generate multiple rows like this by changing the size to a tuple. The tuple is interpreted as (rows, columns), so to conduct 5 simulations, we can do:
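
For example, a sketch with 5 rows (simulations) of 20 columns (flips) each, again assuming the ```npr``` alias:

```python
import numpy.random as npr

# size=(rows, columns): 5 simulations of 20 coin flips each.
coins = npr.randint(2, size=(5, 20))
print(coins)
print(coins.shape)
```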

Next, we need to learn how to translate the simulated coin flips into the counts of the number of heads. We can do that by summing across the columns. The rows are dimension 0, and the columns are dimension 1. We can use numpy's sum method to carry out the sum over the columns as follows:
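
One way to write this is with the ```axis``` keyword of ```np.sum```, which selects the dimension to sum over. The sketch below assumes the same 5-by-20 array of simulated flips; the variable names are illustrative.

```python
import numpy as np
import numpy.random as npr

coins = npr.randint(2, size=(5, 20))

# Sum over dimension 1 (the columns), leaving one head count per row.
num_heads = np.sum(coins, axis=1)
print(num_heads)
```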

We can perform comparisons on numpy arrays, and it will compare every element:
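
As a small illustration, the sketch below compares a hand-written array of head counts (made-up values, not simulation output) against 6:

```python
import numpy as np

# Example head counts, chosen by hand for illustration.
num_heads = np.array([10, 6, 4, 12, 7])

# The comparison is applied to each element in turn:
# False, True, True, False, False
at_most_six = num_heads <= 6
print(at_most_six)
```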

If we sum over an array of True/False values, it will treat True as 1 and False as 0. Thus, we can count how many items satisfy some condition easily:
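
A short sketch of this counting idiom, using hand-written head counts (made-up values for illustration):

```python
import numpy as np

# Example head counts, chosen by hand for illustration.
num_heads = np.array([10, 6, 4, 12, 7])

# Each True counts as 1, so the sum is the number of entries that are <= 6.
event_count = np.sum(num_heads <= 6)
print(event_count)  # 2
```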

Now we are ready to put all of that into practice. Let's make a new function using these principles:
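
Putting the pieces together, a vectorized version of the simulation might look like the sketch below. The function name ```coin_sim_np``` is hypothetical, and the ```return``` line is an addition for reuse. Note that with the default of 1,000,000 simulations the array of flips holds 20 million integers, so the example call uses 100k.

```python
import numpy as np
import numpy.random as npr

# Hypothetical name for a NumPy version of the coin-flip simulation.
def coin_sim_np(num_sims=1000000, flips=20):
    # One row per simulation, one column per flip (1 = heads, 0 = tails).
    coins = npr.randint(2, size=(num_sims, flips))
    # Number of heads in each simulation.
    num_heads = np.sum(coins, axis=1)
    # Count the simulations with 6 or fewer heads.
    event_count = np.sum(num_heads <= 6)
    rel_freq = event_count / num_sims
    print("Relative frequency of 6 or fewer heads is ~ ", rel_freq)
    return rel_freq  # assumption: returned so the value can be reused

freq = coin_sim_np(num_sims=100000)
```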

That is about a 25-fold speedup!