# Day03 Assignment

Submit this notebook back to me via Slack with your comments/annotations on the code and the results, along with your interpretation of the results and your answers to the questions at the end of each part.

When you submit, make sure your notebook's filename is: `[FirstName]-[LastName]_Assignment-03.ipynb`

# Power analysis and power curves

In [None]:
## Imports
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from scipy import stats

## Part 1 - Calculating power and generating a power curve for detecting unfair coins

The code below implements an experimental design to measure the bias of a given coin:
1. Flip the given coin `num_flips` times  
2. Record the number of heads  
3. Compare to the null distribution  
4. Get a `p_value`  
5. Reject or fail to reject the null hypothesis by comparing `p_value` to `alpha`  
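
As a cross-check on the simulated null distribution, the five steps above can also be sketched with an exact binomial calculation. This is only an illustrative sketch, not part of the assignment code: the function name `exact_two_sided_p_value` is hypothetical, and it assumes a fair-coin null and a two-sided test that measures extremeness as the distance from `num_flips / 2`.

```python
from math import comb

def exact_two_sided_p_value(num_heads, num_flips):
    """Exact two-sided p-value under a fair-coin null (hypothetical helper)."""
    observed_distance = abs(num_flips / 2 - num_heads)
    # Count the equally likely flip sequences whose head count is at least
    # as far from num_flips / 2 as the observed head count.
    extreme_sequences = sum(
        comb(num_flips, k)
        for k in range(num_flips + 1)
        if abs(num_flips / 2 - k) >= observed_distance
    )
    return extreme_sequences / 2 ** num_flips

num_flips = 20
alpha = 0.05
p_value = exact_two_sided_p_value(16, num_flips)  # e.g. 16 heads observed
print("p-value:", round(p_value, 4), "| reject null:", p_value <= alpha)
```

With enough permutations, the simulated p-value for the same number of heads should land close to this exact value, which makes it a handy sanity check on the simulation.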

In [None]:
num_flips = 20   ## sample size
alpha     = 0.05 ## significance threshold

_Put your notes here!_

In [None]:
# ! Add comments next to each code chunk to describe the data analysis steps ! #

####################################
# Setting up the null distribution #
####################################

num_permutations = 10000

fair_num_heads = []
for x in range(num_permutations):
    num_heads = 0
    for i in range(num_flips):
        if random.random() <= 0.5:
            num_heads += 1
    fair_num_heads.append(num_heads)

plt.hist(fair_num_heads, bins = 10, edgecolor = "k")

_Put your notes here!_

In [None]:
# ! Add comments next to each code chunk to describe the data analysis steps ! #

# A new coin
coin_bias = 0.8 ## effect size

## Perform the experiment once to test if this coin is biased
num_heads = 0 
for i in range(num_flips):
    if random.random() <= coin_bias:
        num_heads += 1


plt.hist(fair_num_heads, bins = 10, edgecolor = "k")
plt.axvline(num_heads, color = "red", lw = 2)
print(num_heads, "heads in", num_flips, "flips\n")

number_above_experiment_val = 0
for i in range(num_permutations):
    null_observation      = abs((num_flips/2) - fair_num_heads[i])  
    experiment_observation = abs((num_flips/2) - num_heads)
    if experiment_observation <= null_observation:
        number_above_experiment_val += 1

p_value = number_above_experiment_val / len(fair_num_heads)

if p_value <= alpha:
    print("P-value is", p_value, "≤", alpha, "\nReject the null hypothesis.\nThe coin is biased!")
else:
    print("P-value is", p_value, ">", alpha, "\nFail to reject the null hypothesis.\nNo evidence that the coin is biased.")

_Put your notes here!_

In [None]:
# ! Add comments next to each code chunk to describe the data analysis steps ! #

## Calculate the power of this experiment
num_null_rejects = 0
for num_experiments in range(1000):
    num_heads = 0
    for i in range(num_flips):
        if random.random() <= coin_bias:
            num_heads += 1

    number_above_experiment_val = 0
    for i in range(num_permutations):
        null_observation      = abs((num_flips/2) - fair_num_heads[i])  
        experiment_observation = abs((num_flips/2) - num_heads)
        if experiment_observation <= null_observation:
            number_above_experiment_val += 1

    p_value = number_above_experiment_val / len(fair_num_heads)
    
    if p_value <= alpha:
        num_null_rejects += 1 

estimated_power = (num_null_rejects / 1000)
print("The power is", estimated_power)

**Question 1:**  
Define the power you just obtained in terms of this specific experiment.

_Write your answer here._

In [None]:
# ! Add comments next to each code chunk to describe the data analysis steps ! #

# Power curve

biases = np.arange(0.0, 1.01, 0.01)
effectsize_estimatedpower = []

for coin_bias in biases: 
    
    num_null_rejects = 0
    for num_experiments in range(1000):
        
        num_heads = 0
        for i in range(num_flips):
            if random.random() <= coin_bias:
                num_heads += 1

        number_above_experiment_val = 0
        for i in range(num_permutations):
            
            null_observation      = abs((num_flips/2) - fair_num_heads[i])  
            experiment_observation = abs((num_flips/2) - num_heads)
            
            if experiment_observation <= null_observation:
                number_above_experiment_val += 1

        p_value = number_above_experiment_val / len(fair_num_heads)

        if p_value <= alpha:
            num_null_rejects += 1 
  
    estimated_power = (num_null_rejects / 1000)
  
    effectsize_estimatedpower.append([coin_bias, estimated_power])

_Put your notes here!_

In [None]:
## No need to add comments to this chunk that's just making the plot
X = np.array(effectsize_estimatedpower)
plt.figure(figsize = (12,7))
plt.plot(X[:,0], X[:,1], "ko-")
plt.xlabel("Effect Size")
plt.ylabel("Estimated Power")

**Question 2:**  
What does this power curve tell you?

_Write your answer here._

## Part 2 - Generating multiple power curves for detecting unfair coins

Here you will generate multiple power curves to establish the relationship between power, effect size, and sample size. Much of the code above is reused to generate curves like the one above, but for several sample sizes.
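
Because the same simulate-then-test loop is repeated for every sample size, one way to organize the reuse is to wrap it in a function. The sketch below is an assumption about how that might look, not the code used in this notebook: `simulate_num_heads` and `estimate_power` are hypothetical names, the defaults are scaled down so it runs quickly, and the p-value counts null outcomes at least as extreme as the observed one.

```python
import random

def simulate_num_heads(num_flips, coin_bias):
    """Flip a coin with P(heads) = coin_bias num_flips times; return the head count."""
    return sum(random.random() < coin_bias for _ in range(num_flips))

def estimate_power(num_flips, coin_bias, alpha=0.05,
                   num_permutations=2000, num_experiments=200):
    """Estimate the fraction of experiments that reject the fair-coin null."""
    center = num_flips / 2
    # Null distribution: distance from the expected head count for a fair coin.
    null_distances = [abs(center - simulate_num_heads(num_flips, 0.5))
                      for _ in range(num_permutations)]
    # Cache p-values by head count, since only num_flips + 1 outcomes are possible.
    p_value_for = {}
    num_null_rejects = 0
    for _ in range(num_experiments):
        num_heads = simulate_num_heads(num_flips, coin_bias)
        if num_heads not in p_value_for:
            observed = abs(center - num_heads)
            num_as_extreme = sum(d >= observed for d in null_distances)
            p_value_for[num_heads] = num_as_extreme / num_permutations
        if p_value_for[num_heads] <= alpha:
            num_null_rejects += 1
    return num_null_rejects / num_experiments

random.seed(0)  # fixed seed only so the illustration is repeatable
for num_flips in [5, 10, 50, 100]:
    print(num_flips, "flips -> estimated power", estimate_power(num_flips, 0.8))
```

Caching the p-value per head count avoids rescanning the null distribution for every simulated experiment, which is where most of the run time goes in the nested-loop version.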

In [None]:
# ! Add comments next to each code chunk to describe the data analysis steps ! #

alpha = 0.05 ## significance threshold

num_permutations = 10000
flips = [5, 10, 50, 100]

samplesize_fairnumheads = []

for num_flips in flips:
    fair_num_heads = []
    for x in range(num_permutations):
        num_heads = 0
        for i in range(num_flips):
            if random.random() <= 0.5:
                num_heads += 1
        fair_num_heads.append(num_heads)
    
    samplesize_fairnumheads.append([num_flips, fair_num_heads])

_Put your notes here!_

In [None]:
## No need to add comments to this chunk that's just making the plot
fig, axarr = plt.subplots(nrows = 1, ncols = len(flips), figsize = (16,5))
for i in range(len(axarr)):
    axarr[i].set_title(samplesize_fairnumheads[i][0], fontsize = 16)
    axarr[i].hist(samplesize_fairnumheads[i][1], bins = 10, edgecolor = "k")

**Question 3:**  
What are your observations on how the null distribution changes with sample size?

_Write your answer here._

In [None]:
# ! Add comments next to each code chunk to describe the data analysis steps ! #
# NOTE: This part might take quite a while to run. Please be patient.

biases = np.arange(0.0, 1.01, 0.01)
samplesize_effectsize_estimated_power = []
for num_flips in flips:
    print(num_flips, "flips")
    ## Get null distribution for sample size
    fair_num_heads = [samplesize_fairnumheads[i][1] for i in range(len(flips)) if samplesize_fairnumheads[i][0] == num_flips][0]
    
    for coin_bias in biases: 
        num_null_rejects = 0
        for num_experiments in range(1000):
            num_heads = 0
            for i in range(num_flips):
                if random.random() <= coin_bias:
                    num_heads += 1

            number_above_experiment_val = 0
            for i in range(num_permutations):
                null_observation      = abs((num_flips/2) - fair_num_heads[i])  
                experiment_observation = abs((num_flips/2) - num_heads)
                if experiment_observation <= null_observation:
                    number_above_experiment_val += 1

            p_value = number_above_experiment_val / len(fair_num_heads)

            if p_value <= alpha:
                num_null_rejects += 1 

        estimated_power = (num_null_rejects / 1000)
        samplesize_effectsize_estimated_power.append([num_flips, coin_bias, estimated_power])

_Put your notes here!_

In [None]:
## No need to add comments to this chunk that's just making the plot
df = pd.DataFrame.from_records(samplesize_effectsize_estimated_power, columns = ["sample size", "coin bias", "estimated power"])
fig, axarr = plt.subplots(nrows = 1, ncols = len(flips), figsize = (20,5))
for i in range(len(axarr)):
    subset_df = df[df["sample size"] == flips[i]]
    axarr[i].plot(subset_df["coin bias"], subset_df["estimated power"], "ko-")
    axarr[i].set_xlabel("Effect Size")
    axarr[0].set_ylabel("Estimated Power")

In [None]:
## No need to add comments to this chunk that's just making the plot
df = pd.DataFrame.from_records(samplesize_effectsize_estimated_power, columns = ["sample size", "coin bias", "estimated power"])

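## "seaborn-white" was renamed to "seaborn-v0_8-white" in Matplotlib 3.6
plt.style.use("seaborn-v0_8-white" if "seaborn-v0_8-white" in plt.style.available else "seaborn-white")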

shapes=['o','^','s','p']
palette = plt.get_cmap('Set1')

plt.figure(figsize = (12,7))

for i in range(len(flips)):
    subset_df = df[df["sample size"] == flips[i]]
    plt.plot(subset_df["coin bias"], subset_df["estimated power"], marker = shapes[i], color=palette(i+1), linewidth=1, label=flips[i])

plt.legend(loc="lower right")
plt.xlabel("Effect Size")
plt.ylabel("Estimated Power")

**Question 4:**  
What are your interpretations of these curves? Write your thoughts in terms of the dependence of power on both effect size and sample size.

_Write your answer here._

**Question 5:**  
Say a national sports organization comes to you and says that they want to design an experiment to detect biased coins so that they can eliminate them and use only unbiased coins for pre-game tosses. How would you use the power analysis/curves above to help them design the experiment? In your description, include the questions you would want answered before you could offer them reasonable sample size recommendations.

_Write your answer here._

**Question 6:**  
If you make a specific sample size recommendation, write down what you will convey to the organization in terms of error rates (i.e., false positive rate and false negative rate).

_Write your answer here._

**Question 7:**  
Which parts of your reasoning and recommendations change if they say that they cannot tolerate more than 1 biased coin for every 100 coins they end up using?
(Hint: This means changing alpha from 0.05 to 0.01. Makes sense?)

_Write your answer here._