References

Test sizing - https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_power/bs704_power_print.html

Hypothesis testing - https://www.statsmodels.org/devel/generated/statsmodels.stats.proportion.proportions_ztest.html

Import packages and set random seed

In [1]:
import math
from random import randrange, seed

seed(5)

Function to simulate tossing a loaded coin, which lands on heads more often than a fair coin would

In [2]:
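def loaded_coin_toss_simulator():
    """Return 1 (heads) or 0 (tails) for one toss of the loaded coin."""
    # randrange(2, 10) draws an integer from 2 to 9, so toss is one of
    # 0.2, 0.3, ..., 0.9. Five of those eight values are >= 0.5,
    # which makes the true probability of heads 5/8 = 0.625.
    toss = randrange(2, 10) / 10
    return 1 if toss >= 0.5 else 0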

As a first step, we run a few trials to estimate the probability of heads when tossing the loaded coin.

For a fair coin, the probability would be 0.5.

If the coin lands on heads more often than tails, the estimated probability will tend to be higher than 0.5.
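
For the simulator defined earlier, the true probability of heads can be worked out by listing every value that `randrange(2, 10)/10` can take. This is a minimal sketch; the names `possible_values`, `heads_values` and `true_p_heads` are introduced here for illustration and are not part of the original notebook.

```python
# randrange(2, 10) returns each integer from 2 to 9 with equal probability
possible_values = list(range(2, 10))

# The simulator reports heads when value / 10 >= 0.5
heads_values = [v for v in possible_values if v / 10 >= 0.5]

true_p_heads = len(heads_values) / len(possible_values)
print(true_p_heads)  # 5 of the 8 values count as heads: 0.625
```

Any estimate from a finite number of tosses will scatter around this value, which is why the trial below is only a starting point.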

In [3]:
tosses = []

for i in range(75):
    tosses.append(loaded_coin_toss_simulator())

print("Results of trial")
print(tosses)

Results of trial
[1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1]


In the above trial, we toss a coin 75 times to see how many heads we get. 

In [4]:
import numpy as np

average = np.mean(tosses)

print("Estimated probability of heads from this trial : ")
print(average)

p_test = average

Estimated probability of heads from this trial : 
0.5866666666666667


This is above 0.5.

My hypothesis is that the coin is not fair: it is loaded.

Could a fair coin produce this result by chance?

Are 75 coin tosses sufficient to conclude that the coin is loaded?

We need to test.

Alternative hypothesis:
p(heads) > 0.5

Null hypothesis:
p(heads) = 0.5

Our test is a one-sided, one-sample hypothesis test on a binary outcome.
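
To make the test concrete: the one-sided p-value is the probability of seeing at least as many heads as we observed if the null hypothesis were true. Below is a minimal sketch using only the standard library, assuming the 44 heads in 75 tosses implied by the trial above (0.5867 × 75). The helper name `binomial_upper_tail` is hypothetical and is not a statsmodels function.

```python
import math

def binomial_upper_tail(heads, n, p0=0.5):
    """P(X >= heads) for X ~ Binomial(n, p0): the exact one-sided p-value."""
    return sum(
        math.comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(heads, n + 1)
    )

# 44 heads in 75 tosses, as in the trial above
p_value = binomial_upper_tail(44, 75)
print("one-sided p-value : %.4f" % p_value)
print("reject null at 0.05 :", p_value < 0.05)
```

If the p-value is not below 0.05, the 75-toss trial on its own does not let us reject the null hypothesis, which motivates the sample size calculation that follows.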

In [5]:
# Test sizing - https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_power/bs704_power_print.html

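def sample_size(p1, p0, z_alpha=1.645, z_beta=0.84):
    """Tosses needed for a one-sided, one-sample test of p1 against p0."""
    # Defaults: z_alpha=1.645 (one-sided alpha = 0.05), z_beta=0.84 (80% power)
    # Effect size: (p1 - p0) in units of the standard deviation under the null
    es = (p1 - p0) / math.sqrt(p0 * (1 - p0))
    return ((z_alpha + z_beta) / es) ** 2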

In [6]:
print("Number of coin tosses needed for 80% power at a 5% significance level (one-sided)")
print(round(sample_size(p_test, 0.5)))

Number of coin tosses needed for 80% power at a 5% significance level (one-sided)
206
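
The default values `z_alpha=1.645` and `z_beta=0.84` are standard normal quantiles for a one-sided 5% significance level and 80% power. The sketch below shows where they come from, using `statistics.NormalDist` from the standard library. The function name `sample_size_from_levels` and its `alpha` and `power` parameters are introduced here for illustration only.

```python
import math
from statistics import NormalDist

def sample_size_from_levels(p1, p0, alpha=0.05, power=0.80):
    """Same formula as sample_size, with z-scores derived from alpha and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)       # about 0.84 for power = 0.80
    es = (p1 - p0) / math.sqrt(p0 * (1 - p0))
    return ((z_alpha + z_beta) / es) ** 2

# 44 heads out of 75 tosses gives the estimate 0.5867 used above
n = sample_size_from_levels(44 / 75, 0.5)
print(math.ceil(n))
```

Rounding up with `math.ceil` is a common convention for sample sizes, since a fractional toss is not possible. With the unrounded z-scores the result should still come out at 206 tosses.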


In [7]:
from statsmodels.stats.proportion import binom_test, proportions_ztest

Try different sample sizes to see how large a sample is needed to get a p-value below 0.05

In [9]:
for n_tosses in range(10, 310, 10):
    coin_toss_for_test = [loaded_coin_toss_simulator() for _ in range(n_tosses)]
    heads = sum(coin_toss_for_test)
    # One-sided exact binomial test of H0: p = 0.5 against H1: p > 0.5
    p_value = binom_test(heads, n_tosses, 0.5, "larger")
    avg = round(heads / n_tosses, 2)
    # n_tosses is used instead of sample_size so the function defined earlier is not overwritten
    print("Sample Size : %3d | avg heads : %5.2f | p-value : %5.5f" % (n_tosses, avg, p_value))

Sample Size :  10 | avg heads :  0.50 | p-value : 0.62305
Sample Size :  20 | avg heads :  0.80 | p-value : 0.00591
Sample Size :  30 | avg heads :  0.57 | p-value : 0.29233
Sample Size :  40 | avg heads :  0.65 | p-value : 0.04035
Sample Size :  50 | avg heads :  0.62 | p-value : 0.05946
Sample Size :  60 | avg heads :  0.67 | p-value : 0.00674
Sample Size :  70 | avg heads :  0.51 | p-value : 0.45249
Sample Size :  80 | avg heads :  0.64 | p-value : 0.00916
Sample Size :  90 | avg heads :  0.69 | p-value : 0.00022
Sample Size : 100 | avg heads :  0.57 | p-value : 0.09667
Sample Size : 110 | avg heads :  0.59 | p-value : 0.03478
Sample Size : 120 | avg heads :  0.64 | p-value : 0.00122
Sample Size : 130 | avg heads :  0.66 | p-value : 0.00014
Sample Size : 140 | avg heads :  0.62 | p-value : 0.00255
Sample Size : 150 | avg heads :  0.67 | p-value : 0.00003
Sample Size : 160 | avg heads :  0.59 | p-value : 0.01079
Sample Size : 170 | avg heads :  0.65 | p-value : 0.00008
Sample Size : 

What we notice above is that the test needs a minimum sample size to produce statistically significant results reliably.

There is a chance that we can get a statistically significant result with a sample smaller than the one we calculated.

However, that does not mean that statistical significance is guaranteed at that sample size.

If we run the test again, we may not get a statistically significant result at that sample size.

Once we pass the sample size threshold we calculated earlier, however, we should get statistically significant results much more consistently. At 80% power, that still means a significant result in roughly 8 out of 10 runs rather than in every run, if the true probability of heads is close to our estimate.

We can check this by running the above code snippet again.
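
Rather than rerunning the cell by hand, one way to check how consistent the results are is to repeat the experiment many times at a fixed sample size and count how often the p-value falls below 0.05. The sketch below is self-contained and uses an exact one-sided binomial tail from the standard library in place of statsmodels' `binom_test`. The names `toss`, `p_value_larger` and `rejection_rate`, the seed, and the 500 repetitions are all assumptions made for illustration.

```python
import math
from random import Random

def toss(rng):
    """One toss of the loaded coin, mirroring loaded_coin_toss_simulator."""
    return 1 if rng.randrange(2, 10) / 10 >= 0.5 else 0

def p_value_larger(heads, n, p0=0.5):
    """Exact one-sided p-value: P(X >= heads) under Binomial(n, p0)."""
    return sum(
        math.comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(heads, n + 1)
    )

def rejection_rate(n, repetitions=500, alpha=0.05, seed=5):
    """Fraction of repeated n-toss experiments with a p-value below alpha."""
    rng = Random(seed)  # local generator, so the global random state is untouched
    rejections = 0
    for _ in range(repetitions):
        heads = sum(toss(rng) for _ in range(n))
        if p_value_larger(heads, n) < alpha:
            rejections += 1
    return rejections / repetitions

for n in (20, 75, 206):
    print("Sample Size : %3d | share of runs with p < 0.05 : %.2f" % (n, rejection_rate(n)))
```

Because the simulator's true probability of heads is 0.625, the share of significant runs should rise with the sample size. The exact figures depend on the seed and the number of repetitions.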