# Calculating the required sample size

## Set up the notebook

In [1]:
# import the packages
from scipy.stats import norm
from math import ceil

In [2]:
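# declare functions
def sample_size_one_tail(diff, power, alpha, sd_alt):
    """Minimum sample size for a one-sided z-test to detect a mean
    difference `diff` with the given `power` at significance level `alpha`,
    where `sd_alt` is the known population standard deviation."""
    var_alt = sd_alt**2
    diff_sq = diff**2
    q = norm().ppf(1-alpha)       # one-sided critical value
    q_alt = norm().ppf(1-power)   # quantile fixed by the target power
    return ceil((var_alt/diff_sq) * (q - q_alt)**2)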

In [3]:
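def sample_size_two_tail(diff, power, alpha, sd_alt):
    """Minimum sample size for a two-sided z-test to detect a mean
    difference `diff` with the given `power` at significance level `alpha`,
    where `sd_alt` is the known population standard deviation."""
    var_alt = sd_alt**2
    diff_sq = diff**2
    q = norm().ppf(1-alpha/2)     # two-sided critical value
    q_alt = norm().ppf(1-power)   # quantile fixed by the target power
    return ceil((var_alt/diff_sq) * (q - q_alt)**2)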

## Introduction
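
The two functions declared above appear to implement the usual sample size formula for a z-test with known standard deviation. What follows is a sketch of where that formula comes from. The notation is introduced here and is not taken from the referenced exercises: $\sigma$ stands for `sd_alt`, $d$ for `diff`, $\gamma$ for `power`, $\Phi$ for the standard normal distribution function and $q_p$ for its $p$-quantile.

For a one-sided test at significance level $\alpha$, the power when the true mean differs from the hypothesised mean by $d$ is $1 - \Phi\left(q_{1-\alpha} - d\sqrt{n}/\sigma\right)$. Requiring the power to be at least $\gamma$ gives

$$
1 - \Phi\left(q_{1-\alpha} - \frac{d\sqrt{n}}{\sigma}\right) \ge \gamma
\iff q_{1-\alpha} - \frac{d\sqrt{n}}{\sigma} \le q_{1-\gamma}
\iff n \ge \frac{\sigma^2}{d^2}\left(q_{1-\alpha} - q_{1-\gamma}\right)^2 .
$$

The required sample size is the smallest integer satisfying the last inequality, which is why the functions round up with `ceil`. For a two-sided test, $q_{1-\alpha}$ is replaced by $q_{1-\alpha/2}$. This is an approximation that neglects the (usually tiny) probability of rejecting in the opposite tail.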

## Scenario one

*reference: PEx 9.48*

A breakfast cereal manufacturer is concerned about the effectiveness of the packaging process for one of its cereals.
Boxes should nominally contain 750 grams of the cereal and it is known, through extensive investigation in the past, that the weight of cereal in a box is approximately normally distributed with standard deviation 3 grams. 

The manufacturer decides to investigate the packaging process by weighing the contents of a random sample of filled cereal boxes.
A two-sided hypothesis test at the 5% significance level will be performed to detect a mean difference of 0.75 grams in weight of cereal.

The manufacturer has some flexibility over the sample size that can be used for the test.
What is the minimum sample size required in order for the power to be at least 0.9?

In [4]:
# declare parameters
d = 0.75
test_power = 0.9
a = 0.05
sd = 3

In [5]:
# calculate required sample size
sample_size_two_tail(
    diff=d, power=test_power, alpha=a, sd_alt=sd)

169
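
As a sanity check, the power achieved by a given sample size can be computed directly. The helper `two_sided_power` below is a hypothetical function written for this check, not part of the original notebook. It assumes a two-sided z-test with known standard deviation and includes both rejection tails.

```python
from math import sqrt
from scipy.stats import norm

def two_sided_power(n, diff, alpha, sd):
    """Power of a two-sided z-test (known sd) when the true mean is shifted by diff."""
    q = norm.ppf(1 - alpha / 2)    # two-sided critical value
    shift = diff * sqrt(n) / sd    # standardised shift of the sample mean
    # probability of rejecting in the upper tail plus the lower tail
    return norm.sf(q - shift) + norm.cdf(-q - shift)

# compare the calculated sample size with the next smaller one
for n in (168, 169):
    print(n, round(two_sided_power(n, diff=0.75, alpha=0.05, sd=3), 4))
```

The power should fall just below 0.9 for a sample of 168 and just above 0.9 for a sample of 169, which agrees with the result above.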

## Scenario two

*reference: PEx 9.50*

An engineer is planning an experiment to test the null hypothesis that walls insulated with a traditional material have the same level of dampness as those insulated with a new type of material.
Past experience with the traditional material has shown that the level of dampness of walls insulated with the traditional material is normally distributed with mean 14.6 units and standard deviation 0.7 units.
The engineer is prepared to assume that the dampness levels of walls treated with the new material will also have a normal distribution with standard deviation 0.7 units. 

The engineer wishes to measure the dampness of a random sample of walls treated with the new material in order to carry out a two-sided test with a 5% significance level: she wishes to detect a difference in the mean dampness level of 0.4 units.

What is the minimum sample size required in order for the power to be at least 0.9?

In [6]:
# declare parameters
d = 0.4
test_power = 0.9
a = 0.05
sd = 0.7

In [7]:
# calculate required sample size
sample_size_two_tail(
    diff=d, power=test_power, alpha=a, sd_alt=sd)

33
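
To see how sensitive this result is to the target power, the calculation can be repeated over a small grid of values. This is only an illustrative sketch: the grid of powers is hypothetical (only 0.9 comes from the exercise), and the function is repeated so that the cell runs on its own.

```python
from math import ceil
from scipy.stats import norm

def sample_size_two_tail(diff, power, alpha, sd_alt):
    # same formula as the notebook's function, repeated so this cell is self-contained
    q = norm.ppf(1 - alpha / 2)
    q_alt = norm.ppf(1 - power)
    return ceil((sd_alt**2 / diff**2) * (q - q_alt)**2)

# hypothetical grid of target powers; only 0.9 is taken from the exercise
sizes = {
    p: sample_size_two_tail(diff=0.4, power=p, alpha=0.05, sd_alt=0.7)
    for p in (0.8, 0.85, 0.9, 0.95)
}
print(sizes)
```

The required sample size grows with the target power, so relaxing or tightening the power requirement changes how many walls the engineer would need to measure.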

## Scenario three

*reference: PEx 9.49*

A massive outbreak of food-borne illness in a city was attributed to Salmonella enteritidis.
Epidemiologists determined that the source of the illness was ice-cream.
They intend to take samples from production runs from the company that had produced the ice-cream to determine the average level of Salmonella enteritidis.
It is thought likely that the standard deviation of the average level of Salmonella enteritidis in the sampled ice-cream will be 0.28 MPN/g (in accordance with similar cases investigated in the past).
The epidemiologists will perform a one-sided test at the 10% significance level to determine if the average level of Salmonella enteritidis in the company’s produced ice-cream exceeds 0.3 MPN/g (a level considered very dangerous).
Suppose the true value of $\mu$, the mean level of Salmonella enteritidis, was 0.36 MPN/g.

What is the minimum number of samples of ice-cream required in order for the power to be at least 0.8?

In [8]:
# declare parameters
d = 0.06
test_power = 0.8
a = 0.1
sd = 0.28

In [9]:
# calculate required sample size
sample_size_one_tail(
    diff=d, power=test_power, alpha=a, sd_alt=sd)

99
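
The closed-form result can be cross-checked numerically by increasing the sample size until the one-sided power first reaches the target. The helpers `one_sided_power` and `smallest_n` are hypothetical names introduced for this check. They assume a one-sided z-test with known standard deviation.

```python
from math import sqrt
from scipy.stats import norm

def one_sided_power(n, diff, alpha, sd):
    """Power of a one-sided z-test (known sd) when the true mean exceeds the null by diff."""
    q = norm.ppf(1 - alpha)    # one-sided critical value
    return norm.sf(q - diff * sqrt(n) / sd)

def smallest_n(diff, power, alpha, sd):
    """Smallest n whose power reaches the target, found by a simple linear search."""
    n = 1
    while one_sided_power(n, diff, alpha, sd) < power:
        n += 1
    return n

print(smallest_n(diff=0.06, power=0.8, alpha=0.1, sd=0.28))
```

The search stops at the same sample size as the formula, 99.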