In [10]:
import pandas as pd
import numpy as np
from scipy.stats import norm

data = pd.read_csv('data/data.csv', sep=',')

# <font color=green>Sample size calculation</font>
***

## <font color = 'red'> Problem </font>

We are studying the monthly income of heads of households in Brazil who earn up to R$\$$ 5,000.00. Our supervisor determined that the **maximum error relative to the mean is R$\$$ 10.00**. We know that the **population standard deviation** of this group of workers is **R$\$$ 1,082.79**. For a **95% confidence level**, what should the sample size of our study be?

## <font color = green> Quantitative variables and infinite population </font>
***

# $$e = z \frac{\sigma}{\sqrt{n}}$$

#### With known standard deviation

## $$n = \left (z\frac{\sigma}{e}\right) ^ 2 $$

#### With unknown standard deviation

## $$n = \left (z\frac {s} {e} \right) ^ 2 $$

Where:

$z$ = standardized normal variable (the z-score for the chosen confidence level)

$\sigma$ = population standard deviation

$s$ = sample standard deviation

$e$ = maximum estimation error (margin of error)
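Solving $e = z\,\sigma/\sqrt{n}$ for $n$ gives $\sqrt{n} = z\,\sigma/e$, which is where the squared expression comes from. As a minimal sketch, the formula can be wrapped in a small function. The name `sample_size_infinite` is a hypothetical helper, not something defined in this notebook, and the inputs are the values from the problem statement above.

```python
from scipy.stats import norm

def sample_size_infinite(confidence, sigma, e):
    """Unrounded n for estimating a mean, treating the population as infinite.

    Hypothetical helper, not part of the original notebook.
    """
    # Two-sided z-score: leave (1 - confidence) / 2 in each tail.
    z = norm.ppf(0.5 + confidence / 2)
    return (z * sigma / e) ** 2

# Values from the problem statement: sigma = 1082.79, e = 10, 95% confidence.
n_raw = sample_size_infinite(0.95, 1082.79, 10)
print(n_raw)
```

Because $e$ appears squared in the denominator, halving the tolerated error multiplies the required sample size by four.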

### <font color = 'red'> Observations </font>

1. The standard deviation ($\sigma$ or $s$) and the error ($e$) must be in the same unit of measure.

2. When the error ($e$) is given as a percentage, it must be interpreted as a percentage of the mean.
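The second observation can be illustrated with a short sketch. The mean of R$ 2,000.00 and the 5% error below are assumed values chosen only for illustration; they do not come from the notebook's data.

```python
from scipy.stats import norm

# Illustrative values only; the mean and percentage are assumptions.
mean_income = 2000.00   # assumed mean, in R$
pct_error = 0.05        # error stated as 5% of the mean
sigma = 3323.39         # standard deviation, in R$ (same unit as the error)

# Convert the relative error into an absolute error in R$.
e_abs = pct_error * mean_income

# With both quantities in R$, the infinite-population formula applies directly.
z = norm.ppf(0.5 + 0.95 / 2)
n_raw = (z * sigma / e_abs) ** 2
print(e_abs, n_raw)
```

The conversion step is what keeps $e$ and $\sigma$ in the same unit, as the first observation requires.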

## <font color = 'blue'> Example: Average Income </font>

We are studying the monthly income of heads of households in Brazil. Our supervisor determined that the **maximum error relative to the mean is R$\$$ 100.00**. We know that the **population standard deviation** of this group of workers is **R$\$$ 3,323.39**. For a **95% confidence level**, what should the sample size of our study be?

In [11]:
0.95 / 2

0.475

In [12]:
0.5 + (0.95 / 2)

0.975

In [13]:
z = norm.ppf(0.975)
z

1.959963984540054
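The value 0.975 is used because a two-sided 95% interval leaves 2.5% in each tail, so $z$ is the point with 97.5% of the standard normal distribution below it. A quick sanity check, sketched here with `scipy.stats.norm` only, is to confirm that the area between $-z$ and $z$ is the confidence level:

```python
from scipy.stats import norm

confidence = 0.95
z = norm.ppf(0.5 + confidence / 2)

# Area under the standard normal curve between -z and z.
central_area = norm.cdf(z) - norm.cdf(-z)
print(central_area)

# norm.interval returns the same symmetric bounds directly.
lower, upper = norm.interval(confidence)
print(lower, upper)
```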

### Getting $\sigma$

In [15]:
sigma = 3323.39
sigma

3323.39

### Getting $e$

In [17]:
e = 100
e

100

### Getting $n$

In [19]:
n = (z * (sigma / e)) ** 2
int(n.round())

4243
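As a check on this result, the sample size can be plugged back into $e = z\,\sigma/\sqrt{n}$. The sketch below assumes the same inputs as the example; `margin_of_error` is a hypothetical helper name.

```python
import math
from scipy.stats import norm

z = norm.ppf(0.975)
sigma = 3323.39

def margin_of_error(n):
    # e = z * sigma / sqrt(n), the formula at the top of this section
    return z * sigma / math.sqrt(n)

print(margin_of_error(4243))  # at the chosen sample size
print(margin_of_error(4242))  # one observation fewer
```

With 4,243 observations the margin stays within the R$ 100.00 target, while 4,242 would slightly exceed it.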

## <font color='red'>Problem</font>

In a batch of **10,000 cans** of soda, a simple random sample of **100 cans** was drawn, giving a **sample standard deviation of the can contents of 12 ml**. The manufacturer stipulates a **maximum error on the population mean of only 5 ml**. To guarantee a **95% confidence level**, what sample size should be selected for this study?

## <font color = green> Quantitative variables and finite population </font>
***

#### With known standard deviation

## $$n = \frac {z ^ 2 \sigma ^ 2 N} {z ^ 2 \sigma ^ 2 + e ^ 2 (N-1)} $$

#### With unknown standard deviation

## $$n = \frac {z ^ 2 s ^ 2 N} {z ^ 2 s ^ 2 + e ^ 2 (N-1)} $$

Where:

$N$ = population size

$z$ = standardized normal variable (the z-score for the chosen confidence level)

$\sigma$ = population standard deviation

$s$ = sample standard deviation

$e$ = maximum estimation error (margin of error)
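To see how the population size $N$ affects the result, the sketch below compares the finite-population formula with the infinite-population one, $\left(z\,s/e\right)^2$. The function name `sample_size_finite` is a hypothetical helper, and the inputs mirror the soda-can problem stated above.

```python
from scipy.stats import norm

def sample_size_finite(z, s, e, N):
    """Unrounded n for a mean with a finite population of size N.

    Hypothetical helper, not part of the original notebook.
    """
    num = (z ** 2) * (s ** 2) * N
    den = (z ** 2) * (s ** 2) + (e ** 2) * (N - 1)
    return num / den

z = norm.ppf(0.975)
s, e = 12, 5

n_infinite = (z * s / e) ** 2                      # no finite-population term
n_small_pop = sample_size_finite(z, s, e, 10_000)  # batch of 10,000 cans
n_huge_pop = sample_size_finite(z, s, e, 10 ** 9)  # effectively infinite

print(n_infinite, n_small_pop, n_huge_pop)
```

The finite-population value is never larger than the infinite-population one, and the two converge as $N$ grows.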

## <font color = 'blue'> Example: Soft drink industry </font>

In a batch of **10,000 cans** of soda, a simple random sample of **100 cans** was drawn, giving a **sample standard deviation of the can contents of 12 ml**. The manufacturer stipulates a **maximum error on the population mean of only 5 ml**. To guarantee a **95% confidence level**, what sample size should be selected for this study?

### Getting $N$

In [41]:
N = 10000
N

10000

### Getting $z$

In [39]:
z = norm.ppf(0.5 + (0.95/2))
z

1.959963984540054

### Getting $s$

In [43]:
s = 12

### Getting $e$

In [44]:
e = 5

### Getting $n$

## $$n = \frac{z^2 s^2 N}{z^2 s^2 + e^2(N-1)}$$

In [48]:
n = ((z ** 2) * (s ** 2) * N) / (((z ** 2) * (s ** 2)) + ((e ** 2) * (N -1)))
int(n.round())

22
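The unrounded value here is slightly above 22, so rounding to the nearest integer rounds down. A quick check, not part of the original notebook, is to compute the margin of error at $n = 22$ and $n = 23$ using the relation the finite-population formula is derived from, $e = z\,\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$. The helper name `margin_finite` is hypothetical.

```python
import math
from scipy.stats import norm

z = norm.ppf(0.975)
s, e, N = 12, 5, 10_000

def margin_finite(n):
    # Margin of error with the finite-population correction factor.
    return z * s / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

n_raw = (z ** 2 * s ** 2 * N) / (z ** 2 * s ** 2 + e ** 2 * (N - 1))
n_up = math.ceil(n_raw)  # conservative choice: always round up

print(n_raw, n_up)
print(margin_finite(22), margin_finite(23))
```

Under these assumptions, 22 cans give a margin marginally above 5 ml and 23 cans bring it below 5 ml. Rounding up with `math.ceil` is therefore the safer choice when the error limit must be strictly respected.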