# Large-sample approximate confidence intervals for proportions

**date**
: 2021-04-16

**data**
: `accidents.csv`

**Module.function**
: `statsmodels.stats.proportion.proportion_confint`

**ref**
: Computer book B, Activity 18

**desc**
: Calculate a large-sample confidence interval for a proportion.
The notebook begins with some processing to acquire data in the right format.

## Import the modules

In [1]:
import src.load
from statsmodels.stats.proportion import proportion_confint
import seaborn as sns
import matplotlib.pyplot as plt

## Import the data

In [2]:
sample = src.load.accidents()

In [3]:
sample.head()

| Unnamed: 0 | Accidents |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |


## Approximate confidence intervals for proportions

We require an estimate of the proportion of workers who experience one or more accidents.
Declare local variables for the number of successes, `x`, and the total number of observations, `n`.


In [4]:
# count the number of entries where Accidents != 0
x = (sample["Accidents"] != 0).sum()

In [5]:
# count the total number of observations
n = sample["Accidents"].size

Now use `proportion_confint()` to calculate a 99% $z$-interval for the proportion.

In [6]:
proportion_confint(
    count=x,
    nobs=n,
    alpha=0.01
)

(0.22787583995834104, 0.3421724692204029)
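
For reference, the default `method='normal'` corresponds to the usual large-sample $z$-interval, $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = x/n$. The sketch below recomputes it using only the standard library. The function name `normal_approx_interval` and the counts in the usage lines are hypothetical, chosen for illustration and not taken from `accidents.csv`.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_interval(count, nobs, alpha=0.05):
    """Large-sample (Wald) z-interval for a binomial proportion."""
    p_hat = count / nobs  # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    half_width = z * sqrt(p_hat * (1 - p_hat) / nobs)
    return p_hat - half_width, p_hat + half_width


# Hypothetical counts, for illustration only
lower, upper = normal_approx_interval(count=30, nobs=100, alpha=0.01)
print(f"99% z-interval: ({lower:.4f}, {upper:.4f})")
```

Called with `count=x`, `nobs=n` and `alpha=0.01`, this should agree with the `proportion_confint()` result above up to floating-point error.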

In [7]:
help(proportion_confint)

Help on function proportion_confint in module statsmodels.stats.proportion:

proportion_confint(count, nobs, alpha=0.05, method='normal')
    confidence interval for a binomial proportion
    
    Parameters
    ----------
    count : int or array_array_like
        number of successes, can be pandas Series or DataFrame
    nobs : int
        total number of trials
    alpha : float in (0, 1)
        significance level, default 0.05
    method : {'normal', 'agresti_coull', 'beta', 'wilson', 'binom_test'}
        default: 'normal'
        method to use for confidence interval,
        currently available methods :
    
         - `normal` : asymptotic normal approximation
         - `agresti_coull` : Agresti-Coull interval
         - `beta` : Clopper-Pearson interval based on Beta distribution
         - `wilson` : Wilson Score interval
         - `jeffreys` : Jeffreys Bayesian Interval
         - `binom_test` : experimental, inversion of binom_test
    
    Returns
    -------
    ci_l