# Large-sample confidence intervals for a difference between proportions

**date**
: 2021-04-16

**data**
: `skulls.csv`

**Module.function**
: `statsmodels.stats.proportion.confint_proportions_2indep`

**ref**
: Computer book B, Activity 20

**desc**
: Calculating large-sample confidence intervals for a difference between proportions.
There is some light data processing to gather the information needed to invoke the function.

## Import the modules

```python
import src.load  # local helper module that loads the data
from statsmodels.stats.proportion import confint_proportions_2indep
```

## Import the data

```python
sample = src.load.sewer()
```

```python
sample.head()
```

|   | Immunity | Age | Exposure | Children |
|---|---|---|---|---|
| 0 | 0 | 41.43 | 2 | 2 |
| 1 | 0 | 51.79 | 2 | 2 |
| 2 | 1 | 41.12 | 2 | 2 |
| 3 | 1 | 29.95 | 3 | 2 |
| 4 | 1 | 40.78 | 2 | 2 |


## Approximate confidence interval for the difference between two proportions

We want to estimate the difference between the proportion infected (recorded as `Immunity == 1`) among sewerage workers with children and the proportion infected among sewerage workers without children, together with a 95% confidence interval for this difference.
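For reference, the standard large-sample (Wald) interval for a difference $p_1 - p_2$ between two independent proportions is written out below. Here $x_i$ is the number of successes (infected workers) and $n_i$ the size of group $i$, and $z \approx 1.96$ for 95% confidence. We assume this is the interval M248 expects, since it is the form computed by `method="wald"`.

$$
\hat p_1 = \frac{x_1}{n_1}, \qquad \hat p_2 = \frac{x_2}{n_2}
$$

$$
(\hat p_1 - \hat p_2) \pm z \sqrt{\frac{\hat p_1 (1 - \hat p_1)}{n_1} + \frac{\hat p_2 (1 - \hat p_2)}{n_2}}
$$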

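For each group, count the infected workers (`Immunity == 1`) and the total number of workers. In this data set `Children == 2` marks workers with children and `Children == 1` marks workers without children.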

```python
# Workers with children (Children == 2).
# len() counts matching rows; DataFrame.count()[0] relies on positional
# Series indexing, which newer pandas versions deprecate.
children = len(sample.query('Immunity == 1 & Children == 2'))  # infected
size_children = len(sample.query('Children == 2'))  # group size
```

```python
# Workers without children (Children == 1)
no_children = len(sample.query('Immunity == 1 & Children == 1'))  # infected
size_no_children = len(sample.query('Children == 1'))  # group size
```
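The same four counts can also be read off a single cross-tabulation instead of four separate queries. Below is a minimal sketch using `pandas.crosstab`. The `toy` frame is hypothetical data with the same `Immunity` and `Children` coding, included only so the example runs on its own. It is not the sewer sample.

```python
import pandas as pd

# Hypothetical toy data using the same coding as the sewer sample
# (values invented for illustration; NOT the M248 data).
toy = pd.DataFrame({
    "Immunity": [0, 1, 1, 1, 1, 0, 0, 0],
    "Children": [2, 2, 2, 2, 1, 1, 1, 1],
})

# Rows: Children code (1 = none, 2 = has children); columns: Immunity code (0, 1)
table = pd.crosstab(toy["Children"], toy["Immunity"])
group_sizes = table.sum(axis=1)  # total workers per Children code

children = table.loc[2, 1]            # infected workers with children
size_children = group_sizes.loc[2]    # all workers with children
no_children = table.loc[1, 1]         # infected workers without children
size_no_children = group_sizes.loc[1] # all workers without children

print(table)
```

On the real data, replace `toy` with `sample`. This should give the same counts as the queries above, assuming neither column has missing values (`crosstab` drops rows with missing entries).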

Note that, for a difference (`compare='diff'`), the default value of the `method` parameter is `'newcomb'`, and this does not return the result expected in **M248**. Passing `method="wald"` gives the expected large-sample interval.

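```python
# 95% interval (alpha defaults to 0.05) for
# p(infected | children) - p(infected | no children)
confint_proportions_2indep(
    count1=children,
    nobs1=size_children,
    count2=no_children,
    nobs2=size_no_children,
    method="wald"
)
```

```
(0.1582197356059404, 0.38825404377923145)
```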
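The `method="wald"` result can be checked by hand. The following is a minimal sketch that uses only the standard library. `wald_diff_ci` is a hypothetical helper name, and the counts in the example are invented, not the sewer data. It assumes that statsmodels' Wald option uses the unpooled variance $\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2$, as in the usual large-sample formula.

```python
from math import sqrt
from statistics import NormalDist


def wald_diff_ci(count1, nobs1, count2, nobs2, alpha=0.05):
    """Large-sample (Wald) CI for p1 - p2 using unpooled sample proportions."""
    p1 = count1 / nobs1
    p2 = count2 / nobs2
    diff = p1 - p2
    # Estimated standard error of p1_hat - p2_hat
    se = sqrt(p1 * (1 - p1) / nobs1 + p2 * (1 - p2) / nobs2)
    # Two-sided standard normal critical value (about 1.96 when alpha = 0.05)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z * se, diff + z * se


# Hypothetical counts for illustration only (not the sewer data)
low, upp = wald_diff_ci(count1=60, nobs1=100, count2=40, nobs2=100)
print(low, upp)
```

Calling `wald_diff_ci(children, size_children, no_children, size_no_children)` with the counts from the earlier cells should reproduce the interval above, up to floating-point rounding. The interval is symmetric about $\hat p_1 - \hat p_2$ by construction.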

```python
help(confint_proportions_2indep)
```

```
Help on function confint_proportions_2indep in module statsmodels.stats.proportion:

confint_proportions_2indep(count1, nobs1, count2, nobs2, method=None, compare='diff', alpha=0.05, correction=True)
    Confidence intervals for comparing two independent proportions

    This assumes that we have two independent binomial samples.

    Parameters
    ----------
    count1, nobs1 : float
        Count and sample size for first sample.
    count2, nobs2 : float
        Count and sample size for the second sample.
    method : str
        Method for computing confidence interval. If method is None, then a
        default method is used. The default might change as more methods are
        added.

        diff:
         - 'wald',
         - 'agresti-caffo'
         - 'newcomb' (default)
         - 'score'

        ratio:
         - 'log'
         - 'log-adjusted' (default)
         - 'score'

        odds-ratio:
         - 'logit'
         - 'logit-adjusted' (default)
```
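The help text lists `'newcomb'` as the default for differences, so the default does not match the Wald interval. The sketch below shows why. It implements Newcombe's hybrid score interval, which combines a Wilson score interval for each proportion. This is how the method is usually defined, and it is intended to mirror statsmodels' `'newcomb'` option, but that correspondence is an assumption. `wilson_ci` and `newcombe_diff_ci` are hypothetical helper names, and the counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(count, nobs, z):
    """Wilson score interval for one proportion (no continuity correction)."""
    p = count / nobs
    denom = 1 + z**2 / nobs
    centre = (p + z**2 / (2 * nobs)) / denom
    half = z * sqrt(p * (1 - p) / nobs + z**2 / (4 * nobs**2)) / denom
    return centre - half, centre + half


def newcombe_diff_ci(count1, nobs1, count2, nobs2, alpha=0.05):
    """Newcombe's hybrid score CI for p1 - p2, built from two Wilson intervals."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = count1 / nobs1, count2 / nobs2
    l1, u1 = wilson_ci(count1, nobs1, z)
    l2, u2 = wilson_ci(count2, nobs2, z)
    diff = p1 - p2
    # Lower limit uses the lower side of p1 and the upper side of p2, and vice versa
    low = diff - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upp = diff + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return low, upp


# Hypothetical counts: 60 of 100 versus 40 of 100 (not the sewer data)
low, upp = newcombe_diff_ci(60, 100, 40, 100)
print(low, upp)
```

Unlike the Wald interval, the result is generally not symmetric about $\hat p_1 - \hat p_2$. For the same counts, the two methods therefore give different endpoints.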
  