# How to Build an Interactive Confidence Interval Calculator in Python

## Building a re-usable interactive and batch calculator for normal and binomial confidence intervals in Python and Jupyter Notebook

![avel-chuklanov-DUmFLtMeAbQ-unsplash.jpg](attachment:avel-chuklanov-DUmFLtMeAbQ-unsplash.jpg)

Photo by <a href="https://unsplash.com/@rechaoktaviani?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">recha oktaviani</a> on <a href="https://unsplash.com/s/photos/calculator?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>
  

### Background
It is very common to obtain a sample of values, for example a sample of 200 users clicking on a new web page or a sample of 30 voters in an exit poll on their way out of the voting station.

In order to make a statement about what these observations tell us, we need to use confidence intervals rather than just quoting the percentage or result of the trial.

A confidence interval is a range of values that we are fairly sure the true value lies in. For example, if we asked 30 voters who they voted for and 75% said "Candidate A", what is the range for the share of votes cast for Candidate A across the entire electorate?

If you would like to know more about confidence intervals, this online resource is a great place to start - https://www.mathsisfun.com/data/confidence-interval.html, but it is not essential to know any more than the background information to make confidence intervals work for you.

### The Idea
I had used confidence intervals several times at work to make a statement about the likelihood of something happening based on sample observations, and every time I either recreated the formulae from scratch in Excel or wrote the Python code again.

I decided to write an interactive confidence interval calculator in Jupyter that I could simply re-use the next time I needed to do the same thing, which led to the code and to this article.

### Let's Get Started!
To begin with we need to import some libraries which we are going to use to build our calculator ...

In [1]:
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import scipy.stats as stats
import numpy as np
import pandas as pd

pd.set_option("max_colwidth", 150)

### A Quick Refresher on the Formulae
The formulae for confidence intervals look scary but as a non-statistician I can say that they are not too horrendous. If you would like a more in-depth review of how they work please take a look at another of my articles that goes through the detail - https://grahamharrison-86487.medium.com/hypothesis-testing-in-python-made-easy-7a60e8c27a36.

To review the essentials though there are just two formulae we are going to implement in Jupyter and Python, one for normal distributions and one for binomial distributions. 

The formula for the normal distribution is used when the quantity being measured is normally distributed, for example the distribution of male shoe sizes in the UK or the distribution of end-of-year test scores for a group of students studying an online course.

The formula for binomial distributions will be used where there is a simple binary outcome e.g. 75% of voters surveyed voted for Candidate A and 25% did not.

Here is the formula for calculating confidence intervals for a normal distribution ...

$$CI = \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$

where ...
- $CI$ = the confidence interval
- $\bar{x}$ = the mean of the sample 
- $Z_{\alpha/2}$ = the z-score for the required confidence level, e.g. 1.96 for 95% confidence
- $\sigma$ = the standard deviation of the sample
- $n$ = the sample size
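
To see where the 1.96 comes from, the z-score can be read off the inverse cumulative distribution function of the standard normal distribution. The snippet below is a small sketch that uses `statistics.NormalDist` from the Python standard library (an assumption made here so that it runs without SciPy; the code in this article uses `scipy.stats`), and `z_score_for` is only an illustrative helper name.

```python
from statistics import NormalDist

def z_score_for(confidence):
    """Two-sided z-score: leaves alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Print the z-score for a few commonly used confidence levels
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_score_for(level):.3f}")
```

For 90%, 95% and 99% confidence this gives roughly 1.645, 1.960 and 2.576, so a higher confidence level always means a wider interval.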

... and here is the formula for binomial ...

$$CI = \hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where ...
- $CI$ = the confidence interval
- $\hat{p}$ = the proportion of successes in a Bernoulli trial process
- $Z_{\alpha/2}$ = the z-score for the required confidence level, e.g. 1.96 for 95% confidence
- $n$ = the sample size
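
As a worked example of the binomial formula, here is the exit poll from the background section (30 voters, 75% for Candidate A) calculated step by step. This is a sketch using only the standard library; the variable names are illustrative and not part of the calculator built later.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.75        # 75% of the sample said "Candidate A"
n = 30              # sample size
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z * sqrt(p_hat * (1 - p_hat) / n)           # n sits inside the square root

lower, upper = p_hat - margin, p_hat + margin
print(f"{lower:.1%} to {upper:.1%}")
```

This prints `59.5% to 90.5%`, which shows how wide the interval is for a sample of only 30 voters.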

So with those nasty formulae out of the way, let's get on with the comparatively straightforward task of putting them to work by creating a couple of Python functions to implement them ...

### Implementing the Formulae in Python

In [2]:
def normal_distribution_ci(confidence, x_bar, sigma, n):
    z_score = stats.norm.interval(confidence)[1]
    sigma_over_root_n = sigma / np.sqrt(n)
    ci = [x_bar - z_score * sigma_over_root_n, x_bar + z_score * sigma_over_root_n]
    return ci

def binomial_distribution_ci(confidence, p_hat, n):
    z_score = stats.norm.interval(confidence)[1]
    # n is divided inside the square root, matching sqrt(p_hat * (1 - p_hat) / n) in the formula
    rhs = z_score * np.sqrt(p_hat * (1 - p_hat) / n)
    ci = [p_hat - rhs, p_hat + rhs]
    return ci

... and let's give our functions a quick test ...

In [3]:
x_bar_test = 73.0592655207448
sigma_test = 4.500032137012057
n_test = 30

normal_distribution_ci(0.95, x_bar_test, sigma_test, n_test)

[71.4489792915286, 74.669551749961]
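
One way to cross-check this result, sketched here as an addition to the original notebook, is to let SciPy build the interval directly by passing `loc` and `scale` to `stats.norm.interval`. The manual calculation is repeated inline so that the snippet is self-contained.

```python
import numpy as np
import scipy.stats as stats

x_bar, sigma, n = 73.0592655207448, 4.500032137012057, 30

# Manual calculation, following the formula term by term
z = stats.norm.interval(0.95)[1]
manual = (x_bar - z * sigma / np.sqrt(n), x_bar + z * sigma / np.sqrt(n))

# The same interval in one call: a normal centred on x_bar, scaled by the standard error
direct = stats.norm.interval(0.95, loc=x_bar, scale=sigma / np.sqrt(n))

print(np.allclose(manual, direct))
```

Both routes should agree, which is a quick sanity check that the hand-written formula has been implemented as intended.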

I have tested the output using other programs and it is correct. One thing to note, though: if you are calculating the standard deviation ($\sigma$) from a set of test data, you need to be aware of the following -

The ``scipy.stats.sem()`` function uses a default value of ``ddof=1`` for its delta-degrees-of-freedom parameter while ``numpy.std()`` uses ``ddof=0`` by default. The two therefore divide by $n-1$ and $n$ respectively, and this can lead to a different result depending on whether you use ``scipy`` or ``numpy`` to perform the calculation.

For a full explanation see this question on Stack Overflow: https://stackoverflow.com/questions/54968267/scipy-stats-sem-calculate-of-standard-error
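
The difference is easy to see on a small array. The scores below are made up purely for illustration; the point of the sketch is only to show which default each library uses.

```python
import numpy as np
import scipy.stats as stats

scores = np.array([68.0, 72.0, 75.0, 71.0, 79.0])   # made-up values for illustration
n = len(scores)

population_sd = np.std(scores)           # ddof=0 (numpy default): divides by n
sample_sd = np.std(scores, ddof=1)       # ddof=1: divides by n - 1

# stats.sem defaults to ddof=1, so it matches the sample version only
sem_default = stats.sem(scores)
print(np.isclose(sem_default, sample_sd / np.sqrt(n)))       # True
print(np.isclose(sem_default, population_sd / np.sqrt(n)))   # False
```

If you estimate $\sigma$ with ``numpy.std()``, passing ``ddof=1`` explicitly keeps the result in line with ``scipy.stats.sem()``.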

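In [4]:
p_hat_test = 0.78
n_test = 32

np.round(binomial_distribution_ci(0.95, p_hat_test, n_test), 4)

array([0.6365, 0.9235])

The margin of error can also be checked using a simple Excel formula - ``=1.96*SQRT(B2*(1-B2)/Sheet2!$B$1)`` where 1.96 is the z-score for a 95% confidence interval, cell B2 holds the value for $\hat{p}$ and ``Sheet2!$B$1`` holds the value for the sample size. Note that the division by the sample size sits inside the square root, as in the formula above. For these inputs the margin is about 0.1435, which is subtracted from and added to $\hat{p}$ to give the interval.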

### Creating an Interactive Calculator
The interactive calculator is created using Jupyter interactive widgets and the full documentation can be found here -
- https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html
- https://ipywidgets.readthedocs.io/en/stable/examples/Using%20Interact.html

The first thing we need to do is to define our input controls ...

In [5]:
# FloatText has no min/max; use widgets.BoundedFloatText if the value needs to be range-limited
x_bar_input = widgets.FloatText(value=75.7, step=0.01, description='x bar:', disabled=False)
sigma_input = widgets.FloatText(value=7.3, step=1, description='sigma:', disabled=False)
# min=1 so that the sample size can never be zero (which would cause a division by zero)
n_normal_input = widgets.BoundedIntText(value=30, min=1, max=100000, step=1, description='n:', disabled=False)
n_binomial_input = widgets.BoundedIntText(value=32, min=1, max=100000, step=1, description='n:', disabled=False)

Now we need a couple of simple wrapper functions to show the controls and call the appropriate confidence interval function when the button is clicked ...

#### Normal Distribution Confidence Interval Calculator
This is the key part of the solution; in just a few lines of Python and Jupyter code the interactive calculators are created such that you can change the input parameters and click on “Run Interact” to re-run the calculation for the normal distribution as often as you like -

In [6]:
print("Normal Distribution Confidence Interval Calculator")
@interact_manual(confidence=(0.5, 0.99,0.01), x_bar=x_bar_input, sigma=sigma_input, n=n_normal_input)
def confidence_interval_normal(confidence=0.95, x_bar=1000, sigma=1000, n=1000):
    ci = normal_distribution_ci(confidence, x_bar, sigma, n)

    print(f"The population mean lies between {ci[0]:.2f} and {ci[1]:.2f} with {confidence:.0%} confidence")

Normal Distribution Confidence Interval Calculator


interactive(children=(FloatSlider(value=0.95, description='confidence', max=0.99, min=0.5, step=0.01), FloatTe…
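
Because the wrapper passes whatever values it is given straight into the formula, a small guard can catch inputs that the formula cannot handle, such as a sample size of zero. The function below is a hypothetical helper (`validate_ci_inputs` is not part of the original notebook) that could be called at the top of either wrapper.

```python
def validate_ci_inputs(confidence, n, sigma=None):
    """Raise ValueError for inputs the confidence interval formulae cannot handle."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1, otherwise we divide by zero")
    if sigma is not None and sigma < 0:
        raise ValueError("sigma cannot be negative")

validate_ci_inputs(0.95, 30, sigma=7.3)   # valid inputs pass silently

try:
    validate_ci_inputs(0.95, 0, sigma=7.3)
except ValueError as err:
    print(f"Rejected: {err}")
```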

#### Binomial Distribution Confidence Interval Calculator
And here is the equivalent code to create the interactive calculator for binomial confidence intervals …

In [7]:
print("Binomial Distribution Confidence Interval Calculator")
@interact_manual(confidence=(0.5, 0.99,0.01), p_hat=(0.0,1.0,0.01), n=n_binomial_input)
def confidence_interval_binomial(confidence=0.95, p_hat=0.78, n=32):
    ci = binomial_distribution_ci(confidence, p_hat, n)
    
    print(f"The population mean lies between {ci[0]:.1%} and {ci[1]:.1%} with {confidence:.0%} confidence")

Binomial Distribution Confidence Interval Calculator


interactive(children=(FloatSlider(value=0.95, description='confidence', max=0.99, min=0.5, step=0.01), FloatSl…
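
One caveat with the binomial formula: with a small sample and a proportion close to 0 or 1, the normal approximation can produce a bound below 0% or above 100%. A simple way to keep the output sensible, sketched here as an assumption rather than something the original calculator does, is to clip the interval to the range 0 to 1. `clipped_binomial_ci` is an illustrative name and the snippet uses only the standard library.

```python
from math import sqrt
from statistics import NormalDist

def clipped_binomial_ci(confidence, p_hat, n):
    """Normal-approximation interval for a proportion, clipped to the range [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# 97% observed in a sample of 32: the unclipped upper bound would be above 100%
lower, upper = clipped_binomial_ci(0.95, 0.97, 32)
print(f"{lower:.1%} to {upper:.1%}")
```

This prints `91.1% to 100.0%`. Clipping only tidies the presentation; it does not make the approximation any more accurate for extreme proportions.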

### Creating a Batch Calculator
With the interactive confidence interval calculator working for individual calculations I decided to extend it to work in batch mode for situations where I want to process several calculations at once.

First we need to create a few helper functions to update the values in the ``DataFrame`` ...

In [8]:
def normal_distribution_ci_pop_min(confidence, x_bar, sigma, n):
    return normal_distribution_ci(confidence, x_bar, sigma, n)[0]

def normal_distribution_ci_pop_max(confidence, x_bar, sigma, n):
    return normal_distribution_ci(confidence, x_bar, sigma, n)[1]

def normal_distribution_ci_text(confidence, x_bar, sigma, n):
    ci = normal_distribution_ci(confidence, x_bar, sigma, n)
    return f"The population mean lies between {ci[0]:.2f} and {ci[1]:.2f} with {confidence:.0%} confidence"

def binomial_distribution_ci_pop_min(confidence, p_hat, n):
    return binomial_distribution_ci(confidence, p_hat, n)[0]

def binomial_distribution_ci_pop_max(confidence, p_hat, n):
    return binomial_distribution_ci(confidence, p_hat, n)[1]

def binomial_distribution_ci_text(confidence, p_hat, n):
    ci = binomial_distribution_ci(confidence, p_hat, n)
    return f"The population mean lies between {ci[0]:.1%} and {ci[1]:.1%} with {confidence:.0%} confidence"
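
These helpers are designed to be called once per row. As an alternative sketch (not the approach used in the rest of this article), the same bounds for the normal case can be computed with vectorised column arithmetic, which avoids evaluating the formula several times for every row. The small `DataFrame` here is built inline as a stand-in for the batch file.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

df = pd.DataFrame({
    "confidence": [0.95, 0.95],
    "x_bar": [74.0, 76.7],
    "sigma": [5, 8],
    "n": [30, 30],
})

# ppf((1 + confidence) / 2) is the same two-sided z-score as norm.interval(confidence)[1]
z = stats.norm.ppf((1 + df["confidence"]) / 2)
margin = z * df["sigma"] / np.sqrt(df["n"])

df["pop_min"] = df["x_bar"] - margin
df["pop_max"] = df["x_bar"] + margin
print(df.round(2))
```

The row-by-row version is easier to read alongside the formulae, while the vectorised version scales better if the batch file grows large.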

#### Normal Distribution Confidence Interval Batch Calculator

In [9]:
df_normal_ci = pd.read_csv("data/normal_ci_batch.csv")
df_normal_ci

| Unnamed: 0 | Measure | confidence | x_bar | sigma | n |
|---|---|---|---|---|---|
| 0 | The mean score for a classroom test was 74 with a standard deviation of 5 for a group of 30 students | 0.95 | 74.0 | 5 | 30 |
| 1 | The mean score for an online test was 76.7 with a standard deviation of 8 for a group of 30 students | 0.95 | 76.7 | 8 | 30 |
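
The ``Unnamed: 0`` column typically appears when a ``.csv`` file has been saved with its row index and is then read back without telling pandas about it. Assuming that is what happened here, passing ``index_col=0`` to ``pd.read_csv`` avoids the extra column. The sketch below uses an in-memory, shortened stand-in for the batch file rather than the real ``data/normal_ci_batch.csv``.

```python
import io
import pandas as pd

# Shortened stand-in for the batch file; the first header cell is empty because
# it holds the saved row index
csv_text = """,Measure,confidence,x_bar,sigma,n
0,Classroom test,0.95,74.0,5,30
1,Online test,0.95,76.7,8,30
"""

with_extra_column = pd.read_csv(io.StringIO(csv_text))
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra_column.columns))   # includes 'Unnamed: 0'
print(list(with_index.columns))          # starts at 'Measure'
```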


In [10]:
df_normal_ci['pop_min'] = df_normal_ci.apply(lambda row: normal_distribution_ci_pop_min(row['confidence'], row['x_bar'], row['sigma'], row['n']), axis=1)
df_normal_ci['pop_max'] = df_normal_ci.apply(lambda row: normal_distribution_ci_pop_max(row['confidence'], row['x_bar'], row['sigma'], row['n']), axis=1)
df_normal_ci['confidence_interval'] = df_normal_ci.apply(lambda row: normal_distribution_ci_text(row['confidence'], row['x_bar'], row['sigma'], row['n']), axis=1)
df_normal_ci

| Unnamed: 0 | Measure | confidence | x_bar | sigma | n | pop_min | pop_max | confidence_interval |
|---|---|---|---|---|---|---|---|---|
| 0 | The mean score for a classroom test was 74 with a standard deviation of 5 for a group of 30 students | 0.95 | 74.0 | 5 | 30 | 72.210806 | 75.789194 | The population mean lies between 72.21 and 75.79 with 95% confidence |
| 1 | The mean score for an online test was 76.7 with a standard deviation of 8 for a group of 30 students | 0.95 | 76.7 | 8 | 30 | 73.837289 | 79.562711 | The population mean lies between 73.84 and 79.56 with 95% confidence |


#### Binomial Confidence Interval Batch Calculator

In [11]:
df_binom_ci = pd.read_csv("data/binomial_ci_batch.csv")
df_binom_ci

| Unnamed: 0 | Measure | confidence | p_hat | n |
|---|---|---|---|---|
| 0 | 78% of staff surveyed identified three key areas of strength and three areas for development. | 0.95 | 0.78 | 32 |
| 1 | 35% of reviews sampled had identified at least three objectives for next year | 0.95 | 0.35 | 32 |
| 2 | 81% of reviews sampled include an actionable development plan | 0.95 | 0.81 | 32 |
| 3 | 97% of reviews sampled were completed using the correct paperwork | 0.95 | 0.97 | 32 |


In [12]:
df_binom_ci['pop_min'] = df_binom_ci.apply(lambda row: binomial_distribution_ci_pop_min(row['confidence'], row['p_hat'], row['n']), axis=1)
df_binom_ci['pop_max'] = df_binom_ci.apply(lambda row: binomial_distribution_ci_pop_max(row['confidence'], row['p_hat'], row['n']), axis=1)
df_binom_ci['confidence_interval'] = df_binom_ci.apply(lambda row: binomial_distribution_ci_text(row['confidence'], row['p_hat'], row['n']), axis=1)
df_binom_ci

| Unnamed: 0 | Measure | confidence | p_hat | n | pop_min | pop_max | confidence_interval |
|---|---|---|---|---|---|---|---|
| 0 | 78% of staff surveyed identified three key areas of strength and three areas for development. | 0.95 | 0.78 | 32 | 0.636474 | 0.923526 | The population mean lies between 63.6% and 92.4% with 95% confidence |
| 1 | 35% of reviews sampled had identified at least three objectives for next year | 0.95 | 0.35 | 32 | 0.184742 | 0.515258 | The population mean lies between 18.5% and 51.5% with 95% confidence |
| 2 | 81% of reviews sampled include an actionable development plan | 0.95 | 0.81 | 32 | 0.674077 | 0.945923 | The population mean lies between 67.4% and 94.6% with 95% confidence |
| 3 | 97% of reviews sampled were completed using the correct paperwork | 0.95 | 0.97 | 32 | 0.910896 | 1.029104 | The population mean lies between 91.1% and 102.9% with 95% confidence |


### Conclusion
We have briefly reviewed the statistical formulae that calculate confidence intervals for normal and binomial distributions and shown that it is extremely useful to be able to calculate them interactively rather than having to re-invent the code in Excel or Python each time we want to work them out.

We have defined Python functions to calculate the confidence intervals and then quickly created two interactive forms that we can run over and over again to calculate the confidence intervals that we are interested in.

Finally we have extended the interactive calculator to read parameters from a ``.csv`` file so that confidence interval calculations can be run quickly on a batch of data rather than on just one set of values at a time.