# Binomial Test
*By P. Stikker*<br>
https://PeterStatistics.com<br>
https://www.youtube.com/stikpet

Libraries and modules used in this notebook:

```python
import pandas as pd
import numpy as np

import pyreadstat  # for the example data
from scipy.stats import binomtest  # for the binomial test itself (replaces the older binom_test, removed from recent SciPy versions)
```

Loading some example data:

```python
myDf, meta = pyreadstat.read_sav("StudentStatistics.sav")
```

For the binomial test we need a dichotomous (binary) variable. I'll use gender (the variable `Gen_Gender`).

Let's have a quick look at the counts:

```python
myCd = myDf['Gen_Gender'].value_counts()
myCd
```

```
2.0    34
1.0    12
Name: Gen_Gender, dtype: int64
```

`value_counts()` returns a pandas Series (not a dictionary), so we can get the total number of cases by summing its values:

```python
mySum = sum(myCd.values)
mySum
```

```
46
```

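Since `value_counts()` sorts the counts from most to least frequent, the first value is the count of the most frequent category (34). Let's simply use that as the number of 'successes':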

```python
myX = myCd.values[0]
myX
```

```
34
```

Now for the Exact Binomial Test:

```python
binomtest(myX, mySum, p=0.5, alternative='two-sided').pvalue
```

```
0.0016414913408482337
```

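The .0016 is the p-value: if the chance of success on each trial were 50%, there would be a 0.16% chance of getting a result at least as extreme as 34 'successes' out of 46, that is, 34 or more, or the equally extreme result in the other direction, 12 or fewer.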
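
To make the "at least as extreme" part concrete, here is a minimal pure-Python sketch that adds up the two tails for this example, using only `math.comb` from the standard library rather than SciPy. The helper name `binomPmf` is hypothetical, introduced only for this illustration.

```python
from math import comb

def binomPmf(i, n, p):
    # probability of exactly i successes in n trials (hypothetical helper)
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, k, p = 46, 34, 0.5

# upper tail: 34 or more successes
upper = sum(binomPmf(i, n, p) for i in range(k, n + 1))
# lower tail: with p = 0.5 the distribution is symmetric, so the equally
# extreme result on the other side is n - k = 12 or fewer successes
lower = sum(binomPmf(i, n, p) for i in range(0, n - k + 1))

p_value = upper + lower
print(round(p_value, 6))  # 0.001641
```

The two tails are equally large here, and their sum reproduces the two-sided p-value from `binomtest`.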

Usually the threshold (significance level) is set at .05 (5%). If the p-value is below this, we reject the assumption that the chance of success is 50%, meaning the two proportions differ significantly. In this research one could conclude that, in the population, the percentage of males differs significantly from the percentage of females. (See https://PeterStatistics.com for more details on how to report this.)

It is recommended to follow this up with an effect size. Cohen's g is one that may be suitable; see the separate documentation on Cohen's g.
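
As a rough sketch, assuming the usual definition of Cohen's g as the difference between the observed proportion and 0.5, the calculation for this example is a one-liner. The function name `cohensG` is hypothetical, and taking the absolute value is an assumption made so the result does not depend on which category is counted as the 'success'.

```python
def cohensG(successes, total):
    # Cohen's g: observed proportion minus 0.5 (absolute value assumed here)
    return abs(successes / total - 0.5)

g = cohensG(34, 46)
print(round(g, 4))  # 0.2391
```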

*Final remark:* 

The statsmodels library also has a binomial test, but it uses the SciPy version for the two-sided test (see the notes at the bottom of the page at https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.binom_test.html).

# Fully Naive

In case you really don't want to install any packages, there is some more work to do.

First, create a function for the factorial. In mathematics it is defined as:

\begin{equation*}
x! = x\times (x - 1) \times (x - 2) \times ... \times 2 \times 1
\end{equation*}

A simple for loop will do the trick:

```python
def myFact(anInt):
    # multiply 1 * 2 * ... * anInt; for anInt = 0 the loop is skipped and 1 is returned
    fact = 1
    for i in range(anInt):
        fact = fact * (i + 1)
    return fact
```

```python
myFact(6)
```

```
720
```

```python
# check if it works
6*5*4*3*2
```

```
720
```
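
A slightly more systematic check compares `myFact` with the standard library's `math.factorial` (used here only as a reference) over a range of inputs, including 0, where the empty-product convention gives 0! = 1.

```python
import math

def myFact(anInt):
    # multiply 1 * 2 * ... * anInt; for anInt = 0 the loop is skipped, so 0! = 1
    fact = 1
    for i in range(anInt):
        fact = fact * (i + 1)
    return fact

# compare against the standard library for 0! up to 20!
mismatches = [n for n in range(21) if myFact(n) != math.factorial(n)]
print(mismatches)  # []
```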

Next, the number of possible combinations of k out of n. In mathematics this can be calculated using:

\begin{equation*}
\binom{n}{k} = \frac{n!}{k!\times(n-k)!}
\end{equation*}
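
As a worked example of this formula, take n = 8 and k = 3, the case used in the check below:

\begin{equation*}
\binom{8}{3} = \frac{8!}{3!\times 5!} = \frac{40320}{6\times 120} = \frac{40320}{720} = 56
\end{equation*}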

Simple in Python now that we have the factorial function:

```python
def myCombin(n, k):
    # n! / (k! * (n - k)!); true division '/' returns a float
    return myFact(n) / (myFact(k) * myFact(n - k))
```

```python
myCombin(8, 3)
```

```
56.0
```
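
Because `/` is true division, `myCombin` returns a float (56.0). Python's integers can hold the huge factorials without trouble, but a float can no longer represent every whole number exactly once the result exceeds 2^53. A minimal variant, assuming you want exact integer results, uses floor division `//`, which loses nothing here because k! × (n - k)! always divides n! exactly. The name `myCombinExact` is hypothetical; `math.comb` is used only to check the result.

```python
from math import comb

def myFact(anInt):
    # iterative factorial: 1 * 2 * ... * anInt (0! = 1)
    fact = 1
    for i in range(anInt):
        fact = fact * (i + 1)
    return fact

def myCombinExact(n, k):
    # floor division is exact because k! * (n - k)! always divides n!
    return myFact(n) // (myFact(k) * myFact(n - k))

print(myCombinExact(8, 3))                     # 56
print(myCombinExact(60, 30) == comb(60, 30))   # True
```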

Finally, the cumulative probability function of the binomial distribution, i.e. the probability of k or fewer successes in n trials:

\begin{equation*}
CPB(n,k,p) = \sum_{i=0}^{k}\binom{n}{i}\times p^i\times(1-p)^{(n-i)}
\end{equation*}

```python
def CPB(n, k, p):
    # cumulative probability P(X <= k): add up P(X = i) for i = 0, 1, ..., k
    sig = 0
    for i in range(k + 1):
        sig = sig + myCombin(n, i) * p**i * (1 - p)**(n - i)
    return sig
```

```python
CPB(46, 12, 0.5) * 2
```

```
0.0016414913408482334
```

Note that I used 12 rather than 34, and that I simply multiplied the result by 2.

Multiplying by 2 to get the two-sided significance is only valid when p = 0.5, because only then is the binomial distribution symmetric. For other values of p this becomes more involved.
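
The reason the doubling works for p = 0.5: then $p^i\times(1-p)^{(n-i)} = \left(\frac{1}{2}\right)^{n}$ for every i, and the binomial coefficients are symmetric, $\binom{n}{i} = \binom{n}{n-i}$. Substituting j = n - i shows that the upper tail (34 or more out of 46) has exactly the same probability as the lower tail (12 or fewer):

\begin{equation*}
P(X \geq n-k) = \sum_{i=n-k}^{n}\binom{n}{i}\times\left(\frac{1}{2}\right)^{n} = \sum_{j=0}^{k}\binom{n}{n-j}\times\left(\frac{1}{2}\right)^{n} = \sum_{j=0}^{k}\binom{n}{j}\times\left(\frac{1}{2}\right)^{n} = CPB\left(n,k,\frac{1}{2}\right)
\end{equation*}

So with n = 46 and k = 12 the two-sided value is 2 × CPB(46, 12, 0.5). For p ≠ 0.5 the factor $p^i\times(1-p)^{(n-i)}$ is no longer the same for i and n - i, which is why the shortcut fails.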

I used the smaller of the two counts to keep the computation straightforward. You could add a check: if k > n/2, use n - k instead.
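
Putting the last two remarks together, here is a minimal sketch for the p = 0.5 case only. It repeats the notebook's `myFact`, `myCombin` and `CPB` so that it runs on its own; `twoSidedHalf` is a hypothetical name. It swaps k for n - k when k > n/2, and it caps the doubled value at 1, because when k is exactly n/2 the middle outcome is counted in both tails and the doubled result can exceed 1.

```python
def myFact(anInt):
    # iterative factorial (0! = 1)
    fact = 1
    for i in range(anInt):
        fact = fact * (i + 1)
    return fact

def myCombin(n, k):
    # number of combinations of k out of n (as a float)
    return myFact(n) / (myFact(k) * myFact(n - k))

def CPB(n, k, p):
    # cumulative binomial probability P(X <= k)
    sig = 0
    for i in range(k + 1):
        sig = sig + myCombin(n, i) * p**i * (1 - p)**(n - i)
    return sig

def twoSidedHalf(n, k):
    # two-sided p-value, only valid for p = 0.5 (symmetric distribution)
    if k > n / 2:
        k = n - k  # use the smaller of the two counts
    return min(1.0, 2 * CPB(n, k, 0.5))

print(twoSidedHalf(46, 34))  # same value as CPB(46, 12, 0.5) * 2
print(twoSidedHalf(46, 12))  # same value: both counts give the same test
print(twoSidedHalf(4, 2))    # 1.0 (capped; the doubled tail would be 1.375)
```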

Python's built-in math module actually has a factorial function (`math.factorial`) and, since Python 3.8, a function for combinations (`math.comb`).
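
With `math.comb` the cumulative function becomes much shorter. For p ≠ 0.5 the doubling shortcut no longer works; one common definition of the exact two-sided p-value instead adds up the probabilities of every outcome that is no more likely than the observed one. The sketch below assumes that definition; the function names and the relative tolerance `1 + 1e-7` (to absorb floating-point rounding when two probabilities are tied) are illustrative choices, not taken from any library.

```python
from math import comb

def binomPmf(n, i, p):
    # P(X = i) for X ~ Binomial(n, p)
    return comb(n, i) * p**i * (1 - p)**(n - i)

def CPB(n, k, p):
    # cumulative binomial probability P(X <= k), using math.comb
    return sum(binomPmf(n, i, p) for i in range(k + 1))

def twoSidedBinom(k, n, p):
    # sum the probabilities of all outcomes no more likely than the observed k;
    # the factor 1 + 1e-7 is an assumed tolerance for floating-point ties
    observed = binomPmf(n, k, p)
    probs = [binomPmf(n, i, p) for i in range(n + 1)]
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)

print(twoSidedBinom(34, 46, 0.5))  # equals 2 * CPB(46, 12, 0.5) up to rounding
print(2 * CPB(46, 12, 0.5))
print(twoSidedBinom(3, 10, 0.3))   # about 1: 3 is the most likely outcome here
```

For p = 0.5 this gives the same result as doubling the lower tail, but it also handles asymmetric cases such as p = 0.3.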