# Binomial Distribution - Theory


The **binomial distribution** is a discrete probability distribution that gives the probability of observing a given number of successes in a fixed number of independent trials, where each trial has exactly two possible outcomes. The distribution is obtained by performing a number of **Bernoulli** trials.

A Bernoulli trial is assumed to meet each of these criteria:

- There must be only 2 possible outcomes.
- Each outcome has a fixed probability of occurring. A success has probability p, and a failure has probability 1 – p.
- Each trial is completely independent of all others.
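
The three criteria above can be illustrated with a small simulation. This is a minimal sketch, not part of the original notebook: the helper name `bernoulli_trial`, the success probability 0.6, the seed, and the number of trials are all assumptions chosen for illustration.

```python
import random

def bernoulli_trial(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)   # fixed seed (an arbitrary choice) so the run is repeatable
p = 0.6                  # assumed success probability, the same for every trial

# Each call draws a fresh random number, so the trials do not influence each other.
trials = [bernoulli_trial(p, rng) for _ in range(10_000)]
success_fraction = sum(trials) / len(trials)
print(success_fraction)  # should land close to p
```

With many trials, the observed fraction of successes should settle near p, which is what a fixed per-trial probability implies.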

The binomial random variable represents the number of successes (r) in n successive independent Bernoulli trials.

The probability of any one particular sequence of r successes and n-r failures is:

$$p^r (1-p)^{n-r}$$

The number of distinct sequences that contain exactly r successes is:

$$\frac{n!}{(n-r)!\ r!}$$

Hence, the probability mass function (pmf), which gives the total probability of achieving r successes and n-r failures in any order, is:

$$\frac{n!}{(n-r)!\ r!}\ p^r (1-p)^{n-r}$$

An example illustrating the distribution:

Consider a random experiment of tossing a biased coin 6 times, where the probability of getting a head is 0.6. If ‘getting a head’ is considered a ‘success’, then the binomial distribution table contains the probability of r successes for each possible value of r, from 0 to 6.
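
The table for this coin example can be computed directly from the pmf. The sketch below is an illustration rather than part of the original notebook; the function name `binomial_pmf` is a hypothetical helper, and `math.comb` assumes Python 3.8 or later.

```python
import math

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 6, 0.6  # 6 tosses, P(head) = 0.6, as in the example above

# One row per possible number of heads, r = 0, 1, ..., 6.
table = [(r, binomial_pmf(r, n, p)) for r in range(n + 1)]
for r, prob in table:
    print(f"{r}  {prob:.6f}")
# 0  0.004096
# 1  0.036864
# 2  0.138240
# 3  0.276480
# 4  0.311040
# 5  0.186624
# 6  0.046656
```

The probabilities sum to 1, and the most likely outcome is 4 heads.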




# Binomial Distribution in model ensemble validation

The ensemble as a whole makes an error when (M+1)/2 or more of its models make an error simultaneously, that is, when a majority of the models are wrong. Here, M is the number of independent models, assumed to be odd, and $\varepsilon$ is the error rate of each individual model. The probability that exactly k independent models make an error is:

$${P(exactly\ k\ hypotheses\ make\ an\ error)} = \binom{M}{k}\ \varepsilon^k{(1-\varepsilon)}^{(M-k)}$$

and the probability of at least $(M+1)/2$ errors is the sum of these probabilities for $k = (M+1)/2,\ (M+1)/2 + 1,\ \ldots,\ M$:

$$P(errors) = \sum_{k=(M+1)/2}^{M}{P(exactly\ k\ hypotheses\ make\ an\ error)} = \sum_{k=(M+1)/2}^{M}{\binom{M}{k}\ \varepsilon^k{(1-\varepsilon)}^{(M-k)}}$$

where

$$\binom{M}{k}\ = \frac{M!}{k!(M-k)!}$$





In [154]:
import math

In [170]:
# This function computes the probability that (M+1)/2 or more of M independent
# models, each with error rate e, make an error simultaneously.
def ComputeErrorProbability(M, e):

    # k is the minimum number of erring models that causes an ensemble error.
    # Integer division keeps k an int (M is assumed to be odd).
    k = (M + 1) // 2
    prob = 0
    # The total probability is the sum of the probabilities for i = k, k+1, ..., M.
    # range(k, M + 1) includes the lower bound k and excludes the upper bound M + 1.
    for i in range(k, M + 1):
        # math.comb(M, i) is the binomial coefficient M! / (i! * (M - i)!); needs Python 3.8+.
        prob += math.comb(M, i) * pow(e, i) * pow(1 - e, M - i)

    return prob


## 1. The ensemble contains 11 independent models, all of which have an error rate of 0.2.



In [171]:
print(ComputeErrorProbability(11,.2))

0.011654205440000008


## 2. The ensemble contains 11 independent models, all of which have an error rate of 0.49.


In [172]:
print(ComputeErrorProbability(11,.49))

0.47294772571497457


## 3. The ensemble contains 21 independent models, all of which have an error rate of 0.49.

In [173]:
print(ComputeErrorProbability(21,.49))

0.46304790101273546
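
To put the three cases side by side, the same sum can be evaluated over a range of ensemble sizes. This is a hedged sketch: `majority_error` is a hypothetical standalone re-implementation of the calculation above, and the ensemble sizes 1 and 101 are extra values chosen only for illustration.

```python
import math

def majority_error(M, e):
    """Probability that at least (M+1)/2 of M independent models err (M odd)."""
    k = (M + 1) // 2
    return sum(math.comb(M, i) * e**i * (1 - e)**(M - i) for i in range(k, M + 1))

# Compare a strong base model (e = 0.2) with a barely-better-than-chance one (e = 0.49).
for e in (0.2, 0.49):
    for M in (1, 11, 21, 101):
        print(f"e={e}  M={M:3d}  P(ensemble error)={majority_error(M, e):.6f}")
```

For individual error rates below 0.5, adding independent models lowers the ensemble error. When each model is only slightly better than chance (0.49), the improvement is slow, which matches the small drop between cases 2 and 3.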
