**UNIVERSIDADE DE SÃO PAULO (USP)**

**_Author_**: Carlos Filipe de Castro Lemos

**_Academic Study_**: Discrete - Hypergeometric Distribution

In [32]:
import random
import math
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from scipy.stats import hypergeom

In [33]:
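def combination(n, k):
    # Binomial coefficient C(n, k). Integer division keeps the result exact
    # and avoids float overflow for large n.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))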

# Hypergeometric Distribution

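The hypergeometric distribution is very similar to the binomial distribution. However, it has one distinctive characteristic: sampling is done without replacement, so a selected element is not returned to the population before the next draw. As a result, the draws are not independent, and the probability of success changes from one draw to the next. The parameters of the distribution pair each quantity in the population with its counterpart in the sample.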

For example, "N" is the number of elements in the population, while "n" is the number of elements in the sample. Likewise, "K" is the number of elements in the population that have the characteristic counted as a success, while "k" is the number of such elements observed in the sample.

In [34]:
N = 10
n = 3
K = 4
k = 2
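
To make the idea of sampling without replacement concrete, the following is a minimal simulation sketch. The function name `simulate_hypergeometric`, the number of trials, and the seed are illustrative assumptions, not part of the original study. It draws `n` elements from a population with `K` successes and records how often each success count appears.

```python
import random

def simulate_hypergeometric(N, n, K, trials=100_000, seed=0):
    """Estimate the relative frequency of each success count in n draws without replacement."""
    rng = random.Random(seed)
    # 1 marks a success, 0 marks a non-success
    population = [1] * K + [0] * (N - K)
    counts = [0] * (n + 1)
    for _ in range(trials):
        # random.sample draws without replacement
        sample = rng.sample(population, n)
        counts[sum(sample)] += 1
    return [c / trials for c in counts]

freqs = simulate_hypergeometric(N=10, n=3, K=4)
for successes, freq in enumerate(freqs):
    print(successes, round(freq, 3))
```

With enough trials, the relative frequency for two successes should approach the exact probability that the PMF gives for these parameters.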

### Probability Mass Function (PMF)


The formula can be understood as a ratio. The numerator is the number of ways to choose k successes from the K successes in the population, multiplied by the number of ways to choose the remaining n - k elements from the N - K elements that are not successes. The denominator is the number of distinct samples of size n that can be drawn from the population of size N.

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

Where:

* N: population size.
* n: number of draws, which form the sample.
* K: number of success states in the population.
* k: number of observed successes in the sample.
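
As a worked example, substituting the values used in this study (N = 10, n = 3, K = 4, k = 2) into the formula gives:

$$P(X = 2) = \frac{\binom{4}{2}\binom{6}{1}}{\binom{10}{3}} = \frac{6 \cdot 6}{120} = 0.3$$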

In [35]:
def hypergeometric_pmf(N, n, K, k):
    return (combination(K,k)*combination(N-K, n-k))/combination(N,n)

In [36]:
hypergeometric_pmf(N,n,K,k)

0.3
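
As a sanity check, the PMF can be evaluated over every feasible value of k to confirm that the probabilities sum to one. The sketch below is an illustration under stated assumptions: the helper name `hypergeometric_pmf_table` is hypothetical, and it uses `math.comb` with `fractions.Fraction` so that the results are exact.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf_table(N, n, K):
    """Return {k: P(X = k)} for every feasible k, as exact fractions."""
    # k cannot exceed the draws or the successes available, and the
    # non-successes drawn (n - k) cannot exceed the N - K available
    k_min = max(0, n - (N - K))
    k_max = min(n, K)
    total = comb(N, n)
    return {
        k: Fraction(comb(K, k) * comb(N - K, n - k), total)
        for k in range(k_min, k_max + 1)
    }

table = hypergeometric_pmf_table(N=10, n=3, K=4)
for k, p in table.items():
    print(k, p, float(p))
```

For these parameters the table contains k = 0, 1, 2, 3, and the entry for k = 2 equals 3/10, matching the result above.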

### Expectation (E(X))


$$E(X) = np, \quad p = \frac{K}{N}$$

In [37]:
def hypergeometric_expectation(N,n,K):
    return (n)*(K/N)

In [38]:
hypergeometric_expectation(N,n,K)

1.2000000000000002

In [39]:
hypergeom.mean(N,K,n)

1.2
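
The closed-form expectation can also be checked against the definition, E(X) = Σ k · P(X = k). The following sketch assumes a local helper named `pmf` (an illustrative name) and uses exact fractions, which also avoids the rounding residue seen in the floating-point result above.

```python
from fractions import Fraction
from math import comb

def pmf(N, n, K, k):
    # Exact hypergeometric probability P(X = k)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, n, K = 10, 3, 4
support = range(max(0, n - (N - K)), min(n, K) + 1)

# Expectation from the definition versus the closed form n * K / N
mean_from_definition = sum(k * pmf(N, n, K, k) for k in support)
mean_from_formula = Fraction(n * K, N)
print(mean_from_definition, mean_from_formula)  # 6/5 6/5
```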

### Variance (Var(X))


$$Var(X) = np(1-p)\frac{N-n}{N-1}$$

In [40]:
def hypergeometric_variance(N, n, K):
    return n*(K/N)*(1-K/N)*((N-n)/(N-1))

In [41]:
hypergeometric_variance(N,n,K)

0.56

In [42]:
hypergeom.var(N,K,n)

0.56
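
The variance formula is the binomial variance np(1-p) multiplied by the factor (N-n)/(N-1), often called the finite population correction. The sketch below separates the two factors and compares the result with the variance computed from the definition, E(X²) - E(X)². The variable names are illustrative assumptions, and p is taken to be K/N.

```python
from fractions import Fraction
from math import comb

N, n, K = 10, 3, 4
p = Fraction(K, N)

def pmf(k):
    # Exact hypergeometric probability P(X = k) for the parameters above
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

support = range(max(0, n - (N - K)), min(n, K) + 1)

# Variance from the definition: E(X^2) - E(X)^2
mean = sum(k * pmf(k) for k in support)
var_from_definition = sum(k**2 * pmf(k) for k in support) - mean**2

# Variance from the closed form: binomial variance times the correction
binomial_var = n * p * (1 - p)
correction = Fraction(N - n, N - 1)
var_from_formula = binomial_var * correction

print(binomial_var, correction, var_from_formula, var_from_definition)
```

Because the correction factor is below one whenever n > 1, sampling without replacement gives a smaller variance than the binomial case with the same n and p.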

### Standard Deviation (STD(X))
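
The standard deviation is the square root of the variance. Taking p = K/N and the values used in this study (N = 10, n = 3, K = 4), this works out as:

$$STD(X) = \sqrt{Var(X)} = \sqrt{np(1-p)\frac{N-n}{N-1}} = \sqrt{0.56} \approx 0.7483$$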


In [43]:
def hypergeometric_std(N,n,K):
    return hypergeometric_variance(N,n,K)**(1/2)

In [44]:
hypergeometric_std(N,n,K)

0.7483314773547883

In [45]:
hypergeom.std(N,K,n)

0.7483314773547883