# Plot Probability Distribution Functions

## Plot the main probability distributions with the matplotlib and seaborn libraries, with interactive parameters

### The probability distributions are:

1 - Normal

2 - Gamma

3 - Exponential

4 - Uniform

5 - Logistic

6 - Poisson

7 - Chi2

8 - Student t

9 - Binomial

10 - Weibull

### Import libraries

In [12]:
from ipywidgets import interact
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

%matplotlib notebook

sns.set_style('whitegrid')

# Matplotlib + Seaborn

## 1 - Normal Distribution

In probability theory, a <b>normal</b> (or <b>Gaussian</b>, Gauss, or Laplace–Gauss) distribution is a type of <i>continuous probability distribution</i> for a real-valued <i>random variable</i>.
The probability density function of the normal distribution, first derived by De Moivre and 200 years later by both Gauss and Laplace independently, is often called the <b>bell curve</b> because of its characteristic shape (see the KDE below). Below are the PDF (probability density function) and CDF (cumulative distribution function) as functions of the mean ($\mu$) and variance ($\sigma^2$):

$$\begin{align} PDF = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \end{align}$$

$$ CDF = \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] $$

The normal distribution <i>occurs often in nature</i>. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution. It is important in <i>statistics and is often used in the natural and social sciences</i> to <b>represent real-valued random variables</b> whose <b>distributions are not known</b>.
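As a quick sanity check on the normal PDF and CDF, the sketch below evaluates both with Python's standard `math` module only. The function names `normal_pdf` and `normal_cdf` are illustrative and are not part of the notebook.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma > 0."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_pdf(0.0))                     # peak height, 1 / sqrt(2 * pi)
print(normal_cdf(0.0))                     # half the mass lies below the mean
print(normal_cdf(1.0) - normal_cdf(-1.0))  # mass within one sigma of the mean
```

The last line should print roughly 0.68, the familiar "one sigma" share of the probability mass.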

[1] https://en.wikipedia.org/wiki/Normal_distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.normal.html#numpy.random.Generator.normal

In [13]:
def normal_dist(loc, sigma, size):
    x = np.random.normal(loc, sigma, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,6))
    
    ax1.hist(x, bins=20, color='blue', alpha=.5)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Normal Distribution')
    
    sns.distplot(x, ax=ax2, bins=20, color='blue')
    ax2.set_ylabel('Probability')
    ax2.set_title('Normal Distribution + KDE')
    
    plt.draw()
    
interact(normal_dist, loc=(-2,2,1), sigma=(0.05,5,0.05), size=(10,500,10))


## 2 - Gamma

In probability theory and statistics, the <b>gamma distribution</b> is a <b>two-parameter</b> family of <i>continuous probability distributions</i>. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. Below are the PDF and CDF as functions of shape ($k$) and scale ($\theta$):

$$\begin{align} PDF = \frac{1}{\Gamma(k)\theta^{k}} x^{k-1} e^{-\frac{x}{\theta}} \end{align}$$

$$ CDF = \frac{1}{\Gamma(k)} \gamma\left(k,\frac{x}{\theta}\right) $$

The Gamma distribution is often <i>used</i> to model the <b>times to failure of electronic components</b>, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.
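The special-case relationship with the exponential distribution can be checked numerically. The sketch below, which assumes only the standard `math` module and uses illustrative function names, evaluates the gamma density for $x > 0$, compares shape $k = 1$ against an exponential density with rate $1/\theta$, and uses a simple midpoint rule to confirm that the density integrates to about 1.

```python
import math

def gamma_pdf(x, k, theta):
    """Gamma density for shape k > 0 and scale theta > 0 (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)

def exponential_pdf(x, lam):
    """Exponential density with rate lam, for x >= 0."""
    return lam * math.exp(-lam * x)

# Special case: shape k = 1 gives an exponential with rate 1 / theta.
x, theta = 1.5, 2.0
print(gamma_pdf(x, 1.0, theta), exponential_pdf(x, 1.0 / theta))

# Midpoint-rule check that the density integrates to about 1 on [0, 60].
k, step = 2.5, 0.001
area = sum(gamma_pdf((i + 0.5) * step, k, 1.5) * step for i in range(60_000))
print(area)
```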

[1] https://en.wikipedia.org/wiki/Gamma_distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.gamma.html#numpy.random.Generator.gamma

In [3]:
def gamma_dist(shape, scale, size):
    x = np.random.gamma(shape, scale, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,6))
    
    ax1.hist(x, bins=20, color='green', alpha=.7)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Gamma Distribution')
    
    sns.distplot(x, ax=ax2, bins=20, color='green')
    ax2.set_ylabel('Probability')
    ax2.set_title('Gamma Distribution + KDE')
    
    plt.draw()

interact(gamma_dist, shape=(0.05,5,0.05), scale=(0.05,5,0.05), size=(10,500,10))


## 3 - Exponential

In probability theory and statistics, the <b>exponential distribution</b> is the probability distribution of the time between events in a Poisson point process, i.e. a <i>process in which events occur continuously and independently at a constant average rate</i>. Below are the PDF and CDF as functions of the rate ($\lambda$), for $x \ge 0$:

$$ PDF = \lambda e^{-\lambda x} $$

$$ CDF = 1 - e^{-\lambda x} $$

The exponential distribution is a <i>continuous analogue of the geometric distribution</i>. It describes many common situations, such as the <i>size of raindrops measured over many rainstorms, or the time between page requests to Wikipedia</i>.
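Note that `np.random.exponential` is parameterized by the scale $\beta = 1/\lambda$ rather than by the rate $\lambda$. The sketch below is one way to see the link between the CDF and sampling: it inverts the CDF (inverse-transform sampling) using only the standard library. The helper names and the seed are illustrative assumptions.

```python
import math
import random

def exponential_cdf(x, lam):
    """CDF 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def sample_exponential(lam, size, seed=0):
    """Inverse-transform sampling: solve u = CDF(x) for x, with u uniform on [0, 1)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(size)]

lam = 2.0
draws = sample_exponential(lam, 100_000)
sample_mean = sum(draws) / len(draws)
print(sample_mean, 1.0 / lam)  # the sample mean should be close to the scale 1 / lam
```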

[1] https://en.wikipedia.org/wiki/Exponential_distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.exponential.html#numpy.random.Generator.exponential

In [4]:
def exponential_dist(scale, size):
    x = np.random.exponential(scale, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,6))
    
    ax1.hist(x, bins=20, color='orange', alpha=.8)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Exponential Distribution')
    
    sns.distplot(x, ax=ax2, color='orange', bins=20)
    ax2.set_ylabel('Probability')
    ax2.set_title('Exponential Distribution + KDE')
    
    plt.draw()

interact(exponential_dist, scale=(0.05,5,0.05), size=(10,500,10))


## 4 - Uniform

In probability theory and statistics, the <b>continuous uniform distribution</b> or rectangular distribution is a family of <i>symmetric probability distributions</i>. Below are the PDF and CDF as functions of the interval $[a,b]$:

$$ PDF = \begin{cases} \frac{1}{b-a} \rightarrow x \in [a,b] \\ 0 \rightarrow otherwise \end{cases} $$

$$ CDF =  \begin{cases} 0 \rightarrow x < a \\ \frac{x-a}{b-a} \rightarrow x \in [a,b] \\ 1 \rightarrow x > b\end{cases} $$

The distribution <i>describes</i> an experiment where there is an <i>arbitrary outcome that lies between certain bounds</i> (low and high below). The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable.
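The "equal length, equal probability" property follows directly from the CDF. A minimal sketch with illustrative helper names, using no imports:

```python
def uniform_pdf(x, a, b):
    """Uniform density on [a, b], with a < b."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Uniform CDF on [a, b], with a < b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def interval_prob(lo, hi, a, b):
    """P(lo < X <= hi), computed as a difference of CDF values."""
    return uniform_cdf(hi, a, b) - uniform_cdf(lo, a, b)

a, b = 0.0, 4.0
# Two intervals of length 1 inside the support have the same probability.
print(interval_prob(0.0, 1.0, a, b), interval_prob(2.5, 3.5, a, b))
# An interval that sticks out of the support only counts the part inside it.
print(interval_prob(3.5, 4.5, a, b))
```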

[1] https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.uniform.html#numpy.random.Generator.uniform

In [5]:
def uniform_dist(low, high, size):
    x = np.random.uniform(low, high, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,6))
    
    ax1.hist(x, bins=20, color='orangered', alpha=.7)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Uniform Distribution')
    
    sns.distplot(x, ax=ax2, bins=20, color='orangered')
    ax2.set_ylabel('Probability')
    ax2.set_title('Uniform Distribution + KDE')
    
    plt.draw()

interact(uniform_dist, low=(-5,5,0.1), high=(0,5,0.1), size=(10,500,10))


## 5 - Logistic

In probability theory and statistics, the <b>logistic distribution</b> is a <i>continuous probability distribution</i>. Its <i>cumulative distribution function</i> is the logistic function, which appears in logistic regression and feedforward neural networks. It resembles the normal distribution in shape but has heavier tails (higher kurtosis). Below are the PDF and CDF as functions of location ($\mu$) and scale ($s>0$):

$$ PDF = \frac{e^{-(x-\mu)/s}}{s(1+e^{-(x-\mu)/s})^2} $$

$$ CDF = \frac{1}{1+e^{-(x-\mu)/s}} $$

The Logistic distribution is used in Extreme Value problems where it can act as a mixture of Gumbel distributions, in Epidemiology, and by the World Chess Federation (FIDE) where it is used in the Elo ranking system, assuming the performance of each player is a logistically distributed random variable.
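Because the PDF is the derivative of the CDF, the two logistic formulas can be checked against each other with a finite difference. The sketch below assumes only the standard `math` module; the function names and the test point are illustrative.

```python
import math

def logistic_pdf(x, mu=0.0, s=1.0):
    """Logistic density with location mu and scale s > 0."""
    e = math.exp(-(x - mu) / s)
    return e / (s * (1.0 + e) ** 2)

def logistic_cdf(x, mu=0.0, s=1.0):
    """Logistic CDF, i.e. the logistic (sigmoid) function of (x - mu) / s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

# Central difference of the CDF should match the PDF at the same point.
x, mu, s, h = 1.3, 0.5, 2.0, 1e-5
slope = (logistic_cdf(x + h, mu, s) - logistic_cdf(x - h, mu, s)) / (2 * h)
print(slope, logistic_pdf(x, mu, s))
```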

[1] https://en.wikipedia.org/wiki/Logistic_distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.logistic.html#numpy.random.Generator.logistic

In [6]:
def logistic_dist(loc, scale, size):
    x = np.random.logistic(loc, scale, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
    
    ax1.hist(x, bins=20, color='forestgreen', alpha=.7)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Logistic Distribution')
    
    sns.distplot(x, bins=20, color='forestgreen', ax=ax2)
    ax2.set_ylabel('Probability')
    ax2.set_title('Logistic Distribution + KDE')
    
    plt.draw()

interact(logistic_dist, loc=(-5,5,0.1), scale=(0.1,5,0.1), size=(10,500,10))


## 6 - Poisson

In probability theory and statistics, the <b>Poisson distribution</b> is a <i>discrete probability distribution</i> that expresses the <b>probability</b> of a given <b>number of events</b> occurring in a <b>fixed interval</b> of <b>time</b> or <b>space</b> if these events occur with a known constant mean rate and independently of the time since the last event. Below are the PMF (probability mass function) and CDF as functions of the rate ($\lambda$) and number of occurrences ($k$):

$$ PMF = \frac{\lambda^k e^{-\lambda}}{k!} $$

$$ CDF = \frac{\Gamma(\left\lfloor k+1 \right\rfloor,\lambda)}{\left\lfloor k \right\rfloor!} \rightarrow k \ge 0$$

The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
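For a discrete distribution the CDF is just a running sum of the PMF, which avoids the incomplete gamma function entirely. The sketch below assumes only the standard `math` module; the function names are illustrative, and the sums are truncated at 60 terms, which is ample for the rate used here.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam > 0 and integer k >= 0."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k), summing the PMF from 0 up to floor(k)."""
    return sum(poisson_pmf(i, lam) for i in range(int(math.floor(k)) + 1))

lam = 3.0
total = sum(poisson_pmf(k, lam) for k in range(60))     # should be close to 1
mean = sum(k * poisson_pmf(k, lam) for k in range(60))  # should be close to lam
print(total, mean, poisson_cdf(2, lam))
```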

[1] https://en.wikipedia.org/wiki/Poisson_distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.poisson.html#numpy.random.Generator.poisson

In [7]:
def poisson_dist(lam, size):
    x = np.random.poisson(lam, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
    
    ax1.hist(x, bins=20, color='firebrick', alpha=.7)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Poisson Distribution')
    
    sns.distplot(x, bins=20, color='firebrick', ax=ax2)
    ax2.set_ylabel('Probability')
    ax2.set_title('Poisson Distribution + KDE')
    
    plt.draw()

interact(poisson_dist, lam=(0.1,10,0.1), size=(10,500,10))


## 7 - Chi2

In probability theory and statistics, the <b>chi-square distribution</b> (also chi-squared or $\chi^2$-distribution) with <b>k degrees of freedom</b> is the distribution of a <i>sum of the squares of k independent standard normal random variables</i>. The chi-square distribution is a <i>special case of the gamma distribution</i> and is one of the <b>most widely used probability distributions</b> in <b>inferential statistics</b>, notably in <b>hypothesis testing</b> and in <b>construction of confidence intervals</b>. Below are the PDF and CDF as functions of the degrees of freedom ($k$):

$$ PDF = \frac{1}{2^{\frac{k}{2}}\Gamma(k/2)} x^{(k/2)-1} e^{-x/2} $$

$$ CDF = \frac{1}{\Gamma(k/2)} \gamma(\frac{k}{2}, \frac{x}{2})$$
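The "sum of squared standard normals" definition can be simulated directly. The sketch below assumes only the standard `random` module and an integer number of degrees of freedom; the helper name `chi2_draw` and the seed are illustrative. In theory the mean is $k$ and the variance is $2k$.

```python
import random

def chi2_draw(k, rng):
    """One chi-square draw: the sum of k squared standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)
k = 3
draws = [chi2_draw(k, rng) for _ in range(20_000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # theory: mean = k, variance = 2 * k
```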

[1] https://en.wikipedia.org/wiki/Chi-square_distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.chisquare.html#numpy.random.Generator.chisquare

In [8]:
def chi2_dist(df, size):
    x = np.random.chisquare(df, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
    
    ax1.hist(x, bins=20, color='darkslategrey', alpha=.7)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Chi2 Distribution')
    
    sns.distplot(x, bins=20, ax=ax2, color='darkslategrey')
    ax2.set_ylabel('Probability')
    ax2.set_title('Chi2 Distribution + KDE')
    
    plt.draw()

interact(chi2_dist, df=(0.1,10,0.1), size=(10,500,10))


## 8 - Student t

In probability and statistics, <b>Student's t-distribution</b> (or simply the t-distribution) is any member of a family of <i>continuous probability distributions</i> that arises when <b>estimating the mean of a normally distributed population</b> in situations where the <b>sample size is small</b> (usually <i>less than 30 samples</i>) and the population <b>standard deviation is unknown</b>. Below are the PDF and CDF as functions of the number of degrees of freedom ($\nu$):

$$ PDF = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu \pi}\, \Gamma(\frac{\nu}{2})} \left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}} $$

$$ CDF = \frac{1}{2} + x\,\Gamma\left(\frac{\nu+1}{2}\right) \cdot \frac{{}_{2}F_{1}\left(\frac{1}{2},\frac{\nu+1}{2};\frac{3}{2};-\frac{x^2}{\nu}\right)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} $$

where ${}_{2}F_{1}$ is the hypergeometric function.


The t test is based on an assumption that the data come from a Normal distribution. The t test provides a way to test whether the sample mean (that is the mean calculated from the data) is a good estimate of the true mean.
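To see how the degrees of freedom shape the tails, the sketch below evaluates the t density at $x = 3$ for several values of $\nu$ and compares it with the standard normal density. It assumes only the standard `math` module and uses `math.lgamma` to avoid overflow for large $\nu$; the function names are illustrative.

```python
import math

def t_pdf(x, nu):
    """Student t density with nu > 0 degrees of freedom, computed in log space."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Tail density at x = 3: heavier than the normal for small nu, converging as nu grows.
for nu in (1, 5, 30, 1000):
    print(nu, t_pdf(3.0, nu), normal_pdf(3.0))
```

With $\nu = 1$ the distribution is the Cauchy distribution, whose density at zero is $1/\pi$.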

[1] https://en.wikipedia.org/wiki/Student's_t-distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.standard_t.html#numpy.random.Generator.standard_t

In [9]:
def t_dist(df, size):
    x = np.random.standard_t(df, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
    
    ax1.hist(x, bins=20, color='royalblue', alpha=.8)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Student-t Distribution')
    
    sns.distplot(x, bins=20, ax=ax2, color='royalblue')
    ax2.set_ylabel('Probability')
    ax2.set_title('Student-t Distribution + KDE')
    
    plt.draw()

interact(t_dist, df=(0.1,10,0.1), size=(10,500,10))


## 9 - Binomial

In probability theory and statistics, the <b>binomial distribution</b> with parameters $n$ and $p$ is the <i>discrete probability distribution</i> of the <b>number of successes in a sequence of n independent experiments</b>, each asking a <i>yes–no question</i>, and each with its own boolean-valued outcome: success/yes/true/one (with probability p) or failure/no/false/zero (with probability q = 1 − p). Below are the PMF and CDF as functions of number of trials ($n$), number of successes ($k$), probability of success ($p$) and $q=1-p$:

$$ PMF = \left(\begin{array}{cc} n \\ k \end{array}\right) p^k q^{n-k}$$

$$ CDF = I_{q}(n-k, 1+k) $$

A single success/failure experiment is also called a <b>Bernoulli trial</b> or <b>Bernoulli experiment</b> and a sequence of outcomes is called a Bernoulli process.
When estimating the standard error of a proportion in a population by using a random sample, the normal distribution works well unless the product $p \cdot n \le 5$, where $p$ is the population proportion estimate and $n$ is the number of samples, in which case the binomial distribution is used instead.
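The PMF can be evaluated exactly with `math.comb` (Python 3.8 or later), and the CDF then follows as a running sum instead of the regularized incomplete beta function. A minimal sketch with illustrative function names:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k), summing the PMF from 0 up to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.5
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(binomial_pmf(5, n, p))  # probability of exactly 5 successes in 10 fair trials
print(binomial_cdf(n, n, p))  # the whole PMF sums to 1
print(mean, n * p)            # the mean equals n * p
```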

[1] https://en.wikipedia.org/wiki/Binomial_distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.binomial.html#numpy.random.Generator.binomial

In [10]:
def binomial_dist(trials, prob, size):
    x = np.random.binomial(trials, prob, size)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 6))
    
    ax1.hist(x, bins=20, color='brown', alpha=.7)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Binomial Distribution')
    
    sns.distplot(x, bins=20, color='brown', ax=ax2)
    ax2.set_ylabel('Probability')
    ax2.set_title('Binomial Distribution + KDE')
    
    plt.draw()

interact(binomial_dist, trials=(1,50,1), prob=(0,1,0.05), size=(10,500,10))


## 10 - Weibull

In probability theory and statistics, the <b>Weibull distribution</b> is a <i>continuous probability distribution</i>, also called Type III asymptotic extreme value distribution for smallest values, SEV Type III, or Rosin-Rammler distribution, and is one of a class of <b>Generalized Extreme Value (GEV) distributions</b> used in <b>modeling extreme value problems</b>. Below are the PDF and CDF as functions of the scale ($\lambda$) and shape ($k$):

$$ PDF = \begin{cases} \frac{k}{\lambda}(\frac{x}{\lambda})^{k-1} e^{-(x/\lambda)^k} \rightarrow x \ge 0 \\ 0 \rightarrow x < 0 \end{cases} $$

$$ CDF = \begin{cases} 1 - e^{-(x/\lambda)^k} \rightarrow x \ge 0 \\ 0 \rightarrow x < 0 \end{cases} $$

The Weibull distribution is used in several different fields, such as <i>survival analysis, electrical engineering, weather forecasting, hydrology, and the wind power industry, where it describes wind speed distributions</i>.
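`np.random.weibull(a)` draws from the one-parameter form with shape `a` and scale fixed at 1, so a two-parameter sample is obtained by multiplying the draws by $\lambda$. The sketch below shows the same idea using only the standard library, by inverting the two-parameter CDF. The helper names and the seed are illustrative assumptions.

```python
import math
import random

def weibull_cdf(x, lam, k):
    """Weibull CDF with scale lam > 0 and shape k > 0."""
    return 1.0 - math.exp(-((x / lam) ** k)) if x >= 0 else 0.0

def sample_weibull(lam, k, size, seed=0):
    """Inverse-transform sampling: x = lam * (-ln(1 - u)) ** (1 / k)."""
    rng = random.Random(seed)
    return [lam * (-math.log(1.0 - rng.random())) ** (1.0 / k) for _ in range(size)]

lam, k = 2.0, 1.5
draws = sample_weibull(lam, k, 50_000)

# For any shape k, the CDF at x = lam equals 1 - exp(-1), about 0.632.
frac_below_scale = sum(d <= lam for d in draws) / len(draws)
print(frac_below_scale, weibull_cdf(lam, lam, k))
```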

[1] https://en.wikipedia.org/wiki/Weibull_distribution

[2] https://numpy.org/doc/1.18/reference/random/generated/numpy.random.Generator.weibull.html#numpy.random.Generator.weibull

In [11]:
def weibull_dist(a, size):
    x = np.random.weibull(a, size)
    fig, (ax1, ax2) = plt.subplots(1,2, figsize=(10,6))
    
    ax1.hist(x, bins=20, color='grey', alpha=.8)
    ax1.set_ylabel('Sample Values')
    ax1.set_title('Weibull Distribution')
    
    sns.distplot(x, ax=ax2, color='grey', bins=20)
    ax2.set_ylabel('Probability')
    ax2.set_title('Weibull Distribution + KDE')
    
    plt.draw()

interact(weibull_dist, a=(0.1,5,0.1), size=(10,500,10))
