# An Informal Introduction to Probability Distributions

In this notebook, we provide a relatively informal introduction to the concept of a **probability distribution**. The subject is foundational across many disciplines (data science as well as scientific subjects such as physics or chemistry), where understanding the shape of the probability distribution of the data or phenomena under examination is the starting point for most applications. There are many mathematical details that could be covered, since the subject is vast and can be discussed from different perspectives. The aim of this notebook is to provide an introductory overview without compromising too heavily on the mathematical aspects. Specifically, we will look at some of the most common probability distributions and their properties, such as the *Gaussian* and *Beta* distributions. We also include Python simulations to make the concepts easier to grasp.

## Probability Densities

In a *frequentist* approach, we define the probability of an event as the relative frequency with which the event occurs, in the limit where the total number of trials goes to infinity. For instance, the probability of flipping a fair coin and getting "heads" is 1/2, since in the limit of an infinite number of trials we expect to get "heads" half of the time and "tails" the other half. Such probabilities are defined over discrete sets of events, in this example $\{H, T\}$, i.e. "heads" or "tails". However, most of the time we deal with continuous variables, and we need to extend probabilities to the continuous case.

If $x$ is a real-valued continuous variable, the *probability density* over $x$ is defined as the quantity $p(x)$ such that $p(x)\delta x$ is the probability for $x$ to fall in the interval $(x, x+\delta x)$, in the limit $\delta x \to 0$. Then, we may define:

$$P(x \in (a, b)) = \int_a^b p(x) dx $$

which is the probability for $x$ to fall in the interval $(a, b)$. The probability density $p(x)$ must satisfy the following two conditions:
$$ \begin{gather}
p(x) \geq 0 \\
\int_{-\infty}^{+\infty} p(x) dx = 1
\end{gather}
$$
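
As a quick numerical illustration of the two conditions and of the interval probability, here is a minimal sketch that uses a standard Gaussian as the example density. The helper names `gaussian_pdf` and `integrate` are our own, introduced only for this illustration, and the integration range $(-10, 10)$ is an assumption that stands in for $(-\infty, +\infty)$, since the Gaussian tails beyond it are negligible.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def integrate(f, a, b, n=100_001):
    """Approximate the integral of f over (a, b) with the trapezoidal rule."""
    x = np.linspace(a, b, n)
    y = f(x)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Condition 1: the density is non-negative on a grid of test points.
grid = np.linspace(-10.0, 10.0, 2001)
is_non_negative = bool(np.all(gaussian_pdf(grid) >= 0.0))

# Condition 2: the density integrates to 1 (range truncated to (-10, 10)).
total = integrate(gaussian_pdf, -10.0, 10.0)

# P(x in (a, b)) as the integral of p(x) over (a, b), here with (a, b) = (-1, 1).
prob = integrate(gaussian_pdf, -1.0, 1.0)

print(f"p(x) >= 0 on the grid:   {is_non_negative}")
print(f"integral over (-10, 10): {total:.6f}")
print(f"P(-1 < x < 1):           {prob:.4f}")
```

The last number is the familiar statement that roughly 68% of the probability mass of a Gaussian lies within one standard deviation of its mean.
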
The sum and product rules, as well as Bayes' theorem, apply equally to the case of probability densities. If $x$ and $y$ are two real variables, then the sum and product rules take the form:
$$\begin{gather}
p(x) = \int p(x,y) dy \\
p(x,y) = p(y|x) p(x)
\end{gather}
$$
where $p(y|x)$ is the *conditional probability* of $y$ given $x$.
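
The two rules can be checked numerically on a grid. The sketch below assumes, purely for illustration, that $x$ is standard Gaussian and that $y$ given $x$ is Gaussian with mean $x$ and unit standard deviation; the helper `normal_pdf` and the grid ranges are our own choices. The joint density is built with the product rule, and the marginal $p(x)$ is then recovered with the sum rule.

```python
import numpy as np

def normal_pdf(z, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Grids for x and y; the y range is wide enough to contain p(y|x) for every x.
x = np.linspace(-4.0, 4.0, 161)
y = np.linspace(-12.0, 12.0, 2401)
dy = y[1] - y[0]

p_x = normal_pdf(x, 0.0, 1.0)                           # p(x), shape (161,)
p_y_given_x = normal_pdf(y[None, :], x[:, None], 1.0)   # p(y|x), shape (161, 2401)

# Product rule: p(x, y) = p(y|x) p(x)
p_xy = p_y_given_x * p_x[:, None]

# Sum rule: p(x) = integral of p(x, y) over y, approximated by a Riemann sum
p_x_recovered = p_xy.sum(axis=1) * dy

max_err = float(np.max(np.abs(p_x_recovered - p_x)))
print(f"largest deviation between p(x) and the marginalised joint: {max_err:.2e}")
```
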
In analogy with the discrete case, we may define the average value of some function $f(x)$, assuming that $x$ follows the probability distribution $p(x)$, as:
$$
\mathbb{E}[f] = \int p(x) f(x) dx
$$
The operator $\mathbb{E}[\cdot]$ is generally known as the **expectation** (here, of $f(x)$). The **variance** of $f(x)$ instead measures the variability of $f(x)$ around its mean value $\mathbb{E}[f(x)]$ and is defined as:
$$
\text{var}[f] = \mathbb{E}[(f(x)-\mathbb{E}[f(x)])^2]
$$
In other words, it is the expected value of the squared difference between $f(x)$ and its mean value. We take the *squared* difference because the expected value of the plain difference $f(x) - \mathbb{E}[f(x)]$ vanishes identically. Expanding the square and recalling that $\mathbb{E}$ is a *linear* operator, we can rewrite the variance in a simpler form:
$$\begin{align}
\text{var}[f(x)] &= \mathbb{E}[f(x)^2 - 2f(x)\mathbb{E}[f(x)] + (\mathbb{E}[f(x)])^2] \\
&= \mathbb{E}[f(x)^2] - 2(\mathbb{E}[f(x)])^2 + (\mathbb{E}[f(x)])^2 \\
&= \mathbb{E}[f(x)^2] - (\mathbb{E}[f(x)])^2
\end{align}
$$
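
The expectation and both forms of the variance can be estimated by simple Monte Carlo sampling. The sketch below assumes, as an illustrative choice of ours, that $x$ is uniform on $[0, 1)$ and that $f(x) = x^2$, for which the exact values are $\mathbb{E}[f] = 1/3$ and $\text{var}[f] = 1/5 - 1/9 = 4/45$. Sample averages replace the integrals, so the results are only approximate.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Draw samples of x from the uniform distribution on [0, 1) and evaluate f(x) = x^2.
x = rng.random(1_000_000)
f = x ** 2

# Monte Carlo estimate of E[f]
mean_f = float(f.mean())

# Variance from the definition: E[(f - E[f])^2]
var_definition = float(np.mean((f - mean_f) ** 2))

# Variance from the simplified form: E[f^2] - (E[f])^2
var_shortcut = float(np.mean(f ** 2) - mean_f ** 2)

print(f"E[f]   estimate: {mean_f:.4f}   exact: {1 / 3:.4f}")
print(f"var[f] from the definition:  {var_definition:.5f}")
print(f"var[f] from the simple form: {var_shortcut:.5f}   exact: {4 / 45:.5f}")
```

The two variance estimates agree up to floating-point rounding, since they are the same quantity written in two ways; both differ from the exact value only by sampling noise.
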
For two random variables $x$ and $y$ we define also the **covariance** as:
$$\begin{align}
\text{cov}[x,y] &= \mathbb{E}_{x,y}[\{x-\mathbb{E}[x]\}\{y-\mathbb{E}[y]\}] \\
&=\mathbb{E}_{x,y}[xy] - \mathbb{E}[x]\mathbb{E}[y]
\end{align}
$$
which expresses the extent to which $x$ and $y$ vary together (i.e. "co-vary"). Indeed, if $x$ and $y$ are independent variables, then:
$$\begin{align}
\mathbb{E}_{x,y}[xy] &= \int \int xy p(x)p(y) dxdy \\
&= \left( \int x p(x) dx \right) \left( \int y p(y) dy \right) \\
&= \mathbb{E}_x[x]\mathbb{E}_y[y]
\end{align}
$$
hence their covariance vanishes.
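
A small simulation makes this concrete. In the sketch below, the sample sizes, the Gaussian choice, and the helper `covariance` are assumptions made for illustration only. We compare the sample covariance of two independent Gaussian variables with that of a pair in which $y$ is built from $x$ plus independent noise, so that the exact covariance is $\text{var}[x] = 1$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200_000

def covariance(a, b):
    """Sample version of cov[a, b] = E[ab] - E[a]E[b]."""
    return float(np.mean(a * b) - np.mean(a) * np.mean(b))

x = rng.normal(size=n)

# Independent case: y is drawn without any reference to x.
y_independent = rng.normal(size=n)

# Dependent case: y = x + noise, so cov[x, y] = var[x] = 1 exactly.
y_dependent = x + 0.5 * rng.normal(size=n)

cov_independent = covariance(x, y_independent)
cov_dependent = covariance(x, y_dependent)

print(f"cov[x, y], independent: {cov_independent:+.4f}")
print(f"cov[x, y], dependent:   {cov_dependent:+.4f}")
```

With a finite sample the independent case gives a value close to zero rather than exactly zero; the deviation shrinks as `n` grows.
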

In [1]:
import numpy as np
import pandas as pd

In [2]:
import ipywidgets

In [3]:
import matplotlib.pyplot as plt

In [24]:
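def central_limit_theorem_demo(N):
    """Plot the distribution of the mean of N uniform random numbers."""
    sample_size = 10000  # number of independent sample means to draw

    # Each row holds N draws from the uniform distribution on [0, 1);
    # averaging along axis=1 gives one sample mean per row.
    sample_means = np.mean(np.random.rand(sample_size, N), axis=1)

    # Plot a normalized histogram of the sample means
    fig, ax = plt.subplots(figsize=(8, 6))
    fig.suptitle('Central Limit Theorem Demonstration')

    ax.set_xlim([0, 1])
    ax.hist(sample_means, bins=30, density=True, alpha=0.6, color='royalblue', edgecolor='black')
    ax.set_title(f'Distribution of Sample Means (N = {N})')
    ax.set_xlabel('Sample Mean')
    ax.set_ylabel('Density')

    plt.tight_layout()
    plt.show()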

In [26]:
# Demonstrate Central Limit Theorem
ipywidgets.interact(central_limit_theorem_demo, N=(1,100,2))

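
The interactive demo needs a live widget to be explored. As a non-interactive complement, the sketch below repeats the same sampling for a few fixed values of `N` (our own choice) and compares the spread of the sample means with the value expected for averages of independent uniform numbers: a single uniform draw on $[0, 1)$ has mean $1/2$ and variance $1/12$, so the mean of $N$ draws has standard deviation $\sqrt{1/(12N)}$. The dictionary `results` is introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sample_size = 10_000  # same number of sample means as in the demo

results = {}
for N in (1, 4, 16, 64):
    # One sample mean per row, each computed from N uniform draws on [0, 1)
    sample_means = rng.random((sample_size, N)).mean(axis=1)

    # Standard deviation expected for the mean of N independent uniform draws
    predicted_std = np.sqrt(1.0 / (12.0 * N))

    results[N] = (float(sample_means.mean()), float(sample_means.std()), float(predicted_std))
    print(f"N = {N:2d}  mean = {results[N][0]:.4f}  "
          f"std = {results[N][1]:.4f}  predicted std = {results[N][2]:.4f}")
```

The sample means stay centred on $1/2$ while their spread shrinks like $1/\sqrt{N}$, which is the narrowing of the histogram seen when moving the slider.
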