# Introduction to distributions

## Introduction

In previous labs, we have discussed probability density functions, measures of center and measures of dispersion. All these concepts relate to one main concept: the **random variable**, also called a **stochastic variable**. Countless quantities observed in daily life are random variables. Examples include:

- The number of customers in a supermarket tomorrow between 1PM and 2PM.
- The maximum temperature in Boston next Friday.
- The share price on the NY stock exchange, tomorrow at 3:23PM.
- The outcome of throwing a die.
- etc.

We cannot calculate the exact values of these quantities in advance; we can only (hopefully) observe them at the moment they actually happen. We can, however, ask ourselves questions like:

What is the probability that the temperature is higher than 73 degrees Fahrenheit? Or that there will be no more than 20 customers visiting our supermarket tomorrow? Or that we throw exactly a 6 when rolling a die?

The example of rolling a die might sound familiar from the combinatorics section. There, events were collections of outcomes in a given universe. Now, when we talk about random variables, we are dealing with experiments where randomness plays an even bigger role, yet in theory we can again define the collection of all possible outcomes, denoted $\Omega$.

For each possible result (outcome) in $\Omega$, the random variable assumes a certain value. The random variable can thus be seen as a function that maps each outcome to a (real or integer) number. Because it is not certain in advance which outcome will occur, the function value is also random.
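To make the idea of a random variable as a function concrete, here is a minimal sketch. It assumes a hypothetical experiment that is not part of the lesson (two tosses of a fair coin), and the names `omega`, `X` and `prob_X_equals` are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment: two tosses of a fair coin.
# omega is the collection of all possible outcomes (the sample space).
omega = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: maps an outcome to a number (here, the count of heads)."""
    return outcome.count("H")

def prob_X_equals(k):
    """P(X = k), assuming every outcome in omega is equally likely."""
    favorable = [o for o in omega if X(o) == k]
    return Fraction(len(favorable), len(omega))

for k in range(3):
    print(k, prob_X_equals(k))
```

The randomness lives in which outcome occurs; `X` itself is an ordinary deterministic function applied to that outcome.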

In practice, it is often rather abstract to imagine the collection of all possible outcomes of an experiment. For example: which collection of outcomes is behind a stock price or a maximum temperature? Fortunately, many random quantities approximately follow a few common **probability distributions**, and these are what this section covers. Broadly, you can distinguish two types of probability distributions: **discrete probability distributions** and **continuous probability distributions**.

## Discrete probability distributions

A discrete random variable can only assume isolated values, that is, a finite number or an infinite but countable number of values. A continuous random variable, by contrast, can assume a continuum of values.

Examples of discrete random variables are the number of customers in a store and the outcome when rolling a die.

### The probability mass function

The probability mass function $p_X(x)=P(X=x)$ gives the probability that our random variable $X$ takes the value $x$.

For example, when rolling a die: $P(X=6) = \dfrac{1}{6}$

Another example, when rolling a die:
$P(X=7) = 0$

Properties:

- $0 \leq p_X(x) \leq 1$
- $\displaystyle\sum_x p_X(x)=1$

![title](px_dice.png)
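The die example and both properties can be checked with a few lines of Python. This is a minimal sketch that assumes a fair six-sided die; the names `pmf` and `p_X` are illustrative, and exact fractions are used to avoid rounding errors.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def p_X(x):
    """Return P(X = x); any value that is not a face of the die has probability 0."""
    return pmf.get(x, Fraction(0))

print(p_X(6))             # 1/6
print(p_X(7))             # 0
print(sum(pmf.values()))  # 1

# Both properties hold: every probability lies in [0, 1] and they sum to 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1
```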

### The cumulative distribution function

The cumulative distribution function tells you, for each value $x$, the probability that the random variable $X$ is smaller than or equal to $x$. It is given by $F_X(x)=P(X \leq x)$.

![title](fx_dice.png)

- $0 \leq F_X(x) \leq 1$
- $S_X(x) = 1-F_X(x)$ is often called the _survival_ function.
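For a discrete variable, the cumulative distribution function is the running sum of the probabilities of all values up to $x$. The sketch below again assumes a fair six-sided die; `F_X` and `S_X` are illustrative names that mirror the notation above.

```python
from fractions import Fraction

# Assumption: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def F_X(x):
    """CDF: P(X <= x), the sum of the probabilities of all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

def S_X(x):
    """Survival function: P(X > x) = 1 - F_X(x)."""
    return 1 - F_X(x)

print(F_X(4))    # 2/3
print(S_X(4))    # 1/3
print(F_X(3.5))  # 1/2, the CDF is also defined between the faces
```

Because the die only takes the values 1 to 6, `F_X` is a step function: it is 0 below 1, jumps by 1/6 at each face, and stays at 1 from 6 onward.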

## Continuous probability distributions

Many random variables are continuous: the maximum temperature in Boston is, and so is a stock price at a certain point in time. Note that a continuous variable can often only be measured with finite precision (a fixed number of significant figures before and/or after the decimal point), so in practice it is still stored as a discrete value.

### The probability density function

As mentioned before, tomorrow's maximum temperature can take a continuum of values. Because of this, the probability of any one specific value is zero. To understand this, think of the probability that it will be exactly 74.8123 degrees Fahrenheit tomorrow: pinning the value down to more and more decimals makes the probability arbitrarily small, so $P(X = x) = 0$. For this reason, a probability mass function is not useful for a continuous variable. Instead, probabilities are assigned to intervals through a probability density function $f_X(x)$: the area under $f_X$ between $a$ and $b$ equals $P(a \leq X \leq b)$. A cumulative distribution function can still be defined as before.
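The shrinking probability can be illustrated numerically. The sketch below assumes, purely for illustration, that tomorrow's maximum temperature is uniformly distributed between 70 and 80 degrees Fahrenheit; this model and the names `uniform_cdf` and `prob_within` are not part of the lesson.

```python
# Hypothetical model: the maximum temperature X is uniform on [70, 80] degrees F.
LOW, HIGH = 70.0, 80.0

def uniform_cdf(x):
    """P(X <= x) for a uniform distribution on [LOW, HIGH]."""
    if x <= LOW:
        return 0.0
    if x >= HIGH:
        return 1.0
    return (x - LOW) / (HIGH - LOW)

def prob_within(x, eps):
    """P(x - eps <= X <= x + eps): the probability of landing within eps of x."""
    return uniform_cdf(x + eps) - uniform_cdf(x - eps)

# The narrower the window around 74.8123, the smaller the probability.
for eps in (1.0, 0.1, 0.001, 0.00001):
    print(eps, prob_within(74.8123, eps))
```

Under this assumed model the probability is proportional to the width of the window, so it goes to zero as the window collapses onto the single value 74.8123.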

### The cumulative distribution function

- $F_X(x)$ is a non-decreasing function with values between 0 and 1.
- $\displaystyle\lim_{x\rightarrow -\infty} F_X(x)= 0$
- $\displaystyle\lim_{x\rightarrow +\infty} F_X(x)= 1$
- For two values $a \leq b$, $P(a\leq X \leq b) = F_X(b)-F_X(a)$

![title](fx_cont.png)
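These properties can be tried out on a concrete continuous distribution. The sketch below assumes, as an illustration only, that the maximum temperature follows a normal distribution with mean 72 and standard deviation 4 degrees Fahrenheit; these parameters are made up. The normal CDF is written with the standard library's `math.erf`.

```python
import math

# Hypothetical parameters: mean 72 and standard deviation 4 (degrees Fahrenheit).
MU, SIGMA = 72.0, 4.0

def normal_cdf(x, mu=MU, sigma=SIGMA):
    """F_X(x) = P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_between(a, b):
    """P(a <= X <= b) = F_X(b) - F_X(a), for a <= b."""
    return normal_cdf(b) - normal_cdf(a)

print(normal_cdf(72.0))          # half of the probability lies below the mean
print(prob_between(68.0, 76.0))  # within one standard deviation of the mean
print(1.0 - normal_cdf(73.0))    # P(X > 73), a survival probability
```

The last line answers a question of the kind raised in the introduction (the probability that the temperature is higher than 73 degrees), but only under the assumed model.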