# Statistical Distribution Theory
   
I. Population and Sample  
II. Univariate Probability Distribution  
III. Distribution Functions (PDF, PMF, CDF)  
IV. Measures of Distribution

## Population and Sample

In the statistical analysis of data, we typically use data from a few selected samples to draw conclusions about the population from which these samples were taken. Correct study design should ensure that the sample data are representative of that population.

The main difference between a population and a sample has to do with which
observations are included in the data set.

**Population**: Includes all of the elements from a set of data.

**Sample**: Consists of one or more observations from the population.

More than one sample can be derived from the same population.

When estimating a parameter of a population, e.g., the mean weight of male Europeans, we typically cannot measure all subjects. Instead, we investigate a (hopefully representative) random sample taken from this group. Based on the sample statistic, i.e., the corresponding value calculated from the sample data, we use statistical inference to draw conclusions about the corresponding parameter of the population.

**Parameter**: Characteristic of a population, such as a mean or standard
deviation. Often notated using Greek letters.

**Statistic**: A measurable characteristic of a sample. Examples of
statistics are the mean of the sample data, the range of the sample data, and the standard deviation of the data around the sample mean.

**Sampling Distribution**: The probability distribution of a given statistic computed from random samples of the same size drawn from the same population.

**Statistical Inference**: Enables you to make an educated guess about a population parameter based on a statistic computed from a sample randomly drawn from that population.

The table below gives examples of parameters and their corresponding statistics. Population parameters are usually written with Greek letters, while sample statistics typically use Latin letters.

|  | Population Parameter | Sample Statistic |
| :---: | :---: | :---: |
| Mean | $\mu$ | $\bar{x}$ |
| Standard Deviation | $\sigma$ | $s$ |
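To make the idea of a sampling distribution concrete, the sketch below simulates one with NumPy. It assumes a hypothetical population that follows an exponential distribution with mean $\mu = 2$ (and therefore standard deviation $\sigma = 2$), draws many random samples of size $n = 50$, and records the sample mean $\bar{x}$ of each. The spread of those means should come out close to $\sigma / \sqrt{n}$, the standard error of the mean. The seed, population, and sizes are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

mu = 2.0            # population mean (hypothetical exponential population)
sigma = 2.0         # for an exponential distribution, sigma equals mu
n = 50              # size of each random sample
num_samples = 10_000

# Each row is one random sample of size n drawn from the population.
samples = rng.exponential(scale=mu, size=(num_samples, n))

# The sample mean x-bar of every sample; together these values
# approximate the sampling distribution of the mean.
sample_means = samples.mean(axis=1)

print(f"mean of x-bar values: {sample_means.mean():.3f} (population mu = {mu})")
print(f"std of x-bar values:  {sample_means.std():.3f} "
      f"(sigma / sqrt(n) = {sigma / np.sqrt(n):.3f})")
```

The sample means cluster around the population mean $\mu$, and their spread shrinks as $n$ grows, which is what makes a single sample statistic useful for inference about the parameter.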


### Expected Value and Standard Deviation

The **expected value** is often referred to as the "*long-term*" average or mean. This means that if we repeat an experiment over and over, the average of the outcomes is expected to approach this value.

**The Law of Large Numbers** states that, as the number of trials in a probability experiment increases, the difference between the theoretical probability of an event and its relative frequency approaches zero (the theoretical probability and the relative frequency get closer and closer together).
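A quick simulation illustrates the law with a fair six-sided die: the relative frequency of rolling a six should drift toward the theoretical probability 1/6 as the number of rolls grows. This is a sketch using NumPy's random generator; the seed and roll counts are arbitrary, and for small counts the difference can fluctuate rather than shrink steadily.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

p_six = 1 / 6  # theoretical probability of rolling a six with a fair die

for n_rolls in (10, 100, 10_000, 1_000_000):
    rolls = rng.integers(1, 7, size=n_rolls)  # integers from 1 to 6 inclusive
    rel_freq = np.mean(rolls == 6)            # relative frequency of sixes
    print(f"{n_rolls:>9} rolls: relative frequency = {rel_freq:.4f}, "
          f"difference = {abs(rel_freq - p_six):.4f}")
```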

**Standard deviation** is a measure of dispersion (variability) in statistics. Dispersion describes how spread out the data are; standard deviation specifically measures how spread out the data are around the mean.

The formulas for the expected value and standard deviation differ from one probability distribution to another.


## Probability Distribution

A **probability distribution** is a mathematical tool used to describe how numerical data are distributed in populations and samples. Generally speaking, there are two main types of probability distributions: continuous distributions and discrete distributions.

![](images/distributions.png)

### Continuous Distribution

With a continuous distribution, a random variable can take any value within a range. Think about measuring the length of something: the reported measurement can always be made more or less precise.

For example, the weight of a person can be any positive number. In this case, the curve describing how probability is spread over the possible values, i.e., the probability distribution, is a continuous function, the *probability density function* (PDF).

The PDF, or density, of a continuous random variable is a function that describes the relative likelihood that the random variable X takes on a value near a given value x. In probability and statistics, a random variate x is a particular outcome of a random variable X; other outcomes (random variates) of the same random variable may have different values.

![continuous distribution](https://intellipaat.com/wp-content/uploads/2015/09/probability-for-a-continuous-random-variable-460x264.png)

#### Examples of Continuous Distributions:

- **Continuous Uniform**
    - A continuous distribution that takes values within a specified range *a* to *b*, where each value within the range is equally likely.
    - e.g. the time it takes an elevator to arrive at your floor.

- **Normal (Gaussian)**
    - A probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.
    - e.g. individual height in a population
    
- **Exponential**
    - Used to model the time elapsed between events.
    - e.g. amount of time a postal clerk spends with a customer
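The sketch below draws large samples from each of these three distributions with NumPy and compares the sample mean and standard deviation with the standard closed-form values: $(a+b)/2$ and $(b-a)/\sqrt{12}$ for a uniform distribution on $[a, b]$, $\mu$ and $\sigma$ for a normal distribution, and $1/\lambda$ for both the mean and the standard deviation of an exponential distribution with rate $\lambda$. All parameter values are hypothetical. Note that NumPy's `exponential` takes the scale $1/\lambda$ rather than the rate.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
size = 200_000

a, b = 0.0, 5.0          # hypothetical elevator wait: 0 to 5 minutes
mu, sigma = 170.0, 10.0  # hypothetical heights in cm
lam = 0.5                # hypothetical rate; mean service time 1/lam = 2 minutes

# name -> (sample, theoretical mean, theoretical std)
cases = {
    "uniform": (rng.uniform(a, b, size), (a + b) / 2, (b - a) / np.sqrt(12)),
    "normal": (rng.normal(mu, sigma, size), mu, sigma),
    "exponential": (rng.exponential(1 / lam, size), 1 / lam, 1 / lam),
}

for name, (x, mean_theory, std_theory) in cases.items():
    print(f"{name:>11}: sample mean {x.mean():8.3f} vs {mean_theory:8.3f}, "
          f"sample std {x.std():7.3f} vs {std_theory:7.3f}")
```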


### Discrete Distribution

With discrete distributions, you can only get certain specific values, not all values in a range. Take, for example, a roll of a single six-sided die.

There are 6 possible outcomes of the roll. The probability $p_i$ that the side showing the number $i$ faces upward on a single throw is:

| Roll ($i$) | 1 | 2 | 3 | 4 | 5 | 6 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Probability ($p_i$) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |

The set of all these probabilities $\{p_i\}$ makes up the **probability distribution** for the die roll, and the probabilities of all possible outcomes sum to 1:

$$\sum_{i=1}^6 p_i = 1$$
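The die's distribution can be written directly as a mapping from each outcome to its probability. This minimal Python sketch uses `fractions.Fraction` so that the check that the probabilities sum to 1 is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face i in 1..6 has probability 1/6.
die_pmf = {i: Fraction(1, 6) for i in range(1, 7)}

total = sum(die_pmf.values())
print(total)  # exact arithmetic with Fraction, so this prints 1
```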

![discrete distribution](https://miro.medium.com/max/1400/1*MDQLxss6LQm4cZizkGZbkw.png)

#### Examples of Discrete Distributions:

- **Uniform Distribution**
    - Occurs when all possible outcomes are equally likely.
    - e.g. rolling a six-sided die
    
- **Bernoulli Distribution**
    - Models a single trial with two possible outcomes: success with probability $p$ and failure with probability $1 - p$.
    - e.g. flipping a fair coin
    
- **Binomial Distribution**
    - Gives the probability of observing a specific number of successes in a fixed number of independent Bernoulli trials, each with the same success probability.
    - e.g. number of defective items found in a random sample of 100 items from a production line
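The binomial probabilities follow the standard formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, and the Bernoulli distribution is its special case with $n = 1$. The Python sketch below implements that formula with `math.comb`; the 2% defect rate is a hypothetical value chosen only for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): comb(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Bernoulli is the special case n = 1: P(X = 0) = 1 - p, P(X = 1) = p.
p_coin = 0.5
print([binomial_pmf(k, 1, p_coin) for k in (0, 1)])  # [0.5, 0.5]

# Hypothetical production line: 100 sampled items, each defective with prob 0.02.
n, p = 100, 0.02
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(f"P(no defects)    = {pmf[0]:.4f}")
print(f"P(exactly 2)     = {pmf[2]:.4f}")
print(f"sum over all k   = {sum(pmf):.6f}")
print(f"mean, sum k*P(k) = {sum(k * q for k, q in enumerate(pmf)):.4f} (n*p = {n * p})")
```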
    
### Center and Spread of a Distribution

#### Expected Value / Mean

The expected value, or the mean, describes the 'center' of the distribution (you may hear this called the first moment). The 'center' refers loosely to the middle values of a distribution, and is measured more precisely by notions like the mean, the median, and the mode.

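For a set of $n$ data points $x_1, \dots, x_n$, the mean is the sum of the values divided by $n$ (written $\mu$ when the data points make up a whole population and $\bar{x}$ when they are a sample):

$$\text{mean} = \mu = \frac{\sum_{i=1}^{n} x_i}{n}$$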

If we are instead working from known probabilities, the mean is referred to as the expected value. The expected value of a discrete distribution is the weighted sum of all values of $x$, where each value is weighted by its probability.
 
For a discrete random variable taking values $x_1, \dots, x_n$ with probabilities $p(x_i)$ (such as the Lotto example in the PMF section below), the expected value is:

$$E[X] = \sum_{i=1}^{n} p(x_i)\,x_i$$
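Applying this formula to the Lotto jar described in the PMF section below (100 balls: fifty 1s, twenty-five 2s, fifteen 3s and ten 4s), each value is weighted by its share of the balls. A small Python sketch using exact fractions:

```python
from fractions import Fraction

# Lotto jar from the PMF section: 50 "1"s, 25 "2"s, 15 "3"s, 10 "4"s (100 balls).
counts = {1: 50, 2: 25, 3: 15, 4: 10}
total = sum(counts.values())
pmf = {x: Fraction(c, total) for x, c in counts.items()}

# E[X] = sum over x of p(x) * x
expected = sum(p * x for x, p in pmf.items())
print(expected, float(expected))  # 37/20 1.85
```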

#### Variance / Standard Deviation

Variance describes the spread of the data (it is also referred to as the second central moment). The 'spread' refers loosely to how far away the more extreme values are from the center.

Standard deviation is the square root of variance and measures a typical distance from the mean: it is the square root of the average *squared* distance from the mean.

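For $n$ data points with mean $\mu$, the variance is the average squared deviation from the mean, and the standard deviation is its square root:

$$\text{std} = \sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}}$$

When the data are a sample used to estimate the population value, the denominator $n$ is usually replaced by $n - 1$, which gives the sample standard deviation $s$.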
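The formulas above translate directly into code. The sketch below (Python with NumPy, on a small hypothetical data set) computes the mean and then the standard deviation twice: once dividing by $n$, and once by $n - 1$ as is usual for the sample standard deviation $s$.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical data

mean = x.sum() / len(x)                              # sum of x_i divided by n
sigma = np.sqrt(((x - mean) ** 2).sum() / len(x))    # divide by n
s = np.sqrt(((x - mean) ** 2).sum() / (len(x) - 1))  # divide by n - 1

print(mean, sigma, s)
# NumPy exposes the same choice through the ddof ("delta degrees of freedom") argument.
print(np.std(x), np.std(x, ddof=1))
```

For this data the mean is 5 and $\sigma = 2$, while $s \approx 2.14$; the gap between the two shrinks as $n$ grows.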


Variance is the expectation of the squared deviation of a random variable from its mean.

For a discrete random variable with PMF $p(x_i)$, such as the Lotto example in the PMF section below, that means:

$$\sigma^2 = E[(X-\mu)^2] = \sum_{i=1}^{n} p(x_i)\,(x_i - \mu)^2$$
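Continuing the Lotto example (the jar of 100 balls described in the PMF section), the variance is the probability-weighted average of squared deviations from the mean $\mu = 1.85$. A minimal Python sketch with exact fractions:

```python
import math
from fractions import Fraction

# Lotto jar: 50 "1"s, 25 "2"s, 15 "3"s, 10 "4"s out of 100 balls.
pmf = {1: Fraction(50, 100), 2: Fraction(25, 100),
       3: Fraction(15, 100), 4: Fraction(10, 100)}

mu = sum(p * x for x, p in pmf.items())                    # E[X] = 37/20
variance = sum(p * (x - mu) ** 2 for x, p in pmf.items())  # E[(X - mu)^2]
std = math.sqrt(variance)

print(mu, variance, round(std, 4))
```

This gives $\sigma^2 = 411/400 = 1.0275$ and $\sigma \approx 1.014$.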

## Distribution Functions (PDF, PMF, CDF)

### Probability Density Function (Continuous Distribution)

Probability density functions are the continuous counterpart of PMFs (described below): both describe how probability is distributed over the possible values of a random variable. But where PMFs are appropriate for discrete variables and so can be described with bar plots, PDFs are curves that describe continuous random variables. For a PDF, the probability of a result within a range of values is the area under the curve over that range.

We can think of a PDF as a bar plot of probabilities whose bars get narrower and narrower until each bar is indistinguishable from its neighbor.

It follows that we cannot calculate the expected value and variance by summing over individual values as we did with PMFs. Instead, we have to integrate over the entire curve.
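For a continuous random variable with density $f(x)$, the sums become integrals: $E[X] = \int x\,f(x)\,dx$ and $\operatorname{Var}(X) = \int (x-\mu)^2 f(x)\,dx$, while the total area $\int f(x)\,dx$ equals 1. The Python sketch below checks this numerically for a normal distribution with hypothetical parameters $\mu = 2$, $\sigma = 3$, using a plain trapezoidal rule over $\mu \pm 10\sigma$ (the probability outside that range is negligible) rather than any particular library's integrator.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

mu, sigma = 2.0, 3.0
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # tails beyond this are negligible

def pdf(x):
    return normal_pdf(x, mu, sigma)

area = integrate(pdf, lo, hi)                                # should be ~1
mean = integrate(lambda x: x * pdf(x), lo, hi)               # E[X], ~mu
var = integrate(lambda x: (x - mean) ** 2 * pdf(x), lo, hi)  # Var(X), ~sigma^2

print(round(area, 6), round(mean, 6), round(var, 6))
```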

A normal distribution with zero mean:

![continuous normal](../images/normal_pdf.png)

### Probability Mass Function (Discrete Distribution)

The probability mass function (PMF) for a random variable gives, at any value *k*, the probability that the random variable takes the value *k*. Suppose, for example, that I have a jar full of lottery balls containing:

- 50 "1"s,
- 25 "2"s,
- 15 "3"s,
- 10 "4"s

We then represent this function in a bar plot like so:

![discrete_distribution](../images/discrete_distribution.png)

**Note:** The main visual tool used to describe a PMF is a bar plot, which is often confused with a histogram. Since a discrete variable can take only specific values (we can have 2 eggs, 3 eggs, 4 eggs, ..., but not 2.567 eggs), a bar plot shows one separate bar for each possible value, with height equal to that value's probability (or frequency count).
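The PMF for the jar can be computed directly from the ball counts: each value's probability is its count divided by the 100 balls in total. A minimal Python sketch (the variable names are illustrative):

```python
from fractions import Fraction

# Contents of the jar: ball value -> number of balls with that value.
jar = {1: 50, 2: 25, 3: 15, 4: 10}
n_balls = sum(jar.values())  # 100

# PMF: probability of drawing each value = its count / total number of balls.
pmf = {k: Fraction(count, n_balls) for k, count in jar.items()}

for k, p in pmf.items():
    print(k, p, float(p))
```

Passing the values and probabilities to a bar-plot function such as `matplotlib.pyplot.bar` would produce a chart like the one above.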

### Cumulative Distribution Function (Continuous & Discrete Distributions)

The cumulative distribution function (CDF) gives the probability that a random variable takes a value less than or equal to a given value. It applies to both discrete and continuous random variables.

For the scenario above, the CDF would describe the probability of drawing a ball equal to or below a certain number.

In order to create the CDF from a sample, we:

- sort the values from least to greatest
- for each value, count the number of values that are less than or equal to the current value
- divide that count by the total number of values
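These three steps can be sketched in Python as follows. Sorting once lets `bisect.bisect_right` count the values less than or equal to any query point in logarithmic time; the list of draws is hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(data):
    """Return a function F(x) = (number of values <= x) / len(data)."""
    values = sorted(data)                # step 1: order values from least to greatest
    n = len(values)

    def F(x):
        count = bisect_right(values, x)  # step 2: count values <= x
        return count / n                 # step 3: divide by the total
    return F

# Hypothetical draws from the Lotto jar.
draws = [1, 2, 1, 4, 3, 1, 2, 1, 1, 2]
F = empirical_cdf(draws)
for x in (0, 1, 2, 3, 4):
    print(x, F(x))
```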

The CDF of the Lotto example shows how likely we are to draw a ball less than or equal to a given value.

![discrete cdf](../images/discrete_cdf.png)

Note: The CDF of a discrete random variable is a step function.
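The step shape comes from accumulating the PMF: the CDF jumps by $p(x_i)$ at each possible value $x_i$ and stays flat in between. A minimal Python sketch for the Lotto jar (probabilities 0.50, 0.25, 0.15 and 0.10):

```python
# Lotto PMF: possible values and their probabilities (50, 25, 15, 10 balls out of 100).
values = [1, 2, 3, 4]
probs = [0.50, 0.25, 0.15, 0.10]

def F(x):
    """CDF of the Lotto draw: sum of the probabilities of all values <= x."""
    return sum(p for v, p in zip(values, probs) if v <= x)

print([round(F(v), 2) for v in values])  # [0.5, 0.75, 0.9, 1.0]
print(F(2), F(2.5), F(2.99))  # flat between the possible values
```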

For continuous random variables, the probability of observing any single exact value is zero, so be careful when interpreting a PDF: its height is a density, not a probability.

We can use the CDF to learn the probability that a variable will be less than or equal to a given value.

Typically, you'll see something like this equation associated with the CDF:

$$F(x) = P(X\leq x)$$
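For a normal distribution the CDF has no elementary closed form, but it can be written with the error function: $F(x) = \tfrac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$. Python's standard library provides `math.erf`, so this sketch needs no extra packages. It also uses the identity $P(a < X \leq b) = F(b) - F(a)$. The height parameters below are hypothetical, not taken from the figure.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for a normal distribution with mean mu and std sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 170.0, 10.0  # hypothetical adult heights in cm

print(f"P(X <= 170)       = {normal_cdf(170, mu, sigma):.4f}")  # at the mean: 0.5
print(f"P(X <= 185)       = {normal_cdf(185, mu, sigma):.4f}")
# The probability of falling in a range is a difference of two CDF values.
p_range = normal_cdf(180, mu, sigma) - normal_cdf(160, mu, sigma)
print(f"P(160 < X <= 180) = {p_range:.4f}")  # within one sigma: about 0.6827
```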

Example: height of people in the US:

![continuous cdf](../images/continuous_cdf.png)

Note: The CDF of a continuous random variable is a continuous curve rather than a step function.

## Measures of Distribution

### Measure of Skewness

Probability distributions can have skew, meaning they have more mass further from the mean on one side of the distribution than on the other. A perfectly symmetric distribution has a skew of zero.

![skew](../images/skew.png)
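A common way to quantify skew is the moment coefficient of skewness, $E[(X-\mu)^3]/\sigma^3$. The NumPy sketch below estimates it from simulated data (the seed and sample sizes are arbitrary): a normal sample should give a value near 0, and an exponential sample, which has a long right tail, a value near its theoretical skewness of 2. SciPy offers a ready-made `scipy.stats.skew`, but the formula is short enough to write out.

```python
import numpy as np

def skewness(x):
    """Moment coefficient of skewness: mean((x - mean)^3) / std^3, std with ddof=0."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(seed=3)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200_000)
exp_sample = rng.exponential(scale=1.0, size=200_000)

print(f"normal:      {skewness(normal_sample):.3f}")  # close to 0 (symmetric)
print(f"exponential: {skewness(exp_sample):.3f}")     # close to 2 (right-skewed)
```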

### Measure of Kurtosis

Kurtosis (Ku) is traditionally described as a measure of the relative peakedness of a distribution; it is a shape parameter that is more precisely understood as describing how heavy the distribution's tails are. A distribution is said to be leptokurtic when its kurtosis is greater than 3 (the kurtosis of a normal distribution), mesokurtic when it is equal to 3, and platykurtic when it is less than 3. The degree of kurtosis, Ku, is given by:

![kurtosis](../images/kurtosis.png)
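The kurtosis of a random variable is commonly defined as $E[(X-\mu)^4]/\sigma^4$, which equals 3 for any normal distribution; some libraries (for example `scipy.stats.kurtosis` with its default `fisher=True`) report the *excess* kurtosis, that value minus 3. The NumPy sketch below estimates the non-excess version from simulated samples (the seed and sizes are arbitrary): a uniform sample should come out platykurtic (about 1.8), a normal sample mesokurtic (about 3), and a Laplace sample leptokurtic (about 6).

```python
import numpy as np

def kurtosis(x):
    """Kurtosis as mean((x - mean)^4) / std^4 (std with ddof=0); normal gives about 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2

rng = np.random.default_rng(seed=4)
size = 200_000
samples = {
    "uniform (platykurtic)": rng.uniform(-1.0, 1.0, size),
    "normal (mesokurtic)": rng.normal(0.0, 1.0, size),
    "laplace (leptokurtic)": rng.laplace(0.0, 1.0, size),
}

for name, x in samples.items():
    print(f"{name:<22} Ku = {kurtosis(x):.2f}")
```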

### Data Transformations

We may want to transform our skewed data to make it approach symmetry.

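Common transformations include the following. Root and logarithmic transformations compress large values and so reduce right (positive) skew; power transformations with $n > 1$ stretch large values and can reduce left (negative) skew.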

- Root Transformation: $x \rightarrow \sqrt[n]{x}$

- Logarithmic Transformation: $x \rightarrow \log_n x$

- Power Transformation: $x \rightarrow x^n$
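As an illustration, the NumPy sketch below applies square-root and logarithmic transformations to right-skewed, strictly positive data and reports the skewness before and after. It assumes the data follow a lognormal distribution (a hypothetical choice, made because the logarithm of lognormal data is exactly normal), so the log-transformed skewness should be roughly 0. Keep in mind that root and log transformations require non-negative and positive inputs, respectively.

```python
import numpy as np

def skewness(x):
    """Moment coefficient of skewness: mean((x - mean)^3) / std^3, std with ddof=0."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(seed=5)
data = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # positive, right-skewed

transformed = {
    "original": data,
    "square root": np.sqrt(data),
    "log (natural)": np.log(data),
}
for name, x in transformed.items():
    print(f"{name:<14} skewness = {skewness(x):.3f}")
```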

