# Probability distributions

Probability distributions describe how the values of a random variable are distributed and let you calculate the probability of specific outcomes or ranges of outcomes.

## Table of Contents

- [Random Variables and Probability Distribution](#rvars)
    - [Discrete Probability Distributions](#dpd)
    - [Continuous Probability Distributions](#cpd)
- [Resources](#res)

<img src="images/stat-dpd.png" alt="" style="width: 400px;"/>

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
```

---
<a id='rvars'></a>

## Random Variables

In statistics, **random variables** are characteristics that you can observe but don’t control. A random variable can be a `characteristic, measurement, or count that varies randomly according to a function`. **Random** in this context indicates that you don’t know the value of the next observation, but you do know the probability associated with values and ranges of values.

## Probability Distribution

A **probability distribution** is a mathematical function that describes the probabilities for all possible outcomes of a **random variable**. In other words, the frequency of the observed values varies based on the underlying **probability distribution**.

The properties of **distributions in histograms** and **probability distributions** are similar. Both have a `shape`, `center`, and `spread`. However, the focus for probability distributions is on the `probabilities of the outcomes`. Importantly, **probability distributions** describe populations while **histograms** represent samples.

**Probability distributions** indicate the likelihood of an event or outcome. Statisticians use the following notation to describe probabilities: `p(x) =` the likelihood that the random variable takes a specific value of x. The sum of all probabilities for all possible values must equal `1`. Furthermore, the probability for a particular value or range of values must be between `0` and `1`, inclusive.
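
The two rules above can be checked numerically. The following is a minimal sketch that assumes a fair six-sided die as the random variable; the names `pmf`, `all_valid`, and `total` are illustrative and not part of the original notes.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Rule 1: every probability lies between 0 and 1, inclusive.
all_valid = all(0 <= p <= 1 for p in pmf.values())

# Rule 2: the probabilities over all possible values sum to 1.
total = sum(pmf.values())

print(all_valid, total)
```

Exact fractions are used here so that the sum is not affected by floating-point rounding.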

**Probability distributions** describe the dispersion of the values for a random variable. Consequently, `the kind of variable determines the type of probability distribution`. For a single random variable, statisticians divide distributions into the following two types:

- **Discrete probability distributions** for discrete variables
- **Probability density functions** for continuous variables

---
<a id='dpd'></a>

## Discrete Probability Distributions

**Discrete probability functions**, also known as **probability mass functions**, describe variables that can assume only a set of distinct values.

For **discrete probability distribution functions**, each possible value has a non-zero likelihood. Furthermore, the probabilities for all possible values must sum to one. Because the total probability is 1, one of the values must occur for each opportunity.

If the **discrete distribution** has a finite number of values, you can display all the values with their corresponding probabilities in a table.

<img src="images/stat-dpd.png" alt="" style="width: 300px;"/>

### Types of Discrete Distribution

There are a variety of **discrete probability distributions** that you can use to model different types of data. The correct discrete distribution depends on the properties of your data. For example, use the:

- **Binomial distribution** to model binary data, such as coin tosses.
- **Poisson distribution** to model count data, such as the count of library book checkouts per hour.
- **Uniform distribution** to model multiple events with the same probability, such as rolling a die.

#### Binomial and Other Distributions for Binary Data

Binary data occur when you can place an observation into only two categories. They tell you that an event occurred or that an item has a particular characteristic, for instance, sale or no sale, or pass or fail.

Binary data allow you to `calculate proportions and percentages` easily. What is the proportion of items that pass the inspection? What percentage of customers make a purchase?

To use the `binomial`, `geometric`, `negative binomial`, and the `hypergeometric` distributions, you need to satisfy the following assumptions:

1. **There are only two possible outcomes per trial**. For example, accept or reject, sale or no sale, etc.
2. **Each trial is independent** (except for hypergeometric). The result of one trial does not affect the results of another trial. For instance, when flipping a coin, the outcome of a coin toss doesn’t influence the next coin toss.
3. **The probability remains constant over time** (except for hypergeometric). In some cases, this assumption is valid based on the physical properties, such as flipping a coin. However, if there is a chance the probability can change over time, you can use the **P chart** (a control chart) to confirm this assumption. For example, the likelihood that a process produces defective products might change over time.

The **binomial**, **geometric**, **negative binomial**, and **hypergeometric** distributions describe the probabilities associated with the number of events and when they occur.

#### Binomial Distribution

Use the **binomial distribution** to calculate probabilities that an event occurs a certain number of times in a set number of trials. Specifically, it calculates the probability of X events happening within N trials.

<img src="images/stat-dpd2.png" alt="" style="width: 400px;"/>

The graph displays the probability of each possible number of 6s when you roll a die ten times. The shaded area sums the probabilities for four events and higher to calculate this **cumulative probability**. The **cumulative probability** of rolling at least four 6s is approximately 0.0697.
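
The cumulative probability for the die example can be reproduced directly from the binomial formula. This is a sketch using only the standard library; the helper name `binomial_pmf` is an assumption made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_rolls, p_six = 10, 1 / 6

# P(X >= 4): sum the probabilities for four or more 6s in ten rolls.
p_at_least_four = sum(
    binomial_pmf(k, n_rolls, p_six) for k in range(4, n_rolls + 1)
)
print(round(p_at_least_four, 4))
```

The result should land close to 0.07, in line with the shaded area in the graph.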

#### Geometric Distribution

Use the **geometric distribution** when you know the probability of an event occurring and want to calculate the likelihood of the event first occurring during a specific trial. In other words, if you keep drawing random samples, what is the probability of the event/characteristic first appearing on each draw?

<img src="images/stat-dpd3.png" alt="" style="width: 400px;"/>

Each bar in the graph represents the probability of rolling the first 6 on a specific trial. For instance, the likelihood of rolling the first 6 on the third roll specifically is about 0.116. The red shaded region indicates that you have a 33% cumulative chance of rolling the first 6 on the 7th roll or later.
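
Both numbers in the die example follow from the geometric formula: the first event on trial n requires n - 1 misses followed by one hit. A minimal sketch, with `geometric_pmf` as an illustrative helper name:

```python
p_six = 1 / 6

def geometric_pmf(trial, p):
    """Probability that the first event occurs on the given trial (1-based)."""
    return (1 - p) ** (trial - 1) * p

# First 6 on exactly the third roll: two misses, then a hit.
p_third = geometric_pmf(3, p_six)

# First 6 on the 7th roll or later: the first six rolls all miss.
p_seventh_or_later = (1 - p_six) ** 6

print(round(p_third, 4), round(p_seventh_or_later, 4))
```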

#### Negative Binomial Distribution

Use the **negative binomial distribution** to calculate the number of trials that are required to observe the event a specific number of times. In other words, given a known probability of an event occurring and the number of events that you specify, this distribution calculates the probability for observing that number of events within N trials.

<img src="images/stat-dpd4.png" alt="" style="width: 400px;"/>

In the plot, each bar represents the probability of rolling precisely five 6s in the specified number of rolls. For example, the maximum likelihood (0.04) of rolling exactly five 6s occurs at 24 rolls, which is the peak of the histogram. Additionally, the shaded area indicates that the cumulative probability of obtaining five 6s in the first 27 rolls is nearly 0.5.
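
The bar heights in the plot can be sketched from the negative binomial formula, read here as the probability that the fifth 6 arrives on exactly roll n. The function name `negative_binomial_pmf` and this trial-count parameterization are assumptions made for illustration.

```python
from math import comb

def negative_binomial_pmf(n, r, p):
    """Probability that the r-th event occurs on exactly trial n."""
    if n < r:
        return 0.0
    # The first n - 1 trials contain r - 1 events; trial n is the r-th event.
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r_sixes, p_six = 5, 1 / 6

# Probability that the fifth 6 appears on exactly the 24th roll.
p_at_24 = negative_binomial_pmf(24, r_sixes, p_six)

# Cumulative probability that the fifth 6 appears within the first 27 rolls.
p_within_27 = sum(
    negative_binomial_pmf(n, r_sixes, p_six) for n in range(r_sixes, 28)
)
print(round(p_at_24, 3), round(p_within_27, 3))
```

The first value should round to about 0.04 and the second should come out a little under 0.5, matching the description of the plot.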

#### Hypergeometric Distribution

Use the hypergeometric distribution when you are drawing from a small population without replacement, and you want to calculate probabilities that an event occurs a certain number of times in a set number of trials. Like the binomial distribution, the hypergeometric distribution calculates the probability of X events in N trials. However, unlike the binomial distribution, it does not assume that the likelihood of an event’s occurrence is constant. Instead, the hypergeometric distribution assumes that the probability changes because you are drawing from a small population without replacement.

We’ll draw candy blindly from a jar. Suppose there are 15 candies of various colors in the jar and our favorite candies are red. For this scenario, the binary data values are “red” and “not red.” At the start, 5 out of the 15 (33%) candies are red. We’ll use the **hypergeometric distribution** to calculate the probabilities of drawing red candies when we draw five candies from the jar. The `probabilities in this scenario are not constant` because each draw from the jar affects the probabilities for the next draw.

<img src="images/stat-dpd5.png" alt="" style="width: 400px;"/>

The graph displays the probability of drawing each possible number of red candies when you draw 5 candies altogether.
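
The candy-jar probabilities can be computed by counting combinations: ways to pick the red candies, times ways to pick the non-red ones, divided by all ways to draw five. A minimal sketch, with `hypergeometric_pmf` as an illustrative helper name:

```python
from math import comb

def hypergeometric_pmf(k, population, successes, draws):
    """Probability of k successes when drawing without replacement."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )

candies, red, drawn = 15, 5, 5

# Probability of each possible number of red candies in five draws.
red_probs = {
    k: hypergeometric_pmf(k, candies, red, drawn) for k in range(drawn + 1)
}
for k, prob in red_probs.items():
    print(f"{k} red: {prob:.4f}")
```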

---
<a id='cpd'></a>

## Continuous Probability Distributions

**Continuous probability functions** are also known as **probability density functions**. You know that you have a continuous distribution if the variable can assume an infinite number of values between any two values. **Continuous variables** are often measurements on a scale, such as height, weight, and temperature.

`Unlike discrete probability distributions where each particular value has a non-zero likelihood, specific values in continuous distributions have a zero probability`. For example, the likelihood of measuring a temperature that is exactly 32 degrees is zero.

Why? Consider that the temperature can be an infinite number of other temperatures that are infinitesimally higher or lower than 32. Statisticians say that an individual value has an infinitesimally small probability that is equivalent to zero.

### How to Find Probabilities for Continuous Data

`Probabilities for continuous distributions are calculated for ranges of values rather than single points`. A probability indicates the likelihood that a value will fall within an interval. This property is simple to demonstrate using a **probability distribution plot**.

`On a probability plot, the entire area under the distribution curve equals 1. This fact is equivalent to how the sum of all probabilities must equal one for discrete distributions`. The proportion of the area under a curve that falls within a range of values along the X-axis represents the likelihood a value will fall within that range. Finally, you can’t have an area under the curve with only a single value, which explains why the probability equals zero for an individual value.
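
The area interpretation can be illustrated with a simple numerical integration. This sketch assumes a standard normal density as the example curve and uses a midpoint rule; the function `area_under_pdf` is hypothetical and only meant to make the idea concrete.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)  # assumed example distribution

def area_under_pdf(dist, low, high, steps=10_000):
    """Approximate the area under the density between low and high (midpoint rule)."""
    width = (high - low) / steps
    return sum(dist.pdf(low + (i + 0.5) * width) for i in range(steps)) * width

total_area = area_under_pdf(dist, -10, 10)   # essentially the whole curve
interval_area = area_under_pdf(dist, -1, 1)  # probability of falling in [-1, 1]
point_area = area_under_pdf(dist, 0.5, 0.5)  # zero-width interval

print(round(total_area, 4), round(interval_area, 4), point_area)
```

The total area comes out at approximately 1, the interval holds a share of that area, and the zero-width interval has no area at all.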

### Characteristics of Continuous Probability Distributions

Just as there are different types of discrete distributions for different kinds of discrete data, there are different distributions for continuous data. Each probability distribution has parameters that define its shape. Most distributions have between one and three parameters. Specifying these parameters establishes the shape of the distribution and all of its probabilities entirely. These parameters represent essential properties of the distribution, such as the central tendency and the variability.

The most well-known continuous distribution is the **normal distribution**, which is also known as **the Gaussian distribution** or the **“bell curve.”** This symmetric distribution fits a wide variety of phenomena, such as human height and IQ scores. It has two parameters: the mean and the standard deviation.

The **Weibull distribution** and the **lognormal distribution** are other common continuous distributions. Both of these distributions can fit skewed data.

**Distribution parameters** are values that apply to entire populations. Unfortunately, **population parameters** are generally unknown because it’s usually impossible to measure an entire population. However, `you can use random samples to calculate estimates of these parameters and use them with probability distributions`.

`To determine which distribution provides the best fit for your sample data, you’ll need to perform hypothesis tests and use special graphs`.

#### Normal Probability Distribution

The **normal distribution** is the most important probability distribution in statistics because it fits many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores approximately follow the **normal distribution**.

The **normal distribution** is a probability function that describes how the values of a variable are distributed. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely.

<img src="images/stat-cpd3.png" alt="" style="width: 400px;"/>

As you can see, the distribution of heights follows the typical pattern for all normal distributions. Most girls are close to the **average** height (1.512 meters). Small differences between an individual’s height and the mean occur more frequently than substantial deviations from the mean. The **standard deviation** is 0.0741 m, which indicates the typical distance that individual girls tend to fall from the mean height.

As with any **probability distribution**, the **parameters** for the **normal distribution** define its shape and probabilities entirely. The **normal distribution** has two parameters, the **mean** and **standard deviation**.

The **mean** is the `central tendency of the distribution`. It defines the location of the peak for normal distributions. Most values cluster around the **mean**. 

<img src="images/stat-cpd4.png" alt="" style="width: 400px;"/>

The **standard deviation** is a `measure of variability`. It defines the width of the normal distribution. The **standard deviation** determines how far away from the mean the values tend to fall. It represents `the typical distance between the observations and the average`.

<img src="images/stat-cpd5.png" alt="" style="width: 400px;"/>

#### Population parameters versus sample estimates

The **mean** and **standard deviation** are **parameter values** that apply to entire populations. For the normal distribution, statisticians signify the parameters by using the Greek symbols `μ` (mu) for the population mean and `σ` (sigma) for the population standard deviation.

Unfortunately, **population parameters** are usually unknown because it’s generally impossible to measure an entire population. However, `you can use random samples to calculate estimates of these parameters`. Statisticians represent sample estimates of these parameters using `x̅` for the **sample mean** and `s` for the **sample standard deviation**.
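
To make the distinction concrete, the sketch below simulates a random sample from a normal population and computes the sample estimates. The population parameters (100 and 15), the sample size, and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

mu, sigma = 100, 15  # assumed population parameters (normally unknown)

# Draw a random sample of 1,000 observations from the population.
sample = [random.gauss(mu, sigma) for _ in range(1_000)]

x_bar = mean(sample)  # sample mean, an estimate of mu
s = stdev(sample)     # sample standard deviation (n - 1 denominator), an estimate of sigma

print(round(x_bar, 1), round(s, 1))
```

The estimates should fall near the population values without matching them exactly, which is the usual situation when working from a sample.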

#### Properties of the Normal Distribution

Despite the different shapes, all forms of the normal distribution have the following characteristic properties.

- They’re all symmetric. The normal distribution **cannot model skewed distributions**.
- The mean, median, and mode are all equal.
- Half of the population is less than the mean and half is greater than the mean.
- The **Empirical Rule** allows you to determine the proportion of values that fall within certain distances from the mean.

#### The Empirical Rule

When you have normally distributed data, the **standard deviation** becomes particularly valuable. You can use it to determine the proportion of the values that fall within a specified number of standard deviations from the mean.

<img src="images/stat-cpd6.png" alt="" style="width: 400px;"/>

Assume that a pizza restaurant has normally distributed delivery times with a mean of 30 minutes and a standard deviation of 5 minutes. Using the Empirical Rule, we can determine that about 68% of the delivery times are between 25 and 35 minutes (30 ± 5), 95% are between 20 and 40 minutes (30 ± 2 × 5), and 99.7% are between 15 and 45 minutes (30 ± 3 × 5).
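
The Empirical Rule percentages can be checked against the normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library and assumes the delivery times are exactly normal; the dictionary name `shares` is illustrative.

```python
from statistics import NormalDist

delivery = NormalDist(mu=30, sigma=5)  # delivery times in minutes

shares = {}
for k in (1, 2, 3):
    low, high = 30 - k * 5, 30 + k * 5
    # Proportion of deliveries within k standard deviations of the mean.
    shares[k] = delivery.cdf(high) - delivery.cdf(low)
    print(f"{low}-{high} minutes: {shares[k]:.1%}")
```

The computed proportions are slightly more precise than the rounded 68%, 95%, and 99.7% figures quoted by the rule.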

<img src="images/stat-cpd7.png" alt="" style="width: 400px;"/>

#### Standard Normal Distribution and Standard Scores
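
The **standard normal distribution** is the normal distribution with a mean of 0 and a standard deviation of 1. A **standard score** (z-score) expresses a value as the number of standard deviations it lies from the mean, `z = (x - μ) / σ`. The sketch below is an illustration under assumed values (a score of 130 on a scale with mean 100 and standard deviation 15); `z_score` is a hypothetical helper.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Assumed example: a score of 130 on a scale with mean 100 and sd 15.
z = z_score(130, 100, 15)

# The standard normal distribution has mean 0 and standard deviation 1.
standard_normal = NormalDist()

# Looking up z in the standard normal gives the same probability as
# looking up the raw score in the original distribution.
p_below_z = standard_normal.cdf(z)
p_below_x = NormalDist(mu=100, sigma=15).cdf(130)

print(z, round(p_below_z, 4), round(p_below_x, 4))
```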



##### Example

The distribution of IQ scores is defined as a **normal distribution** with a mean of 100 and a standard deviation of 15. We’ll create the probability plot of this distribution. Additionally, let’s determine the likelihood that an IQ score will be between 120 and 140.

<img src="images/stat-cpd.png" alt="" style="width: 400px;"/>

It is a symmetric distribution where values occur most frequently around 100, which is the mean. The probabilities drop off equally as you move away from the mean in both directions. The shaded area for the range of IQ scores between 120 and 140 contains 8.738% of the total area under the curve. Therefore, the likelihood that an IQ score falls within this range is 0.08738.
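
The shaded area can be reproduced as the difference between two values of the cumulative distribution function. A minimal sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# P(120 <= IQ <= 140) is the area under the curve between the two scores.
p_120_140 = iq.cdf(140) - iq.cdf(120)

print(round(p_120_140, 5))
```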

#### Lognormal Probability Distribution

Suppose you are told that the body fat percentages for teenage girls follow a lognormal distribution with a location of 3.32317 and a scale of 0.24188. Furthermore, you’re asked to determine the probability that body fat percentage values will fall between 20% and 24%.

<img src="images/stat-cpd2.png" alt="" style="width: 400px;"/>

It is a right-skewed distribution, and the most common values fall near 26%. Furthermore, our range of interest falls below the curve’s peak and contains 18.64% of the occurrences.
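
These figures can be checked by working on the log scale. The sketch below assumes that "location" and "scale" are the mean and standard deviation of the natural logarithm of the variable, which is a common convention but is not stated in the example itself.

```python
from math import exp, log
from statistics import NormalDist

location, scale = 3.32317, 0.24188

# Assumption: if X is lognormal, ln(X) is normal with mean = location
# and standard deviation = scale.
log_body_fat = NormalDist(mu=location, sigma=scale)

# P(20 <= X <= 24) equals P(ln 20 <= ln X <= ln 24).
p_20_24 = log_body_fat.cdf(log(24)) - log_body_fat.cdf(log(20))

# The peak (mode) of a lognormal density is exp(location - scale**2).
mode = exp(location - scale**2)

print(round(p_20_24, 4), round(mode, 1))
```

Under that assumption, the probability comes out near 18.6% and the peak near 26%, which agrees with the values read from the plot.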


---
<a id='res'></a>

## Resources

- [Statistics by Jim](https://statisticsbyjim.com/)
- [onlinemathlearning.com](https://www.onlinemathlearning.com)