A statistical distribution is a representation of the frequencies of potential events or the percentage of time each event occurs.

![image.png](attachment:image.png)

**Discrete distribution**: The outcomes form a countable set of distinct values (often a finite set, as with the faces of a die). When dealing with discrete data, you use a Probability Mass Function (PMF), as in our dice example.

EXAMPLES OF DISCRETE DISTRIBUTIONS:

**The Bernoulli Distribution**: The Bernoulli distribution represents the probability of success in a single experiment with exactly two possible outcomes ("success" or "failure"). A coin toss is a classic example of a Bernoulli experiment with a probability of success of 0.5 (50%), but a Bernoulli experiment can have any probability of success between 0 and 1.

**The Poisson Distribution**: The Poisson distribution represents the probability of observing $n$ events in a given time period when events occur independently and at a constant average rate. A typical example is pieces of mail: if the average rate at which you receive mail is constant, the number of items received on a single day (or in a single month) follows a Poisson distribution. Other examples include visitors to a website, customers arriving at a store, or clients arriving in a queue to be served.

**The Uniform Distribution**: The uniform distribution occurs when all possible outcomes are equally likely. The dice example shown before follows a discrete uniform distribution, with equal probabilities for throwing each value from 1 to 6. Continuous uniform distributions exist as well.
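
As a minimal sketch, the three PMFs described above can be written with only the Python standard library. The function names (`bernoulli_pmf`, `poisson_pmf`, `discrete_uniform_pmf`) and the example parameters (such as a mailbox averaging 3 letters per day) are assumptions made for illustration; they are not part of the original lesson.

```python
import math

def bernoulli_pmf(k, p):
    """P(X = k) for a single trial with success probability p (k is 0 or 1)."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0.0

def poisson_pmf(k, lam):
    """P(X = k) events in an interval when the average rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def discrete_uniform_pmf(k, low, high):
    """P(X = k) when every integer from low to high is equally likely."""
    return 1 / (high - low + 1) if low <= k <= high else 0.0

# Fair coin: both outcomes have probability 0.5
coin = [bernoulli_pmf(k, 0.5) for k in (0, 1)]

# Hypothetical mailbox averaging 3 letters per day: P(exactly 0..10 letters)
mail = [poisson_pmf(k, 3) for k in range(11)]

# Fair six-sided die: each face has probability 1/6
die = [discrete_uniform_pmf(k, 1, 6) for k in range(1, 7)]

print(coin)
print([round(p, 3) for p in mail])
print([round(p, 3) for p in die])
```

Note that the Poisson list covers only 0 to 10 events, so its probabilities sum to slightly less than 1; the remaining mass belongs to larger counts.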

**A probability mass function (PMF)**, sometimes referred to as a frequency function, is a function that assigns a probability to each possible value of a discrete random variable. You already learned about this in the context of coin flips and dice rolls. The "discrete" part in discrete distributions means that the possible outcomes can be counted, that is, listed one by one.

PMF Intuition
Let's work through a brief example calculating the probability mass function for a discrete random variable!

You have previously seen that a probability is a number in the range [0,1] that is calculated as the frequency expressed as a fraction of the sample size. This means that, in order to convert any random variable's frequency into a probability, we need to perform the following steps:

1. Get the frequency of every possible value in the dataset.
2. Divide the frequency of each value by the total number of values (the length of the dataset).
3. Read off the resulting probability for each value.
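
The steps above can be sketched as follows. The `rolls` dataset is a made-up sample of ten die rolls, assumed purely for illustration, and `roll_pmf` is a hypothetical name chosen for this example.

```python
from collections import Counter

# Hypothetical dataset: 10 recorded rolls of a die (assumed for illustration)
rolls = [1, 3, 3, 6, 2, 3, 6, 1, 5, 3]

# Step 1: frequency of every value that occurs in the dataset
counts = Counter(rolls)

# Step 2: divide each frequency by the total number of observations
n = len(rolls)
roll_pmf = {value: count / n for value, count in sorted(counts.items())}

# Step 3: each value now maps to its probability
print(roll_pmf)

# Sanity check: the probabilities sum to 1 (up to floating-point error)
print(sum(roll_pmf.values()))
```

A value that never appears in the data (here, 4) simply gets no entry, which corresponds to an estimated probability of 0.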

You can inspect the probability mass function of a discrete variable by visualizing the distribution using matplotlib.

NOTE: In some literature, the PMF is also called the probability distribution. The phrase "distribution function" is usually reserved exclusively for the cumulative distribution function (CDF).

***To double-check that your PMF was computed correctly:*** sum its values with `np.array(pmf).sum()` (after `import numpy as np`). The result must equal 1, up to rounding error.

# Example: calculating a PMF
import numpy as np

# Assumption: size_and_count was defined in an earlier cell as a dict
# mapping each class size to the number of classes of that size.
sum_class = sum(size_and_count.values())
print(sum_class)

# Divide each count by the total number of classes
pmf = [round(value / sum_class, 3) for value in size_and_count.values()]

sizes = list(size_and_count.keys())
print(sizes, pmf)

# The output should be 1 (up to the rounding applied above)
print(np.array(pmf).sum())

# Calculate the expected value (mu) as the sum of each size times its probability
mu = np.multiply(sizes, pmf).sum()

# Plot PMFs side by side
# Assumption: pmf2 is a second PMF over the same class sizes, defined in an earlier cell.
import matplotlib.pyplot as plt

new_figure = plt.figure(figsize=(14, 5.5))

ax = new_figure.add_subplot(121)
ax2 = new_figure.add_subplot(122)

ax.bar(sizes, pmf, color="black")
ax2.bar(sizes, pmf2, color="yellow")

ax.set_title("Probability Mass Function - Actual")
ax2.set_title("Probability Mass Function - Observed")

plt.show()

**Continuous distribution**: The outcome can take any value within a range. When dealing with continuous data, you use a Probability Density Function (PDF), as in our weather example.

EXAMPLES OF CONTINUOUS DISTRIBUTIONS:

**The Normal or Gaussian Distribution**: The normal distribution is arguably the single most important distribution, and you'll come across it very often. It follows a bell shape and is a foundational distribution for many models and theories in statistics and data science. A normal distribution often turns up when dealing with real-world data, including people's heights and weights, measurement errors, or grades on a test. Our temperature example above follows a normal distribution as well!

**Probability Density Function**: A Probability Density Function (PDF) helps identify the regions of the distribution where observations are more likely to occur, in other words, where observations are denser. The probability of landing in an interval is the area under the PDF over that interval.
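
To make the difference between a density and a probability concrete, here is a minimal sketch using the normal PDF formula. The helper names (`normal_pdf`, `area_under_pdf`) and the parameters (mean 0, standard deviation 0.1) are assumptions chosen for illustration; the area is approximated numerically with the midpoint rule.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(a, b, mu, sigma, steps=10_000):
    """Approximate P(a <= X <= b) with the midpoint rule."""
    width = (b - a) / steps
    total = sum(normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return total * width

# A narrow distribution: the density at the mean is greater than 1,
# which is fine because a density is not itself a probability.
peak = normal_pdf(0.0, mu=0.0, sigma=0.1)

# Probability of falling within one standard deviation of the mean (about 0.68)
within_one_sd = area_under_pdf(-0.1, 0.1, mu=0.0, sigma=0.1)

print(round(peak, 3), round(within_one_sd, 3))
```

The density value at the peak exceeds 1, while the area over any interval always stays between 0 and 1.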

Density Estimation and Plotting
As you've seen before, a density plot is a "smoothed" version of a histogram estimated from the observations. To estimate a density function from given continuous data, you can use parametric or non-parametric methods.

Parametric methods use parameters like mean and standard deviation of given data and attempt to work out the shape of the distribution that the data belongs to. These may implement maximum likelihood methods to fit a distribution to the given data. You'll learn more about this later.
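
As a rough sketch of the parametric approach, the snippet below fits a normal distribution to data. For the normal distribution, the maximum likelihood estimates are simply the sample mean and the standard deviation computed with `ddof=0`. The data here is simulated with assumed parameters (mean 70, standard deviation 5), and `fitted_pdf` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical sample: 1,000 draws from a normal distribution (assumed parameters)
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=70.0, scale=5.0, size=1_000)

# Maximum likelihood estimates for a normal distribution:
# the sample mean and the standard deviation with ddof=0.
mu_hat = data.mean()
sigma_hat = data.std(ddof=0)

def fitted_pdf(x):
    """Normal density evaluated with the estimated parameters."""
    z = (x - mu_hat) / sigma_hat
    return np.exp(-0.5 * z**2) / (sigma_hat * np.sqrt(2 * np.pi))

print(mu_hat, sigma_hat)
```

The estimates should land close to the parameters used to simulate the data, and `fitted_pdf` can then be plotted over a histogram of `data` to check the fit visually.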

Kernel density estimation (KDE) is a common non-parametric estimation technique that places a curve (the kernel) at every individual data point. These curves are then summed and normalized to produce a smooth density estimate. The kernel most often used is a Gaussian (which produces a bell curve at each data point). Other kernels can be used as well; using a Gaussian kernel does not assume that the underlying distribution is normal.
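
The idea can be sketched by hand with NumPy: place one Gaussian bump on each observation, add the bumps up, and normalize. The function name `kde_gaussian`, the observations, and the bandwidth of 0.5 are all assumptions made for illustration; in practice, libraries choose the bandwidth with a rule of thumb or let you set it.

```python
import numpy as np

def kde_gaussian(data, grid, bandwidth):
    """Estimate the density on `grid` from one Gaussian bump per data point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance from every grid point to every data point:
    # shape (len(grid), len(data))
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Sum the kernels, then normalize so the estimate integrates to 1
    return kernels.sum(axis=1) / (len(data) * bandwidth)

# Hypothetical observations and bandwidth, chosen only for illustration
observations = [2.1, 2.5, 3.0, 3.2, 4.8, 5.1]
grid = np.linspace(0, 8, 801)
density = kde_gaussian(observations, grid, bandwidth=0.5)

# The area under the estimated curve should be close to 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth makes the estimate follow individual points more closely, while a larger one smooths them together.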

![image.png](attachment:image.png)


## How does a Cumulative Distribution Function (CDF) work?
**Cumulative distribution function**: "cumulative" means that you are simply adding up probabilities.

The CDF is a function of $x$ just like a PMF or a PDF, where $x$ is any value that can possibly appear in a given distribution. To calculate the $CDF(x)$ for any value of $x$, we compute the proportion of values in the distribution less than or equal to $x$ as follows:

$$\large F(x) = P(X \leq x)$$

> The Cumulative Distribution Function, CDF, gives the probability that the variable $X$ is less than or equal to a certain possible value $x$. 
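
For a discrete variable, "adding up probabilities" is literally a running total of the PMF. Here is a minimal sketch for a fair six-sided die; the variable names `die_pmf` and `die_cdf` are chosen for this example.

```python
import numpy as np

# PMF of a fair six-sided die
faces = np.arange(1, 7)
die_pmf = np.full(6, 1 / 6)

# The CDF at each face is the running total of the PMF up to that face
die_cdf = np.cumsum(die_pmf)

for face, value in zip(faces, die_cdf):
    print(f"F({face}) = P(X <= {face}) = {value:.3f}")
```

The last value is 1, because a roll is always less than or equal to 6.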

The cumulative distribution functions for a dice roll and the weather in NYC are plotted below.


![image.png](attachment:image.png)

You'll notice that, in general, CDFs are smooth curves for continuous random variables, whereas they are "step functions" for discrete random variables. Looking at these curves, we can answer questions such as "what is the probability that $X$ is at most $x$?" by reading the value off the y-axis.
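
As a sketch of how CDF values answer such questions, the snippet below uses `statistics.NormalDist` from the Python standard library. The temperature model (mean 55, standard deviation 10) is a hypothetical stand-in, not the actual NYC data from the plot.

```python
from statistics import NormalDist

# Hypothetical temperature model: mean 55, standard deviation 10 (assumed values)
temperature = NormalDist(mu=55, sigma=10)

# P(X <= 65): read the CDF directly
p_at_most_65 = temperature.cdf(65)

# P(X > 65): the complement of the CDF
p_above_65 = 1 - temperature.cdf(65)

# P(45 < X <= 65): the difference of two CDF values
p_between = temperature.cdf(65) - temperature.cdf(45)

print(round(p_at_most_65, 3), round(p_above_65, 3), round(p_between, 3))
```

In general, $P(a < X \leq b) = F(b) - F(a)$, so any interval probability can be obtained from two points on the CDF curve.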
