# <a id='toc1_'></a>[Commonly Used Distributions in the NumPy Library](#toc0_)

```python
import numpy as np
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
```


**Table of contents**<a id='toc0_'></a>    
- [Normal Distribution](#toc2_)    
- [Binomial Distribution](#toc3_)    
- [Poisson Distribution](#toc4_)    
- [Uniform Distribution](#toc5_)    
- [Logistic Distribution](#toc6_)    
- [Multinomial Distribution](#toc7_)    
- [Exponential Distribution](#toc8_)    
- [Chi Square Distribution](#toc9_)    
- [Rayleigh Distribution](#toc10_)    
- [Pareto Distribution](#toc11_)    
- [Zipf Distribution](#toc12_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc2_'></a>[Normal Distribution](#toc0_)

- **Use Case:** The normal distribution is often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. It is central to statistics, in part because of the central limit theorem (sums and averages of many independent random variables tend toward a normal distribution), and it approximates many natural phenomena such as heights, blood pressure, measurement errors, and IQ scores.

- **Example:** If the heights of a large group of individuals are measured, the values can be expected to be approximately normally distributed.

- **Formula:** The probability density function of a normal distribution is given by:

    $$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} } $$

    where:
    - $\mu$ is the mean of the distribution
    - $\sigma$ is the standard deviation of the distribution

- **Code:**
   


```python
from scipy.stats import norm

# Define the mean and standard deviation
mu = 0
sigma = 1

# Generate a range of x values
x = np.linspace(-10, 10, 1000)

# Evaluate the normal PDF over that range
y = norm.pdf(x, mu, sigma)

# Plot the distribution
plt.plot(x, y)
plt.title('Normal Distribution')
plt.show()
```


- **Assumptions/Constraints:**
    - The mean, mode and median are all equal.
    - The curve of the distribution is bell-shaped and symmetrical about the mean.
    - The total area under the curve is 1.
    - Approximately 68% of the data falls within one standard deviation of the mean, 95% falls within two standard deviations, and 99.7% falls within three standard deviations.
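
The 68-95-99.7 figures can be checked empirically. This is a minimal sketch that assumes NumPy's `Generator` API (`np.random.default_rng`) rather than the legacy `random` functions used elsewhere in this notebook; the seed and sample size are arbitrary, so the printed fractions should land close to 0.683, 0.954, and 0.997 rather than exactly on them.

```python
import numpy as np

# Seeded generator so the run is reproducible (seed and sample size are arbitrary)
rng = np.random.default_rng(seed=0)
mu, sigma = 0.0, 1.0
samples = rng.normal(loc=mu, scale=sigma, size=100_000)

# Fraction of samples within k standard deviations of the mean
fractions = {}
for k in (1, 2, 3):
    fractions[k] = np.mean(np.abs(samples - mu) <= k * sigma)
    print(f"within {k} sigma: {fractions[k]:.3f}")
```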


# <a id='toc3_'></a>[Binomial Distribution](#toc0_)
- **Use Case:** The binomial distribution gives the probability of a given number of successes in a fixed number of independent trials, each of which has only two possible outcomes (success or failure). For example, tossing a coin always gives heads or tails, and the binomial distribution gives the probability of getting exactly 3 heads in 10 tosses.
- **Example:** Flipping a coin a fixed number of times and counting the number of heads.
- **Formula:**
  - Probability Mass Function (PMF): 
    $$P(X=k) = C(n, k) p^k (1-p)^{n-k}$$
  - Where:
    - $n$ is the number of trials
    - $k$ is the number of successes
    - $p$ is the probability of success on each trial
    - $C(n, k) = \frac{n!}{k!(n-k)!}$ is the binomial coefficient
- **Code:**


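```python
# Ten draws, each the number of successes in n=10 trials with p=0.5
x = random.binomial(n=10, p=0.5, size=10)

# Histogram of 1000 draws (histplot replaces the deprecated sns.distplot)
sns.histplot(random.binomial(n=10, p=0.5, size=1000), discrete=True)
plt.show()
```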


- **Assumptions/Constraints:**
  - Each trial has only two possible outcomes: success or failure.
  - Each trial is identical.
  - The probability of success (and hence of failure) is the same for all trials.
  - Trials are independent: the outcome of one toss does not affect the outcome of the next.
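
As a worked check of the PMF, the probability of exactly 3 heads in 10 fair tosses is $C(10, 3) \cdot 0.5^{10} = 120/1024 \approx 0.117$. The sketch below evaluates the formula with Python's `math.comb` and compares it with a simulated frequency. `binomial_pmf` is a helper written for this illustration, not a library function, and the seed and sample size are arbitrary.

```python
from math import comb

import numpy as np


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) variable, straight from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


exact = binomial_pmf(3, 10, 0.5)

# Simulate many runs of 10 fair tosses and count how often exactly 3 heads occur
rng = np.random.default_rng(seed=1)
draws = rng.binomial(n=10, p=0.5, size=200_000)
empirical = np.mean(draws == 3)

print(f"exact P(X=3) = {exact:.4f}, simulated = {empirical:.4f}")
```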



# <a id='toc4_'></a>[Poisson Distribution](#toc0_)
- **Use Case:** The Poisson distribution is popular for modelling the number of times an event occurs in an interval of time or space. It can be used to model events such as the number of emails arriving in your mailbox in one hour or the number of customers arriving at a salon in one day.

- **Example:** The number of emails arriving in your mailbox in one hour.
- **Formula:**
  - Probability Mass Function (PMF): 
    $$P(X=k) = \frac{e^{-\mu} \mu^k}{k!}$$
  - Where:
    - $\mu$ is the average number of occurrences per interval (the rate; passed as `lam` in NumPy)
    - $k$ is the number of occurrences
- **Code:**


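```python
# 10,000 draws from a Poisson distribution with mean 0.6
mu = 0.6
s = np.random.poisson(mu, 10000)

# Compare a normal distribution with a Poisson distribution of large mean:
# for lam=50 the Poisson is close to a normal with mean 50 and sd sqrt(50) ≈ 7
sns.kdeplot(random.normal(loc=50, scale=7, size=1000), label='normal')
sns.kdeplot(random.poisson(lam=50, size=1000), label='poisson')
plt.legend()
plt.show()
```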



- **Assumptions/Constraints:**
  - Events occur independently: one occurrence does not influence the probability of another.
  - The average rate of occurrence is constant, so the probability of an event in an interval is proportional to the length of the interval.
  - The probability of an event in an interval approaches zero as the interval becomes smaller.
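
The PMF can be evaluated directly and compared with NumPy's sampler. A characteristic property worth checking is that a Poisson variable's mean and variance are both equal to $\mu$. In this sketch $\mu = 2$, the seed, and the sample size are illustrative choices, and `poisson_pmf` is a helper defined here, not a library function.

```python
import math

import numpy as np


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with mean mu, from the formula above."""
    return math.exp(-mu) * mu**k / math.factorial(k)


mu = 2.0
rng = np.random.default_rng(seed=2)
draws = rng.poisson(lam=mu, size=200_000)

# Formula vs. simulated frequency for the first few counts
for k in range(5):
    print(k, round(poisson_pmf(k, mu), 4), round(float(np.mean(draws == k)), 4))

# Mean and variance should both be close to mu
print(draws.mean(), draws.var())
```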



# <a id='toc5_'></a>[Uniform Distribution](#toc0_)
- **Use Case:** The uniform distribution is a probability distribution where each value within a certain range is equally likely to occur and values outside of the range never occur. If we make a density plot of a uniform distribution, it appears flat because no value is any more likely (and hence has any more density) than another.
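- **Example:** Rolling a fair die, where each outcome (1, 2, 3, 4, 5, 6) is equally likely. This is a discrete uniform distribution; the PDF below describes the continuous case, which is what `random.uniform` samples.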
- **Formula:**
  - Probability Density Function (PDF): 
    $$f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b, \qquad f(x) = 0 \text{ otherwise}$$
  - Where:
    - $a$ is the lower bound of the range
    - $b$ is the upper bound of the range
- **Code:**


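```python
# 1000 draws from the uniform distribution on [-1, 0)
s = np.random.uniform(-1, 0, 1000)

# Density histogram of 1000 draws from the default range [0, 1)
sns.histplot(random.uniform(size=1000), stat='density')
plt.show()
```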



- **Assumptions/Constraints:**
  - Values occur only within $[a, b]$, and no value in that range is more likely than another.
  - All subintervals of $[a, b]$ with the same length are equally probable.
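
Both properties can be checked by sampling. This sketch assumes the same range as the code example above, $[a, b) = [-1, 0)$ (NumPy samples the half-open interval), and an arbitrary seed. It confirms that every draw falls inside the range and that four equal-width subintervals each receive about a quarter of the draws.

```python
import numpy as np

a, b = -1.0, 0.0
rng = np.random.default_rng(seed=3)
samples = rng.uniform(low=a, high=b, size=200_000)  # half-open range [a, b)

# No value falls outside the range
print(samples.min() >= a, samples.max() < b)

# Four subintervals of equal width w each have probability w / (b - a) = 0.25
w = 0.25
starts = np.arange(a, b, w)
probs = [np.mean((samples >= c) & (samples < c + w)) for c in starts]
print(np.round(probs, 3))
```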



# <a id='toc6_'></a>[Logistic Distribution](#toc0_)
- **Use Case:** The cumulative distribution function (CDF) of the logistic distribution is the logistic (sigmoid) function, which logistic regression uses to model the relationship between a binary dependent variable and one or more independent variables. The logistic function is also used in economics to model growth.
- **Example:** Predicting whether a student will pass or fail an exam based on the number of hours they study.
- **Formula:**
  - Probability Density Function (PDF): 
    $$f(x) = \frac{e^{-(x-\mu)/s}}{s(1+e^{-(x-\mu)/s})^2}$$
  - Where:
    - $\mu$ is the location parameter (the mean, median, and mode)
    - $s > 0$ is the scale parameter; the standard deviation is $s\pi/\sqrt{3}$, not $s$
- **Code:**


```python
# 2x3 array of draws with location 1 and scale 2
x = random.logistic(loc=1, scale=2, size=(2, 3))

# Compare the densities of normal and logistic samples
sns.kdeplot(random.normal(scale=2, size=1000), label='normal')
sns.kdeplot(random.logistic(size=1000), label='logistic')
plt.legend()
plt.show()
```



- **Assumptions/Constraints:**
  - The data are assumed to be symmetric about $\mu$, with heavier tails than a normal distribution of the same variance.
  - The logistic distribution is unimodal.
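
Two facts from this section can be checked numerically: the logistic CDF is the sigmoid $1/(1 + e^{-(x-\mu)/s})$, and the standard deviation is $s\pi/\sqrt{3}$ rather than $s$. The helpers `logistic_pdf` and `logistic_cdf` are written here for illustration; $\mu = 1$, $s = 2$, and the seed are arbitrary.

```python
import numpy as np


def logistic_pdf(x, mu=0.0, s=1.0):
    """Logistic PDF, as in the formula above."""
    z = np.exp(-(x - mu) / s)
    return z / (s * (1 + z) ** 2)


def logistic_cdf(x, mu=0.0, s=1.0):
    """Logistic CDF: the sigmoid function used in logistic regression."""
    return 1 / (1 + np.exp(-(x - mu) / s))


mu, s = 1.0, 2.0
rng = np.random.default_rng(seed=4)
samples = rng.logistic(loc=mu, scale=s, size=200_000)

# Standard deviation: s * pi / sqrt(3), about 3.63 here, not s = 2
print(samples.std(), s * np.pi / np.sqrt(3))

# Empirical CDF at x = 3 vs. the sigmoid
print(np.mean(samples <= 3.0), logistic_cdf(3.0, mu, s))

# The PDF is the derivative of the CDF (central-difference check at x = 0.5)
h = 1e-5
slope = (logistic_cdf(0.5 + h, mu, s) - logistic_cdf(0.5 - h, mu, s)) / (2 * h)
print(logistic_pdf(0.5, mu, s), slope)
```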


# <a id='toc7_'></a>[Multinomial Distribution](#toc0_)
- **Use Case:** This distribution generalizes the binomial distribution to trials with more than two possible outcomes. It gives the probability of each outcome occurring a specified number of times in a fixed number of trials.
- **Example:** The probability of getting 2 heads, 3 tails, and 5 blanks when flipping a three-sided coin 10 times.
- **Formula:** 
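  - $$P(X_1 = x_1, X_2 = x_2, \dots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$$
  - Where:
    - $x_i$ is the number of times outcome $i$ occurs, and $n = x_1 + x_2 + \dots + x_k$ is the number of trials
    - $p_i$ is the probability of outcome $i$ on each trial, with $p_1 + p_2 + \dots + p_k = 1$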
- **Code:**


```python
# Counts of each face in 6 rolls of a fair six-sided die
x = random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
```



- **Assumptions/Constraints:** 
  - Each trial is independent.
  - The probability of each outcome remains constant from trial to trial.
  - The number of trials, n, is fixed.
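
The three-sided-coin example can be worked through with the formula. The outcome probabilities are not given, so this sketch assumes a fair three-sided coin ($p_1 = p_2 = p_3 = 1/3$), which gives $\frac{10!}{2!\,3!\,5!}(1/3)^{10} = 2520/59049 \approx 0.0427$. `multinomial_pmf` is a helper defined for this illustration; the simulation uses NumPy's `Generator.multinomial` with an arbitrary seed.

```python
from math import factorial, prod

import numpy as np


def multinomial_pmf(counts, pvals):
    """P(X_1 = x_1, ..., X_k = x_k) from the multinomial formula."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # exact integer division at every step
    return coef * prod(p**c for p, c in zip(pvals, counts))


counts = [2, 3, 5]        # heads, tails, blanks
pvals = [1/3, 1/3, 1/3]   # assumed fair three-sided coin
exact = multinomial_pmf(counts, pvals)

# Simulate many runs of 10 tosses and count exact matches of the outcome counts
rng = np.random.default_rng(seed=5)
draws = rng.multinomial(n=10, pvals=pvals, size=200_000)
empirical = np.mean(np.all(draws == counts, axis=1))

print(f"exact = {exact:.4f}, simulated = {empirical:.4f}")
```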



# <a id='toc8_'></a>[Exponential Distribution](#toc0_)
- **Use Case:** This distribution is used to model the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate.
- **Example:** The amount of time (in minutes) that a call center worker spends on the phone with a customer.
- **Formula:** 
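  - $$f(x|\lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$
  - Where:
    - $\lambda > 0$ is the rate parameter (average number of events per unit time)
    - NumPy's `random.exponential` takes `scale` $= 1/\lambda$, the mean time between events, rather than $\lambda$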
- **Code:**


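```python
# 2x3 array of draws with scale 2, i.e. rate lambda = 1/2
x = random.exponential(scale=2, size=(2, 3))

# Density of 1000 draws with the default scale of 1
sns.kdeplot(random.exponential(size=1000))
plt.show()
```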


- **Assumptions/Constraints:** 
  - Events are independent.
  - The rate at which events occur is constant.
  - Two events cannot occur at exactly the same instant.
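
Because NumPy's `exponential` takes `scale` $= 1/\lambda$ rather than the rate, it is easy to pass the wrong parameter. This sketch, with an illustrative rate of $\lambda = 0.5$ events per minute and an arbitrary seed, checks that the mean waiting time is $1/\lambda$ and that the survival probability $P(X > t)$ matches $e^{-\lambda t}$, which follows from integrating the PDF.

```python
import numpy as np

lam = 0.5  # rate: 0.5 events per minute (illustrative value)
rng = np.random.default_rng(seed=6)

# NumPy parameterizes the exponential by scale = 1 / lambda, not by the rate
waits = rng.exponential(scale=1 / lam, size=200_000)

# Mean waiting time is 1 / lambda = 2 minutes
print(waits.mean(), 1 / lam)

# Survival function: P(X > t) = exp(-lambda * t)
t = 3.0
print(np.mean(waits > t), np.exp(-lam * t))
```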



# <a id='toc9_'></a>[Chi Square Distribution](#toc0_)
- **Use Case:** This distribution is used in hypothesis testing and is the distribution of a sum of the squares of k independent standard normal random variables.
- **Example:** Testing whether two categorical variables are independent.
- **Formula:** 
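  - $$f(x|k) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}, \quad x > 0$$
  - Where:
    - $k$ is the number of degrees of freedom (`df` in NumPy)
    - $\Gamma$ is the gamma function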
- **Code:**


```python
# 2x3 array of draws with 2 degrees of freedom
x = random.chisquare(df=2, size=(2, 3))

# Density of 1000 draws with 1 degree of freedom
sns.kdeplot(random.chisquare(df=1, size=1000))
plt.show()
```

- **Assumptions/Constraints:** 
  - The $k$ variables whose squares are summed are independent and standard normal.
  - When the distribution is used for a chi-square test of independence, the sample must be large enough; a common rule of thumb is an expected count of at least 5 in each cell.
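
The definition in the use case, a sum of squares of $k$ independent standard normal variables, can be checked against NumPy's `chisquare` sampler. A chi-square variable with $k$ degrees of freedom has mean $k$ and variance $2k$; $k = 3$ and the seed below are arbitrary choices for this sketch.

```python
import numpy as np

k = 3
n = 200_000
rng = np.random.default_rng(seed=7)

# Sum of squares of k independent standard normal variables, n times over
z = rng.standard_normal(size=(n, k))
sum_sq = (z ** 2).sum(axis=1)

# Direct draws from NumPy's chi-square sampler with the same degrees of freedom
direct = rng.chisquare(df=k, size=n)

# Both should have mean close to k and variance close to 2k
print(sum_sq.mean(), direct.mean(), k)
print(sum_sq.var(), direct.var(), 2 * k)
```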



# <a id='toc10_'></a>[Rayleigh Distribution](#toc0_)
- **Use Case:** This distribution is used to model the distribution of the magnitude of a two-dimensional random vector whose coordinates are independent, identically distributed, zero-mean, Gaussian random variables. It's often used in signal processing.
- **Example:** The distribution of wind speeds.
- **Formula:** 
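  - $$f(x|\sigma) = \frac{x}{\sigma^2}e^{-x^2/(2\sigma^2)}, \quad x \ge 0$$
  - Where:
    - $\sigma > 0$ is the scale parameter (`scale` in NumPy), the standard deviation of each of the two underlying normal coordinates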
- **Code:**


```python
from scipy.stats import rayleigh

# Define the scale parameter of the distribution (sigma = 2 chosen for illustration)
sigma = 2
# Create a frozen Rayleigh distribution
rv = rayleigh(scale=sigma)
# Probability density at x = 5
print(rv.pdf(5))

# 2x3 array of draws from NumPy's Rayleigh sampler
x = random.rayleigh(scale=sigma, size=(2, 3))
print(x)

# Density of 1000 draws with the default scale of 1
sns.kdeplot(random.rayleigh(size=1000))
plt.show()
```


- **Assumptions/Constraints:** 
  - The two coordinates of the underlying vector are independent, zero-mean normal random variables with the same standard deviation $\sigma$.
  - The modeled quantity is a non-negative magnitude.
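
The use case can be reproduced by building the two-dimensional vector explicitly: the length of a vector whose two coordinates are independent, zero-mean normal variables with standard deviation $\sigma$ should follow the same distribution as NumPy's `rayleigh(scale=sigma)`, with mean $\sigma\sqrt{\pi/2}$. In this sketch $\sigma = 2$ and the seed are illustrative.

```python
import numpy as np

sigma = 2.0
n = 200_000
rng = np.random.default_rng(seed=8)

# Magnitude of a 2-D vector with i.i.d. zero-mean normal coordinates
xy = rng.normal(loc=0.0, scale=sigma, size=(n, 2))
magnitude = np.hypot(xy[:, 0], xy[:, 1])

# Direct draws from NumPy's Rayleigh sampler with the same scale
direct = rng.rayleigh(scale=sigma, size=n)

expected_mean = sigma * np.sqrt(np.pi / 2)  # about 2.51 for sigma = 2
print(magnitude.mean(), direct.mean(), expected_mean)
```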



# <a id='toc11_'></a>[Pareto Distribution](#toc0_)
- **Use Case:** This distribution is used to model situations where a large number of small events coexist with a small number of very large events. It is popularly associated with the 80-20 rule, which corresponds to one particular value of the shape parameter.
- **Example:** The distribution of wealth in a society.
- **Formula:** 
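  - $$f(x|k, x_m) = \frac{k x_m^k}{x^{k+1}}, \quad x \ge x_m$$
  - Where:
    - $k > 0$ is the shape parameter (`a` in NumPy)
    - $x_m > 0$ is the minimum possible value (the scale)
    - NumPy's `random.pareto(a)` samples the Lomax (Pareto II) form, whose minimum is 0; classical Pareto samples are `(random.pareto(a) + 1) * x_m`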
- **Code:**


```python
from scipy.stats import pareto

# Define the shape parameter of the distribution
b = 2
# Create a frozen Pareto distribution (scipy's default scale gives x_m = 1)
rv = pareto(b)
# Probability density at x = 5
print(rv.pdf(5))

# NumPy's sampler draws from the Lomax (Pareto II) form, whose minimum is 0
x = random.pareto(a=2, size=(2, 3))
sns.histplot(random.pareto(a=2, size=1000))
plt.show()
```



- **Assumptions/Constraints:** 
  - The minimum possible value, $x_m$, is known.
  - The shape parameter, $k$, is known.
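
Because NumPy's `pareto(a)` draws from the Lomax (Pareto II) form, this sketch converts its samples to the classical Pareto by adding 1 and multiplying by $x_m$, then checks that no sample falls below $x_m$ and that the tail probability matches $P(X > x) = (x_m/x)^k$, which follows from integrating the PDF. The values $k = 3$, $x_m = 2$, and the seed are illustrative.

```python
import numpy as np

k, x_m = 3.0, 2.0
n = 200_000
rng = np.random.default_rng(seed=9)

# Lomax draws shifted by 1 and scaled by x_m give the classical Pareto on [x_m, inf)
samples = (rng.pareto(k, size=n) + 1) * x_m

print(samples.min() >= x_m)

# Tail probability P(X > x) = (x_m / x) ** k; for x = 4 this is 0.125
x = 4.0
print(np.mean(samples > x), (x_m / x) ** k)
```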



# <a id='toc12_'></a>[Zipf Distribution](#toc0_)
- **Use Case:** This distribution is used to model the frequency of words in a language or the popularity of websites.
- **Example:** The frequency of words in a large text corpus.
- **Formula:** 
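  - $$f(k; s, N) = \frac{1/k^s}{\sum_{n=1}^N (1/n^s)}$$
  - Where:
    - $k$ is the rank (1 for the most frequent item)
    - $s$ is the exponent parameter
    - $N$ is the number of ranked items
    - NumPy's `random.zipf(a)` samples the unbounded case ($N \to \infty$) and requires $a > 1$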
- **Code:**


```python
from scipy.stats import zipf

# Define the exponent parameter s of the distribution (must be > 1)
s = 2
# Create a frozen Zipf distribution
rv = zipf(s)
# Probability mass at k = 5
print(rv.pmf(5))

# 1000 draws from NumPy's Zipf sampler; plot only values below 10,
# since the long tail would otherwise dominate the axis
x = random.zipf(a=2, size=1000)
sns.histplot(x[x < 10], discrete=True)
plt.show()
```



- **Assumptions/Constraints:** 
  - The rank order of the data is known.
  - The exponent parameter, $s$, is known.
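
The finite-$N$ formula can be evaluated directly. With $s = 1$ the denominator is the harmonic number $H_N$, and the rank-1 item is $2^s$ times as frequent as the rank-2 item. `zipf_pmf` is a helper written for this sketch, and $N = 10$, $s = 1$ are illustrative; NumPy's `random.zipf` is not used here because it samples the unbounded case and requires an exponent greater than 1.

```python
import numpy as np


def zipf_pmf(k, s, N):
    """f(k; s, N) for the finite Zipf distribution over ranks 1..N."""
    ranks = np.arange(1, N + 1)
    return (1 / k**s) / np.sum(1 / ranks**s)


N, s = 10, 1.0
pmf = np.array([zipf_pmf(k, s, N) for k in range(1, N + 1)])

print(pmf.round(3))     # frequencies fall off as 1/k
print(pmf.sum())        # probabilities sum to 1
print(pmf[0] / pmf[1])  # rank 1 is 2**s = 2 times as frequent as rank 2
```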