
## Binomial Distribution

The Binomial distribution models the number of successes in *n* independent trials, each with the same success probability *p*. It gives the probability of observing exactly *k* successes.

**Formula:**

$$
P(X = k) = \binom{n}{k} \cdot p^k \cdot (1 - p)^{n - k}
$$

**Example:**  
Suppose we show an ad to 10 users (*n = 10*), each with a 20% chance (*p = 0.2*) of clicking.  
What is the probability that exactly 3 users click the ad (*k = 3*)?


In [5]:
from math import comb

# Parameters
n = 10       # number of trials
k = 3        # number of successes
p = 0.2      # probability of success

# Binomial probability calculation
P_X_equals_k = comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
print(f"P(X = {k}) = {P_X_equals_k:.3f}")

P(X = 3) = 0.201
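
As a cross-check, the hand-written formula can be compared against `scipy.stats.binom`, which also provides cumulative probabilities such as "at most 3 clicks". This is a minimal sketch; the names `p_manual`, `p_exact`, and `p_at_most` are illustrative and not part of the original notebook.

```python
from math import comb
from scipy.stats import binom

n, k, p = 10, 3, 0.2

# Manual formula, as in the cell above
p_manual = comb(n, k) * p**k * (1 - p)**(n - k)

# Library equivalents
p_exact = binom.pmf(k, n, p)     # P(X = 3)
p_at_most = binom.cdf(k, n, p)   # P(X <= 3), sums P(X = 0) through P(X = 3)

print(f"P(X = {k})  = {p_exact:.3f}")
print(f"P(X <= {k}) = {p_at_most:.3f}")
```

The cumulative value is the more common quantity in practice, since questions are usually phrased as "at most" or "at least" rather than "exactly".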


## Poisson Distribution

The Poisson distribution models the probability of observing exactly *k* events in a fixed interval of time or space when the events occur independently and at a constant average rate *λ*.

**Formula:**

$$
P(X = k) = \frac{\lambda^k \cdot e^{-\lambda}}{k!}
$$

**Example:**  
On average, 4 comments are posted per minute (λ = 4).  
What is the probability of receiving exactly 6 comments in a minute (k = 6)?


In [6]:
import math

# Parameters
λ = 4       # average number of comments per minute
k = 6       # desired number of comments

# Poisson probability calculation
P_X_equals_k = (λ ** k) * math.exp(-λ) / math.factorial(k)
print(f"P(X = {k}) = {P_X_equals_k:.3f}")


P(X = 6) = 0.104
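
The Poisson distribution can be viewed as the limit of a Binomial distribution with many trials and a small success probability *p = λ / n*. The sketch below illustrates this numerically for the example above; the trial counts and the `binomial_values` dictionary are illustrative choices, not part of the original notebook.

```python
from scipy.stats import binom, poisson

lam, k = 4, 6

# Reference value from the Poisson PMF
p_poisson = poisson.pmf(k, lam)

# Binomial with n trials and p = lam / n, for increasing n
binomial_values = {n: binom.pmf(k, n, lam / n) for n in (10, 100, 10_000)}

print(f"Poisson:           P(X = {k}) = {p_poisson:.4f}")
for n, value in binomial_values.items():
    print(f"Binomial n={n:>6}: P(X = {k}) = {value:.4f}")
```

As *n* grows, the Binomial values should move toward the Poisson value, which is why the Poisson is a convenient model for rare events across many opportunities.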


## Normal Distribution

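The Normal distribution is a continuous probability distribution with a bell-shaped curve, symmetric around the mean μ, with spread controlled by the standard deviation σ. It is commonly used to model quantities that cluster symmetrically around an average; here it serves as a simple approximation for user session duration.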

**Formula (Probability Density Function):**

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
$$

**Example:**  
Suppose the time spent on Instagram per session follows a normal distribution with a mean (μ) of 25 minutes and standard deviation (σ) of 5 minutes.  
What is the probability that a randomly selected session lasts less than 30 minutes?


In [7]:
from scipy.stats import norm

# Parameters
mu = 25      # mean session duration
sigma = 5    # standard deviation
x = 30       # time threshold

# Probability that session is less than 30 minutes
probability = norm.cdf(x, loc=mu, scale=sigma)
print(f"P(X < {x}) = {probability:.3f}")


P(X < 30) = 0.841
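
The same result follows from standardizing: with *z = (x − μ) / σ*, the probability *P(X < x)* equals the standard normal CDF evaluated at *z*. The sketch below uses only the standard library, expressing the standard normal CDF through `math.erf`; the names `z` and `phi` are illustrative.

```python
from math import erf, sqrt

mu, sigma, x = 25, 5, 30

# Standardize: how many standard deviations x lies above the mean
z = (x - mu) / sigma

# Standard normal CDF written in terms of the error function
phi = 0.5 * (1 + erf(z / sqrt(2)))

print(f"z = {z:.1f}")
print(f"P(X < {x}) = {phi:.3f}")
```

Here the threshold is exactly one standard deviation above the mean, so the answer matches the familiar value of about 84%.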


## Exponential Distribution

The Exponential distribution models the time between **independent events** that occur at a **constant average rate** λ (lambda). It is useful for modeling "time until" events, such as user actions or server requests.

**Formula (Probability Density Function):**

$$
f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \geq 0
$$

**Example:**  
Let’s assume messages in a group chat arrive at an average rate of 2 messages per minute (λ = 2).  
What is the probability that the next message will arrive in **less than 30 seconds (0.5 minutes)**?


In [8]:
from scipy.stats import expon

# Parameters
rate_lambda = 2   # average rate (messages per minute)
x = 0.5           # time in minutes

# Probability that next message arrives within 0.5 minutes
# (SciPy parameterizes expon by scale = 1/λ, the mean waiting time)
probability = expon.cdf(x, scale=1/rate_lambda)
print(f"P(X < {x} minutes) = {probability:.3f}")


P(X < 0.5 minutes) = 0.632
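
Integrating the density above gives the closed-form CDF *P(X < x) = 1 − e^{−λx}*, which can be checked against SciPy directly. This is a minimal sketch; `p_closed_form`, `p_scipy`, and `mean_wait` are illustrative names.

```python
from math import exp
from scipy.stats import expon

rate_lambda = 2   # messages per minute
x = 0.5           # minutes

# Closed-form CDF: 1 - e^(-lambda * x)
p_closed_form = 1 - exp(-rate_lambda * x)

# SciPy takes the scale (mean waiting time), not the rate
mean_wait = 1 / rate_lambda
p_scipy = expon.cdf(x, scale=mean_wait)

print(f"Mean wait: {mean_wait} minutes")
print(f"Closed form: {p_closed_form:.3f}, SciPy: {p_scipy:.3f}")
```

Passing the rate where SciPy expects the scale is an easy mistake to make, so comparing against the closed form is a cheap sanity check.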


## Uniform Distribution

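The continuous Uniform distribution is used when **all values in a range are equally likely**, so the probability of landing in any sub-interval depends only on its length. It is often applied to randomized scheduling, simple simulations, and situations where no value in the range is favored over another.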

**Formula (Probability Density Function):**

$$
f(x) = \frac{1}{b - a}, \quad a \leq x \leq b
$$

**Example:**  
A push notification is sent at a uniformly random time between 1 PM and 2 PM (i.e., between minute 0 and minute 60).  
What is the probability that a notification is sent within the **first 20 minutes**?


In [9]:
from scipy.stats import uniform

# Parameters
a = 0         # start of the interval (0 minutes past 1 PM)
b = 60        # end of the interval (60 minutes = 2 PM)
x = 20        # time threshold we're interested in

# Probability that the event occurs before x minutes
# (SciPy's uniform covers the interval [loc, loc + scale])
probability = uniform.cdf(x, loc=a, scale=b-a)
print(f"P(X < {x} minutes) = {probability:.3f}")


P(X < 20 minutes) = 0.333
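
Because the density is constant, the CDF is simply *(x − a) / (b − a)* inside the interval, and the probability of any sub-interval is its length divided by *b − a*. The sketch below writes this out by hand; `uniform_cdf` is a hypothetical helper, and the 30-to-45-minute window is an illustrative extra query, not part of the original example.

```python
def uniform_cdf(x, a, b):
    """P(X <= x) for X uniform on [a, b]; 0 below a and 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0, 60

# Probability of the first 20 minutes
p_first_20 = uniform_cdf(20, a, b)

# Probability of a window: difference of two CDF values
p_between_30_and_45 = uniform_cdf(45, a, b) - uniform_cdf(30, a, b)

print(f"P(X < 20)       = {p_first_20:.3f}")
print(f"P(30 < X < 45)  = {p_between_30_and_45:.3f}")
```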
