## Probability Distributions

In [149]:
# imports
import numpy as np
import scipy.stats as stats


### Random Variables

In [150]:
die_6 = range(1,7)
num_rolls = 5
rolls = np.random.choice(die_6, size=num_rolls, replace=True)
rolls

array([1, 2, 2, 2, 5])
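
Five rolls say little about the underlying distribution. As a rough sketch (the seed, the sample size of 60,000, and the names `rng`, `die_faces`, and `frequencies` are illustrative choices, not part of the original notebook), rolling many times should bring the relative frequency of each face close to 1/6:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable
die_faces = np.arange(1, 7)          # faces 1 through 6
rolls = rng.choice(die_faces, size=60_000, replace=True)

# Count how often each face appears; index 0 is never rolled, so drop it.
frequencies = np.bincount(rolls, minlength=7)[1:] / rolls.size
print(frequencies)                   # each entry should be near 1/6
```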

### Mass Functions
- x: the value of interest
- n: the number of trials
- p: the probability of success

In [151]:
# value of interest
x = 3
# sample size
n = 10

prob_1 = stats.binom.pmf(x, n, 0.5)
prob_1

0.11718750000000004
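
The value above can be cross-checked against the binomial formula $P(X=x)=\binom{n}{x}p^x(1-p)^{n-x}$. The following is a minimal sketch; the helper name `binom_pmf_by_hand` is hypothetical and not part of SciPy.

```python
from math import comb

import scipy.stats as stats


def binom_pmf_by_hand(x, n, p):
    """Binomial PMF from the closed form: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


manual = binom_pmf_by_hand(3, 10, 0.5)  # 120 ways out of 2**10 outcomes
library = stats.binom.pmf(3, 10, 0.5)
print(manual, library)
```

Both numbers should agree up to floating-point rounding.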

In [152]:
# probability of observing 4 to 6 heads (inclusive) from 10 coin flips
prob_2 = stats.binom.pmf(4,n=10, p=.5) + stats.binom.pmf(5,n=10, p=.5) + stats.binom.pmf(6,n=10, p=.5)
print(prob_2)

0.6562500000000002


In [153]:
# probability of observing more than 2 heads from 10 coin flips
prob_3 = 1 - (stats.binom.pmf(0, n=10, p=0.5) + stats.binom.pmf(1, n=10, p=0.5) + stats.binom.pmf(2, n=10, p=0.5))
prob_3


0.9453124999999999
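
Adding individual `pmf` calls gets long as the range grows. Since `stats.binom.pmf` accepts an array of values, one possible shortcut is to evaluate the whole range at once and sum the result. The variable names below are illustrative.

```python
import numpy as np
import scipy.stats as stats

# P(4 <= X <= 6): evaluate the PMF at 4, 5 and 6 in one call, then sum
heads = np.arange(4, 7)
prob_4_to_6 = stats.binom.pmf(heads, n=10, p=0.5).sum()

# P(X > 2) = 1 - P(X <= 2), with P(X <= 2) summed over 0, 1 and 2
prob_more_than_2 = 1 - stats.binom.pmf(np.arange(0, 3), n=10, p=0.5).sum()

print(prob_4_to_6, prob_more_than_2)
```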

### Cumulative Distribution
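Suppose we want the probability of observing between 3 and 6 heads, inclusive, from 10 fair coin flips. Because the CDF gives P(X≤x), we subtract the CDF at 2, not at 3, so that 3 stays inside the range. Mathematically, this looks like the following: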

- P(X≤6) = CDF(X=6) = 0.83
- P(X≤2) = CDF(X=2) = 0.05

- P(X≤6) - P(X≤2) = P(3≤X≤6)
- CDF(X=6) - CDF(X=2) = P(3≤X≤6)
- 0.83 - 0.05 = 0.78
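
The subtraction above can be reproduced in code. This sketch assumes the same setting as the surrounding cells (10 fair coin flips); the variable names are illustrative.

```python
import scipy.stats as stats

cdf_6 = stats.binom.cdf(6, 10, 0.5)  # P(X <= 6)
cdf_2 = stats.binom.cdf(2, 10, 0.5)  # P(X <= 2)

# P(3 <= X <= 6): everything up to 6, minus everything up to 2
prob_3_to_6 = cdf_6 - cdf_2
print(prob_3_to_6)
```

Working with unrounded CDF values gives roughly 0.77, slightly below the 0.78 obtained from the rounded figures.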

In [154]:
# 6 or fewer heads from 10 fair coin flips
stats.binom.cdf(6, 10, 0.5)

0.828125

In [155]:
# P(4 to 8 Heads) = P(0 to 8 Heads) − P(0 to 3 Heads)
stats.binom.cdf(8,10,0.5) - stats.binom.cdf(3,10,0.5)

0.8173828125

In [156]:
# more than 6 heads from 10 fair coin flips (Note that “more than 6 heads” does not include 6.)
1 - stats.binom.cdf(6, 10, 0.5)

0.171875
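
SciPy also exposes the upper tail directly through the survival function `sf`, defined as 1 - CDF. The sketch below compares it with the manual complement for the same case; which form to use is largely a matter of taste, although `sf` can be more accurate for very small tail probabilities.

```python
import scipy.stats as stats

# P(X > 6) computed two ways
tail_via_complement = 1 - stats.binom.cdf(6, 10, 0.5)
tail_via_sf = stats.binom.sf(6, 10, 0.5)  # sf(k) = P(X > k)

print(tail_via_complement, tail_via_sf)
```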

### Density Functions
- x: the value of interest
- loc: the mean of the probability distribution
- scale: the standard deviation of the probability distribution

In [157]:
# probability that a randomly chosen woman is less than 175 cm tall,
# assuming heights are Normal with mean 167.64 cm and standard deviation 8 cm
# stats.norm.cdf(x, loc, scale)
stats.norm.cdf(175, 167.64, 8)

0.8212136203856288
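
The `loc` and `scale` arguments are equivalent to standardizing the value first. As a small sketch (variable names are illustrative), converting 175 cm to a z-score and using the standard Normal CDF should give the same probability:

```python
import scipy.stats as stats

mean_height, std_height = 167.64, 8

z = (175 - mean_height) / std_height  # number of standard deviations above the mean
p_standard = stats.norm.cdf(z)        # standard Normal: loc=0, scale=1
p_direct = stats.norm.cdf(175, loc=mean_height, scale=std_height)

print(z, p_standard, p_direct)
```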

#### Probability Density Functions and Cumulative Distribution Function

In [158]:
# The weather in the Galapagos Islands follows a Normal distribution with a mean of 20 degrees Celsius and a standard deviation of 3 degrees.
degrees_mean = 20
degrees_std = 3

# probability that the temperature on a randomly selected day will be between 18 and 25 degrees Celsius
stats.norm.cdf(25,degrees_mean,degrees_std) - stats.norm.cdf(18,degrees_mean, degrees_std)

0.6997171101802624

In [159]:
# probability that the weather on a randomly selected day will be greater than 24 degrees Celsius
1 - stats.norm.cdf(24,degrees_mean,degrees_std)

0.09121121972586788
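
For a continuous distribution, a difference of CDF values is the area under the probability density function over that interval. The sketch below checks this numerically for the 18 to 25 degree range by integrating `stats.norm.pdf` with `scipy.integrate.quad`; the variable names are illustrative.

```python
from scipy import integrate
import scipy.stats as stats

degrees_mean, degrees_std = 20, 3

# Area under the PDF between 18 and 25 (quad returns the value and an error estimate)
area, abs_error = integrate.quad(
    stats.norm.pdf, 18, 25, args=(degrees_mean, degrees_std)
)

# The same probability from the CDF
cdf_difference = (
    stats.norm.cdf(25, degrees_mean, degrees_std)
    - stats.norm.cdf(18, degrees_mean, degrees_std)
)

print(area, cdf_difference)
```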

### Poisson Distribution
The Poisson distribution is another common distribution, and it is used to describe the number of times a certain event occurs within a fixed time or space interval.

The Poisson distribution is defined by the rate parameter, symbolized by the Greek letter lambda, λ.

#### with pmf

In [160]:
# expected value = 10, probability of observing 6
stats.poisson.pmf(6,10)


0.06305545800345125
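
The value above follows from the Poisson formula $P(X=k)=\frac{e^{-\lambda}\lambda^k}{k!}$. A minimal sketch of that calculation, using a hypothetical helper `poisson_pmf_by_hand` that is not part of SciPy:

```python
from math import exp, factorial

import scipy.stats as stats


def poisson_pmf_by_hand(k, lam):
    """Poisson PMF from the closed form: exp(-lam) * lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)


manual = poisson_pmf_by_hand(6, 10)
library = stats.poisson.pmf(6, 10)
print(manual, library)
```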

In [161]:
# expected value = 10, probability of observing 12-14
prob_4 = stats.poisson.pmf(12,10) + stats.poisson.pmf(13,10) + stats.poisson.pmf(14,10)
prob_4

0.21976538076223123

#### with cdf

In [162]:
# expected value = 10, probability of observing 6 or fewer
stats.poisson.cdf(6,10)

0.130141420882483

In [163]:
# expected value = 10, probability of observing 12 or more
1 - stats.poisson.cdf(11,10)

0.30322385369689386

In [164]:
# expected value = 10, probability of observing between 12 and 18
stats.poisson.cdf(18, 10) - stats.poisson.cdf(11, 10)

0.29603734909303947
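
The PMF and CDF approaches should agree on any range. As a quick consistency sketch (variable names are illustrative), the 12 to 18 range can be computed both ways, and the "12 or more" case can also be written with the survival function `sf`:

```python
import numpy as np
import scipy.stats as stats

# P(12 <= X <= 18) from the CDF and from summing the PMF
range_via_cdf = stats.poisson.cdf(18, 10) - stats.poisson.cdf(11, 10)
range_via_pmf = stats.poisson.pmf(np.arange(12, 19), 10).sum()

# P(X >= 12) = P(X > 11), with and without the survival function
tail_via_cdf = 1 - stats.poisson.cdf(11, 10)
tail_via_sf = stats.poisson.sf(11, 10)

print(range_via_cdf, range_via_pmf, tail_via_cdf, tail_via_sf)
```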

#### Expectation of the Poisson Distribution

In [165]:
# generate random variable
# stats.poisson.rvs(lambda, size = num_values)
rvs = stats.poisson.rvs(10, size = 1000)
rvs.mean()

9.977

#### Spread of the Poisson Distribution
The variance of a probability distribution measures how spread out its values are around the expected value. For the Poisson distribution, variance is equal to lambda (λ), meaning the expected value and variance are the same.

In [166]:
# We can calculate the variance of a sample using the numpy.var() function
# (by default it divides by n, i.e. ddof=0):
rand_vars = stats.poisson.rvs(4, size = 1000)
print(np.var(rand_vars))


4.173856
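
The sample variance above is only an estimate, so it differs a little from λ = 4. As a hedged sketch, the theoretical mean and variance can be read from SciPy directly, and a larger seeded sample (the seed and size are arbitrary choices for illustration) should land closer to them:

```python
import numpy as np
import scipy.stats as stats

lam = 4

# Theoretical values: both equal lambda for a Poisson distribution
theory_mean = stats.poisson.mean(lam)
theory_var = stats.poisson.var(lam)

# Empirical values from a large, seeded sample
rng = np.random.default_rng(seed=0)
sample = rng.poisson(lam, size=100_000)
sample_mean = sample.mean()
sample_var = sample.var()

print(theory_mean, theory_var, sample_mean, sample_var)
```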


As lambda increases, so does the variance, which means a wider range of values becomes likely. To observe this, we can consider the range of a sample, which is the difference between its minimum and maximum values. For example, we can take the 1000 random values drawn above from a Poisson distribution with lambda = 4 and print the minimum and maximum with Python's built-in min() and max() functions, then repeat with lambda = 10:

In [167]:
min(rand_vars), max(rand_vars)


(0, 12)

In [168]:
rand_vars_2 = stats.poisson.rvs(10, size =1000)
min(rand_vars_2), max(rand_vars_2)


(2, 26)
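
Sample minimums and maximums change from run to run. One way to make the same comparison without randomness, offered here as a sketch rather than as the notebook's method, is to compare the interval containing the central 99% of each distribution using the percent point function `ppf` (the inverse of the CDF). The helper name `central_width` is hypothetical.

```python
import scipy.stats as stats


def central_width(lam, coverage=0.99):
    """Width of the interval holding the central `coverage` share of Poisson(lam)."""
    tail = (1 - coverage) / 2
    low = stats.poisson.ppf(tail, lam)
    high = stats.poisson.ppf(1 - tail, lam)
    return high - low


width_4 = central_width(4)
width_10 = central_width(10)
print(width_4, width_10)  # the lambda = 10 interval should be the wider one
```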

### Expected Value of the Binomial Distribution


In [169]:
# A certain basketball player has an 85% chance of making a given free throw and takes 20 free throws.
expected_baskets = 20*0.85
expected_baskets

17.0

#### Variance of the Binomial Distribution
The variance of a binomial distribution measures how much the number of successes varies around its expected value.
For a single fair coin flip, the variance is the probability that the success happens times the probability that it does not happen: p·(1-p), or 0.5 × 0.5 = 0.25. Because the n = 10 coin flips are independent, the variance of a single flip is multiplied by the number of flips. Thus we get the equation:

- $Var(\text{number of heads})=Var(X)=n \times p \times (1-p)$
- $Var(\text{number of heads})=10 \times 0.5 \times (1-0.5)=2.5$

In [170]:
variance_baskets = 20 * 0.85 * 0.15
variance_baskets

2.55
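
The hand calculations of n·p and n·p·(1-p) can be cross-checked with `stats.binom.stats`, which returns the mean and variance when called with `moments="mv"`. A short sketch for both the coin-flip and free-throw examples (variable names are illustrative):

```python
import scipy.stats as stats

# 10 fair coin flips: mean = 10 * 0.5, variance = 10 * 0.5 * 0.5
coin_mean, coin_var = stats.binom.stats(10, 0.5, moments="mv")

# 20 free throws at 85%: mean = 20 * 0.85, variance = 20 * 0.85 * 0.15
shots_mean, shots_var = stats.binom.stats(20, 0.85, moments="mv")

print(coin_mean, coin_var, shots_mean, shots_var)
```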

### Properties of Expectation and Variance
#### Properties of Expectation
- $E(X+Y)=E(X)+E(Y)$
- $E(aX)=aE(X)$
- $E(X+a)=E(X)+a$

#### Properties of Variance
- $Var(X+a)=Var(X)$
- $Var(aX)=a^2Var(X)$
- $Var(X+Y)=Var(X)+Var(Y)$ (This principle ONLY holds if the X and Y are independent random variables.)
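
These properties can be checked numerically. The sketch below uses two independently drawn Poisson samples (the seed, sample size, λ values, and constant `a` are arbitrary illustrative choices). The shifting and scaling rules hold exactly for sample statistics, while the variance of a sum only matches approximately because two finite samples are never perfectly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.poisson(4, size=100_000)
y = rng.poisson(10, size=100_000)  # drawn independently of x
a = 3

# Properties of expectation
sum_rule_ok = np.isclose(np.mean(x + y), np.mean(x) + np.mean(y))
scale_mean_ok = np.isclose(np.mean(a * x), a * np.mean(x))
shift_mean_ok = np.isclose(np.mean(x + a), np.mean(x) + a)

# Properties of variance
shift_var_ok = np.isclose(np.var(x + a), np.var(x))
scale_var_ok = np.isclose(np.var(a * x), a**2 * np.var(x))

# Var(X + Y) vs Var(X) + Var(Y): close, but not exact, for independent samples
var_sum_gap = abs(np.var(x + y) - (np.var(x) + np.var(y)))

print(sum_rule_ok, scale_mean_ok, shift_mean_ok, shift_var_ok, scale_var_ok)
print(var_sum_gap)
```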





In [171]:
num_goals =  stats.poisson.rvs(4,size = 100)
np.var(num_goals)


4.3275

Because num_goals_2 is num_goals multiplied by 2, its variance equals the variance of num_goals times two squared: 4 × 4.3275 = 17.31.

In [172]:
num_goals_2 = num_goals * 2
np.var(num_goals_2)

17.31

### Review
- The Poisson distribution and its parameter lambda (λ)
- How the probability mass function of the Poisson distribution changes with different values of lambda (λ)
- Calculating probabilities of specific values and ranges of values from the Poisson distribution
- Calculating probabilities of ranges using the cumulative distribution function of the Poisson distribution
- Generating random values from a distribution
- Principles of expectation and variance of various distributions
- Universal properties of expectation and variance