# Probability For Data Science

## 1. Addition Rule
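P(A or B) = P(A) + P(B) - P(A and B)

The intersection is subtracted because outcomes that belong to both A and B would otherwise be counted twice.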

In [3]:
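def prob_a_or_b(a, b, all_possible_outcomes):
    # Assumes every outcome in all_possible_outcomes is equally likely
    total = len(all_possible_outcomes)
    prob_a = len(a) / total
    prob_b = len(b) / total
    # Outcomes in both a and b (requires a and b to be sets)
    inter = a.intersection(b)
    prob_inter = len(inter) / total
    # Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
    return prob_a + prob_b - prob_inter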

### Exercise 1
Rolling a die once and getting an even number or an odd number.

In [4]:
evens = {2, 4, 6}
odds = {1, 3, 5}
all_possible_rolls = {1, 2, 3, 4, 5, 6}

In [5]:
prob_a_or_b(evens, odds, all_possible_rolls)

1.0

In [6]:
type(evens)  # intersection() is a set method, so the events must be sets

set

### Exercise 2
Rolling a die once and getting an odd number or a number greater than 2.

In [7]:
greater_than_two = {3, 4, 5, 6}
odds = {1, 3, 5}
all_possible_rolls = {1, 2, 3, 4, 5, 6}

In [8]:
prob_a_or_b(greater_than_two, odds, all_possible_rolls)

0.8333333333333333
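As a sanity check on the addition rule, here is a minimal sketch that counts the union directly with the set `|` operator and compares it with the formula. It uses `fractions.Fraction` (our choice, not part of the exercise) to keep the arithmetic exact. The overlap {3, 5} is exactly what the rule subtracts so that it is not counted twice.

```python
from fractions import Fraction

greater_than_two = {3, 4, 5, 6}
odds = {1, 3, 5}
all_possible_rolls = {1, 2, 3, 4, 5, 6}
total = len(all_possible_rolls)

# Direct count: outcomes in A or B (set union)
direct = Fraction(len(greater_than_two | odds), total)

# Addition rule: P(A) + P(B) - P(A and B)
by_rule = (Fraction(len(greater_than_two), total)
           + Fraction(len(odds), total)
           - Fraction(len(greater_than_two & odds), total))

print(direct, by_rule)  # 5/6 5/6
```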

### Exercise 3
Selecting a diamond card or a face card from a standard deck of cards.

In [9]:
diamond_cards = {'ace_diamond', '2_diamond', '3_diamond', '4_diamond', '5_diamond', 
                 '6_diamond', '7_diamond', '8_diamond', '9_diamond', '10_diamond', 
                 'jack_diamond', 'queen_diamond', 'king_diamond'}

face_cards = {'jack_diamond', 'jack_spade', 'jack_heart', 'jack_club', 'queen_diamond', 
              'queen_spade', 'queen_heart', 'queen_club', 'king_diamond', 'king_spade', 'king_heart', 'king_club'}

all_possible_cards = {'ace_diamond', '2_diamond', '3_diamond', '4_diamond', '5_diamond', '6_diamond', 
                      '7_diamond', '8_diamond', '9_diamond', '10_diamond', 'jack_diamond', 'queen_diamond', 
                      'king_diamond', 'ace_heart', '2_heart', '3_heart', '4_heart', '5_heart', '6_heart', 
                      '7_heart', '8_heart', '9_heart', '10_heart', 'jack_heart', 'queen_heart', 'king_heart', 
                      'ace_spade', '2_spade', '3_spade', '4_spade', '5_spade', '6_spade', '7_spade', '8_spade', 
                      '9_spade', '10_spade', 'jack_spade', 'queen_spade', 'king_spade', 'ace_club', '2_club', 
                      '3_club', '4_club', '5_club', '6_club', '7_club', '8_club', '9_club', '10_club', 'jack_club', 
                      'queen_club', 'king_club'}

In [10]:
prob_a_or_b(diamond_cards, face_cards, all_possible_cards)

0.4230769230769231
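Typing out all 52 card names by hand is error-prone. A minimal alternative sketch builds the same `rank_suit` labels with `itertools.product`; the `ranks` and `suits` lists are our own helpers, not part of the original exercise.

```python
from fractions import Fraction
from itertools import product

ranks = ['ace', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'jack', 'queen', 'king']
suits = ['diamond', 'heart', 'spade', 'club']

# Build the same 'rank_suit' labels used above, e.g. 'queen_heart'
all_possible_cards = {f'{rank}_{suit}' for rank, suit in product(ranks, suits)}
diamond_cards = {card for card in all_possible_cards if card.endswith('_diamond')}
face_cards = {f'{rank}_{suit}' for rank in ('jack', 'queen', 'king') for suit in suits}

total = len(all_possible_cards)
p = (Fraction(len(diamond_cards), total)
     + Fraction(len(face_cards), total)
     - Fraction(len(diamond_cards & face_cards), total))
print(total, p)  # 52 11/26
```

The exact answer 11/26 (22 of 52 cards) is the same value as the 0.4230769230769231 printed above.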

## 2. Random Variables
A random variable is, in its simplest form, a function: it assigns a number to each outcome of a random process. In probability, we often use random variables to represent random events. For example, a random variable could represent the outcome of a die roll, which is any whole number from one to six.
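To make the "function" idea concrete, here is a small sketch (the names `outcomes` and `X` are illustrative) that defines a random variable `X` mapping each result of two coin flips to the number of heads, then derives its distribution by counting equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every ordered result of flipping a coin twice
outcomes = [''.join(flips) for flips in product('HT', repeat=2)]  # ['HH', 'HT', 'TH', 'TT']

def X(outcome):
    """Random variable: maps an outcome to its number of heads."""
    return outcome.count('H')

# Each outcome is equally likely, so P(X = x) = (# outcomes with X = x) / 4
counts = Counter(X(o) for o in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(distribution)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```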

### Exercise 1
The following code simulates the outcome of rolling a fair die twice using ``np.random.choice()``:

In [11]:
import numpy as np
die_6 = range(1, 7)
rolls = np.random.choice(die_6, size = 2, replace = True)
print(rolls)

[4 3]
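Each run of the cell above produces different rolls. A sketch of one way to make simulations repeatable, assuming NumPy 1.17 or later for `np.random.default_rng()`, is to draw from a seeded `Generator`; the seed value 42 is arbitrary.

```python
import numpy as np

die_6 = np.arange(1, 7)  # faces 1..6, like range(1, 7) above

# A Generator seeded with a fixed value produces the same sequence every run
rng = np.random.default_rng(seed=42)
rolls = rng.choice(die_6, size=2, replace=True)

# Re-creating the generator with the same seed reproduces the same rolls
rolls_again = np.random.default_rng(seed=42).choice(die_6, size=2, replace=True)

print(rolls, rolls_again)
```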


### Exercise 2
Change the value of `num_rolls` so that `results_1` has the results of rolling a die ten times.

In [12]:
num_rolls = 10
results_1 = np.random.choice(die_6, size = num_rolls, replace = True)
print(results_1)

[2 4 5 3 4 4 5 4 6 5]
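Ten rolls are too few to reveal that the die is fair. As a sketch under our own assumptions (60,000 rolls, an arbitrary seed, and `Generator.integers()` in place of `np.random.choice()`), the relative frequency of each face settles near 1/6:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for repeatability
num_rolls = 60_000
rolls = rng.integers(1, 7, size=num_rolls)  # integers in [1, 7), i.e. 1..6

# Relative frequency of each face; index 0 of bincount is unused (no face 0)
freqs = np.bincount(rolls, minlength=7)[1:] / num_rolls
print(freqs)  # each entry should be close to 1/6, about 0.1667
```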


### Exercise 3
- Using the `range()` function, create a 12-sided die called `die_12`, following the same approach as `die_6`.
- Simulate rolling `die_12` ten times, and save the rolls as `results_2`. Use the ``np.random.choice()`` function to simulate the rolls, and be sure to print out your results!

In [13]:
die_12 = range(1, 13)

In [14]:
results_2 = np.random.choice(die_12, size = 10, replace = True)
results_2

array([ 5,  7,  3,  7,  4,  5, 11,  6,  5,  6])

## 3. Discrete and Continuous Random Variables
1. Discrete Random Variables

    - Random variables with a countable number of possible values are called discrete random variables. For example, the outcome of rolling a regular 6-sided die is a discrete random variable because it can only take the values printed on the die (1 through 6).
    
    - Discrete random variables are also common when observing counting events, such as how many people entered a store on a randomly selected day. In this case, the values are countable in that they are limited to whole numbers (you can’t observe half of a person).
    
    
2. Continuous Random Variables

    - When the possible values of a random variable are uncountable, it is called a continuous random variable. These are generally measurement variables and are uncountable because measurements can always be more precise – meters, centimeters, millimeters, etc.
    
    - For example, the temperature in Los Angeles on a randomly chosen day is a continuous random variable. We can always be more precise about the temperature by expanding to another decimal place (96 degrees, 96.44 degrees, 96.437 degrees, etc.).
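A quick way to see the difference is to sample both kinds of variable and count the distinct values. In this sketch the temperatures are drawn from a normal distribution with a mean of 96 and a standard deviation of 3; those parameters are illustrative assumptions, not data.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

die_rolls = rng.integers(1, 7, size=1000)              # discrete: faces 1..6 only
temperatures = rng.normal(loc=96, scale=3, size=1000)  # continuous: any real value

# A discrete variable keeps repeating the same few values;
# a continuous one essentially never repeats an exact value
print(len(np.unique(die_rolls)))     # at most 6
print(len(np.unique(temperatures)))  # 1000 distinct values
```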

## 4. Calculating Probabilities using Python
A probability mass function (PMF) describes the probability distribution of a discrete random variable: for each possible value, it gives the probability of observing exactly that value.

The `binom.pmf()` method from the `scipy.stats` library calculates the PMF of the binomial distribution at any value. It takes three arguments:
- `k`: the number of successes of interest
- `n`: the number of trials
- `p`: the probability of success
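Under the hood, the binomial PMF is P(X = k) = C(n, k) p^k (1 - p)^(n - k), where C(n, k) counts the ways to place k successes among n trials. A minimal pure-Python sketch of that formula (the helper name `binom_pmf` is hypothetical, and `math.comb` requires Python 3.8+):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binom_pmf(6, 10, 0.5))  # 0.205078125, i.e. 210 / 1024
```

SciPy's result in Exercise 1 differs from this only by floating-point rounding.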

### Exercise 1
For example, suppose we flip a fair coin 10 times and count the number of heads. We can use the `binom.pmf()` function to calculate the probability of observing 6 heads as follows:

In [15]:
import scipy.stats as stats

In [16]:
stats.binom.pmf(6, 10, 0.5)

0.2050781249999999

### Exercise 2
Calculate `prob_2`, the probability of observing 7 heads out of 20 fair coin flips, by passing the values directly to the `stats.binom.pmf()` method. Then print `prob_2`.

In [17]:
prob_2 = stats.binom.pmf(7, 20, 0.5)
prob_2

0.07392883300781249

### Exercise 3
We can calculate the probability of observing between 2 and 4 heads (inclusive) from 10 coin flips by summing the PMF at 2, 3, and 4:

In [18]:
sum(stats.binom.pmf([2, 3, 4], 10, 0.5))

0.36621093749999994

## 5. Cumulative Distribution Function
The cumulative distribution function (CDF) of a discrete random variable can be derived from the probability mass function. Instead of the probability of observing a specific value, the CDF gives the probability of observing that value or less: it is the running sum of the PMF, F(x) = P(X ≤ x).
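The running-sum relationship can be sketched with exact fractions for 10 fair coin flips; `itertools.accumulate` builds F(0), F(1), ..., F(10) from the PMF values:

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n, p = 10, Fraction(1, 2)  # 10 fair coin flips, exact arithmetic

# PMF for every possible number of heads, 0..n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# CDF: F(x) = P(X <= x) is the running total of the PMF
cdf = list(accumulate(pmf))

print(cdf[6], float(cdf[6]))  # 53/64 0.828125
print(cdf[n])                 # 1: every outcome is 10 or fewer heads
```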

### Exercise 1
Calculating the probability of observing 6 or fewer heads (0 to 6 heads) from 10 fair coin flips looks like this in code:

In [19]:
import scipy.stats as stats

In [20]:
stats.binom.cdf(6, 10, 0.5)

0.828125

### Exercise 2
To calculate the probability of observing between 4 and 8 heads (inclusive) from 10 fair coin flips, subtract the value of the cumulative distribution function at 3 from its value at 8. We subtract the CDF at 3 rather than at 4 because P(X ≤ 4) includes the probability of exactly 4 heads, which belongs in the range:

In [21]:
stats.binom.cdf(8, 10, 0.5) - stats.binom.cdf(3, 10, 0.5)

0.8173828125
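The choice of 3 rather than 4 matters because the variable is discrete. This sketch (with hypothetical helpers `pmf` and `cdf`, exact fractions) checks that F(8) - F(3) equals the direct sum P(X = 4) + ... + P(X = 8), while F(8) - F(4) leaves out P(X = 4):

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 2)

def pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def cdf(x):
    """P(X <= x): sum of the PMF from 0 through x."""
    return sum(pmf(k) for k in range(x + 1))

direct = sum(pmf(k) for k in range(4, 9))  # P(X = 4) + ... + P(X = 8)
via_cdf = cdf(8) - cdf(3)                  # subtract at 3 so that 4 stays included
wrong = cdf(8) - cdf(4)                    # drops P(X = 4)

print(direct == via_cdf, float(via_cdf))   # True 0.8173828125
print(float(wrong))                        # 0.6123046875, too small
```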

### Exercise 3 
To calculate the probability of observing more than 6 heads from 10 fair coin flips, we use the complement rule, P(X > 6) = 1 - P(X ≤ 6): we subtract the value of the cumulative distribution function at 6 from 1. In code, this looks like the following:

In [22]:
1 - stats.binom.cdf(6, 10, 0.5)

0.171875

## 6. Probability Density Functions
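Just as probability mass functions describe discrete random variables, probability density functions (PDFs) describe continuous random variables. A PDF defines the probability distribution of a continuous random variable across all the values it can take on. Unlike a PMF, the height of a PDF at a single point is not a probability: for a continuous random variable, the probability of any exact value is 0, and probabilities come from areas under the curve.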

When graphed, a probability density function is a curve across all possible values the random variable can take on, and the total area under this curve adds up to 1.

We can calculate the area under the curve using the cumulative distribution function for the given probability distribution.
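To connect the two ideas, a minimal sketch using the standard library's `statistics.NormalDist` (standing in for `scipy.stats.norm`, Python 3.8+) approximates the area under the PDF with the trapezoid rule and compares it with the CDF. The helper `area_under_pdf`, the step count, and the 10-standard-deviation starting point are illustrative choices.

```python
from statistics import NormalDist

heights = NormalDist(mu=167.64, sigma=8)

def area_under_pdf(dist, lo, hi, steps=20_000):
    """Approximate the area under dist.pdf between lo and hi (trapezoid rule)."""
    width = (hi - lo) / steps
    ys = [dist.pdf(lo + i * width) for i in range(steps + 1)]
    return width * (sum(ys) - (ys[0] + ys[-1]) / 2)

# Start 10 standard deviations below the mean: the area further left is negligible
approx = area_under_pdf(heights, heights.mean - 10 * heights.stdev, 158)
exact = heights.cdf(158)
print(approx, exact)  # both about 0.1141
```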

### Exercise 1
Suppose women’s heights are normally distributed with a mean of 167.64 cm and a standard deviation of 8 cm, that is, they follow the Normal(167.64, 8) distribution.

Let’s say we want to know the probability that a randomly chosen woman is less than 158 cm tall. The cumulative distribution function at 158 gives the area under the probability density function curve to the left of 158, which is exactly that probability.

In [23]:
import scipy.stats as stats

In [24]:
stats.norm.cdf(158, 167.64, 8)

0.11410165094812996

### Exercise 2
Following the same Normal(167.64, 8) distribution, assign to the variable `prob` the probability that a randomly chosen woman is less than 175 cm tall. Use the `stats.norm.cdf()` method.

Be sure to print `prob`.

In [25]:
prob = stats.norm.cdf(175, 167.64, 8)
prob

0.8212136203856288

## 7. Probability Density Functions and Cumulative Distribution Function
To calculate the probability that a continuous random variable falls within a range of values, take the difference of the cumulative distribution function at the two ends of the range: P(a ≤ X ≤ b) = F(b) - F(a). This is essentially the same process as for discrete distributions, except that the endpoints need no special handling: a single value has probability 0, so it does not matter whether a and b are included.
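The same calculation can be sketched without SciPy using `statistics.NormalDist` from the standard library (Python 3.8+), here for the Normal(167.74, 8) heights used in the exercises of this section:

```python
from statistics import NormalDist

heights = NormalDist(mu=167.74, sigma=8)

# P(165 <= X <= 175) = F(175) - F(165)
p_range = heights.cdf(175) - heights.cdf(165)

# P(X > 172) = 1 - F(172)
p_taller = 1 - heights.cdf(172)

print(round(p_range, 4), round(p_taller, 4))  # 0.4519 0.2972
```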

### Exercise 1
Let’s say we wanted to calculate the probability of randomly observing a woman between 165 cm and 175 cm tall, assuming heights follow the Normal(167.74, 8) distribution.

In [26]:
stats.norm.cdf(175, 167.74, 8) - stats.norm.cdf(165, 167.74, 8)

0.45194145326220503

### Exercise 2
Let’s say we wanted to calculate the probability of observing a woman taller than 172 centimeters, assuming heights still follow the Normal(167.74, 8) distribution. 

In [27]:
1 - stats.norm.cdf(172, 167.74, 8)

0.2971898709083267

### Exercise 3
The daily temperature in the Galapagos Islands follows a Normal distribution with a mean of 20 degrees Celsius and a standard deviation of 3 degrees.

What is the probability that the temperature on a randomly selected day will be between 18 and 25 degrees Celsius?

In [28]:
stats.norm.cdf(25, 20, 3) - stats.norm.cdf(18, 20, 3)

0.6997171101802624

### Exercise 4
The daily temperature in the Galapagos Islands follows a Normal distribution with a mean of 20 degrees Celsius and a standard deviation of 3 degrees.

What is the probability that the temperature on a randomly selected day will be greater than 24 degrees Celsius?

In [29]:
1 - stats.norm.cdf(24, 20, 3)

0.09121121972586788

## 8. Poisson Distribution
- Many probability distributions exist for modeling different kinds of random events. In the previous sections, we used the binomial distribution to represent events such as the number of heads in a series of coin flips, and the normal distribution to represent events such as the height of a randomly selected woman.

- The Poisson distribution is another common distribution. It describes the number of times an event occurs within a fixed interval of time or space, when events occur independently at a constant average rate. For example, the Poisson distribution can be used to describe the number of cars that pass through a specific intersection between 4pm and 5pm on a given day. It can also describe the number of calls received in an office between 1pm and 3pm on a certain day.

- The Poisson distribution is defined by the rate parameter, symbolized by the Greek letter lambda, λ.

- Lambda represents the expected value (the average value) of the distribution. For example, if our expected number of customers between 1pm and 2pm is 7, then we would set the parameter of the Poisson distribution to 7.
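The Poisson PMF has a closed form, P(X = k) = λ^k e^(-λ) / k!. A minimal sketch of it in plain Python (the helper name `poisson_pmf` is hypothetical):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam**k * e**(-lam) / k!"""
    return lam**k * exp(-lam) / factorial(k)

print(round(poisson_pmf(6, 10), 4))  # 0.0631
```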

### Example 1
Suppose that we expect it to rain 10 times in the next 30 days. The number of times it rains in the next 30 days is then “Poisson distributed” with lambda = 10. We can calculate the probability that it rains exactly 6 times as follows:

In [30]:
import scipy.stats as stats

In [31]:
stats.poisson.pmf(6, 10)

0.06305545800345125

### Example 2

As before, if we expect it to rain 10 times in the next 30 days, the number of times it rains is “Poisson distributed” with lambda = 10. We can calculate the probability that it rains between 12 and 14 times (inclusive) by adding the PMF at 12, 13, and 14:

In [32]:
stats.poisson.pmf(12, 10) + stats.poisson.pmf(13, 10) + stats.poisson.pmf(14, 10)

0.21976538076223123

### Example 3
We work in a call center where the expected number of calls between 9am and 10am is 15. What is the probability that we would see exactly 15 calls in that time frame?

In [33]:
stats.poisson.pmf(15, 15)

0.1024358666645339

### Example 4
We work in a call center where the expected number of calls between 9am and 10am is 15. What is the probability that we would get between 7 and 9 calls (inclusive) in that time frame?

In [34]:
stats.poisson.pmf(7, 15) + stats.poisson.pmf(8, 15) + stats.poisson.pmf(9, 15)

0.062221761061894816

## 9. Calculating Probabilities of a Range using the Cumulative Distribution Function
We can use the ``poisson.cdf()`` method in the ``scipy.stats`` library to evaluate the probability of observing a specific number or less given the expected value of a distribution.
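As with the binomial distribution, the Poisson CDF is the running sum of the PMF. A minimal pure-Python sketch (hypothetical helper `poisson_cdf`) for the rain example:

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam): sum of the PMF from 0 through x."""
    return sum(lam**k * exp(-lam) / factorial(k) for k in range(x + 1))

print(round(poisson_cdf(6, 10), 4))  # 0.1301
```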

### Example 1
For example, if we wanted to calculate the probability of observing 6 or fewer rain events in the next 30 days when we expected 10, we could do the following:

In [36]:
stats.poisson.cdf(6, 10)
# This means that there is roughly a 13% chance that there will be 6 or fewer rainfalls in the month in question.

0.130141420882483

### Example 2
If we wanted to calculate the probability of observing 12 or more rain events in the next 30 days when we expected 10, we could subtract the probability of 11 or fewer events from 1, since P(X ≥ 12) = 1 - P(X ≤ 11):

In [37]:
1 - stats.poisson.cdf(11, 10)

0.30322385369689386

### Example 3
While still expecting 10 rainfalls in the next 30 days, we could use the following code to calculate the probability of observing between 12 and 18 rainfall events (inclusive). We subtract the CDF at 11 rather than at 12 so that 12 itself stays in the range:

In [39]:
stats.poisson.cdf(18, 10) - stats.poisson.cdf(11, 10)

0.29603734909303947

### Example 4
Working at a call center where the average number of calls between 9am and 10am is 15 calls, what is the probability of observing more than 20 calls?

In [40]:
1 - stats.poisson.cdf(20, 15)

0.08297091003146029

### Example 5
What is the probability of observing between 17 and 21 calls (inclusive) when the expected number of calls is 15?

In [41]:
stats.poisson.cdf(21, 15) - stats.poisson.cdf(16, 15)

0.2827703929341844

## 10. Spread of the Poisson Distribution
- Probability distributions also have calculable variances. Variance measures how spread out the values of a distribution are around its expected value. For the Poisson distribution, the variance is simply lambda (λ), meaning that the expected value and the variance of a Poisson distribution are equal.
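The claim that the variance equals lambda can be checked from the definition Var(X) = Σ (k - E[X])² P(X = k). This sketch truncates the infinite sum at k = 99, an assumption that is harmless here because the remaining probability for lambda = 4 is vanishingly small:

```python
from math import exp, factorial

lam = 4
ks = range(100)  # truncated support 0..99; P(X >= 100) is negligible for lam = 4

pmf = [lam**k * exp(-lam) / factorial(k) for k in ks]
mean = sum(k * p for k, p in zip(ks, pmf))
variance = sum((k - mean) ** 2 * p for k, p in zip(ks, pmf))

print(round(mean, 6), round(variance, 6))  # 4.0 4.0
```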

### Example 1
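- We can calculate the variance of a sample using the ``numpy.var()`` function. The code below draws 1000 random values from a Poisson distribution with lambda = 4 using `stats.poisson.rvs()`; the sample variance comes out close to 4.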

In [48]:
import numpy as np

In [66]:
rand_vars = stats.poisson.rvs(4, size = 1000)
np.var(rand_vars)

3.997519

### Example 2
Another way to see how the spread grows as lambda increases is to look at the range of a sample (its minimum and maximum values). The following code draws 1000 random variables from the Poisson distribution with `lambda = 4` and then prints the minimum and maximum values observed using Python's built-in `min()` and `max()` functions:

In [69]:
rand_vars = stats.poisson.rvs(4, size = 1000)
min(rand_vars), max(rand_vars)

(0, 11)

If we increase the value of lambda to 10, let’s see how the minimum and maximum values change:

In [72]:
rand_vars = stats.poisson.rvs(10, size = 1000)
min(rand_vars), max(rand_vars)

# These values are spread wider, indicating a larger variance.

(2, 23)
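Ranges from a single sample are noisy. A steadier comparison, sketched here with NumPy's `Generator.poisson()` and an arbitrary seed (our choices, not the original code's), uses a larger sample and reports the mean and variance, which should both sit near lambda:

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed for repeatability

samples = {lam: rng.poisson(lam, size=100_000) for lam in (4, 10)}

for lam, sample in samples.items():
    # Mean and variance should both be close to lam; the range widens as lam grows
    print(f"lambda={lam}: mean={sample.mean():.2f}, var={sample.var():.2f}, "
          f"range=({sample.min()}, {sample.max()})")
```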

## Extras
### Example 1
You work at an ambulance dispatch where the number of calls that come in daily follows the Poisson distribution with lambda equal to 9. There’s a rule that a team can go on no more than 12 calls a day. How often will the number of calls exceed 12?

In [73]:
calls = 1 - stats.poisson.cdf(12, 9)
calls

0.1242265708290351

### Example 2
Let’s say that you have to call in a backup team if you have 10 or more calls in a given day, but you don’t want to call in the backup team unless it will really be needed. Since the main team can handle up to 12 calls, the backup team is called but not needed when there are between 10 and 12 calls. What is the probability of that happening?

In [74]:
false_backup = stats.poisson.cdf(12, 9) - stats.poisson.cdf(9, 9)
false_backup

0.2883651848390232

### Example 3
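A certain tennis star has a first-serve rate of 62%. Let’s say they serve 80 times in a given match. The number of successful first serves follows a binomial distribution with n = 80 and p = 0.62, so its expected value is n × p. What is the expected number of first serves they make?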

In [75]:
expected_serves = 80*0.62
expected_serves

49.6

### Example 4
At the same first-serve rate, what is the variance of the number of first serves this player makes? For a binomial distribution, the variance is n × p × (1 - p).

In [76]:
variance_serves = 80*0.62*(1-0.62)
variance_serves

18.848000000000003
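The shortcut formulas n × p and n × p × (1 - p) can be checked against the definitions of expected value and variance by summing over the Binomial(80, 0.62) PMF. This sketch uses exact fractions, which also avoids the floating-point rounding visible in the output above:

```python
from fractions import Fraction
from math import comb

n, p = 80, Fraction(62, 100)  # 80 serves, 62% first-serve rate (exact fraction)

pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
mean = sum(k * prob for k, prob in pmf.items())
variance = sum((k - mean) ** 2 * prob for k, prob in pmf.items())

# These match the shortcut formulas n*p and n*p*(1 - p)
print(float(mean), float(variance))  # 49.6 18.848
```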