# Lab: probability via outcomes

Key point of this lab: if every outcome in a sample space $S$ is equally likely, then

$P(\text{each outcome}) = \dfrac{1}{\mid S \mid}$

where $\mid S \mid$ is the cardinality of $S$, in other words, the number of possible outcomes in the sample space. The probability of an event $E$ is then

$P(E) = \dfrac{ \text{number of outcomes in E} }{\text{number of outcomes in S}}= \dfrac{\mid E \mid}{\mid S \mid}$
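As a minimal illustration of this formula (an added sketch, not part of the original lab), consider a single fair six-sided die and the event "the roll is even". The names `S`, `E` and `prob_E` are chosen here only to mirror the notation above.

```python
from fractions import Fraction

# Sample space for one fair six-sided die: every outcome is equally likely
S = {1, 2, 3, 4, 5, 6}

# Event E: the roll is even
E = {outcome for outcome in S if outcome % 2 == 0}

# P(E) = |E| / |S|
prob_E = Fraction(len(E), len(S))
print(prob_E)  # 1/2
```

Using `Fraction` keeps the result exact, which makes it easy to compare against a hand calculation.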

## Exercise 1

In this exercise, we'll look at the possible outcomes when throwing a die twice.

We'll then compute a couple of probabilities associated with doing this.
First, let's create the sample space as a NumPy array below.

In [17]:
import numpy as np

In [19]:
def create_dice_tuples():
   i=[1,2,3,4,5,6]
   tuples=[]
   for n in i:
       j=1
       while j<7:
           tuples.append((n,j))
           j+=1
   return tuples

tuples = create_dice_tuples()

In [20]:
tuples

[(1, 1),
 (1, 2),
 (1, 3),
 (1, 4),
 (1, 5),
 (1, 6),
 (2, 1),
 (2, 2),
 (2, 3),
 (2, 4),
 (2, 5),
 (2, 6),
 (3, 1),
 (3, 2),
 (3, 3),
 (3, 4),
 (3, 5),
 (3, 6),
 (4, 1),
 (4, 2),
 (4, 3),
 (4, 4),
 (4, 5),
 (4, 6),
 (5, 1),
 (5, 2),
 (5, 3),
 (5, 4),
 (5, 5),
 (5, 6),
 (6, 1),
 (6, 2),
 (6, 3),
 (6, 4),
 (6, 5),
 (6, 6)]

In [23]:

sample_dice = np.array(tuples)

Look at the shape of the array to make sure we haven't made any mistakes.

In [24]:
sample_dice.shape # should be equal to (36,2)

(36, 2)
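The same sample space can probably be built more compactly with the standard library. The sketch below assumes `itertools.product` as a replacement for the nested loops; `tuples_alt` and `sample_dice_alt` are hypothetical names used so that nothing defined above is overwritten.

```python
from itertools import product

import numpy as np

faces = range(1, 7)

# All ordered pairs (first throw, second throw)
tuples_alt = list(product(faces, repeat=2))
sample_dice_alt = np.array(tuples_alt)

print(sample_dice_alt.shape)  # (36, 2)
```

The pairs come out in the same order as in `create_dice_tuples()`, from `(1, 1)` to `(6, 6)`.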

In [46]:
sum((sample_dice == 5).any(axis=1))

11

In [51]:
set_5 = sample_dice == (5)

In [56]:
set_5

array([[False, False],
       [False, False],
       [False, False],
       [False, False],
       [False,  True],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [False,  True],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [False,  True],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [False,  True],
       [False, False],
       [ True, False],
       [ True, False],
       [ True, False],
       [ True, False],
       [ True,  True],
       [ True, False],
       [False, False],
       [False, False],
       [False, False],
       [False, False],
       [False,  True],
       [False, False]])

In [53]:
true_5 = np.any(set_5, axis = 1)

In [55]:
prob_5 = true_5.sum()/len(sample_dice)
print(prob_5)

0.3055555555555556


Use Python to obtain the following probabilities:

#### a. What is the probability of throwing a 5 at least once?

First, use `sample_dice` to flag each pair that contains at least one 5, and count those pairs.

In [48]:
set_5 = sum((sample_dice == 5).any(axis=1))
set_5

11

Next, get the total number of outcomes in the sample space.

In [49]:
true_5 = len(sample_dice)
print(true_5)

36


Finally, divide the number of outcomes in the event by the total number of outcomes in the sample space.

In [50]:
prob_5 = set_5/true_5
print(prob_5)

0.3055555555555556
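One way to sanity-check this result is the complement rule: "at least one 5" is the opposite of "no 5 on either throw". The sketch below assumes the two throws are independent; the variable names are illustrative only.

```python
from fractions import Fraction

# Probability that a single throw is not a 5
p_not_5 = Fraction(5, 6)

# Two independent throws: probability that neither shows a 5
p_no_5_twice = p_not_5 ** 2

# Complement: at least one 5 in two throws
p_at_least_one_5 = 1 - p_no_5_twice
print(p_at_least_one_5)  # 11/36
```

The exact value 11/36 matches the count of 11 favourable pairs out of 36.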


In [66]:
set_5 = sample_dice == 5
set_6 = sample_dice == 6
set_5_6 = set_5 | set_6  # elementwise OR: the die shows a 5 or a 6
set_any_5_6 = set_5_6.any(axis=1)  # True for pairs with at least one 5 or 6

prob_5_6 = set_any_5_6.sum()/len(sample_dice)
print(prob_5_6)

0.5555555555555556


#### b. What is the probability of throwing a 5 or 6 at least once?

In [59]:
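set_5 = sample_dice == 5  # True wherever a die shows a 5
set_6 = sample_dice == 6  # True wherever a die shows a 6

In [63]:
# Elementwise OR. Adding the two separate counts (11 + 11) would
# count the pairs (5, 6) and (6, 5) twice.
set_5_6 = set_5 | set_6

In [None]:
set_any_5_6 = None
print(set_any_5_6) 

In [64]:
prob_5_6 = set_5_6.any(axis=1).sum()/len(sample_dice)
print(prob_5_6)

0.5555555555555556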
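The count for "a 5 or a 6 at least once" can also be checked with inclusion-exclusion. Adding the two separate counts alone (11 + 11 = 22) counts the pairs (5, 6) and (6, 5) twice, so the overlap has to be subtracted. This is a self-contained sketch; `sample`, `has_5` and `has_6` are names introduced here for illustration.

```python
import numpy as np

# Rebuild the 36 ordered pairs so this sketch is self-contained
sample = np.array([(i, j) for i in range(1, 7) for j in range(1, 7)])

has_5 = (sample == 5).any(axis=1)  # pairs with at least one 5
has_6 = (sample == 6).any(axis=1)  # pairs with at least one 6

n_5 = int(has_5.sum())
n_6 = int(has_6.sum())
n_both = int((has_5 & has_6).sum())  # pairs containing both a 5 and a 6

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_5_or_6 = n_5 + n_6 - n_both
print(n_5_or_6)  # 20
```

Dividing 20 by the 36 outcomes gives 5/9, roughly 0.556.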


In [79]:
np.sum((sample_dice[:,0] + sample_dice[:,1]) == 8)

5

#### c. What is the probability of the outcome having a sum of exactly 8?

In [81]:

sum_8 = np.sum((sample_dice[:,0] + sample_dice[:,1]) == 8)

In [83]:
prob_sum_8 = sum_8/len(sample_dice)
print(prob_sum_8)

0.1388888888888889


In [84]:

sum_dice= np.sum(sample_dice, axis = 1)
sum_8 = sum(sum_dice == 8)

prob_sum_8= sum_8/len(sample_dice)
print(prob_sum_8)

0.1388888888888889
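To see which outcomes make up this event, the boolean mask can be used to select the matching rows instead of only counting them. A small self-contained sketch, with `sample`, `mask_8` and `pairs_8` as illustrative names:

```python
import numpy as np

# Rebuild the 36 ordered pairs so this sketch is self-contained
sample = np.array([(i, j) for i in range(1, 7) for j in range(1, 7)])

# Boolean mask selecting the rows whose two throws add up to 8
mask_8 = sample.sum(axis=1) == 8

# Convert the selected rows to plain Python tuples for readable printing
pairs_8 = [tuple(int(v) for v in row) for row in sample[mask_8]]
print(pairs_8)  # [(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)]
```

Five pairs out of 36 gives the probability 5/36 printed above.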


## Exercise 2

At a supermarket, we randomly select customers and note whether each customer owns a Visa card (event A) or an Amex credit card (event B). Some customers own both cards.
You can assume that:

- P(A) = 0.5
- P(B) = 0.4
- P(A and B) = 0.25

1) Compute the probability that a selected customer has at least one of the two credit cards.

2) Compute the probability that a selected customer owns neither of the two credit cards.

3) Compute the probability that a customer owns *only* a Visa card.

(You can use Python here, but you don't have to.)
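One possible way to work through these three questions in Python is sketched below. It relies on the addition rule $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ and the complement rule; the variable names are illustrative, and `Fraction` is used only to avoid floating-point rounding.

```python
from fractions import Fraction

p_A = Fraction(1, 2)        # P(A) = 0.5, owns a Visa card
p_B = Fraction(2, 5)        # P(B) = 0.4, owns an Amex card
p_A_and_B = Fraction(1, 4)  # P(A and B) = 0.25, owns both

# 1) At least one card: addition rule
p_A_or_B = p_A + p_B - p_A_and_B

# 2) Neither card: complement of "at least one"
p_neither = 1 - p_A_or_B

# 3) Only a Visa card: in A but not in B
p_only_A = p_A - p_A_and_B

print(float(p_A_or_B), float(p_neither), float(p_only_A))  # 0.65 0.35 0.25
```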

## Exercise 3

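A teaching assistant is holding office hours so students can make appointments. She has 6 appointments scheduled today, 3 by male students and 3 by female students. The array below lists every possible order of "M" and "F" across the 6 slots; we assume each order is equally likely.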

In [85]:
import numpy as np

In [91]:
sample_mf= np.array([("M","M","M","F","F","F"), ("M","M","F","M","F","F"), ("M","M","F","F","M","F"),
                     ("M","M","F","F","F","M"), ("M","F","M","M","F","F"), ("M","F","M","F","F","M"),
                     ("M","F","M","F","M","F"), ("M","F","F","M","F","M"), ("M","F","F","M","M","F"),
                     ("M","F","F","F","M","M"), ("F","F","F","M","M","M"), ("F","F","M","F","M","M"), 
                     ("F","F","M","M","F","M"), ("F","F","M","M","M","F"), ("F","M","F","F","M","M"),
                     ("F","M","F","M","M","F"), ("F","M","F","M","F","M"), ("F","M","M","F","M","F"),
                     ("F","M","M","F","F","M"), ("F","M","M","M","F","F") ])

In [92]:
sample_mf.shape # get the shape of sample_mf

(20, 6)

In [93]:
sample_length= len(sample_mf)
print(sample_length)

20
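The 20 rows correspond to the number of ways to choose 3 of the 6 slots for the male students. As a hedged sketch, the same sample space could be generated rather than typed out, here using `itertools.combinations`; `sample_mf_alt` is a hypothetical name, and the row order differs from the hand-written array.

```python
from itertools import combinations
from math import comb

import numpy as np

n_slots, n_male = 6, 3

rows = []
# Choose which slots go to male students; the remaining slots go to female students
for male_slots in combinations(range(n_slots), n_male):
    rows.append(["M" if slot in male_slots else "F" for slot in range(n_slots)])

sample_mf_alt = np.array(rows)
print(sample_mf_alt.shape, comb(n_slots, n_male))  # (20, 6) 20
```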


In [117]:
np.sum((np.sum((sample_mf[:,:3] == "F"),axis = 1) > 1))


10

In [127]:
np.sum((np.sum((sample_mf[:,4:] == 'M'),axis =1) ==2))

4

#### 1. Calculate the probability that at least 2 out of the first 3 appointments are with female students

First, select the first 3 appointment slots and check for "F".

In [None]:
first_3_F = sample_mf[:,:3] == "F"  # first 3 slots only; "F" must be a string
first_3_F

In [None]:
num_F = np.sum(sample_mf[:,:3] == "F", axis=1)  # number of "F" in the first 3 slots, per row
print(num_F)

In [118]:
F_2plus = np.sum((np.sum((sample_mf[:,:3] == "F"),axis = 1) > 1))
print(F_2plus)

10


In [120]:
prob_F_2plus = F_2plus/len(sample_mf)
print(prob_F_2plus)

0.5


In [123]:
first_3_F = sample_mf[:,:3] == "F"
num_F = np.sum(first_3_F, axis=1)
F_2plus = np.sum(num_F > 1)
prob_F_2plus = F_2plus.sum()/sample_length
prob_F_2plus

0.5
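The count of 10 can be cross-checked without the array by counting slot choices directly. This sketch assumes, as above, that every ordering is equally likely; `ways_2F`, `ways_3F` and `total` are illustrative names.

```python
from math import comb

# Choose which slots the 3 female students occupy:
# k of the first three slots and 3 - k of the last three slots
ways_2F = comb(3, 2) * comb(3, 1)  # exactly 2 female students in the first 3 slots
ways_3F = comb(3, 3) * comb(3, 0)  # all 3 female students in the first 3 slots
total = comb(6, 3)                 # all ways to place the 3 female students

print(ways_2F + ways_3F, total)        # 10 20
print((ways_2F + ways_3F) / total)     # 0.5
```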

#### 2. Calculate the probability that after 4 appointment slots, all the female students have had an appointment

In [128]:
np.sum((np.sum((sample_mf[:,4:] == 'M'),axis =1) ==2)) / len(sample_mf)

0.2
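The cell above checks that the last two slots are both "M", which is equivalent to all 3 female students appearing in the first 4 slots. A counting sketch of the same probability, under the assumption that every ordering is equally likely (names are illustrative):

```python
from math import comb

# All 3 female students must fit into the first 4 slots
favourable = comb(4, 3)  # ways to choose 3 of the first 4 slots
total = comb(6, 3)       # ways to choose any 3 of the 6 slots

print(favourable, total)   # 4 20
print(favourable / total)  # 0.2
```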

You probably noticed that writing out the sample space was the most time-consuming part of the exercise, and that it would quickly become infeasible to do this by hand for, say, 10 or, even worse, 20 appointments in a row. You'll learn about methods that make this easy in the next lecture!

## Sources

https://www.datacamp.com/community/tutorials/statistics-python-tutorial-probability-1

https://www.youtube.com/watch?v=oYXYLljkC48&index=2&list=PLcmJYc2muOR9H96hGlUBV2DkviVZFmHAh