# Probability & data generation process
This chapter provides a basic introduction to probability concepts and a hands-on understanding of the data generating process. We'll look at a number of examples of modeling the data generating process and will conclude with modeling an eCommerce advertising simulation.

# 1. Probability basics
## 1.1 Queen and spade
In this example, you'll use the general addition rule $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ to calculate the probability that at least one of two events occurs. Consider a standard deck of cards (13 cards × 4 suits = 52 cards in total). One card is drawn at random. What is the probability of getting a queen or a spade? Here event A is the card being a queen and event B is the card being a spade. Think carefully about whether the two events have anything in common.

### Possible Answers
1. 17/52
2. 16/52
3. 15/52
4. 18/52

<div align=right>Answer: (2)</div>
Next, let's try to get this answer using simulation.
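
Here is a minimal sketch. It first counts the favorable cards exactly (4 queens + 13 spades minus the 1 queen of spades = 16 cards). It then estimates the same probability by drawing one card at random many times. The sketch reuses the `(suit, value)` deck representation from the poker exercises below. Two choices are illustrative assumptions rather than anything the exercise specifies: treating value `11` as the queen (any single value gives the same answer), and using NumPy's `default_rng` generator instead of the legacy `np.random.seed`.

```python
import itertools
from fractions import Fraction

import numpy as np

# (suit, value) deck with values 0-12, as in the poker exercises.
# Assumption for illustration: value 11 stands for the queen.
QUEEN = 11
deck = list(itertools.product(['Heart', 'Club', 'Spade', 'Diamond'], range(13)))

def is_queen_or_spade(card):
    """Return True if the card is a queen, a spade, or both."""
    suit, value = card
    return value == QUEEN or suit == 'Spade'

# Exact answer: count favorable cards (4 + 13 - 1 = 16) out of 52
exact = Fraction(sum(is_queen_or_spade(card) for card in deck), len(deck))

# Simulation: draw one card uniformly at random, many times
rng = np.random.default_rng(123)
n_sims = 100_000
draws = rng.integers(0, len(deck), size=n_sims)
hits = sum(is_queen_or_spade(deck[i]) for i in draws)
estimate = hits / n_sims

print("Exact probability     = {} ({:.4f})".format(exact, float(exact)))
print("Simulated probability = {:.4f}".format(estimate))
```

The exact value prints as the reduced fraction 4/13, which equals 16/52. With this many draws, the simulated estimate typically lands within a few thousandths of it.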

## 1.2 Two of a kind
Now let's use simulation to estimate probabilities. Suppose you've been invited to a game of poker at your friend's home. In this variation of the game, you are dealt five cards and the player with the better hand wins. You will use simulation to estimate the probabilities of getting certain hands, starting with the probability of getting at least two of a kind. Two of a kind is when you get two cards of different suits that share the same numeric value (e.g., 2 of hearts, 2 of spades, and 3 other cards).

By the end of this exercise, you will know how to use simulation to calculate probabilities for card games.

### Instructions:
* In the for loop, shuffle `deck_of_cards`.
* Use the `get()` method of a dictionary to count how many cards in the hand share each numeric value.
* Increment the counter, `two_kind`, when there are at least two cards having the same numeric value in the hand.

```python
import itertools

import numpy as np

# Build a 52-card deck as (suit, value) tuples, with values 0-12
deck_of_cards = list(itertools.product(['Heart', 'Club', 'Spade', 'Diamond'], range(13)))
np.random.seed(123)

# Shuffle deck & count card occurrences in the hand
n_sims, two_kind = 10000, 0
for i in range(n_sims):
    np.random.shuffle(deck_of_cards)
    hand, cards_in_hand = deck_of_cards[0:5], {}
    for card in hand:
        # Use .get() to count how many cards share each numeric value
        cards_in_hand[card[1]] = cards_in_hand.get(card[1], 0) + 1

    # Condition for getting at least 2 of a kind
    max_count = max(cards_in_hand.values())
    if max_count >= 2:
        two_kind += 1

print("Probability of seeing at least two of a kind = {} ".format(two_kind/n_sims))
```

Probability of seeing at least two of a kind = 0.4952 
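
As a cross-check, the same probability can be computed exactly with a counting argument. A hand fails to contain two of a kind only when all five numeric values differ. There are $\binom{13}{5} \cdot 4^5$ such hands out of $\binom{52}{5}$ hands in total, and "at least two of a kind" is the complement of that event. The sketch below uses Python's standard `math.comb`.

```python
from math import comb

# Hands in which all five numeric values differ:
# choose 5 of the 13 values, then one of 4 suits for each card
no_repeat_hands = comb(13, 5) * 4 ** 5
total_hands = comb(52, 5)

# At least two of a kind is the complement of "all values differ"
p_at_least_two = 1 - no_repeat_hands / total_hands
print("Exact probability of at least two of a kind = {:.4f}".format(p_at_least_two))
```

The exact value, about 0.4929, agrees closely with the simulated 0.4952.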


## 1.3 Game of thirteen
The French mathematician Pierre Rémond de Montmort, known for his early work in combinatorics and probability, studied a simple game called the Game of Thirteen. You have a deck of 13 cards, each numbered from 1 through 13. Shuffle this deck and draw cards one by one. A coincidence occurs when the number on a card matches the position in which it is drawn. For instance, if the 5th card you draw happens to be a 5, it's a coincidence. You win the game if you get through all the cards without any coincidences. Let's calculate the probability of winning at this game using simulation.

By completing this exercise, you will further strengthen your ability to cast abstract problems into the simulation framework for estimating probabilities.

### Instructions:
* For each game, draw __all__ the cards in `deck` __without__ replacement and assign them to `draw`.
* Check if there are any coincidences in the draw and, if there are, increment the `coincidences` counter by 1.
* Calculate winning probability as the fraction of games without any coincidences and use `prob_of_winning` to print your results.

```python
import numpy as np

np.random.seed(111)

# Pre-set constant variables
deck, sims, coincidences = np.arange(1, 14), 10000, 0

for _ in range(sims):
    # Draw all the cards without replacement to simulate one game
    draw = np.random.choice(deck, size=13, replace=False)
    # A coincidence occurs when a card's number equals its draw position (1-13)
    coincidence = (draw == np.arange(1, 14)).any()
    if coincidence:
        coincidences += 1

# Calculate probability of winning
prob_of_winning = 1 - coincidences/sims
print("Probability of winning = {}".format(prob_of_winning))
```

Probability of winning = 0.36950000000000005
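
Winning this game means drawing a shuffle in which no card lands in its own position, a classic derangement problem. By inclusion-exclusion, the probability of no coincidence among $n$ cards is $\sum_{k=0}^{n} (-1)^k / k!$, which approaches $1/e \approx 0.3679$ as $n$ grows. The sketch below evaluates that sum exactly with `fractions.Fraction`. The helper name `prob_no_coincidence` is hypothetical.

```python
import math
from fractions import Fraction

def prob_no_coincidence(n):
    """Exact probability that a shuffled deck of n cards numbered 1..n has
    no card in its own position, by inclusion-exclusion:
    sum over k = 0..n of (-1)**k / k!."""
    return sum(Fraction((-1) ** k, math.factorial(k)) for k in range(n + 1))

p13 = prob_no_coincidence(13)
print("Exact probability of winning = {:.4f}".format(float(p13)))
print("Limit 1/e                    = {:.4f}".format(1 / math.e))
```

For 13 cards, the exact value already agrees with $1/e$ to many decimal places, so the simulated 0.3695 is within ordinary sampling error.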


# 2. More probability concepts
## 2.1 The conditional urn
As we've learned, conditional probability is defined as the probability of an event given another event. To illustrate this concept, let's turn to an urn problem.

We have an urn that contains 7 white and 6 black balls. Four balls are drawn at random. We'd like to know the probability that the first and third balls are white, while the second and the fourth balls are black.

Upon completion, you will learn to manipulate simulations to calculate simple conditional probabilities.

### Instructions:
* Initialize the counter `success` to 0 and `sims` to 5000.
* Define a list, `urn`, with 7 white balls (`'w'`) and 6 black balls (`'b'`).
* Draw 4 balls without replacement and check to see if the first and third are white and second and fourth are black.
* Increment `success` if the above criterion is met.

```python
import numpy as np
np.random.seed(123)

# Initialize success, sims and urn (7 white balls, then 6 black balls)
success, sims = 0, 5000
urn = ['w'] * 7 + ['b'] * 6

for _ in range(sims):
    # Draw 4 balls without replacement
    draw = np.random.choice(urn, replace=False, size=4)
    # Success: white, black, white, black in that order
    if draw[0] == 'w' and draw[1] == 'b' and draw[2] == 'w' and draw[3] == 'b':
        success += 1

print("Probability of success = {}".format(success/sims))
```

Probability of success = 0.0722
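
The exact answer follows from the chain rule of conditional probability. Each draw's probability is conditioned on the balls already removed:
$P = \frac{7}{13} \cdot \frac{6}{12} \cdot \frac{6}{11} \cdot \frac{5}{10} = \frac{21}{286} \approx 0.0734$.
The sketch below automates that product for any colour sequence. The helper `prob_sequence` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

def prob_sequence(white, black, sequence):
    """Exact probability of drawing the colours in `sequence` (e.g. 'wbwb'),
    in order and without replacement, from an urn with the given counts.
    Chain rule: multiply the conditional probability of each draw given
    the draws before it."""
    p = Fraction(1)
    for colour in sequence:
        total = white + black
        if colour == 'w':
            p *= Fraction(white, total)
            white -= 1
        else:
            p *= Fraction(black, total)
            black -= 1
    return p

p_wbwb = prob_sequence(7, 6, 'wbwb')
print("Exact probability = {} ({:.4f})".format(p_wbwb, float(p_wbwb)))
```

The simulated 0.0722 is close to this exact value.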


## 2.2 Birthday problem
Now we'll use simulation to solve a famous probability puzzle, the birthday problem. The question sounds straightforward: _How many people do you need in a room to ensure at least a 50% chance that two of them share the same birthday?_

With 366 people in a 365-day year, we are 100% sure that at least two have the same birthday, but we only need to be 50% sure. Simulation gives us an elegant way of solving this problem.

Upon completion of this exercise, you will begin to understand how to cast problems in a simulation framework.

### Instructions 1/2:
* Initialize the sample space `days`, an array of the integers 1 to 365.
* Define a function `birthday_sim()` that takes as input the number of `people` and returns the probability that at least two share the same birthday.

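```python
import numpy as np

# Draw a sample of birthdays & check if each birthday is unique
days = np.arange(1, 366)  # sample space: days of a 365-day year
people = 2

def birthday_sim(people):
    """Estimate P(at least two of `people` share a birthday) by simulation."""
    sims, unique_birthdays = 2000, 0
    for _ in range(sims):
        # Draw one birthday per person, with replacement
        draw = np.random.choice(days, size=people, replace=True)
        # All birthdays are unique iff the set is as large as the draw
        if len(draw) == len(set(draw)):
            unique_birthdays += 1
    out = 1 - unique_birthdays / sims
    return out
```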

### Instructions 2/2:
* Call `birthday_sim()` in a while loop and break when the probability is greater than 50%.

```python
# Increase the group size until the estimated probability exceeds 0.5
while True:
    prop_bds = birthday_sim(people)
    if prop_bds > 0.5:
        break
    people += 1

print("With {} people, there's a 50% chance that two share a birthday.".format(people))
```

With 23 people, there's a 50% chance that two share a birthday.


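Yes, 23 seems surprisingly low, but it's enough to push the chance of a shared birthday above 50% (the exact probability is about 50.7%). Because each estimate comes from only 2,000 random trials and no seed is set, a rerun may report 22 or 24 instead.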
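
The exact probability is easy to compute directly. Everyone has a different birthday with probability $\prod_{k=0}^{n-1} \frac{365 - k}{365}$, and a shared birthday is the complement. The sketch below assumes 365 equally likely birthdays, ignoring leap years as the simulation does. The function name `prob_shared_birthday` is hypothetical.

```python
def prob_shared_birthday(n_people, n_days=365):
    """Exact probability that at least two of n_people share a birthday,
    assuming n_days equally likely birthdays (leap years ignored)."""
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k birthdays already taken
        p_all_distinct *= (n_days - k) / n_days
    return 1 - p_all_distinct

# Smallest group size whose exact probability exceeds 0.5
n_people = 1
while prob_shared_birthday(n_people) <= 0.5:
    n_people += 1
print("{} people: P(shared birthday) = {:.4f}".format(n_people, prob_shared_birthday(n_people)))
```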

## 2.3 Full house
Let's return to our poker game. Last time, we calculated the probability of getting at least two of a kind. This time we are interested in a full house. A full house is when you get two cards of different suits that share the same numeric value and three other cards that share a different numeric value (e.g., 2 of hearts & spades, jacks of clubs, diamonds, & spades).

A full house therefore requires two events to happen together: exactly three of a kind, and exactly two of a kind of another value. Its probability is the joint probability of these events. Equivalently, it is the probability of exactly three of a kind multiplied by the conditional probability of a pair of another value given three of a kind. Using the same code as before, modify the success condition to get the desired output. This exercise will teach you to estimate such probabilities in card games and build your foundation in framing abstract problems for simulation.

### Instructions:
* Shuffle `deck_of_cards`.
* Use a dictionary with `.get()` to count how many cards in the hand share each numeric value.
* Increment the counter `full_house` when there is a full house in the hand (2 of one kind, 3 of the other).

```python
import itertools

import numpy as np

np.random.seed(123)
deck = list(itertools.product(['Heart', 'Club', 'Spade', 'Diamond'], range(13)))

# Shuffle deck & count card occurrences in the hand
n_sims, full_house, deck_of_cards = 50000, 0, deck.copy()
for i in range(n_sims):
    np.random.shuffle(deck_of_cards)
    hand, cards_in_hand = deck_of_cards[0:5], {}
    for card in hand:
        # Use .get() method to count occurrences of each numeric value
        cards_in_hand[card[1]] = cards_in_hand.get(card[1], 0) + 1

    # Full house: one value appears 3 times and another appears 2 times
    condition = max(cards_in_hand.values()) == 3 and min(cards_in_hand.values()) == 2
    if condition:
        full_house += 1
print("Probability of seeing a full house = {}".format(full_house/n_sims))
```

Probability of seeing a full house = 0.0014


Look at how small this probability is compared to that of at least two of a kind.
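
A counting argument confirms the simulated value. Choose the value of the triple (13 ways) and 3 of its 4 suits. Then choose a different value for the pair (12 ways) and 2 of its 4 suits. Divide that count by the $\binom{52}{5}$ possible hands. The sketch below uses Python's standard `math.comb`.

```python
from math import comb

# Value of the triple (13 ways) and its 3 suits, then a different value
# for the pair (12 ways) and its 2 suits
full_house_hands = 13 * comb(4, 3) * 12 * comb(4, 2)
total_hands = comb(52, 5)
p_full_house = full_house_hands / total_hands
print("{} full houses out of {} hands: p = {:.5f}".format(
    full_house_hands, total_hands, p_full_house))
```

The exact value, about 0.00144, matches the simulated 0.0014.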