# WiDS Probability and Naive Bayes - Probability Exercises

## Import Libraries

In [None]:
from __future__ import division     # Make "/" perform true (floating-point) division on integers in Python 2
import math                         # Get access to math functions that can help with probability calculations

## Basic Probability Calculations

Calculate the answers to these basic probability problems using arithmetic operations in Python.

#### Tip: use the round() function and string formatting to tidy up long float results.

Q1. What is the likelihood of rolling a 2 *or* a 3 on a fair six-sided die?

In [None]:
# The number of favorable outcomes is 2, the number of possible outcomes is 6

print "The probability of rolling a 2 or a 3 on a six-sided die is:", round(2/6, 2)  # using round function to get 2 decimal places
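As a cross-check, the same answer can be reached by listing the sample space and counting the favorable outcomes. This is only a minimal sketch; the names `sample_space`, `favorable` and `prob_2_or_3` are illustrative and not part of the original exercise.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
sample_space = [1, 2, 3, 4, 5, 6]

# Outcomes that count as a success: rolling a 2 or a 3
favorable = [face for face in sample_space if face in (2, 3)]

# Rolling a 2 and rolling a 3 cannot happen together (mutually exclusive),
# so counting them together is the same as adding P(2) + P(3)
prob_2_or_3 = Fraction(len(favorable), len(sample_space))

print("P(2 or 3) = {} = {:.2f}".format(prob_2_or_3, float(prob_2_or_3)))
```

Using `Fraction` keeps the result exact (1/3) until it is formatted for display.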

Q2. What is the probability of getting a ham sandwich, when randomly selecting from a platter that has 5 turkey, 3 vegetarian, 4 beef, 2 ham, and 5 chicken?

In [None]:
total_outcomes = 5 + 3 + 4 + 2 + 5  # all of the sandwich possibilities
favorable_outcomes = 2              # all of the ham sandwich possibilities
ham_prob = favorable_outcomes / total_outcomes
print "Probability of getting a ham sandwich is:", "%.2f" %ham_prob  # using string formatting to get 2 decimal place float


Q3. How likely is it that you will get both heads on a fair coin flip AND an even number on a fair six-sided die?

In [None]:
prob_of_H = 1/2     # 1 favorable outcome (Heads) over 2 Total outcomes (Heads and Tails) gives 1/2
prob_of_even = 1/2  # Even dice roll options are 2,4,6. Total options 1,2,3,4,5,6. This gives 3/6, fraction reduces to 1/2

prob_of_H_and_even = prob_of_H * prob_of_even    # Assumes that these probabilities are independent.
print "The probability of getting both 'Heads' and an even number is:", "{:.0%}".format(prob_of_H_and_even)
                                                                    # using built-in format method to get this as a percentage
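The multiplication rule for independent events can be checked by enumerating every (coin, die) pair and counting the ones that satisfy both conditions. The sketch below assumes all 12 pairs are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pair is equally likely: 2 * 6 = 12 outcomes
outcomes = list(product(coin, die))

# Keep only the outcomes with Heads and an even roll
heads_and_even = [(c, d) for c, d in outcomes if c == "H" and d % 2 == 0]

prob_enumerated = Fraction(len(heads_and_even), len(outcomes))
print("P(Heads and even) = {}".format(prob_enumerated))
```

Three of the twelve pairs qualify, which matches 1/2 * 1/2 = 1/4.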

Q4. Aliens come down to earth and start abducting people, but not entirely randomly. Women have a 70% chance of being beamed up, and people over the age of 30 have a 45% likelihood. Given your gender and age, how likely is it that you will be transported to the inside of an alien ship?

In [None]:
# Your answer will vary, depending on your age and gender characteristics. The answer below assumes a female aged over 30.

prob_of_beaming = 0.7 * 0.45   # Assumes independence
print "The probability of me being beamed up to the alien ship is:", "{:.0%}".format(prob_of_beaming)

Q5. Work out the probability of getting a score of greater than 75 on a test using the data shown in the table below:

In [None]:
#Run Me
import numpy as np
import pandas as pd

df = pd.DataFrame({'Count' : [2,2,10,8,6,4,1], 'Score': [50,65,70,75,85,90,95]})
df.index = df['Score']
df

In [None]:
# Can solve this problem visually by manually observing and adding together the counts for the scores that fit the criteria 
# in the dataframe as follows:

greater_than_75 = 6 + 4 + 1
total = 2 + 2 + 10 + 8 + 6 + 4 + 1 
prob_above_75 = greater_than_75 / total

# Or we can use the dataframe structure that we already have to get this information. Below is one approach for this:

df2 = df.iloc[4:]                                  # can use pandas iloc to get dataframe rows that we know have scores of > 75
greater_than_75 = df2["Count"].sum()               # pandas sum function used to get the total count of scores > 75 
total = df["Count"].sum()                          # pandas sum on the counts in overall dataframe to get the sample space
prob_above_75 = greater_than_75 / total

print "The probability of getting a score higher than 75 is:", "{:.0%}".format(prob_above_75)
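The `iloc[4:]` slice relies on knowing in advance which row positions hold scores above 75. A possible alternative is to filter on the score values themselves. The sketch below does this in plain Python with a hypothetical `score_counts` dictionary that copies the table's data, so it runs without pandas.

```python
from fractions import Fraction

# Same data as the table above: score -> count
score_counts = {50: 2, 65: 2, 70: 10, 75: 8, 85: 6, 90: 4, 95: 1}

# Filter on the score itself, so the result does not depend on row order
above_75 = sum(count for score, count in score_counts.items() if score > 75)
total_count = sum(score_counts.values())

prob_above_75_check = Fraction(above_75, total_count)
print("P(score > 75) = {} = {:.0%}".format(prob_above_75_check, float(prob_above_75_check)))
```

Note that the comparison is strict (`>`), so the 8 results scoring exactly 75 are excluded, as in the answer above.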

## Permutations and Combinations

Calculate the answers to these problems, selecting the correct approach of either permutation or combination in Python.

#### Tip: use math.factorial() function to help with calculations.

In [None]:
mf = math.factorial    # Assign an alias to be more efficient in performing calculations

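def find_perm(N, r):
    """Perform permutation calculation: N! / (N - r)!"""
    return mf(N) // mf(N - r)                 # integer division keeps the result exact

def find_comb(N, r):
    """Perform combination calculation: N! / (r! * (N - r)!)"""
    return mf(N) // (mf(r) * mf(N - r))       # integer division keeps the result exact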
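One way to build confidence in these helpers is to compare them against an explicit listing for a small case. The sketch below restates the two functions with integer division so that it runs on its own, and uses `itertools` to list the arrangements of a hypothetical five-item list.

```python
import math
from itertools import combinations, permutations

def find_perm(N, r):
    """Number of ordered selections of r items from N: N! / (N - r)!"""
    return math.factorial(N) // math.factorial(N - r)

def find_comb(N, r):
    """Number of unordered selections of r items from N: N! / (r! * (N - r)!)"""
    return math.factorial(N) // (math.factorial(r) * math.factorial(N - r))

items = ["a", "b", "c", "d", "e"]

# Listing the selections explicitly should give the same counts as the formulas
listed_perms = len(list(permutations(items, 2)))
listed_combs = len(list(combinations(items, 2)))

print("P(5,2): formula {}, listed {}".format(find_perm(5, 2), listed_perms))
print("C(5,2): formula {}, listed {}".format(find_comb(5, 2), listed_combs))
```

Each unordered pair appears in 2! = 2 orders, which is why the permutation count is twice the combination count here.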

Q6. How many different ways could you arrange a bouquet of 12 different flowers?

In [None]:
# Because we're talking about 'ways of arranging things', we know that this is a question involving permutations

# We can manually perform calculation, using what we know about permutations:
flower_arr = 12 * 11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1

# Or we can shorten the calculation, using the formula for permutations, with math.factorial():
flower_arr = math.factorial(12) / math.factorial(12-12)

# Or we can shorten even more using the 'mf' function alias:
flower_arr = mf(12) / mf(0)

# Lastly, we could use our newly created 'find_perm' function:
flower_arr = find_perm(12,12)

print "There are", flower_arr , "ways to arrange 12 flowers"


Q7. Using a normal deck of 52 playing cards, how many different options are there for getting 6 cards that are Spades?

In [None]:
# The question hasn't indicated that sequence matters, so we can be confident that combinations is the right approach
# Using our formula for combinations, we have 13 total spades to choose from, so 13 is our 'N' value. 
# We are choosing 6 spades, so 6 is our 'r' value. This results in C(13,6), aka the combination of 13 choose 6.

# Calculated using 'mf' as alias for 'math.factorial'
six_spades = mf(13) / (mf(6) * mf(13 - 6))   

# Calculated using our 'find_comb' function
six_spades = find_comb(13,6)

print "There are", six_spades, "different combinations of 6 spades in a 52-card deck of playing cards"

Q8. You are playing a guessing game (death match). You have a score of 3 points, your opponent has 5. You have both played your last official turn, but due to misconduct by the other player, you have been awarded a bonus six turns before the game ends, giving you the opportunity for a comeback. If your total score is more than 5 by the end of your bonus turns, you win the game.

Assuming there is no penalty for a wrong answer, and that you get 1 point per correct answer, how many possible groups of right answers are there that will lead you to victory and save your life?

In [None]:
# Order doesn't matter, only the number of ways to get 3 or more right answers within 6 turns. Thus we use combinations.
# To get all combinations of 3 or more right answers, we would need to get the sum of C(6,3), C(6,4), C(6,5), C(6,6).

three_or_more_points = find_comb(6,3) + find_comb(6,4) + find_comb(6,5) + find_comb(6,6)

print "There are", three_or_more_points, "different combinations of 3 or more right answers out of 6 bonus questions."
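The sum of combinations can be verified by brute force: list every right/wrong pattern over the six bonus turns and count the ones that end with a winning score. This is a sketch only; `patterns`, `starting_score` and `opponent_score` are names chosen for illustration.

```python
from itertools import product

# Each bonus turn is either right (1) or wrong (0); 2**6 = 64 possible answer patterns
patterns = list(product([0, 1], repeat=6))

starting_score = 3
opponent_score = 5

# A pattern wins if the final score is strictly greater than the opponent's score
winning = [p for p in patterns if starting_score + sum(p) > opponent_score]

print("{} of {} answer patterns lead to a win".format(len(winning), len(patterns)))
```

The count of winning patterns agrees with C(6,3) + C(6,4) + C(6,5) + C(6,6).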

Q9. Martha, Mary, Monkey, and Mozilla all went together to get a slice of cake from the bakery downtown. Mary and Mozilla like chocolate the best, whereas Martha and Monkey hate chocolate and would never choose it. There is only one piece of chocolate cake left. 

If Mozilla doesn't get chocolate, he will have a meltdown. What is the total number of ways that the four friends can queue at the bakery that won't result in Mozilla crashing through a stack of pastries?

In [None]:
# Since order matters in this question, we would use the concept of permutations to answer it.
# We don't need to consider every possible arrangement, just the ones that put Mozilla in front of Mary

# Option 1: Mozilla is 1st in queue, all permutations of others for 2nd - 4th in queue positions result in no tantrum.
first_in_queue = 1 * 3 * 2 * 1

# Option 2: Mozilla is 2nd in queue. Mary has to be 3rd or 4th in order to have no tantrum, which requires that one of
# Martha and Monkey is 1st in queue and the other is either 3rd or 4th (we don't care which).
second_in_queue = 2 * 1 * 2 * 1

# Option 3: Mozilla is 3rd in queue. This would require that Mary is 4th in order to have no tantrum. We are agnostic as to
# what order the other two appear in the 1st and 2nd positions.
third_in_queue = 2 * 1 * 1 * 1

# Option 4: Mozilla is 4th in queue. As this would always put Mozilla after Mary, there are 0 ways this results in no tantrum.
fourth_in_queue = 0

no_tantrums = first_in_queue + second_in_queue + third_in_queue + fourth_in_queue

print "There are", no_tantrums, "ways of queueing that do not result in Mozilla having a tantrum at the bakery."
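With only 4! = 24 possible queues, the case-by-case count can be confirmed by listing them all and keeping the ones where Mozilla stands ahead of Mary. The sketch below assumes, as the answer above does, that Martha and Monkey never take the chocolate slice.

```python
from itertools import permutations

friends = ["Martha", "Mary", "Monkey", "Mozilla"]

# All 4! = 24 possible queue orders
queues = list(permutations(friends))

# Mozilla gets the last chocolate slice only if he is ahead of Mary in the queue
safe_queues = [q for q in queues if q.index("Mozilla") < q.index("Mary")]

print("{} of {} queue orders avoid a meltdown".format(len(safe_queues), len(queues)))
```

By symmetry, Mozilla is ahead of Mary in exactly half of all orderings, which gives a quick shortcut: 24 / 2 = 12.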

Q10. You have 5 Red cups, 5 Green cups and 5 Blue cups. How many ways can you get 2 cups that have the same color, with another 2 cups that share a color also (but a different color than the first 2 cups)?


In [None]:
# This is a combination question, since order doesn't matter, we're just examining different ways to group things.
# We have 3 different colors, so we can get 3 diff color combinations when choosing 2 colors: RG, RB, GB
num_color_comb = find_comb(3,2)

# Because the number of cups for each color is the same (i.e. 5), the number of ways to choose 2 cups of one color
# is the same for every color within the combination: C(5,2)
color_instance = find_comb(5,2)

# Given one pair of colors, the number of possible combinations of cups is C(5,2) * C(5,2).
# To see that multiplication is the correct choice, fix 2 cups of one color, and count how
# many ways you can pick 2 cups of a different color to finish off the choice of 4 cups.
ways_of_choosing_4_appropriate_cups = color_instance * color_instance

# Multiply the 3 ways of picking the color combinations to get our result for all possible color combinations
print "There are", num_color_comb * ways_of_choosing_4_appropriate_cups , "ways of getting 2 cups of one color and 2 cups of a different color."
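The formula C(3,2) * C(5,2) * C(5,2) can be checked by brute force over all C(15,4) = 1365 ways of picking 4 cups. The sketch below assumes, as the formula does, that cups of the same color are distinguishable from one another; the `cups` labels and the `is_two_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Label each cup by color and number so that the 15 cups are distinguishable
cups = [(color, n) for color in "RGB" for n in range(5)]

def is_two_pairs(selection):
    """True if the 4 cups are 2 of one color and 2 of another color."""
    color_counts = Counter(color for color, _ in selection)
    return sorted(color_counts.values()) == [2, 2]

two_pair_selections = [s for s in combinations(cups, 4) if is_two_pairs(s)]

print("{} selections of 4 cups have 2 of one color and 2 of another".format(len(two_pair_selections)))
```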

## Dependence and Independence

Calculate the answers to these problems in Python, identifying whether the events are dependent or independent. 

Q11. You have 41 Black stones and 40 White stones. What's the probability of 7 randomly selected stones all being White (assuming the stones are not replaced)? 

In [None]:
# Since the stones are not replaced, the probability of each draw depends on the outcomes of the previous draws.
# For each draw, we take it as given that every earlier stone was white, so both the number of white stones available
# and the total number of stones are reduced by one for each draw.

prob_seven_white = 40/81 * 39/80 * 38/79 * 37/78 * 36/77 * 35/76 * 34/75

# We can use permutations to perform this calculation more efficiently:

seven_white = find_perm(40,7)
total = find_perm(81,7)
prob_seven_white = seven_white / total

print "The probability of choosing 7 white stones, assuming the stones are not replaced, is:", "%0.4f" %prob_seven_white
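The draw-by-draw product and a counting argument should give exactly the same probability, and exact fractions make that easy to confirm. In the sketch below, `n_choose_r` is a hypothetical helper written for this check; it uses combinations rather than permutations, which works because the 7! orderings cancel in the ratio.

```python
import math
from fractions import Fraction

def n_choose_r(N, r):
    """Hypothetical helper: number of unordered selections of r items from N."""
    return math.factorial(N) // (math.factorial(r) * math.factorial(N - r))

# Draw-by-draw product: 40/81 * 39/80 * ... * 34/75
sequential = Fraction(1)
for k in range(7):
    sequential *= Fraction(40 - k, 81 - k)

# Counting argument: all-white selections over all possible selections of 7 stones
by_counting = Fraction(n_choose_r(40, 7), n_choose_r(81, 7))

print("Both methods agree: {}".format(sequential == by_counting))
print("P(7 white) = {:.4f}".format(float(sequential)))
```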

Q12. John is trying to persuade Jim to go to a Planet of the Apes movie marathon. Jim says that his chances of deciding to go are 60% if Karen isn't going. However, if she is going, then Jim's chances of going increase to 90%. Karen says that she hasn't made up her mind. She has said that, at this point, she is equally likely to go as to not go.

What are the chances that Jim attends the marathon and Karen does not?

In [None]:
# From the question, the probability of Jim attending changes *depending* on whether Karen does, or does not attend.
# To get the probability of Jim attending AND Karen not attending correctly, we need to make use of the information we have 
# about the likelihood that Jim will attend, given the assumption that Karen will not. 
# We also need to incorporate the detail provided about Karen's chances of not going:

# P(Jim, not Karen) = P(not Karen) * P(Jim | not Karen)

Jim_and_not_Karen = 0.5 * 0.6   # Karen said go vs not go is equally likely. As these are the only options available to her, 
                                # there is a 50% chance she will not go.

print "The probability that Jim will attend the marathon and Karen will not attend is:", "%0.1f" %Jim_and_not_Karen
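It can help to see this joint probability alongside the other three possible outcomes. The sketch below builds the full joint table from the figures given in the question; the dictionary layout and labels are illustrative choices.

```python
# Probabilities given in the question
p_karen = 0.5                  # Karen is equally likely to go or not go
p_jim_given_karen = 0.9
p_jim_given_not_karen = 0.6

# Joint probability of each combination: P(Karen outcome) * P(Jim outcome | Karen outcome)
joint = {
    ("Jim goes", "Karen goes"): p_karen * p_jim_given_karen,
    ("Jim goes", "Karen stays"): (1 - p_karen) * p_jim_given_not_karen,
    ("Jim stays", "Karen goes"): p_karen * (1 - p_jim_given_karen),
    ("Jim stays", "Karen stays"): (1 - p_karen) * (1 - p_jim_given_not_karen),
}

for outcome, prob in sorted(joint.items()):
    print("{:<30} {:.2f}".format(", ".join(outcome), prob))

# The four mutually exclusive outcomes cover every possibility, so they sum to 1
total_prob = sum(joint.values())
```

The entry for "Jim goes, Karen stays" is the 0.3 calculated above, and the four entries summing to 1 is a useful sanity check on the table.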

Q13. A puzzle store owner is waiting on a shipment of new puzzles to put on an event called Puzzle-palooza. There is a 60% chance that the stack of puzzles will arrive on time. If the puzzles don't arrive on time, there is a 20% chance that the event can proceed.

What are the chances that the event will be cancelled if the shipment doesn't arrive?

In [None]:
# Instead of calculating the probability of events A and B, we need to calculate the probability of A given B (A | B)
# Using the complement rule, the probability of event A is equivalent to: 1 - P(not event A)
# We don't need to incorporate probability info for the shipment arriving on time, as the question assumes it will not.

# P(no Puzzle-palooza | no shipment) = 1 - P(Puzzle-palooza | no shipment)

no_PP_given_no_shipment = 1 - 0.2

print "Chances of Puzzle-palooza being cancelled, given puzzles not arriving, is:", "{:.0%}".format(no_PP_given_no_shipment)

Q14. What are the chances of drawing the Queen of Diamonds from a normal 52-card playing deck if we have just flipped tails twice on two coin tosses, and rolled a 1 on a ten-sided die?

In [None]:
# These probabilities are independent, so it doesn't matter what comes up on the coin tosses or dice rolls, 
# the likelihood of getting a Queen of Diamonds will remain the same as it otherwise would.

# P(Queen_of_D's | 2Tails, 1_on_Die) = P(Queen_of_D's)

prob_QD = 1 / 52

print "Chances of drawing Queen of D's, given that we have both 2T's and 1 on a 10-sided die is:", "{:.0%}".format(prob_QD)

Q15. The probability of getting a rose is 1/4, and the probability of getting a carnation is 1/6. What are the chances of getting a rose, given that you already have a carnation, if the probability of getting a rose and a carnation is 1/21?

In [None]:
# To get the probability of getting a rose, given that you already have a carnation, rearrange the dependence formula:

# P(R,C) = P(C) * P(R|C) rearranges to:
# P(R|C) = P(R,C) / P(C)

prob_r_and_c = 1 / 21                   
prob_c = 1 / 6                      

prob_r_given_c = prob_r_and_c / prob_c

print "Chances of getting a rose, given that you have a carnation is:", "%0.2f" %prob_r_given_c
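A quick way to confirm that these two events are dependent is to compare the stated joint probability with the product P(R) * P(C), which is what the joint probability would be under independence. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

p_rose = Fraction(1, 4)
p_carnation = Fraction(1, 6)
p_rose_and_carnation = Fraction(1, 21)

# If the events were independent, the joint probability would equal the product
independent = (p_rose_and_carnation == p_rose * p_carnation)

# Conditional probability from the rearranged formula: P(R|C) = P(R,C) / P(C)
p_rose_given_carnation = p_rose_and_carnation / p_carnation

print("P(R) * P(C) = {}, P(R,C) = {}".format(p_rose * p_carnation, p_rose_and_carnation))
print("Independent: {}".format(independent))
print("P(R|C) = {} vs P(R) = {}".format(p_rose_given_carnation, p_rose))
```

Because P(R|C) = 2/7 differs from P(R) = 1/4, already holding a carnation changes the chance of getting a rose, so the conditional formula is needed here.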