***Introduction to Machine Learning: Probability Part 1***   
Topics Covered:   
1. Marginal Probability   
2. Conditional Probability
3. Joint Probability   
4. Bayes' Theorem
5. Nomenclature
6. Independence

***Probability Overview***   
This example follows *Pattern Recognition and Machine Learning*, Section 1.2: Probability Theory, by Christopher Bishop. A free online copy can be accessed with this [link.](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf) In Bishop's example the problem is represented by two colored boxes, one red and one blue. Inside each box is a set of apples and oranges. Each box will be represented by a Python list, shown below.

In [1]:
# note: the following numbers will not match those in the Bishop book,
# because here the chance of picking a box is proportional to its fruit count.

red_box = [
    'apple', 
    'apple', 
    'orange', 
    'orange', 
    'orange', 
    'orange',
    'orange',
    'orange'
]

rba = red_box.count('apple') # number of apples in red box
rbo = red_box.count('orange') # number of oranges in red box
rbt = len(red_box) # total number of fruits in red box

blue_box = [
    'apple',
    'apple',
    'apple',
    'orange',
]

bba = blue_box.count('apple') # number of apples in blue box
bbo = blue_box.count('orange') # number of oranges in blue box
bbt = len(blue_box) # total number of fruits in blue box

ta = rba + bba # total apples in both boxes
to = rbo + bbo # total oranges in both boxes

print('Red box has {} fruits total'.format(rbt))
print('Blue box has {} fruits total'.format(bbt))

Red box has 8 fruits total
Blue box has 4 fruits total
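
As a cross-check on the counts above, the same tallies can be produced with `collections.Counter`. This is only a minimal sketch: the `boxes` dictionary and the `counts` and `n_total` names are illustrative and do not appear elsewhere in this notebook.

```python
from collections import Counter

# Same box contents as above, rebuilt compactly (names are illustrative).
boxes = {
    'red': ['apple'] * 2 + ['orange'] * 6,
    'blue': ['apple'] * 3 + ['orange'] * 1,
}

# Tally the fruit in each box.
counts = {color: Counter(fruits) for color, fruits in boxes.items()}

# Total number of observations N across both boxes.
n_total = sum(len(fruits) for fruits in boxes.values())

for color, tally in counts.items():
    print('{} box: {} apples, {} oranges'.format(
        color, tally['apple'], tally['orange']))
print('N = {}'.format(n_total))
```

Keeping the boxes in a dictionary makes it easy to add more boxes later without introducing a new set of variables for each one.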


***Marginal Probability of selecting from Red Box:***   
For the first part of this problem we are going to ignore the different types of fruit in each box. We will answer the following question:   
What is the probability that the red box is randomly selected? In mathematical notation this is written P(Box = Red) or P(Red). In this notebook it is taken to be the number of fruits in the red box divided by the total number of fruits in both boxes.    
$$ P(Red Box) = \frac{\displaystyle\sum_{RedBox}^{}Fruits}{\displaystyle\sum_{RedBox}^{}Fruits + \displaystyle\sum_{BlueBox}^{}Fruits} $$   
Where $ \sum_{RedBox}^{}Fruits + \sum_{BlueBox}^{}Fruits $ is the total number of observations or N. So the equation becomes $$ P(Red Box) = \frac{\displaystyle\sum_{Redbox}^{}Fruits}{N} $$

In [2]:
# representing the probability of selecting the red box with code

mp_rd = rbt/(rbt+bbt)

print('P(RedBox) = {:.3f}'.format(mp_rd))

P(RedBox) = 0.667


***Marginal Probability of selecting from Blue Box:***   
We can repeat the above with the blue box as the one we are trying to select. $$ P(Blue Box) = \frac{\displaystyle\sum_{Bluebox}^{}Fruits}{N} $$ Since there are only two boxes, the two probabilities must sum to one, so we can also use: $$ P(Blue Box) = 1 - P(Red Box) $$ 

In [5]:
# finding the probability of selecting the blue box

mp_bl = bbt/(rbt+bbt)
mp_blbi = 1 - mp_rd

print('P(BlueBox) = {:.3f}'.format(mp_bl))
print('1 - P(RedBox) = {:.3f}'.format(mp_blbi))

P(BlueBox) = 0.333
1 - P(RedBox) = 0.333
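
The rounded decimals above hide the exact values. As a small sketch, assuming the same fruit totals (8 and 4), Python's `fractions.Fraction` shows the two marginal probabilities exactly and confirms that they sum to one. The names `p_red` and `p_blue` are illustrative.

```python
from fractions import Fraction

rbt, bbt = 8, 4  # fruits in the red and blue boxes, as counted above

# Exact marginal probabilities of each box.
p_red = Fraction(rbt, rbt + bbt)
p_blue = Fraction(bbt, rbt + bbt)

print('P(RedBox) = {}'.format(p_red))
print('P(BlueBox) = {}'.format(p_blue))
print('Sum = {}'.format(p_red + p_blue))
```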


***Conditional Probability***   
Conditional probability is a cornerstone of machine learning. The notation $P(Y|X)$ means the probability of Y given X. In our example we will use it to calculate the probability that an apple is selected given that the box is red. Written mathematically: $$P(Fruit = Apple| Box = Red)$$ Using the data above this is trivial to solve: narrow the dataset to only the fruits in the red box, then find the fraction of those that are apples.

In [6]:
cp_ap_rd = rba/rbt # number of apples divided by total fruits in red box

print('P(Apple|RedBox) = {:.3f}'.format(cp_ap_rd))

P(Apple|RedBox) = 0.250


***Remaining Conditional Probability***   
Calculate the remaining three conditional probabilities, which are $$P(Fruit = Orange | Box = Red)$$ $$P(Fruit = Apple | Box = Blue)$$ $$P(Fruit = Orange | Box = Blue)$$ As before, the last statement reads "Probability that an orange is selected given the box is blue."

In [7]:
cp_or_rd = rbo/rbt # P(Orange|Redbox)
cp_ap_bl = bba/bbt # P(Apple|BlueBox)
cp_or_bl = bbo/bbt # P(Orange|BlueBox)

print('P(Orange|RedBox) = {:.3f}'.format(cp_or_rd))
print('P(Apple|BlueBox) = {:.3f}'.format(cp_ap_bl))
print('P(Orange|BlueBox) = {:.3f}'.format(cp_or_bl))

P(Orange|RedBox) = 0.750
P(Apple|BlueBox) = 0.750
P(Orange|BlueBox) = 0.250
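
The four conditional probabilities all follow the same pattern: restrict attention to one box, then count. A hedged sketch of that pattern as a single helper is shown here. The function name `conditional` is hypothetical and is not used elsewhere in this notebook.

```python
red_box = ['apple'] * 2 + ['orange'] * 6
blue_box = ['apple'] * 3 + ['orange'] * 1


def conditional(fruit, box):
    """P(Fruit = fruit | Box = box): keep only one box, then count."""
    return box.count(fruit) / len(box)


for name, box in (('RedBox', red_box), ('BlueBox', blue_box)):
    p_apple = conditional('apple', box)
    p_orange = conditional('orange', box)
    print('P(Apple|{}) = {:.3f}'.format(name, p_apple))
    print('P(Orange|{}) = {:.3f}'.format(name, p_orange))
    # Within a single box the conditional probabilities sum to one.
    print('Sum for {} = {:.3f}'.format(name, p_apple + p_orange))
```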


***Joint Probability and Product Rule***   
Joint probability is the probability of X and Y both occurring together. It can be written as $P(X, Y)$ or $P(X \cap Y)$. It differs from conditional probability in that you are not told that either event has occurred; you must calculate the chance that both random events occur together. The equation for joint probability, known as the *product rule*, builds on conditional probability and marginal probability. $$P(X, Y) = P(Y|X)P(X)$$ So you must first find the conditional probability that Y occurs, given X has occurred, then multiply by the marginal probability of X. For the first example we will look at the probability that an apple is selected and that the box is red. Written as $$P(RedBox, Apple) = P(Apple|RedBox)P(RedBox)$$

In [8]:
jp_ap_rd = cp_ap_rd*mp_rd

print('P(RedBox, Apple) = {:.3f}'.format(jp_ap_rd))

P(RedBox, Apple) = 0.167


***Remaining Joint Probability***   
Calculate the remaining three joint probabilities, which are $$P(RedBox, Orange) = P(Orange|RedBox)P(RedBox)$$ $$P(BlueBox, Apple) = P(Apple|BlueBox)P(BlueBox)$$ $$P(BlueBox, Orange) = P(Orange|BlueBox)P(BlueBox)$$

In [9]:
jp_or_rd = cp_or_rd*mp_rd
jp_ap_bl = cp_ap_bl*mp_bl
jp_or_bl = cp_or_bl*mp_bl

print('P(RedBox, Orange) = {:.3f}'.format(jp_or_rd))
print('P(BlueBox, Apple) = {:.3f}'.format(jp_ap_bl))
print('P(BlueBox, Orange) = {:.3f}'.format(jp_or_bl))

P(RedBox, Orange) = 0.500
P(BlueBox, Apple) = 0.250
P(BlueBox, Orange) = 0.083
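
One way to sanity-check the product rule is to compute each joint probability directly, as the count of a given fruit in a given box divided by N, and compare it with the product of the conditional and marginal terms. The sketch below assumes the same box contents as above. The dictionary names `joint_direct` and `joint_product` are illustrative.

```python
import math

red_box = ['apple'] * 2 + ['orange'] * 6
blue_box = ['apple'] * 3 + ['orange'] * 1
boxes = {'RedBox': red_box, 'BlueBox': blue_box}
n_total = sum(len(contents) for contents in boxes.values())

joint_direct = {}   # count of (box, fruit) divided by N
joint_product = {}  # P(fruit | box) * P(box)
for name, contents in boxes.items():
    for fruit in ('apple', 'orange'):
        joint_direct[(name, fruit)] = contents.count(fruit) / n_total
        conditional = contents.count(fruit) / len(contents)
        marginal = len(contents) / n_total
        joint_product[(name, fruit)] = conditional * marginal

for key in joint_direct:
    agree = math.isclose(joint_direct[key], joint_product[key])
    print('P{} = {:.3f}, product rule agrees: {}'.format(
        key, joint_direct[key], agree))

# The four joint probabilities cover every outcome, so they sum to one.
print('Sum = {:.3f}'.format(sum(joint_direct.values())))
```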


***Summation Rule***   
At this point we have almost gone full circle, coming back to another way to calculate marginal probability, using the summation rule. $$P(X) = \sum_{Y} P(X, Y)$$ In words, the probability of X is the sum of the joint probabilities of X and Y over every value of Y. We will illustrate this with the apple example. $$P(Apple) = \sum_{Box} P(Apple, Box)$$ Stated simply, the overall probability of selecting an apple is the joint probability of selecting an apple and a given box, summed over all the boxes. For our specific example, this is written as: $$P(Apple) = P(Apple, RedBox) + P(Apple, BlueBox)$$

In [10]:
mp_ap = jp_ap_rd + jp_ap_bl

print('P(Apple) = {:.3f}'.format(mp_ap))

P(Apple) = 0.417


***Marginal Probability of Orange***   
$$P(Orange) = \sum_{Box} P(Orange, Box)$$

In [11]:
mp_or = jp_or_rd + jp_or_bl

print('P(Orange) = {:.3f}'.format(mp_or))

P(Orange) = 0.583
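
The summation rule can also be applied to the whole joint table in one pass. The sketch below assumes the joint probabilities are stored in a dictionary keyed by `(box, fruit)`, with values taken from the counts used in this notebook. The names `joint` and `marginal_fruit` are illustrative.

```python
from collections import defaultdict

# Joint probabilities P(Box, Fruit), written as count / N with N = 12.
joint = {
    ('RedBox', 'Apple'): 2 / 12,
    ('RedBox', 'Orange'): 6 / 12,
    ('BlueBox', 'Apple'): 3 / 12,
    ('BlueBox', 'Orange'): 1 / 12,
}

# Summation rule: P(Fruit) is the sum of P(Box, Fruit) over every box.
marginal_fruit = defaultdict(float)
for (box, fruit), p in joint.items():
    marginal_fruit[fruit] += p

for fruit, p in marginal_fruit.items():
    print('P({}) = {:.3f}'.format(fruit, p))
```

Summing over the fruit instead of the box would recover the box marginals in the same way.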


***Bayes' Theorem***   
The above relationships between terms form the basis for *Bayes' Theorem*, a fundamental relationship for prediction. $$P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}$$ Here the marginal probability term $P(X)$ is expanded using the summation and product rules: $$P(X) = \sum_{Y}^{}P(X|Y)P(Y)$$ What makes Bayes' theorem so special is that it allows the analyst to make predictions about new observations by using old data. For our example, pretend that you are blindfolded and told to randomly select a piece of fruit, not knowing whether it comes out of the blue or red box. Afterwards the boxes are covered and you see that the fruit is an orange. You are tasked with predicting whether that box was red or blue.   
To set up the problem we calculate the probability that the box is red, given that the fruit is an orange. $$P(RedBox|Orange) = \frac{P(Orange|RedBox)P(RedBox)}{P(Orange)}$$

In [12]:
cp_rd_or = cp_or_rd*mp_rd/mp_or

print('P(RedBox|Orange) = {:.3f}'.format(cp_rd_or))

P(RedBox|Orange) = 0.857


The above probability is useful, but it doesn't mean much if we have nothing to compare it to. We now have to calculate the probability that the box is blue, given the fruit is an orange. $$P(BlueBox|Orange) = \frac{P(Orange|BlueBox)P(BlueBox)}{P(Orange)}$$

In [13]:
cp_bl_or = cp_or_bl*mp_bl/mp_or

print('P(BlueBox|Orange) = {:.3f}'.format(cp_bl_or))

P(BlueBox|Orange) = 0.143
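
A simulation offers an independent check on the Bayes' theorem result. The sketch below assumes, as in the rest of this notebook, that every one of the 12 fruits is equally likely to be drawn. It draws many fruits at random, keeps only the oranges, and reports the fraction that came from the red box. The seed, the number of draws, and all variable names are arbitrary choices made for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Every fruit is tagged with the box it sits in.
fruits = ([('RedBox', 'apple')] * 2 + [('RedBox', 'orange')] * 6
          + [('BlueBox', 'apple')] * 3 + [('BlueBox', 'orange')] * 1)

n_draws = 100_000
orange_draws = 0
red_and_orange = 0
for _ in range(n_draws):
    box, fruit = random.choice(fruits)
    if fruit == 'orange':
        orange_draws += 1
        if box == 'RedBox':
            red_and_orange += 1

# Fraction of orange draws that came from the red box.
estimate = red_and_orange / orange_draws
print('Simulated P(RedBox|Orange) = {:.3f}'.format(estimate))
```

The estimate should land close to the value computed from Bayes' theorem, with small differences due to sampling noise.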


In most problems it won't be as simple as predicting whether the box is red or blue. You may have ten different box colors. In these scenarios it is usually better to find the conditional probability with the maximum value, instead of slowly comparing one against another. For this we create a simple lookup table in Python using a dictionary.

In [14]:
# dictionary used to store conditional predicted values
cp_or_dict = {
    'RedBox':cp_rd_or,
    'BlueBox':cp_bl_or,
}

# box color with highest probability
p_or_box = max(cp_or_dict, key=cp_or_dict.get)

print('The orange probably came from a {}'.format(p_or_box))

The orange probably came from a RedBox
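
To illustrate how the lookup-table approach scales beyond two boxes, here is a hedged sketch that computes the posterior for every box in one function and then picks the maximum. The `posterior` function and the green box are hypothetical additions for illustration only. As elsewhere in this notebook, the probability of a box is assumed to be proportional to the number of fruits it holds.

```python
def posterior(fruit, boxes):
    """Return P(Box | Fruit = fruit) for every box in `boxes`."""
    n_total = sum(len(contents) for contents in boxes.values())
    # Likelihood times prior for each box (the numerator of Bayes' theorem).
    unnormalized = {
        name: (contents.count(fruit) / len(contents)) * (len(contents) / n_total)
        for name, contents in boxes.items()
    }
    # Evidence P(Fruit); this sketch assumes the fruit appears in some box.
    evidence = sum(unnormalized.values())
    return {name: p / evidence for name, p in unnormalized.items()}


boxes = {
    'RedBox': ['apple'] * 2 + ['orange'] * 6,
    'BlueBox': ['apple'] * 3 + ['orange'] * 1,
    'GreenBox': ['apple'] * 4 + ['orange'] * 4,  # hypothetical third box
}

post = posterior('orange', boxes)
best = max(post, key=post.get)

for name, p in post.items():
    print('P({}|Orange) = {:.3f}'.format(name, p))
print('The orange probably came from a {}'.format(best))
```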


***Nomenclature Discussion***   
One of the hardest parts of understanding machine learning is the jargon. We'll introduce four terms, three of which will be referenced extensively in future discussions:   
1. Posterior   
2. Likelihood
3. Prior   
4. Evidence
   
Referring back to the fruit and box problem, we can write Bayes' theorem in more general terms. $$\color{Blue}{P(Box|Fruit)} = \frac{\color{Green}{P(Fruit|Box)}\color{Fuchsia}{P(Box)}}{\color{Red}{P(Fruit)}}$$    
$\color{Blue}{Posterior}$ - $P(Box|Fruit)$ box probability once the fruit type is given.   
$\color{Green}{Likelihood}$ - $P(Fruit|Box)$ how probable the observed fruit is for each box.   
$\color{Fuchsia}{Prior}$ - $P(Box)$ information obtained before observing the identity of the fruit.   
$\color{Red}{Evidence}$ - $P(Fruit)$ how probable the observed fruit is under all possible hypotheses.
   
   A quick note on the *evidence* term. When you compare posteriors for the same observed fruit, $P(Fruit)$ is identical in every comparison, so it does not change which box is most probable. So another way of writing Bayes' theorem is the following: $$\color{Blue}{posterior}\propto \color{Green}{likelihood} \times \color{Fuchsia}{prior}$$   
   Stated as "The posterior is proportional to the likelihood multiplied by the prior".

In [15]:
# repeating the same exercise as above
# but neglecting denominator in the calculation

psr_rd_or = cp_or_rd*mp_rd
print('Proportional posterior of Redbox|Orange = {:.3f}'.format(psr_rd_or))

psr_bl_or = cp_or_bl*mp_bl
print('Proportional posterior of BlueBox|Orange = {:.3f}'.format(psr_bl_or))

Proportional posterior of Redbox|Orange = 0.500
Proportional posterior of BlueBox|Orange = 0.083


From above it is evident that our probabilities would still predict that the orange came from a red box.
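
The proportional posteriors are not probabilities on their own, since they do not sum to one. As a short sketch, dividing each by their total recovers the full posteriors, because that total is exactly the evidence term $P(Orange)$. The names `total`, `post_rd` and `post_bl` are illustrative.

```python
# Likelihood times prior for each box, using the counts from this notebook.
psr_rd_or = (6 / 8) * (8 / 12)  # P(Orange|RedBox) * P(RedBox)
psr_bl_or = (1 / 4) * (4 / 12)  # P(Orange|BlueBox) * P(BlueBox)

# By the summation rule, the total equals the evidence P(Orange).
total = psr_rd_or + psr_bl_or

# Normalizing turns the proportional values into true posteriors.
post_rd = psr_rd_or / total
post_bl = psr_bl_or / total

print('P(Orange) = {:.3f}'.format(total))
print('P(RedBox|Orange) = {:.3f}'.format(post_rd))
print('P(BlueBox|Orange) = {:.3f}'.format(post_bl))
```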

***Independence Rules***   
We'll close out this review of probability with a short discussion of independence between variables. For two variables to be considered *independent*, their joint probability must equal the product of their marginal probabilities. $$Independent \rightarrow P(X, Y) = P(X)P(Y)$$ If the two variables are independent, the product rule $P(X, Y) = P(Y|X)P(X)$ simplifies to: $$Independent \rightarrow P(Y|X) = P(Y)$$ Here we test whether the fruit selected is independent of the box color.

In [16]:
# we'll use boolean logic to compare whether the following are equal
# this will allow us to determine whether the parameters are independent
# the round function is used to prevent a False value being returned from
# a very small difference between numbers

deci = 3
ind_ap_bl = round(jp_ap_bl, deci) == round(mp_ap*mp_bl, deci)
ind_ap_rd = round(jp_ap_rd, deci) == round(mp_ap*mp_rd, deci)
ind_or_bl = round(jp_or_bl, deci) == round(mp_or*mp_bl, deci)
ind_or_rd = round(jp_or_rd, deci) == round(mp_or*mp_rd, deci)

print('Apples and blue boxes independent? {}'.format(ind_ap_bl))
print('Apples and red boxes independent? {}'.format(ind_ap_rd))
print('Oranges and blue boxes independent? {}'.format(ind_or_bl))
print('Oranges and red boxes independent? {}'.format(ind_or_rd))

Apples and blue boxes independent? False
Apples and red boxes independent? False
Oranges and blue boxes independent? False
Oranges and red boxes independent? False


From the results above, it is apparent that the fruit you select is not independent of the box it is drawn from. Note that independence is a statement about the whole joint distribution: the equality $P(X, Y) = P(X)P(Y)$ must hold for every combination of fruit and box, so all four checks would need to be true for independence to be recognized.
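
Rounding both sides to a fixed number of decimals works for this example, but two values that differ only slightly can still round differently when they sit on either side of a rounding boundary. An alternative sketch, using the standard library's `math.isclose`, compares the values within a relative tolerance instead. The tolerance chosen here is an assumption, not a recommendation.

```python
import math

# Floating point arithmetic makes exact equality unreliable.
print(0.1 + 0.2 == 0.3)              # exact comparison fails
print(math.isclose(0.1 + 0.2, 0.3))  # tolerance-based comparison succeeds

# Independence check for apples and the red box, from the notebook's counts.
jp_ap_rd = 2 / 12  # P(RedBox, Apple)
mp_ap = 5 / 12     # P(Apple)
mp_rd = 8 / 12     # P(RedBox)

independent = math.isclose(jp_ap_rd, mp_ap * mp_rd, rel_tol=1e-9)
print('Apples and red boxes independent? {}'.format(independent))
```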

***Example of actual Independence***   
To conclude our discussion of independence, consider when we would observe independence in the above problem. This would occur if we still had the two boxes, red and blue, but each box held the same proportion of oranges and apples. Then it wouldn't matter which box you chose from: the probability of drawing an apple (or an orange) would be the same for either box.
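
The independent case can be sketched in code as well. The box contents below are hypothetical: both boxes hold one apple for every three oranges, so the proportions match even though the box sizes differ. Every fruit is again assumed to be equally likely to be drawn.

```python
import math

# Hypothetical boxes with the same apple-to-orange proportion (1:3).
boxes = {
    'RedBox': ['apple'] * 2 + ['orange'] * 6,
    'BlueBox': ['apple'] * 1 + ['orange'] * 3,
}
n_total = sum(len(contents) for contents in boxes.values())

results = {}
for name, contents in boxes.items():
    p_box = len(contents) / n_total  # marginal P(Box)
    for fruit in ('apple', 'orange'):
        # Marginal P(Fruit): count of this fruit over all boxes, divided by N.
        p_fruit = sum(b.count(fruit) for b in boxes.values()) / n_total
        # Joint P(Box, Fruit): count of this fruit in this box, divided by N.
        p_joint = contents.count(fruit) / n_total
        results[(name, fruit)] = math.isclose(p_joint, p_box * p_fruit)

for (name, fruit), is_independent in results.items():
    print('{} and {} independent? {}'.format(fruit, name, is_independent))
```

With these contents every one of the four checks holds, which is what independence between fruit and box color requires.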