In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline

## Probability--The Science of Uncertainty

Some common questions often asked and answered in probability and statistics:

+ What is the probability that when rolling a pair of dice the sum will be a "seven"?
+ What is the probability that when rolling a single die the result will be a "two" knowing that the result is an even number?
+ What is the probability that you will win a state lottery if you buy a ticket?

Suppose that you use a sample of 20 medium to large consumer electronics corporations to estimate the average gross sales in US dollars of all medium to large consumer electronics corporations.  In this situation, you would be using the sample mean to make an inference about the true population mean, and you would want some idea of how accurately the sample mean reflects the population mean.  Probability is required to measure the reliability of such inferences.

Such inferences are not usually made without some measure of their reliability.  Measures of reliability give us a sense of how confident we can be that the inference is a good one.  If the confidence is low, the inference would usually not be used, or at least not trusted to be very accurate.  Either way, a measure of reliability is important to know.  We use probability to obtain measures of reliability.

### Key terms:

+ __Experiment__: Some course of action taken without being sure of what the eventual outcome will be.
+ __Outcome__: The result observed when the experiment is performed.
+ __Event__: One of the things that could happen when an experiment is performed; an event can contain one or more outcomes.
+ __Probability__: A measure of how likely it is that a particular event will occur.

Suppose that we take one of a pair of dice, called a die.  Then we perform the __experiment__ of rolling the die.  Next we record the __outcome__ of the experiment.  This will be either a 1, 2, 3, 4, 5, or a 6.  Now define the __event__ of interest to be that we get a 2 when we roll the die.

We explain the basic and intuitive idea of "probability" by continuing with the example of rolling a die, with an interest in finding the __probability__ that the __event__ of obtaining an __outcome__ of 2 occurs.  When a die is actually rolled, the __outcome__ can be any one of the six numbers 1, 2, 3, 4, 5, or 6.  So the total number of possible __outcomes__ is 6.

By using the roll function below, simulate a die rolling and call the function as many times as needed until you have a value of 2 returned.

In [2]:
def roll():
    # randint's upper bound is exclusive, so this returns an integer from 1 to 6
    die = np.random.randint(1, 7)
    return die

In [3]:
roll()

6

In [4]:
roll()

4

In [5]:
roll()

1

Now, we could have obtained a 2 on the first roll, or a 5, or a 4, et cetera.  One could lose patience and give up.  The goal of probability is to figure out how likely a particular event is without actually performing the experiment.

Since there are six equally likely outcomes when rolling a single fair die, intuitively the probability of obtaining a 2 on a single roll is one chance in six.  We justify this by noting that there is one way of obtaining a 2 out of six total outcomes, or:

$$\frac{1}{6}$$

Another way to look at this is that if one rolls the die many times and counts the number of times the die comes up 2, then the ratio of the number of times one obtains a 2 to the number of times one rolls the die should be close to one-sixth.  When all outcomes are equally likely, our basic probability formula is:

$$probability\;\;of\;\;event = \frac{number\;\;of\;\;outcomes\;\;in\;\;the\;\;event}{total\;\;number\;\;of\;\;possible\;\;outcomes}$$
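The relative-frequency idea can be checked with a short simulation. The sketch below is illustrative only: the function name `estimate_probability_of_two`, the number of rolls, and the seed are assumptions, and it uses Python's standard `random` module (rather than `np.random`) so that it runs on its own.

```python
import random

def estimate_probability_of_two(n_rolls, seed=0):
    """Estimate P(rolling a 2) as the fraction of simulated rolls that show a 2."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    # randint(1, 6) includes both endpoints, matching the six faces of a die
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 2)
    return hits / n_rolls

estimate = estimate_probability_of_two(60_000)
print(f"estimated: {estimate:.4f}   exact: {1/6:.4f}")
```

With more rolls the estimate should settle closer to one-sixth, although any single run will differ slightly from the exact value.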

In general, a __probability__ is a number between __0__ and __1__ which describes the likelihood that an outcome or an event occurs when an experiment is performed.  Probabilities are assigned to outcomes based on the frequency of occurrence if the experiment were to be repeated many times.  For example, if a balanced coin were tossed twice, you would expect that each of the outcomes HH, HT, TH, and TT would occur with a frequency of about 1 in 4.  That is, if you tossed the coin twice and repeated that 100 times, you would expect to obtain two heads approximately 25 times.  The exact count will vary, but it should be somewhere near 25.  If it is consistently far from 25, the coin is said to be unbalanced, or not fair.
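As a rough illustration of the two-toss experiment, the sketch below repeats it 100 times and tallies the four outcomes. The helper name `toss_twice_counts` and the seed are assumptions made for this example, and the standard `random` module stands in for `np.random`.

```python
import random
from collections import Counter

def toss_twice_counts(n_repeats, seed=0):
    """Toss a fair coin twice, n_repeats times, and count each two-letter outcome."""
    rng = random.Random(seed)  # seeded so the tally is repeatable
    counts = Counter()
    for _ in range(n_repeats):
        # First letter is the first toss, second letter is the second toss
        outcome = rng.choice("HT") + rng.choice("HT")
        counts[outcome] += 1
    return counts

counts = toss_twice_counts(100)
for outcome in ("HH", "HT", "TH", "TT"):
    print(outcome, counts[outcome])
```

Each count should land in the neighborhood of 25, but none of them needs to be exactly 25.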

#### Two properties that are always true for probabilities:
1. The probability of each outcome is always a number between 0 and 1
2. The sum of the probabilities of all the outcomes must equal 1.

### Probability Rule:

__The probability of an event is the sum of the probabilities of each of the outcomes that are contained in the event__

For example, the probability of flipping a coin twice and getting heads both times:

$$P(HH) = \;\;?$$

First, we have to list all the possible outcomes so that we can find the ones in our desired event:

$$Outcomes = \{HH, HT, TH, TT\}$$

So, _HH_ occurs only 1 time out of four possible times, so:

$$P(HH) = \frac{1}{4}\;\; or \;\; .25$$

This is consistent with our rules because

$$.25 + .25 + .25 + .25 = 1\;\; (probabilities\;\; of\;\; all\;\; outcomes\;\; sum\;\; to\;\; 1)$$
$$P(HH) = .25\;\; (the\;\; event\;\; contains\;\; only\;\; the\;\; outcome\;\; HH,\;\; so\;\; the\;\; sum\;\; has\;\; one\;\; term)$$
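The probability rule can also be sketched by enumerating the outcomes directly. This is a minimal illustration, assuming a fair coin so that every outcome is equally likely; the names `prob` and `event_probability` are made up for the example.

```python
from fractions import Fraction
from itertools import product

# All outcomes of tossing a coin twice: HH, HT, TH, TT
outcomes = ["".join(p) for p in product("HT", repeat=2)]

# Assumption: the coin is fair, so each outcome has probability 1/4
prob = {o: Fraction(1, len(outcomes)) for o in outcomes}

def event_probability(event):
    """Sum the probabilities of the outcomes contained in the event."""
    return sum(prob[o] for o in event)

p_hh = event_probability({"HH"})
print(p_hh)                # probability of two heads
print(sum(prob.values()))  # all outcome probabilities together
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared with the hand calculation without rounding error.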

__Complement of an event__: all of the outcomes that are _not_ in the event

For example, we return to the experiment of tossing a coin twice which has the outcomes of:

$$\{HH, HT, TH, TT\}$$

If we decide the event, call it event A, will be an outcome that contains exactly one head, then we will have only two outcomes in our event:

$$A = \{HT, TH\}$$

Therefore, the __complement__ of the event, which is often denoted with a superscript 'c', contains the remaining outcomes:

$$A^{c} = \{HH, TT\}$$

Every outcome is in either the event or its complement, so their probabilities sum to 1:

$$P(A) + P(A^{c}) = 1$$

If we use subtraction, we've obtained another probability rule!

__Rule__:

$$P(A) = 1 - P(A^{c})$$

For an example of using this rule, suppose that you take a true/false quiz that consists of three questions.  If passing the quiz is getting at least one of the questions correct, what is the probability that you pass the quiz by guessing?

__First__ we note that each question is answered either correctly (Y) or incorrectly (N).  Since there are three questions to answer, the set of all possible outcomes is:

$$\{YYY, YYN, YNY, YNN, NYY, NYN, NNY, NNN\}$$

Each of these outcomes has the same probability, 1/8, of occurring.  The probabilities are all equal because we're guessing, so each answer is equally likely to be right or wrong.

__Next__ let

$$A = the\;event\;that\;you\;pass\;the\;quiz \\
A =\{YYY, YYN, YNY, YNN, NYY, NYN, NNY\}$$

__Then__ using the rule:
$$P(A) = 1 - P(A^{c}) \\
P(A) = 1 - P(NNN) \to because\;A^{c} = \{NNN\},\;the\;only\;outcome\;with\;all\;three\;incorrect \\
P(A) = 1 - \frac{1}{8} \\
P(A) = \frac{7}{8}$$

Of course, we could always solve the problem more directly, but there is more computation involved:

$$P(A) = P(YYY) + P(YYN) + P(YNY) + P(YNN) + P(NYY) + P(NYN) + P(NNY) \\
P(A) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{7}{8}$$
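Both routes to the answer can be compared with a small enumeration. This is a sketch under the same assumption as the text (pure guessing, so all eight outcomes are equally likely); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Every way three true/false answers can turn out: Y = correct, N = incorrect
outcomes = ["".join(p) for p in product("YN", repeat=3)]
p_each = Fraction(1, len(outcomes))  # assumption: guessing makes outcomes equally likely

# Passing means at least one correct answer; failing is the complement
passing = [o for o in outcomes if "Y" in o]
failing = [o for o in outcomes if "Y" not in o]

p_direct = len(passing) * p_each          # add up the passing outcomes
p_complement = 1 - len(failing) * p_each  # complement rule: 1 - P(A^c)

print(p_direct, p_complement)
```

The complement route only has to count the single failing outcome, which is why it is the shorter calculation by hand.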

### Conditional Probability 

__Conditional Probabilities__ occur when some additional information is known about the set of outcomes for an experiment.  For example, suppose you select a person at random and wish to find the probability that the person can throw a baseball in excess of 95 mph.  If you happen to know some additional information that tells you the person is a professional baseball player, then the probability that the person can really throw a baseball in excess of 95 mph would definitely be larger than if the person had never played baseball.

__For example__ we'll use dice again:

Let A = the event that we roll a 2
Let B = the event that we roll an even number, that is, 2, 4, or 6

$$P(A) = P(rolling\;a\;2) \\
=\frac{number\;of\;outcomes\;in\;the\;event}{total\;number\;of\;possible\;outcomes} \\
=\frac{1}{6}$$

Now, let's set up a conditional probability--the probability of rolling a 2 given the outcome is an even number:

$$P(rolling\;a\;2\;given\;the\;outcome\;is\;an\;even\;number) \\
=P(A\;given\;B) \\
=P(A \mid B) \\
=\frac{number\;of\;outcomes\;in\;both\;A\;and\;B}{number\;of\;outcomes\;in\;B} \\
=\frac{1}{3}$$

The conditional probability is larger because knowing that B occurred shrinks the set of possible outcomes from 6 to 3, while the event of rolling a 2 still accounts for one of them.
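The same counting argument can be written as a short sketch. It assumes a fair die, and the helper `prob` with its `given` parameter is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # outcomes of one roll of a fair die
A = {2}                            # event: roll a 2
B = {2, 4, 6}                      # event: roll an even number

def prob(event, given=None):
    """Probability of `event`, optionally restricted to the outcomes in `given`."""
    space = sample_space if given is None else given
    # Count only the outcomes of the event that remain possible in the restricted space
    return Fraction(len(event & space), len(space))

print(prob(A))           # unconditional probability of a 2
print(prob(A, given=B))  # probability of a 2 given the roll is even
```

Conditioning on B simply swaps the full sample space for B in the denominator, which is the shrinking described above.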

__Independent Events__

A and B are independent events provided
$$P(A \mid B) = P(A)$$

This definition is a formal way of saying that if B is known to have occurred, then the probability of A occurring is the same as if nothing were known about B.  So, the knowledge that one event has occurred (B in this case) has no effect on the probability of the other event (A in this case) occurring.  The two probabilities are exactly the same.

Consider the following experiment:

Toss a fair coin twice and record the results H for heads and T for tails.  The set of possible outcomes is:

| HH | HT |
|------|-----|
| __TT__  | __TH__ |

HT means heads for the first flip, tails for the second

Let A = the event that the first toss is a H
Let B = the event that the second toss is a H

We'll show that the two events are independent:

$$P(A) = \frac{number\;of\;outcomes\;in\;the\;event}{total\;number\;of\;possible\;outcomes} = \frac{2}{4} = \frac{1}{2} \\
P(A \mid B) = \frac{number\;of\;outcomes\;in\;both\;A\;and\;B}{number\;of\;outcomes\;in\;B} = \frac{1}{2}$$

Since $P(A \mid B)$ and $P(A)$ are the same, we have shown that A and B are independent events.

Another way to deal with __independent events__

A and B are independent events provided:

$$P(A \land B) = P(A) \times P(B)$$

Here $A \land B$ means that both events A and B occur, which in this example is the single outcome HH:

$$A \land B = \{HH\} \\
P(A \land B) = \frac{1}{4} \\
P(A) \times P(B) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$

So, the two events are independent!
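Both independence checks can be sketched by enumerating the four outcomes. The code assumes a fair coin (equally likely outcomes), and the names `prob` and `p_a_given_b` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Outcomes of two tosses; the first letter is the first toss
outcomes = {"".join(p) for p in product("HT", repeat=2)}

A = {o for o in outcomes if o[0] == "H"}  # first toss is heads
B = {o for o in outcomes if o[1] == "H"}  # second toss is heads

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Conditional probability by counting inside B
p_a_given_b = Fraction(len(A & B), len(B))

print(p_a_given_b == prob(A))             # first definition of independence
print(prob(A & B) == prob(A) * prob(B))   # product form of independence
```

Both comparisons should agree with each other, since the two definitions are equivalent whenever $P(B) > 0$.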

In [6]:
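def coinToss(times):
    """Simulate `times` tosses of a fair coin and return a list of 'H'/'T' results."""
    outcomes = []
    for _ in range(times):
        # randint(2) returns 0 or 1 with equal probability
        flip = np.random.randint(2)
        outcomes.append("H" if flip == 0 else "T")
    return outcomes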

In [7]:
coinToss(2)

['T', 'T']