Science and engineering have seen amazing progress over the last few centuries. We are now able to launch a spacecraft from Earth and predict it will arrive on Mars at a certain time and location. However, it looks like not everything is as easy to predict as the trajectory of a spacecraft.

Take tossing a coin, for instance. As ridiculous as it may sound, we're not able to predict with certainty whether the coin is going to land on heads or tails, because a coin toss is a very complex phenomenon. The outcome depends on multiple factors: the strength and the angle of the toss, the friction of the coin with air, the landing angle, the surface the coin lands on, etc.

<Img src="https://github.com/rhnyewale/INFO-7390-Advance-Data-Science-Architecture/blob/master/Images/Head_tails.jpg?raw=true">

Although we can't tell beforehand the outcome of a coin toss, we'll learn in this course that we're able to at least estimate the probability (the chances) of a coin landing on heads or tails. This may sound like a limitation, and in a way it is, but estimating probabilities is an extremely powerful technique that can enable us to build non-trivial applications, including:

* Image recognition systems (used for self-driving cars, medical diagnosis, etc.)
* Spam filters for inboxes
* Statistical hypothesis tests

Throughout this course, we'll learn:

* How to estimate probabilities theoretically and empirically.
* What the fundamental rules of probability are.
* Counting techniques — tree diagrams, the rule of product, permutations, and combinations.
    
    
A **random experiment** is any process for which we can't predict outcomes with certainty.

Although we can't predict the outcome of a random experiment, we can at least estimate the probability (the chances) associated with its outcomes. A coin toss has two possible outcomes, and we can estimate the probability associated with the coin landing on heads or tails.
    
The **empirical probability** of an event is nothing but the **relative frequency** (proportion or percentage) of that event with respect to the total number of times the experiment was performed.

In [2]:
from numpy.random import seed, randint

In [3]:
seed(1)

In [4]:
def coin_toss():
    if randint(0,2)==1:
        return 'HEAD'
    else:
        return 'TAIL'
    
probabilities = []
heads = 0

In [5]:
for n in range(1,10001):
    outcome = coin_toss()
    if outcome == 'HEAD':
        heads += 1
    current_probability = heads/n
    probabilities.append(current_probability)

In [6]:
print(probabilities[:10])

[1.0, 1.0, 0.6666666666666666, 0.5, 0.6, 0.6666666666666666, 0.7142857142857143, 0.75, 0.7777777777777778, 0.7]


In [7]:
print(probabilities[-10:])

[0.4993494144730257, 0.49939951961569257, 0.4993495446812769, 0.4993996397838703, 0.4993496748374187, 0.4992997198879552, 0.49934980494148246, 0.4993998799759952, 0.49934993499349933, 0.4994]


A probability value estimated by performing an experiment is called an empirical (or experimental) probability. To find the empirical probability of any event E (like a coin landing heads up), we use the formula:

P(E)=number of times event E happened/number of times we repeated the experiment

However, properly calculating empirical probabilities requires us to perform a random experiment many times, which may not always be feasible in practice. An easier way to estimate probabilities is to start with the assumption that the outcomes of a random experiment have equal chances of occurring. Under that assumption, the probability of an event E that corresponds to a single outcome is:


P(E)=1/total number of possible outcomes

For instance, the total number of possible outcomes for a coin toss is two: heads or tails. Let H be the event that a coin lands on heads, and T the event that a coin lands on tails. We can use the formula above to find P(H) and P(T):

P(H)=1/2=0.5

P(T)=1/2=0.5


Let's also consider the rolling of a die, where there are six possible outcomes: 1, 2, 3, 4, 5 or 6. Assuming each outcome has the same chance of occurring, the probability of getting a 2, and likewise the probability of getting a 4, is:

P(2)=1/6

P(4)=1/6


When we calculate the probability of an event under the assumption that the outcomes have equal chances of occurring, we say that we're calculating the **theoretical probability** of an event.
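As a minimal sketch of the equal-chances formula, the snippet below computes the theoretical probability of a single outcome for a coin toss and for a die roll. The helper name `theoretical_probability` and the use of `fractions.Fraction` are illustrative assumptions, not part of the course code.

```python
from fractions import Fraction

def theoretical_probability(sample_space):
    """Probability of any single outcome, assuming all outcomes are equally likely."""
    return Fraction(1, len(sample_space))

coin = {'Heads', 'Tails'}
die = {1, 2, 3, 4, 5, 6}

p_heads = theoretical_probability(coin)  # 1 outcome out of 2
p_two = theoretical_probability(die)     # 1 outcome out of 6

print(p_heads, p_two)  # 1/2 1/6
```

Using `Fraction` keeps the results exact, which makes them easy to compare with the hand-written formulas.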

Theoretical probabilities are much easier to calculate, but in practice it doesn't always make sense to assume the outcomes of a random experiment have equal chances of occurring. If you were playing the lottery, it wouldn't be reasonable to assume that the two possible outcomes (you win or you don't) have equal chances.

If you were a scientist trying to calculate the probability of a human becoming infected with HIV, it wouldn't be reasonable to assume the two possible outcomes (becoming infected with HIV or not) have equal chances of occurring. Both theoretical and empirical probabilities are helpful and important in practice.

Remember that a random experiment is any process for which we can't predict outcomes with certainty. An outcome is a possible result of a random experiment, while an event can include more than one outcome.

In probability theory, the outcomes of a random experiment are usually represented as a set. For example, this is how we can represent the outcomes of a die roll as a set:

Outcomes = {1, 2, 3, 4, 5, 6}

A set is a collection of distinct objects, which means each outcome must occur only once in a set:

* {Heads, Tails} is an example of a valid set because all the elements are distinct.
* {Heads, Heads} is not a proper set because two elements are identical.

Notice that we use curly braces to write a set: {Heads, Tails} is a set, while [Heads, Tails] is not a set.
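Python's built-in `set` type behaves the same way, so it gives a quick way to see why {Heads, Heads} is not a proper set. The variable names below are illustrative only.

```python
# A Python set stores each element only once
outcomes = {'Heads', 'Tails'}
with_duplicate = {'Heads', 'Heads'}  # the repeated element is collapsed

print(len(outcomes))        # 2
print(len(with_duplicate))  # 1

# A list, by contrast, keeps duplicates and is not a set
as_list = ['Heads', 'Heads']
print(len(as_list))         # 2
```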

In probability theory, the set of all possible outcomes is called a sample space. A sample space is often denoted by the capital Greek letter Ω (read "omega"). This is how we represent the sample space of a die roll:

<Img src="https://github.com/rhnyewale/INFO-7390-Advance-Data-Science-Architecture/blob/master/Images/Prob1.jpg?raw=true">
    
<Img src="https://github.com/rhnyewale/INFO-7390-Advance-Data-Science-Architecture/blob/master/Images/Prob2.jpg?raw=true">
    
<Img src="https://github.com/rhnyewale/INFO-7390-Advance-Data-Science-Architecture/blob/master/Images/mut_exclusive.jpg?raw=true">
    
    

Events that don't intersect are called **mutually exclusive** (events A and B are mutually exclusive). If two events are mutually exclusive, they can't both happen at the same time: if one of the events happens, the other cannot possibly happen, and vice versa. Examples of mutually exclusive events include:

* Getting a 5 (event one) and getting a 3 (event two) when we roll a regular six-sided die: it's impossible to get both a 5 and a 3.
* A coin lands on heads (event one) and tails (event two) — it's impossible for a coin to land on both heads and tails.
    
Events that intersect are called mutually **non-exclusive** — events C and D on the Venn diagram above are mutually non-exclusive. Mutually non-exclusive events can happen at the same time, and examples include:

* Getting a number greater than 2 (event one) and getting an odd number (event two) when we roll a regular six-sided die — we could get a 5, which is both greater than 2 (event one) and odd (event two).
* A customer buys a red shirt (event one) and a blue shirt (event two) — the customer can buy both a red shirt (event one) and a blue shirt (event two).
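
If we write events as sets of outcomes, checking whether two events are mutually exclusive amounts to checking whether the sets share any element. The sketch below does this for the die examples above; the function name `mutually_exclusive` is an assumption made for illustration.

```python
# Events on a regular six-sided die, written as sets of outcomes
get_5 = {5}
get_3 = {3}
greater_than_2 = {3, 4, 5, 6}
odd = {1, 3, 5}

def mutually_exclusive(event_a, event_b):
    """Two events are mutually exclusive when they share no outcome."""
    return event_a.isdisjoint(event_b)

print(mutually_exclusive(get_5, get_3))         # True
print(mutually_exclusive(greater_than_2, odd))  # False

# The shared outcomes are the ones that make both events happen at once
shared = greater_than_2 & odd
print(shared)
```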
    
<Img src="https://github.com/rhnyewale/INFO-7390-Advance-Data-Science-Architecture/blob/master/Images/mut_exclusive_non.jpg?raw=true">

1. What is the probability that it takes three flips or more for a coin to land heads up?
2. What is the probability of a coin landing heads up 18 times in a row?
3. What is the probability of getting at least one 6 in four throws of a single six-sided die?
4. What is the probability of getting at least one double-six in 24 throws of two six-sided dice?
5. What is the probability of getting four aces in a row when drawing cards from a standard 52-card deck?

To find P(H1 ∩ H2), we can use a new rule called the **multiplication rule of probability** and multiply P(H1) by P(H2):

P(H1 ∩ H2)=P(H1) × P(H2)

The multiplication rule, however, is a bit more nuanced, and it doesn't work for all kinds of events — at least not in this form. Consider the following two events, which are associated with flipping a fair coin:

* H1: the coin lands heads up on the first flip
* H2: the coin lands heads up on the second flip

Taken individually, P(H1) = 0.5 and P(H2) = 0.5. If event H1 happens (the coin lands heads up), P(H2) keeps the same value (0.5): the fact that we get heads up on the first flip doesn't influence in any way the probability of getting heads up on the second flip.

Events that don't influence each other's probability are called **independent events**. If H1 happens, P(H2) stays the same, so H1 and H2 are independent. The multiplication rule we learned only works for independent events.
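
As a rough check of the multiplication rule for the two coin flips, the sketch below lists every outcome of two flips and compares the counted probability with P(H1) × P(H2). It assumes a fair coin; the variable names are illustrative.

```python
from itertools import product
from fractions import Fraction

# Sample space for two flips of a fair coin: HH, HT, TH, TT
two_flips = list(product('HT', repeat=2))

# Count the outcomes where both flips land heads up
both_heads = [flips for flips in two_flips if flips == ('H', 'H')]
p_by_counting = Fraction(len(both_heads), len(two_flips))

# Multiplication rule for independent events
p_h1 = Fraction(1, 2)
p_h2 = Fraction(1, 2)
p_by_rule = p_h1 * p_h2

print(p_by_counting, p_by_rule)  # 1/4 1/4

# The rule extends to longer runs of independent flips,
# for example 18 heads in a row
p_18_heads = Fraction(1, 2) ** 18
print(p_18_heads)  # 1/262144
```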

Consider now the following two events, which are associated with rolling a fair six-sided die:

* A: we get a number less than 4; event A corresponds to the outcomes {1, 2, 3}
* B: we get an even number; event B corresponds to the outcomes {2, 4, 6}


Taken individually, P(A) = 3/6 and P(B) = 3/6. However, if event A happens, then we know for sure the outcome is some number from the set associated with A: {1, 2, 3}. If we know event A happened and the die showed a number less than 4, then what's the probability of B? Still 3/6?

Event B (getting an even number) corresponds to the outcomes {2, 4, 6}. Because we know event A happened, we know for sure we got one of the numbers {1, 2, 3}. Only 2 is an even number in {1, 2, 3}, so event B can only happen in this case if the die showed a 2. There are three possible outcomes ({1, 2, 3}) and only one successful outcome ({2}), so, given that A happened, P(B) becomes:

P(B)=number of successful outcomes/total number of possible outcomes=1/3
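
The same reasoning can be sketched with sets: once we know A happened, A plays the role of the sample space, and the successful outcomes are those that belong to both A and B. The name `p_b_given_a` is an assumption introduced here for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
event_a = {1, 2, 3}  # a number less than 4
event_b = {2, 4, 6}  # an even number

# Probability of B when we know nothing about A
p_b = Fraction(len(event_b), len(sample_space))

# Probability of B once we know A happened: A is the reduced sample space
p_b_given_a = Fraction(len(event_a & event_b), len(event_a))

print(p_b, p_b_given_a)  # 1/2 1/3
```

Because the probability of B changes from 1/2 to 1/3 once A is known, A and B are not independent.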


We can combine the multiplication rule with the complement rule, P(A) = 1 - P(non-A), to solve two of the probability problems we posed at the beginning:

* What is the probability of getting at least one 6 in four throws of a single six-sided die?
* What is the probability of getting at least one double-six in 24 throws of two six-sided dice (the two dice are thrown simultaneously)?

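Let's begin with the first question and use "A" to refer to the event "getting at least one 6 in four throws of a single six-sided die". The opposite event, non-A, is getting no 6 in any of the four throws. Each throw avoids a 6 with probability 5/6, and the throws are independent, so to find P(A) we can use the formula:

P(A)=1 - P(non-A)=1 - (5/6)^4 ≈ 0.5177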
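
A minimal sketch of this calculation is shown below. It computes P(A) through the opposite event (no 6 in four throws) and cross-checks the result by listing all 1,296 possible sequences of four throws. It assumes a fair die; the variable names are illustrative.

```python
from itertools import product
from fractions import Fraction

# P(at least one 6) = 1 - P(no 6 in four independent throws)
p_no_six_one_throw = Fraction(5, 6)
p_at_least_one_six = 1 - p_no_six_one_throw ** 4

# Cross-check by listing all 6**4 sequences of four throws
throws = list(product(range(1, 7), repeat=4))
favorable = sum(1 for seq in throws if 6 in seq)
p_by_counting = Fraction(favorable, len(throws))

print(p_at_least_one_six, p_by_counting)  # 671/1296 671/1296
print(round(float(p_at_least_one_six), 4))  # 0.5177
```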



In [1]:
#probability of getting at least one double-six in 24 throws of two six-sided dice

p_one_double_6 = 1 - (35/36)**24

## Permutation

Each PIN code represents a certain arrangement where the order of the individual digits matters. Because order matters, the code 1289 is different than the code 9821, even though both are composed of the same four digits: 1, 2, 8 and 9. If the order of digits didn't matter, 1289 would be the same as 9821.

More generally, a certain arrangement where the order of the individual elements matters is called a permutation. For instance, there are 10,000 possible permutations for a 4-digit PIN code (in other words, there are 10,000 digit arrangements where the order of the digits matters).
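
To make the count of 10,000 concrete, the sketch below lists every 4-digit PIN code with `itertools.product`, which allows digits to repeat, and confirms that 1289 and 9821 are distinct codes. The variable names are illustrative.

```python
from itertools import product

digits = '0123456789'

# Each PIN is an ordered arrangement of four digits; digits may repeat
pin_codes = [''.join(code) for code in product(digits, repeat=4)]

print(len(pin_codes))                            # 10000
print('1289' in pin_codes, '9821' in pin_codes)  # True True
print('1289' == '9821')                          # False: order matters
```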

<Img src="https://github.com/rhnyewale/INFO-7390-Advance-Data-Science-Architecture/blob/master/Images/permutation.jpg?raw=true">

In [2]:
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

permutations_1 = factorial(6) # because there are 6 letters
permutations_2 = factorial(52)

In [3]:
print(permutations_1,permutations_2)

720 80658175170943878571660636856403766975289505440883277824000000000000
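
The factorial count can be cross-checked by actually listing the arrangements. The sketch below uses six hypothetical letters (`'ABCDEF'` is an assumption, since the original letters are not shown) and compares the number of arrangements with the standard library's `math.factorial`.

```python
import math
from itertools import permutations

letters = 'ABCDEF'  # hypothetical example: any six distinct letters work

# Every ordering of the six letters, with no letter reused
arrangements = list(permutations(letters))

print(len(arrangements), math.factorial(len(letters)))  # 720 720
```

Listing arrangements is only practical for small inputs; for 52 cards the count is far too large, which is why the factorial formula is used instead.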


<Img src="https://github.com/rhnyewale/INFO-7390-Advance-Data-Science-Architecture/blob/master/Images/permutation2.jpg?raw=true">

In [4]:
def factorial(n):
    final_product = 1
    for i in range(n, 0, -1):
        final_product *= i
    return final_product

def permutation(n, k):
    numerator = factorial(n)
    denominator = factorial(n-k)
    return numerator/denominator

total_n_outcomes = permutation(127, 16)
p_crack_pass = 1/total_n_outcomes

In [5]:
print(p_crack_pass)

5.851813813338265e-34
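
As an alternative sketch, Python 3.8 and later provide `math.perm`, which returns the number of permutations as an exact integer and avoids the floating-point division used in `permutation` above. The variable names here are illustrative.

```python
import math
from fractions import Fraction

n, k = 127, 16

# n! / (n - k)!, computed as an exact integer
total_outcomes = math.perm(n, k)
assert total_outcomes == math.factorial(n) // math.factorial(n - k)

# Exact probability of guessing the password in one attempt
p_crack = Fraction(1, total_outcomes)
print(float(p_crack))
```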
