<a href="https://colab.research.google.com/github/rahiakela/data-science-research-and-practice/blob/main/data-science-bookcamp/case-study-1/01_computing_probabilities.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Computing probabilities

**Few things in life are certain; most things are driven by chance**. Whenever we cheer
for our favorite sports team, or purchase a lottery ticket, or make an investment in
the stock market, we hope for some particular outcome, but that outcome cannot
ever be guaranteed.

**Randomness permeates our day-to-day experiences**. Fortunately,
that randomness can still be mitigated and controlled.

We know that some
unpredictable events occur more rarely than others and that certain decisions carry
less uncertainty than other much-riskier choices. **Driving to work in a car is safer than riding a motorcycle.**

**Such patterns of chance and risk have been rigorously studied using
probability theory.** 

Probability theory is an inherently complex branch of math. However,
aspects of the theory can be understood without knowing the mathematical underpinnings. 

In fact, difficult probability problems can be solved in Python without
needing to know a single math equation. **Such an equation-free approach to probability requires a baseline understanding of what mathematicians call a sample space.**

## Sample space analysis

**Certain actions have measurable outcomes. A sample space is the set of all the possible outcomes an action could produce.**

Let’s take the simple action of flipping a coin. The
coin will land on either heads or tails. Thus, the coin flip will produce one of two measurable outcomes: `heads` or `tails`.

In [1]:
sample_space = {"heads", "tails"}

Our sample space holds two elements, and each element makes up an equal fraction of the set. Therefore, if we choose an element at random, we expect `heads` to be selected with a frequency of `1/2`.

**That frequency is formally defined as the probability of an outcome. All outcomes within `sample_space` share an identical probability, which is equal to `1 / len(sample_space)`.**

In [2]:
probability_heads = 1 / len(sample_space)
print(f"Probability of choosing heads is {probability_heads}")

Probability of choosing heads is 0.5


The probability of choosing `heads` equals 0.5, and this result maps directly onto the action of flipping a fair coin.

A coin flip is conceptually equivalent to choosing a random element from `sample_space`. The probability of the coin landing on heads is therefore 0.5; the probability of it landing on tails is also 0.5.
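To see that equivalence empirically, here is a small simulation sketch. It assumes a fair coin is modeled by `random.choice` over the sample space (the seed and the number of flips are arbitrary illustrative choices, not values from this notebook). With many flips, the observed frequency of `heads` should land close to the computed probability of 0.5.

```python
import random

# Assumption: a fair coin flip is modeled as a uniform random choice from
# the sample space. random.choice needs a sequence, so the set is sorted first.
sample_space = {"heads", "tails"}
outcomes = sorted(sample_space)

random.seed(0)  # fixed seed so the run is reproducible
num_flips = 100_000  # arbitrary, chosen large enough for a stable estimate
heads_count = sum(1 for _ in range(num_flips) if random.choice(outcomes) == "heads")

observed_frequency = heads_count / num_flips
expected_probability = 1 / len(sample_space)
print(f"Observed frequency of heads: {observed_frequency:.3f}")
print(f"Expected probability of heads: {expected_probability}")
```

The observed frequency will wobble slightly from run to run (or seed to seed), which is exactly the randomness that the exact probability of 0.5 summarizes.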

**An event is the subset of `sample_space` elements that satisfy some event condition.** An event condition
is a simple Boolean function whose input is a single `sample_space` element.

Let’s define two event conditions: one where the coin lands on either heads or tails, and another where the coin lands on neither heads nor tails.

In [3]:
def is_heads_or_tails(outcome):
  return outcome in sample_space

In [4]:
def is_neither(outcome):
  return not is_heads_or_tails(outcome)

Also, for the sake of completeness, let’s define event conditions for the two basic
events in which the coin satisfies exactly one of our two potential outcomes.

In [5]:
def is_heads(outcome):
  return outcome == "heads"

def is_tails(outcome):
  return outcome == "tails"

We can pass event conditions into a generalized `get_matching_event` function.
Its inputs are an event condition and a generic sample space; it returns the set of outcomes in that sample space for which the condition is `True`.

In [6]:
def get_matching_event(event_condition, sample_space):
  # A set comprehension keeps only the outcomes that satisfy the event condition
  return {outcome for outcome in sample_space if event_condition(outcome)}

Let’s execute `get_matching_event` on our four event conditions. Then we’ll output the four extracted events.

In [7]:
event_conditions = [is_heads_or_tails, is_heads, is_tails, is_neither]

for event_condition in event_conditions:
  print(f"Event Condition: {event_condition.__name__}")
  event = get_matching_event(event_condition, sample_space)
  print(f"Event: {event}\n")

Event Condition: is_heads_or_tails
Event: {'tails', 'heads'}

Event Condition: is_heads
Event: {'heads'}

Event Condition: is_tails
Event: {'tails'}

Event Condition: is_neither
Event: set()
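As a quick sanity check of the definition, every extracted event should be a subset of `sample_space`, and Python's `<=` set operator tests exactly that. The sketch below rebuilds the coin sample space and the four event conditions (as lambdas keyed by the original function names) so that it runs on its own; it is an illustration, not part of the original notebook.

```python
# Rebuild the coin sample space and the four event conditions so this cell
# is self-contained; the lambdas mirror is_heads_or_tails, is_heads, etc.
sample_space = {"heads", "tails"}

def get_matching_event(event_condition, sample_space):
    return {outcome for outcome in sample_space if event_condition(outcome)}

event_conditions = {
    "is_heads_or_tails": lambda outcome: outcome in sample_space,
    "is_heads": lambda outcome: outcome == "heads",
    "is_tails": lambda outcome: outcome == "tails",
    "is_neither": lambda outcome: outcome not in sample_space,
}

events = {
    name: get_matching_event(condition, sample_space)
    for name, condition in event_conditions.items()
}

for name, event in events.items():
    # On sets, `a <= b` is True when every element of a is also in b.
    print(f"{name}: {event} is a subset of sample_space -> {event <= sample_space}")
```

Note that the empty set produced by `is_neither` is still a valid event: it is a subset of every set.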



The probability of a single outcome for a fair coin is `1 / len(sample_space)`. This property can be generalized to
multi-element events: the probability of an event is equal to `len(event) / len(sample_space)`, but only if all outcomes are known to occur with equal likelihood.

In other words, **the probability of a multi-element event for a fair coin is equal to the event size divided by the sample space size.**
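A coin has only one multi-element event, so a slightly larger sample space makes the rule easier to see. The following sketch uses a hypothetical fair six-sided die (every face is assumed equally likely) with two illustrative event conditions, `is_even` and `is_greater_than_four`, which are not part of the original notebook. `fractions.Fraction` keeps each ratio exact.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die. Each face is ASSUMED to be
# equally likely, which is what makes event size / sample space size valid.
die_sample_space = {1, 2, 3, 4, 5, 6}

def get_matching_event(event_condition, sample_space):
    return {outcome for outcome in sample_space if event_condition(outcome)}

def is_even(outcome):
    return outcome % 2 == 0

def is_greater_than_four(outcome):
    return outcome > 4

probabilities = {}
for event_condition in [is_even, is_greater_than_four]:
    event = get_matching_event(event_condition, die_sample_space)
    # Fraction keeps the ratio exact (1/3 rather than 0.333...).
    probability = Fraction(len(event), len(die_sample_space))
    probabilities[event_condition.__name__] = probability
    print(f"{event_condition.__name__}: event {sorted(event)}, probability {probability}")
# prints:
# is_even: event [2, 4, 6], probability 1/2
# is_greater_than_four: event [5, 6], probability 1/3
```

Both events contain more than one outcome, and in each case the probability is simply the event size divided by the six possible faces.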

We now use event size to compute the four event probabilities.

In [8]:
def compute_probability(event_condition, generic_sample_space):
  # The compute_probability function extracts the event associated with an inputted event condition to compute its probability
  event = get_matching_event(event_condition, generic_sample_space)
  return len(event) / len(generic_sample_space)  # Probability is equal to event size divided by sample space size

In [9]:
for event_condition in event_conditions:
  prob = compute_probability(event_condition, sample_space)
  name = event_condition.__name__
  print(f"Probability of event arising from {name} is {prob}")

Probability of event arising from is_heads_or_tails is 1.0
Probability of event arising from is_heads is 0.5
Probability of event arising from is_tails is 0.5
Probability of event arising from is_neither is 0.0


The output spans a range of event probabilities, the smallest of
which is 0.0 and the largest of which is 1.0.

These values represent the lower and
upper bounds of probability; no probability can ever fall below 0.0 or rise above 1.0.
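Why do these bounds hold? Every event is a subset of the sample space, so its size ranges from 0 (the empty event) up to `len(sample_space)` (the entire space), and dividing by `len(sample_space)` keeps the ratio between 0.0 and 1.0. The sketch below, which assumes equally likely outcomes as before, enumerates every possible event of the coin's sample space (its power set) with `itertools.combinations` and checks the bounds.

```python
from itertools import combinations

# Every event is a subset of the sample space, so its size lies between
# 0 (the empty event) and len(sample_space) (the whole space).
sample_space = {"heads", "tails"}
outcomes = sorted(sample_space)

# Enumerate every possible event: all subsets of size 0, 1, ..., n.
all_events = [
    set(subset)
    for size in range(len(outcomes) + 1)
    for subset in combinations(outcomes, size)
]

probabilities = [len(event) / len(sample_space) for event in all_events]
for event, prob in zip(all_events, probabilities):
    print(f"Event: {event}, probability: {prob}")

# No equally-likely event probability can fall outside [0.0, 1.0].
assert all(0.0 <= prob <= 1.0 for prob in probabilities)
```

For a sample space of `n` outcomes this enumeration produces `2 ** n` events, so it is only practical for small spaces like this one.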

### Analyzing a biased coin