
Congrats, you've made it to the second half of this textbook. Let's start off by making a bold claim: *the study of probability is the key to tackling uncertainty*.

What exactly does the study of probability entail? And if it's so powerful, how can we use computers to leverage that power?

Let's dive in.

## Probabilities in our daily lives

Simply put, a {dterm}`probability` is a measure of how likely something is to happen.

We rely on likelihood in our daily lives whenever we're faced with a question that doesn't have a definite answer. These questions often fall into two categories: questions about the future, like "Can I get an A without going to class?", and questions subject to debate, like ["Does ...", "Was the jury unfair?"].

While every question we could possibly ask has some fundamental, universal truth, the true answer remains unknown until it's been observed, much like in quantum physics! This means that in order to get a 100% certain answer, we need to wait for the future to arrive, or for an investigation so thorough that there's not even a shred of doubt remaining.

That's less than ideal. There must be a way to answer the question without waiting.

Enter *probabilities*. In real life, when we're faced with a question with an uncertain answer, we usually offer whichever answer is *most likely*. We say things like "I'm 80% sure that I can get an A" or "There's a pretty good chance the election was rigged." How we arrive at that measure of '80%', and the math that led us to 'pretty good chance', all boils down to calculating probabilities.

## A semi-formal introduction to probability

Formally, the probability of an event, written as $P(\text{event})$, is the likelihood of that event expressed as a value between $0$ and $1$. A probability of $0$ means the event will theoretically never happen, $1$ means the event is theoretically certain to happen, and $0.5$ means the event is just as likely to happen as not.

For instance, when we flip a fair coin, the probability that it lands showing Heads is $P(\text{Heads}) = \frac{1}{2}$ (or $0.5$ or $50\%$). We know this intuitively, but let's formalize the math a little bit to figure out where that number comes from.

```{tip}
You may notice that probabilities are defined as between 0 and 1, but sometimes we use a percentage. Mathematically they're the same, $100\% = 100 \textit{ per cent} = \frac{100}{100} = 1.$ But while percents can be nice in spoken language, you should stay consistent and always work with probabilities expressed as fractions or decimals between 0 and 1. This will make your life much easier and prevent you from pulling out your hair due to decimal place errors!
```

Since we're learning about data *science*, let's define the terminology of probability in a scientific manner.

Each time we flip a fair coin, we're essentially conducting an *experiment*. An {dterm}`experiment` is a process with a set of distinct possible {dterm}`outcomes`, only one of which can be the true result at a given time. We can conduct many trials of the experiment, but each time we are uncertain which specific outcome will be the result. In this example, our experiment is a single coin flip and the possible outcomes are Heads and Tails.

### All things equal

In the case of flipping a coin, rolling a die, or any other experiment where all outcomes are equally likely, the probability of any given outcome is one divided by the total number of possible outcomes. So, for flipping a coin the probability of any outcome is $\frac{1}{2}$, for a six-sided die the probability of any outcome is $\frac{1}{6}$.

$$P(\text{equally-likely outcome}) = \frac{1}{\text{# of possible outcomes}}$$

```{margin}
If all outcomes of a process are equally likely, we often call that process 'uniformly random'.
```

In Python, we'll use NumPy's `np.random.choice` function to choose one outcome from a list of possible outcomes. The code below essentially conducts an experiment of a single coin flip by using NumPy, then checks if it landed on 'Heads' by using a {dterm}`comparison operator`. If we run the cell a bunch of times, we should expect that it'll return `True` roughly half of the time.

In [None]:
import numpy as np

outcome = np.random.choice(['Heads', 'Tails'])

outcome == 'Heads'

Before you tire your fingers running the cell dozens of times, you should know that NumPy makes it easy to run multiple trials of the experiment by specifying a second argument, `size=`. The result is an array of outcomes from each trial.

In [None]:
# Random seed of zero results in 2 heads out of 10... could be useful.
# np.random.seed(0)

[! should we introduce the for loop here instead?]

In [None]:
outcomes = np.random.choice(['Heads', 'Tails'], size=10)
outcomes

And since we're working with an array we can do element-wise comparisons in order to check if each one is Heads or not.

In [None]:
outcomes == 'Heads'

Did you know that you can count how many `True`s there are in a sequence by just taking the `sum` of that sequence? Let's try that now in order to find out how many of our ten coin flips resulted in Heads.

In [None]:
sum(outcomes == 'Heads')

### Collections of outcomes

Let's move on to something with more than two outcomes, like rolling a fair six-sided die. If you're playing a game that involves a die, you might be able to win if you roll a one *or* if you roll a two, so you'd be interested in the chances of getting either of these outcomes. The probability of one of multiple outcomes occurring, like rolling a one or a two, is equal to the sum of the individual probabilities.

$$P(\text{one or two}) = P(\text{one}) + P(\text{two}) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6}$$

What you've just stumbled upon is called an *event*. An {dterm}`event` is a collection of outcomes where we're interested in finding the probability of any of those outcomes occurring. If we define the event $\text{win}$ as rolling a one or a two, often expressed in set notation as $\text{win} = \{\text{one, two}\}$, then calculating the probability of $\text{win}$ uses the exact same calculation as above.

$$P(\text{win}) = P(\{\text{one, two}\}) = P(\text{one or two})$$

From these two facts, it follows that the probability of any event is always just the sum of the probabilities of each outcome that satisfies the event.

$$P(\text{event}) = \sum_{\text{all outcomes in event}} P(\text{outcome})$$

When we're in the scenario of equally-likely outcomes, it so happens that it doesn't matter *which* outcomes are in our event, merely *how many* outcomes are in our event. By recognizing the repeated addition of $1$ in the numerator when the probabilities of two equally-likely outcomes are added, we can deduce a simpler way to calculate the probability of an event when outcomes are equally likely.

$$P(\text{event with equally-likely outcomes}) = \frac{\text{# outcomes in event}}{\text{total # possible outcomes}}$$

```{margin}
Because it's based on counting how many possible outcomes there are and how many outcomes we're interested in (those in our event), probability with all outcomes equally likely, often referred to as 'Classical Probability', is tackled primarily using combinatorics.
```
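
As a small illustration of the counting formula for equally-likely outcomes, the sketch below computes the probability of the $\text{win}$ event for a fair six-sided die. The names `outcomes`, `win`, and `probability_equally_likely` are made up for this example, and the calculation is only valid under the assumption that every outcome is equally likely.

```python
from fractions import Fraction

def probability_equally_likely(event, outcomes):
    """Probability of an event, assuming every outcome is equally likely."""
    # Count only the possible outcomes that belong to the event
    favorable = [o for o in outcomes if o in event]
    return Fraction(len(favorable), len(outcomes))

outcomes = [1, 2, 3, 4, 5, 6]   # all possible results of one die roll
win = {1, 2}                    # the event: roll a one or a two

p_win = probability_equally_likely(win, outcomes)
print(p_win)          # 1/3
print(float(p_win))
```

Using `Fraction` keeps the answer exact, so $\frac{2}{6}$ is reported in its reduced form $\frac{1}{3}$.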

Just like an event can be broken down into multiple *or*'s mathematically, we can do the exact same thing in our code by using the `or` operator to string together multiple equality checks. In the code below, we're running an experiment of a single die roll, and checking if the winning event is satisfied by the roll.

In [None]:
outcome = np.random.choice([1, 2, 3, 4, 5, 6])

outcome == 1 or outcome == 2

Again, we can run multiple trials of the experiment and sum up the `True`s. When working with arrays, though, we need to use the `|` operator (and parentheses!) instead of `or`.

In [None]:
outcomes = np.random.choice(range(1,7), 10)
outcomes

In [None]:
sum(
    (outcomes == 1) | (outcomes == 2)
)

Humorously, we can also find the probability of the event containing zero outcomes. Asking for the probability of this empty event is essentially asking for the probability that our experiment produces none of the possible results and thus violates all universal laws. If this sounds impossible to you, you're right. The probability of an empty event is zero. Whew, existential crisis avoided.

On the other end of the spectrum, we could ask for the probability that any of our outcomes happens by specifying the event containing all outcomes. Our experiment must necessarily produce some outcome, so the probability of this full event is one. Because we also know that the probability of an event is equal to the sum of the probabilities of each outcome it contains, we can equivalently state that the probabilities of all possible outcomes sum to one.

$$\sum_{\text{all outcomes}} P(\text{outcome}) = 1$$

Finally, the probability that some event does *not* happen is one minus the probability that it does. You can think of this as taking the full event and removing the outcomes of the event that we're interested in not happening.

$$P(\text{not event}) = 1 - P(\text{event})$$
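
To make these last rules concrete, here is a minimal sketch that checks them for a die roll. It assumes a fair six-sided die, and the names `p_outcome`, `win`, and `p_not_win` are invented for the illustration.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each outcome has probability 1/6
p_outcome = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible outcomes sum to one
total = sum(p_outcome.values())

# P(win) is the sum of the probabilities of the outcomes in the event
win = {1, 2}
p_win = sum(p_outcome[face] for face in win)

# P(not win) is one minus P(win)
p_not_win = 1 - p_win

print(total, p_win, p_not_win)   # 1 1/3 2/3
```

Adding up the probabilities of the four outcomes that are *not* in `win` gives the same $\frac{2}{3}$, which is one way to convince yourself that the subtraction rule holds.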

## A general way to find probabilities

In the code examples above, we ran an experiment many times and expected a certain number of them to satisfy our event because we already knew the probability of the event.

What if we ran our code for flipping a coin 100 times and only saw that 20 of them showed up as Heads? Naturally we would start to suspect that the probability of flipping a Heads is closer to $0.2$ rather than $0.5$. Right?

Aha! If you agreed, then you already know about the primary definition of probability that beginning data scientists use, called the 'frequentist' approach. This definition operates without the need for any assumptions about the likelihoods of our outcomes. We calculate the probability of some event happening as approximately the number of times we observed it divided by the total number of observations we made. This is often called the 'experimental probability' (to distinguish it from the universal truth that it approximates).

$$P(\text{event}) \approx \frac{\text{# times event observed}}{\text{total # observations}}$$

If you remember our introduction to {dterm}`proportions` in Exploring Data, you'll notice that the calculation of an experimental probability is the exact same as the calculation of a proportion! [leverage computers to crunch probabilities] [This is mirrored in intuition]

The approximation tends to get closer and closer to the true underlying probability as the number of observations increases. Formally, the equality only holds in the limit, as the number of observations goes to infinity... in practice it can be pretty challenging to make infinite observations!

Naturally, if you don't make many observations there's a good chance (!) that you'll calculate the incorrect probability simply due to randomness inherent to the experiment. You wouldn't expect to get exactly five Heads every time you flip ten coins. Furthermore, the fewer observations that are made, the more extreme each deviation seems, and the less precise our answer can be. If you only make ten observations, then a single unit change in the numerator causes a change of 0.1 (10%) in the resulting probability, and we can only be precise up to the nearest 0.1. Whereas if we were to make one-hundred observations, then we can be precise up to the nearest 0.01.
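
One way to see this effect is to compute the experimental probability of Heads for different numbers of flips. The sketch below assumes a fair coin simulated with `np.random.choice`; the helper name `experimental_probability`, the seed value, and the particular numbers of flips are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary fixed seed so the run is repeatable

def experimental_probability(num_flips):
    """Flip a fair coin num_flips times and return the proportion of Heads."""
    outcomes = np.random.choice(['Heads', 'Tails'], size=num_flips)
    # The mean of an array of True/False values is the proportion of Trues
    return np.mean(outcomes == 'Heads')

for num_flips in [10, 100, 10_000, 100_000]:
    print(num_flips, experimental_probability(num_flips))
```

With only ten flips the result can only be a multiple of 0.1, while the larger runs should typically land much closer to 0.5.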

Because it's never possible to make infinite observations, we need to be cognizant that we lose both accuracy and precision as the number of observations decreases -- we'll talk about how to deal with this more when we discuss sampling.

## Repeated experiments and simulation

Above, while discussing what happens when we flip a coin ten times to estimate the probability of Heads, we made the following statement.

> "if you don't make many observations there's a good chance that you'll calculate the incorrect probability simply due to randomness"

Any time you see the word 'chance', know that there is a probability that can be unearthed! In this scenario, we're looking for the probability that you flip a coin ten times and *don't* see exactly five Heads.

Like before, we can use the frequentist approach to approximate this probability. We define a new experiment, *flip a coin ten times and record how many Heads showed up*, with a new set of possible outcomes, $\{0, 1, \ldots, 10\}$. We then specify the event $\text{five Heads} = \{5\}$, conduct many trials, and see what proportion of observations satisfy our event. Since we want the probability of *not* seeing exactly five Heads, we finish by subtracting that proportion from one.

- How do we flip a coin ten times and repeat that same process multiple times? The answer is called 'iteration' and comes in the form of a **for-loop**.

- for loop
    - write the code for a single run of the experiment
    - put it in a function
    - call that function multiple times
        - how to do that? for loop

- this simulation approach blends the classical and frequentist approaches
    - classical for coin flip probability (using assumptions!)
    - frequentist for prob of result
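
The outline above could be sketched roughly as follows. This is only one possible version: it assumes a fair coin, the function name `count_heads_in_ten_flips` and the choice of 10,000 trials are made up for the example, and it counts the trials that miss five Heads directly, which is equivalent to taking one minus the proportion of trials with exactly five Heads.

```python
import numpy as np

def count_heads_in_ten_flips():
    """One trial of the experiment: flip a fair coin ten times, count Heads."""
    flips = np.random.choice(['Heads', 'Tails'], size=10)
    return np.sum(flips == 'Heads')

trials = 10_000
times_not_five = 0

# Repeat the experiment many times, counting trials without exactly five Heads
for i in range(trials):
    if count_heads_in_ten_flips() != 5:
        times_not_five += 1

prob_not_five = times_not_five / trials
print(prob_not_five)
```

The fair-coin assumption inside the function is the classical part; estimating the chance of the result from repeated trials is the frequentist part. With a fair coin, the estimate should land near 0.75.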

[! seems like this could easily segue into either distributions or HT]


- if you meet someone new, when does their birthday occur? (ignoring leap years)
- unsure, a bit like Schrödinger's cat -- we don't know the truth until it is observed!
- now is a good time for a 'probabilistic model'
- assume that birthdays are uniformly distributed -- equally likely
- with all possible outcomes equally likely, the probability of their birthday being on any given day is 1/365 (again, ignoring leap years)
- we could find probabilities of an event -- a collection of outcomes
- other math properties somehow
- if we were to ask people we'd expect to find

- but should we question the validity of our model?
- say we're trying to determine which day is actually most likely for us to guess correct
- we essentially do the reverse of our expectation -- welcome to the 'frequentist' approach
- the probability is approximately the number of times observed over the number of trials -- getting closer to truth as the number of trials gets closer to infinity

- we can't ask infinite people though! there are only 7bn people. Wait, we can't ask 7bn people either! Enter: sampling.

"how many people got an A without going to class last quarter?" -> take proportion -> frequentist approach

[might not know underlying chance of outcomes]
[frequentist]
[our frequency might suggest unfair -- but 'is it fair' is another question of uncertainty that we can answer by asking what the chance is that we saw this result even if we *did* have a fair coin]

[while there is math we can use, an arguably simpler approach is to simulate, and leverage computers]

[notice that 

# Probabilities and Simulation

Welcome to the second half of this textbook! Let's start off with a bold statement related to what we're about to learn.
*The study of probability is the key to tackling uncertainty.*

What does that mean? And, if it's true that probabilities are so powerful, then how can we use computers to leverage this power?

## What are probabilities, and why do we use them?

So far, we've learned how to use Python and Babypandas to answer specific questions about our data -- questions that have a definite answer. Given the proper data set, we could answer a question like "*how many people between the ages of 30 and 40 have diabetes?*". However, many of the decisions we face in life arise from questions that don't have such clear-cut answers, like "*a 35-year-old patient just walked into the doctor's office, do they have diabetes?*". The answer to this question is subject to uncertainty -- we can't be sure of the truth until it's been observed.

Instead of trying to give an absolute answer to these questions, we tackle them by finding *how likely* each possible outcome is.

A {dterm}`probability` is simply a measurement representing how likely something is to happen. Probabilities range from $0$, meaning the event will theoretically never happen, to $1$, meaning the event is theoretically certain to be observed.

## Frequency-based probability

Until you delve into upper division statistics courses, the realm of probabilities that we often operate in is the 'Frequentist' approach. 

$$P(\text{event}) = \frac{\text{# outcomes in event}}{\text{total # of equally likely outcomes}}$$

$$P(\text{event}) = \frac{\text{# times event observed}}{\text{total # of observations}}$$

Let's entertain an example that we should all be familiar with: flipping a coin. Unbeknownst to most, flipping a coin is an incredibly complex process whose outcome depends on a nigh infinite number of factors: the starting orientation of the coin, the strength with which it's flipped, the presence of a breeze, the wear and tear of the coin... et cetera... all culminating in whether the catcher chooses to keep the coin resting in their palm or slap it onto the back of their other hand after they catch it! But in practice, we don't think about a coin flip this way. Not only is it infeasible to attempt to calculate all of those factors (let alone name them all!), but we can nicely summarize the coin flipping process with a simple {dterm}`probabilistic model`: we expect that half of the time the coin will show Heads, and half of the time it will show Tails.

The probability of a Heads is thus 0.5, written mathematically as $P(\text{Heads})=0.5$. The probability of Tails is in this case the same, $P(\text{Tails})=0.5$.

In life we call processes like flipping a coin *random*, with outcomes that are subject to *chance*. Any time you hear these words, understand that a probabilistic model is at play, and therefore any questions about this process must be answered by working with the likelihood (probabilities) of each outcome using that model.

## Conducting an experiment

In the coin flip example above, we knew the probabilities because we assumed that the coin was fair. But what if we're the scrupulous type, and want to see for ourselves what the probabilities of flipping a Heads or Tails on a given coin truly are?

In practice, we can find probabilities by conducting an experiment. In our experiment, we conduct multiple trials and keep track of how many times the particular event we're interested in occurs. For example, we could flip a coin ten times, and see how many times we get Heads. The probability of that event boils down to a simple form.

$$
P(\text{event}) = \frac{\text{# of times event observed}}{\text{# of observations}}
$$

It may come as no surprise that probabilities calculated this way are called *experimental probabilities*, as opposed to their theoretical, universal-truth counterparts.

If in our hypothetical example we flip the coin $10$ times and we get Heads $3$ times, then we conclude that our experimental probability of getting a Heads is $P(\text{Heads}) \approx 0.3$. Since the only other possibility is Tails, then we conclude that our experimental probability of getting a Tails is $P(\text{Tails}) \approx 0.7$.

[Notice that when a probability is calculated this way, it's impossible for us to observe the event fewer than $0$ times, and also impossible to observe it more times than the number of observations. Therefore we arrive at a first important rule:

$$0 \leq P(\text{event}) \leq 1$$]

[is our coin unfair?] [you may be tempted to think so, but it turns out there's a pretty good chance of us 
[what's the probability that we observe 3 out of 10 even with a fair coin?] [that's another experiment that we can conduct!] [in the mean time, it's probably best for us to just increase the number of trials]

[as the number of trials goes up, we're more likely to get close to the actual underlying probability] [one way to think about the underlying theoretical probability is -- if there were an *infinite* number of observations, what would the empirical probability converge to?]

In [None]:
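# Estimate the chance of seeing exactly 3 Heads in 10 flips of a fair coin
times_event_seen = 0
trials = 10_000

for i in range(trials):
    # True stands for Heads, False stands for Tails
    flips = np.random.choice([True, False], 10)
    if sum(flips) == 3:
        times_event_seen += 1
times_event_seen / trials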
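
For comparison, the same probability can also be worked out by counting, since every sequence of ten fair flips is equally likely. The sketch below is an aside under that fair-coin assumption, using `math.comb` from Python's standard library to count the sequences; the variable names are made up for the example.

```python
from math import comb

flips = 10
heads = 3

# For a fair coin, each of the 2**10 sequences of Heads/Tails is equally
# likely, and comb(10, 3) of those sequences contain exactly three Heads.
sequences_with_three_heads = comb(flips, heads)
total_sequences = 2 ** flips

exact_probability = sequences_with_three_heads / total_sequences
print(exact_probability)   # 0.1171875
```

A simulated estimate based on 10,000 trials should usually fall within a percentage point or so of this value, which suggests that seeing 3 Heads out of 10 is not especially rare even with a fair coin.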

## The math of probabilities

Before we get too far ahead of ourselves, there are a couple properties that all probabilities satisfy which prove useful whenever you're working with them.

- All probabilities are between $0$ and $1$ (inclusive)

    $$0 \leq P(\text{event}) \leq 1$$
    
- The probabilities of all possible *outcomes* of an experiment will sum to $1$

    $$P(\text{first possibility}) + \cdots + P(\text{$n$th possibility}) = 1$$
    
    In the example of a coin flip, the two possibilities are Heads and Tails. Therefore, whether or not the coin is fair, $P(\text{Heads}) + P(\text{Tails})$ will always equal $1$.
    
- From the above we can conclude that the probability that something *doesn't* happen is $1 - P(\text{does happen})$, [explain this (?)]

    $$P(\text{not event}) = 1 - P(\text{event})$$
    
- If we know that all outcomes are equally likely, we can calculate probabilities as

    $$P(\text{event}) = \frac{\text{# of possibilities that satisfy the event}}{\text{total # of possibilities}}$$
    
    This is probably the form of probability you're most familiar with, as it pertains to many things like picking cards or rolling dice.

As you progress as a data scientist, you'll take courses that revolve entirely around probabilities -- this is not that course. For now, only the very basics of formal probability theory need to be introduced.

[The exact same concept holds true when applied to problems that data scientists are tasked with.]

[When faced with an outcome that you're uncertain about, the best tool we have at our disposal is to think about how likely each possible outcome is.]

In [None]:
import numpy as np

[[pitfalls of probability -- not always immediately intuitive]] [test questions -- just as likely to get five C's as you are four C's followed by an A] [Monty Hall]

[[how probability helps us solve problems -- introduce Monty Hall, motivate simulation!]]

## Using Python to simulate probabilities

[one of the pitfalls of probability is that they're notoriously unintuitive]

Oftentimes it can be challenging to calculate exact probabilities using math -- and sometimes it's actually impossible! Here's where computers come to the rescue.

If you're still unconvinced about the result of Monty Hall, we can run a simulation to compare the two choices.

First, write the code for a single trial.

In [None]:
import numpy as np

doors = np.random.choice(['Goat', 'Goat', 'Car'], size=3, replace=False)
choice = np.random.choice([0, 1, 2])

initial_guess = doors[choice]

if initial_guess == 'Goat':
    winning = 'Switch'
elif initial_guess == 'Car':
    winning = 'Stay'
    
winning

Next, wrap it in a function.

In [None]:
def run_monty_hall(do_switch=False):
    
    doors = np.random.choice(['Goat', 'Goat', 'Car'], size=3, replace=False)
    choice = np.random.choice([0, 1, 2])

    result = doors[choice]

    if do_switch:
        if result == 'Car':
            result = 'Goat'
        elif result == 'Goat':
            result = 'Car'

    return result

In [None]:
run_monty_hall(True)

Call the function a bunch of times. How do we do this? Using a concept called 'iteration' in the form of a **for-loop**. The syntax for running something many times is as follows.

```html
for i in range(<number_of_trials>):
    <code_that_you_want_run_each_time>
```

We also need to save the result of each trial.

In [None]:
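trials = 1000

yes_switch = []
no_switch = []

for i in range(trials):
    yes = run_monty_hall(True)
    no = run_monty_hall(False)
    # Save the result of each trial so we can count the wins afterwards
    yes_switch.append(yes)
    no_switch.append(no)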

Now check how many times you should have switched, versus how many times you should have stayed.
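
One possible way to do that tally is sketched below. To keep the example self-contained it repeats a condensed version of `run_monty_hall` and the loop; the names `switch_win_rate` and `stay_win_rate` are made up for the illustration, and the shortcut of flipping the result when switching relies on the assumption that the host always reveals a Goat.

```python
import numpy as np

def run_monty_hall(do_switch=False):
    """Play one game and return the prize won, 'Car' or 'Goat'."""
    doors = np.random.choice(['Goat', 'Goat', 'Car'], size=3, replace=False)
    choice = np.random.choice([0, 1, 2])
    result = doors[choice]
    # Assumption: the host always reveals a Goat, so switching flips the result
    if do_switch:
        result = 'Goat' if result == 'Car' else 'Car'
    return result

trials = 1000
yes_switch = []
no_switch = []

for i in range(trials):
    yes_switch.append(run_monty_hall(True))
    no_switch.append(run_monty_hall(False))

# Proportion of games won (prize is 'Car') under each strategy
switch_win_rate = np.mean(np.array(yes_switch) == 'Car')
stay_win_rate = np.mean(np.array(no_switch) == 'Car')

print('Switch:', switch_win_rate)
print('Stay:  ', stay_win_rate)
```

If the simulation matches the usual analysis of the problem, switching should win roughly two thirds of the time and staying roughly one third.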