## What is Set Theory?

Set theory is the branch of mathematics that studies collections of objects. Almost any group of things can be described in terms of sets.

<img src='images/prob2-venn_b.png' width=550>

### What is a set?

A set is a collection of distinct items, called elements.

### Why do we need to know what a set is?

Probability and statistics are built on counting the elements of sets and applying set operations such as union and intersection.
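As a minimal sketch of what "counting elements and applying set operations" looks like in Python, the snippet below uses two made-up sets of numbers (they are illustrative only and not part of the lesson's data).

```python
# Hypothetical example sets, chosen only for illustration
evens = {2, 4, 6, 8, 10}
multiples_of_three = {3, 6, 9}

# Intersection: elements that appear in both sets
both = evens & multiples_of_three

# Union: elements that appear in at least one set
either = evens | multiples_of_three

# Difference: elements of the first set that are not in the second
only_even = evens - multiples_of_three

print(both)         # {6}
print(len(either))  # 7

# Counting rule: |A or B| = |A| + |B| - |A and B|
print(len(either) == len(evens) + len(multiples_of_three) - len(both))  # True
```

The last line is the counting version of the overlap correction that reappears later for probabilities of non-mutually exclusive events.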






This image should remind you of a [similar concept](https://www.w3schools.com/sql/sql_join.asp) we've talked about in SQL.


Visualization of [Set Theory](https://seeing-theory.brown.edu/compound-probability/index.html#section1)

### *Activity:  We are trying to create buddies based on staff interest for a staff trip. <br>
Who should buddy with whom based on interests?

This is another way to look at sets.<br>
And we can still use the math!

In [10]:
Robin = ["art", "traveling", "wine", "doodling", "tech", "gadgets"]
Andy = ["rock-climbing", "traveling", "dad jokes", "ice cream"]
Alison = ["wine", "traveling", "schitts creek", "dogs"]
Su = ["schitts creek", "dogs", "tarot card reading", "croquet", "taxonomy"]
Ammar = ["wine", "ice cream", "dogs", "zookeeping", "traveling"]

**Task**:

- In groups of 2-3, draw the Venn diagram of each person's interests and how they overlap.
- Then use the set notation learned in Learn.co to find the overlaps with Python.

In [None]:
#your code here

## Key Concepts and Symbols

The fundamental ingredient of probability theory is an **experiment** that can be repeated, at least hypothetically, under essentially identical conditions. This experiment may lead to different outcomes on different **trials**, or single performances of the experiment. The set of all possible outcomes of an experiment is called the **"sample space"**. An **event** is a well-defined subset of the sample space.
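To make these terms concrete, here is a small sketch that treats rolling two six-sided dice as the experiment. The dice example and the variable names are assumptions added for illustration; they do not come from the lesson.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (first die, second die) outcome of one trial
sample_space = set(product(range(1, 7), repeat=2))

# Event: the subset of the sample space where the two dice sum to 7
event_sum_is_seven = {outcome for outcome in sample_space if sum(outcome) == 7}

# With equally likely outcomes, P(event) = |event| / |sample space|
probability = Fraction(len(event_sum_is_seven), len(sample_space))

print(len(sample_space))        # 36
print(len(event_sum_is_seven))  # 6
print(probability)              # 1/6
```

Because an event is just a subset of the sample space, the usual set operations (union, intersection, difference) apply to events directly.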



<img src="images/prob_symbols.png" width=550>

## What is probability?

Probability is the measure of the likelihood that an event will occur.

<img src='images/prob_scale.gif'>

Probability is quantified as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty.



## Why is probability important?

Uncertainty and randomness occur in many aspects of our daily life and having a good knowledge of probability helps us make sense of these uncertainties. Learning about probability helps us make informed judgments on what is likely to happen, based on a pattern of data collected previously or an estimate.

## How is probability used in data science?

Data science often uses statistical inference to predict or analyze trends from data, and statistical inference relies on probability distributions of data. Hence, knowing probability and its applications is important for working effectively on data science problems.

## Calculating Probability For Single Events


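To calculate the probability of a single event, divide the number of outcomes favorable to the event by the total number of outcomes in the sample space. This assumes every outcome is equally likely.

$P(A) = \frac{Outcomes \ favorable \ to \ A}{Outcomes \ in \ sample \ space}$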

In [3]:
# Sample Space
cards = 52

# Outcomes
aces = 4

# Divide possible outcomes by the sample set
ace_probability = aces / cards

# Print probability rounded to two decimal places
print(f"{ace_probability:.2%}")

7.69%


In [13]:
# Create function that returns the probability as a fraction between 0 and 1
def event_probability(event_outcomes, sample_space):
    probability = (event_outcomes / sample_space)
    return probability

# Sample Space
cards = 52

# Determine the probability of drawing a heart
hearts = 13
heart_probability = event_probability(hearts, cards)

# Determine the probability of drawing a face card
face_cards = 12
face_card_probability = event_probability(face_cards, cards)

# Determine the probability of drawing the queen of hearts
queen_of_hearts = 1
queen_of_hearts_probability = event_probability(queen_of_hearts, cards)

# Print each probability
#print(str(heart_probability) + '%')
#print(str(face_card_probability) + '%')
#print(str(queen_of_hearts_probability) + '%')

print(f"{heart_probability:.2%}")
print(f"{face_card_probability:.2%}")
print(f"{queen_of_hearts_probability:.2%}")

25.00%
23.08%
1.92%


![xkcd](images/increased_risk_2x.png)

[xkcd comic 1252](https://xkcd.com/1252/)

## Probability with Combinations and Permutations

[View concepts online](https://seeing-theory.brown.edu/compound-probability/index.html#section1)

### Permutations
Permutations are the number of ways a subset of a specified size can be arranged from a given set, generally without replacement. An example of this would be a 4-digit PIN with no repeated digits. The number of such PINs can be calculated as follows:

$10 \times 9 \times 8 \times 7$



Written out in full, this calculation starts from the factorial of the full set of digits to choose from, $10!$:

$10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1$

and divides it by the factorial of the difference between the number of digits to choose from (10) and the number you actually choose (4), which is $6!$:

$6 \times 5 \times 4 \times 3 \times 2 \times 1$

Note that you can also write the above as

$10P4 = \frac{10!}{(10 - 4)!}$

Generalizing the calculations above, this means that the formula to calculate permutations is the following:

$nPk = \frac{n!}{(n - k)!}$
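The formula can be checked with a few lines of Python. This is a sketch using only `math.factorial`; the helper name `permutations_count` is made up for this example. The final probability assumes that every one of the $10^4$ possible 4-digit PINs is equally likely, which is an added assumption rather than something stated in the lesson.

```python
import math

def permutations_count(n, k):
    """Number of ordered arrangements of k items chosen from n: n! / (n - k)!"""
    return math.factorial(n) // math.factorial(n - k)

# 4-digit PINs with no repeated digit: 10 * 9 * 8 * 7
pins_without_repeats = permutations_count(10, 4)
print(pins_without_repeats)  # 5040

# All 4-digit PINs when repeats are allowed: 10 choices per position
all_pins = 10 ** 4

# Probability that a uniformly random PIN has no repeated digit
print(pins_without_repeats / all_pins)  # 0.504
```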

### Combinations


You have seen that when you're working with permutations, the order matters. With combinations, the order doesn't matter. Combinations refer to the number of ways a subset of a specified size can be drawn from a given set.

An example here is the following situation where you have your deck of cards, which consists of 52 cards. Three cards are going to be taken out of the deck. How many different ways can you choose these three cards?

To figure out how many combinations you have, count all the permutations and divide by the redundancies: each selection of three cards appears once for every order in which those three cards can be arranged, which is $3!$ times.

$52C3 = \frac{\frac {52!}{(52-3)!}}{3!}$

or 

$nCk = \frac{nPk}{k!}$
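As a quick check of the formula, the sketch below computes $52C3$ by dividing the permutation count by $k!$. The helper name `combinations_count` is hypothetical and used only here.

```python
import math

def combinations_count(n, k):
    """Number of unordered selections of k items from n: nPk / k!"""
    permutations = math.factorial(n) // math.factorial(n - k)
    return permutations // math.factorial(k)

# Ways to choose 3 cards from a 52-card deck when order does not matter
three_card_selections = combinations_count(52, 3)
print(three_card_selections)  # 22100
```

In Python 3.8 and later, `math.comb(52, 3)` returns the same value directly.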

#### *Activity:  Figure out the unique combinations of the letters in your name

## Independent versus Dependent Events


Events can be classified into two categories: dependent or independent.

Independent events are events that don't impact the probability of the other event(s). Two events A and B are independent if knowing whether event A occurred gives no information about whether event B occurred.

[View online](https://seeing-theory.brown.edu/compound-probability/index.html#section1)

[Conditional Probability](http://setosa.io/conditional/)

For example, draw an Ace from the deck, replace the card, shuffle the deck, and then draw another card. The probability of drawing an Ace on the first draw is the same as on the second.

Dependent events, then, are events that have an impact on the probability of the other event(s).

For example, you draw a card from the deck and then draw a second card without replacing the first. In this case, the probability of drawing an Ace on the second draw depends on what the first card was.



Events A and B (which have nonzero probability) are independent if and only if one of the following equivalent statements holds:

$P (A ∩ B) = P(A)P(B)$

The probability that both A and B occur equals the product of the probabilities of each event occurring.

$P (A|B) = P(A)$

The probability that A occurs, given that B has already occurred, equals the probability that A occurs.

$P (B|A) = P(B)$

The probability that B occurs, given that A has already occurred, equals the probability that B occurs.
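The three conditions can be verified by counting on a full deck. In this sketch, A is "the card is an Ace" and B is "the card is a heart"; the deck representation and the helper `prob` are assumptions made for illustration. Conditional probabilities are computed from the definition $P(A|B) = P(A ∩ B) / P(B)$.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Probability of an event (a subset of the deck) for a single random draw."""
    return Fraction(len(event), len(deck))

aces = {card for card in deck if card[0] == "A"}
hearts = {card for card in deck if card[1] == "hearts"}

p_a = prob(aces)
p_b = prob(hearts)
p_a_and_b = prob(aces & hearts)

# Conditional probabilities from the definition P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

print(p_a_and_b == p_a * p_b)  # True
print(p_a_given_b == p_a)      # True
print(p_b_given_a == p_b)      # True
```

All three checks agree, so drawing an Ace and drawing a heart are independent events for a single draw.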

Let's consider the following example, where you already know the probability of drawing an Ace on the first draw. Now you need to determine the probability of drawing an Ace on the second draw, if the first card drawn was either a King or an Ace:

In [15]:
# Sample Space
cards = 52
cards_drawn = 1 
cards = cards - cards_drawn 

# Determine the probability of drawing an Ace after drawing a King on the first draw
aces = 4
ace_probability1 = event_probability(aces, cards)

# Determine the probability of drawing an Ace after drawing an Ace on the first draw
aces_drawn = 1
aces = aces - aces_drawn
ace_probability2 = event_probability(aces, cards)

# Print each probability
print(f"{ace_probability1:.2%}")
print(f"{ace_probability2:.2%}")

7.84%
5.88%


## Multiple Events


An example of multiple events is the question "what is the probability of eating three oatmeal cookies followed by a chocolate chip cookie when you eat four cookies out of a cookie jar filled with these two types of cookies?" Eating four cookies is actually four events.

To calculate the probability of multiple independent events, determine the number of events (4 in this case), determine the probability of each event occurring separately, and multiply these probabilities to get your final answer. In the example above, assuming each cookie is equally likely to be either type and that eating one does not noticeably change the proportions in the jar, this would be 0.5 x 0.5 x 0.5 x 0.5 or 0.0625.

$P(Event A \cap Event B)=P(Event A) \times P(Event B)$

For your deck of playing cards, you could ask yourself the question "What is the probability of getting three Hearts when choosing without replacement?". When you sample or choose without replacement, it means that you choose a card but do not put it back, so that your final selection cannot include that same card. In this case, your probability calculation will be the following:

13/52 x 12/51 x 11/50.
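The product above can be evaluated exactly with `fractions.Fraction`. As a cross-check, this sketch also counts the answer with combinations: the number of 3-heart selections divided by the number of 3-card selections. `math.comb` requires Python 3.8 or later; the variable names are illustrative.

```python
import math
from fractions import Fraction

# Multiply the probability of a heart on each successive draw without replacement
by_multiplication = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

# Cross-check by counting: choose 3 of the 13 hearts out of all 3-card selections
by_counting = Fraction(math.comb(13, 3), math.comb(52, 3))

print(by_multiplication)                   # 11/850
print(by_multiplication == by_counting)    # True
print(f"{float(by_multiplication):.2%}")   # 1.29%
```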

## Mutually Exclusive Events
When you're working with multiple events, some may be mutually exclusive, or disjoint: they cannot both occur. In such cases, you might want to calculate the probability that any one of them occurs, which is the probability of their union. Here you don't multiply probabilities; you add together the probability of each event occurring:

$P(Event A \cup Event B) = P(Event A) + P(Event B)$



To determine the probability of drawing a heart or drawing a club, add the probability of drawing a heart to the probability of drawing a club.

$P(Heart \cup Club) = (\frac{13}{52}) + (\frac{13}{52})$

Now it's time for you to determine the probability of the following mutually exclusive events:

1. Drawing a heart or drawing a club.
2. Drawing an ace, a king or a queen.

In [19]:
# Sample Space (reset to a full deck; an earlier cell reduced cards to 51)
cards = 52

# Calculate the probability of drawing a heart or a club
hearts = 13
clubs = 13
heart_or_club = event_probability(hearts, cards) + event_probability(clubs, cards)

# Calculate the probability of drawing an ace, king, or a queen
aces = 4
kings = 4
queens = 4
ace_king_or_queen = event_probability(aces, cards) + event_probability(kings, cards) + event_probability(queens, cards)

print(f'{heart_or_club:.2%}')
print(f'{ace_king_or_queen:.2%}')

50.00%
23.08%


## Non-Mutually Exclusive Events
You can imagine that not all events are mutually exclusive: Drawing a heart or drawing an ace are two non-mutually exclusive events. The ace of hearts is both an ace and a heart. When events are not mutually exclusive, you must correct for the overlap.

$P(Event A \cup Event B) = P(Event A) + P(Event B) - P(Event A \cap Event B)$

To calculate the probability of drawing a heart or an ace, add the probability of drawing a heart to the probability of drawing an ace and then subtract the probability of drawing the ace of hearts.

$P(Heart \cup Ace) = (\frac{13}{52}) + (\frac{4}{52}) - (\frac{1}{52})$
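The overlap correction can be checked by brute force. The sketch below builds the deck as a Python set and deliberately uses a different pair of events (a spade or a king) so that the exercise that follows stays open. The deck representation and variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

spades = {card for card in deck if card[1] == "spades"}
kings = {card for card in deck if card[0] == "K"}

# Direct count of the union: every card that is a spade, a king, or both
by_counting = Fraction(len(spades | kings), len(deck))

# Addition rule with the overlap (the king of spades) subtracted once
by_formula = (
    Fraction(len(spades), len(deck))
    + Fraction(len(kings), len(deck))
    - Fraction(len(spades & kings), len(deck))
)

print(by_counting)                # 4/13
print(by_counting == by_formula)  # True
```

Without the subtraction, the king of spades would be counted twice, giving 17/52 instead of 16/52.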

Calculate the probability of the following non-mutually exclusive events:

1. Drawing a heart or an ace.
2. Drawing a red card or drawing a face card.

In [13]:
# your code here

## Intersection of Independent Events
The probability of the intersection of two independent events is determined by multiplying the probabilities of each event occurring.

$P(Event A \cap Event B) = P(Event A) \times P(Event B)$

If you want to know the probability of drawing an Ace from a deck of cards, replacing it, reshuffling the deck, and drawing another Ace, you multiply the probability of drawing an Ace by the probability of drawing an Ace.

$P(Ace \cap Ace) = (\frac{4}{52}) \times (\frac{4}{52})$

In [8]:
# Sample Space
cards = 52

# Outcomes
aces = 4

# Probability of one ace
ace_probability = aces / cards

# Probability of two consecutive independent aces
two_aces_probability = ace_probability * ace_probability

print(f"{two_aces_probability:.2%}")

0.59%


## Intersection of Dependent Events
The probability of the intersection of two dependent events (Event A and Event B) is determined by multiplying the probability of Event A occurring by the probability of Event B given that A has occurred.


$P(Event A \cap Event B) = P(Event A) \times P(Event B | A)$
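As a sketch of the multiplication rule for dependent events, $P(A \cap B) = P(A) \times P(B | A)$, the example below computes the probability of drawing two hearts in a row without replacement, then confirms it by listing every ordered pair of distinct cards. The choice of events and the deck representation are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

# Multiplication rule: P(first is a heart) * P(second is a heart | first was a heart)
p_first_heart = Fraction(13, 52)
p_second_heart_given_first = Fraction(12, 51)
by_rule = p_first_heart * p_second_heart_given_first

# Brute force: enumerate all ordered two-card draws without replacement
draws = list(permutations(deck, 2))
both_hearts = [d for d in draws if d[0][1] == "hearts" and d[1][1] == "hearts"]
by_enumeration = Fraction(len(both_hearts), len(draws))

print(len(draws))                 # 2652
print(by_rule)                    # 1/17
print(by_rule == by_enumeration)  # True
```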

The best starting hand you can have in Texas Hold’em is pocket Aces. If you're sitting at a table with three other players, what is the probability of being dealt two Aces?

In [15]:

#your code here

Your hand:
<img src="images/yourhand.png">

Community Cards:
    
<img src="images/community.png">

How can you determine the probability of getting a Flush by the River?