# Conditional Probability

## Dataset

The study "Adolescents Understanding of Social Class" examines teens' beliefs about social class.
- Sample: 48 working-class and 50 upper-middle-class 16-year-olds
- "objective" assignment to social class based on self-reported measures of parents' occupation, education, and household income
- "subjective" association based on survey questions

**Contingency Table:**

|                |                    | Objective<br>Working class | Objective<br>Upper middle class | Total |
| :------------: | :----------------- | -------------------------: | ------------------------------: | ----: | 
|                | poor               |                          0 |                               0 |     0 |
|                | working class      |                          8 |                               0 |     8 |
| **subjective** | middle class       |                         32 |                              13 |    45 |
|                | upper middle class |                          8 |                              37 |    45 |
|                | upper class        |                          0 |                               0 |     0 |
|                | Total              |                         48 |                              50 |    98 |

For brevity, I abbreviate the class labels as follows:
- For Subjective:
    - Poor: SP
    - Working class: SWC
    - Middle class: SMC
    - Upper middle class: SUMC
    - Upper class: SUP
- For Objective:
    - Working class: OWC
    - Upper middle class: OUMC

### Marginal Probabilities

*What is the probability that a student's __objective__ social class position is upper middle class (OUMC)?*  

$P(\text{OUMC}) = \frac{50}{98} = 0.5102$  

Note that the term **marginal probability** comes from the fact that the counts we use to calculate this probability came from the **margins of the contingency table**.
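
As a minimal sketch, the contingency table can be stored as a nested dictionary and the marginal probability read off a column total. The names `counts`, `total`, `oumc_total`, and `p_oumc` are illustrative assumptions, not part of the study.

```python
# Counts from the contingency table: counts[subjective][objective].
counts = {
    "SP":   {"OWC": 0,  "OUMC": 0},
    "SWC":  {"OWC": 8,  "OUMC": 0},
    "SMC":  {"OWC": 32, "OUMC": 13},
    "SUMC": {"OWC": 8,  "OUMC": 37},
    "SUP":  {"OWC": 0,  "OUMC": 0},
}

# Grand total of students in the sample (98).
total = sum(sum(row.values()) for row in counts.values())

# Column margin: every student whose objective class is OUMC (50).
oumc_total = sum(row["OUMC"] for row in counts.values())

p_oumc = oumc_total / total
print(round(p_oumc, 4))  # 0.5102
```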

### Joint Probability

*What is the probability that a student's **objective position and subjective identity** are both upper middle class?*  

$P(\text{OUMC & SUMC}) = \frac{37}{98} = 0.3776$

The term **joint probability** comes from the fact that we're considering the students who are at the **intersection of the two events** of interest.
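
A short sketch of the same idea in code, assuming the non-empty interior cells of the table are keyed by hypothetical `(subjective, objective)` label pairs: each joint probability is one cell divided by the grand total, and together they sum to 1.

```python
# Interior cells of the table, keyed by (subjective, objective).
# The SP and SUP rows are all zeros and are omitted here.
cells = {
    ("SWC", "OWC"): 8,   ("SWC", "OUMC"): 0,
    ("SMC", "OWC"): 32,  ("SMC", "OUMC"): 13,
    ("SUMC", "OWC"): 8,  ("SUMC", "OUMC"): 37,
}
n_total = sum(cells.values())  # 98

# Each joint probability is a single cell divided by the grand total.
joint = {key: n / n_total for key, n in cells.items()}

print(round(joint[("SUMC", "OUMC")], 4))  # 0.3776
```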

### Conditional Probability

*What is the probability that a student who is objectively in the working class associates with upper middle class?*  

$P(\text{SUMC | OWC}) = \frac{8}{48} = 0.1667$

We call this a conditional probability because we first condition on the objective working class (OWC) and then calculate the probability using only the counts in that column.
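
Conditioning on a column can be sketched as restricting the sample to that column before dividing. The dictionary name `owc_column` is an assumption made for illustration.

```python
# Counts for the objectively working-class column only (subjective -> count).
owc_column = {"SP": 0, "SWC": 8, "SMC": 32, "SUMC": 8, "SUP": 0}

# Conditioning on OWC shrinks the sample to the 48 students in this column.
n_owc = sum(owc_column.values())

p_sumc_given_owc = owc_column["SUMC"] / n_owc
print(round(p_sumc_given_owc, 4))  # 0.1667
```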

### Bayes' Theorem

We calculate conditional probabilities using Bayes' Theorem, which states that the probability of `A` given `B` is the probability of `A` and `B` divided by the probability of `B`. In other words, the joint probability goes in the numerator and the probability of the event we are conditioning on goes in the denominator.

$$P(A\ |\ B) = \frac{P(A\ \text{&}\ B)}{P(B)}$$

Returning to the previous question, the probability of subjective upper middle class (SUMC) given objective working class (OWC) equals the joint probability `P(SUMC & OWC)` divided by `P(OWC)`, the probability of the event we are conditioning on.

$P(\text{SUMC | OWC}) = \frac{P(\text{SUMC & OWC})}{P(\text{OWC})}$<br>
$P(\text{SUMC | OWC}) = \frac{8/98}{48/98} = 0.1667$<br>
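
The formula can be wrapped in a small helper, shown here as a sketch. The function name `conditional` and its guard against a zero denominator are assumptions, not part of the original notes.

```python
def conditional(p_a_and_b: float, p_b: float) -> float:
    """Return P(A | B) = P(A & B) / P(B); undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_a_and_b / p_b


p_sumc_and_owc = 8 / 98  # joint probability
p_owc = 48 / 98          # marginal probability of the conditioning event

print(round(conditional(p_sumc_and_owc, p_owc), 4))  # 0.1667
```

The common factor of 98 cancels, which is why this matches the count-based calculation of 8 out of 48.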

**Practice:**

*The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American community survey estimates that 14.6% of Americans live below the poverty line. 20.7% speak a language other than English at home, and 4.2% fall into both categories. Based on this information, what percent of Americans live below the poverty line given that they speak a language other than English at home?*

$P(\text{below poverty | speak other language}) = \frac{P(\text{below poverty & speak other language})}{P(\text{speak other language})}$<br>
$P(\text{below poverty | speak other language}) = \frac{0.042}{0.207} = 0.2029$<br>

One use of this result is a comparison with the general public: 14.6% of all Americans live below the poverty line, versus about 20.3% of those who speak a language other than English at home. Living below the poverty line therefore appears more prevalent among people who speak a language other than English at home, which suggests that language spoken at home and poverty status may be dependent.
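
A quick sketch of this comparison, using the three percentages quoted in the question (the variable names are illustrative):

```python
p_poverty = 0.146     # below the poverty line
p_other_lang = 0.207  # speak a language other than English at home
p_both = 0.042        # fall into both categories

p_poverty_given_lang = p_both / p_other_lang
print(round(p_poverty_given_lang, 4))  # 0.2029

# If the events were independent, the conditional would equal the marginal.
print(p_poverty_given_lang > p_poverty)  # True
```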

**General Rule**

Bayes' Theorem holds whether or not `A` and `B` are independent, so we can rearrange it to calculate the joint probability of `A` and `B` as the conditional probability of `A` given `B`, `P(A | B)`, multiplied by the marginal probability of `B`, `P(B)`. This rearrangement gives a general rule for calculating joint probabilities.

$$P(A\ |\ B) = \frac{P(A\ \text{&}\ B)}{P(B)}\ \ \ \rightarrow\ \ \ P(A\ \text{&}\ B) = P(A\ |\ B) \times P(B)$$

In general, if `P(A | B) = P(A)`, then the events `A` and `B` are said to be independent. We can explain this in two ways.
- Conceptually: if knowing `B` tells us nothing about `A`, then `A` and `B` are independent; the probability of `A` is the same whether or not `B` is given.
- Mathematically: if events `A` and `B` are independent, then `P(A & B) = P(A) × P(B)`. Then,

$$P(A\ |\ B) = \frac{P(A\ \text{&}\ B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A)$$
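
As a hedged check of both statements, the sketch below reuses counts from the social class table, with `A` = SUMC and `B` = OWC. It uses Python's `fractions.Fraction` so the comparisons are exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

p_b = Fraction(48, 98)         # P(OWC)
p_a_given_b = Fraction(8, 48)  # P(SUMC | OWC)

# General multiplication rule: P(A & B) = P(A | B) * P(B).
p_a_and_b = p_a_given_b * p_b
print(p_a_and_b == Fraction(8, 98))  # True

# Independence would require P(A | B) == P(A); here P(SUMC) = 45/98.
p_a = Fraction(45, 98)
print(p_a_given_b == p_a)  # False
```

The product recovers the joint count of 8 out of 98, while the unequal conditional and marginal probabilities show that subjective and objective class are not independent in this sample.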


### Example

Consider the following hypothetical distribution of gender and major among students in an introductory class. There are 100 students in this class: 60 are social science majors and 40 are not.

|            |        | Major<br>Social Science | Major<br>Non-Social Science | Total |
| :--------- | :----- | ----------------------: | --------------------------: | ----: |
|            | female |                      30 |                          20 |    50 |
| **gender** | male   |                      30 |                          20 |    50 |
|            | Total  |                      60 |                          40 |   100 |

The overall proportion of social science majors in this class is 60 out of 100, so the probability that a randomly selected student is a social science major is 0.6.

$P(SS) = \frac{60}{100} = 0.6$<br>

Now let's condition on gender. What is the probability that a randomly selected female student in this class is a social science major?

$P(SS\ |\ F) = \frac{30}{50} = 0.6$<br>

What about the males? Of the 50 males in the class, 30 are social science majors, so the probability of social science given male is again 30 out of 50, or 0.6.

$P(SS\ |\ M) = \frac{30}{50} = 0.6$<br>

All of these probabilities are exactly the same. Recall that if `P(A | B) = P(A)`, the events are independent. Here, `P(SS) = P(SS | F)` and `P(SS) = P(SS | M)`, so under this hypothetical distribution the two variables, gender and major, are independent of each other.
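
The independence check above can be sketched in code. The dictionary layout and variable names are assumptions, and exact fractions are used so the equality test is not affected by rounding.

```python
from fractions import Fraction

# counts[gender][major] from the hypothetical class of 100 students.
counts = {
    "female": {"SS": 30, "non-SS": 20},
    "male":   {"SS": 30, "non-SS": 20},
}
total = sum(sum(row.values()) for row in counts.values())

# Marginal probability of a social science major.
p_ss = Fraction(sum(row["SS"] for row in counts.values()), total)

# P(SS | gender) for each gender, using only that row's counts.
p_ss_given = {
    gender: Fraction(row["SS"], sum(row.values()))
    for gender, row in counts.items()
}

# Independent if every conditional probability equals the marginal one.
independent = all(p == p_ss for p in p_ss_given.values())
print(p_ss, p_ss_given["female"], p_ss_given["male"], independent)
# 3/5 3/5 3/5 True
```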

### Questions

Use the following table for the next questions:

<img src="images/exerc_2.1.png" align="center" width="700"/>

1) What is the probability that a student's subjective social class identity is upper middle class?  
&#9744; $\frac{8}{48} \approx 0.17$  
&#9745; $\frac{45}{98} \approx 0.46$  
&#9744; $\frac{8}{45} \approx 0.18$  
&#9744; $\frac{37}{50} \approx 0.74$  
&#9744; $\frac{37}{45} \approx 0.82$

*Solving:*

$P(\text{SUMC}) = \frac{45}{98} = 0.4592$

2) What is the probability that a student's objective and subjective class is working class?  
&#9744; $\frac{48}{98} \approx 0.49$  
&#9744; $\frac{8}{48} \approx 0.17$  
&#9745; $\frac{8}{98} \approx 0.08$  
&#9744; $\frac{8}{8} \approx 1$  

*Solving:*

$P(\text{SWC & OWC}) = \frac{8}{98} = 0.0816$

3) If a student's objective class position is upper middle class, what is the probability that they associate with middle class?  
&#9744; $\frac{32}{48} \approx 0.67$  
&#9744; $\frac{13}{45} \approx 0.29$  
&#9744; $\frac{13}{98} \approx 0.13$  
&#9745; $\frac{13}{50} \approx 0.26$  

*Solving:*

$P(\text{SMC | OUMC}) = \frac{13}{50} = 0.26$

4) Same data: "The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English at home, and 4.2% fall into both categories." Based on this information, what percent of Americans who live below the poverty line also speak a language other than English at home?  
&#9744; $\frac{0.207}{0.146} \approx 1.42$  
&#9744; $\frac{0.042}{0.207} \approx 0.2$  
&#9744; $\frac{(0.146 * 0.207)}{0.146} \approx 0.207$  
&#9744; $\frac{0.146}{0.207} \approx 0.71$  
&#9745; $\frac{0.042}{0.146} \approx 0.29$  

*Solving:*

$P(\text{speak other languages | below poverty}) = \frac{P(\text{speak other languages & below poverty})}{P(\text{below poverty})}$<br>
$P(\text{speak other languages | below poverty}) = \frac{0.042}{0.146} = 0.2877$<br>
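
This question reverses the conditioning used in the earlier practice problem. A brief sketch (variable names are illustrative) shows that the same joint probability yields different answers depending on which event sits in the denominator:

```python
p_poverty, p_other_lang, p_both = 0.146, 0.207, 0.042

# Same joint probability, different conditioning event in the denominator.
p_lang_given_poverty = p_both / p_poverty
p_poverty_given_lang = p_both / p_other_lang

print(round(p_lang_given_poverty, 4))  # 0.2877
print(round(p_poverty_given_lang, 4))  # 0.2029
```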

---
## Probability Trees