# Conditional Probability

# Contents
- [Percentiles](#Percentiles)
- [The Data](#The-DataFrame)
- [Headers](#Headers)
- [Charting the Data, Percent Below Poverty](#Charting-the-Data)
  - [By Age](#Percent-Below-Poverty)
      - [< 18](#Percent-Below-<-18yrs-old)
      - [18-36](#Percent-Below-18-36)
      - [Percent Below 35 - 64](#Percent-Below-35---64)
      - [Percent Below 60 +](#Percent-Below-60-+)
  - [By Education](#Education)
      - [Below High School](#Below-High---School)
      - [High School](#High---School-Grads)
      - [Some College And Associates degree](#Some-College-And-Associates-degree)
      - [Bachelor's +](#Bachelor's-degree-+)

**What is it?**  
Conditional probability is the **probability** of a second event **given that a first event has occurred**.

**Notation**
- P      = Probability of  
- P(B|A) = Probability of B GIVEN that A has occurred, so B may depend on A  
- P(A,B) = Probability of A & B both occurring (the joint probability); this does not assume the two events are independent  

**Equation**  
P(B|A) = P(A,B) / P(A)  
where P(A,B) is the numerator and P(A) is the denominator

**In "English"**  
The probability of B given A is equal to...
 - the probability of A & B together
 - divided by the probability of A
 
 
 ## An Example
 **Scenario**
 - A teacher gives students 2 tests
 - 60% of students pass BOTH tests
 - 80% of students pass FIRST test
 
 **Question**
What percentage of students...  
 - who passed the first test
 - also passed the second test
 
**The Solution In Notation**  
A = % passed the 1st test  
B = % passed the 2nd test  
Re-asking the question with notation in mind,  
what is the probability of B (_folks who passed the 2nd test_)  
given A (_folks who passed the first test_)
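
The scenario gives both numbers the equation needs, so the answer can be worked out directly. A minimal sketch in Python, where the variable names are only illustrative:

```python
# Percentages taken from the scenario above
passed_both = 60   # P(A,B): students who passed BOTH tests
passed_first = 80  # P(A): students who passed the FIRST test

# P(B|A) = P(A,B) / P(A)
p_second_given_first = passed_both / passed_first

print(f"{p_second_given_first:.0%} of students who passed the first test also passed the second")
```

So 75% of the students who passed the first test also passed the second.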

## Here
A = % below poverty in a specific education range  
B = % below poverty in a specific age range  
Re-asking the question with notation in mind,  
what is the probability of B (_folks in poverty in a specific age range_)  
given A (_folks within a specific education range_)


# The DataFrame
A DataFrame is a two-dimensional, labeled table: each column is an array-like Series with additional pandas methods attached.

In [2]:
%matplotlib inline
import numpy as np
import pandas as pd
from scipy import stats

# read the CSV file using the pandas read_csv function
# and assign the result to a DataFrame
df = pd.read_csv("./cleaned.csv")

# output a short example of the data (the first 5 rows)
df.head()
# df

Unnamed: 0,id,Geographic Area Name,Total:,Below poverty level:,Percent below poverty level:,Total:AGE:Under 18 years,Below poverty level:AGE:Under 18 years,Percent below poverty level:AGE:Under 18 years,Total:AGE:Under 18 years!!Under 5 years,Below poverty level:AGE:Under 18 years!!Under 5 years,...,Percent below poverty level:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:200 percent of poverty level,Total:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:300 percent of poverty level,Below poverty level:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:300 percent of poverty level,Percent below poverty level:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:300 percent of poverty level,Total:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:400 percent of poverty level,Below poverty level:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:400 percent of poverty level,Percent below poverty level:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:400 percent of poverty level,Total:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:500 percent of poverty level,Below poverty level:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:500 percent of poverty level,Percent below poverty level:ALL INDIVIDUALS WITH INCOME BELOW THE FOLLOWING POVERTY RATIOS:500 percent of poverty level
0,0400000US02,Alaska,720869,78620,10.9,180258,25327,14.1,51329,8384,...,(X),296900,(X),(X),393946,(X),(X),480098,(X),(X)
1,0400000US23,Maine,1301941,151541,11.6,240662,34878,14.5,61960,9594,...,(X),604470,(X),(X),798827,(X),(X),954763,(X),(X)
2,0400000US37,North Carolina,10100431,1417873,14.0,2257634,455971,20.2,586457,127425,...,(X),5151267,(X),(X),6549114,(X),(X),7602253,(X),(X)
3,0400000US29,Missouri,5943658,786330,13.2,1347491,247209,18.3,362024,72100,...,(X),2916185,(X),(X),3823677,(X),(X),4469204,(X),(X)
4,0400000US42,Pennsylvania,12394000,1517870,12.2,2594554,434736,16.8,687014,121224,...,(X),5450455,(X),(X),7240999,(X),(X),8637022,(X),(X)


# Gathering Totals

In [50]:
# get column-specific data sub-sets
totals_columns = df.filter(like="Total")

#TOTALS

# 322349744
totals = totals_columns['Total:'].sum()

# 43215981
totals_below_poverty = df['Below poverty level:'].sum()
perc_below_poverty = ( totals_below_poverty / totals ) * 100

print(str(totals) + ' -> total population')
print(str(totals_below_poverty) + ' -> total BELOW POVERTY')
print(str(perc_below_poverty) + '% below poverty')

322349744 -> total population
43215981 -> total BELOW POVERTY
13.406550432998019% below poverty


# Grouping Age Totals

In [53]:
# Ages
age_cols = df.filter(like="AGE")
age_cols

age_totals = {
  "under18": age_cols['Total:AGE:Under 18 years'].sum(),
  "18-34": age_cols["Total:AGE:18 to 64 years!!18 to 34 years"].sum(),
  "35-64": age_cols["Total:AGE:18 to 64 years!!35 to 64 years"].sum(),
  "65+": age_cols["Total:AGE:65 years and over"].sum()
}
print('---Age Totals---')
age_totals

#{'under18': 72751768, '18-34': 72613209, '35-64': 125194122, '65+': 51790645}

---Age Totals---


{'under18': 72751768, '18-34': 72613209, '35-64': 125194122, '65+': 51790645}
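
The four age groups add up to the overall total of 322349744 found under Gathering Totals, so dividing each group by that total gives the probability that a person falls in a given age group. A small sketch using the totals printed above; the `age_shares` name is only illustrative:

```python
# Age totals copied from the output above
age_totals = {"under18": 72751768, "18-34": 72613209, "35-64": 125194122, "65+": 51790645}

# The groups do not overlap, so their sum is the whole population
population = sum(age_totals.values())

# P(age group) = people in the group / whole population
age_shares = {group: count / population for group, count in age_totals.items()}

for group, share in age_shares.items():
    print(f"{group}: {share:.1%}")
```

These shares are the P(A) terms that a conditional probability by age group would divide by.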

# Grouping Education Totals

In [54]:
#Educations
ed_cols = df.filter(like="EDUCATIONAL")
total_education_columns = totals_columns.filter(like="EDUCATIONAL")
ed_cols

education_totals = {
  "totals": df['Total:EDUCATIONAL ATTAINMENT:Population 25 years and over'].sum(),
  "noHS": total_education_columns['Total:EDUCATIONAL ATTAINMENT:Population 25 years and over!!Less than high school graduate'].sum(),
  "HS": total_education_columns['Total:EDUCATIONAL ATTAINMENT:Population 25 years and over!!High school graduate (includes equivalency)'].sum(),
  "someCollege": total_education_columns["Total:EDUCATIONAL ATTAINMENT:Population 25 years and over!!Some college, associate's degree"].sum(),
  "bachelors+": total_education_columns["Total:EDUCATIONAL ATTAINMENT:Population 25 years and over!!Bachelor's degree or higher"].sum()
}
#   {'totals': 221861976, 'noHS': 25605115, 'HS': 59111224, 'someCollege': 64108358, 'bachelors+': 73037279}

print('--Education totals--')
education_totals


--Education totals--


{'totals': 221861976,
 'noHS': 25605115,
 'HS': 59111224,
 'someCollege': 64108358,
 'bachelors+': 73037279}

# Totals below Poverty

In [53]:
# JUST TOTALS columns

ValueError: Item wrong length 137 instead of 52.
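
The code that produced this error is not shown, so its cause can only be guessed at. pandas raises a `ValueError` of this form when a boolean mask passed to `df[...]` has a different length from the number of rows, and a mask built from column names is one common way to end up there. A minimal sketch on a hypothetical toy frame (not the real data), assuming that is what happened:

```python
import pandas as pd

# Hypothetical toy frame: 2 rows, 3 columns (NOT the real data)
toy = pd.DataFrame(
    {
        "Total:": [100, 200],
        "Below poverty level:": [10, 30],
        "Total:AGE:Under 18 years": [25, 40],
    }
)

# One True/False per COLUMN, so this mask has length 3
is_total_col = toy.columns.str.startswith("Total")

# toy[mask] treats the mask as a ROW filter and expects length 2
try:
    toy[is_total_col]
    error_message = None
except ValueError as err:
    error_message = str(err)

# .loc[:, mask] applies the same mask to the columns instead
totals_only = toy.loc[:, is_total_col]
```

If the goal was only to keep the totals columns, `df.filter(like="Total")`, already used above, does the same selection without a mask.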

## Starting with Mean Below Poverty

In [20]:
# storing the column as a Series
perc_below = df['Percent below poverty level:']

# get the mean ('avg') of the Series
perc_below_avg = perc_below.mean()
perc_below_avg

# ... result: the state-level percentages average 13.49% below poverty

13.492307692307698
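
This 13.49% differs slightly from the 13.41% computed under Gathering Totals because `.mean()` counts every row equally, while the totals-based figure weights each row by its population. A sketch of the difference using only the five rows shown by `df.head()`; the short column names are hypothetical stand-ins for the long headers:

```python
import numpy as np
import pandas as pd

# The five rows shown by df.head(), with shortened (hypothetical) column names
sample = pd.DataFrame(
    {
        "state": ["Alaska", "Maine", "North Carolina", "Missouri", "Pennsylvania"],
        "total": [720869, 1301941, 10100431, 5943658, 12394000],
        "below": [78620, 151541, 1417873, 786330, 1517870],
        "percent_below": [10.9, 11.6, 14.0, 13.2, 12.2],
    }
)

# Every state counts the same, regardless of size
unweighted = sample["percent_below"].mean()

# Each state's percentage is weighted by its population
weighted = np.average(sample["percent_below"], weights=sample["total"])

# Same idea computed straight from the counts
from_counts = sample["below"].sum() / sample["total"].sum() * 100

print(unweighted, weighted, from_counts)
```

The weighted mean and the count-based figure agree up to the rounding in the published percentages, which is why the population-level rate is better taken from the counts than from a plain mean of percentages.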

# Collecting Age-Based Poverty Means

In [35]:
# under 18, 18.05%
perc_below_under_18 = df['Percent below poverty level:AGE:Under 18 years']
perc_below_18_mean = perc_below_under_18.mean()

# 18-34, 16.58%
perc_below_18_34 = df['Percent below poverty level:AGE:18 to 64 years!!18 to 34 years']
perc_below_18_34_mean = perc_below_18_34.mean()

# 35-64, 10.80%
perc_below_35_64 = df['Percent below poverty level:AGE:18 to 64 years!!35 to 64 years']
perc_below_35_64_mean = perc_below_35_64.mean()

# 60+, 10.11%
perc_60_plus = df['Percent below poverty level:AGE:60 years and over']
perc_60_plus_mean = perc_60_plus.mean()

# storing res in an obj
age_poverty_means = {
    "Under18": perc_below_18_mean, 
    "18-34": perc_below_18_34_mean, 
    "35-64":perc_below_35_64_mean,
    "60+": perc_60_plus_mean
}
age_poverty_means

{'Under18': 18.050000000000008,
 '18-34': 16.576923076923077,
 '35-64': 10.803846153846154,
 '60+': 10.111538461538462}

# Collecting Education-Level Poverty Means

In [36]:
# No-HS, 25.46%
perc_no_hs = df['Percent below poverty level:EDUCATIONAL ATTAINMENT:Population 25 years and over!!Less than high school graduate']
perc_no_hs_mean = perc_no_hs.mean()

# HS, 14.10%
perc_hs = df['Percent below poverty level:EDUCATIONAL ATTAINMENT:Population 25 years and over!!High school graduate (includes equivalency)']
perc_hs_mean = perc_hs.mean()

# Some College, 9.95%
perc_some_college = df["Percent below poverty level:EDUCATIONAL ATTAINMENT:Population 25 years and over!!Some college, associate's degree"]
perc_some_college_mean = perc_some_college.mean()

# Bachelors+, 4.51%
perc_bach = df["Percent below poverty level:EDUCATIONAL ATTAINMENT:Population 25 years and over!!Bachelor's degree or higher"]
perc_bach_mean = perc_bach.mean()

# storing res in an obj
education_poverty_means = {
    "noHS": perc_no_hs_mean, 
    "HS": perc_hs_mean, 
    "someCollege":perc_some_college_mean,
    "bachelorsPlus": perc_bach_mean
}
education_poverty_means

{'noHS': 25.46153846153846,
 'HS': 14.098076923076924,
 'someCollege': 9.953846153846152,
 'bachelorsPlus': 4.507692307692308}

# Calculating Conditional Probabilities

In [38]:
#...not what i was going for here...
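
One possible way to approach this step is to apply P(B|A) = P(A,B) / P(A) to counts per education group. The sketch below uses made-up counts and hypothetical column names, not the real columns from `cleaned.csv`, so it shows only the shape of the calculation:

```python
import pandas as pd

# Hypothetical counts per education group (NOT taken from cleaned.csv)
counts = pd.DataFrame(
    {
        "total": [1000, 3000, 4000],
        "below_poverty": [250, 420, 200],
    },
    index=["noHS", "HS", "bachelorsPlus"],
)

population = counts["total"].sum()

# P(A): probability of being in each education group
p_group = counts["total"] / population

# P(A,B): probability of being in the group AND below poverty
p_group_and_poor = counts["below_poverty"] / population

# P(B|A) = P(A,B) / P(A): probability of poverty GIVEN the education group
p_poor_given_group = p_group_and_poor / p_group

print(p_poor_given_group)
```

The population cancels out, so P(B|A) reduces to the below-poverty count divided by the group total. Conditioning on age and education together would need counts for each age-by-education combination.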