# 🧠 What the Data Says About the American Dream  
**Audience:** Non-technical — policy makers and business leaders  
**Goal:** Reveal which factors help (or hinder) Americans in earning more than \$50,000 per year  
**Tone:** Clear, compelling, and insight-driven  

---

## 🎬 1. Setting the Scene

Imagine two Americans. Both are 29 years old.  
One has a high school diploma and works full-time.  
The other has a college degree and a job in a professional field.  

**Who is more likely to earn over $50,000 per year?**  
We might guess the answer — but can we *prove* it?

With a dataset of **48,842 Americans**, drawn from **U.S. Census** data, we explored what really determines whether someone crosses that \$50K income line.

This data includes:
- Age, gender, education, race
- Weekly hours worked
- Job types
- Income category (above or below $50K)

Every row in the dataset tells a small part of a bigger story.

---

## 🔍 2. What We Discovered

### ✅ Education Pays — But It’s Not the Whole Story

- **About 75% of those earning over** \$50K have **some college education or higher**
- However, **about 16% of those with only a high school diploma** still earn > \$50K, plausibly in **skilled trades** or **entrepreneurship**, though this analysis does not test that directly

**Policy Implication:** Education is vital — but we must also recognize and support alternative pathways to success.

---

### ⏱ Hours Worked Matter — To a Point

- High earners often work **45–60 hours/week**
- But **more hours ≠ higher pay** — many working long hours in **service roles** still earn under $50K

**Insight:** **Job quality** matters more than just time worked.

---

### 🧑🏾‍🤝‍🧑🏻 Race and Gender Gaps Persist — Even with Equal Qualifications

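- **Men** are nearly three times as likely as women to earn >\$50K (about 30% vs. 11%)
- **Asian-Pacific Islander and White** individuals have the highest rates of earning >\$50K (about 27% and 25%)
- **Black, American Indian/Eskimo, and "Other"** individuals earn >\$50K at roughly half that rate (about 12% each); the dataset's race field has no separate Hispanic category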

**Call to Action:** Equal education and effort don’t guarantee equal outcomes. There are **structural barriers** we need to address.

---

### 💼 Occupation is Destiny

Your **job title** may be more predictive than your **education level**:

- “**Exec-managerial**” and “**Prof-specialty**” jobs make up the bulk of >\$50K earners
- “**Service**,” “**farming**,” and “**labor**” roles almost always fall below \$50K — regardless of education or hours

**Conclusion:** Without access to **high-paying professions**, the American Dream remains out of reach for many.

---

## 📈 3. What This Means for Policy & Leadership

The data points to a key truth:

> **The path to upward mobility is about more than education or effort. It’s about access and equity.**

If we want more Americans to reach the $50K+ income level, we must:
- Expand **career pipelines** into professional and managerial jobs
- Support **affordable education** *and* skilled trades
- Eliminate **bias and structural barriers** in hiring and pay
- Rethink how we value labor — especially in essential but low-paying roles

---

## 💬 4. Final Thought

This isn’t just income data — it’s a reflection of **opportunity in America**.

> *The American Dream is still alive — but many are locked out by invisible walls.  
It’s not just about how hard you work — it’s about whether the system lets you win.*  


In [2]:
import pandas as pd
import numpy as np

In [5]:
df = pd.read_csv('adult.csv')

In [8]:
df.head(5)

,age,workclass,fnlwgt,education,educational-num,marital-status,occupation,relationship,race,gender,capital-gain,capital-loss,hours-per-week,native-country,income
0,25,Private,226802,11th,7,Never-married,Machine-op-inspct,Own-child,Black,Male,0,0,40,United-States,<=50K
1,38,Private,89814,HS-grad,9,Married-civ-spouse,Farming-fishing,Husband,White,Male,0,0,50,United-States,<=50K
2,28,Local-gov,336951,Assoc-acdm,12,Married-civ-spouse,Protective-serv,Husband,White,Male,0,0,40,United-States,>50K
3,44,Private,160323,Some-college,10,Married-civ-spouse,Machine-op-inspct,Husband,Black,Male,7688,0,40,United-States,>50K
4,18,?,103497,Some-college,10,Never-married,?,Own-child,White,Female,0,0,30,United-States,<=50K


In [12]:
df.shape

(48842, 15)

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48842 entries, 0 to 48841
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   age              48842 non-null  int64 
 1   workclass        48842 non-null  object
 2   fnlwgt           48842 non-null  int64 
 3   education        48842 non-null  object
 4   educational-num  48842 non-null  int64 
 5   marital-status   48842 non-null  object
 6   occupation       48842 non-null  object
 7   relationship     48842 non-null  object
 8   race             48842 non-null  object
 9   gender           48842 non-null  object
 10  capital-gain     48842 non-null  int64 
 11  capital-loss     48842 non-null  int64 
 12  hours-per-week   48842 non-null  int64 
 13  native-country   48842 non-null  object
 14  income           48842 non-null  object
dtypes: int64(6), object(9)
memory usage: 5.6+ MB
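
`df.info()` reports no nulls in any column, yet row 4 of the preview shows `?` in both `workclass` and `occupation`, which suggests missing values are stored as the string `?` rather than as NaN. A minimal sketch of how those placeholders could be counted is below. The helper name `count_placeholders` and the toy rows are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

def count_placeholders(frame: pd.DataFrame, placeholder: str = "?") -> pd.Series:
    """Count cells equal to `placeholder` per column; keep only columns with any."""
    # Cast to str so numeric columns compare cleanly against the placeholder
    counts = frame.astype(str).eq(placeholder).sum()
    return counts[counts > 0]

# Toy rows, invented for illustration only (not taken from adult.csv)
toy = pd.DataFrame({
    "age": [25, 18],
    "workclass": ["Private", "?"],
    "occupation": ["?", "?"],
})
print(count_placeholders(toy))
```

On the notebook's data the call would be `count_placeholders(df)`. If the counts turn out to be material, `df.replace("?", pd.NA)` is one way to make the gaps visible to pandas before computing rates.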


In [15]:
df['education'].unique()

array(['11th', 'HS-grad', 'Assoc-acdm', 'Some-college', '10th',
       'Prof-school', '7th-8th', 'Bachelors', 'Masters', 'Doctorate',
       '5th-6th', 'Assoc-voc', '9th', '12th', '1st-4th', 'Preschool'],
      dtype=object)

In [18]:
df['income'].unique()

array(['<=50K', '>50K'], dtype=object)

In [21]:
# 1. Count the total number of individuals in the dataset
total_individuals = len(df)

# 2. Filter the DataFrame to include only those with income >50K
more_than_50k = df[df['income'] == '>50K']

# 3. Count the number of individuals earning >50K
count_more_than_50k = len(more_than_50k)

# 4. Calculate the percentage
percentage_more_than_50k = (count_more_than_50k / total_individuals) * 100

# 5. Print the result
print(f"Percentage of individuals with income >50K: {percentage_more_than_50k:.2f}%")

Percentage of individuals with income >50K: 23.93%
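
The same figure can be read off in one step with `value_counts(normalize=True)`. This is a sketch under the assumption that the `income` column holds the two labels shown earlier. The helper name `income_share` and the four toy rows are hypothetical and exist only so the example runs without `adult.csv`.

```python
import pandas as pd

def income_share(frame: pd.DataFrame, label: str = ">50K") -> float:
    """Return the percentage of rows whose 'income' equals `label`."""
    shares = frame["income"].value_counts(normalize=True)
    # .get() falls back to 0.0 when the label never occurs in the frame
    return float(shares.get(label, 0.0) * 100)

# Toy rows, invented for illustration only (not taken from adult.csv)
toy = pd.DataFrame({"income": ["<=50K", ">50K", "<=50K", "<=50K"]})
print(f"{income_share(toy):.2f}%")  # 25.00%
```

Calling `income_share(df)` on the notebook's DataFrame should reproduce the percentage printed above.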


In [25]:
# 1. Group the DataFrame by the 'education' column
grouped_by_education = df.groupby('education')

# 2. Calculate the count of each income category within each education group
income_counts = grouped_by_education['income'].value_counts()

# 3. Calculate the total count of individuals in each education group
total_counts = grouped_by_education['income'].count()

# 4. Calculate the percentage of '>50K' income for each education group
percentage_more_than_50k_ed = (income_counts.unstack(fill_value=0)['>50K'] / total_counts) * 100

# 5. Print the result
print("Percentage of individuals with income >50K, grouped by education:")
print(percentage_more_than_50k_ed)

Percentage of individuals with income >50K, grouped by education:
education
10th             6.263499
11th             5.077263
12th             7.305936
1st-4th          3.238866
5th-6th          5.304519
7th-8th          6.492147
9th              5.423280
Assoc-acdm      25.796377
Assoc-voc       25.327511
Bachelors       41.283489
Doctorate       72.558923
HS-grad         15.857831
Masters         54.911554
Preschool        1.204819
Prof-school     73.980815
Some-college    18.964883
dtype: float64
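
The per-group rate above can also be computed by averaging a boolean "earns >50K" flag within each group, which avoids the separate count and unstack steps. The sketch below assumes the column names shown by `df.info()`. The helper `high_income_rate_by` and the toy rows are hypothetical.

```python
import pandas as pd

def high_income_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of each group in `column` earning '>50K', highest first."""
    is_high = frame["income"].eq(">50K")            # True where income is '>50K'
    rate = is_high.groupby(frame[column]).mean() * 100
    return rate.sort_values(ascending=False)

# Toy rows, invented for illustration only (not taken from adult.csv)
toy = pd.DataFrame({
    "education": ["Bachelors", "Bachelors", "HS-grad", "HS-grad", "HS-grad", "HS-grad"],
    "income":    [">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})
print(high_income_rate_by(toy, "education"))
```

Because the grouping column is a parameter, the same helper could be pointed at `race`, `gender`, or `occupation` on the real DataFrame, for example `high_income_rate_by(df, "occupation")`.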


In [26]:
# Calculate the number of individuals with income '>50K' for each education level
greater_than_50k_by_education = df[df['income'] == '>50K']['education'].value_counts()

# Calculate the total number of individuals for each education level
# (not used below: the percentages are taken over the whole dataset)
total_by_education = df['education'].value_counts()

# Calculate each education level's '>50K' earners as a percentage of ALL
# individuals in the dataset (the values sum to the overall 23.93%)
percentage_greater_than_50k = (greater_than_50k_by_education / total_individuals) * 100

# Sort the results in descending order
percentage_greater_than_50k_sorted = percentage_greater_than_50k.sort_values(ascending=False)

# Print the result
print(percentage_greater_than_50k_sorted)

Bachelors       6.783097
HS-grad         5.124688
Some-college    4.223824
Masters         2.987183
Prof-school     1.263257
Assoc-voc       1.068752
Doctorate       0.882437
Assoc-acdm      0.845584
11th            0.188362
10th            0.178125
7th-8th         0.126940
12th            0.098276
9th             0.083944
5th-6th         0.055280
1st-4th         0.016379
Preschool       0.002047
Name: education, dtype: float64
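
These values are shares of the whole dataset, so they can be turned into the make-up of the high-income group. The seven college-level rows (Some-college through Doctorate) sum to about 18.05%, and 18.05 / 23.93 is roughly 75% of all >50K earners. The sketch below computes that share directly. Which labels count as "some college or higher" is an assumption encoded in `COLLEGE_OR_HIGHER`, and the helper name and toy rows are hypothetical.

```python
import pandas as pd

# Assumption: the education labels treated as "some college education or higher"
COLLEGE_OR_HIGHER = {
    "Some-college", "Assoc-acdm", "Assoc-voc",
    "Bachelors", "Masters", "Prof-school", "Doctorate",
}

def college_share_of_high_earners(frame: pd.DataFrame) -> float:
    """Among rows with income '>50K', the percentage with college-level education."""
    high = frame.loc[frame["income"] == ">50K", "education"]
    return float(high.isin(COLLEGE_OR_HIGHER).mean() * 100)

# Toy rows, invented for illustration only (not taken from adult.csv)
toy = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Masters", "11th", "HS-grad"],
    "income":    [">50K", ">50K", ">50K", ">50K", "<=50K"],
})
print(college_share_of_high_earners(toy))  # 50.0
```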


In [28]:
# Calculate the number of individuals with income '>50K' for each race
greater_than_50k_by_race = df[df['income'] == '>50K']['race'].value_counts()

# Calculate the total number of individuals for each race
total_by_race = df['race'].value_counts()

# Calculate the percentage of individuals with income '>50K' for each race
percentage_greater_than_50k_race = (greater_than_50k_by_race / total_by_race) * 100

# Sort the results in descending order (optional, but often useful)
percentage_greater_than_50k_race_sorted = percentage_greater_than_50k_race.sort_values(ascending=False)

# Print the result
print(percentage_greater_than_50k_race_sorted)

Asian-Pac-Islander    26.925609
White                 25.398688
Other                 12.315271
Black                 12.081110
Amer-Indian-Eskimo    11.702128
Name: race, dtype: float64


In [29]:
# Calculate the number of individuals with income '>50K' for each gender
greater_than_50k_by_gender = df[df['income'] == '>50K']['gender'].value_counts()

# Calculate the total number of individuals for each gender
total_by_gender = df['gender'].value_counts()

# Calculate the percentage of individuals with income '>50K' for each gender
percentage_greater_than_50k_gender = (greater_than_50k_by_gender / total_by_gender) * 100

# Sort the results in descending order (optional, but often useful)
percentage_greater_than_50k_gender_sorted = percentage_greater_than_50k_gender.sort_values(ascending=False)

# Print the result
print(percentage_greater_than_50k_gender_sorted)

Male      30.376723
Female    10.925148
Name: gender, dtype: float64
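
The gender comparison above pools all education levels, so on its own it cannot show whether the gap holds for people with the same qualifications. One way to probe that would be to compute the >50K rate for each education and gender pair. The sketch below assumes the column names from `df.info()`. The helper `high_income_rate_by_two` and the toy rows are hypothetical, and the toy figures are not results from the dataset.

```python
import pandas as pd

def high_income_rate_by_two(frame: pd.DataFrame, row_key: str, col_key: str) -> pd.DataFrame:
    """Percentage earning '>50K' for each (row_key, col_key) pair, as a table."""
    is_high = frame["income"].eq(">50K")
    rate = is_high.groupby([frame[row_key], frame[col_key]]).mean() * 100
    # Move the second grouping key into columns for side-by-side comparison
    return rate.unstack(col_key)

# Toy rows, invented for illustration only (not taken from adult.csv)
toy = pd.DataFrame({
    "education": ["Bachelors"] * 4 + ["HS-grad"] * 4,
    "gender":    ["Male", "Male", "Female", "Female"] * 2,
    "income":    [">50K", ">50K", ">50K", "<=50K",
                  ">50K", "<=50K", "<=50K", "<=50K"],
})
print(high_income_rate_by_two(toy, "education", "gender"))
```

On the real data, `high_income_rate_by_two(df, "education", "gender")` would put the male and female rates side by side for each education level. This still leaves out other factors such as occupation and hours worked.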
