# 👩🏻‍💻 freeCodeCamp: Demographic Data Analyzer

In this challenge, you analyze demographic data using Pandas. The dataset was extracted from the 1994 Census database; a sample of it appears in the `df.head()` output further down.

You must use Pandas to answer the following questions:

1. How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)
2. What is the average age of men?
3. What is the percentage of people who have a Bachelor's degree?
4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
5. What percentage of people without advanced education make more than 50K?
6. What is the minimum number of hours a person works per week?
7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
8. What country has the highest percentage of people that earn >50K and what is that percentage?
9. Identify the most popular occupation for those who earn >50K in India.

Link: [https://www.freecodecamp.org/learn/data-analysis-with-python/data-analysis-with-python-projects/demographic-data-analyzer](https://www.freecodecamp.org/learn/data-analysis-with-python/data-analysis-with-python-projects/demographic-data-analyzer)

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("/Users/katiehuang/Documents/Data Science/Projects/Demographic Data Analyzer/adult_data.csv")

df.head()

| Unnamed: 0 | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


## 1. How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)



In [3]:
race_count = df["race"].value_counts()
race_count

White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64
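
To illustrate why `value_counts()` meets the "race names as the index labels" requirement, here is a minimal sketch on a hypothetical five-row frame. The `toy` data is made up for illustration and is not taken from the census file.

```python
import pandas as pd

# Hypothetical toy data, not taken from adult_data.csv
toy = pd.DataFrame({"race": ["White", "Black", "White", "Other", "White"]})

counts = toy["race"].value_counts()

# The result is a Series whose index holds the category names,
# sorted from most to least frequent.
print(type(counts).__name__)  # Series
print(counts.index[0])        # White
print(counts["White"])        # 3
```

Because the labels live in the index, a single count can be looked up by name, as in `counts["White"]`.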

## 2. What is the average age of men?

In [4]:
avg_age = round(df[df["sex"] == "Male"]["age"].mean(),1)

print(f"The average age of men is {avg_age} years old.")

The average age of men is 39.4 years old.


## 3. What is the percentage of people who have a Bachelor's degree?



In [18]:
bachelors = df[df.education == 'Bachelors']

# Multiply by 100 before rounding so the first decimal place is kept
bachelors_pct = round(len(bachelors) / len(df) * 100, 1)

print(f"{bachelors_pct}% of people have a Bachelor's degree.")

16.4% of people have a Bachelor's degree.


## 4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

In [79]:
# Select rows with a Bachelors, Masters, or Doctorate education
higher_education = df[df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]

# Share of that group earning >50K, as a percentage rounded to one decimal
higher_education_pct = round(len(higher_education[higher_education['salary'] == '>50K']) / len(higher_education) * 100, 1)

print(f"{higher_education_pct}% of people with advanced education (Bachelors, Masters, or Doctorate) make more than $50K.")

46.5% of people with advanced education (Bachelors, Masters, or Doctorate) make more than $50K.


## 5. What percentage of people without advanced education make more than 50K?



In [78]:
# Select rows without a Bachelors, Masters, or Doctorate education
lower_education = df[~df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]

# Share of that group earning >50K, as a percentage rounded to one decimal
lower_education_pct = round(len(lower_education[lower_education['salary'] == '>50K']) / len(lower_education) * 100, 1)

print(f"{lower_education_pct}% of people without advanced education make more than $50K.")

17.4% of people without advanced education make more than $50K.
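
Questions 4 and 5 have the same shape: pick a subgroup, then ask what share of it earns more than 50K. One way to avoid repeating the arithmetic is a small helper that takes the mean of a boolean mask. The sketch below is only an illustration: `pct_over_50k` is a hypothetical helper name and the `toy` frame is made-up data, not the census file.

```python
import pandas as pd

# Hypothetical toy data, not taken from adult_data.csv
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "11th", "Doctorate", "HS-grad"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})

def pct_over_50k(frame):
    """Percentage of rows in `frame` whose salary is '>50K', rounded to 1 decimal."""
    # The mean of a boolean Series is the fraction of True values.
    return round((frame["salary"] == ">50K").mean() * 100, 1)

# Boolean mask marking rows with advanced education
advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])

higher_pct = pct_over_50k(toy[advanced])   # 2 of the 3 advanced rows earn >50K
lower_pct = pct_over_50k(toy[~advanced])   # 1 of the 3 remaining rows earns >50K
print(higher_pct, lower_pct)
```

Multiplying by 100 before rounding matters here: rounding the raw fraction to one decimal first would collapse every result to a multiple of 10%.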


## 6. What is the minimum number of hours a person works per week?



In [8]:
# 'hours-per-week' represents number of hours worked in a week. Find the minimum hours in the field.
min_hours_work = df["hours-per-week"].min()

print(f"The minimum number of hours a person works per week is {min_hours_work} hour(s).")

The minimum number of hours a person works per week is 1 hour(s).


## 7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?



In [82]:
min_hours = df[df["hours-per-week"] == min_hours_work]

# Divide by the size of the minimum-hours group, not the whole dataset
min_hours_pct = round(len(min_hours[min_hours["salary"] == ">50K"]) / len(min_hours) * 100, 1)

print(f"{min_hours_pct}% of people who work the minimum number of hours per week have a salary of more than 50K.")

10.0% of people who work the minimum number of hours per week have a salary of more than 50K.
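
The question asks for a share within the minimum-hours group, so the denominator should be the size of that group rather than the size of the whole dataset. The sketch below contrasts the two denominators on a hypothetical eight-row frame; the `toy` data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not taken from adult_data.csv
toy = pd.DataFrame({
    "hours-per-week": [1, 1, 40, 40, 40, 40, 40, 40],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", "<=50K", "<=50K", "<=50K", "<=50K"],
})

# Rows that work the minimum number of hours
min_group = toy[toy["hours-per-week"] == toy["hours-per-week"].min()]
rich_in_group = (min_group["salary"] == ">50K").sum()

share_of_group = rich_in_group / len(min_group) * 100  # denominator: the 2 minimum-hours rows
share_of_all = rich_in_group / len(toy) * 100          # denominator: all 8 rows

print(share_of_group)  # 50.0
print(share_of_all)    # 12.5
```

Only `share_of_group` answers the question as worded; `share_of_all` shrinks as the dataset grows, even if the group itself does not change.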


## 8. What country has the highest percentage of people that earn >50K and what is that percentage?

In [84]:
# Count the people from each country
country_count = df['native-country'].value_counts()

# Count the people from each country who earn > $50k
rich_country_count = df[df['salary'] == '>50K']['native-country'].value_counts()

# Percentage of high earners within each country (NaN where a country has none)
rich_country_pct = rich_country_count / country_count * 100

highest_earning_country = rich_country_pct.idxmax()
highest_earning_country_percentage = round(rich_country_pct.max(), 1)

# .idxmax() returns the index label of the maximum value, skipping NaN by default
# Reference: https://pandas.pydata.org/docs/reference/api/pandas.Series.idxmax.html

print(f"{highest_earning_country} has the highest percentage of people that earn more than $50K at {highest_earning_country_percentage}%.")

Iran has the highest percentage of people that earn more than $50K at 41.9%.
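
Since the question is about the share of high earners within each country, every country needs its own denominator. One possible alternative to dividing two `value_counts()` results is to group a boolean mask by country and take the mean. This is a sketch on hypothetical data: the `toy` frame and variable names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data, not taken from adult_data.csv
toy = pd.DataFrame({
    "native-country": ["United-States"] * 6 + ["Iran"] * 2 + ["Cuba"] * 2,
    "salary": [">50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K",
               ">50K", "<=50K",
               "<=50K", "<=50K"],
})

# Fraction of '>50K' rows within each country, expressed as a percentage
rich_pct_by_country = (
    toy["salary"].eq(">50K")
    .groupby(toy["native-country"])
    .mean()
    * 100
)

top_country = rich_pct_by_country.idxmax()
top_pct = round(rich_pct_by_country.max(), 1)
print(top_country, top_pct)  # Iran 50.0
```

In this toy frame the United States has the most high earners by count (2), but Iran has the highest rate (1 of 2). With this approach, countries that have no high earners get 0.0 instead of NaN.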


## 9. Identify the most popular occupation for those who earn >50K in India.

In [103]:
# Create df with India only
india = df[df['native-country'] == 'India']

# Find number of people in India earning > 50k grouped by occupation and find the max value
india_popular_occupation = india[india['salary'] == '>50K']['occupation'].value_counts().idxmax()

print(f"{india_popular_occupation} is the most popular occupation for those who earn >$50K in India.")

Prof-specialty is the most popular occupation for those who earn >$50K in India.
